Editorial Introduction to the ICCV 2021 Special Section
Learning to Answer Visual Questions From Web Videos
Ordinal Unsupervised Domain Adaptation With Recursively Conditional Gaussian Imposed Variational Disentanglement
OpenGAN: Open-Set Recognition via Open Data Generation
Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation of Indoor Scenes
MCTS With Refinement for Proposals Selection Games in Scene Understanding
Revisiting Viewing Graph Solvability: An Effective Approach Based on Cycle Consistency
Towards JPEG-Resistant Image Forgery Detection and Localization Via Self-Supervised Domain Adaptation
Pixel-Perfect Structure-From-Motion With Featuremetric Refinement
Baking Neural Radiance Fields for Real-Time View Synthesis
Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning
Aligning, Autoencoding and Prompting Large Language Models for Novel Disease Reporting
VD-NeRF: Visibility-Aware Decoupled Neural Radiance Fields for View-Consistent Editing and High-Frequency Relighting
Understand Layout and Translate Text: Unified Feature-Conductive End-to-End Document Image Translation
Enhanced Multi-Scale Cross-Attention for Person Image Generation
Predicting and Enhancing the Fairness of DNNs With the Curvature of Perceptual Manifolds
SinDiffusion: Learning a Diffusion Model From a Single Natural Image
Uni-MoE: Scaling Unified Multimodal LLMs With Mixture of Experts
Unconstrained Fuzzy C-Means Algorithm
Learning High-Quality Dynamic Memory for Video Object Segmentation
Towards Robust Probabilistic Modeling on SO(3) via Rotation Laplace Distribution
Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Progressive 4D Grouping
BossNAS Family: Block-Wisely Self-Supervised Neural Architecture Search
RelationLMM: Large Multimodal Model as Open and Versatile Visual Relationship Generalist
W-DOE: Wasserstein Distribution-Agnostic Outlier Exposure
CCDPlus: Towards Accurate Character to Character Distillation for Text Recognition
T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation
Quantity-Quality Enhanced Self-Training Network for Weakly Supervised Point Cloud Semantic Segmentation
Referring Camouflaged Object Detection
Generalized Time Warping Invariant Dictionary Learning for Time Series Classification and Clustering
Semi-Supervised Counting via Pixel-by-Pixel Density Distribution Modeling
Structural and Statistical Texture Knowledge Distillation and Learning for Segmentation
BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-Task Dense Predictions
MulFS-CAP: Multimodal Fusion-Supervised Cross-Modality Alignment Perception for Unregistered Infrared-Visible Image Fusion
Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction
Data-Driven Feature Tracking for Event Cameras With and Without Frames
Temporally-Consistent Surface Reconstruction Using Metrically-Consistent Atlases
VMarker-Pro: Probabilistic 3D Human Mesh Estimation From Virtual Markers
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving
MS-NeRF: Multi-Space Neural Radiance Fields
Fair Representation Learning for Continuous Sensitive Attributes Using Expectation of Integral Probability Metrics
Concept Neural Network Based on Time-Delay Regret for Dynamic Stream Learning
Dual-Level Matching With Outlier Filtering for Unsupervised Visible-Infrared Person Re-Identification
A Causality-Aware Paradigm for Evaluating Creativity of Multimodal Large Language Models
Generalized Conditional Similarity Learning via Semantic Matching
Transferable Unintentional Action Localization With Language-Guided Intention Translation
Benchmarking and Improving Bird’s Eye View Perception Robustness in Autonomous Driving
Revisiting Flatness-Aware Optimization in Continual Learning With Orthogonal Gradient Projection
A Decentralized Framework for Kernel PCA With Projection Consensus Constraints
Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields