Editorial Introduction to the ICCV 2021 Special Section
Learning to Answer Visual Questions From Web Videos
Ordinal Unsupervised Domain Adaptation With Recursively Conditional Gaussian Imposed Variational Disentanglement
OpenGAN: Open-Set Recognition via Open Data Generation
Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation of Indoor Scenes
MCTS With Refinement for Proposals Selection Games in Scene Understanding
Revisiting Viewing Graph Solvability: An Effective Approach Based on Cycle Consistency
Towards JPEG-Resistant Image Forgery Detection and Localization Via Self-Supervised Domain Adaptation
Pixel-Perfect Structure-From-Motion With Featuremetric Refinement
Baking Neural Radiance Fields for Real-Time View Synthesis
Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning
Aligning, Autoencoding and Prompting Large Language Models for Novel Disease Reporting
VD-NeRF: Visibility-Aware Decoupled Neural Radiance Fields for View-Consistent Editing and High-Frequency Relighting
Understand Layout and Translate Text: Unified Feature-Conductive End-to-End Document Image Translation
Enhanced Multi-Scale Cross-Attention for Person Image Generation
Predicting and Enhancing the Fairness of DNNs With the Curvature of Perceptual Manifolds
SinDiffusion: Learning a Diffusion Model From a Single Natural Image
Uni-MoE: Scaling Unified Multimodal LLMs With Mixture of Experts
Unconstrained Fuzzy C-Means Algorithm
Learning High-Quality Dynamic Memory for Video Object Segmentation
Towards Robust Probabilistic Modeling on SO(3) via Rotation Laplace Distribution
Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Progressive 4D Grouping
BossNAS Family: Block-Wisely Self-Supervised Neural Architecture Search
RelationLMM: Large Multimodal Model as Open and Versatile Visual Relationship Generalist
W-DOE: Wasserstein Distribution-Agnostic Outlier Exposure
CCDPlus: Towards Accurate Character to Character Distillation for Text Recognition
T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation
Quantity-Quality Enhanced Self-Training Network for Weakly Supervised Point Cloud Semantic Segmentation
Referring Camouflaged Object Detection
Generalized Time Warping Invariant Dictionary Learning for Time Series Classification and Clustering
Semi-Supervised Counting via Pixel-by-Pixel Density Distribution Modeling
Structural and Statistical Texture Knowledge Distillation and Learning for Segmentation
BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-Task Dense Predictions
MulFS-CAP: Multimodal Fusion-Supervised Cross-Modality Alignment Perception for Unregistered Infrared-Visible Image Fusion
Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction
Data-Driven Feature Tracking for Event Cameras With and Without Frames
Temporally-Consistent Surface Reconstruction Using Metrically-Consistent Atlases
VMarker-Pro: Probabilistic 3D Human Mesh Estimation From Virtual Markers
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving
MS-NeRF: Multi-Space Neural Radiance Fields
Fair Representation Learning for Continuous Sensitive Attributes Using Expectation of Integral Probability Metrics
Concept Neural Network Based on Time-Delay Regret for Dynamic Stream Learning
Dual-Level Matching With Outlier Filtering for Unsupervised Visible-Infrared Person Re-Identification
A Causality-Aware Paradigm for Evaluating Creativity of Multimodal Large Language Models
Generalized Conditional Similarity Learning via Semantic Matching
Transferable Unintentional Action Localization With Language-Guided Intention Translation
Benchmarking and Improving Bird’s Eye View Perception Robustness in Autonomous Driving
Revisiting Flatness-Aware Optimization in Continual Learning With Orthogonal Gradient Projection
A Decentralized Framework for Kernel PCA With Projection Consensus Constraints
Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields