Guest Editorial: Introduction to the Special Section on Large-Scale Multimodal Learning: Universality, Robustness, Efficiency, and Beyond
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
PoseScript: Linking 3D Human Poses and Natural Language
Unpaired Image-Text Matching via Multimodal Aligned Conceptual Knowledge
Single-Frame Supervision for Spatio-Temporal Video Grounding
MoIL: Momentum Imitation Learning for Efficient Vision-Language Adaptation
UniDetector: Towards Universal Object Detection With Heterogeneous Supervision
Cap4Video++: Enhancing Video Understanding With Auxiliary Captions
Language-Aware Vision Transformer for Referring Segmentation
NineRec: A Benchmark Dataset Suite for Evaluating Transferable Recommendation
Neural Prompt Search
A General Spatial-Frequency Learning Framework for Multimodal Image Fusion
Self-Supervised Multimodal Learning: A Survey
Instance-Consistent Fair Face Recognition
Autonomous Clustering by Fast Find of Mass and Distance Peaks
C2P-Net: Comprehensive Depth Map to Planar Depth Conversion for Room Layout Estimation
Calibration-Free Raw Image Denoising via Fine-Grained Noise Estimation
Deformable Graph Transformer
SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation
Towards Unified Deep Image Deraining: A Survey and a New Benchmark
LMP-GAN: Out-of-Distribution Detection for Non-Control Data Malware Attacks
Learning to Explore Sample Relationships
MB-RACS: Measurement-Bounds-Based Rate-Adaptive Image Compressed Sensing Network
Semantic-Aware Pseudo-Labeling for Unsupervised Meta-Learning
Re-Fed+: A Better Replay Strategy for Federated Incremental Learning
Rate-Distortion Theory in Coding for Machines and Its Applications
Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation
Low-Shot Video Object Segmentation
Pushing the Limit of Post-Training Quantization
DIST+: Knowledge Distillation From a Stronger Adaptive Teacher
Reason and Discovery: A New Paradigm for Open Set Recognition
Generalizable Multi-Modal Adversarial Imitation Learning for Non-Stationary Dynamics
Revisiting Stochastic Multi-Level Compositional Optimization
HandRT: Simultaneous Hand Shape and Appearance Reconstruction With Pose Tracking From Monocular RGB-D Video
ED-Pose++: Enhanced Explicit Box Detection for Conventional and Interactive Multi-Object Keypoint Detection
Hard-Aware Instance Adaptive Self-Training for Unsupervised Cross-Domain Semantic Segmentation
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Impact of Noisy Supervision in Foundation Model Learning
Learning Efficient Deep Discriminative Spatial and Temporal Networks for Video Deblurring
Addressing Information Asymmetry: Deep Temporal Causality Discovery for Mixed Time Series
GDRNPP: A Geometry-Guided and Fully Learning-Based Object Pose Estimator
WAGE: Weight-Sharing Attribute-Missing Graph Autoencoder
Generating Inverse Feature Space for Class Imbalance in Point Cloud Semantic Segmentation
Graph Prompt Clustering
ONNXPruner: ONNX-Based General Model Pruning Adapter
Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization
Scalable High-Fidelity 3D Hand Shape Reconstruction via Graph-Image Frequency Mapping and Graph Frequency Decomposition
Cross-Modality Distillation for Multi-Modal Tracking
Aesthetics-Guided Low-Light Enhancement
Human as Points: Explicit Point-Based 3D Human Reconstruction From Single-View RGB Images