S2E | Vision and Autonomy Intelligence Lab

TL;DR (key terms): S2E, unified, Residual-Attention Module, NavBench-GS

S2E Model architecture

The S2E pipeline consists of two key components:
(1) Anchor-Guided Distribution Matching: A framework that uses an anchor-conditioned architecture to learn multi-modal trajectory distributions from offline real-world videos, strengthening the model's representational capacity.
(2) Residual Attention Module: A lightweight residual design that fine-tunes pretrained attention blocks via reinforcement learning in simulation, enabling new behaviors (e.g., obstacle avoidance) while preserving general visual-motor priors.
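
To make component (2) concrete, here is a minimal NumPy sketch of one common way to add a residual branch to a frozen attention block. It is an illustration under stated assumptions, not the S2E implementation: the class name `ResidualAttention`, the single-head attention, the separate trainable projections, and the zero-initialized output projection `ro` are all hypothetical choices that this page does not specify.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over tokens x of shape (T, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

class ResidualAttention:
    """Frozen pretrained attention plus a small trainable residual branch (assumed design)."""

    def __init__(self, wq, wk, wv, rng):
        d = wq.shape[0]
        # Pretrained weights: kept as-is and never updated during fine-tuning.
        self.frozen = (wq, wk, wv)
        # Trainable branch with its own projections (hypothetical parameterization).
        self.rq = rng.normal(scale=0.02, size=(d, d))
        self.rk = rng.normal(scale=0.02, size=(d, d))
        self.rv = rng.normal(scale=0.02, size=(d, d))
        # Zero-initialized output projection: the branch contributes nothing at the start.
        self.ro = np.zeros((d, d))

    def __call__(self, x):
        base = attention(x, *self.frozen)
        delta = attention(x, self.rq, self.rk, self.rv) @ self.ro
        return base + delta

rng = np.random.default_rng(0)
d = 8
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
block = ResidualAttention(wq, wk, wv, rng)
tokens = rng.normal(size=(5, d))
pretrained_out = attention(tokens, wq, wk, wv)
# Before any fine-tuning step, the block reproduces the pretrained output.
print(np.allclose(block(tokens), pretrained_out))
```

With this kind of initialization, fine-tuning starts from exactly the pretrained behavior, and only the residual parameters (`rq`, `rk`, `rv`, `ro`) would receive updates from the reinforcement learning objective.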

Environments for Pretraining and Finetuning

Video-Action Pretraining

URBAN-SIM Closed-loop Finetuning

We build NavBench-GS, a 3D Gaussian Splatting-based benchmark for evaluating navigation policies in closed-loop, visually reconstructed urban environments with simulated objects and pedestrians.
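
The idea of closed-loop evaluation can be sketched with a toy example. The code below is a 2D stand-in, not NavBench-GS itself: it has no Gaussian Splatting rendering, and the function names (`rollout`, `evaluate`, `go_straight`), the circular obstacles, and the three outcome labels are assumptions made for illustration. What it does show is the defining property of closed-loop testing: each action changes the state that the policy observes at the next step.

```python
import math

def rollout(policy, start, goal, obstacles,
            max_steps=200, step=0.25, goal_tol=0.5, robot_radius=0.3):
    """Run one closed-loop episode and return 'success', 'collision', or 'timeout'."""
    x, y = start
    for _ in range(max_steps):
        # The policy always sees the current state, which its own past actions produced.
        heading = policy((x, y), goal, obstacles)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        # Obstacles are circles given as (center_x, center_y, radius).
        if any(math.hypot(x - ox, y - oy) < r + robot_radius for ox, oy, r in obstacles):
            return "collision"
        if math.hypot(goal[0] - x, goal[1] - y) < goal_tol:
            return "success"
    return "timeout"

def go_straight(pos, goal, obstacles):
    """Baseline policy that ignores obstacles and heads directly for the goal."""
    return math.atan2(goal[1] - pos[1], goal[0] - pos[0])

def evaluate(policy, episodes):
    """Aggregate episode outcomes into rates over a list of (start, goal, obstacles)."""
    outcomes = [rollout(policy, *ep) for ep in episodes]
    return {k: outcomes.count(k) / len(outcomes)
            for k in ("success", "collision", "timeout")}

episodes = [
    ((0.0, 0.0), (5.0, 0.0), []),                 # free path
    ((0.0, 0.0), (5.0, 0.0), [(2.5, 0.0, 0.5)]),  # obstacle on the direct line
]
print(evaluate(go_straight, episodes))
```

A policy that only imitates straight-line motion succeeds on the free path and collides when an obstacle blocks it, which is the kind of failure that open-loop trajectory metrics can miss.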

Real-World Deployment

Obstacle Avoidance

Cross-Embodiment

Comparison

Reference

@article{he2026seeing,
    title={From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning},
    author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
    journal={The Fourteenth International Conference on Learning Representations},
    year={2026}
}

From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning

ICLR 2026

Honglin He*¹, Yukai Ma*¹, Brad Squicciarini², Wayne Wu¹, Bolei Zhou¹

¹ University of California, Los Angeles, ² Coco Robotics
