MIMIC | Vision and Autonomy Intelligence Lab

We propose a method for training a sidewalk autopilot model that enables mobile robots to navigate autonomously with obstacle avoidance, lane following, and pedestrian awareness. The hardware platform is developed by Coco Robotics.

TL;DR

MIMIC: multi-scale imitation learning with corrective behavior expansion and generative data augmentation.
MIMIC Framework

MIMIC adopts an encoder-decoder architecture that encodes RGB observations and goal signals into a spatiotemporal representation. The action decoder uses time-horizon-specific anchors to produce actions parameterized by Gaussian mixture models (GMMs) across multiple horizons, enabling the model to learn both fine-grained reactivity and long-term planning in a unified framework.
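To make the idea of GMM-parameterized actions at several horizons more concrete, the following is a minimal sketch of a negative log-likelihood loss summed over horizons. It is not the authors' implementation: the horizon names, array shapes, diagonal covariance, and the unweighted sum over horizons are all assumptions made for illustration.

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x))) for a 1-D array."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def gmm_nll(target, weight_logits, means, log_stds):
    """Negative log-likelihood of `target` under a diagonal-covariance GMM.

    target:        (D,)   ground-truth action, e.g. a 2-D waypoint
    weight_logits: (K,)   unnormalized mixture logits
    means:         (K, D) component means
    log_stds:      (K, D) per-dimension log standard deviations
    """
    log_weights = weight_logits - logsumexp(weight_logits)
    z = (target - means) / np.exp(log_stds)
    # Log-density of each Gaussian component, summed over action dimensions.
    log_comp = -0.5 * np.sum(z ** 2 + 2.0 * log_stds + np.log(2.0 * np.pi), axis=1)
    return -logsumexp(log_weights + log_comp)


def multi_horizon_nll(predictions, targets):
    """Sum the GMM loss over horizons (assumed: equal weight per horizon).

    predictions: dict horizon_name -> (weight_logits, means, log_stds)
    targets:     dict horizon_name -> target action
    """
    return sum(gmm_nll(targets[h], *predictions[h]) for h in predictions)
```

A mixture output lets the decoder keep several plausible actions (for example, passing an obstacle on either side) instead of averaging them, which is one common motivation for this kind of parameterization.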

Corrective Behavior Expansion

We synthesize failure-correction scenarios by deliberately generating trajectories in which the robot deviates from the intended route, and then provide corrective actions as supervision.

Building on TrajectoryCrafter, we perturb the trajectory with a deviation-recovery noise sequence and re-render novel observations, pairing each perturbed trajectory with a corrective recovery maneuver.

[Video comparison: Original vs. Deviation-Recovery]
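One way to picture a deviation-recovery perturbation is as a smooth lateral offset that starts at zero, grows to a peak, and returns to zero along the original path. The sketch below illustrates only that geometric idea; the half-cosine profile, the function names, and the parameters are hypothetical and are not taken from the paper or from TrajectoryCrafter, and the re-rendering of observations is not shown.

```python
import numpy as np


def deviation_recovery_offsets(num_steps, peak_offset, peak_frac=0.5):
    """Lateral offsets that rise from 0 to `peak_offset`, then fall back to 0.

    peak_frac is the fraction of the sequence at which the peak occurs
    (assumed to lie strictly between 0 and 1).
    """
    t = np.linspace(0.0, 1.0, num_steps)
    rise = 0.5 * (1.0 - np.cos(np.pi * t / peak_frac))
    fall = 0.5 * (1.0 + np.cos(np.pi * (t - peak_frac) / (1.0 - peak_frac)))
    return peak_offset * np.where(t <= peak_frac, rise, fall)


def perturb_trajectory(waypoints, offsets):
    """Shift each (x, y) waypoint along the left-hand normal of the path.

    waypoints: (N, 2) array of distinct consecutive positions
    offsets:   (N,)   signed lateral offsets (positive = left of travel)
    """
    tangents = np.gradient(waypoints, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    normals = np.stack([-tangents[:, 1], tangents[:, 0]], axis=1)
    return waypoints + offsets[:, None] * normals


# Example: a straight 10 m path with a 0.5 m deviation that peaks halfway.
path = np.stack([np.linspace(0.0, 10.0, 11), np.zeros(11)], axis=1)
offsets = deviation_recovery_offsets(len(path), peak_offset=0.5)
perturbed = perturb_trajectory(path, offsets)
```

In this picture, the rising part of the offset plays the role of the deviation and the falling part plays the role of the recovery maneuver that can serve as supervision.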

Generative Data Augmentation

We adopt a foreground-background relighting model to enrich visual diversity while preserving scene geometry.

Building on Light-A-Video, we separate foreground objects from the background and apply prompt-based relighting with different strength coefficients to synthesize novel lighting conditions.

Input	Relighting prompt 1	Relighting prompt 2
Original	"icy road with strong reflections from frozen surface"	"commercial street at night with shop signs lit"
Original	"snowfall reducing visibility"	"dusk with half-lit sky"
Original	"rain streaks on glass facades reflecting light"	"sunny afternoon with strong shadows"
Original	"after rain with wet ground reflections"	"evening twilight with streetlights turning on"
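As a loose illustration of how separate strength coefficients for foreground and background might be applied, the sketch below blends an original frame with an already relit frame using a foreground mask. This is an assumption about what a "strength coefficient" could mean, not a description of Light-A-Video's interface; `blend_relit` and its arguments are hypothetical names, and the relit frame is assumed to come from an external relighting model.

```python
import numpy as np


def blend_relit(frame, relit, fg_mask, fg_strength, bg_strength):
    """Interpolate between the original and relit frame per region.

    frame, relit: (H, W, 3) float arrays in [0, 1]
    fg_mask:      (H, W) float array, 1.0 on foreground, 0.0 on background
    fg_strength, bg_strength: floats in [0, 1]; 0 keeps the original pixels,
                              1 uses the relit pixels
    """
    strength = fg_mask * fg_strength + (1.0 - fg_mask) * bg_strength
    return frame + strength[..., None] * (relit - frame)
```

Because the blend only changes pixel intensities, object positions and scene layout are untouched, which matches the stated goal of preserving scene geometry.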

Real-World Deployment

We evaluate the learned autopilot policy on a wheeled delivery robot developed by Coco Robotics. The robot uses a front monocular RGB camera as its sole perception input for sidewalk navigation. The policy runs in real time: from images and GPS, it predicts a trajectory that is converted into steering and velocity commands, without any HD maps or LiDAR.

Sidewalk Lane Following — The robot stays centered within the sidewalk lane, handling narrow paths.

Pedestrian Awareness — The robot yields to oncoming pedestrians, adjusting its trajectory to maintain safe clearance.

Complex Real-World Scenario — The robot navigates a cluttered sidewalk with mixed obstacles.
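The deployment description says that a predicted trajectory is turned into steering and velocity commands, but it does not say which controller is used. As one common possibility, the sketch below applies a pure-pursuit rule to waypoints expressed in the robot frame. The controller choice, the frame convention (x forward, y left), and the lookahead and speed parameters are assumptions for illustration only.

```python
import math


def pure_pursuit_command(waypoints, lookahead=1.0, speed=1.0):
    """Return (linear_velocity, angular_velocity) toward a lookahead waypoint.

    waypoints: sequence of (x, y) points in the robot frame, ordered along
               the path; x is forward and y is to the left (assumed).
    """
    # Pick the first waypoint at least `lookahead` away; fall back to the last.
    target = waypoints[-1]
    for x, y in waypoints:
        if math.hypot(x, y) >= lookahead:
            target = (x, y)
            break

    x, y = target
    dist_sq = x * x + y * y
    if dist_sq == 0.0:
        return 0.0, 0.0  # Target is at the robot: stop.

    # Curvature of the arc through the robot origin and the target point.
    curvature = 2.0 * y / dist_sq
    return speed, speed * curvature
```

With this convention, a target straight ahead gives zero angular velocity, and a target to the left gives a positive (counterclockwise) turn rate.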

Acknowledgement

We build our pipelines upon TrajectoryCrafter for novel-view trajectory synthesis and Light-A-Video for video relighting. We thank the authors for open-sourcing their work.

We thank Coco Robotics for providing the robot platform and teleoperation data used in this work.

Reference

@inproceedings{he2026learning,
    title={Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion},
    author={Honglin He and Yukai Ma and Brad Squicciarini and Wayne Wu and Bolei Zhou},
    booktitle={2026 IEEE International Conference on Robotics and Automation (ICRA)},
    year={2026},
    organization={IEEE}
}

Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion

ICRA 2026

Honglin He¹, Yukai Ma¹, Brad Squicciarini², Wayne Wu¹, Bolei Zhou¹

¹ University of California, Los Angeles, ² Coco Robotics
