FlowPilot | Vision and Autonomy Intelligence Lab

FlowPilot has been validated in real-world experiments on the mobile robot platform developed by Coco Robotics.

TL;DR

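FlowPilot moves sidewalk navigation from imitation to alignment. Anchored Flow Matching trains the imitation policy, FlowPilot-Base, on offline demonstrations, and human-in-the-loop preference learning fine-tunes it into FlowPilot-HP.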

FlowPilot Model Architecture

FlowPilot consists of two key components:
(1) Anchored Flow Matching: A conditional flow-matching policy anchored to clustered prototypical behaviors, learning smooth, multi-modal trajectories from offline demonstrations, with gated cross-attention that grounds decisions in scene context and avoids goal-driven shortcuts.
(2) Human-Preference Alignment: A reward-free, human-in-the-loop scheme that fine-tunes the pretrained policy from corrective interventions toward safe, socially compliant behavior while preserving the imitation prior.
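To make the first component concrete, here is a minimal sketch of one way anchored flow matching could be set up. It is not the authors' implementation. It assumes that trajectories are flattened into lists of floats, that each demonstration is paired with its nearest prototype (the anchor), and that the flow follows a straight line from a noise-perturbed anchor to the demonstration. The names nearest_anchor, flow_matching_pair, and euler_sample, and the noise scale sigma, are hypothetical.

```python
import random

def nearest_anchor(traj, anchors):
    """Return the index of the anchor closest to traj (squared L2 distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(anchors)), key=lambda k: sqdist(traj, anchors[k]))

def flow_matching_pair(traj, anchor, t, sigma=0.1, rng=random):
    """Build one training sample for a straight-line flow (assumed form).

    x0 is the anchor plus Gaussian noise, x1 is the demonstration.
    Returns the interpolated point x_t and the target velocity x1 - x0,
    which a velocity network would be regressed onto.
    """
    x0 = [a + sigma * rng.gauss(0.0, 1.0) for a in anchor]
    x_t = [(1.0 - t) * s + t * e for s, e in zip(x0, traj)]
    v_target = [e - s for s, e in zip(x0, traj)]
    return x_t, v_target

def euler_sample(velocity_fn, anchor, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1, starting at the anchor."""
    x = list(anchor)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With an oracle velocity field (the exact target velocity), integrating from the anchor recovers the demonstration; a trained network is meant to approximate that behavior while conditioning on scene context.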

Daytime (6× speed)

Nighttime (6× speed)

Long-horizon results in real-world sidewalk environments: using only a monocular RGB camera and coarse GPS, FlowPilot stays on the walkway while avoiding obstacles and pedestrians.

Capability Demonstrations

All videos in this section are played at 6× speed.

Sidewalk Lane Keeping

Dusk Commercial District

FlowPilot keeps the robot centered on the sidewalk, smoothly following the walkable path through curves and intersections while staying clear of the road and grass margins.

Obstacle Avoidance

Daytime Campus

Dusk Campus

FlowPilot detects obstacles ahead, such as parked scooters, and steers smoothly around them before returning to the sidewalk, without stalling or veering into the road.

Pedestrian Awareness

Daytime Commercial District

Dusk Residential Neighborhood

Nighttime Commercial District

When pedestrians share or cross the walkway, FlowPilot anticipates their motion and responds in a socially compliant way: slowing, yielding, and keeping a safe clearance.

Robustness under Varying Lighting

Dusk

Nighttime

At night, headlight glare, streetlamp halos, deep shadows, and low contrast severely degrade monocular RGB perception. Without any depth sensor, LiDAR, or pre-built map, FlowPilot still follows the sidewalk and avoids obstacles and pedestrians, holding stable trajectories across these challenging illumination conditions.

Comparison with State-of-the-Art Methods

NoMaD vs. FlowPilot-HP

CityWalker vs. FlowPilot-HP

Under identical conditions, FlowPilot-HP stays centered on the walkway and progresses smoothly toward the goal, while the NoMaD and CityWalker baselines drift off the sidewalk or stall.

Cross-Embodiment Generalization

FlowPilot generalizes across robot embodiments: the same policy controls robots with different dynamics, footprints, and camera viewpoints, maintaining consistent behaviors.

Ablation Studies

Effectiveness of Robot-Agnostic Pretraining

Scaling of web-scale visual-odometry pretraining

Pretraining on the large-scale robot-agnostic dataset, which covers diverse dynamics, improves downstream performance on both goal-less and point-goal navigation, showing that robot-agnostic data is an effective, scalable pretraining signal.

Effectiveness of Gated Attention

Goal-point attention with and without gated attention

Fraction of attention placed on the goal token across decoder layers. Without gating, attention increasingly concentrates on the goal token, forming an attention sink that encourages goal-driven shortcuts. Gated attention markedly reduces both the mean and the maximum of this concentration, letting the policy attend to scene context.
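The goal-attention statistic for one layer could be computed roughly as follows. This is a sketch under stated assumptions: raw attention scores are given as one row per query, softmax is taken over the keys, and the goal token sits at a known key index. The function name goal_attention_stats and the input layout are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def goal_attention_stats(layer_scores, goal_index):
    """Return (mean, max) of the attention weight placed on the goal token.

    layer_scores: list of query rows, each a list of raw scores over keys.
    goal_index:   position of the goal token among the keys (assumed known).
    """
    fractions = [softmax(row)[goal_index] for row in layer_scores]
    return sum(fractions) / len(fractions), max(fractions)
```

Running this per decoder layer, with and without gating, would give the kind of mean and max curves the figure describes.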

Effectiveness of Preference Learning

Preference Data Collection-1

Preference Data Collection-2

FlowPilot-Base (Collision)

FlowPilot-HP (Success)

Top: preference data is gathered from brief human interventions during teleoperation. Bottom: starting from the same imitation prior, the preference-aligned FlowPilot-HP behaves more cautiously and is more socially compliant than FlowPilot-Base, requiring fewer interventions while retaining the base policy's navigation skills.
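The page does not state the alignment objective, so the following is only an illustrative stand-in: a DPO-style pairwise loss adapted to flow-matching errors. It assumes that each intervention yields a preferred trajectory (the human correction) and a rejected one (what the policy proposed), and that a frozen copy of the base policy serves as the reference that preserves the imitation prior. The names fm_error and preference_loss, and the temperature beta, are hypothetical.

```python
import math

def fm_error(pred_v, target_v):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(target_v)

def preference_loss(err_pol_w, err_ref_w, err_pol_l, err_ref_l, beta=1.0):
    """DPO-style loss on flow-matching errors (assumed form, no reward model).

    err_*_w: errors on the preferred (human-corrected) trajectory.
    err_*_l: errors on the rejected (policy-proposed) trajectory.
    pol = policy being fine-tuned, ref = frozen base policy.
    The loss falls when the policy fits the preferred trajectory better than
    the reference does, relative to how it fits the rejected one.
    """
    margin = (err_ref_w - err_pol_w) - (err_ref_l - err_pol_l)
    z = -beta * margin
    # Stable softplus(z), equal to -log(sigmoid(beta * margin)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy equals the reference, the margin is zero and the loss is log 2, so the fine-tuned policy starts exactly at the imitation prior and moves away from it only as far as the preference pairs demand.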

Reference

@article{he2026from,
  title={From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation},
  author={He, Honglin and Liu, Zhizheng and Ma, Yukai and Zhou, Bolei},
  journal={arXiv preprint},
  year={2026}
}

Acknowledgement

We thank Brad Squicciarini and Akshat Pandya for providing comments and feedback.

From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

Honglin He, Zhizheng Liu, Yukai Ma, Bolei Zhou

University of California, Los Angeles

Code | Paper