SidewalkBench | Vision and Autonomy Intelligence Lab

Safely navigating complex city streets remains a significant challenge: robots must traverse long distances with varied layouts, avoid static obstacles, and interact safely with dynamic pedestrians. While recent visual navigation models offer promising solutions, the lack of a unified benchmark has hindered quantitative and reproducible evaluation. SidewalkBench bridges this gap with a comprehensive simulation platform for standardized model evaluation.

TL;DR

SidewalkBench: procedurally generated scenes, real-world scanned scenes, event-based high-level behaviors, and an SMPL-based animation pipeline, with 330 unit-test, 800 pedestrian-reactive, and 105 long-horizon scenarios.

SidewalkBench Overview

SidewalkBench is built on NVIDIA Isaac Sim, leveraging GPU-accelerated physics and realistic camera rendering. It includes two complementary scene types:
(1) Procedurally Generated Scenes: We define 7 primitive block types (straight, curve, intersection, etc.) and connect them via spline-based routing to form continuous urban topologies. Each block is divided into 5 functional zones (roads, sidewalks, curbs, road verges, frontage zones) with randomized layouts. We leverage UrbanVerse-100K, a large-scale urban asset database, to populate scenes with diverse sky HDRIs, ground textures, and static objects. This yields 100 large-scale environments, each covering 2 km×2 km.
(2) Real-world Scanned Scenes: Using an XGRIDS spatial camera with LiDAR and four cameras, we scan and reconstruct street blocks with photorealistic 3DGS appearance and accurate mesh geometry. We collect 11 real-world scanned scenes with an average scale of 150 m×150 m, annotated with sidewalk and crosswalk regions.
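To make the procedural pipeline in (1) more concrete, the following is a minimal sketch, not the SidewalkBench implementation. It assumes a uniform Catmull-Rom spline for the routing step (the text only says "spline-based"), and the width ranges per zone are made-up placeholders. The names `sample_zone_widths`, `catmull_rom`, and `route_centerline` are hypothetical.

```python
import random

# Zone names follow the text; the width ranges (metres) are illustrative guesses.
ZONE_WIDTH_RANGES_M = {
    "road": (6.0, 12.0),
    "curb": (0.1, 0.3),
    "road_verge": (0.5, 2.0),
    "sidewalk": (1.5, 4.0),
    "frontage_zone": (0.5, 3.0),
}


def sample_zone_widths(rng):
    """Draw one randomized cross-section: a width in metres for each zone."""
    return {zone: rng.uniform(lo, hi) for zone, (lo, hi) in ZONE_WIDTH_RANGES_M.items()}


def catmull_rom(p0, p1, p2, p3, t):
    """Point on a uniform Catmull-Rom segment between p1 and p2, t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (3 * b - a - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def route_centerline(anchors, samples_per_segment=10):
    """Sample a smooth centerline that passes through every block anchor point."""
    # Duplicate the endpoints so the curve starts and ends at the outer anchors.
    pts = [anchors[0]] + list(anchors) + [anchors[-1]]
    path = []
    for i in range(len(pts) - 3):
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            path.append(catmull_rom(pts[i], pts[i + 1], pts[i + 2], pts[i + 3], t))
    path.append(tuple(anchors[-1]))
    return path


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (50.0, 0.0), (100.0, 30.0), (100.0, 80.0)]
    centerline = route_centerline(anchors)
    layout = sample_zone_widths(random.Random(0))
    print(len(centerline), sorted(layout))
```

An interpolating spline is used here because it passes through each anchor, so block connection points stay fixed while the path between them is smoothed. Other spline families would work equally well for a sketch like this.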

Pedestrian Simulation

SidewalkBench adopts a two-level approach for pedestrian simulation:
(1) Event-based High-level Behaviors: We classify common sidewalk interaction behaviors (obstructing, conversing, queueing, frontal/lateral approaching, overtaking, ped-crossing, gesturing), each triggered by the pedestrian’s relative position to the robot via a behavior state machine. This enables standardized, reproducible human-interactive scenarios.
(2) Flexible and Efficient Low-level Animation: We represent all pedestrians using the SMPL human body model, enabling full motion control via human motion generation models and datasets. Our custom Nvdiffrast-based pedestrian renderer achieves a 60× improvement in rendering efficiency compared to the native Isaac Sim human animation module, enabling large-scale evaluation in parallel environments.
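The event-based trigger described in (1) can be pictured as a small state machine per pedestrian. The sketch below is illustrative only: the eight behavior names come from the text, while the phases, the distance-only trigger, the two radii, and the class name `PedestrianEvent` are assumptions made for this example. The actual benchmark may condition on richer relative-position information.

```python
import math
from enum import Enum, auto


class Behavior(Enum):
    """The eight behavior types listed in the text."""
    OBSTRUCTING = auto()
    CONVERSING = auto()
    QUEUEING = auto()
    FRONTAL_APPROACHING = auto()
    LATERAL_APPROACHING = auto()
    OVERTAKING = auto()
    PED_CROSSING = auto()
    GESTURING = auto()


class Phase(Enum):
    """Hypothetical phases of one scripted event."""
    WAITING = auto()   # robot has not come close yet
    ACTIVE = auto()    # behavior is being played
    FINISHED = auto()  # robot has moved away; the event never re-triggers


class PedestrianEvent:
    def __init__(self, behavior, trigger_radius_m=8.0, release_radius_m=12.0):
        # Both radii are placeholder values, not taken from the benchmark.
        self.behavior = behavior
        self.trigger_radius_m = trigger_radius_m
        self.release_radius_m = release_radius_m
        self.phase = Phase.WAITING

    def step(self, ped_xy, robot_xy):
        """Advance the state machine from the current relative position."""
        dist = math.dist(ped_xy, robot_xy)
        if self.phase is Phase.WAITING and dist <= self.trigger_radius_m:
            self.phase = Phase.ACTIVE
        elif self.phase is Phase.ACTIVE and dist > self.release_radius_m:
            self.phase = Phase.FINISHED
        return self.phase


if __name__ == "__main__":
    event = PedestrianEvent(Behavior.GESTURING)
    for robot_x in (20.0, 5.0, 10.0, 15.0):
        print(robot_x, event.step((0.0, 0.0), (robot_x, 0.0)).name)
```

Triggering from the robot's position rather than from wall-clock time means the same interaction unfolds regardless of how fast a given model drives, which is one way to get the reproducibility the text emphasizes. The release radius is set larger than the trigger radius so the event does not flicker at the boundary.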

Unit-test Scenarios

Unit-test scenarios evaluate model performance across three basic sidewalk structures. All videos are played at 4× speed.

Straight

Procedurally Generated

Real-world Scanned

Curve

Procedurally Generated

Real-world Scanned

Crosswalk

Procedurally Generated

Real-world Scanned

Pedestrian-reactive Scenarios

We evaluate 8 types of event-based pedestrian behaviors in procedurally generated scenes. All videos are played at 4× speed.

Obstructing

Conversing

Queueing

Frontal Approaching

Lateral Approaching

Overtaking

Ped-Crossing

Gesturing

Long-horizon Scenarios

Long-horizon scenarios require traversing large-scale environments, with a start-to-goal distance of more than 100 m. All videos are played at 4× speed.

Procedurally Generated

Real-world Scanned
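As a small illustration of the long-horizon criterion above, the sketch below checks whether a candidate route qualifies. It assumes the start-to-goal distance is measured along a polyline of waypoints, which the text does not specify. The function names and the waypoint format are hypothetical.

```python
import math


def path_length_m(waypoints):
    """Total length in metres of a polyline given as (x, y) waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


def is_long_horizon(waypoints, min_distance_m=100.0):
    """True if the route exceeds the 100 m threshold stated in the text."""
    return path_length_m(waypoints) > min_distance_m


if __name__ == "__main__":
    route = [(0.0, 0.0), (60.0, 0.0), (60.0, 50.0)]
    print(path_length_m(route), is_long_horizon(route))
```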

Finetuning from Synthetic Data

Our simulation platform can serve as a scalable synthetic data generator for model finetuning. All videos are played at 4× speed.

Ped-Crossing

Before Finetuning

After Finetuning

Gesturing

Before Finetuning

After Finetuning

Additional Demos

Other Robot Embodiments

Visualization of Real-world Scanned Scenes

Reference

@article{liu2026sidewalkbench,
         title={SidewalkBench: Benchmarking Visual Navigation on Urban Sidewalks},
         author={Liu, Zhizheng and He, Honglin and Alumootil, Vivek and Pandya, Akshat and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
         journal={arXiv preprint},
         year={2026},
}

Relevant Work

Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou. Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels. ICLR 2025.
Comment: This work proposes PedGen, a model for context-aware pedestrian movement generation trained on pseudo-labels from web videos. PedGen can be used to generate diverse pedestrian movements in SidewalkBench.

Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou. Joint Optimization for 4D Human-Scene Reconstruction in the Wild. ICLR 2026.
Comment: This work proposes JOSH, a method for reconstructing global human motion and the surrounding environment from in-the-wild videos. JOSH can be used to reconstruct novel pedestrian motions, such as a stopping gesture, for direct use in SidewalkBench.

SidewalkBench: Benchmarking Visual Navigation on Urban Sidewalks

Zhizheng Liu ¹*, Honglin He ¹*, Vivek Alumootil ¹*

Akshat Pandya ², Brad Squicciarini ², Wayne Wu ¹, Bolei Zhou ¹

¹ University of California, Los Angeles, ² Coco Robotics, * Equal Contribution
