Adaptive Human Trajectory Prediction via Latent Corridors

Adaptive trajectory prediction. (left) Given a history of human behavior (shown in black), the pre-trained predictor is unable to capture scene-specific behavior trends, like people entering a subterranean subway entrance (bottom row). (right) When adapting, the number of people and the amount of time determine the total number of trajectories observed, and we call this time-dependent quantity human-seconds. Here, the three columns correspond to our method trained on a very small (left), medium (middle), and large (right) amount of human-seconds. Our adaptive latent corridors approach enables the predictor to quickly learn scene-specific trends, improving predictions with even small amounts of data and closing the gap between the ground-truth (green) and predicted (orange) behavior. For example, in the middle row, the base predictor predicts that the person will move towards the camera, but as our adaptive predictor sees more human-seconds of data, it adapts to the trend that people in this plaza scene tend to avoid the center of the plaza and instead move diagonally across it.
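As a rough illustration of the human-seconds quantity, the sketch below sums the observed duration of every tracked person. The function name `human_seconds`, the frame-count input, and the frame rate are hypothetical choices made for this example, and the sum-of-durations reading is an assumption based on the description above rather than a definition taken from the paper.

```python
def human_seconds(track_lengths_in_frames, fps):
    """Total observation time summed over all tracked people, in seconds.

    Assumption: each entry is the number of frames in which one person was
    observed, and every track was recorded at the same frame rate `fps`.
    """
    return sum(track_lengths_in_frames) / fps


# Two people, each observed for 300 frames at 10 frames per second,
# i.e. 30 seconds per person.
total = human_seconds([300, 300], fps=10.0)
```

Under this reading, observing two people for 30 seconds each gives 60 human-seconds, and the same budget could also come from one person observed for 60 seconds.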

Abstract

Human trajectory prediction is typically posed as a zero-shot generalization problem: a predictor is learnt on a dataset of human motion in training scenes, and then deployed on unseen test scenes. While this paradigm has yielded tremendous progress, it fundamentally assumes that trends in human behavior within the deployment scene are constant over time. As such, current prediction models are unable to adapt to scene-specific transient human behaviors, such as crowds temporarily gathering to see buskers, pedestrians hurrying through the rain and avoiding puddles, or a protest breaking out. We formalize the problem of scene-specific adaptive trajectory prediction (ATP) and propose a new adaptation approach inspired by prompt tuning called latent corridors. By augmenting the input of any pre-trained human trajectory predictor with learnable image prompts, the predictor can improve in the deployment scene by inferring trends from extremely small amounts of new data (e.g., 2 humans observed for 30 seconds). With less than 0.1% additional model parameters, we see up to 23.9% improvement in average displacement error (ADE) on MOTSynth simulated data and 16.4% ADE improvement on MOT and Wildtrack real pedestrian data. Qualitatively, we observe that latent corridors imbue predictors with an awareness of scene geometry and scene-specific human behaviors that non-adaptive predictors struggle to capture.
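To make the prompt-tuning idea concrete, here is a minimal toy sketch in NumPy. It is not the paper's implementation: the frozen linear "predictor" `W`, the vector-valued `prompt` (standing in for a learnable image prompt), the synthetic `scene_bias`, and the squared-error training loss are all assumptions chosen to keep the example small. It only illustrates the general pattern of keeping the predictor's weights fixed, learning an additive prompt on its input, and measuring ADE before and after adaptation.

```python
import numpy as np


def ade(pred, gt):
    """Average displacement error: mean Euclidean distance over all timesteps."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


rng = np.random.default_rng(0)
T, d, n = 8, 32, 64  # future timesteps, input dimension, adaptation trajectories

# Frozen stand-in for a pre-trained predictor (hypothetical, never updated).
W = rng.normal(size=(2 * T, d)) / np.sqrt(d)


def predict(x, prompt):
    """Add the learnable prompt to the input, then apply the frozen predictor."""
    return ((x + prompt) @ W.T).reshape(-1, T, 2)


# Synthetic deployment scene: a constant offset the base predictor cannot see.
x = rng.normal(size=(n, d))
scene_bias = rng.normal(size=2 * T)
gt = (x @ W.T + scene_bias).reshape(n, T, 2)

prompt = np.zeros(d)  # the only trainable parameters
ade_before = ade(predict(x, prompt), gt)

lr = 0.1
for _ in range(500):
    residual = (predict(x, prompt) - gt).reshape(n, 2 * T)
    # Gradient of the mean squared error with respect to the prompt only.
    grad = 2.0 * residual.mean(axis=0) @ W
    prompt -= lr * grad

ade_after = ade(predict(x, prompt), gt)
print(f"ADE before adaptation: {ade_before:.3f}, after: {ade_after:.3f}")
```

In this toy setting the prompt absorbs the scene-specific offset while `W` stays untouched, which mirrors the idea of adapting a predictor with a small number of extra parameters. The parameter ratio and error reductions here are artifacts of the toy setup and say nothing about the reported results.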

The video below shows results of our latent corridors approach to adaptive trajectory prediction on synthetic MOTSynth data and on real MOT and webcam data.

Qualitative examples of the kinds of awareness that our latent corridors approach can give an adapted predictor:

Fine-grained awareness of obstacles.
Awareness that pedestrians will turn towards stairs.
Better understanding of human behavior, such as which way pedestrians turn in a plaza, the fact that humans don't normally walk into water, and awareness that humans on a billboard stay in the billboard.
Awareness of where the 3D ground plane lies in the 2D image.

Acknowledgments

Project webpage based on StyleGAN3.