FutrTrack: A Camera-LiDAR Fusion Transformer
for 3D Multiple Object Tracking
Martha Teiko Teye1,2 • Ori Maoz2 • Matthias Rottmann3
1 Department of Mathematics, University of Wuppertal, Wuppertal, Germany
2 Aptiv Services Deutschland GmbH
3 Institute of Computer Science, Osnabrück University, Germany
Abstract
FutrTrack is a modular camera-LiDAR multi-object tracking framework that builds on existing 3D detectors by introducing a transformer-based smoother and a fusion-driven tracker. It employs a two-stage transformer refinement and tracking pipeline that integrates bounding boxes with multimodal BEV fusion features from multiple cameras and LiDAR. The tracker assigns and propagates identities across frames using geometric and semantic cues, improving robustness under occlusion and viewpoint changes. Evaluated on nuScenes and KITTI, FutrTrack achieves an aMOTA of 74.7 on nuScenes while reducing identity switches and improving spatial consistency.
Method Overview
FutrTrack consists of a transformer-based smoother and a fusion-driven tracker for 3D MOT.
Step 1: Apply temporal smoothing to bounding box sequences.
Step 2: Fuse camera and LiDAR features in BEV space.
Step 3: Use transformer-based tracking with query-based identity propagation.
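The three steps above can be sketched with simple stand-ins: a moving-average smoother over box centers, channel-wise concatenation as a placeholder for BEV fusion, and greedy nearest-center matching as a placeholder for query-based identity propagation. This is a minimal illustration of the pipeline's data flow, not FutrTrack's actual transformer components; all function names and thresholds below are hypothetical.

```python
import numpy as np

def smooth_boxes(boxes, window=3):
    """Step 1 (toy): moving-average smoothing over a sequence of box centers."""
    boxes = np.asarray(boxes, dtype=float)
    out = boxes.copy()
    for t in range(len(boxes)):
        lo = max(0, t - window + 1)
        out[t] = boxes[lo:t + 1].mean(axis=0)  # average over the trailing window
    return out

def fuse_bev(cam_feat, lidar_feat):
    """Step 2 (toy): fuse camera and LiDAR BEV features by concatenation."""
    return np.concatenate([cam_feat, lidar_feat], axis=-1)

def assign_ids(prev_tracks, detections, next_id, max_dist=2.0):
    """Step 3 (toy): greedy nearest-center matching in place of
    transformer query-based identity propagation."""
    tracks, used = {}, set()
    for tid, center in prev_tracks.items():
        best, best_d = None, max_dist
        for i, det in enumerate(detections):
            if i in used:
                continue
            d = np.linalg.norm(det - center)
            if d < best_d:
                best, best_d = i, d
        if best is not None:        # propagate the existing identity
            used.add(best)
            tracks[tid] = detections[best]
    for i, det in enumerate(detections):
        if i not in used:           # unmatched detection starts a new track
            tracks[next_id] = det
            next_id += 1
    return tracks, next_id
```

In the real system each placeholder is replaced by a learned module; the sketch only shows how smoothed boxes, fused BEV features, and identity assignment connect frame to frame.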
Metrics at a Glance
Tracking metrics on nuScenes data
Overall tracking results on the nuScenes leaderboard
Qualitative Results
Tracking outputs compared across occlusion and sensor-fusion scenarios.
Resources & Links
Acknowledgments
Citation
@article{teye2025futrtrack,
title={FutrTrack: A Camera-LiDAR Fusion Transformer for 3D Multiple Object Tracking},
author={Teye, Martha Teiko and Maoz, Ori and Rottmann, Matthias},
journal={arXiv preprint arXiv:2510.19981},
year={2025}
}
Contact
Have questions or want to collaborate? Reach out: