
FutrTrack: A Camera-LiDAR Fusion Transformer
for 3D Multiple Object Tracking

Martha Teiko Teye 1,2 · Ori Maoz 2 · Matthias Rottmann 3

1 Department of Mathematics, University of Wuppertal, Wuppertal, Germany
2 Aptiv Services Deutschland GmbH
3 Institute of Computer Science, Osnabrück University, Germany


Abstract

FutrTrack is a modular camera-LiDAR multi-object tracking framework that builds on existing 3D detectors by introducing a transformer-based smoother and a fusion-driven tracker. It employs a multimodal two-stage transformer refinement and tracking pipeline, integrating bounding boxes with multimodal BEV fusion features from multiple cameras and LiDAR. The tracker assigns and propagates identities across frames using geometric and semantic cues, improving robustness under occlusion and viewpoint changes. Evaluated on nuScenes and KITTI, FutrTrack achieves an aMOTA of 74.7 on nuScenes, reducing identity switches and improving spatial consistency.

Method Overview

FutrTrack consists of a transformer-based smoother and a fusion-driven tracker for 3D MOT.

Fusion Tracker Architecture

Step 1: Apply temporal smoothing to bounding box sequences.
Step 2: Fuse camera and LiDAR features in BEV space.
Step 3: Use transformer-based tracking with query-based identity propagation.
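The three steps above can be sketched in simplified form. The sketch below is illustrative only: the moving-average smoother and the greedy matcher are stand-ins for FutrTrack's learned transformer smoother and query-based tracker, and all function names, weights, and thresholds are assumptions rather than the paper's implementation. It shows how geometric cues (BEV center distance) and semantic cues (feature similarity from the fused camera-LiDAR BEV features) can be mixed into one association cost.

```python
import numpy as np

def smooth_centers(centers, window=5):
    """Step 1 (simplified): moving-average smoothing of a track's box
    centers over time; a stand-in for the learned transformer smoother."""
    T = centers.shape[0]
    half = window // 2
    out = np.empty_like(centers)
    for t in range(T):
        out[t] = centers[max(0, t - half):min(T, t + half + 1)].mean(axis=0)
    return out

def association_cost(track_centers, det_centers, track_feats, det_feats,
                     w_geo=1.0, w_sem=1.0):
    """Step 3 (simplified): pairwise cost mixing BEV center distance
    (geometric cue) with feature cosine distance (semantic cue).
    det_feats would come from the fused BEV features of step 2."""
    geo = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :],
                         axis=-1)
    tn = track_feats / np.linalg.norm(track_feats, axis=-1, keepdims=True)
    dn = det_feats / np.linalg.norm(det_feats, axis=-1, keepdims=True)
    return w_geo * geo + w_sem * (1.0 - tn @ dn.T)

def greedy_match(cost, max_cost=5.0):
    """Lowest-cost-first greedy assignment; unmatched detections would
    spawn new track identities."""
    pairs = sorted((cost[t, d], t, d)
                   for t in range(cost.shape[0])
                   for d in range(cost.shape[1]))
    matches, used_t, used_d = [], set(), set()
    for c, t, d in pairs:
        if c > max_cost or t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    return matches
```

In the actual method, the hand-tuned cost and greedy matcher are replaced by transformer attention over track queries, which learns this geometric-semantic trade-off end to end.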

Metrics at a Glance

Tracking metrics on the nuScenes dataset

Overall tracking results on the nuScenes leaderboard

Qualitative Results

Comparison of tracking outputs across occlusion and sensor-fusion scenarios.

Figure 5

Resources & Links

Acknowledgments

Citation

@article{teye2025futrtrack,
  title={FutrTrack: A Camera-LiDAR Fusion Transformer for 3D Multiple Object Tracking},
  author={Teye, Martha Teiko and Maoz, Ori and Rottmann, Matthias},
  journal={arXiv preprint arXiv:2510.19981},
  year={2025}
}

Contact

Have questions or want to collaborate? Reach out: