
LiDAR MOT-DETR: A LiDAR-based Two-Stage
Transformer for 3D Multiple Object Tracking

Martha Teiko Teye¹,², Ori Maoz², Matthias Rottmann³

1 Department of Mathematics, University of Wuppertal, Wuppertal, Germany
2 Aptiv, Germany
3 Institute of Computer Science, Osnabrück University, Osnabrück, Germany

British Machine Vision Conference (BMVC), 2025

Abstract

Multi-object tracking from LiDAR point clouds presents unique challenges due to the sparse and irregular nature of the data, compounded by the need for temporal coherence across frames. Traditional tracking systems often rely on hand-crafted features and motion models, which can struggle to maintain consistent object identities in crowded or fast-moving scenes. We present a LiDAR-based two-stage DETR-inspired transformer consisting of a smoother and a tracker. The smoother stage refines LiDAR object detections from any off-the-shelf detector across a moving temporal window. The tracker stage uses a DETR-based attention block to maintain tracks across time by associating tracked objects with the refined detections, using the point cloud as context. The model is trained on the nuScenes and KITTI datasets in both online and offline (forward-peeking) modes, demonstrating strong performance across metrics such as ID switches and multiple object tracking accuracy (MOTA). The numerical results indicate that the online mode outperforms the LiDAR-only baseline and SOTA models on the nuScenes dataset, with an aMOTA of 0.724 and an aMOTP of 0.475, while the offline mode provides an additional 3 pp of aMOTP.

Method Overview

The pipeline consists of a smoother transformer followed by a tracking transformer.

Overall Architecture

Step 1: Detect objects using an off-the-shelf object detector.
Step 2: Refine object identities and trajectories using the smoother's temporal attention.
Step 3: Train the tracker model using features from the LiDAR point cloud, the refined detections, and track queries from the previous frame.
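The three steps above can be sketched in plain Python. This is only a hand-crafted stand-in under stated assumptions: a simple per-object average replaces the smoother's learned temporal attention, and greedy nearest-neighbour matching replaces the tracker's DETR-style attention between track queries and detections; all function and variable names are hypothetical, not from the paper's code.

```python
import math


def smooth_centers(window):
    """Average each detection's 3D center over a temporal window of frames.

    Each frame maps a detection id to an (x, y, z) center. A crude stand-in
    for the learned smoother stage, which refines boxes with temporal
    attention across the window.
    """
    sums, counts = {}, {}
    for frame in window:
        for det_id, (x, y, z) in frame.items():
            sx, sy, sz = sums.get(det_id, (0.0, 0.0, 0.0))
            sums[det_id] = (sx + x, sy + y, sz + z)
            counts[det_id] = counts.get(det_id, 0) + 1
    return {d: (sx / counts[d], sy / counts[d], sz / counts[d])
            for d, (sx, sy, sz) in sums.items()}


def associate(tracks, detections, max_dist=2.0):
    """Greedily match existing tracks to refined detections by center
    distance; unmatched detections start new tracks.

    A stand-in for the attention-based association between track queries
    from the previous frame and the current refined detections.
    """
    assignments, next_id = {}, max(tracks, default=0) + 1
    unmatched = dict(tracks)
    for det in detections:
        best_id, best_d = None, max_dist
        for tid, center in unmatched.items():
            d = math.dist(det, center)
            if d < best_d:
                best_id, best_d = tid, d
        if best_id is None:
            best_id, next_id = next_id, next_id + 1  # spawn a new track
        else:
            del unmatched[best_id]  # each track matches at most once
        assignments[best_id] = det
    return assignments
```

For example, smoothing a three-frame window of one object's centers yields its mean position, and a detection far from every live track is assigned a fresh track id.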

Metrics at a Glance

Overall tracking results on nuScenes Test Set

Comparison of LiDAR-only tracking methods. Best method in bold, second best underlined.

[Table 1: overall tracking results on the nuScenes test set]
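For readers unfamiliar with the headline metric, MOTA combines the three error types (misses, false positives, and identity switches) into a single score; nuScenes' aMOTA additionally averages a recall-normalized MOTA over recall thresholds. A minimal sketch of the standard MOTA formula (the helper name is ours, not from the paper):

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames.

    num_gt is the total number of ground-truth objects across the sequence.
    The score can be negative when errors outnumber ground-truth objects.
    """
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

For instance, 10 misses, 10 false positives, and 5 identity switches over 100 ground-truth objects give a MOTA of 0.75.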

Detection metrics on nuScenes validation set

Best method in bold.

[Table 2: detection metrics on the nuScenes validation set]

Qualitative Results

Comparison between our method (LiDAR MOT-DETR) and 3DMOTFormer.
Both methods use CenterPoint detections.

[Figure: qualitative tracking comparison]

Resources & Links

Acknowledgments

Citation

@inproceedings{Teye_2025_BMVC,
  author    = {Martha Teiko Teye and Ori Maoz and Matthias Rottmann},
  title     = {LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking},
  booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
  publisher = {BMVA},
  year      = {2025},
  url       = {https://bmva-archive.org.uk/bmvc/2025/papers/Paper_220/paper.pdf}
}

Contact

Have questions or want to collaborate? Reach out: