Add benchmark Makefile for eval and Codabench submission. by AlexBodner · Pull Request #436 · roboflow/trackers

AlexBodner · 2026-05-25T17:34:01Z

What does this PR do?

Introduces a benchmark/ directory with a Makefile for automatic benchmarking on MOT17, SportsMOT, and DanceTrack. Eval uses a workaround for per-tracker CLI parameters (a stopgap for the issue that was mentioned as something the CLI refactor would fix). SoccerNet is supported with local evaluation.

Type of Change

New feature (non-breaking change that adds functionality)
Documentation update

Testing

I have tested this change locally
[ ] I have added/updated tests for this change

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code where necessary, particularly in hard-to-understand areas
My changes generate no new warnings or errors
I have updated the documentation accordingly (if applicable)

Additional Context

Introduces benchmark/ with make targets for setup, tune, eval, submit, and upload-codabench on MOT17, SportsMOT, and DanceTrack. Submit uses submit_yolox.py with library defaults; eval uses tracker_flags.py for per-tracker CLI parameters. Co-authored-by: Cursor <cursoragent@cursor.com>
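
As a rough illustration of the per-tracker parameter workaround, the sketch below turns a per-tracker parameter table into CLI arguments. The tracker names, parameter names, values, and the `build_flags` helper are all hypothetical and are not taken from `tracker_flags.py`; the actual script may be organized quite differently.

```python
from __future__ import annotations

# Hypothetical per-tracker overrides; names and values are illustrative only.
TRACKER_PARAMS: dict[str, dict[str, float | int]] = {
    "tracker_a": {"min_iou": 0.3, "max_age": 30},
    "tracker_b": {"min_iou": 0.2},
}


def build_flags(tracker: str) -> list[str]:
    """Turn a tracker's parameter dict into `--key value` CLI arguments."""
    flags: list[str] = []
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted(TRACKER_PARAMS.get(tracker, {}).items()):
        flags += [f"--{key.replace('_', '-')}", str(value)]
    return flags


if __name__ == "__main__":
    print(" ".join(build_flags("tracker_a")))
```

Keeping the table in one module would let a Makefile target shell out to a single helper instead of hard-coding flags per tracker, which is presumably the kind of duplication a CLI refactor would remove.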

Copilot

Pull request overview

Adds a repo-local benchmarking workflow intended to reproduce/refresh the tracker comparison numbers by orchestrating data preparation, tuning, tracking, evaluation, and (where applicable) Codabench submissions. It also updates the docs comparison page to reflect updated detection sources (notably for DanceTrack).

Changes:

Introduce a new benchmark/ directory with a Makefile-driven pipeline and helper scripts for MOT-format prep, tracking, formatting, uploads, and score aggregation.
Add Codabench submission + polling tooling (pure stdlib HTTP client) and MOT17-specific submission formatting.
Update docs/trackers/comparison.md wording about which datasets use YOLOX vs oracle detections.
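
The polling side of the Codabench tooling could look roughly like the following stdlib-only sketch, which retries on transient network errors. The function name, the `fetch_status` callable, and the terminal state names are assumptions made for illustration; they are not taken from `codabench_submit.py` or from the Codabench API.

```python
from __future__ import annotations

import time
import urllib.error
from typing import Callable

# Errors treated as transient; urllib raises URLError for DNS failures.
TRANSIENT_ERRORS = (urllib.error.URLError, ConnectionError, TimeoutError)


def poll_until_done(
    fetch_status: Callable[[], str],
    *,
    interval: float = 30.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch_status` until it returns a terminal state.

    Transient network errors use up an attempt and are retried.
    """
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            status = fetch_status()
        except urllib.error.HTTPError:
            raise  # HTTP status errors are not treated as transient here
        except TRANSIENT_ERRORS as exc:
            last_error = exc
        else:
            if status in {"Finished", "Failed"}:  # hypothetical state names
                return status
        sleep(interval)
    raise TimeoutError(f"submission still pending; last error: {last_error!r}")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a caller would supply a `fetch_status` that performs the actual HTTP request.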

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| docs/trackers/comparison.md | Updates benchmark/detection-source notes and DanceTrack detection wording. |
| benchmark/Makefile | Orchestrates setup, prep, tune, track, eval, Codabench upload, and collection targets. |
| benchmark/README.md | Documents required dataset layout, workflow steps, and Codabench token setup. |
| benchmark/.gitignore | Ignores local benchmark data/output artifacts. |
| benchmark/scripts/datasets.py | Centralizes dataset splits, paths, and Codabench IDs used by the workflow. |
| benchmark/scripts/data_check.py | Verifies expected dataset assets exist under `DATA_ROOT`. |
| benchmark/scripts/prep_data.py | Flattens vendor detections/GT into per-sequence MOT `.txt` files under `benchmark_prep/`. |
| benchmark/scripts/track_split.py | Runs a selected tracker over prepared detections and writes MOT prediction files. |
| benchmark/scripts/mot_format.py | Normalizes and packages predictions into Codabench-compatible submission zips (incl. MOT17 triplication/stubs). |
| benchmark/scripts/codabench_submit.py | Uploads/polls Codabench submissions and optionally writes a JSON summary of results. |
| benchmark/scripts/collect.py | Aggregates per-dataset JSON scores into a markdown table + summary JSON. |
| benchmark/scripts/align_mot17_val_gt.py | Filters MOT17 val GT to match the frame range covered by the YOLOX val detections. |
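
The ground-truth alignment step can be pictured with the minimal sketch below. It assumes comma-separated MOT-format rows whose first column is the frame number, and it keeps frame numbers unchanged. The function names are hypothetical and the real `align_mot17_val_gt.py` may handle renumbering or file I/O differently.

```python
from __future__ import annotations


def detection_frame_range(det_lines: list[str]) -> tuple[int, int]:
    """Return (first, last) frame number present in MOT-format detection rows."""
    frames = [int(float(line.split(",")[0])) for line in det_lines if line.strip()]
    return min(frames), max(frames)


def filter_gt_to_range(gt_lines: list[str], first: int, last: int) -> list[str]:
    """Keep only ground-truth rows whose frame falls inside [first, last]."""
    kept: list[str] = []
    for line in gt_lines:
        if not line.strip():
            continue  # skip blank lines
        frame = int(float(line.split(",")[0]))
        if first <= frame <= last:
            kept.append(line)
    return kept
```

Without such a filter, ground-truth frames that have no detections at all would count as misses and lower the scores for reasons unrelated to the tracker.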

AlexBodner · 2026-05-26T18:18:04Z

+from trackers.core.base import BaseTracker
+from trackers.tune.tuner import _run_tracker_on_detections
+

Hmm, we could make them public, or keep using them like this.

AlexBodner and others added 2 commits May 25, 2026 11:51

Refactor benchmark workflow into Makefile plus focused scripts.

043b2d7

AlexBodner requested a review from SkalskiP as a code owner May 25, 2026 17:34

pre-commit-ci Bot and others added 8 commits May 25, 2026 17:34

fix(pre_commit): 🎨 auto format pre-commit hooks

3501222

fixed dancetrack dets origin in docs

720360d

added mot17 val half filter

901563a

added mot17 val half filter

3e0401a

Merge branch 'feat/benchmark-codabench-submission' of https://github.com/roboflow/trackers into feat/benchmark-codabench-submission

6e4444b

fix(pre_commit): 🎨 auto format pre-commit hooks

59565b4

fixed ruff error of unsafe http requests

43b5da4

Merge branch 'feat/benchmark-codabench-submission' of https://github.com/roboflow/trackers into feat/benchmark-codabench-submission

aefe434

Borda requested a review from Copilot May 25, 2026 18:20

Copilot started reviewing on behalf of Borda May 25, 2026 18:20

Copilot AI reviewed May 25, 2026

AlexBodner and others added 12 commits May 26, 2026 11:09

Add multi-tracker comparison tables and Codabench poll/retry to benchmark

1559739

fix(pre_commit): 🎨 auto format pre-commit hooks

9f402d4

Potential fix for pull request finding

8c7226b

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Address benchmark PR review feedback

6477570

Address benchmark PR review feedback

27890f7

Potential fix for pull request finding

825cebd

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

fixed ruff

0fb6927

Merge branch 'feat/benchmark-codabench-submission' of https://github.com/roboflow/trackers into feat/benchmark-codabench-submission

4f3548d

space change for botsort parameters

d615514

Tune DanceTrack on validation split instead of train.

cb9b1f0

Aligns the benchmark with the train+val+test methodology: optimize on val, score on Codabench test. Co-authored-by: Cursor <cursoragent@cursor.com>

Add C-BIoU to benchmark comparison trackers and harden Codabench polling.

3c3476a

Include cbiou in COMPARISON_TRACKERS for tune/benchmark/collect workflows, and retry submission polling on transient DNS and connection errors. Co-authored-by: Cursor <cursoragent@cursor.com>

Merge branch 'develop' into feat/benchmark-codabench-submission.

95ef54a

Bring in C-BIoU tracker from develop and align DanceTrack comparison notes with val-tune / Codabench test scoring. Co-authored-by: Cursor <cursoragent@cursor.com>

Add benchmark Makefile for eval and Codabench submission. #436
AlexBodner wants to merge 22 commits into develop from feat/benchmark-codabench-submission

2 participants
