Add benchmark Makefile for eval and Codabench submission. #436
Open
AlexBodner wants to merge 22 commits into
Conversation
Introduces benchmark/ with make targets for setup, tune, eval, submit, and upload-codabench on MOT17, SportsMOT, and DanceTrack. Submit uses submit_yolox.py with library defaults; eval uses tracker_flags.py for per-tracker CLI parameters. Co-authored-by: Cursor <cursoragent@cursor.com>
…com/roboflow/trackers into feat/benchmark-codabench-submission
…com/roboflow/trackers into feat/benchmark-codabench-submission
Contributor
Pull request overview
Adds a repo-local benchmarking workflow intended to reproduce/refresh the tracker comparison numbers by orchestrating data preparation, tuning, tracking, evaluation, and (where applicable) Codabench submissions. It also updates the docs comparison page to reflect updated detection sources (notably for DanceTrack).
Changes:
- Introduce a new `benchmark/` directory with a Makefile-driven pipeline and helper scripts for MOT-format prep, tracking, formatting, uploads, and score aggregation.
- Add Codabench submission + polling tooling (pure stdlib HTTP client) and MOT17-specific submission formatting.
- Update `docs/trackers/comparison.md` wording about which datasets use YOLOX vs oracle detections.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| docs/trackers/comparison.md | Updates benchmark/detection-source notes and DanceTrack detection wording. |
| benchmark/Makefile | Orchestrates setup, prep, tune, track, eval, Codabench upload, and collection targets. |
| benchmark/README.md | Documents required dataset layout, workflow steps, and Codabench token setup. |
| benchmark/.gitignore | Ignores local benchmark data/output artifacts. |
| benchmark/scripts/datasets.py | Centralizes dataset splits, paths, and Codabench IDs used by the workflow. |
| benchmark/scripts/data_check.py | Verifies expected dataset assets exist under DATA_ROOT. |
| benchmark/scripts/prep_data.py | Flattens vendor detections/GT into per-sequence MOT .txt files under benchmark_prep/. |
| benchmark/scripts/track_split.py | Runs a selected tracker over prepared detections and writes MOT prediction files. |
| benchmark/scripts/mot_format.py | Normalizes and packages predictions into Codabench-compatible submission zips (incl. MOT17 triplication/stubs). |
| benchmark/scripts/codabench_submit.py | Uploads/polls Codabench submissions and optionally writes a JSON summary of results. |
| benchmark/scripts/collect.py | Aggregates per-dataset JSON scores into a markdown table + summary JSON. |
| benchmark/scripts/align_mot17_val_gt.py | Filters MOT17 val GT to match the frame range covered by the YOLOX val detections. |
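The GT alignment step described in the last row can be sketched roughly as follows. This is a minimal illustration, not the code in `align_mot17_val_gt.py`: the function names `frame_range` and `filter_gt_to_range` are hypothetical, and it assumes MOT-style comma-separated rows whose first column is the frame index.

```python
import csv
import io


def frame_range(det_text):
    """Return (first, last) frame index found in MOT-style detection rows."""
    frames = [int(float(row[0])) for row in csv.reader(io.StringIO(det_text)) if row]
    return min(frames), max(frames)


def filter_gt_to_range(gt_text, first, last):
    """Keep only GT rows whose frame index lies in [first, last] (inclusive)."""
    kept = []
    for row in csv.reader(io.StringIO(gt_text)):
        if row and first <= int(float(row[0])) <= last:
            kept.append(",".join(row))
    # Preserve a trailing newline when any rows survive the filter.
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical data: detections cover frames 3-4, GT covers frames 1-5.
detections = "3,-1,10,10,5,5,0.9\n4,-1,11,10,5,5,0.8\n"
ground_truth = "1,1,9,9,5,5,1\n3,1,10,10,5,5,1\n4,1,11,10,5,5,1\n5,1,12,10,5,5,1\n"

first, last = frame_range(detections)
aligned_gt = filter_gt_to_range(ground_truth, first, last)
```

Filtering on the inclusive range keeps the evaluator from penalizing the tracker for frames where no detections were ever supplied.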
Comment on lines +39 to +41

    from trackers.core.base import BaseTracker
    from trackers.tune.tuner import _run_tracker_on_detections

Collaborator
Author
mmhh, we could make them public or still use them like this
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…com/roboflow/trackers into feat/benchmark-codabench-submission
Aligns the benchmark with the train+val+test methodology: optimize on val, score on Codabench test. Co-authored-by: Cursor <cursoragent@cursor.com>
…ing. Include cbiou in COMPARISON_TRACKERS for tune/benchmark/collect workflows, and retry submission polling on transient DNS and connection errors. Co-authored-by: Cursor <cursoragent@cursor.com>
Bring in C-BIoU tracker from develop and align DanceTrack comparison notes with val-tune / Codabench test scoring. Co-authored-by: Cursor <cursoragent@cursor.com>
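The polling retry mentioned in the commits above (transient DNS and connection errors) could look roughly like this. It is a hedged sketch using only the standard library, not the code in `codabench_submit.py`: `poll_with_retry`, its parameters, and the exponential backoff schedule are all assumptions made for illustration.

```python
import time
import urllib.error


def poll_with_retry(fetch_status, *, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch_status(), retrying on transient network failures.

    fetch_status is any zero-argument callable (hypothetical here) that
    performs one status request and returns its result.
    """
    attempt = 0
    while True:
        try:
            return fetch_status()
        except (urllib.error.URLError, ConnectionError, TimeoutError) as exc:
            # HTTPError subclasses URLError but means the server answered,
            # so it is not treated as a transient network failure here.
            if isinstance(exc, urllib.error.HTTPError):
                raise
            attempt += 1
            if attempt > max_retries:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

DNS failures raised through `urllib.request` surface as `URLError`, which is why that class is caught alongside `ConnectionError`. Passing `sleep` as a parameter keeps the helper easy to test without real delays.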
What does this PR do?
Introduces a benchmark/ directory with a Makefile for automated benchmarking on MOT17, SportsMOT, and DanceTrack. Eval uses a workaround for per-tracker CLI parameters, pending the CLI refactor that was mentioned as the proper fix. SoccerNet is supported with local evaluation.
Type of Change
Testing
Checklist
Additional Context