
Add benchmark Makefile for eval and Codabench submission.#436

Open
AlexBodner wants to merge 22 commits into develop from feat/benchmark-codabench-submission


@AlexBodner

Collaborator

What does this PR do?

Introduces a benchmark/ directory with a Makefile for automated benchmarking on MOT17, SportsMOT, and DanceTrack. Eval uses a workaround for per-tracker CLI parameters (a stopgap for the limitation that was mentioned as something the CLI refactor would fix). SoccerNet is supported with local evaluation.
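
For illustration, a per-tracker CLI parameter workaround of this kind could look roughly like the sketch below. The tracker names, parameter names, and the `build_flags` helper are hypothetical placeholders, not the contents of this PR's tracker_flags.py.

```python
from typing import Dict, List, Mapping, Union

FlagValue = Union[bool, int, float, str]

# Hypothetical per-tracker overrides. Tracker and parameter names are
# placeholders for illustration only, not the library's real options.
TRACKER_PARAMS: Dict[str, Dict[str, FlagValue]] = {
    "tracker_a": {"min_score": 0.5, "max_age": 30},
    "tracker_b": {"use_motion": True},
}


def build_flags(
    tracker: str,
    params: Mapping[str, Mapping[str, FlagValue]] = TRACKER_PARAMS,
) -> List[str]:
    """Turn a tracker's parameter dict into a flat list of CLI arguments."""
    flags: List[str] = []
    for name, value in params.get(tracker, {}).items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            # Booleans become bare switches and are omitted when False.
            if value:
                flags.append(flag)
        else:
            flags.extend([flag, str(value)])
    return flags
```

A Makefile target could then splice the returned list into the eval command for each tracker, which keeps the per-tracker differences in one place until the CLI refactor lands.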

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Documentation update

Testing

  • I have tested this change locally
  • [] I have added/updated tests for this change

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

AlexBodner and others added 2 commits May 25, 2026 11:51
Introduces benchmark/ with make targets for setup, tune, eval, submit,
and upload-codabench on MOT17, SportsMOT, and DanceTrack. Submit uses
submit_yolox.py with library defaults; eval uses tracker_flags.py for
per-tracker CLI parameters.

Co-authored-by: Cursor <cursoragent@cursor.com>
@AlexBodner AlexBodner requested a review from SkalskiP as a code owner May 25, 2026 17:34

Copilot AI left a comment

Contributor


Pull request overview

Adds a repo-local benchmarking workflow intended to reproduce/refresh the tracker comparison numbers by orchestrating data preparation, tuning, tracking, evaluation, and (where applicable) Codabench submissions. It also updates the docs comparison page to reflect updated detection sources (notably for DanceTrack).

Changes:

  • Introduce a new benchmark/ directory with a Makefile-driven pipeline and helper scripts for MOT-format prep, tracking, formatting, uploads, and score aggregation.
  • Add Codabench submission + polling tooling (pure stdlib HTTP client) and MOT17-specific submission formatting.
  • Update docs/trackers/comparison.md wording about which datasets use YOLOX vs oracle detections.
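
As a rough sketch of the polling side of such a stdlib-only client, the loop below retries on transient DNS and connection errors, which a later commit in this PR says it does. The function name, parameters, and status values are assumptions made for illustration; the actual codabench_submit.py may be structured differently.

```python
import socket
import time
import urllib.error
from typing import Any, Callable

# Errors treated as transient in this sketch (an assumption, not the PR's list).
TRANSIENT_ERRORS = (urllib.error.URLError, socket.gaierror, ConnectionError)


def poll_until_done(
    fetch_status: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_s: float = 30.0,
    max_polls: int = 120,
    max_consecutive_errors: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_status until is_done(status) is true, tolerating brief outages."""
    consecutive_errors = 0
    for _ in range(max_polls):
        try:
            status = fetch_status()
        except urllib.error.HTTPError:
            # The server answered with an error status; do not retry blindly.
            raise
        except TRANSIENT_ERRORS:
            consecutive_errors += 1
            if consecutive_errors > max_consecutive_errors:
                raise
            sleep(interval_s)
            continue
        consecutive_errors = 0
        if is_done(status):
            return status
        sleep(interval_s)
    raise TimeoutError(f"submission not finished after {max_polls} polls")
```

Passing `fetch_status` and `sleep` as callables keeps the loop independent of any particular endpoint and makes it easy to exercise without network access.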

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| docs/trackers/comparison.md | Updates benchmark/detection-source notes and DanceTrack detection wording. |
| benchmark/Makefile | Orchestrates setup, prep, tune, track, eval, Codabench upload, and collection targets. |
| benchmark/README.md | Documents required dataset layout, workflow steps, and Codabench token setup. |
| benchmark/.gitignore | Ignores local benchmark data/output artifacts. |
| benchmark/scripts/datasets.py | Centralizes dataset splits, paths, and Codabench IDs used by the workflow. |
| benchmark/scripts/data_check.py | Verifies expected dataset assets exist under DATA_ROOT. |
| benchmark/scripts/prep_data.py | Flattens vendor detections/GT into per-sequence MOT .txt files under benchmark_prep/. |
| benchmark/scripts/track_split.py | Runs a selected tracker over prepared detections and writes MOT prediction files. |
| benchmark/scripts/mot_format.py | Normalizes and packages predictions into Codabench-compatible submission zips (incl. MOT17 triplication/stubs). |
| benchmark/scripts/codabench_submit.py | Uploads/polls Codabench submissions and optionally writes a JSON summary of results. |
| benchmark/scripts/collect.py | Aggregates per-dataset JSON scores into a markdown table + summary JSON. |
| benchmark/scripts/align_mot17_val_gt.py | Filters MOT17 val GT to match the frame range covered by the YOLOX val detections. |
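
The GT alignment step described for align_mot17_val_gt.py can be sketched as follows. This is a minimal illustration that assumes MOTChallenge-style comma-separated text rows with the frame index in the first column; the helper names are hypothetical and the script in this PR may handle more cases.

```python
from typing import Iterable, List, Tuple


def _frame_of(line: str) -> int:
    """Read the frame index from the first comma-separated field of a MOT row."""
    return int(float(line.split(",")[0]))


def frame_range(det_lines: Iterable[str]) -> Tuple[int, int]:
    """Return the (first, last) frame covered by a sequence's detection rows."""
    frames = [_frame_of(line) for line in det_lines if line.strip()]
    if not frames:
        raise ValueError("no detection rows found")
    return min(frames), max(frames)


def filter_gt_to_range(gt_lines: Iterable[str], first: int, last: int) -> List[str]:
    """Keep only GT rows whose frame lies inside [first, last]; skip blank lines."""
    return [
        line
        for line in gt_lines
        if line.strip() and first <= _frame_of(line) <= last
    ]
```

Without this kind of filtering, GT rows for frames that have no detections would count as misses during evaluation even though the tracker never saw those frames.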

Comment thread docs/trackers/comparison.md
Comment thread benchmark/scripts/track_split.py Outdated
Comment on lines +39 to +41
from trackers.core.base import BaseTracker
from trackers.tune.tuner import _run_tracker_on_detections

@AlexBodner AlexBodner May 26, 2026

Collaborator Author


mmhh, we could make them public or still use them like this

Comment thread benchmark/Makefile Outdated
Comment thread benchmark/Makefile Outdated
Comment thread benchmark/scripts/prep_data.py Outdated
Comment thread docs/trackers/comparison.md
Comment thread benchmark/scripts/codabench_submit.py Outdated
Comment thread benchmark/scripts/codabench_submit.py Outdated
AlexBodner and others added 12 commits May 26, 2026 11:09
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Aligns the benchmark with the train+val+test methodology: optimize on val, score on Codabench test.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ing.

Include cbiou in COMPARISON_TRACKERS for tune/benchmark/collect workflows, and retry submission polling on transient DNS and connection errors.

Co-authored-by: Cursor <cursoragent@cursor.com>
Bring in C-BIoU tracker from develop and align DanceTrack comparison notes with val-tune / Codabench test scoring.

Co-authored-by: Cursor <cursoragent@cursor.com>

2 participants