Emit Dag tags as metric tags #68568

Draft
sortega wants to merge 2 commits into apache:main from DataDog:sortega/airflow-tagged-metrics

@sortega sortega commented Jun 15, 2026

Contributor

Airflow's Dag-run and task-instance metrics carry a small fixed set of tags (dag_id, run_type, task_id, and team_name). Dag authors already annotate their Dags with free-form tags like production or env:prod, but those never reach the metrics — so dashboards and alerts can't be sliced by team, environment, or criticality without hardcoding dag_id lists.

This PR optionally surfaces each Dag tag as an individual metric tag on all Dag-run and task-instance metrics, across the StatsD, DogStatsD and InfluxDB backends:

  • A tag containing : (e.g. env:prod) splits into a key/value pair (key env, value prod).
  • A plain tag (e.g. production) becomes a standalone DogStatsD tag, or production=true in InfluxDB line protocol.
  • Built-in keys (dag_id, run_type, task_id) always win on collision.
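The three rules above can be sketched as a small merge function. This is a minimal illustration, not the code in this PR: `merge_dag_tags` and `BUILTIN_KEYS` are hypothetical names, and splitting on the first `:` only is an assumption about how tags with several colons are handled.

```python
from __future__ import annotations

from typing import Iterable

# Built-in keys named in the PR description; they always win on collision.
BUILTIN_KEYS = frozenset({"dag_id", "run_type", "task_id"})


def merge_dag_tags(
    builtin: dict[str, str], dag_tags: Iterable[str]
) -> dict[str, str | None]:
    """Hypothetical helper: merge Dag tags into a metric-tag mapping.

    A value of None marks a plain tag that has no value part.
    """
    merged: dict[str, str | None] = {}
    for tag in sorted(dag_tags):  # sorted only to make the output deterministic
        # Assumption: split on the first ':' so "a:b:c" becomes key "a", value "b:c".
        key, sep, value = tag.partition(":")
        merged[key] = value if sep else None
    # Apply built-ins last so they overwrite any colliding Dag tag.
    merged.update(builtin)
    return merged
```

With built-ins `{"dag_id": "etl"}` and Dag tags `["env:prod", "production", "dag_id:spoofed"]`, the colliding `dag_id:spoofed` tag is discarded and `dag_id` stays `etl`.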

Because Dag tags are unbounded, user-defined strings, the behavior is gated behind a new [metrics] dag_tags_in_metrics option, disabled by default, to avoid surprise cardinality increases on existing installations. Tags are read from the in-memory serialized Dag, so no extra DB queries are issued and the worker never touches the metastore.
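For reviewers who want to see the opt-in behavior in isolation, here is a rough sketch of the gate. It uses the standard-library `configparser` rather than Airflow's own configuration object so that it runs standalone; `dag_tags_enabled` and `AIRFLOW_CFG` are hypothetical names, and only the `[metrics] dag_tags_in_metrics` option and its default of False come from this PR.

```python
import configparser

# Example airflow.cfg fragment that opts in to the new behavior.
AIRFLOW_CFG = """
[metrics]
statsd_on = True
dag_tags_in_metrics = True
"""


def dag_tags_enabled(cfg_text: str) -> bool:
    """Hypothetical helper: read the gate, falling back to False when unset."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback=False mirrors the "disabled by default" behavior described above.
    return parser.getboolean("metrics", "dag_tags_in_metrics", fallback=False)
```

Assuming the option follows Airflow's usual `AIRFLOW__{SECTION}__{KEY}` convention, it could presumably also be set through the `AIRFLOW__METRICS__DAG_TAGS_IN_METRICS` environment variable.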

This is the metrics counterpart of the goal in #37901 (routing observability by Dag tags), and builds on the team_name work from #68108.

related: #37901


Was generative AI tooling used to co-author this PR?
  • Yes (Claude Code (Opus 4.8))

Generated-by: Claude Code (Opus 4.8) following the guidelines

Append each Dag tag to the metric tags emitted for Dag runs and task
instances. Tags containing ':' (e.g. env:prod) become a key:value pair;
plain tags (e.g. production) become standalone DogStatsD tags or
'key=true' in InfluxDB line-protocol format. Built-in metric tag keys
(dag_id, run_type, task_id) always win on collision.
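As an illustration of the two output shapes described above, the sketch below renders one tag mapping both ways. The function names are hypothetical and the mapping uses None for plain tags as an assumed internal representation; the real backends also handle escaping and metric-name prefixes, which are omitted here.

```python
from __future__ import annotations


def to_dogstatsd_tags(tags: dict[str, str | None]) -> list[str]:
    """Render key:value pairs, with plain tags emitted as standalone names."""
    return [key if value is None else f"{key}:{value}" for key, value in tags.items()]


def to_influx_tags(tags: dict[str, str | None]) -> str:
    """Render key=value pairs, with plain tags emitted as key=true."""
    return ",".join(
        f"{key}={'true' if value is None else value}" for key, value in tags.items()
    )
```

For `{"dag_id": "etl", "env": "prod", "production": None}` the first function yields `["dag_id:etl", "env:prod", "production"]` and the second yields `dag_id=etl,env=prod,production=true`.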

Tags are read from the in-memory Dag, so no extra DB queries are issued:
- DagRun.stats_tags reads dag_model.tags (already loaded in the scheduler).
- TaskInstance.stats_tags reads dag_model.tags only when already in the
  SQLAlchemy session.
- The Task SDK worker enriches ti.start / ti.finish / ti_successes /
  operator_* metrics from ti.task.dag.tags, with no metastore access in
  the worker process.

Dag tags are free-form, user-defined strings; emitting them as metric tags
unconditionally risks unexpected cardinality. Add the [metrics]
dag_tags_in_metrics option (default False) and guard all three stats_tags
paths (DagRun, TaskInstance, RuntimeTaskInstance) on it.

The gate is the first guard, outside the try, so a config error surfaces
instead of silently disabling the feature. Only the genuine failure mode is
caught — SQLAlchemyError from the DagRun.dag_model lazy load on a detached or
expired instance — rather than a blanket Exception. The Task SDK path reads an
in-memory Dag and needs no try/except.
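
The guard ordering described above might look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: `safe_dag_tags` and the `read_gate` callable are hypothetical, the tags are treated as plain strings, and a stand-in exception class is defined only so the example runs where SQLAlchemy is not installed.

```python
from __future__ import annotations

from typing import Any, Callable

try:
    from sqlalchemy.exc import SQLAlchemyError
except ImportError:
    # Stand-in so the sketch runs without SQLAlchemy installed.
    class SQLAlchemyError(Exception):
        pass


def safe_dag_tags(dag_run: Any, read_gate: Callable[[], bool]) -> list[str]:
    """Hypothetical helper showing the gate-first, narrow-except ordering."""
    # Gate first, outside the try: a config error propagates to the caller
    # instead of being swallowed and silently disabling the feature.
    if not read_gate():
        return []
    try:
        # Accessing dag_model may trigger a lazy load on a detached or
        # expired instance, which is the one failure mode worth catching.
        return list(dag_run.dag_model.tags)
    except SQLAlchemyError:
        return []
```

Catching only `SQLAlchemyError` means any other exception, such as an unexpected `AttributeError`, still surfaces rather than being hidden behind an empty tag list.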
@sortega sortega force-pushed the sortega/airflow-tagged-metrics branch from 80ec8c9 to fe514c9 Compare June 15, 2026 15:09