Emit Dag tags as metric tags#68568
Draft
sortega wants to merge 2 commits into
Draft
Conversation
Append each Dag tag to the metric tags emitted for Dag runs and task instances. Tags containing ':' (e.g. env:prod) become a key:value pair; plain tags (e.g. production) become standalone DogStatsD tags or 'key=true' in InfluxDB line-protocol format. Built-in metric tag keys (dag_id, run_type, task_id) always win on collision. Tags are read from the in-memory Dag, so no extra DB queries are issued: - DagRun.stats_tags reads dag_model.tags (already loaded in the scheduler). - TaskInstance.stats_tags reads dag_model.tags only when already in the SQLAlchemy session. - The Task SDK worker enriches ti.start / ti.finish / ti_successes / operator_* metrics from ti.task.dag.tags, with no metastore access in the worker process.
Dag tags are free-form, user-defined strings; emitting them as metric tags unconditionally risks unexpected cardinality. Add the [metrics] dag_tags_in_metrics option (default False) and guard all three stats_tags paths (DagRun, TaskInstance, RuntimeTaskInstance) on it. The gate is the first guard, outside the try, so a config error surfaces instead of silently disabling the feature. Only the genuine failure mode is caught — SQLAlchemyError from the DagRun.dag_model lazy load on a detached or expired instance — rather than a blanket Exception. The Task SDK path reads an in-memory Dag and needs no try/except.
80ec8c9 to
fe514c9
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Airflow's Dag-run and task-instance metrics carry a small fixed set of tags (
dag_id,run_type,task_id, andteam_name). Dag authors already annotate their Dags with free-form tags likeproductionorenv:prod, but those never reach the metrics — so dashboards and alerts can't be sliced by team, environment, or criticality without hardcodingdag_idlists.This PR optionally surfaces each Dag tag as an individual metric tag on all Dag-run and task-instance metrics, across the StatsD, DogStatsD and InfluxDB backends:
:(e.g.env:prod) splits into a key/value pair (env→prod).production) becomes a standalone DogStatsD tag, orproduction=truein InfluxDB line protocol.dag_id,run_type,task_id) always win on collision.Because Dag tags are unbounded, user-defined strings, the behavior is gated behind a new
[metrics] dag_tags_in_metricsoption, disabled by default, to avoid surprise cardinality increases on existing installations. Tags are read from the in-memory serialized Dag, so no extra DB queries are issued and the worker never touches the metastore.This is the metrics counterpart of the goal in #37901 (routing observability by Dag tags), and builds on the
team_namework from #68108.related: #37901
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.8) following the guidelines