[SPARK-57444][INFRA] Document versioning and branch policy in AGENTS.md and add dev/next_version_candidates.py #56504

Open
cloud-fan wants to merge 9 commits into apache:master from cloud-fan:SPARK-agents-version-policy

@cloud-fan cloud-fan commented Jun 14, 2026

Contributor

What changes were proposed in this pull request?

Adds a "Versioning and Branch Policy" section to AGENTS.md, plus a dev/next_version_candidates.py helper, so coding agents pick the correct target version for a change (used for @since annotations, config .version("...") entries, new MimaExcludes sections, etc.).

The documented policy:

  • Most PRs merge to both master and the latest rolling maintenance branch branch-<N>.x; only breaking / binary-incompatible changes and non-critical dependency upgrades go to master only.
  • A change's first-release version therefore comes from the branch it first ships in -- branch-<N>.x for a normally-backported change, master for a master-only one -- with -SNAPSHOT stripped. Deriving it from master's -SNAPSHOT alone is the common mistake.
  • Deciding whether a change is master-only is a judgement call, so the section says to ask the user when unsure.

dev/next_version_candidates.py (no arguments) reports the two candidate versions mechanically, so the caller only has to apply that judgement:

$ dev/next_version_candidates.py
master       5.0.0
branch-4.x   4.3.0

It reads from the configured apache/spark remote, selects the highest branch-<N>.x (integer-compared, so branch-10.x > branch-4.x), strips -SNAPSHOT, and reports facts only -- it does not decide which version applies. It uses only the Python standard library and fails with an actionable message if no apache/spark remote is configured (rather than fetching full ref histories over the network into the working repo).
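
The selection and stripping rules described above can be sketched as follows. This is a minimal illustration, not the actual contents of dev/next_version_candidates.py; the function names `highest_rolling_branch` and `strip_snapshot` are hypothetical, and the input is assumed to be a plain list of branch names.

```python
import re

# Matches only rolling maintenance branches such as "branch-4.x".
# Fixed maintenance lines such as "branch-4.2" do not match.
ROLLING_BRANCH = re.compile(r"^branch-(\d+)\.x$")


def highest_rolling_branch(branch_names):
    """Return the branch-<N>.x name with the largest integer N, or None."""
    best = None
    for name in branch_names:
        match = ROLLING_BRANCH.match(name)
        if match is None:
            continue  # not a rolling branch, ignore it
        major = int(match.group(1))  # compare as integers, not strings
        if best is None or major > best[0]:
            best = (major, name)
    return None if best is None else best[1]


def strip_snapshot(version):
    """Drop a trailing -SNAPSHOT qualifier, e.g. 4.3.0-SNAPSHOT -> 4.3.0."""
    suffix = "-SNAPSHOT"
    return version[: -len(suffix)] if version.endswith(suffix) else version
```

Comparing N as an integer is what makes branch-10.x rank above branch-4.x; a plain string comparison would order them the other way.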

Why are the changes needed?

AGENTS.md guides coding agents working in this repo. Without this, an agent reading only master (currently 5.0.0-SNAPSHOT) would label a normally-backported change @since 5.0.0, when it actually ships first in the branch-4.x release (4.3.0). The helper removes the manual, error-prone branch/version lookup while leaving the master-only judgement to the agent (or the user).
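
As a rough sketch of the decision the policy leaves to the agent, assuming the two candidate versions have already been obtained with -SNAPSHOT stripped: the function name `first_release_version` and the `master_only` flag are hypothetical and only illustrate the rule, and the flag still has to come from a human or agent judgement.

```python
def first_release_version(master_version, branch_version, master_only):
    """Pick the version a change first ships in.

    master_only is a judgement made by the caller: True for breaking or
    binary-incompatible changes and non-critical dependency upgrades,
    False for a normally-backported change.
    """
    return master_version if master_only else branch_version


# A normally-backported change first ships in the branch-<N>.x release.
print(first_release_version("5.0.0", "4.3.0", master_only=False))  # 4.3.0
```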

Does this PR introduce any user-facing change?

No (developer tooling and documentation for coding agents only).

How was this patch tested?

  • Ran dev/next_version_candidates.py against the live apache/spark remote -> master 5.0.0, branch-4.x 4.3.0, matching the pom.xml on each branch.
  • Verified the highest-branch-<N>.x selection end-to-end against a throwaway remote carrying branch-4.x, branch-5.x, and branch-10.x (picks branch-10.x), plus non-rolling decoys branch-4.2 / branch-5.0 (correctly ignored).
  • Verified the error path: in a repo with no apache/spark remote it exits non-zero immediately with the "add a remote" message and makes no network call.
  • black (the repo-pinned version, line length 100) reports the file unchanged.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

This pull request and its description were written by Isaac.

@cloud-fan cloud-fan force-pushed the SPARK-agents-version-policy branch from 7ed9956 to 119d0d9 Compare June 14, 2026 18:22
Comment thread AGENTS.md Outdated

## Versioning and Branch Policy

**If the PR is opened against a non-`master` base branch** (e.g. it targets `branch-4.x` directly), the target version is simply that base branch's version with `-SNAPSHOT` stripped — done; skip the rest of this section.

Member

According to the content, this PR is blocked by the following.

@cloud-fan cloud-fan force-pushed the SPARK-agents-version-policy branch from 119d0d9 to 4dc8722 Compare June 14, 2026 22:37
@cloud-fan cloud-fan force-pushed the SPARK-agents-version-policy branch from 4dc8722 to 029cdb4 Compare June 14, 2026 22:46
@cloud-fan

Contributor Author

#56503 merged, this is ready to review. cc @HyukjinKwon @dongjoon-hyun

The pre-flight checks only fetch `master`, so the remote-tracking ref
`<upstream>/branch-<N>.x` is normally absent and `git show
<upstream>/branch-<N>.x:pom.xml` fails with "invalid object name".
Fetch the branch first and read `FETCH_HEAD:pom.xml`, consistent with the
preceding `git ls-remote` querying the remote directly.

Co-authored-by: Isaac
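
The fetch-then-read approach in the commit message above might look like this in Python. It is a sketch under assumptions: the helper names `project_version` and `branch_pom_version` are hypothetical, and the version is taken to be the `<version>` element that is a direct child of `<project>` in the standard Maven POM namespace.

```python
import subprocess
import xml.etree.ElementTree as ET

POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def project_version(pom_text):
    """Return the <version> that is a direct child of <project>."""
    root = ET.fromstring(pom_text)
    # find() with a bare tag only looks at direct children, so the
    # <version> nested inside <parent> is not picked up by mistake.
    node = root.find(f"{POM_NS}version")
    if node is None or node.text is None:
        raise ValueError("pom.xml has no top-level <version>")
    return node.text.strip()


def branch_pom_version(remote, branch):
    """Fetch one branch, then read pom.xml from FETCH_HEAD."""
    # The remote-tracking ref may not exist locally, so fetch first.
    subprocess.run(["git", "fetch", "--quiet", remote, branch], check=True)
    shown = subprocess.run(
        ["git", "show", "FETCH_HEAD:pom.xml"],
        check=True,
        capture_output=True,
        encoding="utf-8",
    )
    return project_version(shown.stdout)
```

Reading from `FETCH_HEAD` avoids depending on `<upstream>/branch-<N>.x` being present, which is the failure the commit describes.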
Replace the manual "pick highest N and substitute" step with a shell
pipeline: `sort -V | tail -1` selects the highest branch-<N>.x (correct
for multi-digit, e.g. branch-10.x > branch-4.x), feeds it to git
fetch/show, and a trailing sed strips the XML tags and -SNAPSHOT so the
final command prints just the version (e.g. 4.3.0).

Co-authored-by: Isaac
…ookup

Replace the inline shell pipeline in AGENTS.md with a Python helper that
prints both candidate first-release versions (master and the latest
branch-<N>.x). Python avoids portability issues with sort -V (absent on
BSD/macOS) and awk field-splitting; it uses only the standard library and
pins UTF-8 decoding for deterministic behavior across locales. The script
reports facts only -- the master-only judgement stays in the AGENTS.md prose.

Co-authored-by: Isaac
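
One way to pin UTF-8 decoding with only the standard library, as the commit message above describes, is to pass an explicit `encoding` to `subprocess.run`. The helper name `run_text` is hypothetical and this is only a sketch of the idea, not the script's actual code.

```python
import subprocess


def run_text(cmd):
    """Run a command and return its stdout decoded as UTF-8.

    Passing encoding explicitly means the result does not depend on
    the caller's locale settings.
    """
    result = subprocess.run(
        cmd,
        check=True,  # raise CalledProcessError on a non-zero exit
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
    )
    return result.stdout
```

For example, `run_text(["git", "ls-remote", "--heads", "upstream"])` would return the remote's branch listing as a `str`, assuming a remote named `upstream` is configured.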
…lookup

The optional remote argument is a fallback for when auto-detection fails
(e.g. a fork-only clone with no remote pointing at apache/spark); it accepts
a remote name or a URL. Document that in the script's usage and error
message, and drop the dangling "or pass the remote name" aside from the
AGENTS.md prose so the happy path is just running the script.

Co-authored-by: Isaac
…ter-base path

The sample output (master 5.0.0 / branch-4.x 4.3.0) in AGENTS.md and the
script docstring is illustrative and goes stale as branches are cut -- label
it so, to avoid anchoring on specific numbers. Also note that the helper
covers the common master-base case; a non-master base branch needs a manual
pom.xml read on that branch.

Co-authored-by: Isaac
… remote arg

The apache/spark URL is a fixed constant, so there is no reason to make the
caller pass it. When no local remote points at apache/spark, fall back to the
canonical URL automatically instead of erroring. This removes the optional
remote argument entirely: the script now takes no arguments -- it prefers a
configured apache/spark remote (to honor its transport) and otherwise uses the
canonical URL.

Co-authored-by: Isaac
…gured

Drop the canonical-URL fallback. When auto-detection finds no remote pointing
at apache/spark, exit with an actionable message instead of fetching full ref
histories over the network into the working repo (slow, and a surprising side
effect for a read-only helper). The AGENTS.md pre-flight already has you
configure `upstream`, so the local remote path is the norm; a missing remote is
a setup gap worth surfacing, not papering over.

Co-authored-by: Isaac
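
The detect-or-fail behavior described in the commit message above could be sketched like this. The names `find_apache_spark_remote` and `require_apache_spark_remote`, the URL pattern, and the wording of the message are assumptions for illustration; the input is the text printed by `git remote -v`.

```python
import re
import sys

# Assumption: a remote "points at apache/spark" if its URL ends in
# github.com/apache/spark or github.com:apache/spark, optionally with .git.
APACHE_SPARK_URL = re.compile(
    r"(?:^|[/:@])github\.com[:/]apache/spark(?:\.git)?/?$"
)


def find_apache_spark_remote(remote_v_output):
    """Return the first remote name whose URL is apache/spark, else None."""
    for line in remote_v_output.splitlines():
        parts = line.split()  # name, url, "(fetch)" or "(push)"
        if len(parts) >= 2 and APACHE_SPARK_URL.search(parts[1]):
            return parts[0]
    return None


def require_apache_spark_remote(remote_v_output):
    """Return the remote name, or exit non-zero with a setup hint."""
    remote = find_apache_spark_remote(remote_v_output)
    if remote is None:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            "No remote points at apache/spark. Add one, for example:\n"
            "  git remote add upstream https://github.com/apache/spark.git"
        )
    return remote
```

Because the check works on already-listed remotes, the failure path needs no network call, which matches the intent of failing fast on a setup gap.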
@cloud-fan cloud-fan changed the title from "[SPARK-57444][INFRA] Document versioning and branch policy in AGENTS.md" to "[SPARK-57444][INFRA] Document versioning and branch policy in AGENTS.md and add dev/next_version_candidates.py" on Jun 15, 2026
Use a maintenance line (branch-4.2) as the example instead of branch-4.x:
branch-4.x is the rolling branch the master-base case already routes to, so
it does not illustrate a PR opened directly against a maintenance branch. Also
note that when checked out on the base branch, the version is just the working
tree's pom.xml -- no tooling needed for that case.

Co-authored-by: Isaac