[SPARK-57444][INFRA] Document versioning and branch policy in AGENTS.md and add dev/next_version_candidates.py #56504
Open
cloud-fan wants to merge 9 commits into
7ed9956 to 119d0d9
## Versioning and Branch Policy

**If the PR is opened against a non-`master` base branch** (e.g. it targets `branch-4.x` directly), the target version is simply that base branch's version with `-SNAPSHOT` stripped. Done; skip the rest of this section.
Member
According to the content, this PR is blocked by the following.
119d0d9 to 4dc8722
Co-authored-by: Isaac
4dc8722 to 029cdb4
Contributor
Author
#56503 merged; this is ready to review. cc @HyukjinKwon @dongjoon-hyun
dongjoon-hyun approved these changes Jun 14, 2026
The pre-flight checks only fetch `master`, so the remote-tracking ref `<upstream>/branch-<N>.x` is normally absent and `git show <upstream>/branch-<N>.x:pom.xml` fails with "invalid object name". Fetch the branch first and read `FETCH_HEAD:pom.xml`, consistent with the preceding `git ls-remote` querying the remote directly. Co-authored-by: Isaac
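A minimal sketch of the fetch-then-read sequence this commit describes, written in Python for consistency with the helper introduced in a later commit. The function names `fetch_and_show_commands` and `read_file_from_branch` are hypothetical and not taken from the PR; only the `git fetch` followed by `git show FETCH_HEAD:pom.xml` sequence comes from the commit message.

```python
import subprocess
from typing import List


def fetch_and_show_commands(remote: str, branch: str, path: str = "pom.xml") -> List[List[str]]:
    """Build the two git invocations: fetch the branch, then read a file from FETCH_HEAD."""
    return [
        ["git", "fetch", "--quiet", remote, branch],
        # FETCH_HEAD names whatever the fetch above just retrieved, so this works
        # even when no remote-tracking ref for the branch exists locally.
        ["git", "show", f"FETCH_HEAD:{path}"],
    ]


def read_file_from_branch(remote: str, branch: str, path: str = "pom.xml") -> str:
    """Run both commands and return the file contents (hypothetical helper)."""
    fetch_cmd, show_cmd = fetch_and_show_commands(remote, branch, path)
    subprocess.run(fetch_cmd, check=True)
    result = subprocess.run(show_cmd, check=True, capture_output=True, encoding="utf-8")
    return result.stdout
```

Calling `read_file_from_branch("upstream", "branch-4.x")` would need a clone with an `upstream` remote; the remote and branch names here are placeholders.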
Replace the manual "pick highest N and substitute" step with a shell pipeline: `sort -V | tail -1` selects the highest branch-<N>.x (correct for multi-digit, e.g. branch-10.x > branch-4.x), feeds it to git fetch/show, and a trailing sed strips the XML tags and -SNAPSHOT so the final command prints just the version (e.g. 4.3.0). Co-authored-by: Isaac
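The "highest `branch-<N>.x`" selection can also be sketched in Python, comparing the major number as an integer so that `branch-10.x` ranks above `branch-4.x`. This is an illustration under assumptions, not the pipeline from the commit: the function name `latest_rolling_branch` is hypothetical, and the input is assumed to be a list of branch or ref names already extracted from `git ls-remote` output.

```python
import re
from typing import Iterable, Optional

# Matches rolling branches such as "branch-4.x" (optionally prefixed with
# "refs/heads/"), but not maintenance lines such as "branch-4.2".
ROLLING = re.compile(r"^(?:refs/heads/)?(branch-(\d+)\.x)$")


def latest_rolling_branch(refs: Iterable[str]) -> Optional[str]:
    """Return the rolling branch with the largest integer major, or None."""
    best_major = -1
    best_name = None
    for ref in refs:
        match = ROLLING.match(ref.strip())
        if match is None:
            continue  # not a rolling branch; ignore it
        major = int(match.group(2))  # integer comparison, not string comparison
        if major > best_major:
            best_major, best_name = major, match.group(1)
    return best_name


print(latest_rolling_branch(["branch-4.x", "branch-10.x", "branch-4.2"]))
```

A plain string sort would place `branch-4.x` after `branch-10.x`, which is the failure mode `sort -V` avoids in the shell version.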
…ookup Replace the inline shell pipeline in AGENTS.md with a Python helper that prints both candidate first-release versions (master and the latest branch-<N>.x). Python avoids portability issues with sort -V (absent on BSD/macOS) and awk field-splitting; it uses only the standard library and pins UTF-8 decoding for deterministic behavior across locales. The script reports facts only -- the master-only judgement stays in the AGENTS.md prose. Co-authored-by: Isaac
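One way to turn a `pom.xml` into a release version with only the standard library is sketched below. It is not the PR's implementation: `project_version` and `SAMPLE_POM` are hypothetical, and the sample POM (including the parent version) is made up for illustration. The sketch reads the top-level `<version>` element rather than the first `<version>` in the file, since a POM's `<parent>` block carries its own version.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace, in ElementTree's "{uri}tag" notation.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"

# Illustrative input only; the numbers do not come from a real branch.
SAMPLE_POM = b"""<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>org.example</groupId>
    <artifactId>example-parent</artifactId>
    <version>35</version>
  </parent>
  <artifactId>example-project</artifactId>
  <version>4.3.0-SNAPSHOT</version>
</project>
"""


def project_version(pom_bytes: bytes) -> str:
    """Return the project's own version with a trailing -SNAPSHOT removed."""
    root = ET.fromstring(pom_bytes)
    node = root.find(f"{POM_NS}version")  # direct child only, so <parent> is skipped
    if node is None or not node.text:
        raise ValueError("no top-level <version> element found")
    version = node.text.strip()
    suffix = "-SNAPSHOT"
    if version.endswith(suffix):
        version = version[: -len(suffix)]
    return version


print(project_version(SAMPLE_POM))
```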
…lookup The optional remote argument is a fallback for when auto-detection fails (e.g. a fork-only clone with no remote pointing at apache/spark); it accepts a remote name or a URL. Document that in the script's usage and error message, and drop the dangling "or pass the remote name" aside from the AGENTS.md prose so the happy path is just running the script. Co-authored-by: Isaac
…ter-base path The sample output (master 5.0.0 / branch-4.x 4.3.0) in AGENTS.md and the script docstring is illustrative and goes stale as branches are cut -- label it so, to avoid anchoring on specific numbers. Also note that the helper covers the common master-base case; a non-master base branch needs a manual pom.xml read on that branch. Co-authored-by: Isaac
… remote arg The apache/spark URL is a fixed constant, so there is no reason to make the caller pass it. When no local remote points at apache/spark, fall back to the canonical URL automatically instead of erroring. This removes the optional remote argument entirely: the script now takes no arguments -- it prefers a configured apache/spark remote (to honor its transport) and otherwise uses the canonical URL. Co-authored-by: Isaac
…gured Drop the canonical-URL fallback. When auto-detection finds no remote pointing at apache/spark, exit with an actionable message instead of fetching full ref histories over the network into the working repo (slow, and a surprising side effect for a read-only helper). The AGENTS.md pre-flight already has you configure `upstream`, so the local remote path is the norm; a missing remote is a setup gap worth surfacing, not papering over. Co-authored-by: Isaac
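The detect-or-fail behavior described in this commit might look roughly like the sketch below. The names `find_apache_spark_remote` and `require_apache_spark_remote`, the URL pattern, and the wording of the error message are all assumptions; the script's real detection logic and message are not shown in this conversation. The input is assumed to be the text printed by `git remote -v`.

```python
import re
import sys
from typing import Optional

# Matches URLs ending in apache/spark, with or without ".git", for both
# https ("/apache/spark") and scp-style ssh (":apache/spark") forms.
APACHE_SPARK = re.compile(r"[/:]apache/spark(?:\.git)?/?$")


def find_apache_spark_remote(remote_v_output: str) -> Optional[str]:
    """Return the name of the first remote whose URL points at apache/spark."""
    for line in remote_v_output.splitlines():
        parts = line.split()  # "<name> <url> (fetch|push)"
        if len(parts) >= 2 and APACHE_SPARK.search(parts[1]):
            return parts[0]
    return None


def require_apache_spark_remote(remote_v_output: str) -> str:
    """Return the remote name, or exit non-zero without touching the network."""
    name = find_apache_spark_remote(remote_v_output)
    if name is None:
        # Hypothetical message; sys.exit with a string exits with status 1.
        sys.exit(
            "No git remote points at apache/spark. Add one, for example:\n"
            "  git remote add upstream https://github.com/apache/spark.git"
        )
    return name
```

Failing before any fetch keeps the helper read-only, which is the design point the commit message makes.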
Use a maintenance line (branch-4.2) as the example instead of branch-4.x: branch-4.x is the rolling branch the master-base case already routes to, so it does not illustrate a PR opened directly against a maintenance branch. Also note that when checked out on the base branch, the version is just the working tree's pom.xml -- no tooling needed for that case. Co-authored-by: Isaac
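The judgement that these commits leave to the caller can be written down as a small function, shown here as a hedged sketch. `target_version` and its parameters are hypothetical, and the version numbers in the usage lines are illustrative (the `branch-4.2` value in particular is made up); the branching logic follows the policy as quoted in this PR.

```python
from typing import Dict


def strip_snapshot(version: str) -> str:
    """Drop a trailing -SNAPSHOT, if present."""
    suffix = "-SNAPSHOT"
    return version[: -len(suffix)] if version.endswith(suffix) else version


def target_version(
    base_branch: str,
    pom_versions: Dict[str, str],
    rolling_branch: str,
    master_only: bool,
) -> str:
    """Pick the first-release version for a change (sketch of the documented policy)."""
    if base_branch != "master":
        # PR opened directly against a non-master branch: use that branch's version.
        return strip_snapshot(pom_versions[base_branch])
    # master-base PR: master-only changes ship in master's release, everything
    # else ships first in the latest rolling branch's release.
    source = "master" if master_only else rolling_branch
    return strip_snapshot(pom_versions[source])


# Illustrative numbers only; they go stale as branches are cut.
versions = {
    "master": "5.0.0-SNAPSHOT",
    "branch-4.x": "4.3.0-SNAPSHOT",
    "branch-4.2": "4.2.1-SNAPSHOT",
}
print(target_version("master", versions, "branch-4.x", master_only=False))
print(target_version("master", versions, "branch-4.x", master_only=True))
print(target_version("branch-4.2", versions, "branch-4.x", master_only=False))
```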
HyukjinKwon approved these changes Jun 15, 2026
MaxGekk approved these changes Jun 15, 2026
What changes were proposed in this pull request?
Adds a "Versioning and Branch Policy" section to `AGENTS.md`, plus a `dev/next_version_candidates.py` helper, so coding agents pick the correct target version for a change (used for `@since` annotations, config `.version("...")` entries, new `MimaExcludes` sections, etc.).

The documented policy:
- Changes normally land on both `master` and the latest rolling maintenance branch `branch-<N>.x`; only breaking / binary-incompatible changes and non-critical dependency upgrades go to `master` only.
- The target version is that of `branch-<N>.x` for a normally-backported change, `master` for a master-only one -- with `-SNAPSHOT` stripped. Deriving it from `master`'s `-SNAPSHOT` alone is the common mistake.
- `dev/next_version_candidates.py` (no arguments) reports the two candidate versions mechanically, so the caller only has to apply that judgement.

It reads from the configured `apache/spark` remote, selects the highest `branch-<N>.x` (integer-compared, so `branch-10.x` > `branch-4.x`), strips `-SNAPSHOT`, and reports facts only -- it does not decide which version applies. It uses only the Python standard library and fails with an actionable message if no `apache/spark` remote is configured (rather than fetching full ref histories over the network into the working repo).

Why are the changes needed?
`AGENTS.md` guides coding agents working in this repo. Without this, an agent reading only `master` (currently `5.0.0-SNAPSHOT`) would label a normally-backported change `@since 5.0.0`, when it actually ships first in the `branch-4.x` release (`4.3.0`). The helper removes the manual, error-prone branch/version lookup while leaving the master-only judgement to the agent (or the user).

Does this PR introduce any user-facing change?
No (developer tooling and documentation for coding agents only).
How was this patch tested?
- Ran `dev/next_version_candidates.py` against the live `apache/spark` remote -> `master 5.0.0`, `branch-4.x 4.3.0`, matching the `pom.xml` on each branch.
- Verified `branch-<N>.x` selection end-to-end against a throwaway remote carrying `branch-4.x`, `branch-5.x`, and `branch-10.x` (picks `branch-10.x`), plus non-rolling decoys `branch-4.2` / `branch-5.0` (correctly ignored).
- With no `apache/spark` remote it exits non-zero immediately with the "add a remote" message and makes no network call.
- `black` (the repo-pinned version, line length 100) reports the file unchanged.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)
This pull request and its description were written by Isaac.