feat: optimize duplicate-code-detector workflow token usage (~50% reduction) #5025
✅ Coverage Check Passed
📁 Per-file Coverage Changes (1 file)
Pull request overview
This PR optimizes the duplicate-code-detector workflow’s token usage by moving expensive “discovery” work into pre-agent steps and reducing the size of inputs provided to the agent, with corresponding lockfile and test updates.
Changes:
- Pre-computes “existing duplicate issues” via `gh issue list` before the agent phase and updates the prompt to forbid MCP calls for that phase.
- Truncates jscpd context by generating `/tmp/gh-aw/jscpd-top.json` (top 15 duplicates) and updates the prompt to use it instead of the full report.
- Reduces the allowed agent turn budget (≤10 → ≤7) and updates the compiled lock workflow + workflow test expectations.
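
The jscpd truncation described above amounts to ranking duplicates by size and keeping the largest 15. A minimal sketch of that selection, assuming each duplicate entry carries a numeric `lines` field; the `Duplicate` type, the `topDuplicates` helper, and the synthetic data are hypothetical, and the workflow itself does this with `jq` rather than TypeScript:

```typescript
// Hypothetical, simplified subset of a jscpd duplicate entry; only `lines` is used for ranking.
interface Duplicate {
  format: string;
  lines: number;
}

// Returns the `limit` largest duplicates by line count without mutating the input.
function topDuplicates(duplicates: Duplicate[], limit: number = 15): Duplicate[] {
  return duplicates
    .slice() // copy so the caller's array keeps its original order
    .sort((a, b) => b.lines - a.lines)
    .slice(0, limit);
}

// Illustrative report: 40 synthetic entries with 1..40 duplicated lines.
const report: Duplicate[] = [];
for (let i = 1; i <= 40; i++) {
  report.push({ format: "typescript", lines: i });
}

const top = topDuplicates(report); // 15 entries, largest first (40 down to 26 lines)
```

Capping the list bounds the size of what the agent reads no matter how large the full report grows.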
Show a summary per file
| File | Description |
|---|---|
| scripts/ci/duplicate-code-detector-workflow.test.ts | Updates assertions for new pre-step, new jscpd top file, and tighter turn budget/prompt text. |
| .github/workflows/duplicate-code-detector.md | Adds pre-steps to summarize jscpd output and pre-fetch existing issues; updates prompt to consume smaller precomputed artifacts and forbid MCP calls in Phase 5. |
| .github/workflows/duplicate-code-detector.lock.yml | Compiled workflow reflecting the new pre-steps and prompt constraints. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 2
    gh issue list \
      --repo "${{ github.repository }}" \
      --search "\"[Duplicate Code]\" in:title" \
      --state all --limit 50 \
      --json number,title,state,stateReason \
      > /tmp/gh-aw/existing-issues.json
    echo "=== Existing [Duplicate Code] issues ===" >> /tmp/gh-aw/existing-issues.json
    jq -r '.[] | "#\(.number) [\(.state)/\(.stateReason // "none")]: \(.title)"' \
      /tmp/gh-aw/existing-issues.json || true

    gh issue list \
      --repo "$GH_AW_GITHUB_REPOSITORY" \
      --search "\"[Duplicate Code]\" in:title" \
      --state all --limit 50 \
      --json number,title,state,stateReason \
      > /tmp/gh-aw/existing-issues.json
    echo "=== Existing [Duplicate Code] issues ===" >> /tmp/gh-aw/existing-issues.json
    jq -r '.[] | "#\(.number) [\(.state)/\(.stateReason // "none")]: \(.title)"' \
      /tmp/gh-aw/existing-issues.json || true
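
The `jq` filter in these snippets renders one summary line per issue and falls back to `none` when `stateReason` is null. A rough TypeScript sketch of the same formatting, assuming the field names requested via `--json`; the `IssueSummary` type, the `formatIssue` helper, and the sample issues are hypothetical and not taken from the repository:

```typescript
// Hypothetical shape mirroring --json number,title,state,stateReason.
interface IssueSummary {
  number: number;
  title: string;
  state: string;
  stateReason: string | null;
}

// Mirrors the jq interpolation. jq's `//` falls back on null or false,
// which for a string-or-null field matches the null check below.
function formatIssue(issue: IssueSummary): string {
  const reason = issue.stateReason === null ? "none" : issue.stateReason;
  return `#${issue.number} [${issue.state}/${reason}]: ${issue.title}`;
}

// Illustrative data only.
const sampleIssues: IssueSummary[] = [
  { number: 101, title: "[Duplicate Code] parser helpers", state: "OPEN", stateReason: null },
  { number: 87, title: "[Duplicate Code] retry logic", state: "CLOSED", stateReason: "COMPLETED" },
];

const summaryLines: string[] = sampleIssues.map(formatIssue);
// summaryLines[0] is "#101 [OPEN/none]: [Duplicate Code] parser helpers"
```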
Print the header to stdout instead of appending it to the JSON file. The append corrupted the file and caused jq to fail (masked by || true).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
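
To see why the append was harmful: once a plain-text header follows the JSON array, the file is no longer a single valid JSON document. The sketch below uses `JSON.parse` as a rough stand-in for any consumer that expects one JSON document. It is not a model of `jq`'s exact behavior, and the sample payload and the `parses` helper are hypothetical:

```typescript
// Stand-in for the file written by `gh issue list --json ...` (illustrative payload).
const validJson =
  '[{"number":1,"title":"[Duplicate Code] example","state":"OPEN","stateReason":null}]\n';

// What the file held after the header was appended with `>>`.
const corrupted = validJson + "=== Existing [Duplicate Code] issues ===\n";

// Returns true when the text is exactly one valid JSON document.
function parses(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch (_err) {
    return false;
  }
}

const before = parses(validJson); // true: the array alone is valid
const after = parses(corrupted);  // false: the trailing header is not JSON
```

Because the failing command was followed by `|| true`, the step still exited successfully, which is how the corruption went unnoticed.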
🔥 Smoke Test: Copilot PAT Auth — PASS
PR: feat: optimize duplicate-code-detector workflow token usage (~50% reduction)
Overall: PASS ✅

Copilot BYOK Smoke Test Results
✅ GitHub MCP Testing - 2 recent closed PRs fetched successfully
Status: PASS

🔬 Smoke Test Results
Overall: FAIL
PR: feat: optimize duplicate-code-detector workflow token usage (~50% reduction)

Smoke test results:
Overall: FAIL
Smoke Test: GitHub Actions Services Connectivity
Overall: FAIL
Gemini Smoke Test Results:
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:

    network:
      allowed:
        - defaults
        - "localhost"

See Network Configuration for more information.

Smoke Test Results:
Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra
Overall: FAIL

🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed, ✅ PASS

Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:

    network:
      allowed:
        - defaults
        - "www.google.com"

See Network Configuration for more information.

The `duplicate-code-detector` workflow was consuming ~703K tokens/run across 16 LLM turns, driven by live GitHub API issue-search calls inside the agent and by reading the full jscpd JSON report mid-session.

Changes

Pre-compute existing issue check (`steps:` block)
- `Check existing duplicate issues` step runs `gh issue list` before the agent starts, writing results to `/tmp/gh-aw/existing-issues.json`
- Prompt now tells the agent to `cat` the pre-computed file; explicitly forbids MCP calls for this phase

Truncate jscpd output to top 15 findings
- `Run jscpd` step now pipes `jscpd-report.json` through `jq` to extract only the top 15 duplicates by line count into `jscpd-top.json`
- Prompt now reads `jscpd-top.json` instead of the unbounded full report

Tighten turn budget
- Turn budget reduced from `≤10` to `≤7` (realistic with phases 1–4 now pre-computed)

Lock file and tests
- Lock file recompiled with `gh aw compile` (compiler auto-extracted `${{ github.repository }}` into env var to prevent shell injection)

Projected outcome: ~320–370K tokens/run (−47–55%), 7–8 LLM turns (−8–9 turns), ~$170–195 AIC/run (down from $374.79).
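
The projected percentages follow from the baseline figures quoted in this description. A quick check of the arithmetic, using only those numbers; the `reductionPercent` helper is hypothetical:

```typescript
// Percentage reduction from a baseline to a projected value, rounded to one decimal place.
function reductionPercent(baseline: number, projected: number): number {
  return Math.round(((baseline - projected) / baseline) * 1000) / 10;
}

const baselineTokensK = 703; // ~703K tokens/run before the change
const tokenSavingLow = reductionPercent(baselineTokensK, 370);  // 47.4
const tokenSavingHigh = reductionPercent(baselineTokensK, 320); // 54.5

const baselineCost = 374.79; // AIC/run before the change
const costSavingLow = reductionPercent(baselineCost, 195);  // 48
const costSavingHigh = reductionPercent(baselineCost, 170); // 54.6
```

Both ranges round to the quoted −47–55%, so the token and cost projections are consistent with each other.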