feat: optimize duplicate-code-detector workflow token usage (~50% reduction) #5025

Merged: lpcox merged 3 commits into main from copilot/optimize-copilot-token-usage on Jun 15, 2026

Copilot AI commented Jun 15, 2026

The duplicate-code-detector workflow was consuming ~703K tokens/run across 16 LLM turns, driven by live GitHub API issue-search calls inside the agent and reading the full jscpd JSON report mid-session.

Changes

Pre-compute existing issue check (steps: block)

  • New Check existing duplicate issues step runs gh issue list before the agent starts, writing results to /tmp/gh-aw/existing-issues.json
  • Eliminates 4–6 agent turns of GitHub MCP tool calls
  • Phase 5 prompt updated to cat the pre-computed file; explicitly forbids MCP calls for this phase
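
As a rough illustration of what consuming the pre-computed file might look like, the sketch below filters a file shaped like `gh issue list --json number,title,state,stateReason` output for issues that are still open. The sample records, the temporary path, and the `open_titles` variable are made up for the example; the actual Phase 5 prompt text is not shown in this description.

```shell
# Hypothetical sample standing in for /tmp/gh-aw/existing-issues.json.
issues_file="$(mktemp)"
cat > "$issues_file" <<'EOF'
[
  {"number": 101, "title": "[Duplicate Code] Repeated retry logic", "state": "OPEN", "stateReason": ""},
  {"number": 87, "title": "[Duplicate Code] Copy-pasted config parsing", "state": "CLOSED", "stateReason": "NOT_PLANNED"}
]
EOF

# Keep only the issues that are still open, one summary line per issue.
open_titles="$(jq -r '.[] | select(.state == "OPEN") | "#\(.number): \(.title)"' "$issues_file")"
echo "$open_titles"

rm -f "$issues_file"
```

Because the file is read locally, a lookup like this costs no agent turns and no MCP calls.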

Truncate jscpd output to top 15 findings

  • Run jscpd step now pipes jscpd-report.json through jq to extract only the top 15 duplicates by line count into jscpd-top.json
  • Prompt updated to reference jscpd-top.json instead of the unbounded full report
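
The truncation step can be pictured roughly as follows. This is a sketch, not the workflow's actual filter: it assumes the jscpd JSON report has a top-level `duplicates` array whose entries carry a numeric `lines` field, and it uses a tiny made-up report with a limit of 2 instead of 15 so the result is easy to check by eye.

```shell
# Made-up miniature report; the real jscpd-report.json is far larger.
report="$(mktemp)"
cat > "$report" <<'EOF'
{"duplicates": [
  {"lines": 12, "firstFile": {"name": "a.ts"}, "secondFile": {"name": "b.ts"}},
  {"lines": 40, "firstFile": {"name": "c.ts"}, "secondFile": {"name": "d.ts"}},
  {"lines": 25, "firstFile": {"name": "e.ts"}, "secondFile": {"name": "f.ts"}}
]}
EOF

limit=2  # the PR keeps 15

# Sort by duplicated line count (largest first), keep the first $limit entries,
# and print just their line counts to confirm the ordering.
top="$(jq -c --argjson n "$limit" '.duplicates | sort_by(-.lines) | .[:$n] | map(.lines)' "$report")"
echo "$top"

rm -f "$report"
```

Capping the list bounds the prompt size regardless of how many duplicates jscpd reports.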

Tighten turn budget

  • Turn limit reduced from ≤10 to ≤7 (realistic with phases 1–4 now pre-computed)
  • Removed the now-redundant "Skip directly to Phase 5 and Phase 6" instruction

Lock file and tests

  • Lock file recompiled via gh aw compile (compiler auto-extracted ${{ github.repository }} into env var to prevent shell injection)
  • Test assertions updated to match new step names, prompt text, and turn budget

Projected outcome: ~320–370K tokens/run (−47–55%), 7–8 LLM turns (−8–9 turns), ~$170–195 AIC/run (down from $374.79).
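
The percentage range follows directly from the baseline and the projected token counts. A quick check with awk, using only the figures quoted above (the variable names are just for illustration):

```shell
baseline=703   # thousands of tokens per run before the change
low=320        # projected lower bound, thousands of tokens
high=370       # projected upper bound, thousands of tokens

# Percentage reduction = (baseline - projected) / baseline * 100
best="$(awk -v b="$baseline" -v p="$low" 'BEGIN { printf "%.1f", (b - p) / b * 100 }')"
worst="$(awk -v b="$baseline" -v p="$high" 'BEGIN { printf "%.1f", (b - p) / b * 100 }')"
echo "reduction: ${worst}% to ${best}%"
```

The two results bracket the −47–55% range quoted in the projection.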

Copilot AI changed the title from "[WIP] Optimize token usage for duplicate code detector" to "feat: optimize duplicate-code-detector workflow token usage (~50% reduction)" Jun 15, 2026
Copilot finished work on behalf of lpcox June 15, 2026 13:55
Copilot AI requested a review from lpcox June 15, 2026 13:55
lpcox marked this pull request as ready for review June 15, 2026 14:36
Copilot AI review requested due to automatic review settings June 15, 2026 14:36

github-actions Bot commented Jun 15, 2026

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 96.86% | 96.90% | 📈 +0.04% |
| Statements | 96.73% | 96.77% | 📈 +0.04% |
| Functions | 98.81% | 98.81% | ➡️ +0.00% |
| Branches | 91.24% | 91.27% | 📈 +0.03% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/workdir-setup.ts | 92.6% → 94.4% (+1.85%) | 92.6% → 94.4% (+1.85%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI left a comment

Pull request overview

This PR optimizes the duplicate-code-detector workflow’s token usage by moving expensive “discovery” work into pre-agent steps and reducing the size of inputs provided to the agent, with corresponding lockfile and test updates.

Changes:

  • Pre-computes “existing duplicate issues” via gh issue list before the agent phase and updates the prompt to forbid MCP calls for that phase.
  • Truncates jscpd context by generating /tmp/gh-aw/jscpd-top.json (top 15 duplicates) and updates the prompt to use it instead of the full report.
  • Reduces the allowed agent turn budget (≤10 → ≤7) and updates the compiled lock workflow + workflow test expectations.

Summary per file:

| File | Description |
| --- | --- |
| scripts/ci/duplicate-code-detector-workflow.test.ts | Updates assertions for new pre-step, new jscpd top file, and tighter turn budget/prompt text. |
| .github/workflows/duplicate-code-detector.md | Adds pre-steps to summarize jscpd output and pre-fetch existing issues; updates prompt to consume smaller precomputed artifacts and forbid MCP calls in Phase 5. |
| .github/workflows/duplicate-code-detector.lock.yml | Compiled workflow reflecting the new pre-steps and prompt constraints. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2


Comment on lines +86 to +94

    gh issue list \
      --repo "${{ github.repository }}" \
      --search "\"[Duplicate Code]\" in:title" \
      --state all --limit 50 \
      --json number,title,state,stateReason \
      > /tmp/gh-aw/existing-issues.json
    echo "=== Existing [Duplicate Code] issues ===" >> /tmp/gh-aw/existing-issues.json
    jq -r '.[] | "#\(.number) [\(.state)/\(.stateReason // "none")]: \(.title)"' \
      /tmp/gh-aw/existing-issues.json || true

Comment on lines +451 to +459

    gh issue list \
      --repo "$GH_AW_GITHUB_REPOSITORY" \
      --search "\"[Duplicate Code]\" in:title" \
      --state all --limit 50 \
      --json number,title,state,stateReason \
      > /tmp/gh-aw/existing-issues.json
    echo "=== Existing [Duplicate Code] issues ===" >> /tmp/gh-aw/existing-issues.json
    jq -r '.[] | "#\(.number) [\(.state)/\(.stateReason // "none")]: \(.title)"' \
      /tmp/gh-aw/existing-issues.json || true

Print the header to stdout instead of appending to the JSON file,
which corrupted the file and caused jq to fail (masked by || true).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
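
To see why the original step failed, consider this small reproduction. It is a sketch with a made-up one-element issue list and temporary files rather than the real workflow paths: once a plain-text header is appended to the JSON file, jq exits with an error when it reaches the trailing text, whereas echoing the header to stdout leaves the file parseable.

```shell
broken="$(mktemp)"
fixed="$(mktemp)"
sample='[{"number": 1, "title": "[Duplicate Code] example", "state": "OPEN", "stateReason": ""}]'

# Original pattern: the header is appended to the JSON file itself.
printf '%s\n' "$sample" > "$broken"
echo "=== Existing [Duplicate Code] issues ===" >> "$broken"

# Fixed pattern: the header goes to stdout, so the file stays pure JSON.
printf '%s\n' "$sample" > "$fixed"
echo "=== Existing [Duplicate Code] issues ==="

# jq exits non-zero on the file with trailing text and zero on the clean one.
if jq -e 'type == "array"' "$broken" >/dev/null 2>&1; then broken_ok=yes; else broken_ok=no; fi
if jq -e 'type == "array"' "$fixed" >/dev/null 2>&1; then fixed_ok=yes; else fixed_ok=no; fi
echo "broken parses: $broken_ok, fixed parses: $fixed_ok"

rm -f "$broken" "$fixed"
```

In the workflow the trailing `|| true` hid the non-zero exit, which is why the step appeared to succeed.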
@github-actions

🔥 Smoke Test: Copilot PAT Auth — PASS

Test Result
GitHub MCP connectivity
GitHub.com HTTP connectivity
File write/read

PR: feat: optimize duplicate-code-detector workflow token usage (~50% reduction)
Author: @Copilot | Assignees: @lpcox @Copilot
Auth mode: PAT (COPILOT_GITHUB_TOKEN)

Overall: PASS

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Copilot BYOK Smoke Test Results

GitHub MCP Testing - 2 recent closed PRs fetched successfully
GitHub.com Connectivity - Test file created (HTTP 200)
File Write/Read - /tmp/gh-aw/agent/smoke-test-copilot-byok.txt verified
BYOK Inference - Direct BYOK mode active (COPILOT_PROVIDER_API_KEY → api-proxy → api.githubcopilot.com)

Status: PASS
Running in direct BYOK mode via api-proxy sidecar. @lpcox

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

🔬 Smoke Test Results

| Test | Status |
| --- | --- |
| GitHub MCP connectivity | ✅ PASS |
| GitHub.com HTTP | ✅ PASS (200) |
| File write/read | ❌ FAIL (pre-step template vars not expanded) |

Overall: FAIL

PR: feat: optimize duplicate-code-detector workflow token usage (~50% reduction)
Author: @Copilot · Assignees: @lpcox @Copilot

⚠️ Pre-computed test data (SMOKE_FILE_PATH, SMOKE_HTTP_CODE, etc.) was not substituted — workflow step output passing may need investigation.

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Smoke test results:

  • fix(api-proxy): add embedding model pricing to resolve unknown model rejection
  • Refactor OTEL test module-reload helper into shared utility
  • smoke-claude: token optimization — precompute result, restrict bash tools, minimize prompt
  • [Test Coverage] squid ACL security modules (acl-generator, access-rules, domain-acl)
  • GitHub CLI / safeinputs lookup ❌
  • Playwright / file write / bash / discussion ✅
  • Build ❌ (/usr/bin/env: node: No such file or directory)

Overall: FAIL

🔮 The oracle has spoken through Smoke Codex

@github-actions

Smoke Test: GitHub Actions Services Connectivity

| Check | Result |
| --- | --- |
| Redis PING (host.docker.internal:6379) | ❌ Timeout |
| PostgreSQL pg_isready (host.docker.internal:5432) | ❌ No response |
| PostgreSQL SELECT 1 | ❌ Timeout |

host.docker.internal resolves to 172.17.0.1, but connections to both ports 6379 and 5432 timed out — services appear unreachable from this runner.

Overall: FAIL

🔌 Service connectivity validated by Smoke Services

@github-actions

Gemini Smoke Test Results:

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@github-actions

Smoke Test Results:

  • Remove unused export: CopilotModelValidationResult — ✅
  • GitHub.com connectivity — ✅
  • File I/O test — ❌
  • BYOK inference — ✅

Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra

Overall: FAIL

cc @lpcox @Copilot

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color passed ✅ PASS
Go env passed ✅ PASS
Go uuid passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx passed ✅ PASS
Node.js execa passed ✅ PASS
Node.js p-limit passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "www.google.com"

See Network Configuration for more information.

Generated by Build Test Suite for issue #5025

@github-actions

@lpcox

  • GitHub MCP connectivity: ✅
  • GitHub.com connectivity: ✅
  • File I/O: ✅
  • BYOK inference: ✅
    Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
    Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

lpcox merged commit de54ccb into main Jun 15, 2026
75 of 77 checks passed
lpcox deleted the copilot/optimize-copilot-token-usage branch June 15, 2026 16:41