Summary
claude-opus agent turns hang for the full harness wall-clock cap (eval: X_AGENT_STILL_RESPONDING, 0 output) when a single /v1/messages (Anthropic Messages API) streaming response stalls — the upstream/proxy returns HTTP 200 headers but never streams a terminating body and never closes the socket, and the client's SSE read loop in processResponseFromMessagesEndpoint has no idle timeout, so the turn never resolves or rejects. Found via terminalbench2 eval runs where it was the dominant error mode (~18 of 30 errored instances across 3 vscode-copilot-cli runs). The same unguarded read loop exists in the OpenAI Responses API path, so this is not Anthropic-specific in principle.
Where it hangs
extensions/copilot/src/platform/endpoint/node/messagesApi.ts → processResponseFromMessagesEndpoint:
    for await (const chunk of response.body) {
        parser.feed(chunk);
    }
No inactivity guard. When the response returns 200 headers but the body delivers no SSE bytes (and never ends), this loop blocks forever: no message_stop is parsed, the AsyncIterableObject never calls feed.emitOne(...) or feed.reject(...), and the ToolCallingLoop awaiting the turn hangs until an outer cap kills it (the eval harness's 60-min limit; in product, effectively indefinitely or until the user cancels).
The identical pattern is in extensions/copilot/src/platform/endpoint/node/responsesApi.ts → processResponseFromChatEndpoint (line ~907), so the OpenAI Responses path shares the vulnerability.
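The blocking behaviour can be reproduced without a network. The following is a minimal sketch, not the extension's code: `stalledBody`, `healthyBody`, `drain`, and `probe` are hypothetical names, and the stalled body is simulated by an async iterator whose `next()` never settles, which is how a silent socket looks to a `for await` loop.

```typescript
// Hypothetical stand-in for a response body whose headers arrived but whose
// stream never produces a chunk and never ends.
const stalledBody: AsyncIterable<Uint8Array> = {
  [Symbol.asyncIterator]() {
    return { next: () => new Promise<IteratorResult<Uint8Array>>(() => { /* never settles */ }) };
  },
};

// Hypothetical healthy body: one chunk, then a normal end of stream.
async function* healthyBody(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2, 3]);
}

// Same shape as the unguarded read loop: nothing bounds the wait between chunks.
async function drain(body: AsyncIterable<Uint8Array>): Promise<string> {
  let bytes = 0;
  for await (const chunk of body) {
    bytes += chunk.length;
  }
  return `drained ${bytes} bytes`;
}

// Observes from the outside whether drain() has settled after probeMs.
async function probe(body: AsyncIterable<Uint8Array>, probeMs: number): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const stillPending = new Promise<string>(resolve => {
    timer = setTimeout(() => resolve('still pending'), probeMs);
  });
  return Promise.race([drain(body), stillPending]).finally(() => clearTimeout(timer));
}

probe(stalledBody, 100).then(result => console.log('stalled body:', result));
probe(healthyBody(), 100).then(result => console.log('healthy body:', result));
```

The stalled case neither resolves nor rejects on its own; only the external probe timer reveals that it is stuck, which mirrors the role the harness wall-clock cap plays in the eval runs.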
Evidence (from eval GitHub Copilot Chat.log)
A healthy turn logs [messagesAPI] message 0 returned. finish reason:[stop] → ccreq …| success | claude-opus… → (final turn) [ToolCallingLoop] Stop hook result: shouldContinue=false → clean exit.
A hung turn:
| Instance | Signature | Evidence |
|---|---|---|
| circuit-fibsqrt | 3 turns OK, 4th /v1/messages never returns | last ccreq …success…[stop] @ 20:50:54; capi-proxy sends req [10] @ 20:50:57 with no matching [messagesAPI] message returned / ccreq; silence → OTel flush 21:50:42 (60-min cap) |
| regex-chess | 2 turns OK then hang | last return 21:08:27 → cap flush 22:08:14 |
| train-fasttext | 36 turns OK then hang | last return 22:30:41 → flush 22:36:46 (hung at min ~54) |
| winning-avg-corewars | 60 turns OK then hang | last return 22:40:18 → flush 22:42:54; one completed turn took 133,191ms |
The hang strikes at an arbitrary turn (anywhere from the 2nd to the 61st), which points to a per-turn streaming stall rather than a prompt/content issue.
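The length of each silent period can be worked out from the timestamps in the evidence table. This is a small arithmetic sketch, assuming all timestamps in a row fall on the same day; `HungTurn`, `toSeconds`, and `silenceSeconds` are illustrative names only.

```typescript
interface HungTurn {
  instance: string;
  lastReturn: string; // HH:MM:SS of the last successful turn
  flush: string;      // HH:MM:SS of the OTel flush that ended the run
}

// Timestamps copied from the evidence table.
const hungTurns: HungTurn[] = [
  { instance: 'circuit-fibsqrt', lastReturn: '20:50:54', flush: '21:50:42' },
  { instance: 'regex-chess', lastReturn: '21:08:27', flush: '22:08:14' },
  { instance: 'train-fasttext', lastReturn: '22:30:41', flush: '22:36:46' },
  { instance: 'winning-avg-corewars', lastReturn: '22:40:18', flush: '22:42:54' },
];

// Converts HH:MM:SS to seconds since midnight.
function toSeconds(hms: string): number {
  const [h = 0, m = 0, s = 0] = hms.split(':').map(Number);
  return h * 3600 + m * 60 + s;
}

// Seconds between the last successful turn and the flush (same-day assumption).
function silenceSeconds(turn: HungTurn): number {
  return toSeconds(turn.flush) - toSeconds(turn.lastReturn);
}

for (const turn of hungTurns) {
  const total = silenceSeconds(turn);
  console.log(`${turn.instance}: ${Math.floor(total / 60)}m ${total % 60}s of silence`);
}
```

The first two instances sit silent for just under the full 60-minute cap, while the last two stalled late in the run, so the cap arrived after only a few minutes of silence.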
Affected eval runs (terminalbench2, vscode-copilot-cli)
| Run | Model | Resolved / Failed / Err | timeout-hangs |
|---|---|---|---|
| 27478622081 | opus-4.6 | 48 / 28 / 13 | 9 |
| 27511500953 | opus-4.6 | 52 / 27 / 10 | 6 |
| 27511511587 | opus-4.7 | 60 / 23 / 5 | 3 |
Same instances PASS on the headless Copilot CLI control runs (different harness): 27479099364 (0 err), 27512067473 (1 err).
Why GPT-5.x was not hit
GPT-5.x routes through useResponsesApi → processResponseFromChatEndpoint (OpenAI Responses upstream), which is selected before the Messages path; only models advertising Messages in supported_endpoints (Anthropic) hit processResponseFromMessagesEndpoint. The streaming stall in these runs occurred on the Anthropic /v1/messages upstream, which OpenAI models don't use. This is routing luck, not structural immunity — the Responses path has the same unguarded read loop and would hang identically if its upstream stalled the same way.
Ruled out
- OTel consent modal — present on resolved instances too; a universal end-of-run screenshot artifact, not the cause.
- "Retry storm" / repeated x-request-id: that id is a per-session id, constant across all /v1/messages calls in a session; resolved instances repeat it 28–37× and pass.
- HMAC 401 near end-of-hang — a background model-metadata refetch firing ~55 min after the wedge; effect, not cause.
Proposed fix
- Add a streaming idle timeout to both processResponseFromMessagesEndpoint (messagesApi.ts) and processResponseFromChatEndpoint (responsesApi.ts). Arm a watchdog that resets on each chunk; if no chunk arrives within ~120s, call response.body.destroy(new Error('stream idle timeout')) so that the for await throws and feed.reject(...) fails the turn fast, letting the existing fetch/retry path recover instead of hanging. Also call feed.reject(...) if the stream ends without a message_stop/completion having been emitted.
- Wire the request CancellationToken through: processResponseFromChatEndpoint and processResponseFromMessagesEndpoint are currently called without the cancellationToken that defaultChatResponseProcessor receives. Pass it so user/turn cancellation can abort a stuck body read, and add a per-turn wall-clock deadline to catch dribbling keep-alive streams that never terminate.
- Upstream/proxy — investigate why streaming responses can return 200 headers and then never stream a body or close; the client timeout is the safety net, not the root remedy.
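The idle-timeout and missing-terminator checks proposed above could look roughly like the sketch below. It is written under several assumptions: `withIdleTimeout` and `readTurn` are hypothetical helpers, not functions in the extension; the stream is modelled as an `AsyncIterable<string>` of already-parsed event names rather than `response.body`; and the timeout value is left to the caller. In the real read loop the timeout path would also need to destroy `response.body`, as the proposal states, because abandoning the iterator does not by itself guarantee the socket is released.

```typescript
// Sketch only: hypothetical helpers illustrating the proposed guard.

/** Re-yields items from `source`, failing if the gap before the next item exceeds `idleMs`. */
async function* withIdleTimeout<T>(source: AsyncIterable<T>, idleMs: number): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  try {
    while (true) {
      // A fresh watchdog per item: it is armed before waiting and cleared as soon
      // as the wait settles, so time spent by the consumer is not counted as idle.
      let timer: ReturnType<typeof setTimeout> | undefined;
      const idle = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error('stream idle timeout')), idleMs);
      });
      const result = await Promise.race([iterator.next(), idle]).finally(() => clearTimeout(timer));
      if (result.done) {
        return;
      }
      yield result.value;
    }
  } finally {
    // Best-effort cleanup; deliberately not awaited, because a stalled source
    // may never settle its return() promise.
    iterator.return?.().catch(() => { /* ignore cleanup errors */ });
  }
}

/** Collects event names for one turn and rejects on a stall or a missing terminator. */
async function readTurn(events: AsyncIterable<string>, idleMs: number): Promise<string[]> {
  const seen: string[] = [];
  for await (const event of withIdleTimeout(events, idleMs)) {
    seen.push(event);
  }
  if (!seen.includes('message_stop')) {
    throw new Error('stream ended without message_stop');
  }
  return seen;
}
```

With this shape, a stalled stream turns into a rejected promise after `idleMs` instead of a pending one, so the caller's existing error handling can run. A per-turn wall-clock deadline and cancellation would be separate guards layered on top, since a stream that keeps emitting keep-alive bytes never trips an idle timer.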
Links
- Code: extensions/copilot/src/platform/endpoint/node/messagesApi.ts (processResponseFromMessagesEndpoint); extensions/copilot/src/platform/endpoint/node/responsesApi.ts (processResponseFromChatEndpoint)
- Eval analysis cache: ~/.agent/eval-cache/opus46-cli-compare/timeout-rootcause.md
- Related (eval-side, supersede — modal claim disproven): microsoft/vscode-copilot-evaluation#4863