Agent turn hangs indefinitely when a streaming response stalls (no idle timeout in messagesApi/responsesApi SSE read loop) #321432

@meganrogge

Summary

claude-opus agent turns hang for the full harness wall-clock cap (eval: X_AGENT_STILL_RESPONDING, 0 output) when a single /v1/messages (Anthropic Messages API) streaming response stalls. The upstream/proxy returns HTTP 200 headers but never streams a terminating body and never closes the socket. The client's SSE read loop in processResponseFromMessagesEndpoint has no idle timeout, so the turn never resolves or rejects. Found via terminalbench2 eval runs, where it was the dominant error mode (18 of the 28 errored instances across 3 vscode-copilot-cli runs, per the run table under "Affected eval runs"). The same unguarded read loop exists in the OpenAI Responses API path, so this is not Anthropic-specific in principle.

Where it hangs

extensions/copilot/src/platform/endpoint/node/messagesApi.ts → processResponseFromMessagesEndpoint:

for await (const chunk of response.body) {
    parser.feed(chunk);
}

No inactivity guard. When the response delivers 200 headers but the body carries no SSE bytes (and never ends), this loop blocks forever: no message_stop is parsed, the AsyncIterableObject never calls feed.emitOne(...) or feed.reject(...), and the ToolCallingLoop awaiting the turn hangs until an outer cap kills it (the eval harness's 60-min limit; in product, effectively indefinitely or until the user cancels).
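The stall can be reproduced in isolation. The sketch below assumes the response body behaves like a Node.js Readable (which the proposed response.body.destroy(...) call suggests) and uses a PassThrough as a stand-in for a body that has produced headers but no bytes. The names drain, settledWithin, and demo are hypothetical helpers for illustration, not functions from the extension.

```typescript
import { PassThrough } from 'node:stream';

// Hypothetical stand-in for the unguarded read loop; `onChunk` plays the role of parser.feed.
async function drain(body: AsyncIterable<Buffer>, onChunk: (c: Buffer) => void): Promise<'ended'> {
    for await (const chunk of body) {
        onChunk(chunk);
    }
    return 'ended';
}

// Resolves to 'pending' if `p` has not settled after `ms` milliseconds.
function settledWithin<T>(p: Promise<T>, ms: number): Promise<T | 'pending'> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<'pending'>(resolve => {
        timer = setTimeout(() => resolve('pending'), ms);
    });
    return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

async function demo(): Promise<string[]> {
    const results: string[] = [];

    // Healthy body: bytes arrive and the stream ends, so the loop returns.
    const healthy = new PassThrough();
    healthy.end('event: message_stop\n\n');
    results.push(await settledWithin(drain(healthy, () => {}), 200));

    // Stalled body: no bytes and no end, so the loop never returns.
    const stalled = new PassThrough();
    const read = drain(stalled, () => {});
    results.push(await settledWithin(read, 200));

    // Clean up: destroying the body is what finally makes the loop throw.
    read.catch(() => {});
    stalled.destroy(new Error('cleanup'));
    return results;
}
```

Calling demo() should yield 'ended' for the healthy body and 'pending' for the stalled one, which mirrors the hung turn: nothing inside the loop can notice that the stream has gone quiet.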

The identical pattern is in extensions/copilot/src/platform/endpoint/node/responsesApi.ts → processResponseFromChatEndpoint (line ~907), so the OpenAI Responses path shares the vulnerability.

Evidence (from eval GitHub Copilot Chat.log)

A healthy turn logs [messagesAPI] message 0 returned. finish reason:[stop] → ccreq …| success | claude-opus… → (final turn) [ToolCallingLoop] Stop hook result: shouldContinue=false → clean exit.

A hung turn:

| Instance | Signature | Evidence |
| --- | --- | --- |
| circuit-fibsqrt | 3 turns OK, 4th /v1/messages never returns | last ccreq …success…[stop] @ 20:50:54; capi-proxy sends req [10] @ 20:50:57 with no matching [messagesAPI] message returned / ccreq; silence → OTel flush 21:50:42 (60-min cap) |
| regex-chess | 2 turns OK then hang | last return 21:08:27 → cap flush 22:08:14 |
| train-fasttext | 36 turns OK then hang | last return 22:30:41 → flush 22:36:46 (hung at min ~54) |
| winning-avg-corewars | 60 turns OK then hang | last return 22:40:18 → flush 22:42:54; one completed turn took 133,191ms |

The hang strikes at an arbitrary turn (anywhere from the 2nd to the 61st), which points to a per-turn streaming stall rather than a prompt/content issue.

Affected eval runs (terminalbench2, vscode-copilot-cli)

| Run | Model | Resolved / Failed / Err | timeout-hangs |
| --- | --- | --- | --- |
| 27478622081 | opus-4.6 | 48 / 28 / 13 | 9 |
| 27511500953 | opus-4.6 | 52 / 27 / 10 | 6 |
| 27511511587 | opus-4.7 | 60 / 23 / 5 | 3 |

Same instances PASS on the headless Copilot CLI control runs (different harness): 27479099364 (0 err), 27512067473 (1 err).
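For reference, the share of errors attributable to timeout-hangs can be recomputed from the three runs above. This is only arithmetic over the reported counts; the Run shape and summarize helper are hypothetical names used for illustration.

```typescript
interface Run {
    id: string;
    model: string;
    resolved: number;
    failed: number;
    errored: number;
    timeoutHangs: number;
}

// Counts copied from the affected-runs table.
const runs: Run[] = [
    { id: '27478622081', model: 'opus-4.6', resolved: 48, failed: 28, errored: 13, timeoutHangs: 9 },
    { id: '27511500953', model: 'opus-4.6', resolved: 52, failed: 27, errored: 10, timeoutHangs: 6 },
    { id: '27511511587', model: 'opus-4.7', resolved: 60, failed: 23, errored: 5, timeoutHangs: 3 },
];

// Sums errored instances and timeout-hangs, and reports the hang share as a rounded percentage.
function summarize(rs: Run[]): { errored: number; hangs: number; sharePct: number } {
    const errored = rs.reduce((n, r) => n + r.errored, 0);
    const hangs = rs.reduce((n, r) => n + r.timeoutHangs, 0);
    return { errored, hangs, sharePct: Math.round((100 * hangs) / errored) };
}

const summary = summarize(runs);
console.log(summary); // { errored: 28, hangs: 18, sharePct: 64 }
```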

Why GPT-5.x was not hit

GPT-5.x routes through useResponsesApi → processResponseFromChatEndpoint (OpenAI Responses upstream), which is selected before the Messages path; only models advertising Messages in supported_endpoints (Anthropic) hit processResponseFromMessagesEndpoint. The streaming stall in these runs occurred on the Anthropic /v1/messages upstream, which OpenAI models don't use. This is routing luck, not structural immunity: the Responses path has the same unguarded read loop and would hang identically if its upstream stalled the same way.

Ruled out

  • OTel consent modal — present on resolved instances too; a universal end-of-run screenshot artifact, not the cause.
  • "Retry storm" / repeated x-request-id — that id is a per-session id, constant across all /v1/messages calls in a session; resolved instances repeat it 28–37× and pass.
  • HMAC 401 near end-of-hang — a background model-metadata refetch firing ~55 min after the wedge; effect, not cause.

Proposed fix

  1. Add a streaming idle timeout to both processResponseFromMessagesEndpoint (messagesApi.ts) and processResponseFromChatEndpoint (responsesApi.ts). Arm a watchdog that resets on each chunk; if no chunk arrives within ~120s, call response.body.destroy(new Error('stream idle timeout')) so the for await throws and feed.reject(...) fails the turn fast, letting the existing fetch/retry path recover instead of hanging. Also call feed.reject(...) if the stream ends without a message_stop/completion having been emitted.
  2. Wire the request CancellationToken through. processResponseFromChatEndpoint and processResponseFromMessagesEndpoint are currently called without the cancellationToken that defaultChatResponseProcessor receives; pass it so user/turn cancellation can abort a stuck body read, and add a per-turn wall-clock deadline to catch dribbling keep-alive streams that never terminate.
  3. Upstream/proxy — investigate why streaming responses can return 200 headers and then never stream a body or close; the client timeout is the safety net, not the root remedy.
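A minimal sketch of what items 1 and 2 could look like as one guarded read helper. It rests on several assumptions: the body is a Node.js Readable, a standard AbortSignal stands in for the extension's CancellationToken, and readWithGuards, GuardOptions, and isComplete are hypothetical names, not existing APIs in the extension. In the real code the thrown error would presumably be routed to feed.reject(...).

```typescript
import { Readable } from 'node:stream';

// Hypothetical option names; the proposal suggests roughly 120s for the idle gap.
interface GuardOptions {
    idleMs: number;              // max allowed gap between chunks
    deadlineMs?: number;         // optional wall-clock cap for the whole read
    signal?: AbortSignal;        // stand-in for the request CancellationToken
    isComplete?: () => boolean;  // true once message_stop/completion was parsed
}

async function readWithGuards(
    body: Readable,
    onChunk: (chunk: Buffer) => void,
    opts: GuardOptions,
): Promise<void> {
    if (opts.signal?.aborted) {
        body.destroy();
        throw new Error('cancelled');
    }

    // Idle watchdog: re-armed on every chunk, destroys the body if it goes quiet.
    let idle: ReturnType<typeof setTimeout> | undefined;
    const armIdle = () => {
        clearTimeout(idle);
        idle = setTimeout(() => body.destroy(new Error('stream idle timeout')), opts.idleMs);
    };

    // Wall-clock deadline: catches streams that dribble keep-alives forever.
    const deadline = opts.deadlineMs === undefined
        ? undefined
        : setTimeout(() => body.destroy(new Error('stream deadline exceeded')), opts.deadlineMs);

    const onAbort = () => body.destroy(new Error('cancelled'));
    opts.signal?.addEventListener('abort', onAbort, { once: true });

    armIdle();
    try {
        // Destroying the body with an error makes this loop throw that error.
        for await (const chunk of body) {
            armIdle();
            onChunk(chunk);
        }
    } finally {
        clearTimeout(idle);
        clearTimeout(deadline);
        opts.signal?.removeEventListener('abort', onAbort);
    }

    // A clean end of stream without a terminator is also treated as a failure.
    if (opts.isComplete && !opts.isComplete()) {
        throw new Error('stream ended before completion');
    }
}
```

Destroying the body, rather than racing the loop against a timer, is a deliberate choice here: it releases the underlying socket and lets the existing for await surface the failure through its normal error path.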

Links

  • Code: extensions/copilot/src/platform/endpoint/node/messagesApi.ts (processResponseFromMessagesEndpoint); extensions/copilot/src/platform/endpoint/node/responsesApi.ts (processResponseFromChatEndpoint)
  • Eval analysis cache: ~/.agent/eval-cache/opus46-cli-compare/timeout-rootcause.md
  • Related (eval-side, supersede — modal claim disproven): microsoft/vscode-copilot-evaluation#4863
