fix(agent): surface a fallback message for empty turns + wire TASK_RUN_COMPLETED#2674
fix(agent): surface a fallback message for empty turns + wire TASK_RUN_COMPLETED#2674posthog[bot] wants to merge 1 commit into
Conversation
…N_COMPLETED Users reported empty agent responses — the model "thinks" then the turn finishes with nothing shown, clustering in plan mode. The root cause is a turn that completes (`end_turn`) without delivering any user-visible prose: most often a plan-mode turn that only emits encrypted/`redacted_thinking` (which renders nothing) and then calls `ExitPlanMode` with no text block. - Track whether any `agent_message_chunk` reached the client during a turn (via a thin client wrapper) and, at both `end_turn` completion points (the `result` message and the `session_state_changed: idle` path), surface a fallback message when none did. Skipped for `max_tokens`/`max_turn_requests`, refusals, cancellations, and SDK structured-output turns. - Wire up the dormant `TASK_RUN_COMPLETED` analytics event with `stop_reason` and an `empty_output` flag so the failure mode is observable for triage. Generated-By: PostHog Code Task-Id: 6388fb34-ec04-4d1d-8031-9c260ede0d0a
|
React Doctor found no issues in the changed files. 🎉 Reviewed by React Doctor for commit |
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/agent/src/adapters/claude/claude-agent.ts:1222-1228
`surfaceEmptyTurnFallback` bypasses `trackingClient` and emits directly via `this.client`. As a class method it can't close over `trackingClient`, so after the fallback fires `emittedAgentMessageChunk` remains `false`. Right now this is harmless because every call site immediately `return`s, but if a second check of `emittedAgentMessageChunk` is ever added after one of those calls the signal will be stale. Accepting the local variable as an extra parameter (and updating it to `true` before returning) would make the state consistent.
### Issue 2 of 2
packages/agent/src/adapters/claude/claude-agent.streamed-text.test.ts:247-304
The two new tests both exercise the `result` completion path (via `resultSuccess`). The `session_state_changed: idle` completion path — which has its own `surfaceEmptyTurnFallback` call — has no coverage. A plan-mode empty turn that arrives over the idle path would follow different branching (`idleStopReason`, `idleEmptyOutput`), so it would be good to have at least one test that drives the idle path to confirm both the fallback fires and `_meta.emptyOutput` is set correctly there too.
Reviews (1): Last reviewed commit: "fix(agent): surface a fallback message f..." | Re-trigger Greptile |
| await this.client.sessionUpdate({ | ||
| sessionId, | ||
| update: { | ||
| sessionUpdate: "agent_message_chunk", | ||
| content: { type: "text", text: EMPTY_TURN_FALLBACK_TEXT }, | ||
| }, | ||
| }); |
There was a problem hiding this comment.
surfaceEmptyTurnFallback bypasses trackingClient and emits directly via this.client. As a class method it can't close over trackingClient, so after the fallback fires emittedAgentMessageChunk remains false. Right now this is harmless because every call site immediately returns, but if a second check of emittedAgentMessageChunk is ever added after one of those calls the signal will be stale. Accepting the local variable as an extra parameter (and updating it to true before returning) would make the state consistent.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/agent/src/adapters/claude/claude-agent.ts
Line: 1222-1228
Comment:
`surfaceEmptyTurnFallback` bypasses `trackingClient` and emits directly via `this.client`. As a class method it can't close over `trackingClient`, so after the fallback fires `emittedAgentMessageChunk` remains `false`. Right now this is harmless because every call site immediately `return`s, but if a second check of `emittedAgentMessageChunk` is ever added after one of those calls the signal will be stale. Accepting the local variable as an extra parameter (and updating it to `true` before returning) would make the state consistent.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| "gateway answer", | ||
| ]); | ||
| }); | ||
|
|
||
| it("surfaces a fallback message when a turn ends with no prose", async () => { | ||
| const { agent, client } = makeAgent(); | ||
| const sessionId = "s-empty"; | ||
| const { query, input } = installFakeSession(agent, sessionId); | ||
|
|
||
| const promptPromise = agent.prompt({ | ||
| sessionId, | ||
| prompt: [{ type: "text", text: "make a plan" }], | ||
| }); | ||
| await tick(); | ||
|
|
||
| // Plan-mode turn: encrypted thinking + ExitPlanMode, no text block. | ||
| await echoUserMessage(query, input); | ||
| await send(query, messageStart(sessionId, "msg_plan")); | ||
| await send(query, exitPlanModeAssistant(sessionId, "msg_plan")); | ||
| await send(query, resultSuccess(sessionId)); | ||
|
|
||
| const result = await promptPromise; | ||
| expect(result.stopReason).toBe("end_turn"); | ||
| expect((result._meta as { emptyOutput?: boolean }).emptyOutput).toBe(true); | ||
| const chunks = messageChunkTexts(client.sessionUpdate.mock.calls); | ||
| expect(chunks).toHaveLength(1); | ||
| expect(chunks[0]).toContain("without a written response"); | ||
| }); | ||
|
|
||
| it("does not surface a fallback when thinking-only turn also emits text", async () => { | ||
| const { agent, client } = makeAgent(); | ||
| const sessionId = "s-thinking-text"; | ||
| const { query, input } = installFakeSession(agent, sessionId); | ||
|
|
||
| const promptPromise = agent.prompt({ | ||
| sessionId, | ||
| prompt: [{ type: "text", text: "hi" }], | ||
| }); | ||
| await tick(); | ||
|
|
||
| await echoUserMessage(query, input); | ||
| await send(query, messageStart(sessionId, "msg_t")); | ||
| await send(query, thinkingDelta(sessionId, "pondering")); | ||
| await send(query, textDelta(sessionId, "here is my answer")); | ||
| await send( | ||
| query, | ||
| assistantMessage(sessionId, "msg_t", "here is my answer"), | ||
| ); | ||
| await send(query, resultSuccess(sessionId)); | ||
|
|
||
| const result = await promptPromise; | ||
| expect(result.stopReason).toBe("end_turn"); | ||
| expect((result._meta as { emptyOutput?: boolean }).emptyOutput).toBe(false); | ||
| expect(messageChunkTexts(client.sessionUpdate.mock.calls)).toEqual([ | ||
| "here is my answer", | ||
| ]); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
The two new tests both exercise the
result completion path (via resultSuccess). The session_state_changed: idle completion path — which has its own surfaceEmptyTurnFallback call — has no coverage. A plan-mode empty turn that arrives over the idle path would follow different branching (idleStopReason, idleEmptyOutput), so it would be good to have at least one test that drives the idle path to confirm both the fallback fires and _meta.emptyOutput is set correctly there too.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/agent/src/adapters/claude/claude-agent.streamed-text.test.ts
Line: 247-304
Comment:
The two new tests both exercise the `result` completion path (via `resultSuccess`). The `session_state_changed: idle` completion path — which has its own `surfaceEmptyTurnFallback` call — has no coverage. A plan-mode empty turn that arrives over the idle path would follow different branching (`idleStopReason`, `idleEmptyOutput`), so it would be good to have at least one test that drives the idle path to confirm both the fallback fires and `_meta.emptyOutput` is set correctly there too.
How can I resolve this? If you propose a fix, please make it concise.
Problem
Users reported empty agent responses — the model "thinks" for 10+ seconds, then the turn finishes with nothing shown. Two Discord users hit this independently (Windows 11 ARM and macOS), and it clustered in plan mode.
Why: A turn can complete (
end_turn) without delivering any user-visible prose. The most common case is a plan-mode turn that only emits encrypted thinking —redacted_thinkingblocks render nothing — and then callsExitPlanModewith no text block, producing a structurally empty turn. The issue was also invisible to analytics:TASK_RUN_COMPLETEDwas defined but never fired, so there was no telemetry on empty or failed turns.Changes
agent_message_chunkreached the client during a turn (via a thin client wrapper that stays accurate across streamed, consolidated, and direct emit paths). At bothend_turncompletion points — theresultmessage and thesession_state_changed: idlepath — it surfaces a short fallback message when none was emitted. It is correctly skipped formax_tokens/max_turn_requests, refusals, cancellations, and SDK structured-output turns.TASK_RUN_COMPLETEDevent withstop_reasonand a newempty_outputflag (set by the agent and read bysessionServicefrom the result_meta). Refactored the duplicatedprompts_sent/ duration logic into shared helpers.I deliberately left
filterAssistantContent's streamed-block dedup intact: the reported blank is caused byredacted_thinking(which that function never touches), and loosening the dedup risks double-rendering normal text. The turn-level fallback is the authoritative guard that covers every path that can drop content.How did you test this?
claude-agent.streamed-text.test.ts: a plan-moderedacted_thinking+ExitPlanModeturn now emits exactly one fallback chunk with_meta.emptyOutput === true; a thinking+text turn does not (and reportsemptyOutput === false). All 4 tests in that file pass.pnpm --filter @posthog/{shared,agent,core} typecheck— clean.pnpm --filter @posthog/core test sessions(162 passed) and@posthog/shared test(320 passed). Agent suite shows the same pre-existing, environment-only failures (sandbox blocksgit commit) onmainwith and without this change; my new tests add 2 passing.The fallback reuses the same
agent_message_chunk→ rendering path as every normal agent message, so verifying the chunk is emitted at the agent boundary confirms the user-visible change.Automatic notifications
Created with PostHog Code from an inbox report.