Avatar receives user_start/stop but never returns user_talking_message / reply

We are building a web page that lets users speak to a HeyGen Interactive Avatar in real time (voice chat).
Architecture:

1. Browser: `@heygen/streaming-avatar` (v2.0.13)
2. Flask backend proxy: forwards every `/v1/streaming.*` call and adds heartbeat, auth, etc.
3. HeyGen cloud

A manual `avatar.speak({text, taskType: REPEAT})` call works, the video stream works, and the heartbeat is OK.

The problem

When the user talks:

- The browser fires USER_START, then USER_STOP (so VAD works).
- No USER_TALKING_MESSAGE, USER_END_MESSAGE, or AVATAR_* events ever come back.
- The "streaming.chat" WebSocket shows only:
```
{"event_type":"user_start"}
{"event_type":"user_stop"}
```
There is no STT transcript and no error.
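
To make the "nothing comes back" claim easier to verify, here is a minimal sketch of a tracer that subscribes to every voice-chat event name and reports which ones never fired. It only assumes the `on(name, handler)` style already used on this page. `traceEvents` and `VOICE_EVENTS` are hypothetical names, and the exact strings for the AVATAR_* events are an assumption, not taken from the SDK docs.

```javascript
// Event names as they appear in this post. The avatar_* strings are assumed
// to follow the same lowercase pattern as user_start / user_stop.
const VOICE_EVENTS = [
  "user_start",
  "user_stop",
  "user_talking_message",
  "user_end_message",
  "avatar_start_talking",
  "avatar_talking_message",
  "avatar_end_message",
  "avatar_stop_talking",
];

// Subscribe to each name on any emitter exposing on(name, handler),
// count how often each one fires, and log every payload.
function traceEvents(emitter, names = VOICE_EVENTS, log = console.debug) {
  const seen = new Map(names.map((name) => [name, 0]));
  for (const name of names) {
    emitter.on(name, (payload) => {
      seen.set(name, seen.get(name) + 1);
      log(`[EV ${name}]`, payload);
    });
  }
  return {
    seen,
    // Names that have not fired since tracing started.
    missing: () => names.filter((name) => seen.get(name) === 0),
  };
}
```

Calling `const trace = traceEvents(avatar)` before `startVoiceChat` and printing `trace.missing()` after speaking should, in our failing case, list everything except `user_start` and `user_stop`.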

Attempted fixes in full app (still silent)

| Tried | Result |
| --- | --- |
| Load protobufjs first (`window.protobuf = …`) | ✔ no SDK crash, still silent |
| Wait for audio WS OPEN (same logic as sandbox) | still silent |
| `sttSettings:{sampleRate:16000}` in `createStartAvatar` | still silent |
| `useSilencePrompt:true` in `startVoiceChat` | still silent |
| Logged outgoing frames: we do see `[FRAME] 512 bytes` while speaking | frames are leaving browser |
| Listened for `stt_error`, `voice_error`, `error` events: none received | no error from server |

Relevant code snippet (current prod page)

await avatar.startVoiceChat({isInputAudioMuted:false});

/* wait for WS open */
const ws = avatar.voiceChat._audioWebSocket;
await new Promise((res,rej)=>{
  if (ws.readyState === WebSocket.OPEN) return res();
  ws.addEventListener("open",res,{once:true});
  ws.addEventListener("error",rej,{once:true});
});

/* log frames */
const oldSend = ws.send;
ws.send = d => { console.debug("[FRAME]", d.byteLength); return oldSend.call(ws,d); };

await avatar.startListening();

avatar.on("stt_error",  e=>console.error("STT_ERR",e.detail));
avatar.on("voice_error",e=>console.error("VOICE_ERR",e.detail));
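
The open-wait above hangs forever if the socket is already closing, or if neither `open` nor `error` ever fires. A slightly more defensive variant might look like the sketch below. `waitForOpen` is a hypothetical helper, not part of the SDK, and it accepts any object with `readyState` and `addEventListener` / `removeEventListener` (such as the `avatar.voiceChat._audioWebSocket` used above).

```javascript
// Numeric readyState values defined by the WebSocket standard.
const WS_OPEN = 1;
const WS_CLOSING = 2;
const WS_CLOSED = 3;

// Resolve with the socket once it is open; reject if it errors, closes,
// is already closing/closed, or does not open within timeoutMs.
function waitForOpen(ws, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    if (ws.readyState === WS_OPEN) return resolve(ws);
    if (ws.readyState === WS_CLOSING || ws.readyState === WS_CLOSED) {
      return reject(new Error("socket already closing or closed"));
    }

    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`socket not open after ${timeoutMs} ms`));
    }, timeoutMs);

    // Remove all listeners and the timer so nothing fires twice.
    function cleanup() {
      clearTimeout(timer);
      ws.removeEventListener("open", onOpen);
      ws.removeEventListener("error", onFail);
      ws.removeEventListener("close", onFail);
    }
    function onOpen() {
      cleanup();
      resolve(ws);
    }
    function onFail(event) {
      cleanup();
      reject(new Error(`socket ${event.type} before open`));
    }

    ws.addEventListener("open", onOpen);
    ws.addEventListener("error", onFail);
    ws.addEventListener("close", onFail);
  });
}
```

With this in place, `await waitForOpen(ws)` would replace the inline promise, and a rejection would tell us whether the audio socket ever became usable before `startListening()` was called.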

Logs from browser

[EV user_start] {event_type:"user_start"}
[FRAME] 512
[FRAME] 512
[EV user_stop] {event_type:"user_stop"}
(…no further events…)
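
A rough sanity check on the frame log: if each frame were raw 16-bit mono PCM at the 16000 Hz we pass in `sttSettings` (an assumption; the SDK may wrap audio in a protobuf envelope, in which case the real payload is smaller), the amount of audio per frame can be estimated as below. `frameDurationMs` and `totalAudioMs` are hypothetical helpers for illustration only.

```javascript
// Assumption: each frame is raw PCM with the given sample width and channel
// count. With a protobuf envelope these values are upper bounds.
function frameDurationMs(frameBytes, sampleRate = 16000, bytesPerSample = 2, channels = 1) {
  const samples = frameBytes / (bytesPerSample * channels);
  return (samples * 1000) / sampleRate;
}

// Total audio carried by frameCount equally sized frames.
function totalAudioMs(frameCount, frameBytes, sampleRate = 16000) {
  return frameCount * frameDurationMs(frameBytes, sampleRate);
}

console.log(frameDurationMs(512)); // 16
console.log(totalAudioMs(2, 512)); // 32
```

Under those assumptions, a 512-byte frame carries about 16 ms of audio, so counting the frames sent between `user_start` and `user_stop` gives an estimate of how much speech actually left the browser.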

Network › WS › streaming.chat only shows the two JSON lines above.

Backend proxy confirms request sequence

/v1/streaming.new        200
/v1/streaming.start       200
/v1/streaming.start_listening 200

No other streaming endpoints are hit after that.

Questions for the HeyGen team / community

1. Are there circumstances where the server would ignore valid audio frames yet still send user_start/stop?
2. Is there an additional flag (account-level or per-session) required to enable STT?
3. Does `startListening()` need to be re-issued after WS open, or should awaiting open plus a single call suffice?
4. Are there any known incompatibilities with proxying through `fetch("/api/heygen/proxy?path=...")` (the body is unchanged JSON)?

Full session id of a failing run (2025-05-12 14:11 UTC): dbcc07a5-2ea8-11f0-8041-aafedb6f6c4d

Happy to provide full HAR / WS capture if needed.
Thanks for any insights!

– Mindhelp Chat Team
