Avatar receives user_start/stop but never returns user_talking_message / reply – need help tracing missing STT step
5 days ago by Konrad Socha
We are building a web page that lets users speak to a HeyGen Interactive Avatar in real-time (voice chat).
Architecture:
- Browser – `@heygen/streaming-avatar` (v2.0.13)
- Flask backend proxy – forwards every `/v1/streaming.*` call, adds heartbeat, auth, etc.
- HeyGen cloud

Manual `avatar.speak({text, taskType: REPEAT})` works, video stream works, heartbeat OK.
The problem
When the user talks:
- Browser fires `USER_START` then `USER_STOP` (so VAD works)
- No `USER_TALKING_MESSAGE`, `USER_END_MESSAGE`, or `AVATAR_*` events ever come back
- WS “streaming.chat” shows only `{"event_type":"user_start"}` `{"event_type":"user_stop"}` – no STT transcript, no error.
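To make the gap between "events subscribed" and "events actually received" explicit, a small logging helper can help. The following is a minimal sketch, not part of the SDK: `attachEventLogger` is a hypothetical name, and it assumes only that the avatar object exposes `on(name, handler)` as used elsewhere in this post.

```javascript
// Hypothetical helper: subscribes one logging handler per event name on any
// object exposing on(name, handler), and counts what actually arrived.
function attachEventLogger(emitter, eventNames, log = console.debug) {
  const seen = new Map(); // event name -> number of times received
  for (const name of eventNames) {
    seen.set(name, 0);
    emitter.on(name, (payload) => {
      seen.set(name, seen.get(name) + 1);
      log(`[EV ${name}]`, payload);
    });
  }
  // Names that were subscribed but have not fired so far.
  const missing = () =>
    [...seen].filter(([, count]) => count === 0).map(([name]) => name);
  return { seen, missing };
}
```

Called as `attachEventLogger(avatar, ["user_start", "user_stop", "user_talking_message"])` before starting voice chat, `missing()` after a test utterance would list the events that never came back. The event name strings here are the ones quoted in this post; any others would need to be checked against the SDK.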
Attempted fixes in full app (still silent)
Tried | Result |
---|---|
Load protobufjs first (window.protobuf = … ) | ✔ no SDK crash, still silent |
Wait for audio WS OPEN (same logic as sandbox) | still silent |
sttSettings:{sampleRate:16000} in createStartAvatar | still silent |
useSilencePrompt:true in startVoiceChat | still silent |
Logged outgoing frames – we do see [FRAME] 512 bytes while speaking | frames are leaving browser |
Listened for stt_error, voice_error, error events – none received | no error from server |
Relevant code snippet (current prod page)
await avatar.startVoiceChat({ isInputAudioMuted: false });

// Wait for the audio WebSocket to open (private SDK field, accessed for debugging).
const ws = avatar.voiceChat._audioWebSocket;
await new Promise((res, rej) => {
  if (ws.readyState === WebSocket.OPEN) return res();
  ws.addEventListener("open", res, { once: true });
  ws.addEventListener("error", rej, { once: true });
});

// Log the byte size of every outgoing audio frame.
const oldSend = ws.send;
ws.send = d => {
  console.debug("[FRAME]", d.byteLength);
  return oldSend.call(ws, d);
};

await avatar.startListening();

// Error listeners (attached after startListening, as in the prod page).
avatar.on("stt_error", e => console.error("STT_ERR", e.detail));
avatar.on("voice_error", e => console.error("VOICE_ERR", e.detail));
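One thing worth ruling out: the wait-for-open step above resolves only on `open` and rejects only on `error`, so it would hang forever if the socket were already closing or closed when checked. A more defensive variant might look like the sketch below. `waitForSocketOpen` and its `timeoutMs` parameter are hypothetical names; the numeric `readyState` values are the standard WebSocket ones (0 CONNECTING, 1 OPEN, 2 CLOSING, 3 CLOSED).

```javascript
// Sketch: resolve once the socket is open; reject on error, close, or timeout.
function waitForSocketOpen(ws, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    if (ws.readyState === 1) return resolve(ws); // already OPEN
    if (ws.readyState > 1) {
      return reject(new Error("socket already closing or closed"));
    }

    // Remove all listeners and the timer once the outcome is known.
    const cleanup = () => {
      clearTimeout(timer);
      ws.removeEventListener("open", onOpen);
      ws.removeEventListener("error", onFail);
      ws.removeEventListener("close", onFail);
    };
    const onOpen = () => { cleanup(); resolve(ws); };
    const onFail = (ev) => {
      cleanup();
      reject(new Error(`socket ${ev.type} before open`));
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error("timed out waiting for open"));
    }, timeoutMs);

    ws.addEventListener("open", onOpen);
    ws.addEventListener("error", onFail);
    ws.addEventListener("close", onFail);
  });
}
```

With this in place, `await waitForSocketOpen(ws)` would surface a closed or stalled audio socket as a rejected promise instead of a silent hang, which is one less thing to guess at when no server events arrive.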
Logs from browser
[EV user_start] {event_type:"user_start"}
[FRAME] 512
[FRAME] 512
[EV user_stop] {event_type:"user_stop"}
(…no further events…)
Network › WS › streaming.chat only shows the two JSON lines above.
Backend proxy confirms request sequence
/v1/streaming.new 200
/v1/streaming.start 200
/v1/streaming.start_listening 200
No other streaming endpoints are hit after that.
Questions for the HeyGen team / community
- Are there circumstances where the server would ignore valid audio frames yet still send `user_start`/`user_stop`?
- Is there an additional flag (account-level or per-session) required to enable STT?
- Does `startListening()` need to be re-issued after WS open, or should awaiting open + a single call suffice?
- Are there any known incompatibilities with proxying through `fetch("/api/heygen/proxy?path=...")` (body is unchanged JSON)?
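On the proxy question, one detail that is easy to check on our side is how the forwarded path is encoded into the query string. Below is a minimal sketch of building that URL with `URLSearchParams`; `buildProxyUrl` is a hypothetical helper, not our exact production code, and it assumes the backend reads the target from a single `path` query parameter.

```javascript
// Hypothetical helper: builds the proxy URL so the forwarded path survives
// as a single, correctly encoded query parameter.
function buildProxyUrl(path, base = "/api/heygen/proxy") {
  const params = new URLSearchParams({ path });
  return `${base}?${params.toString()}`;
}

// Slashes in the path are percent-encoded; dots and underscores are left as-is.
console.log(buildProxyUrl("/v1/streaming.start_listening"));
// prints "/api/heygen/proxy?path=%2Fv1%2Fstreaming.start_listening"
```

If the proxy were mangling the path or body, the three streaming endpoints would presumably not all return 200, so this is more of a sanity check than a likely cause.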
Full session id of a failing run (2025-05-12 14:11 UTC): dbcc07a5-2ea8-11f0-8041-aafedb6f6c4d
Happy to provide full HAR / WS capture if needed.
Thanks for any insights!
– Mindhelp Chat Team