HeyGen Streaming Avatar SDK: 401 Error & High Latency Issues

Questions:

Authentication: Why does the token fail despite being freshly generated?
Streaming: Is the ~9s latency expected?
Speech Processing: Shouldn't the avatar start speaking as soon as the first chunk arrives?
Implementation: Are we missing something in our streaming setup?

Any insights would be greatly appreciated!

Environment:

SDK: @heygen/streaming-avatar ^2.0.8
Framework: Next.js 14.1.0
Browser: Chrome (latest)

Issue 1: Authentication (401 Error)

Even after successfully generating a token via /v1/streaming.create_token, we immediately get a 401 error when using it with createStartAvatar().

Steps:

Token request succeeds → Response: 200 OK
Using token with createStartAvatar() → Response: 401 Unauthorized

→ Question: Why is the token rejected immediately after generation?
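
One way to narrow this down is to isolate the token step and validate the response before it reaches createStartAvatar(). The following is only a minimal sketch, not HeyGen's reference implementation. The base URL, the `x-api-key` header, and the `{ data: { token } }` response shape are assumptions taken from the public API docs and should be verified. `fetchSessionToken` and `FetchLike` are hypothetical names introduced here. The sketch requests a new token on every call, on the assumption that a token may only be valid for a single session.

```typescript
// Minimal shape of the parts of fetch() this sketch relies on, so it can be
// exercised with a fake. In the app, the global `fetch` can be passed in.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// ASSUMPTION: URL, header name, and response shape follow HeyGen's public
// docs; check them against the API version your account uses.
const TOKEN_URL = "https://api.heygen.com/v1/streaming.create_token";

// Hypothetical helper: fetches a new session token and fails loudly if the
// response is not what the caller expects.
async function fetchSessionToken(
  apiKey: string,
  fetchImpl: FetchLike
): Promise<string> {
  const res = await fetchImpl(TOKEN_URL, {
    method: "POST",
    headers: { "x-api-key": apiKey },
  });
  if (!res.ok) {
    throw new Error(`create_token failed with HTTP ${res.status}`);
  }
  const body = (await res.json()) as { data?: { token?: unknown } };
  const token = body.data?.token;
  if (typeof token !== "string" || token.length === 0) {
    throw new Error("create_token returned 200 but no usable token");
  }
  return token;
}
```

Because the API key is a secret, this would run in a Next.js route handler rather than in the browser. The returned string is what gets handed to the SDK (for example `new StreamingAvatar({ token })`) right before createStartAvatar(). If the 401 persists with a token obtained this way, the token fetch itself can be ruled out as the cause.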

Issue 2: High Latency in Streaming

Even when authentication succeeds, the avatar does not start speaking until about 9 seconds after the first text chunk arrives.

Observed Behavior:

The GPT response streams immediately: chunks arrive within tens of milliseconds (+26ms, +52ms in the logs below).
The avatar starts speaking only after ALL chunks have been processed (~9s delay).

→ Question: Is this expected? Shouldn't the avatar start speaking as soon as the first chunk arrives?

Code & Logs:

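// Streaming implementation
const reader = chatRes.body?.getReader();
if (!reader) throw new Error("Response has no readable body");
const decoder = new TextDecoder();
let chunkIndex = 0;
while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across chunk boundaries
  const chunk = decoder.decode(value, { stream: true });
  console.log(`[Stream] Chunk #${++chunkIndex} received at +${Date.now() - startTime}ms:`, chunk);
  // Hand the decoded text to the avatar pipeline before reading the next chunk
  await processStreamedText(chunk);
}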

// Logs:
[Stream] Chunk #1 received at +26ms: "First sentence"
[Stream] Chunk #2 received at +52ms: "Second sentence"
// More chunks arrive...
// Avatar starts speaking only at +9000ms
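
To make the "speak as soon as the first chunk arrives" idea concrete, here is a minimal sketch of sentence-level buffering for the read loop above. `SentenceBuffer` is a hypothetical helper, not part of the HeyGen SDK. It collects streamed text and releases each sentence as soon as it is complete, so text can be forwarded before the full response has arrived. It only helps if the delay comes from waiting for the complete text on the client. It does not change any latency on the avatar service side.

```typescript
// Accumulates streamed text and releases complete sentences as soon as they
// are available, so a caller can forward each one without waiting for the
// whole response.
class SentenceBuffer {
  private pending = "";

  // Add a chunk and return any sentences that are now complete.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // A sentence ends at ., ! or ? followed by whitespace. Requiring the
    // whitespace avoids cutting "3.14" in half when "3." and "14" arrive in
    // separate chunks; the cost is that a sentence ending exactly at a chunk
    // boundary is held until the next chunk (or flush) arrives.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(start, end).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call once the stream ends to release any trailing text.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

In the loop above, each chunk would go through `push()`, and every returned sentence would be passed to processStreamedText() right away. `flush()` would be called once after the loop ends. Whether the avatar then starts speaking earlier depends on what processStreamedText() does with each piece, which is not shown here.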