Issue with V2 Streaming API: Avatar Speaks Default Content Instead of /task Text

We are currently integrating your V2 Streaming API (using REST API calls and LiveKit for media) via a Node-RED backend orchestrator, and we're encountering an issue where the avatar does not speak the text provided via the /v1/streaming.task endpoint. Instead, it reverts to a default conversational script. For example, when the prompt says "Your name is Aida, and you are a friendly assistant. Start every conversation with a robot joke", the avatar June_HR_public does not start with a joke; it opens with "That was a funny joke. I am June with HeyGen...". So it is clearly treating our input as "user" input, when it should be the "assistant" text she is supposed to say.

Our Setup:

Session Creation: We use Node-RED to call /v1/streaming.create_token, then /v1/streaming.new (using version: "v2"), and finally /v1/streaming.start.
Frontend Connection: Our frontend JavaScript receives the LiveKit URL and token from Node-RED and successfully connects using the livekit-client library to receive the audio/video stream.
AI Integration: User input is sent from the frontend via WebSocket to Node-RED, processed by our external AI Agent, and the text response is sent back to the frontend via WebSocket.
Avatar Speech Trigger: Upon receiving the AI response text, the frontend makes a POST request to /v1/streaming.task with the following body structure:
{
  "session_id": "9752529f-2878-11f0-b5ac-1e7954b4f062",  // actual session ID used
  "text": "[The text response from our AI Agent]",
  "task_type": "talk"
}

Avatar Used: We are primarily testing with Avatar ID: June_HR_public
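For reference, the call sequence above can be sketched as plain Node.js (18+, built-in fetch). This is only an illustration of our flow, not official usage: the response field names (`data.token`, `data.session_id`) and the `X-Api-Key` header on the create_token call are assumptions from our own logs and setup.

```javascript
// Sketch of our Node-RED orchestration flow as plain Node.js (18+).
// ASSUMPTIONS: response field names (data.token, data.session_id) and the
// X-Api-Key header scheme come from our logs, not official documentation.
const API = "https://api.heygen.com/v1";

const jsonHeaders = (token) => ({
  "Authorization": `Bearer ${token}`,
  "Content-Type": "application/json",
});

// Body we POST to /v1/streaming.task once our AI agent replies.
function buildTaskBody(sessionId, text) {
  return { session_id: sessionId, text, task_type: "talk" };
}

async function startSession(apiKey, avatarId) {
  // 1) obtain a short-lived session token
  const tok = await (await fetch(`${API}/streaming.create_token`, {
    method: "POST",
    headers: { "X-Api-Key": apiKey, "Content-Type": "application/json" },
  })).json();

  // 2) create the v2 session
  const sess = await (await fetch(`${API}/streaming.new`, {
    method: "POST",
    headers: jsonHeaders(tok.data.token),
    body: JSON.stringify({ version: "v2", avatar_id: avatarId }),
  })).json();

  // 3) start the session
  await fetch(`${API}/streaming.start`, {
    method: "POST",
    headers: jsonHeaders(tok.data.token),
    body: JSON.stringify({ session_id: sess.data.session_id }),
  });

  return { token: tok.data.token, sessionId: sess.data.session_id };
}
```

The LiveKit URL and token returned by streaming.new are then handed to our frontend, which connects with livekit-client as described above.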

The Problem:

When the /v1/streaming.task call is made:

We receive an HTTP 200 OK response from the API.
The response body typically contains "code": 100 (sometimes "code": "SUCCESS") and "message": "success". We have modified our frontend to treat code: 100 as a non-error for now, based on the HTTP 200 status.
Crucially, the avatar does not speak the text we provided in the /task request.
Instead, the avatar speaks a default script (in our tests with June_HR_public, it seems to be an interview-related script about HeyGen interviews).
Looking at the LiveKit data channel messages received by the frontend, we see the following sequence shortly after the /task call is initiated:
A user_talking_message containing the text we sent via /task.
An immediate avatar_talking_message sequence containing the words from the default script.
(Example task ID from our logs: 999150c4-2878-11f0-8e2d-ce2261981da6)
This strongly suggests that the text sent via /task is being misinterpreted by the HeyGen session as user input, which then triggers the avatar's internal/default conversational AI to respond, overriding our intended command.
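To make the misinterpretation easier to see, this is roughly how our frontend logs the data-channel traffic. Note the assumptions: the string "dataReceived" corresponds to RoomEvent.DataReceived in livekit-client, and the "type" field name is what we observe in our own payloads, not a documented schema.

```javascript
// Classify HeyGen data-channel payloads so the user/avatar sequence is obvious.
// ASSUMPTION: the "type" field names below are what we see in our logs.
function classifyMessage(raw) {
  const msg = JSON.parse(new TextDecoder().decode(raw));
  if (msg.type === "user_talking_message") return { role: "user", msg };
  if (msg.type === "avatar_talking_message") return { role: "avatar", msg };
  return { role: "other", msg };
}

// room is a connected livekit-client Room; "dataReceived" is the string
// value of RoomEvent.DataReceived.
function attachDataLogger(room) {
  room.on("dataReceived", (payload) => {
    const { role, msg } = classifyMessage(payload);
    console.log(`[${role}]`, JSON.stringify(msg));
  });
}
```

With this logger attached, every /task call is immediately followed by a `[user]` line echoing our text and then `[avatar]` lines containing the default script.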

What We've Tried:

Ensuring the Authorization: Bearer [token] and Content-Type headers are correct for all API calls.
Verifying the session_id and text in the /task request body are correct.
Experimentally adding parameters like stt_language: "" and enable_vad: false to the /streaming.new request body (this did not change the behavior).
Checking the HeyGen portal settings for the avatar June_HR_public, but we could not find an obvious setting to disable the default AI chat behavior for API-driven sessions.
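For completeness, the experimental /streaming.new body from the third bullet looked like the following. To be clear, stt_language and enable_vad were guesses on our part, not parameters we found documented, and neither changed the behavior.

```javascript
// Our experimental session-creation body.
// ASSUMPTION: stt_language / enable_vad are undocumented guesses; neither
// had any observable effect on the default conversational behavior.
function buildNewSessionBody(avatarId) {
  return {
    version: "v2",
    avatar_id: avatarId,   // e.g. "June_HR_public"
    stt_language: "",      // attempt to disable internal STT (no effect)
    enable_vad: false,     // attempt to disable voice activity detection (no effect)
  };
}
```
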

Our Goal:

We need the avatar to function purely as an API-controlled entity in the V2 streaming session. It should only speak the text provided explicitly via the /v1/streaming.task API call and should not engage any internal STT, LLM, or default conversational logic.

Our Questions:

What does code: 100 signify in the response body of a /v1/streaming.task request when the HTTP status is 200?

How can we definitively disable all internal conversational AI / default chatbot behavior for an avatar session created via the /v1/streaming.new REST API endpoint? Are there specific parameters we need to include in the request body?

Are there specific settings for the avatar itself within the HeyGen platform that need to be adjusted to prevent this default behavior during API-driven streaming sessions?

We appreciate any guidance you can provide to help us achieve direct API control over the avatar's speech in our V2 streaming session.