(Alpha) Interactive Avatar Realtime API
This API is currently in alpha and still under development, so breaking changes may occur. We cannot provide support at this time.
HeyGen’s Interactive Avatar Realtime API lets you build low-latency, visual conversation experiences for your customers. You bring your own speech-to-speech stack (we’re starting with support for Pipecat), and we provide a human-like visual agent and video delivery to the end user.
Getting Started
- NextJS Demo (Frontend): https://github.com/HeyGen-Official/InteractiveAvatarNextJSDemo/tree/realtime-alpha-demo
- Pipecat Demo (Backend): https://github.com/HeyGen-Official/pipecat-realtime-demo
Architecture Diagram
Sequence Diagram
Endpoint
The Interactive Avatar Realtime API is a stateful, server-side, event-driven WebSocket API. The WebSocket connection details are provided at session initialization.
- Endpoint format: wss://webrtc-signaling.heygen.io/v2-alpha/interactive-avatar/session/<session_id>
- The WebSocket server address is assigned at the start of each new session and returned in the realtime_endpoint field of the /v1/streaming.new API response.
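Because the address is issued per session, a client should read it from the session-creation response rather than hard-code it. The minimal Python sketch below assumes the parsed JSON response carries realtime_endpoint either inside a "data" object or at the top level; that envelope and the helper name realtime_endpoint_from_response are assumptions for illustration.

```python
from urllib.parse import urlparse


def realtime_endpoint_from_response(resp: dict) -> str:
    """Return the WebSocket URL from a parsed /v1/streaming.new response.

    Assumption: realtime_endpoint sits under a "data" object or at the top
    level; adjust the lookup to the actual response shape.
    """
    data = resp.get("data")
    container = data if isinstance(data, dict) else resp
    endpoint = container.get("realtime_endpoint")
    if not endpoint:
        raise KeyError("realtime_endpoint not found in response")
    # The documented server address uses the secure WebSocket scheme.
    if urlparse(endpoint).scheme != "wss":
        raise ValueError(f"expected a wss:// URL, got {endpoint!r}")
    return endpoint
```

The returned URL can then be handed to whichever WebSocket client your stack uses.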
Event-Based Communication
The WebSocket API follows an event-driven model. Each message is formatted as JSON with the following base structure:
{
  "type": "<event_type>",
  "event_id": "<event_id>"
}
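A small helper keeps every outgoing message in this base shape. This is a sketch under assumptions: the helper name make_event is hypothetical, and generating event_id on the client as a random UUID is a choice made here, not a documented requirement.

```python
import json
import uuid
from typing import Any, Optional


def make_event(event_type: str, event_id: Optional[str] = None, **fields: Any) -> str:
    """Serialize a client event using the base {type, event_id} structure.

    Assumption: the client chooses event_id; a random UUID keeps it unique
    unless the caller supplies one explicitly.
    """
    event = {"type": event_type, "event_id": event_id or str(uuid.uuid4())}
    # Event-specific fields (for example "audio") are added alongside the base keys.
    event.update(fields)
    return json.dumps(event)
```

For example, make_event("agent.interrupt") yields a complete interrupt message, and make_event("agent.audio_buffer_append", audio=b64_chunk) adds the payload field.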
Client Events (Sent to Server)
agent.audio_buffer_append
Appends audio data to the avatar's buffer.
{
  "type": "agent.audio_buffer_append",
  "event_id": "<event_id>",
  "audio": "<base64_encoded_PCM_16bit_24khz_audio>"
}
Note: Limit audio segments to 1-4 seconds for optimal performance.
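To respect the 1-4 second guideline, long utterances can be split before sending. The sketch below assumes mono audio (the channel count is not stated above), so one second of 16-bit PCM at 24 kHz is 24,000 × 2 = 48,000 bytes. The function name append_events and the 2-second default are hypothetical.

```python
import base64
import json
import uuid
from typing import List

SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1  # assumption: mono audio


def append_events(pcm: bytes, max_seconds: float = 2.0) -> List[str]:
    """Split raw PCM into segments of at most max_seconds and wrap each one
    in an agent.audio_buffer_append event."""
    if not 0 < max_seconds <= 4:
        raise ValueError("max_seconds should be in (0, 4]")
    frame_bytes = BYTES_PER_SAMPLE * CHANNELS
    if len(pcm) % frame_bytes:
        raise ValueError("PCM length is not a whole number of samples")
    # Whole samples per segment, converted to bytes (48,000 bytes per mono second).
    chunk_bytes = int(SAMPLE_RATE_HZ * max_seconds) * frame_bytes
    events = []
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        events.append(json.dumps({
            "type": "agent.audio_buffer_append",
            "event_id": str(uuid.uuid4()),
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    return events
```

The final segment may be shorter than one second; merging a short tail into the previous segment is one option if that matters for your stack.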
agent.audio_buffer_commit
Commits buffered audio for immediate processing.
{
  "type": "agent.audio_buffer_commit",
  "event_id": "<event_id>",
  "audio": "<base64_encoded_PCM_16bit_24khz_audio>"
}
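A complete utterance is then a run of append events followed by a single commit. In this sketch (hypothetical function utterance_events, mono audio assumed), the commit carries an empty "audio" string only to mirror the example above; what the server does with audio on a commit is not described here, so treat that field as an assumption.

```python
import base64
import json
import uuid
from typing import List

BYTES_PER_SECOND = 24_000 * 2  # 16-bit PCM at 24 kHz; mono is an assumption


def _event(event_type: str, **fields: str) -> str:
    return json.dumps({"type": event_type, "event_id": str(uuid.uuid4()), **fields})


def utterance_events(pcm: bytes, seconds_per_chunk: int = 2) -> List[str]:
    """Return the messages for one avatar utterance: appends, then a commit."""
    step = BYTES_PER_SECOND * seconds_per_chunk
    messages = [
        _event("agent.audio_buffer_append",
               audio=base64.b64encode(pcm[i:i + step]).decode("ascii"))
        for i in range(0, len(pcm), step)
    ]
    # Assumption: an empty "audio" value on the commit, mirroring the documented shape.
    messages.append(_event("agent.audio_buffer_commit", audio=""))
    return messages
```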
agent.audio_buffer_clear
Clears all buffered audio data.
{
  "type": "agent.audio_buffer_clear",
  "event_id": "<event_id>"
}
agent.interrupt
Stops the avatar’s current task and resets it to an idle animation.
{
  "type": "agent.interrupt",
  "event_id": "<event_id>"
}
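These events combine naturally for barge-in, when the end user starts speaking over the avatar. The ordering below is an assumption rather than a documented sequence: clear any buffered audio, interrupt the current task, then start the listening animation, which is valid because agent.interrupt leaves the avatar idle. The function barge_in_events is hypothetical.

```python
import json
import uuid
from typing import List


def barge_in_events(resume_listening: bool = True) -> List[str]:
    """Messages to send when the end user talks over the avatar (hypothetical order)."""
    # Discard stale buffered audio first, then stop the avatar's current task.
    types = ["agent.audio_buffer_clear", "agent.interrupt"]
    if resume_listening:
        # Allowed here because agent.interrupt resets the avatar to idle.
        types.append("agent.start_listening")
    return [json.dumps({"type": t, "event_id": str(uuid.uuid4())}) for t in types]
```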
agent.start_listening
Triggers the avatar's listening animation (only if currently idle).
{
  "type": "agent.start_listening",
  "event_id": "<event_id>"
}
agent.stop_listening
Stops the listening animation (only if currently in listening state).
{
  "type": "agent.stop_listening",
  "event_id": "<event_id>"
}
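Because agent.start_listening and agent.stop_listening only take effect in specific states, a client can track the avatar state locally and skip events the avatar would not act on. This is a hypothetical helper: it assumes stop_listening returns the avatar to idle and that the caller marks when the avatar starts speaking, since server-to-client events are not covered in this section.

```python
import json
import uuid
from enum import Enum
from typing import Optional


class AvatarState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    SPEAKING = "speaking"


class ListeningController:
    """Hypothetical client-side mirror of the avatar state."""

    def __init__(self) -> None:
        self.state = AvatarState.IDLE

    @staticmethod
    def _event(event_type: str) -> str:
        return json.dumps({"type": event_type, "event_id": str(uuid.uuid4())})

    def mark_speaking(self) -> None:
        # Assumption: the caller knows when the avatar starts speaking.
        self.state = AvatarState.SPEAKING

    def start_listening(self) -> Optional[str]:
        # Only meaningful when the avatar is idle.
        if self.state is not AvatarState.IDLE:
            return None
        self.state = AvatarState.LISTENING
        return self._event("agent.start_listening")

    def stop_listening(self) -> Optional[str]:
        # Only meaningful when the avatar is listening; assumed to return it to idle.
        if self.state is not AvatarState.LISTENING:
            return None
        self.state = AvatarState.IDLE
        return self._event("agent.stop_listening")

    def interrupt(self) -> str:
        # agent.interrupt resets the avatar to its idle animation.
        self.state = AvatarState.IDLE
        return self._event("agent.interrupt")
```

A local mirror like this can drift from the server's real state, so it works best as a filter rather than a source of truth.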
Feedback & Improvements
This API is under continuous development for improved integration and performance. If you have any feedback, please share it with us!