WSS Audio to Video API (Beta)

Bring your own audio! The Audio to Video WebSocket API is a stateful, event-driven, server-to-server endpoint designed to receive audio and stream realtime AI avatar video directly to a WebRTC provider. It is designed to receive faster-than-realtime speech from sources like:

  • conversational frameworks: LiveKit Agents, Pipecat, Agora, etc.
  • text-to-speech providers: ElevenLabs, Cartesia, etc.
  • speech-to-speech providers: OpenAI Realtime, Gemini Live, etc.

It is well‑suited for:

  • Running your own backend voice orchestration stack (LiveKit Agents, Pipecat, OpenAI Realtime, etc.) and using the resulting audio to drive the avatar
  • Running customized workflows where you need complete control over the speech stack and timing logic

It is not designed for:

  • Receiving audio from end user devices like browsers, apps, etc.

Note: This endpoint does NOT return video over WebSocket by design. We integrate directly with WebRTC networks like LiveKit for the lowest latency and most reliable experience.

Beta

As a beta endpoint, our goal is to provide an early experience of our product so we can receive feedback and iterate quickly. While we will do our best to maintain backwards compatibility and data contract stability, it is not guaranteed.

Reference Flow

The following diagram illustrates a common architecture and flow with the following components:

  • Client - typically a browser or mobile app running your frontend code with a WebRTC client SDK (e.g. LiveKit SDK, Pipecat RTVI, Daily SDK, Agora SDK)
  • Backend - your API backend, usually with a WebRTC server SDK (e.g. LiveKit SDK, Daily SDK, Agora SDK)
  • Agent Worker - your backend worker orchestrating the speech-to-speech flow (e.g. LiveKit Agents, Pipecat, TEN Framework)
  • HeyGen API - the HeyGen API and service
  • WebRTC - a WebRTC video provider like LiveKit, Daily, Agora, etc.
  • ASR - an automatic speech recognition provider like Deepgram, Gladia, etc.
  • LLM - a large language model provider like OpenAI, Gemini, etc.
  • TTS - a text-to-speech provider like ElevenLabs, Cartesia, etc.

Endpoint

The WebSocket address can be found in the realtime_endpoint field in the response payload of the /v1/streaming.new API call.

wss://webrtc-signaling.heygen.io/v2-alpha/interactive-avatar/session/<session_id>
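A minimal sketch of pulling the signaling URL out of the /v1/streaming.new response before connecting. The `data` wrapper around the payload is an assumption about the response shape; adapt the lookup path to the actual body you receive:

```python
import json

def realtime_endpoint_from(body: str) -> str:
    """Extract the WSS signaling URL from a /v1/streaming.new response body.

    Assumes the field sits under a top-level "data" object -- this is a
    guess at the payload shape, not a documented contract.
    """
    payload = json.loads(body)
    return payload["data"]["realtime_endpoint"]

# You would then connect with any WebSocket client library, e.g.:
#   async with websockets.connect(realtime_endpoint_from(body)) as ws: ...
```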

Client Actions

agent.speak

Stream chunks of avatar audio. Audio must be 16-bit, 24 kHz PCM bytes encoded as Base64.

{
	"type": "agent.speak",
	"event_id": "<event_id>",
	"audio": "<base64-encoded 16-bit 24 kHz PCM audio chunk>"
}

agent.speak_end

Signal to the avatar the end of the avatar audio. A final audio chunk can be added; otherwise leave the audio field empty.

{
	"type": "agent.speak_end",
	"event_id": "<event_id>",
	"audio": "<base64-encoded 16-bit 24 kHz PCM audio chunk>"
}
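Taken together, a speak turn is a run of agent.speak chunks closed by one agent.speak_end. A sketch of turning a raw PCM buffer into that message sequence; the 100 ms chunk size and the event-ID scheme are illustrative choices, not prescribed by the API:

```python
import base64
import json

SAMPLE_RATE = 24_000      # 24 kHz, as required by the API
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_MS = 100            # illustrative chunk duration
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000

def speak_messages(pcm: bytes, event_id: str):
    """Yield agent.speak frames for each chunk, then a closing agent.speak_end."""
    for off in range(0, len(pcm), CHUNK_BYTES):
        yield json.dumps({
            "type": "agent.speak",
            "event_id": event_id,
            "audio": base64.b64encode(pcm[off:off + CHUNK_BYTES]).decode("ascii"),
        })
    # All audio already sent, so close the turn with an empty audio field.
    yield json.dumps({"type": "agent.speak_end", "event_id": event_id, "audio": ""})
```

Each frame is a ready-to-send text message, e.g. `for msg in speak_messages(pcm, "turn-1"): await ws.send(msg)`.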

agent.audio_buffer_clear

Discard any audio you’ve buffered.

{
  "type": "agent.audio_buffer_clear",
  "event_id": "<event_id>"
}

agent.interrupt

Stop any current and queued avatar tasks. This is usually followed by an agent.speak.

{
  "type": "agent.interrupt",
  "event_id": "<event_id>"
}
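A typical barge-in interrupts the avatar and clears any buffered audio before starting a fresh speak turn. A sketch of that message sequence, with an illustrative event-ID naming scheme:

```python
import json

def barge_in_messages(turn_id: str) -> list:
    """Frames to cut off current avatar speech before streaming a new reply.

    agent.interrupt stops current and queued tasks; agent.audio_buffer_clear
    drops any audio already buffered for the old turn.
    """
    return [
        json.dumps({"type": "agent.interrupt", "event_id": f"{turn_id}-interrupt"}),
        json.dumps({"type": "agent.audio_buffer_clear", "event_id": f"{turn_id}-clear"}),
    ]
```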

agent.start_listening

Triggers the avatar's listening animation. This will only succeed if the avatar is currently idle.

{
  "type": "agent.start_listening",
  "event_id": "<event_id>"
}

agent.stop_listening

Stops the listening animation (only if the avatar is currently listening).

{
  "type": "agent.stop_listening",
  "event_id": "<event_id>"
}

session.keep_alive

Resets the activity idle timeout set in the New Session API. Use this to keep the session alive during periods of inactivity that exceed the activity idle timeout.

{
	"type": "session.keep_alive",
	"event_id": "<event_id>"
}
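One way to keep a mostly-idle session open is a background task that sends session.keep_alive on a timer. A sketch, assuming an async send callable such as `ws.send` and an interval kept safely below your configured idle timeout:

```python
import asyncio
import itertools
import json

def keep_alive_message(n: int) -> str:
    """Build one session.keep_alive frame (event-ID scheme is illustrative)."""
    return json.dumps({"type": "session.keep_alive", "event_id": f"keep-alive-{n}"})

async def keep_alive_loop(send, interval_s: float = 30.0) -> None:
    """Send a keep-alive every interval_s seconds until the task is cancelled."""
    for n in itertools.count():
        await asyncio.sleep(interval_s)
        await send(keep_alive_message(n))
```

Run it alongside your main loop, e.g. `task = asyncio.create_task(keep_alive_loop(ws.send))`, and cancel the task when the session ends.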

Server Events

session.state_updated

Server reports the session state changed.

{
  "type": "session.state_updated",
  "event_id": "<uuid>",
  "state": "initialized | connecting | connected | disconnected"
}
  • initialized - session is starting up
  • connecting - session is waiting for the participant to join
  • connected - session and participant are ready
  • disconnected - session has ended

agent.audio_buffer_appended

Audio chunk accepted and buffered for the current task.

{
  "type": "agent.audio_buffer_appended",
  "event_id": "<uuid>",
  "task": { "id": "<task_id>" }
}

agent.audio_buffer_committed

Buffered audio finalized and queued for playback for this task.

{
  "type": "agent.audio_buffer_committed",
  "event_id": "<uuid>",
  "task": { "id": "<task_id>" }
}

agent.audio_buffer_cleared

Buffered audio discarded/reset; does not trigger playback.

{
  "type": "agent.audio_buffer_cleared",
  "event_id": "<uuid>"
}

agent.idle_started

Avatar entered the idle state.

{
  "type": "agent.idle_started",
  "event_id": "<uuid>"
}

agent.idle_ended

Avatar left the idle state.

{
  "type": "agent.idle_ended",
  "event_id": "<uuid>"
}

agent.speak_started

Avatar began speaking the given task.

{
  "type": "agent.speak_started",
  "event_id": "<uuid>",
  "task": { "id": "<task_id>" }
}

agent.speak_ended

Avatar finished speaking the given task.

{
  "type": "agent.speak_ended",
  "event_id": "<uuid>",
  "task": { "id": "<task_id>" }
}

agent.speak_interrupted

Avatar speech stopped early due to an interrupt.

{
  "type": "agent.speak_interrupted",
  "event_id": "<uuid>",
  "task": { "id": "<task_id>" }
}

error

A request failed; includes error type/message and the client event_id it refers to.

{
  "type": "error",
  "event_id": "<uuid>",
  "error": {
    "type": "invalid_request_error | server_error",
    "message": "<string>",
    "event_id": "<client_event_id>"
  }
}

warning

Non-fatal notice (e.g., deprecation); includes message and related client event_id.

{
  "type": "warning",
  "event_id": "<uuid>",
  "warning": {
    "type": "deprecation_warning",
    "message": "<string>",
    "event_id": "<client_event_id>"
  }
}
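Since every server frame carries a "type" field, a client can route incoming events through a simple handler table. A minimal sketch that ignores unknown types, so newly added server events don't break existing clients:

```python
import json

def dispatch(frame: str, handlers: dict) -> None:
    """Route one server frame to the handler registered for its "type" field.

    Frames with no registered handler are silently ignored.
    """
    event = json.loads(frame)
    handler = handlers.get(event.get("type"))
    if handler is not None:
        handler(event)

# Example handler table (handlers are illustrative):
# handlers = {
#     "agent.speak_ended": lambda e: print("done:", e["task"]["id"]),
#     "error": lambda e: print("error:", e["error"]["message"]),
# }
```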


Feedback & Improvements

We are continuously developing this API to improve integration and performance. If you have any feedback, please share it with us!