Summary

HeyGen’s Interactive Avatar Realtime API enables you to build a low-latency, visual conversation experience for your customers. You bring your speech-to-speech stack (we’re starting with support for Pipecat), and we provide a human-like visual agent and end-user video delivery.

Architecture Diagram

Sequence Diagram

Getting Started

Endpoint

The Interactive Avatar Realtime API is a stateful, server-side, event-based API that communicates over a WebSocket. The WebSocket connection requires the following parameters:

  • URL: wss://{heygen-server}:{port}/v1-alpha/realtime/{session_id}

The server address and port are assigned at the start of each new session and returned by the /v1/streaming.new API call in a field called realtime_endpoint.
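
As a minimal sketch, the URL template above could be filled in as follows. The helper name build_realtime_url and the example host, port, and session ID are hypothetical placeholders; real values are assigned per session by the /v1/streaming.new call.

```python
from urllib.parse import quote


def build_realtime_url(server: str, port: int, session_id: str) -> str:
    """Fill in the wss://{heygen-server}:{port}/v1-alpha/realtime/{session_id} template.

    Hypothetical helper for illustration; not part of any HeyGen SDK.
    """
    if not server or not session_id:
        raise ValueError("server and session_id must be non-empty")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # Percent-encode the session ID so it is safe as a single path segment.
    return f"wss://{server}:{port}/v1-alpha/realtime/{quote(session_id, safe='')}"


# Placeholder values only; real ones are assigned at session start.
print(build_realtime_url("example-host", 8443, "session-123"))
```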

Event Base Body:

All events are JSON objects and share these base fields:

{
	"type": "<event_type>",
	"event_id": "<event_id>"
}
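
A small helper can keep the base fields consistent across events. The sketch below is illustrative only: the name make_event is hypothetical, and using a random UUID for event_id is an assumption, since the ID format is not specified here.

```python
import json
import uuid


def make_event(event_type: str, **fields) -> str:
    """Serialize an event with the base fields plus any event-specific ones.

    Hypothetical helper; a random UUID for event_id is an assumption.
    """
    event = {"type": event_type, "event_id": str(uuid.uuid4())}
    # Event-specific fields (for example "audio") are merged after the base fields.
    event.update(fields)
    return json.dumps(event)


print(make_event("agent.interrupt"))
```

Each call produces a fresh event_id, so the same helper can be reused for every client event.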

Client Events (events you send):

  • agent.audio_buffer_append
{
	"type": "agent.audio_buffer_append",
	"event_id": "<event_id>",
	"audio": "{base64-encoded PCM 16-bit 24 kHz audio segment}"
}
# To optimize inference speed, we suggest limiting segments to 1-4 seconds.
  • agent.audio_buffer_commit
{
	"type": "agent.audio_buffer_commit",
	"event_id": "<event_id>",
	"audio": "{base64-encoded PCM 16-bit 24 kHz audio segment}"
}
  • agent.audio_buffer_clear
{
	"type": "agent.audio_buffer_clear",
	"event_id": "<event_id>"
}
  • agent.interrupt - stops the agent’s current task and returns it to the silent animation
{
	"type": "agent.interrupt",
	"event_id": "<event_id>"
}
  • agent.start_listening - starts the listening animation; only possible if the agent is already silent
{
	"type": "agent.start_listening",
	"event_id": "<event_id>"
}
  • agent.stop_listening - ends the listening animation; only possible if the agent is in the listening animation
{
	"type": "agent.stop_listening",
	"event_id": "<event_id>"
}
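
The audio events above carry base64-encoded PCM, and segments of 1-4 seconds are suggested. As a rough sketch, raw PCM could be split into agent.audio_buffer_append events like this. The function name audio_append_events is hypothetical, mono audio is an assumption (the channel count is not stated here), and random UUIDs for event_id are likewise assumed.

```python
import base64
import json
import uuid
from typing import List

SAMPLE_RATE_HZ = 24_000   # 24 kHz, per the event description
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # assumption: mono; not stated in the source
MAX_SEGMENT_SECONDS = 4   # upper end of the suggested 1-4 second range


def audio_append_events(pcm: bytes, segment_seconds: float = 1.0) -> List[str]:
    """Split raw PCM into JSON agent.audio_buffer_append events."""
    if not 0 < segment_seconds <= MAX_SEGMENT_SECONDS:
        raise ValueError("segment_seconds must be in (0, 4]")
    frame_bytes = BYTES_PER_SAMPLE * CHANNELS
    segment_bytes = int(segment_seconds * SAMPLE_RATE_HZ) * frame_bytes
    # Never let a segment shrink below one whole sample frame.
    segment_bytes = max(segment_bytes, frame_bytes)
    events = []
    for start in range(0, len(pcm), segment_bytes):
        chunk = pcm[start:start + segment_bytes]
        events.append(json.dumps({
            "type": "agent.audio_buffer_append",
            "event_id": str(uuid.uuid4()),  # assumed ID format
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    return events


# 2.5 seconds of silence, split into 1-second segments.
events = audio_append_events(b"\x00\x00" * 60_000)
print(len(events))
```

Each returned string is one complete event, ready to be sent as a single WebSocket text message.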

Terms

This API is considered alpha and experimental. We are previewing it with a select few customers to gather feedback and guide our product. By using this API, you agree to the terms below:

  • Alpha Release: Experimental codebase in active development; use at your own risk.
  • Limited Functionality: Only basic features available; not representative of the final product.
  • Experimental Features: Features may change or be removed in future versions.
  • Bugs and Issues: Expect bugs and glitches; users are encouraged to report them.
  • Incomplete Documentation: Documentation may be missing or incomplete; rely on self-troubleshooting.
  • Limited Support: Support is not guaranteed and is provided on a best-effort basis by the development team.
  • Data Risk: Potential for data loss or corruption; avoid using critical/sensitive data.
  • Frequent Updates: While we will make a best effort to communicate updates and changes ahead of time, additions, changes, and removals of features can occur frequently and without notice.
  • Community Feedback: User feedback and collaboration are vital for improvement.
  • Testing Purpose: Intended for testing and evaluation, **not for production use.**
  • No Guarantees: No warranties; users assume all risks.