Question about customizing Streaming Avatar flow with HeyGen SDK
Hey,
I am using the HeyGen SDK to build a React application as part of a
larger product designed to assist businesses in streamlining their
operations. Our application serves as a live, interactive tool that
provides real-time information about a venue (e.g. a restaurant,
clinic, or office), helping users navigate services, understand
offerings, and make quicker decisions — all through an intuitive
conversational interface powered by an AI avatar.
This project is strictly focused on delivering contextual, localized
information and improving the user experience through natural dialogue.
We are not trying to misuse HeyGen's LLM functionality or push it beyond
its intended use; our goal is to create a seamless business-assistant
experience within the expected boundaries of the SDK.
I would like to ask whether the following scenario can be implemented
with the Streaming Avatar SDK.
When a session starts and a user asks a question, I need to:

1. Extract the user's spoken prompt as text.
2. Send it to a third-party API for contextual analysis.
3. Based on the API's response, either:
   a) ask the avatar to speak a specific sentence, or
   b) continue with the regular HeyGen flow (i.e. let HeyGen analyze the
      prompt and generate a response via its internal LLM).
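To make the steps above concrete, here is a rough sketch of the routing I have in mind. Everything in it is illustrative rather than real HeyGen API: the `Avatar` interface stands in for the actual StreamingAvatar client, and `speakVerbatim` / `forwardToLLM` are placeholders for whichever SDK calls would correspond to steps (a) and (b).

```typescript
// Placeholder for the real StreamingAvatar client (names are mine, not HeyGen's).
interface Avatar {
  speakVerbatim(text: string): Promise<void>;  // case (a): say exactly this sentence
  forwardToLLM(prompt: string): Promise<void>; // case (b): regular HeyGen flow
}

// Hypothetical shape of our third-party contextual-analysis response.
interface Analysis {
  handled: boolean; // true -> our API supplies the exact sentence to speak
  reply?: string;   // the sentence, when handled
}

// Route one transcribed prompt: analyze it, then either speak our own
// sentence or fall through to the normal HeyGen flow.
async function routeUserPrompt(
  avatar: Avatar,
  analyze: (prompt: string) => Promise<Analysis>,
  prompt: string,
): Promise<"verbatim" | "llm"> {
  const analysis = await analyze(prompt);
  if (analysis.handled && analysis.reply) {
    await avatar.speakVerbatim(analysis.reply); // step 3a
    return "verbatim";
  }
  await avatar.forwardToLLM(prompt); // step 3b
  return "llm";
}
```

The open question for me is which real SDK calls (if any) can play the roles of `speakVerbatim` and `forwardToLLM`.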
My main questions are:

1. Is there a way to tap into the intermediate steps of the Streaming
   Avatar flow, for example by intercepting or modifying the transcribed
   prompt before it reaches the LLM?
2. Is it possible to control which steps are executed under the hood,
   such as pausing transcription temporarily or deferring the LLM
   response until a condition is met?
I would greatly appreciate any guidance or recommendations on how to
achieve this using your SDK.
Thank you in advance for your time and support!