HeyGen Realtime API (Alpha) + OpenAI Realtime
Hey there!
I'm a CS student doing research on realtime conversational avatars, and I'm currently experimenting with integrating OpenAI Realtime with HeyGen's recently released Realtime API (Alpha). The integration works fine so far: state management, voice detection (which relies heavily on OpenAI Realtime), avatar startup, and lip-syncing small audio chunks all work. However, I'm having trouble understanding exactly how the API triggers lip-sync and the avatar's talking state.
Specifically, I'm wondering what the best practices are for appending and committing chunks of audio. The documentation states "Limit audio segments to 1-4 seconds for optimal performance.", but does that apply only to commits? It seems like lip-sync only starts after a commit, so if the (speech) answer from OpenAI Realtime takes longer than that, waiting for the whole response before committing adds a lot of latency to the conversation.
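For context, here is roughly how I'm pushing audio to HeyGen at the moment. This is only a minimal sketch: the event names, the append/commit payload shape, and the 24 kHz 16-bit mono PCM format are my assumptions, so please correct me if the Alpha API expects something different.

```python
import base64
import json

# Assumptions: 24 kHz, 16-bit mono PCM (what I receive from OpenAI Realtime)
# and placeholder HeyGen event names -- the Alpha docs may use different ones.
SAMPLE_RATE = 24_000
BYTES_PER_SECOND = SAMPLE_RATE * 2        # 16-bit mono
SEGMENT_BYTES = 2 * BYTES_PER_SECOND      # ~2 s per segment, within the 1-4 s guidance

async def send_segment(heygen_ws, pcm: bytes) -> None:
    """Append one segment to HeyGen and commit it so lip-sync can start."""
    await heygen_ws.send(json.dumps({
        "type": "agent.audio_buffer_append",   # placeholder event name
        "audio": base64.b64encode(pcm).decode("ascii"),
    }))
    await heygen_ws.send(json.dumps({
        "type": "agent.audio_buffer_commit",   # placeholder event name
    }))
```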
Do you have any experience with committing earlier, i.e. buffering the audio answer from OpenAI only until the avatar enters a talking state, and then continuing to commit while the avatar is already talking and lip-sync is running? Or is there a way to stream the audio from OpenAI directly to HeyGen's Realtime API and buffer the output until lip-sync has been processed?
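What I have in mind looks roughly like the sketch below (reusing `send_segment` and `SEGMENT_BYTES` from the snippet above; the OpenAI event names are from the beta Realtime API and may differ in newer versions): buffer the `response.audio.delta` payloads from OpenAI and commit a segment to HeyGen as soon as roughly two seconds have accumulated, instead of waiting for `response.audio.done`.

```python
import base64
import json

async def pipe_openai_to_heygen(openai_ws, heygen_ws) -> None:
    """Forward OpenAI Realtime audio to HeyGen in small segments instead of
    waiting for the full spoken answer, so lip-sync can start earlier."""
    buffer = bytearray()
    async for message in openai_ws:
        event = json.loads(message)
        if event.get("type") == "response.audio.delta":
            buffer.extend(base64.b64decode(event["delta"]))
            # Commit as soon as a segment's worth of audio has accumulated;
            # later segments keep being committed while the avatar is
            # already in its talking state.
            while len(buffer) >= SEGMENT_BYTES:
                await send_segment(heygen_ws, bytes(buffer[:SEGMENT_BYTES]))
                del buffer[:SEGMENT_BYTES]
        elif event.get("type") == "response.audio.done":
            if buffer:                        # flush the remainder
                await send_segment(heygen_ws, bytes(buffer))
                buffer.clear()
```

The idea is that the first commit only has to wait for about two seconds of audio rather than the whole answer, which should cut the time until lip-sync starts considerably. I'm just not sure whether the Alpha API is meant to accept commits that arrive while the avatar is still playing the previous segment.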
I know this is still in Alpha, but I would really appreciate some insights.
Regards, Julian