
Latency Optimization with HeyGen SDK + OpenAI Assistant API

We're experiencing higher-than-desired end-to-end latency in our implementation, which combines HeyGen's Streaming Avatar SDK with OpenAI's Assistant API. Our setup:

Tech Stack:

  • HeyGen Streaming Avatar SDK (Low quality setting)
  • OpenAI Assistant API (for knowledge base retrieval)
  • Next.js API routes

Current Flow:

  1. Receive user input
  2. Process via OpenAI Assistant API (2-5s)
  3. Stream response through HeyGen avatar

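The sequential flow above can be sketched as follows. The function names are hypothetical stand-ins for the real OpenAI and HeyGen calls, with `setTimeout` delays standing in for their observed latencies, to make the cost of serializing the two stages visible:

```typescript
// Hypothetical stand-ins for the real API calls; each resolves after a
// stand-in delay so the end-to-end cost of the sequential flow is measurable.
const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runAssistant(input: string): Promise<string> {
  await wait(300); // stands in for createAndPoll() on the Assistant API (2-5s in practice)
  return `answer to: ${input}`;
}

async function speakWithAvatar(text: string): Promise<void> {
  await wait(100); // stands in for streaming the text through the HeyGen avatar
}

// Sequential pipeline: total latency is the SUM of both stages, because the
// avatar cannot start speaking until the full Assistant answer has arrived.
async function handleUserInput(input: string): Promise<number> {
  const start = Date.now();
  const answer = await runAssistant(input); // step 2: blocking wait for GPT
  await speakWithAvatar(answer);            // step 3: only starts afterwards
  return Date.now() - start;
}
```

This makes the structural problem explicit: as long as step 3 waits on the complete output of step 2, the user-perceived latency can never drop below the full Assistant round-trip.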
Questions:

  1. Are others experiencing similar latency when combining these technologies?
  2. Is the Assistant API known to be slower than direct chat completions?
  3. Any recommendations for reducing the end-to-end latency while maintaining the ability to query our knowledge base?

Current implementation details:

  • Using createAndPoll() for Assistant API
  • Sequential processing (wait for GPT, then avatar)
  • Low quality avatar setting
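One common way to reduce perceived latency (question 3) is to stop waiting for the complete response and instead stream tokens, forwarding each complete sentence to the avatar as soon as it ends. That requires switching from `createAndPoll()` to a streaming API (streaming chat completions, or the Assistants streaming events). A minimal sentence-splitter for that pattern might look like this; the function name and the boundary heuristic are ours, not from the repo:

```typescript
// Splits a streaming text buffer into complete sentences (ready to hand to
// the avatar) plus a trailing fragment to keep buffering until more tokens
// arrive. The boundary rule (., !, ? followed by whitespace) is a simple
// heuristic; abbreviations like "z.B." would need extra handling.
function extractCompleteSentences(buffer: string): {
  sentences: string[];
  rest: string;
} {
  const sentences: string[] = [];
  let rest = buffer;
  const boundary = /[.!?]\s+/;
  let match: RegExpMatchArray | null;
  while ((match = rest.match(boundary)) !== null && match.index !== undefined) {
    const end = match.index + match[0].length;
    const sentence = rest.slice(0, end).trim();
    if (sentence) sentences.push(sentence);
    rest = rest.slice(end);
  }
  return { sentences, rest };
}
```

In the streaming loop you would append each token delta to the buffer, call the splitter, send each returned sentence to the avatar, and flush whatever is left in `rest` when the stream ends. The avatar then starts speaking after the first sentence is complete rather than after the full 2-5s Assistant round-trip, at the cost of slightly more bookkeeping in the API route.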

Any insights from the community would be greatly appreciated!

GitHub repo: https://github.com/michaelhajster/ibw-virtual-advisor