Latency Optimization with HeyGen SDK + OpenAI Assistant API
3 months ago by Michael Hajster
We're experiencing higher-than-desired latency in our implementation, which combines HeyGen's Streaming Avatar SDK with OpenAI's Assistant API. Our setup:
Tech Stack:
- HeyGen Streaming Avatar SDK (Low quality setting)
- OpenAI Assistant API (for knowledge base retrieval)
- Next.js API routes
Current Flow:
1. Receive user input
2. Process via OpenAI Assistant API (2–5 s)
3. Stream the response through the HeyGen avatar
Questions:
- Are others experiencing similar latency when combining these technologies?
- Is the Assistant API known to be slower than direct Chat Completions calls?
- Any recommendations for reducing the end-to-end latency while maintaining the ability to query our knowledge base?
Current implementation details:
- Using createAndPoll() for Assistant API
- Sequential processing (wait for GPT, then avatar)
- Low quality avatar setting
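To make the sequential bottleneck concrete, here's a minimal sketch of the flow above. The real OpenAI (`createAndPoll`) and HeyGen (`speak`) calls are stubbed with timed placeholders, so the function names and delays are illustrative, not the actual SDK APIs; the point is that total latency is the sum of both steps because nothing overlaps.

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Stand-in for the Assistant API run (createAndPoll blocks until the
// whole run completes — 2–5 s in our measurements).
async function runAssistant(userInput: string): Promise<string> {
  await sleep(300); // placeholder delay for the Assistant round trip
  return `answer to: ${userInput}`;
}

// Stand-in for streaming the text through the HeyGen avatar — it can
// only start once the full Assistant response exists.
async function speakThroughAvatar(text: string): Promise<void> {
  await sleep(100); // placeholder delay for avatar playback start
}

// Sequential pipeline: end-to-end latency = assistant time + avatar time.
async function handleUserInput(userInput: string): Promise<number> {
  const start = Date.now();
  const answer = await runAssistant(userInput); // wait for GPT first
  await speakThroughAvatar(answer);             // only then drive the avatar
  return Date.now() - start;
}

handleUserInput("What are your opening hours?").then((ms) =>
  console.log(`end-to-end: ${ms} ms`)
);
```

With the real services, the avatar can't say a word until the full Assistant run finishes, which is why the user perceives the whole 2–5 s before anything happens.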
Any insights from the community would be greatly appreciated!
GitHub repo: https://github.com/michaelhajster/ibw-virtual-advisor