
Latency Optimization with HeyGen SDK + OpenAI Assistant API

We're experiencing higher-than-desired end-to-end latency in our implementation, which combines HeyGen's Streaming Avatar SDK with OpenAI's Assistant API. Our setup:

Tech Stack:

  • HeyGen Streaming Avatar SDK (Low quality setting)
  • OpenAI Assistant API (for knowledge base retrieval)
  • Next.js API routes

Current Flow:

  1. Receive user input
  2. Process via OpenAI Assistant API (2-5s)
  3. Stream response through HeyGen avatar

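The sequential flow above can be sketched as follows. The function names are hypothetical stand-ins for the real OpenAI and HeyGen calls, with `setTimeout` delays standing in for their observed latencies, to make the cost of serializing the two stages visible:

```typescript
// Hypothetical stand-ins for the real API calls; each resolves after a
// stand-in delay so the end-to-end cost of the sequential flow is measurable.
const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runAssistant(input: string): Promise<string> {
  await wait(300); // stands in for createAndPoll() on the Assistant API (2-5s in practice)
  return `answer to: ${input}`;
}

async function speakWithAvatar(text: string): Promise<void> {
  await wait(100); // stands in for streaming the text through the HeyGen avatar
}

// Sequential pipeline: total latency is the SUM of both stages, because the
// avatar cannot start speaking until the full Assistant answer has arrived.
async function handleUserInput(input: string): Promise<number> {
  const start = Date.now();
  const answer = await runAssistant(input); // step 2: blocking wait for GPT
  await speakWithAvatar(answer);            // step 3: only starts afterwards
  return Date.now() - start;
}
```

This makes the structural problem explicit: as long as step 3 waits on the complete output of step 2, the user-perceived latency can never drop below the full Assistant round-trip.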
Questions:

  1. Are others experiencing similar latency when combining these technologies?
  2. Is the Assistant API known to be slower than direct chat completions?
  3. Any recommendations for reducing the end-to-end latency while maintaining the ability to query our knowledge base?

Current implementation details:

  • Using createAndPoll() for Assistant API
  • Sequential processing (wait for GPT, then avatar)
  • Low quality avatar setting
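One common way to reduce perceived latency (question 3) is to stop waiting for the complete response and instead stream tokens, forwarding each complete sentence to the avatar as soon as it ends. That requires switching from `createAndPoll()` to a streaming API (streaming chat completions, or the Assistants streaming events). A minimal sentence-splitter for that pattern might look like this; the function name and the boundary heuristic are ours, not from the repo:

```typescript
// Splits a streaming text buffer into complete sentences (ready to hand to
// the avatar) plus a trailing fragment to keep buffering until more tokens
// arrive. The boundary rule (., !, ? followed by whitespace) is a simple
// heuristic; abbreviations like "z.B." would need extra handling.
function extractCompleteSentences(buffer: string): {
  sentences: string[];
  rest: string;
} {
  const sentences: string[] = [];
  let rest = buffer;
  const boundary = /[.!?]\s+/;
  let match: RegExpMatchArray | null;
  while ((match = rest.match(boundary)) !== null && match.index !== undefined) {
    const end = match.index + match[0].length;
    const sentence = rest.slice(0, end).trim();
    if (sentence) sentences.push(sentence);
    rest = rest.slice(end);
  }
  return { sentences, rest };
}
```

In the streaming loop you would append each token delta to the buffer, call the splitter, send each returned sentence to the avatar, and flush whatever is left in `rest` when the stream ends. The avatar then starts speaking after the first sentence is complete rather than after the full 2-5s Assistant round-trip, at the cost of slightly more bookkeeping in the API route.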

Any insights from the community would be greatly appreciated!

GitHub repo: https://github.com/michaelhajster/ibw-virtual-advisor