
How can I implement custom RAG with voice mode?

Hello,

I’m planning a project where users can continuously talk to an avatar, and the avatar responds based on a custom LLM or Retrieval-Augmented Generation (RAG) pipeline.

This voice mode feature has been implemented in the InteractiveAvatarNextJSDemo, but I’m looking to integrate my own custom LLM served via FastAPI. When the user speaks, the system should fetch relevant information from the custom LLM/RAG backend and respond in real time.
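To make the intended flow concrete, here is a minimal, stdlib-only sketch of the server-side RAG step I have in mind. All the names here (KNOWLEDGE_BASE, retrieve, answer) are placeholders, and the keyword-overlap scoring stands in for a real vector store; in practice this logic would sit behind a FastAPI POST route that receives the transcript of what the user just said and returns the text the avatar should speak:

```python
# Sketch only: toy knowledge base standing in for a real vector store / index.
KNOWLEDGE_BASE = [
    "Our support desk is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 5 business days.",
    "The avatar service streams responses over WebRTC.",
]

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def answer(transcript: str) -> str:
    """Build the reply the avatar should speak, grounded in retrieved context."""
    context = retrieve(transcript)
    # A real implementation would pass `context` plus the transcript to an LLM;
    # here we simply echo the best-matching document.
    return context[0]

print(answer("When are refunds processed?"))
# prints: Refunds are processed within 5 business days.
```

The open question is then the client side: how to intercept the user's speech in voice mode, send it to an endpoint like this, and have the avatar speak the returned text instead of the built-in response.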

While analyzing the demo, I noticed that in voice mode the avatar responds automatically, without the .speak method being called manually.

Could anyone advise on how to implement this, especially since there isn’t much documentation on setting up custom LLMs, RAG, or custom agents in chat mode?