Discussions
How to Reduce Initial Latency (8-9s) in Assistants API with File Retrieval while Maintaining Full Retrieval Functionality
3 months ago by Michael Hajster
We're building a virtual university advisor where maintaining accurate, knowledge-based responses is crucial. Our main goal is to keep the full retrieval functionality but significantly reduce the initial response latency.
Current Setup:
- OpenAI Assistant with:
  - Vector store (138 KB)
  - 4 JSON files with university program information
  - File retrieval for accurate Q&A responses
- HeyGen Streaming Avatar for response delivery
Current Behavior:
```
[0ms]    Starting response generation
[1220ms] Stream started
[8904ms] First text chunk received  <- ~7.7s delay after stream start
[9424ms] First complete sentence
```
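To make it obvious where the time goes, the timestamps above can be turned into per-stage deltas. This is a hypothetical helper (not part of any SDK), fed with the exact values from the trace:

```python
# Compute the gap between consecutive events from millisecond
# timestamps logged relative to the start of response generation.
def latency_breakdown(events: dict) -> dict:
    """Return the delta (ms) between each consecutive event."""
    ordered = sorted(events.items(), key=lambda kv: kv[1])
    return {
        f"{prev[0]} -> {cur[0]}": cur[1] - prev[1]
        for prev, cur in zip(ordered, ordered[1:])
    }

# Values taken directly from the trace above.
trace = {
    "request_start": 0,
    "stream_started": 1220,
    "first_chunk": 8904,
    "first_sentence": 9424,
}

print(latency_breakdown(trace))
# The dominant gap is stream_started -> first_chunk (7684 ms),
# which points at retrieval/tool time rather than connection setup.
```

Breaking the total down this way shows that network and stream setup (~1.2s) are not the problem; almost all of the wait sits between the stream opening and the first token.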
Critical Requirements:
- MUST maintain full retrieval capabilities
- MUST keep response accuracy from knowledge base
- MUST continue using file-based knowledge system
- Only want to optimize latency without compromising these features
Questions:
- How can we reduce this initial 7-8 second latency while keeping ALL retrieval functionality intact?
- Are there optimization techniques that don't compromise the retrieval quality?
- Could we optimize the vector store/file structure while maintaining the same knowledge coverage?
We specifically want to avoid solutions that suggest:
- Removing/reducing retrieval capabilities
- Simplifying the knowledge base
- Using simpler but less accurate responses
The goal is purely performance optimization while keeping the current functionality exactly as is. I don't know whether I should switch away from the Assistants API, but this project doesn't have much time left and I'm lost.
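One optimization that doesn't touch retrieval at all: the gap between the first text chunk and the first complete sentence can be hidden by buffering streamed text deltas and handing each sentence to the avatar the moment it closes, instead of waiting for the full message. A minimal, framework-free sketch of that buffering (the delta strings below are illustrative; in practice they would come from the run's streaming text-delta events, and each yielded sentence would be sent to the HeyGen avatar):

```python
import re

# Naive sentence-boundary heuristic: a terminator followed by whitespace.
# Real text (abbreviations, decimals) may need smarter splitting.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def stream_sentences(deltas):
    """Yield complete sentences as soon as they appear in the stream."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer  # flush whatever remains at end of stream

# Illustrative deltas standing in for streamed chunks.
deltas = ["The CS program ", "requires 120 credits. ",
          "Applications close ", "in May."]
for sentence in stream_sentences(deltas):
    print(sentence)
```

This keeps retrieval and the knowledge base exactly as they are; it only moves the avatar's start of speech to the first sentence boundary rather than the end of the run.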