Adding Built-in Voice Chat Integration to Demo Project

Please note: The built-in voice mode is tightly integrated with HeyGen's internal LLM and knowledge base. While voice chat is enabled, you cannot supply custom input for the avatar's speech. If you want to provide custom input, you will need to integrate your own speech-to-text (STT) solution.

This guide builds on the initial Vite demo project, which uses the HeyGen SDK, by adding built-in voice chat. It shows how to enable voice mode for real-time spoken interaction with the avatar and how to let users switch between text and voice input.

1. Update index.html Structure

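Add buttons to switch between text and voice modes, and a section for the voice mode controls. The script in step 2 also expects your existing text input controls to be wrapped in an element with id="textModeControls".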

<!-- Add mode switching buttons -->
<div class="chat-modes" role="group">
  <button id="textModeBtn" class="active">Text Mode</button>
  <button id="voiceModeBtn" disabled>Voice Mode</button>
</div>

<!-- Add voice mode controls section -->
<section id="voiceModeControls" role="group" style="display: none">
  <div id="voiceStatus"></div>
</section>

2. Update main.ts Code

Add New DOM References

In main.ts, add references to the new buttons and controls, plus a variable that tracks the current mode.

// Add these DOM elements
const textModeBtn = document.getElementById("textModeBtn") as HTMLButtonElement;
const voiceModeBtn = document.getElementById("voiceModeBtn") as HTMLButtonElement;
const textModeControls = document.getElementById("textModeControls") as HTMLElement;
const voiceModeControls = document.getElementById("voiceModeControls") as HTMLElement;
const voiceStatus = document.getElementById("voiceStatus") as HTMLElement;

// Add mode tracking
let currentMode: "text" | "voice" = "text";
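The `as HTMLButtonElement` casts above silence the compiler but not a missing element: if index.html lacks, for example, the `textModeControls` wrapper, the variable is `null` and the first `.style` access fails later with a less helpful error. A minimal sketch of a lookup that fails immediately instead; `requireElement` is a hypothetical helper, and the lookup function is passed in so the sketch also runs outside a browser.

```typescript
// Hypothetical helper: fails loudly at startup instead of letting a missing
// element surface later as "Cannot read properties of null".
function requireElement<T>(lookup: (id: string) => T | null | undefined, id: string): T {
  const el = lookup(id);
  if (el === null || el === undefined) {
    throw new Error(`Missing required element #${id} in index.html`);
  }
  return el;
}

// Browser usage (hypothetical):
// const voiceModeBtn = requireElement(
//   (id) => document.getElementById(id) as HTMLButtonElement | null,
//   "voiceModeBtn",
// );
```

The cast still narrows the element type, but a missing id now throws with the offending id in the message.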

Update Avatar Initialization

Extend the avatar initialization to listen for voice chat events and update the status text as the conversation moves between the user and the avatar.

async function initializeAvatarSession() {
  const token = await fetchAccessToken();
  avatar = new StreamingAvatar({ token });

  sessionData = await avatar.createStartAvatar({
    quality: AvatarQuality.High,
    avatarName: "default",
    disableIdleTimeout: true,
    language: "en",  // Language code such as "en", not a name like "English"
  });

  // Add voice chat event listeners
  avatar.on(StreamingEvents.USER_START, () => {
    voiceStatus.textContent = "Listening...";
  });
  avatar.on(StreamingEvents.USER_STOP, () => {
    voiceStatus.textContent = "Processing...";
  });
  avatar.on(StreamingEvents.AVATAR_START_TALKING, () => {
    voiceStatus.textContent = "Avatar is speaking...";
  });
  avatar.on(StreamingEvents.AVATAR_STOP_TALKING, () => {
    voiceStatus.textContent = "Waiting for you to speak...";
  });
}
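The four listeners follow the same pattern, so the event-to-text mapping can live in one table. A sketch of that idea under stated assumptions: `VoiceEvent` and `FakeAvatar` are hypothetical stand-ins for `StreamingEvents` and the SDK's `avatar.on(...)`, used only so the wiring can run without a live session. In main.ts you would key the table by the real `StreamingEvents` members instead.

```typescript
// Hypothetical keys mirroring the four StreamingEvents members used above.
type VoiceEvent = "USER_START" | "USER_STOP" | "AVATAR_START_TALKING" | "AVATAR_STOP_TALKING";

// Single place to edit the status copy for each voice chat event.
const STATUS_TEXT: Record<VoiceEvent, string> = {
  USER_START: "Listening...",
  USER_STOP: "Processing...",
  AVATAR_START_TALKING: "Avatar is speaking...",
  AVATAR_STOP_TALKING: "Waiting for you to speak...",
};

// Minimal stand-in for the SDK's on(event, handler) subscription.
class FakeAvatar {
  private handlers: Partial<Record<VoiceEvent, Array<() => void>>> = {};

  on(event: VoiceEvent, handler: () => void): void {
    const list = this.handlers[event] ?? [];
    list.push(handler);
    this.handlers[event] = list;
  }

  emit(event: VoiceEvent): void {
    const list = this.handlers[event] ?? [];
    for (const handler of list) handler();
  }
}

// Registers one listener per table entry; setStatus would write voiceStatus.textContent.
function wireVoiceStatus(avatar: FakeAvatar, setStatus: (text: string) => void): void {
  for (const event of Object.keys(STATUS_TEXT) as VoiceEvent[]) {
    avatar.on(event, () => setStatus(STATUS_TEXT[event]));
  }
}
```

Adding a new status (for example, for another event your SDK version emits) then means adding one table entry rather than another listener block.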

Add Voice Chat Functions

Add one function that starts voice chat and another that switches between the two modes.

async function startVoiceChat() {
  if (!avatar) return;
  
  try {
    await avatar.startVoiceChat({
      useSilencePrompt: false
    });
    voiceStatus.textContent = "Waiting for you to speak...";
  } catch (error) {
    console.error("Error starting voice chat:", error);
    voiceStatus.textContent = "Error starting voice chat";
  }
}

async function switchMode(mode: "text" | "voice") {
  if (currentMode === mode) return;
  
  currentMode = mode;
  
  if (mode === "text") {
    textModeBtn.classList.add("active");
    voiceModeBtn.classList.remove("active");
    textModeControls.style.display = "block";
    voiceModeControls.style.display = "none";
    if (avatar) {
      await avatar.closeVoiceChat();
    }
  } else {
    textModeBtn.classList.remove("active");
    voiceModeBtn.classList.add("active");
    textModeControls.style.display = "none";
    voiceModeControls.style.display = "block";
    if (avatar) {
      await startVoiceChat();
    }
  }
}
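`switchMode` above updates `currentMode` and the UI before the SDK call finishes, and a quick double click can start two switches at once. A sketch of one alternative, assuming a hypothetical `VoiceChatClient` interface that exposes only the two calls used here: it ignores requests while a switch is in flight and commits the new mode (and re-renders) only after the call succeeds. It also assumes the underlying call rejects on failure, unlike `startVoiceChat` above, which catches its own errors.

```typescript
type ChatMode = "text" | "voice";

// Hypothetical interface: only the two SDK calls that switching needs.
interface VoiceChatClient {
  startVoiceChat(): Promise<void>;
  closeVoiceChat(): Promise<void>;
}

class ModeController {
  private mode: ChatMode = "text";
  private switching = false;
  private readonly client: VoiceChatClient;
  // Applies the visible state (active button, shown section) for a mode.
  private readonly render: (mode: ChatMode) => void;

  constructor(client: VoiceChatClient, render: (mode: ChatMode) => void) {
    this.client = client;
    this.render = render;
  }

  get currentMode(): ChatMode {
    return this.mode;
  }

  // Resolves to true if the mode changed; false if the request was ignored
  // (same mode, or another switch still running) or the SDK call failed.
  async switchMode(target: ChatMode): Promise<boolean> {
    if (target === this.mode || this.switching) return false;
    this.switching = true; // set synchronously, before the first await
    try {
      if (target === "voice") {
        await this.client.startVoiceChat();
      } else {
        await this.client.closeVoiceChat();
      }
      // Commit only after the call succeeded, so state and UI stay in sync.
      this.mode = target;
      this.render(target);
      return true;
    } catch (error) {
      console.error(`Could not switch to ${target} mode:`, error);
      return false; // previous mode and UI are left untouched
    } finally {
      this.switching = false;
    }
  }
}
```

To adapt it, you could wrap the real calls, e.g. `{ startVoiceChat: () => avatar.startVoiceChat({ useSilencePrompt: false }), closeVoiceChat: async () => { await avatar.closeVoiceChat(); } }`, and move the `classList` and `style.display` toggling into `render`.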

Enable Voice Mode Button

Update your existing handleStreamReady function to enable the Voice Mode button once the avatar stream is ready.

function handleStreamReady(event: any) {
  if (event.detail && videoElement) {
    videoElement.srcObject = event.detail;
    videoElement.onloadedmetadata = () => {
      videoElement.play().catch(console.error);
    };
    voiceModeBtn.disabled = false;  // Enable voice mode after stream is ready
  }
}

Add Event Listeners

Wire the mode buttons to switchMode alongside your other event listeners.

// Add these with your other event listeners
textModeBtn.addEventListener("click", () => switchMode("text"));
voiceModeBtn.addEventListener("click", () => switchMode("voice"));
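Ending the session is another place where cleanup matters. A minimal sketch, assuming your initial setup stops the session with `avatar.stopAvatar()`; `AvatarSession` and `teardownSession` are hypothetical names. It closes voice chat first when voice mode is active, still attempts to stop the avatar if that fails, and returns any errors rather than throwing.

```typescript
type ChatMode = "text" | "voice";

// Hypothetical interface for the two SDK calls used during teardown.
interface AvatarSession {
  closeVoiceChat(): Promise<void>;
  stopAvatar(): Promise<void>;
}

function toError(e: unknown): Error {
  return e instanceof Error ? e : new Error(String(e));
}

// Runs every cleanup step even if an earlier one fails; collects the failures.
async function teardownSession(session: AvatarSession | null, mode: ChatMode): Promise<Error[]> {
  const errors: Error[] = [];
  if (!session) return errors; // nothing was started

  if (mode === "voice") {
    try {
      await session.closeVoiceChat();
    } catch (e) {
      errors.push(toError(e));
    }
  }

  try {
    await session.stopAvatar();
  } catch (e) {
    errors.push(toError(e));
  }
  return errors;
}
```

You could call it from an explicit "End session" click handler, then reset `currentMode` to "text" and disable `voiceModeBtn` again. Avoid relying on `beforeunload` for this: the browser does not wait for the returned promise while the page unloads.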

Important Notes:

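  • The Voice Mode button starts disabled and is enabled only once the stream is ready.
  • Pass a language code such as "en" to createStartAvatar, not a language name such as "English".
  • The voice status text is updated by the event listeners registered during avatar initialization.
  • Voice chat starts when the user switches to voice mode.
  • Handle cleanup when switching modes: switchMode calls closeVoiceChat() when returning to text mode.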
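To catch the "English" versus "en" mistake early, you could validate the `language` option before calling `createStartAvatar`. A sketch with a hypothetical `toLanguageOption` helper: it only checks that the value has the shape of a short language tag (for example "en" or "en-US"); it does not check whether HeyGen supports that language.

```typescript
// Shape check only: a 2 to 3 letter primary subtag plus optional subtags,
// e.g. "en" or "en-US". Full names such as "English" are rejected.
const LANGUAGE_TAG_SHAPE = /^[a-z]{2,3}(-[a-z0-9]{2,8})*$/i;

// Hypothetical helper: returns the trimmed code or throws with a hint.
function toLanguageOption(value: string): string {
  const trimmed = value.trim();
  if (!LANGUAGE_TAG_SHAPE.test(trimmed)) {
    throw new Error(`"${value}" is not a language code; use e.g. "en" instead of "English"`);
  }
  return trimmed;
}

// Usage inside initializeAvatarSession (hypothetical):
//   language: toLanguageOption("en"),
```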

Conclusion

In this guide, we added built-in voice chat to the Vite demo project using HeyGen’s Streaming API. Users can now switch between text and voice modes for a more interactive experience. Keep in mind that voice chat does not accept custom speech input; for that, you will need to integrate your own STT solution.