HeyGen Skills Guide

Covers: heygen · heygen-best-practices · text-to-speech · video-translate

Overview of the 4 Skills

Skill                    Purpose
heygen                   Full API skill: video generation, TTS, translation, Remotion
heygen-best-practices    Rule-file knowledge base for writing HeyGen API code
text-to-speech           Focused skill for standalone audio via Starfish model
video-translate          Focused skill for translating and dubbing existing videos

Install any skill:

npx skills add https://github.com/heygen-com/skills --skill <skill-name>

# Or install all globally
npx skills add heygen-com/skills -a claude-code -g

Authentication (All Skills)

Set HEYGEN_API_KEY as an environment variable. Get your key at API Dashboard.


1. heygen

The main, full-featured skill. Use this as your default.

npx skills add https://github.com/heygen-com/skills --skill heygen

Tool Selection: MCP vs Direct API

If HeyGen MCP tools (mcp__heygen__*) are available, prefer them — they handle auth and request formatting automatically.

Task                            MCP Tool                             Direct API Fallback
Generate video from prompt      mcp__heygen__generate_video_agent    POST /v1/video_agent/generate
Check video status / get URL    mcp__heygen__get_video               GET /v1/video_status.get
List account videos             mcp__heygen__list_videos             GET /v1/video.list
Generate TTS audio              mcp__heygen__text_to_speech          POST /v1/audio/text_to_speech
List TTS voices                 mcp__heygen__list_audio_voices       GET /v1/audio/voices
Delete a video                  mcp__heygen__delete_video            DELETE /v1/video.delete
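
The "prefer MCP, fall back to the direct API" rule can be sketched as a small lookup. This is only an illustration: the tool names and endpoints are copied from the table above, while the `Task` keys, the `ROUTES` map, and the `resolveTool` function are hypothetical names that are not part of any HeyGen skill.

```typescript
// Hypothetical task keys; tool names and endpoints come from the table above.
type Task =
  | "generate_video"
  | "get_video"
  | "list_videos"
  | "text_to_speech"
  | "list_voices"
  | "delete_video";

interface Route {
  mcpTool: string;
  method: "GET" | "POST" | "DELETE";
  path: string;
}

const ROUTES: Record<Task, Route> = {
  generate_video: { mcpTool: "mcp__heygen__generate_video_agent", method: "POST", path: "/v1/video_agent/generate" },
  get_video: { mcpTool: "mcp__heygen__get_video", method: "GET", path: "/v1/video_status.get" },
  list_videos: { mcpTool: "mcp__heygen__list_videos", method: "GET", path: "/v1/video.list" },
  text_to_speech: { mcpTool: "mcp__heygen__text_to_speech", method: "POST", path: "/v1/audio/text_to_speech" },
  list_voices: { mcpTool: "mcp__heygen__list_audio_voices", method: "GET", path: "/v1/audio/voices" },
  delete_video: { mcpTool: "mcp__heygen__delete_video", method: "DELETE", path: "/v1/video.delete" },
};

type Resolution =
  | { kind: "mcp"; tool: string }
  | { kind: "api"; method: Route["method"]; path: string };

// Return the MCP tool when the host exposes it, otherwise the REST fallback.
function resolveTool(task: Task, availableTools: ReadonlySet<string>): Resolution {
  const route = ROUTES[task];
  if (availableTools.has(route.mcpTool)) {
    return { kind: "mcp", tool: route.mcpTool };
  }
  return { kind: "api", method: route.method, path: route.path };
}
```

For example, `resolveTool("delete_video", new Set())` resolves to the `DELETE /v1/video.delete` fallback because no MCP tools are available.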

Default Workflow Decision

Use Case                                   Recommended API
Most video requests                        Video Agent: POST /v1/video_agent/generate
Exact script without AI modification       POST /v2/video/generate
Specific voice_id selection                POST /v2/video/generate
Different avatars/backgrounds per scene    POST /v2/video/generate
Precise per-scene timing                   POST /v2/video/generate
Programmatic/batch generation              POST /v2/video/generate
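
One way to read the decision table: start from Video Agent and switch to `/v2/video/generate` as soon as any precise-control requirement applies. The sketch below encodes that reading; the `VideoRequirements` flags and the `chooseVideoEndpoint` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical flags, one per "precise control" row in the table above.
interface VideoRequirements {
  exactScript?: boolean;
  specificVoiceId?: boolean;
  perSceneAvatarsOrBackgrounds?: boolean;
  preciseSceneTiming?: boolean;
  batchGeneration?: boolean;
}

// Any precise-control requirement selects /v2/video/generate;
// everything else defaults to Video Agent.
function chooseVideoEndpoint(req: VideoRequirements): string {
  const needsPreciseControl = Boolean(
    req.exactScript ||
      req.specificVoiceId ||
      req.perSceneAvatarsOrBackgrounds ||
      req.preciseSceneTiming ||
      req.batchGeneration,
  );
  return needsPreciseControl ? "/v2/video/generate" : "/v1/video_agent/generate";
}
```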

Video Agent Workflow (Easy Path)

With MCP tools:

  1. Write an optimized prompt using prompt-optimizer.md and visual-styles.md
  2. Call mcp__heygen__generate_video_agent with prompt, duration_sec, orientation, avatar_id
  3. Call mcp__heygen__get_video with the returned video_id to poll for the download URL

Without MCP tools (direct API):

  1. Write an optimized prompt (scenes + timing + visual style)
  2. POST /v1/video_agent/generate
  3. GET /v1/video_status.get?video_id=<id> — poll until status: "completed"
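
The direct-API steps above might look roughly like this in TypeScript. It is a minimal sketch under stated assumptions: the request fields mirror the MCP tool parameters (`prompt`, `duration_sec`, `orientation`, `avatar_id`), and it is assumed, not confirmed here, that the REST endpoint accepts the same names and that the status response exposes `status` and `video_url`. The helper names (`buildGenerateRequest`, `buildStatusRequest`, `pollUntilCompleted`) are hypothetical.

```typescript
const BASE_URL = "https://api.heygen.com";

// Assumption: the REST body uses the same field names as the MCP tool.
interface VideoAgentInput {
  prompt: string;
  duration_sec?: number;
  orientation?: string;
  avatar_id?: string;
}

interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Step 2: describe the POST /v1/video_agent/generate call.
function buildGenerateRequest(apiKey: string, input: VideoAgentInput): HttpRequest {
  return {
    url: `${BASE_URL}/v1/video_agent/generate`,
    method: "POST",
    headers: { "X-Api-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(input),
  };
}

// Step 3: describe the GET /v1/video_status.get call for one video.
function buildStatusRequest(apiKey: string, videoId: string): HttpRequest {
  return {
    url: `${BASE_URL}/v1/video_status.get?video_id=${encodeURIComponent(videoId)}`,
    method: "GET",
    headers: { "X-Api-Key": apiKey },
  };
}

// Assumed shape of the status payload; verify against video-status.md.
interface VideoStatus {
  status: string;
  video_url?: string;
}

// Call getStatus repeatedly until it reports "completed" or "failed".
async function pollUntilCompleted(
  getStatus: () => Promise<VideoStatus>,
  intervalMs = 10_000,
  maxAttempts = 60,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = await getStatus();
    if (current.status === "completed" && current.video_url) return current.video_url;
    if (current.status === "failed") throw new Error("Video generation failed");
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for video");
}
```

Keeping the request builders separate from the transport means the same objects can be handed to `fetch` or any other HTTP client, and the polling loop can be exercised with a stubbed `getStatus`.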

Quick Reference by Task

Task                                   MCP Tool                             Reference File
Generate video from prompt             mcp__heygen__generate_video_agent    prompt-optimizer.md, visual-styles.md, video-agent.md
Generate video with precise control    -                                    video-generation.md, avatars.md, voices.md
Check video status / get URL           mcp__heygen__get_video               video-status.md
Add captions or text overlays          -                                    captions.md, text-overlays.md
Transparent video (WebM)               -                                    video-generation.md (WebM section)
Standalone TTS audio                   mcp__heygen__text_to_speech          text-to-speech.md
List TTS voices                        mcp__heygen__list_audio_voices       voices.md
Translate/dub existing video           -                                    video-translation.md
Use with Remotion                      -                                    remotion-integration.md

Reference File Map

Foundation

  • references/authentication.md — API key setup and X-Api-Key header
  • references/quota.md — Credit system and usage limits
  • references/video-status.md — Polling patterns and download URLs
  • references/assets.md — Uploading images, videos, audio

Core Video Creation

  • references/avatars.md — Listing avatars, styles, avatar_id selection
  • references/voices.md — Listing voices, locales, speed/pitch
  • references/scripts.md — Writing scripts, pauses, pacing
  • references/video-generation.md — POST /v2/video/generate and multi-scene videos
  • references/video-agent.md — One-shot prompt video generation
  • references/prompt-optimizer.md — Writing effective Video Agent prompts (core workflow + rules)
  • references/visual-styles.md — 20 named visual styles with full specs
  • references/prompt-examples.md — Full production prompt example + ready-to-use templates
  • references/dimensions.md — Resolution and aspect ratios

Video Customization

  • references/backgrounds.md — Solid colors, images, video backgrounds
  • references/text-overlays.md — Adding text with fonts and positioning
  • references/captions.md — Auto-generated captions and subtitles

Advanced Features

  • references/templates.md — Template listing and variable replacement
  • references/video-translation.md — Translating videos and dubbing
  • references/text-to-speech.md — Standalone TTS audio with Starfish model
  • references/streaming-avatars.md — Real-time interactive sessions
  • references/photo-avatars.md — Creating avatars from photos
  • references/webhooks.md — Webhook endpoints and events

Integration

  • references/remotion-integration.md — Using HeyGen in Remotion compositions

2. heygen-best-practices

A knowledge-base skill structured as rule files, designed for passive use while writing HeyGen API code — the agent reads the relevant rule file before generating code rather than following an active workflow.

npx skills add https://github.com/heygen-com/skills --skill heygen-best-practices

When to use: Any time you're writing code that calls the HeyGen API. Pair with heygen for maximum coverage.


Rule File Map

Foundation

  • rules/authentication.md — API key setup, X-Api-Key header, authentication patterns
  • rules/quota.md — Credit system, usage limits, checking remaining quota
  • rules/video-status.md — Polling patterns, status types, retrieving download URLs
  • rules/assets.md — Uploading images, videos, audio for use in video generation

Core Video Creation

  • rules/avatars.md — Listing avatars, avatar styles, avatar_id selection
  • rules/voices.md — Listing voices, locales, speed/pitch configuration
  • rules/scripts.md — Writing scripts, pauses/breaks, pacing, structure templates
  • rules/video-generation.md — POST /v2/video/generate workflow and multi-scene videos
  • rules/video-agent.md — One-shot prompt video generation with Video Agent API
  • rules/dimensions.md — Resolution options (720p/1080p) and aspect ratios

Video Customization

  • rules/backgrounds.md — Solid colors, images, and video backgrounds
  • rules/text-overlays.md — Adding text with fonts and positioning
  • rules/captions.md — Auto-generated captions and subtitle options

Advanced Features

  • rules/templates.md — Template listing and variable replacement
  • rules/video-translation.md — Translating videos, quality/fast modes, and dubbing
  • rules/streaming-avatars.md — Real-time interactive avatar sessions
  • rules/photo-avatars.md — Creating avatars from photos (talking photos)
  • rules/webhooks.md — Registering webhook endpoints and event types

Integration

  • rules/remotion-integration.md — Using HeyGen avatar videos in Remotion compositions

3. text-to-speech

Focused skill for generating standalone audio using HeyGen's Starfish TTS model.

npx skills add https://github.com/heygen-com/skills --skill text-to-speech

Use when: You need audio-only output — podcasts, IVR prompts, narration tracks, or audio to pair with custom visuals.


Key Endpoints

Task                     Endpoint
List available voices    GET /v1/audio/voices
Generate speech          POST /v1/audio/text_to_speech

Generate Speech

curl -X POST "https://api.heygen.com/v1/audio/text_to_speech" \
  -H "X-Api-Key: $HEYGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the Starfish TTS model.",
    "voice_id": "YOUR_VOICE_ID"
  }'

List Voices

curl "https://api.heygen.com/v1/audio/voices" \
  -H "X-Api-Key: $HEYGEN_API_KEY"

Returns voice_id, locale, gender, preview audio URL. These IDs also work for locking a specific voice in v2/video/generate.
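
As a rough illustration of using the voice list, the sketch below picks a voice by locale (and optionally gender) and builds the same JSON body as the curl example. The `Voice` interface uses the fields named above; the exact JSON keys in the real response are an assumption, and `pickVoice` and `buildTtsBody` are hypothetical helper names.

```typescript
// Assumed keys, based on the fields listed above; check the real response.
interface Voice {
  voice_id: string;
  locale: string;
  gender: string;
}

// Return the first voice matching the locale (case-insensitive) and,
// when given, the gender.
function pickVoice(voices: readonly Voice[], locale: string, gender?: string): Voice | undefined {
  const wantedLocale = locale.toLowerCase();
  return voices.find(
    (v) =>
      v.locale.toLowerCase() === wantedLocale &&
      (gender === undefined || v.gender.toLowerCase() === gender.toLowerCase()),
  );
}

// Build the body for POST /v1/audio/text_to_speech, matching the curl example.
function buildTtsBody(text: string, voice: Voice): string {
  return JSON.stringify({ text, voice_id: voice.voice_id });
}
```

The same `voice_id` returned by `pickVoice` could then be reused when a video needs that specific voice.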


Equivalent MCP tools:

mcp__heygen__text_to_speech       generate audio
mcp__heygen__list_audio_voices    list available voices

Use Cases

  • Podcast narration tracks
  • IVR / phone system prompts
  • Narration to combine with custom animation or Remotion
  • A/B testing voices before committing to a full video

4. video-translate

Focused skill for translating and dubbing existing videos into other languages, preserving lip-sync and the original speaker's voice characteristics.

npx skills add https://github.com/heygen-com/skills --skill video-translate

Use when: You have an existing video (by URL or HeyGen video ID) and want to dub it into one or more languages.


Default Workflow

  1. Provide a video_url or video_id
  2. Call POST /v2/video_translate with the target language
  3. Poll GET /v2/video_translate/{translate_id} until status: "completed"
  4. Download from the returned URL (allow up to 30 minutes)

Translation Modes

Mode               Quality
speed (default)    Good; best for limited facial movement
precision          Higher; context-aware, more natural lip-sync
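
A translation request combining the workflow and the modes above could be assembled as follows. This is a sketch, not the documented schema: `video_url` and `output_language` appear in the batch example later in this guide, but the `mode` field name and passing `video_id` in the body are assumptions to verify against video-translation.md. `buildTranslateBody` is a hypothetical helper name.

```typescript
// Mode values are taken from the table above.
type TranslationMode = "speed" | "precision";

// The workflow accepts either a URL or a HeyGen video ID as the source.
type TranslateSource = { video_url: string } | { video_id: string };

// Build the JSON body for POST /v2/video_translate.
// Assumption: the mode is sent in a field named "mode".
function buildTranslateBody(
  source: TranslateSource,
  outputLanguage: string,
  mode: TranslationMode = "speed",
): string {
  return JSON.stringify({ ...source, output_language: outputLanguage, mode });
}
```

Calling `buildTranslateBody({ video_url: "https://example.com/video.mp4" }, "fr-FR")` yields a body that uses the default `speed` mode.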

Supported Languages (Sample)

Language               Code     Language              Code
English (US)           en-US    Japanese              ja-JP
Spanish (Spain)        es-ES    Korean                ko-KR
Spanish (Mexico)       es-MX    Chinese (Mandarin)    zh-CN
French                 fr-FR    Hindi                 hi-IN
German                 de-DE    Arabic                ar-SA
Portuguese (Brazil)    pt-BR    Italian               it-IT

175+ languages and dialects supported in total.


Poll for Completion

async function waitForTranslation(translateId: string): Promise<string> {
  const maxWait = 30 * 60 * 1000; // 30 minutes
  const interval = 30_000;        // poll every 30 seconds
  const start = Date.now();

  while (Date.now() - start < maxWait) {
    const res = await fetch(
      `https://api.heygen.com/v2/video_translate/${translateId}`,
      { headers: { "X-Api-Key": process.env.HEYGEN_API_KEY! } }
    );
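    // Surface HTTP-level failures (bad key, unknown ID) before parsing the body
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);
    const { data } = await res.json();

    if (data.status === "completed") return data.video_url;
    if (data.status === "failed") throw new Error(data.message ?? "Translation failed");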

    await new Promise(r => setTimeout(r, interval));
  }
  throw new Error("Timed out");
}

Batch Translation

// Kick off all languages in parallel; each returned ID must still be polled separately
const languages = ["es-ES", "fr-FR", "de-DE", "ja-JP"];
const jobs = await Promise.all(
  languages.map(lang =>
    fetch("https://api.heygen.com/v2/video_translate", {
      method: "POST",
      headers: {
        "X-Api-Key": process.env.HEYGEN_API_KEY!,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        video_url: "https://example.com/video.mp4",
        output_language: lang
      })
    }).then(r => r.json()).then(j => ({ lang, id: j.data.video_translate_id }))
  )
);
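
The batch snippet only starts the jobs. A possible follow-up, sketched below, polls every job and records a per-language result so that one failed translation does not discard the others. `collectTranslations` and its types are hypothetical; the `wait` parameter is any function that resolves a translate ID to a download URL, such as `waitForTranslation` from the Poll for Completion section.

```typescript
interface TranslationJob {
  lang: string;
  id: string;
}

interface TranslationResult {
  lang: string;
  url?: string;   // set when the job completed
  error?: string; // set when the job failed or timed out
}

// Poll all jobs concurrently and convert each outcome into a result row,
// so a rejection for one language never rejects the whole batch.
function collectTranslations(
  jobs: readonly TranslationJob[],
  wait: (translateId: string) => Promise<string>,
): Promise<TranslationResult[]> {
  return Promise.all(
    jobs.map((job) =>
      wait(job.id).then(
        (url): TranslationResult => ({ lang: job.lang, url }),
        (err: unknown): TranslationResult => ({ lang: job.lang, error: String(err) }),
      ),
    ),
  );
}
```

With the `jobs` array from the batch snippet, `await collectTranslations(jobs, waitForTranslation)` would return one entry per language in the original order.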

Best Practices

  1. Source quality matters — Use high-quality video with clear audio
  2. Single speaker preferred — Best lip-sync with one speaker at a time
  3. Frontal face position — Avoid large-angle head movements or rapid cuts
  4. Moderate speech pace — Very fast speech may reduce translation quality
  5. Test with a short clip first — Before translating long content
  6. Allow up to 30 minutes — Translation is slower than video generation
  7. Enable Dynamic Duration — Automatically adjusts timing when translated audio length differs from the original

Choosing the Right Skill

You want to...                                             Use
General video work (generate, translate, TTS, Remotion)    heygen
Write HeyGen API code with rule-based guidance             heygen-best-practices
Generate standalone audio only                             text-to-speech
Translate or dub an existing video                         video-translate
Do all of the above                                        Install all 4

The focused skills (text-to-speech, video-translate) are lighter-weight alternatives when your use case is narrow. heygen-best-practices is best paired with heygen for coding tasks.


MCP Server (Alternative to Skills)

{
  "mcpServers": {
    "HeyGen": {
      "command": "uvx",
      "args": ["heygen-mcp"],
      "env": { "HEYGEN_API_KEY": "<your-api-key>" }
    }
  }
}

Available MCP tools: get_remaining_credits, get_voices, get_avatar_groups, get_avatars_in_avatar_group, generate_avatar_video, get_avatar_video_status.


Related Resources

Resource                   URL
Skills Marketplace         skills.sh/heygen-com/skills
GitHub Repository          github.com/heygen-com/skills
HeyGen Developer Portal    API Dashboard
HeyGen MCP Server          MCP Server