HeyGen Skills Guide
Covers: heygen · heygen-best-practices · text-to-speech · video-translate
Overview of the 4 Skills
| Skill | Purpose |
|---|---|
| heygen | Full API skill: video generation, TTS, translation, Remotion |
| heygen-best-practices | Rule-file knowledge base for writing HeyGen API code |
| text-to-speech | Focused skill for standalone audio via the Starfish model |
| video-translate | Focused skill for translating and dubbing existing videos |
Install any skill:
npx skills add https://github.com/heygen-com/skills --skill <skill-name>
# Or install all globally
npx skills add heygen-com/skills -a claude-code -g
Authentication (All Skills)
Set HEYGEN_API_KEY as an environment variable. Get your key at API Dashboard.
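A minimal sketch of turning the key into request headers, assuming the `X-Api-Key` header used by the curl examples in this guide. The helper name `heygenHeaders` is hypothetical; in Node you would typically pass it `process.env.HEYGEN_API_KEY`.

```typescript
// Hypothetical helper: builds the headers for direct HeyGen API calls.
// Assumes the X-Api-Key header shown in this guide's curl examples.
function heygenHeaders(apiKey: string | undefined): Record<string, string> {
  if (!apiKey || apiKey.trim() === "") {
    // Fail early with a clear message instead of sending an unauthenticated request.
    throw new Error("HEYGEN_API_KEY is not set");
  }
  return {
    "X-Api-Key": apiKey.trim(),
    "Content-Type": "application/json",
  };
}
```

Checking for the key once, at startup, tends to give a clearer error than a failed request later in a workflow.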
1. heygen
The main, full-featured skill. Use this as your default.
npx skills add https://github.com/heygen-com/skills --skill heygen
Tool Selection: MCP vs Direct API
If HeyGen MCP tools (mcp__heygen__*) are available, prefer them — they handle auth and request formatting automatically.
| Task | MCP Tool | Direct API Fallback |
|---|---|---|
| Generate video from prompt | mcp__heygen__generate_video_agent | POST /v1/video_agent/generate |
| Check video status / get URL | mcp__heygen__get_video | GET /v1/video_status.get |
| List account videos | mcp__heygen__list_videos | GET /v1/video.list |
| Generate TTS audio | mcp__heygen__text_to_speech | POST /v1/audio/text_to_speech |
| List TTS voices | mcp__heygen__list_audio_voices | GET /v1/audio/voices |
| Delete a video | mcp__heygen__delete_video | DELETE /v1/video.delete |
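The "prefer MCP, fall back to the direct API" rule can be sketched as a lookup over the table above. The names `ROUTES` and `pickTool` are hypothetical, and the sketch assumes the agent runtime can supply the list of available tool names as plain strings.

```typescript
// Task -> MCP tool and direct-API fallback, copied from the table above.
type Route = { mcpTool: string; method: "GET" | "POST" | "DELETE"; path: string };

const ROUTES: Record<string, Route> = {
  generateVideo: { mcpTool: "mcp__heygen__generate_video_agent", method: "POST", path: "/v1/video_agent/generate" },
  videoStatus: { mcpTool: "mcp__heygen__get_video", method: "GET", path: "/v1/video_status.get" },
  listVideos: { mcpTool: "mcp__heygen__list_videos", method: "GET", path: "/v1/video.list" },
  textToSpeech: { mcpTool: "mcp__heygen__text_to_speech", method: "POST", path: "/v1/audio/text_to_speech" },
  listVoices: { mcpTool: "mcp__heygen__list_audio_voices", method: "GET", path: "/v1/audio/voices" },
  deleteVideo: { mcpTool: "mcp__heygen__delete_video", method: "DELETE", path: "/v1/video.delete" },
};

type Choice =
  | { kind: "mcp"; tool: string }
  | { kind: "api"; method: Route["method"]; path: string };

// Returns the MCP tool when it is available, otherwise the direct endpoint.
function pickTool(task: string, availableTools: string[]): Choice {
  const route = ROUTES[task];
  if (!route) throw new Error(`Unknown task: ${task}`);
  if (availableTools.indexOf(route.mcpTool) !== -1) {
    return { kind: "mcp", tool: route.mcpTool };
  }
  return { kind: "api", method: route.method, path: route.path };
}
```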
Default Workflow Decision
| Use Case | Recommended API |
|---|---|
| Most video requests | Video Agent POST /v1/video_agent/generate |
| Exact script without AI modification | POST /v2/video/generate |
| Specific voice_id selection | POST /v2/video/generate |
| Different avatars/backgrounds per scene | POST /v2/video/generate |
| Precise per-scene timing | POST /v2/video/generate |
| Programmatic/batch generation | POST /v2/video/generate |
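As a rough illustration, the decision table reduces to one check: any requirement for precise control selects `POST /v2/video/generate`, and everything else goes to Video Agent. The interface and function names below are hypothetical.

```typescript
// Requirements that, per the table above, call for POST /v2/video/generate.
interface VideoRequirements {
  exactScript?: boolean;                  // script must be used without AI modification
  specificVoiceId?: boolean;              // a particular voice_id is required
  perSceneAvatarsOrBackgrounds?: boolean; // avatars or backgrounds differ per scene
  preciseSceneTiming?: boolean;           // per-scene timing must be exact
  batchGeneration?: boolean;              // programmatic or batch generation
}

function chooseVideoEndpoint(req: VideoRequirements = {}): string {
  const needsPreciseControl = Boolean(
    req.exactScript ||
      req.specificVoiceId ||
      req.perSceneAvatarsOrBackgrounds ||
      req.preciseSceneTiming ||
      req.batchGeneration
  );
  return needsPreciseControl
    ? "POST /v2/video/generate"
    : "POST /v1/video_agent/generate";
}
```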
Video Agent Workflow (Easy Path)
With MCP tools:
- Write an optimized prompt using prompt-optimizer.md → visual-styles.md
- Call mcp__heygen__generate_video_agent with prompt, duration_sec, orientation, avatar_id
- Call mcp__heygen__get_video with the returned video_id to poll for the download URL
Without MCP tools (direct API):
- Write an optimized prompt (scenes + timing + visual style)
- POST /v1/video_agent/generate
- GET /v1/video_status.get?video_id=<id> (poll until status: "completed")
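The direct-API steps above can be sketched as follows. Several details are assumptions to verify against video-agent.md and video-status.md: that the JSON body uses the same field names as the MCP tool parameters, that the create response exposes `data.video_id`, and that the status response exposes `data.status` and `data.video_url`. The `Send` transport type is hypothetical and keeps the sketch independent of any particular HTTP client.

```typescript
const BASE_URL = "https://api.heygen.com";

// Assumption: body fields mirror the MCP tool parameters named above.
interface VideoAgentParams {
  prompt: string;
  duration_sec?: number;
  orientation?: string;
  avatar_id?: string;
}

interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

function buildGenerateRequest(apiKey: string, params: VideoAgentParams): HttpRequest {
  return {
    url: `${BASE_URL}/v1/video_agent/generate`,
    method: "POST",
    headers: { "X-Api-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(params),
  };
}

function buildStatusRequest(apiKey: string, videoId: string): HttpRequest {
  return {
    url: `${BASE_URL}/v1/video_status.get?video_id=${encodeURIComponent(videoId)}`,
    method: "GET",
    headers: { "X-Api-Key": apiKey },
  };
}

// Hypothetical transport: sends a request and resolves with the parsed JSON body.
type Send = (req: HttpRequest) => Promise<any>;

async function generateAndWait(
  send: Send,
  apiKey: string,
  params: VideoAgentParams,
  intervalMs = 10_000,
  maxAttempts = 60
): Promise<string> {
  const created = await send(buildGenerateRequest(apiKey, params));
  const videoId: string = created.data.video_id; // assumed response shape
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await send(buildStatusRequest(apiKey, videoId));
    if (status.data.status === "completed") return status.data.video_url; // assumed field
    if (status.data.status === "failed") throw new Error("Video generation failed"); // assumed status value
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for video");
}
```

With the built-in `fetch`, a suitable `send` could be `req => fetch(req.url, req).then(r => r.json())`. Keeping the request builders separate from the transport makes them easy to unit-test without network access.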
Quick Reference by Task
| Task | MCP Tool | Reference File |
|---|---|---|
| Generate video from prompt | mcp__heygen__generate_video_agent | prompt-optimizer.md → visual-styles.md → video-agent.md |
| Generate video with precise control | — | video-generation.md, avatars.md, voices.md |
| Check video status / get URL | mcp__heygen__get_video | video-status.md |
| Add captions or text overlays | — | captions.md, text-overlays.md |
| Transparent video (WebM) | — | video-generation.md (WebM section) |
| Standalone TTS audio | mcp__heygen__text_to_speech | text-to-speech.md |
| List TTS voices | mcp__heygen__list_audio_voices | voices.md |
| Translate/dub existing video | — | video-translation.md |
| Use with Remotion | — | remotion-integration.md |
Reference File Map
Foundation
- references/authentication.md: API key setup and X-Api-Key header
- references/quota.md: Credit system and usage limits
- references/video-status.md: Polling patterns and download URLs
- references/assets.md: Uploading images, videos, audio
Core Video Creation
- references/avatars.md: Listing avatars, styles, avatar_id selection
- references/voices.md: Listing voices, locales, speed/pitch
- references/scripts.md: Writing scripts, pauses, pacing
- references/video-generation.md: POST /v2/video/generate and multi-scene videos
- references/video-agent.md: One-shot prompt video generation
- references/prompt-optimizer.md: Writing effective Video Agent prompts (core workflow + rules)
- references/visual-styles.md: 20 named visual styles with full specs
- references/prompt-examples.md: Full production prompt example + ready-to-use templates
- references/dimensions.md: Resolution and aspect ratios
Video Customization
- references/backgrounds.md: Solid colors, images, video backgrounds
- references/text-overlays.md: Adding text with fonts and positioning
- references/captions.md: Auto-generated captions and subtitles
Advanced Features
- references/templates.md: Template listing and variable replacement
- references/video-translation.md: Translating videos and dubbing
- references/text-to-speech.md: Standalone TTS audio with Starfish model
- references/streaming-avatars.md: Real-time interactive sessions
- references/photo-avatars.md: Creating avatars from photos
- references/webhooks.md: Webhook endpoints and events
Integration
- references/remotion-integration.md: Using HeyGen in Remotion compositions
2. heygen-best-practices
A knowledge-base skill structured as rule files, designed for passive use while writing HeyGen API code. The agent reads the relevant rule file before generating code rather than following an active workflow.
npx skills add https://github.com/heygen-com/skills --skill heygen-best-practices
When to use: Any time you're writing code that calls the HeyGen API. Pair with heygen for maximum coverage.
Rule File Map
Foundation
- rules/authentication.md: API key setup, X-Api-Key header, authentication patterns
- rules/quota.md: Credit system, usage limits, checking remaining quota
- rules/video-status.md: Polling patterns, status types, retrieving download URLs
- rules/assets.md: Uploading images, videos, audio for use in video generation
Core Video Creation
- rules/avatars.md: Listing avatars, avatar styles, avatar_id selection
- rules/voices.md: Listing voices, locales, speed/pitch configuration
- rules/scripts.md: Writing scripts, pauses/breaks, pacing, structure templates
- rules/video-generation.md: POST /v2/video/generate workflow and multi-scene videos
- rules/video-agent.md: One-shot prompt video generation with Video Agent API
- rules/dimensions.md: Resolution options (720p/1080p) and aspect ratios
Video Customization
- rules/backgrounds.md: Solid colors, images, and video backgrounds
- rules/text-overlays.md: Adding text with fonts and positioning
- rules/captions.md: Auto-generated captions and subtitle options
Advanced Features
- rules/templates.md: Template listing and variable replacement
- rules/video-translation.md: Translating videos, quality/fast modes, and dubbing
- rules/streaming-avatars.md: Real-time interactive avatar sessions
- rules/photo-avatars.md: Creating avatars from photos (talking photos)
- rules/webhooks.md: Registering webhook endpoints and event types
Integration
- rules/remotion-integration.md: Using HeyGen avatar videos in Remotion compositions
3. text-to-speech
Focused skill for generating standalone audio using HeyGen's Starfish TTS model.
npx skills add https://github.com/heygen-com/skills --skill text-to-speech
Use when: You need audio-only output, such as podcasts, IVR prompts, narration tracks, or audio to pair with custom visuals.
Key Endpoints
| Task | Endpoint |
|---|---|
| List available voices | GET /v1/audio/voices |
| Generate speech | POST /v1/audio/text_to_speech |
Generate Speech
curl -X POST "https://api.heygen.com/v1/audio/text_to_speech" \
  -H "X-Api-Key: $HEYGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the Starfish TTS model.",
    "voice_id": "YOUR_VOICE_ID"
  }'
List Voices
curl "https://api.heygen.com/v1/audio/voices" \
  -H "X-Api-Key: $HEYGEN_API_KEY"
Returns voice_id, locale, gender, and a preview audio URL. These IDs also work for locking a specific voice in POST /v2/video/generate.
MCP equivalents:
- mcp__heygen__text_to_speech: generate audio
- mcp__heygen__list_audio_voices: list available voices
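For callers working in TypeScript rather than the shell, the two curl calls translate roughly as follows. This is a sketch: `ttsRequest` and `voicesRequest` are hypothetical helpers that only build the request, and the response shape is deliberately left unparsed because this guide does not document it.

```typescript
const HEYGEN_BASE = "https://api.heygen.com";

interface RequestSpec {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
}

// Mirrors the "Generate Speech" curl example: POST with text and voice_id.
function ttsRequest(apiKey: string, text: string, voiceId: string): RequestSpec {
  if (text.trim() === "") throw new Error("text must not be empty");
  return {
    url: `${HEYGEN_BASE}/v1/audio/text_to_speech`,
    init: {
      method: "POST",
      headers: { "X-Api-Key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ text, voice_id: voiceId }),
    },
  };
}

// Mirrors the "List Voices" curl example: GET with only the API key header.
function voicesRequest(apiKey: string): RequestSpec {
  return {
    url: `${HEYGEN_BASE}/v1/audio/voices`,
    init: { method: "GET", headers: { "X-Api-Key": apiKey } },
  };
}

// Usage with the built-in fetch (not executed here):
//   const { url, init } = ttsRequest(key, "Hello", "YOUR_VOICE_ID");
//   const res = await fetch(url, init);
```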
Use Cases
- Podcast narration tracks
- IVR / phone system prompts
- Narration to combine with custom animation or Remotion
- A/B testing voices before committing to a full video
4. video-translate
Focused skill for translating and dubbing existing videos into other languages, preserving lip-sync and the original speaker's voice characteristics.
npx skills add https://github.com/heygen-com/skills --skill video-translate
Use when: You have an existing video (by URL or HeyGen video ID) and want to dub it into one or more languages.
Default Workflow
- Provide a video_url or video_id
- Call POST /v2/video_translate with the target language
- Poll GET /v2/video_translate/{translate_id} until status: "completed"
- Download from the returned URL (allow up to 30 minutes)
Translation Modes
| Mode | Quality |
|---|---|
| speed (default) | Good; best for limited facial movement |
| precision | Higher; context-aware, more natural lip-sync |
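A sketch of how a mode choice might be attached to a translate request. The `mode` field name is an assumption that this guide does not confirm, so check video-translation.md before relying on it. The `video_url` and `output_language` fields come from this guide's batch translation example, and the helper names are hypothetical.

```typescript
type TranslationMode = "speed" | "precision";

// Follows the table above: speed suits footage with limited facial movement,
// precision aims for more natural lip-sync.
function pickMode(limitedFacialMovement: boolean): TranslationMode {
  return limitedFacialMovement ? "speed" : "precision";
}

interface TranslateBody {
  video_url: string;
  output_language: string;
  mode: TranslationMode; // assumed field name, not confirmed by this guide
}

function buildTranslateBody(
  videoUrl: string,
  outputLanguage: string,
  mode: TranslationMode = "speed" // the table lists speed as the default
): TranslateBody {
  return { video_url: videoUrl, output_language: outputLanguage, mode };
}
```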
Supported Languages (Sample)
| Language | Code | Language | Code |
|---|---|---|---|
| English (US) | en-US | Japanese | ja-JP |
| Spanish (Spain) | es-ES | Korean | ko-KR |
| Spanish (Mexico) | es-MX | Chinese (Mandarin) | zh-CN |
| French | fr-FR | Hindi | hi-IN |
| German | de-DE | Arabic | ar-SA |
| Portuguese (Brazil) | pt-BR | Italian | it-IT |
175+ languages and dialects supported in total.
Poll for Completion
async function waitForTranslation(translateId: string): Promise<string> {
  const maxWait = 30 * 60 * 1000; // 30 minutes
  const interval = 30_000; // poll every 30 seconds
  const start = Date.now();
  while (Date.now() - start < maxWait) {
    const res = await fetch(
      `https://api.heygen.com/v2/video_translate/${translateId}`,
      { headers: { "X-Api-Key": process.env.HEYGEN_API_KEY! } }
    );
    // Surface HTTP-level errors (bad key, unknown ID) before parsing the body
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);
    const { data } = await res.json();
    if (data.status === "completed") return data.video_url;
    if (data.status === "failed") throw new Error(data.message ?? "Translation failed");
    // Still processing: wait before the next poll
    await new Promise(r => setTimeout(r, interval));
  }
  throw new Error("Timed out");
}
Batch Translation
// Kick off all languages in parallel, then poll each
const languages = ["es-ES", "fr-FR", "de-DE", "ja-JP"];
const jobs = await Promise.all(
  languages.map(lang =>
    fetch("https://api.heygen.com/v2/video_translate", {
      method: "POST",
      headers: {
        "X-Api-Key": process.env.HEYGEN_API_KEY!,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        video_url: "https://example.com/video.mp4",
        output_language: lang
      })
    })
      .then(r => r.json())
      .then(j => ({ lang, id: j.data.video_translate_id }))
  )
);
Best Practices
- Source quality matters — Use high-quality video with clear audio
- Single speaker preferred — Best lip-sync with one speaker at a time
- Frontal face position — Avoid large-angle head movements or rapid cuts
- Moderate speech pace — Very fast speech may reduce translation quality
- Test with a short clip first — Before translating long content
- Allow up to 30 minutes — Translation is slower than video generation
- Enable Dynamic Duration — Automatically adjusts timing when translated audio length differs from the original
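The batch translation example starts every job but leaves the polling step to the reader. One way to wait on all jobs, without letting a single failed language discard the rest, is sketched below. `waitForAll` and `summarize` are hypothetical names, and `waitOne` stands in for a poller such as `waitForTranslation`.

```typescript
interface Job { lang: string; id: string }

type Outcome =
  | { lang: string; ok: true; url: string }
  | { lang: string; ok: false; error: string };

interface BatchResult {
  succeeded: Record<string, string>; // language code -> download URL
  failed: Record<string, string>;    // language code -> error message
}

// Groups per-language outcomes so callers can retry only the failures.
function summarize(outcomes: Outcome[]): BatchResult {
  const result: BatchResult = { succeeded: {}, failed: {} };
  for (const o of outcomes) {
    if (o.ok) result.succeeded[o.lang] = o.url;
    else result.failed[o.lang] = o.error;
  }
  return result;
}

// Polls every job concurrently; a rejection is recorded rather than rethrown.
async function waitForAll(
  jobs: Job[],
  waitOne: (id: string) => Promise<string>
): Promise<BatchResult> {
  const outcomes = await Promise.all(
    jobs.map((job): Promise<Outcome> =>
      waitOne(job.id).then(
        (url): Outcome => ({ lang: job.lang, ok: true, url }),
        (err): Outcome => ({ lang: job.lang, ok: false, error: String(err) })
      )
    )
  );
  return summarize(outcomes);
}
```

Given the `jobs` array from the batch example, `await waitForAll(jobs, waitForTranslation)` would resolve once every language has completed, failed, or timed out.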
Choosing the Right Skill
| You want to... | Use |
|---|---|
| General video work (generate, translate, TTS, Remotion) | heygen |
| Write HeyGen API code with rule-based guidance | heygen-best-practices |
| Generate standalone audio only | text-to-speech |
| Translate or dub an existing video | video-translate |
| Do all of the above | Install all 4 |
The focused skills (text-to-speech, video-translate) are lighter-weight alternatives when your use case is narrow. heygen-best-practices is best paired with heygen for coding tasks.
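For scripted setups, the table above can be reduced to a small lookup that also emits the install command shown earlier in this guide. The function names and the `need` labels are hypothetical.

```typescript
type Skill = "heygen" | "heygen-best-practices" | "text-to-speech" | "video-translate";
type Need = "general" | "api-code" | "audio-only" | "translate-only";

// Maps a use case to the skills this guide recommends for it.
function recommendSkills(need: Need): Skill[] {
  switch (need) {
    case "audio-only":
      return ["text-to-speech"];
    case "translate-only":
      return ["video-translate"];
    case "api-code":
      // The guide suggests pairing heygen-best-practices with heygen for coding tasks.
      return ["heygen", "heygen-best-practices"];
    default:
      return ["heygen"];
  }
}

// Builds the per-skill install command used throughout this guide.
function installCommand(skill: Skill): string {
  return `npx skills add https://github.com/heygen-com/skills --skill ${skill}`;
}
```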
MCP Server (Alternative to Skills)
{
  "mcpServers": {
    "HeyGen": {
      "command": "uvx",
      "args": ["heygen-mcp"],
      "env": { "HEYGEN_API_KEY": "<your-api-key>" }
    }
  }
}
Available MCP tools: get_remaining_credits, get_voices, get_avatar_groups, get_avatars_in_avatar_group, generate_avatar_video, get_avatar_video_status.
Related Resources
| Resource | URL |
|---|---|
| Skills Marketplace | skills.sh/heygen-com/skills |
| GitHub Repository | github.com/heygen-com/skills |
| HeyGen Developer Portal | API Dashboard |
| HeyGen MCP Server | MCP Server |