This API now generates videos with our New AI Studio backend.
Request Body
Field  | Type  | Description  | 
|---|---|---|
caption  | bool (optional)  | Whether to add a caption to the video. Default is   | 
title  | str (optional)  | Title for the video.  | 
callback_id  | str (optional)  | A custom ID for callback purposes.  | 
video_inputs  | List of video input settings (scenes). Must contain between 1 to 50 items. A video input describes the avatar, background, voice, and script, which together equals a 'scene'.  | |
dimension  | The dimensions of the output video.  | |
folder_id  | str (optional)  | Allows them to specify the video output folder destination.  | 
callback_url  | str(optional)  | An optional callback url. This is useful if your callback endpoint is dynamic and each video have it's separate callback url.  | 
VideoInput
Field  | Type  | Description  | 
|---|---|---|
character  | AvatarSettings or TalkingPhotoSettings (optional).  | Character settings.  | 
voice  | TextVoiceSettings or AudioVoiceSettings or SilenceVoiceSettings  | Voice settings.  | 
background  | ColorBackground or ImageBackground or VideoBackground (optional)  | Background settings.  | 
Character Settings
AvatarSettings
| Field | Type | Description | 
|---|---|---|
| type | Literal["avatar"] | Indicates that this is an avatar character setting. | 
| avatar_id | str | Avatar ID. Please note that this is NOT the Avatar Group ID; they are different. | 
| scale | float | Avatar scale, value between 0 and 5.0. Default is 1.0. Use the Avatar Positioning tool for easier adjustment.  | 
| avatar_style | CharacterRenderType (optional) | Avatar style. Supported values are: circle, normal, closeUp. | 
| offset | Offset | Avatar offset. Default is { "x": 0.0, "y": 0.0 }. Use the Avatar Positioning tool for easier adjustment.  | 
| matting | bool (optional) | Whether to do matting | 
| circle_background_color | str (optional) | background color in the circle when using circle style | 
Note: Currently, background-removed custom avatars are not supported in the API.
TalkingPhotoSettings
Field  | Type  | Description  | 
|---|---|---|
type  | Literalst Body [block:p  | Indicates that this is a talking photo character setting.  | 
talking_photo_id  | str  | Talking Photo ID.  | 
scale  | float  | Talking Photo scale, value between   | 
talking_photo_style  | TACropStyle (optional)  | Talking Photo crop style. Supported values are:   | 
offset  | Offset  | Talking Photo offset.   | 
talking_style  | TPExpression  | Talking Photo talking style. Default is   | 
expression  | TPExpressionStyle  | Talking Photo expression style. Default is   | 
super_resolution  | bool (optional)  | Whether to enhance this photar image.  | 
matting  | bool (optional)  | Whether to do matting.  | 
circle_background_color  | str (optional)  | background color in the circle/square when using circle/square style  | 
Voice Settings
TextVoiceSettings
| Field | Type | Description | 
|---|---|---|
| type | Literal["text"] | Indicates that this is a text voice setting. | 
| voice_id | str | Voice ID. | 
| input_text | str | Input text. | 
| speed | float (optional) | Voice speed, value between 0.5 and 1.5. Default is 1. | 
| pitch | int (optional) | Voice pitch, value between -50 and 50. Default is 0. | 
| emotion | str (optional) | Voice emotion, if voice support emotion. value are ['Excited','Friendly','Serious','Soothing','Broadcaster'] | 
| locale | str (optional) | Allows to specify voice accents/locales for multilingual voices. (e.g., en-US, en-IN, pt-PT, pt-BR ) | 
| elevenlabs_settings | ElevenLabsSettings object (optional) | ElevenLabs specific voice settings. | 
ElevenLabsSettings:
| Field | Type | Description | 
|---|---|---|
| model | string | The ElevenLabs model to use. Valid options: eleven_monolingual_v1, eleven_multilingual_v1, eleven_multilingual_v2, eleven_turbo_v2, eleven_turbo_v2_5 | 
| similarity_boost | float | Controls how similar the generated speech should be to the original voice. Range: 0.0 to 1.0 | 
| stability | float | Controls the stability of the voice generation. Higher values result in more consistent and stable output. Range: 0.0 to 1.0 | 
| style | float | Controls the style intensity of the generated speech. Range: 0.0 to 1.0 | 
AudioVoiceSettings
| Field | Type | Description | 
|---|---|---|
| type | Literal["audio"] | Indicates that this is an audio voice setting. | 
| audio_url | str (optional) | Audio URL. | 
| audio_asset_id | str (optional) | Audio asset ID. Either audio_url or audio_asset_id must be provided. | 
SilenceVoiceSettings
| Field | Type | Description | 
|---|---|---|
| type | Literal["silence"] | Indicates that this is a silence voice setting. | 
| duration | float | Duration of silence, value between 1.0 and 100.0. Default is 1.0. | 
Background Settings
ColorBackground
| Field | Type | Description | 
|---|---|---|
| type | Literal["color"] | Indicates that this is a color background setting. Default is color. | 
| value | str | Color value in hex format. Default is #f6f6fc. | 
ImageBackground
| Field | Type | Description | 
|---|---|---|
| type | Literal["image"] | Indicates that this is an image background setting. | 
| url | str (optional) | Image URL. | 
| image_asset_id | str (optional) | Image asset ID. Either url or image_asset_id must be provided. | 
| fit | str (optional) | Background image fit to the screen. Choose among cover , crop, contain and none. Default is cover | 
VideoBackground
| Field | Type | Description | 
|---|---|---|
| type | Literal["video"] | Indicates that this is a video background setting. | 
| url | str (optional) | Video URL. | 
| video_asset_id | str (optional) | Video asset ID. Either url or video_asset_id must be provided. | 
| play_style | VideoPlayback | Video play style. Supported values are: fit_to_scene, freeze, loop, once. More Info | 
| fit | str (optional) | Background video fit to the screen. Choose among cover , crop, contain and none. Default is cover | 
Response
| Field | Type | Description | 
|---|---|---|
| video_id | str | ID of the generated video. | 
