Video Voiceover

Add narration to your videos with text-to-speech generation and scene durations that size automatically to match each generated audio clip.

How It Works

1. Write Narration

Each scene card in the editor includes a text area for voiceover copy. Write what should be spoken during that scene.

2. Generate Speech

Click "Generate VO" in the editor header. The system sends each scene's text to ElevenLabs, generates an MP3 clip, measures its duration, and uploads it to cloud storage.

3. Auto-Size Durations

Scene durations are automatically adjusted to match the VO audio length plus configurable padding (default: 0.5 seconds). This ensures narration never gets cut off and scenes don't linger.

4. Preview & Iterate

Play VO clips inline in each scene card. Edit the text and regenerate as needed before rendering the final video.

Features

Text-to-Speech

Powered by ElevenLabs multilingual TTS
18 curated voices (10 male, 8 female) with varied accents
High-quality MP3 output
Per-scene generation: regenerate only the scenes that changed
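
One way to decide which scenes need regenerating is to fingerprint the inputs that determine the audio. The sketch below is illustrative only: the `vo_fingerprint` and `scenes_to_regenerate` helpers and the scene fields (`id`, `text`, `voice_id`, `last_fingerprint`) are hypothetical, and the platform's actual change detection may work differently.

```python
import hashlib


def vo_fingerprint(text: str, voice_id: str) -> str:
    """Stable fingerprint of the inputs that determine a scene's audio."""
    payload = f"{voice_id}\n{text.strip()}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def scenes_to_regenerate(scenes: list[dict]) -> list[str]:
    """Return ids of scenes whose text or voice changed since the last run.

    Each scene dict is a hypothetical shape: 'id', 'text', 'voice_id', and an
    optional 'last_fingerprint' stored when its clip was last generated.
    """
    stale = []
    for scene in scenes:
        current = vo_fingerprint(scene["text"], scene["voice_id"])
        if scene.get("last_fingerprint") != current:
            stale.append(scene["id"])
    return stale
```

A scene with no stored fingerprint is treated as never generated, so it is always included.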

Voice Selection

Set a site-wide default voice in Settings > Branding
Override per scene with the voice picker dropdown
Mix voices within a video, with a different narrator per scene
Paste any ElevenLabs voice ID as a custom voice
Bring your own ElevenLabs API key for cloned voices
Fallback chain: scene → video → custom → site default → Brian
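
The fallback chain above can be read as "first configured voice wins." The following is a minimal sketch of that lookup, not the platform's implementation; the `resolve_voice` function and its parameter names are assumptions made for illustration.

```python
from typing import Optional

# Last link in the documented chain; used when nothing else is configured.
FINAL_FALLBACK_VOICE = "Brian"


def resolve_voice(
    scene_voice: Optional[str] = None,
    video_voice: Optional[str] = None,
    custom_voice: Optional[str] = None,
    site_default: Optional[str] = None,
) -> str:
    """Walk scene -> video -> custom -> site default -> Brian.

    Empty or whitespace-only values are treated as "not set" (an assumption).
    """
    for candidate in (scene_voice, video_voice, custom_voice, site_default):
        if candidate and candidate.strip():
            return candidate.strip()
    return FINAL_FALLBACK_VOICE
```

For example, a scene with no override in a video that sets its own voice resolves to the video-level voice, regardless of the site default.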

Auto-Sizing

Duration measured via audio probe
Configurable padding after narration
Minimum duration enforced if VO is very short
Total video duration updated automatically
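
The sizing rule amounts to simple arithmetic: clip length plus padding, clamped to a minimum. Here is a small sketch under stated assumptions: the 0.5-second padding default comes from this page, while the 2.0-second minimum and the function names are hypothetical placeholders.

```python
def auto_size_duration(
    vo_seconds: float,
    padding: float = 0.5,       # documented default padding
    min_duration: float = 2.0,  # hypothetical minimum; the real value may differ
) -> float:
    """Scene duration = VO length + padding, never below the minimum."""
    if vo_seconds < 0:
        raise ValueError("vo_seconds must be non-negative")
    return max(vo_seconds + padding, min_duration)


def total_video_duration(vo_lengths: list[float]) -> float:
    """Total duration is the sum of the auto-sized scene durations."""
    return sum(auto_size_duration(length) for length in vo_lengths)
```

With these assumed values, a 4.25-second clip produces a 4.75-second scene, while a 0.75-second clip is raised to the 2.0-second minimum.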

Editor Integration

Inline text area per scene card
Play / pause audio preview
VO duration shown alongside scene duration
One-click regeneration from header

Storage & Delivery

Audio files uploaded to cloud storage
CDN-backed public URLs
Organized per video and scene
Previous versions overwritten on regeneration

Bring Your Own ElevenLabs Key

By default, voiceover generation uses the platform's shared ElevenLabs account. If you need access to custom cloned voices or want to use your own usage quota, you can connect your own ElevenLabs API key.

Setup

Go to Settings > Integrations > ElevenLabs and paste your API key. All voiceover generation will immediately switch to your account.

What It Unlocks

Use your own cloned or custom voices
Access professional voice cloning features
Use your own ElevenLabs usage quota
Falls back to the platform key if your key is removed

Audio Mixing

When you render a video, voiceover audio is automatically mixed into the final MP4. Each scene's VO clip is timeline-aligned to start at the correct timestamp using FFmpeg delay filters, then overlaid onto the visual track as AAC audio.

Timeline Alignment

Each VO clip is delayed by the cumulative duration of preceding scenes, so narration starts exactly when its scene appears.
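
As a rough illustration, the delay for each clip is a running sum of the scene durations that precede it. The helper name below is hypothetical, and the result is in milliseconds because that is the unit FFmpeg's adelay filter expects.

```python
from itertools import accumulate


def scene_start_offsets_ms(scene_durations: list[float]) -> list[int]:
    """Start offset of each scene in milliseconds.

    The first scene starts at 0; every later scene starts at the cumulative
    duration (in seconds) of all scenes before it.
    """
    starts = [0.0, *accumulate(scene_durations)][:-1]
    return [round(start * 1000) for start in starts]
```

For scene durations of 4.75, 2.0, and 3.5 seconds, the clips would be delayed by 0, 4750, and 6750 milliseconds respectively.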

Multi-Track Mixing

Multiple VO tracks are mixed together using FFmpeg's amix filter, producing a single audio stream that spans the entire video.
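
To make the mixing step concrete, here is a sketch that builds an FFmpeg filter graph from per-clip delays using the adelay and amix filters. It assumes the video is input 0 and the VO clips follow as inputs 1, 2, and so on; the function name, stream labels, and input ordering are illustrative and may not match the actual render pipeline.

```python
from typing import Optional


def build_vo_filter_graph(
    delays_ms: list[int], first_audio_input: int = 1
) -> Optional[str]:
    """Build a -filter_complex string that delays each clip, then mixes them.

    Returns None when there are no VO clips, so the caller can skip audio.
    """
    if not delays_ms:
        return None
    if len(delays_ms) == 1:
        # A single clip needs no mixing; just delay it into the output label.
        d = delays_ms[0]
        return f"[{first_audio_input}:a]adelay={d}|{d}[aout]"

    parts, labels = [], []
    for i, delay in enumerate(delays_ms):
        label = f"[vo{i}]"
        # "delay|delay" applies the same delay to the left and right channels.
        parts.append(f"[{first_audio_input + i}:a]adelay={delay}|{delay}{label}")
        labels.append(label)
    parts.append(f"{''.join(labels)}amix=inputs={len(delays_ms)}[aout]")
    return ";".join(parts)
```

The resulting string would be passed to ffmpeg's -filter_complex option, with the [aout] label mapped as the audio stream and encoded as AAC.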

Graceful Fallback

If no scenes have voiceover audio, the render produces a silent MP4 with no errors and no empty audio tracks.

MCP Workflow

The full voiceover workflow is available through MCP tools for AI-assisted video creation. When creating a video via MCP, the AI will ask whether you want voiceover if you don't specify. Use list_voices to browse available voices, and set per-scene voices via voiceover.voiceId on each scene.

create_video: Include voiceover.text on each scene
generate_voiceover: Generates MP3 clips and auto-sizes scene durations
render_video: Renders visuals and mixes VO audio into the final MP4
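
For orientation, the arguments an MCP client might pass to create_video could look like the sketch below. Only voiceover.text and voiceover.voiceId come from this page; the top-level `scenes` key, the helper function, and the example voice ID are assumptions, so check the tool's schema before relying on them.

```python
import json
from typing import Optional


def scene_with_voiceover(text: str, voice_id: Optional[str] = None) -> dict:
    """Build one scene entry with narration and an optional per-scene voice."""
    voiceover = {"text": text}
    if voice_id is not None:
        voiceover["voiceId"] = voice_id  # omit to use the fallback chain
    return {"voiceover": voiceover}


# Hypothetical create_video arguments: the "scenes" key is an assumption.
create_video_args = {
    "scenes": [
        scene_with_voiceover("Welcome to the quarterly product update."),
        scene_with_voiceover(
            "Here is what changed this release.",
            voice_id="example-voice-id",  # placeholder, not a real voice ID
        ),
    ]
}

print(json.dumps(create_video_args, indent=2))
```

After the video is created, generate_voiceover and render_video would be called in that order to produce the narrated MP4.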

Writing Effective Narration

Write conversationally, as if speaking to a colleague
Keep sentences short and punchy
Narrate the story; don't read on-screen text
Add natural pauses with punctuation
Match the VO tone to the scene mood

Avoid

Reading headlines or button text verbatim
Long, complex sentences
Jargon your audience won't understand
Narrating for scenes that work better silent
Rushing: let auto-sizing handle the timing

Templates: Scene building blocks
Rendering: Pipeline and exports
Batch & Pacing: Variations and platform pacing