Voiceover
Add professional narration to your videos with text-to-speech generation and automatic scene duration sizing.
How It Works
Write Narration
Each scene card in the editor includes a text area for voiceover copy. Write what should be spoken during that scene.
Generate Speech
Click “Generate VO” in the editor header. The system sends each scene's text to ElevenLabs, generates an MP3 clip, measures its duration, and uploads it to cloud storage.
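As a rough illustration of the generation step, the sketch below builds (but does not send) a request of the kind the public ElevenLabs text-to-speech REST endpoint accepts. How the platform actually calls ElevenLabs is not documented here. The helper name, the model ID, and the placeholder voice ID and API key are all assumptions.

```python
import json
import urllib.request

# Public ElevenLabs text-to-speech endpoint; the voice ID goes in the path.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for one scene's narration (hypothetical helper)."""
    body = json.dumps({
        "text": text,
        # Assumed model; this page only says "ElevenLabs multilingual TTS".
        "model_id": "eleven_multilingual_v2",
    }).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # ask for MP3 bytes in the response
        },
    )


# Placeholders only; nothing is sent over the network here.
req = build_tts_request("Welcome to the demo.", "VOICE_ID_PLACEHOLDER", "API_KEY_PLACEHOLDER")
```

Sending the request (for example with `urllib.request.urlopen(req)`) would return the audio bytes, which could then be probed for duration and uploaded.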
Auto-Size Durations
Scene durations are automatically adjusted to the length of the VO audio plus a configurable padding (default: 0.5 seconds), so narration is never cut off and scenes don't linger after it ends.
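The sizing rule can be sketched as a one-line calculation. This is a minimal sketch, not the platform's implementation: the function name is hypothetical, and the 2.0-second minimum is an assumed value, since the actual minimum duration is not documented here.

```python
DEFAULT_PADDING_S = 0.5      # default padding stated on this page
MIN_SCENE_DURATION_S = 2.0   # hypothetical minimum; the real value is not documented


def auto_size_scene(vo_duration_s: float,
                    padding_s: float = DEFAULT_PADDING_S,
                    min_duration_s: float = MIN_SCENE_DURATION_S) -> float:
    """Scene length = narration length + padding, never below the minimum."""
    return max(vo_duration_s + padding_s, min_duration_s)


print(auto_size_scene(4.25))  # 4.75
print(auto_size_scene(0.75))  # 2.0 (the minimum wins for very short VO)
```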
Preview & Iterate
Play VO clips inline in each scene card. Edit the text and regenerate as needed before rendering the final video.
Features
Text-to-Speech
- Powered by ElevenLabs multilingual TTS
- 18 curated voices (10 male, 8 female) with varied accents
- High-quality MP3 output
- Per-scene generation: only regenerate what changed
Voice Selection
- Set a site-wide default voice in Settings > Branding
- Override per scene with the voice picker dropdown
- Mix voices within a video, with a different narrator per scene
- Paste any ElevenLabs voice ID as a custom voice
- Bring your own ElevenLabs API key for cloned voices
- Fallback chain: scene → video → custom → site default → Brian
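The fallback chain amounts to "use the first voice that is set". A minimal sketch, assuming each level is either a voice ID string or unset; the function and parameter names are hypothetical, and only the order and the final "Brian" fallback come from this page.

```python
from typing import Optional

BUILT_IN_FALLBACK = "Brian"  # last resort named in the fallback chain


def resolve_voice(scene: Optional[str] = None,
                  video: Optional[str] = None,
                  custom: Optional[str] = None,
                  site_default: Optional[str] = None) -> str:
    """Return the first voice that is set, in fallback-chain order."""
    for candidate in (scene, video, custom, site_default):
        if candidate:
            return candidate
    return BUILT_IN_FALLBACK


print(resolve_voice(video="voice_b", site_default="voice_d"))  # voice_b
print(resolve_voice())                                         # Brian
```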
Auto-Sizing
- Duration measured via audio probe
- Configurable padding after narration
- Minimum duration enforced if VO is very short
- Total video duration updated automatically
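The list above does not name the probe tool. The sketch below assumes `ffprobe` (part of FFmpeg) as one common way to measure a clip, and sums scene durations to get the video total. All function names are hypothetical.

```python
import subprocess


def ffprobe_duration_cmd(path: str) -> list[str]:
    """Command that prints only the container duration, in seconds."""
    return ["ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path]


def probe_duration_s(path: str) -> float:
    """Run ffprobe (must be installed) and parse its output as a float."""
    result = subprocess.run(ffprobe_duration_cmd(path),
                            capture_output=True, text=True, check=True)
    return float(result.stdout.strip())


def total_duration_s(scene_durations_s: list[float]) -> float:
    """Total video duration is the sum of the sized scene durations."""
    return sum(scene_durations_s)


print(total_duration_s([4.75, 2.0, 6.25]))  # 13.0
```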
Editor Integration
- Inline text area per scene card
- Play / pause audio preview
- VO duration shown alongside scene duration
- One-click regeneration from header
Storage & Delivery
- Audio files uploaded to cloud storage
- CDN-backed public URLs
- Organized per video and scene
- Previous versions overwritten on regeneration
Bring Your Own ElevenLabs Key
By default, voiceover generation uses the platform's shared ElevenLabs account. If you need access to custom cloned voices or want to use your own usage quota, you can connect your own ElevenLabs API key.
Setup
Go to Settings > Integrations > ElevenLabs and paste your API key. All voiceover generation will immediately switch to your account.
What It Unlocks
- Use your own cloned or custom voices
- Access professional voice cloning features
- Use your own ElevenLabs usage quota
- Generation falls back to the platform key if you remove yours
Audio Mixing
When you render a video, voiceover audio is automatically mixed into the final MP4. Each scene's VO clip is timeline-aligned to start at the correct timestamp using FFmpeg delay filters, then overlaid onto the visual track as AAC audio.
Timeline Alignment
Each VO clip is delayed by the cumulative duration of preceding scenes, so narration starts exactly when its scene appears.
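The offset calculation is a running sum. A minimal sketch with a hypothetical function name, returning milliseconds because FFmpeg's delay filter takes millisecond values by default:

```python
def scene_start_offsets_ms(scene_durations_s: list[float]) -> list[int]:
    """Start time of each scene = total duration of all scenes before it."""
    offsets = []
    elapsed_s = 0.0
    for duration_s in scene_durations_s:
        offsets.append(round(elapsed_s * 1000))
        elapsed_s += duration_s
    return offsets


print(scene_start_offsets_ms([4.75, 2.0, 6.25]))  # [0, 4750, 6750]
```

The first scene always starts at 0, so its VO clip needs no delay.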
Multi-Track Mixing
Multiple VO tracks are mixed together using FFmpeg's amix filter, producing a single audio stream that spans the entire video.
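To make the mixing step concrete, the sketch below assembles an FFmpeg `-filter_complex` string that delays each VO input with `adelay` and combines the results with `amix`. The platform's exact FFmpeg command is not documented here. The function name, the stream labels, and the convention that input 0 is the video and inputs 1..N are the VO clips are assumptions.

```python
def build_vo_filter_graph(delays_ms: list[int]) -> str:
    """Build a filter graph: one adelay per VO input, then a single amix."""
    if not delays_ms:
        raise ValueError("at least one VO clip is required")
    parts = []
    labels = []
    for index, delay in enumerate(delays_ms, start=1):
        label = f"[vo{index}]"
        # adelay takes one delay per channel, separated by "|" (stereo here).
        parts.append(f"[{index}:a]adelay={delay}|{delay}{label}")
        labels.append(label)
    # Mix every delayed track into one stream that lasts as long as the longest.
    parts.append(f"{''.join(labels)}amix=inputs={len(labels)}:duration=longest[aout]")
    return ";".join(parts)


graph = build_vo_filter_graph([0, 4750])
print(graph)
# [1:a]adelay=0|0[vo1];[2:a]adelay=4750|4750[vo2];[vo1][vo2]amix=inputs=2:duration=longest[aout]
```

A string like this would typically be passed to `ffmpeg` together with `-map 0:v -map "[aout]" -c:a aac` so the mixed stream is encoded as AAC alongside the video.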
Graceful Fallback
If no scenes have voiceover audio, the render produces a silent MP4 — no errors, no empty audio tracks.
MCP Workflow
The full voiceover workflow is available through MCP tools for AI-assisted video creation. When creating a video via MCP, the AI will ask whether you want voiceover if you don't specify. Use list_voices to browse available voices, and set per-scene voices via voiceover.voiceId on each scene.
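A scene list with per-scene voiceover settings might look like the sketch below. Only the `voiceover.text` and `voiceover.voiceId` fields and the `list_voices` tool come from this page. The surrounding `scenes` wrapper, the placeholder voice ID, and the sample copy are assumptions for illustration.

```python
import json

# Hypothetical payload shape; only voiceover.text and voiceover.voiceId are documented.
scenes = [
    {
        "voiceover": {
            "text": "Meet the new dashboard.",
            "voiceId": "VOICE_ID_FROM_LIST_VOICES",  # placeholder, pick one via list_voices
        }
    },
    {
        # No voiceId here, so the voice fallback chain would decide the narrator.
        "voiceover": {"text": "Everything you need, in one place."}
    },
]

payload = json.dumps({"scenes": scenes}, indent=2)
print(payload)
```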
Provide the narration copy via voiceover.text on each scene.
Writing Effective Narration
Do
- Write conversationally, as if speaking to a colleague
- Keep sentences short and punchy
- Narrate the story, don't read on-screen text
- Add natural pauses with punctuation
- Match the VO tone to the scene mood
Avoid
- Reading headlines or button text verbatim
- Long, complex sentences
- Jargon your audience won't understand
- Narrating scenes that work better silent
- Rushing; let auto-sizing handle timing