Voiceover

Add professional narration to your videos with text-to-speech generation and automatic scene duration sizing.

How It Works

1. Write Narration

Each scene card in the editor includes a text area for voiceover copy. Write what should be spoken during that scene.

2. Generate Speech

Click “Generate VO” in the editor header. The system sends each scene's text to ElevenLabs, generates an MP3 clip, measures its duration, and uploads it to cloud storage.
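
As a rough sketch of the per-scene request this step implies, the snippet below builds (but does not send) a call to the public ElevenLabs text-to-speech REST endpoint. The function name `build_tts_request` and the choice of the `eleven_multilingual_v2` model are assumptions for illustration; the platform's actual request, model, and voice settings are not documented here.

```python
import json
import urllib.request

# Public ElevenLabs text-to-speech endpoint; the voice ID is part of the path.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a text-to-speech request for one scene's narration."""
    body = json.dumps({
        "text": text,
        # Assumption: a multilingual model; the platform's real model is not stated.
        "model_id": "eleven_multilingual_v2",
    }).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # ask for MP3 audio back
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call and yield the MP3 bytes, which could then be measured and uploaded.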

3. Auto-Size Durations

Scene durations are automatically adjusted to match the VO audio length plus configurable padding (default: 0.5 seconds). This ensures narration never gets cut off and scenes don't linger.

4. Preview & Iterate

Play VO clips inline in each scene card. Edit the text and regenerate as needed before rendering the final video.

Features

Text-to-Speech

  • Powered by ElevenLabs multilingual TTS
  • 18 curated voices (10 male, 8 female) with varied accents
  • High-quality MP3 output
  • Per-scene generation: only regenerate what changed
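
One way per-scene regeneration could be decided is by fingerprinting each scene's narration text and voice, then regenerating only the scenes whose fingerprint no longer matches the one stored at generation time. This is a minimal sketch under that assumption; the field names (`id`, `text`, `voice_id`, `generated_fingerprint`) are hypothetical and the platform's real change-detection logic is not described here.

```python
import hashlib


def vo_fingerprint(text: str, voice_id: str) -> str:
    """Hash the inputs that determine what a scene's VO clip sounds like."""
    payload = f"{voice_id}\n{text.strip()}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def scenes_needing_regeneration(scenes: list[dict]) -> list[str]:
    """Return IDs of scenes whose text or voice changed since the last generation."""
    stale = []
    for scene in scenes:
        current = vo_fingerprint(scene["text"], scene["voice_id"])
        # A missing fingerprint means the scene has never been generated.
        if scene.get("generated_fingerprint") != current:
            stale.append(scene["id"])
    return stale
```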

Voice Selection

  • Set a site-wide default voice in Settings > Branding
  • Override per scene with the voice picker dropdown
  • Mix voices within a video, with a different narrator per scene
  • Paste any ElevenLabs voice ID as a custom voice
  • Bring your own ElevenLabs API key for cloned voices
  • Fallback chain: scene → video → custom → site default → Brian
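
The fallback chain above can be read as "use the first voice that is set, in order". A minimal sketch of that lookup follows; the function and parameter names are illustrative, not the platform's actual code.

```python
from typing import Optional

# Last resort named in the fallback chain.
FINAL_FALLBACK_VOICE = "Brian"


def resolve_voice(
    scene_voice: Optional[str] = None,
    video_voice: Optional[str] = None,
    custom_voice: Optional[str] = None,
    site_default: Optional[str] = None,
) -> str:
    """Return the first configured voice: scene, video, custom, site default, then Brian."""
    for candidate in (scene_voice, video_voice, custom_voice, site_default):
        if candidate:  # skips None and empty strings
            return candidate
    return FINAL_FALLBACK_VOICE
```

For example, `resolve_voice(video_voice="Alice", site_default="George")` picks the video-level voice because no scene-level override is set.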

Auto-Sizing

  • Duration measured via audio probe
  • Configurable padding after narration
  • Minimum duration enforced if VO is very short
  • Total video duration updated automatically
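
The sizing rule described above amounts to "VO length plus padding, but never below a minimum". A small sketch of that arithmetic is shown here; the 0.5-second padding is the documented default, while the 2.0-second minimum is an assumed placeholder because the real minimum is not stated.

```python
def auto_size_scene(vo_seconds: float, padding: float = 0.5, min_seconds: float = 2.0) -> float:
    """Scene duration = VO length + padding, floored at a minimum duration."""
    # min_seconds=2.0 is an assumption for illustration only.
    return max(min_seconds, vo_seconds + padding)


def total_video_duration(vo_lengths: list[float], padding: float = 0.5,
                         min_seconds: float = 2.0) -> float:
    """Sum of all auto-sized scene durations."""
    return sum(auto_size_scene(v, padding, min_seconds) for v in vo_lengths)
```

With these values, a 3.25-second clip yields a 3.75-second scene, while a 0.5-second clip is held at the 2.0-second minimum.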

Editor Integration

  • Inline text area per scene card
  • Play / pause audio preview
  • VO duration shown alongside scene duration
  • One-click regeneration from header

Storage & Delivery

  • Audio files uploaded to cloud storage
  • CDN-backed public URLs
  • Organized per video and scene
  • Previous versions overwritten on regeneration

Bring Your Own ElevenLabs Key

By default, voiceover generation uses the platform's shared ElevenLabs account. If you need access to custom cloned voices or want to use your own usage quota, you can connect your own ElevenLabs API key.

Setup

Go to Settings > Integrations > ElevenLabs and paste your API key. All voiceover generation will immediately switch to your account.

What It Unlocks

  • Use your own cloned or custom voices
  • Access professional voice cloning features
  • Use your own ElevenLabs usage quota
  • Generation falls back to the platform key if you remove yours

Audio Mixing

When you render a video, voiceover audio is automatically mixed into the final MP4. Each scene's VO clip is timeline-aligned to start at the correct timestamp using FFmpeg delay filters, then overlaid onto the visual track as AAC audio.

Timeline Alignment

Each VO clip is delayed by the cumulative duration of preceding scenes, so narration starts exactly when its scene appears.
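
To make the alignment rule concrete, the sketch below computes each clip's start offset as the running total of the scene durations before it. Offsets are returned in milliseconds because that is the unit FFmpeg's `adelay` filter expects; the function name is illustrative.

```python
from itertools import accumulate


def vo_start_offsets_ms(scene_durations: list[float]) -> list[int]:
    """Start offset of each scene's VO clip, in milliseconds.

    Scene i starts after the cumulative duration of scenes 0..i-1,
    so the first scene always starts at 0.
    """
    starts = [0.0, *accumulate(scene_durations)][:-1]
    return [round(s * 1000) for s in starts]
```

For scene durations of 3.75, 2.0, and 4.5 seconds, the clips start at 0 ms, 3750 ms, and 5750 ms.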

Multi-Track Mixing

Multiple VO tracks are mixed together using FFmpeg's amix filter, producing a single audio stream that spans the entire video.
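
A filter graph for this kind of mix might look like the one built below: each VO input is delayed with `adelay`, then all delayed streams are combined with `amix`. This is a sketch under stated assumptions, not the platform's actual graph: it assumes input 0 is the rendered video and VO clips follow as inputs 1..N, and the labels `vo0`, `vo1`, and `aout` are arbitrary.

```python
def build_vo_filter(delays_ms: list[int]) -> str:
    """Build an FFmpeg -filter_complex string that delays and mixes VO clips.

    Assumes input 0 is the video and VO clip i is input i + 1.
    Returns an empty string when there are no clips to mix.
    """
    if not delays_ms:
        return ""
    chains = []
    labels = []
    for i, delay in enumerate(delays_ms):
        label = f"[vo{i}]"
        # adelay takes one delay per channel, separated by '|';
        # repeating the value covers both channels of a stereo clip.
        chains.append(f"[{i + 1}:a]adelay={delay}|{delay}{label}")
        labels.append(label)
    joined = "".join(labels)
    # amix combines all delayed streams into one; run until the longest ends.
    mix = f"{joined}amix=inputs={len(labels)}:duration=longest[aout]"
    return ";".join(chains + [mix])
```

The resulting string would be passed to `ffmpeg -filter_complex`, with `[aout]` mapped as the output audio stream and encoded as AAC.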

Graceful Fallback

If no scenes have voiceover audio, the render produces a silent MP4 — no errors, no empty audio tracks.

MCP Workflow

The full voiceover workflow is available through MCP tools for AI-assisted video creation. When creating a video via MCP, the AI will ask whether you want voiceover if you don't specify. Use list_voices to browse available voices, and set per-scene voices via voiceover.voiceId on each scene.

1. create_video: Include voiceover.text on each scene
2. generate_voiceover: Generates MP3 clips and auto-sizes scene durations
3. render_video: Renders visuals and mixes VO audio into the final MP4
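
For illustration, the arguments an MCP client sends to create_video might be assembled as below. Only `voiceover.text` and `voiceover.voiceId` come from this page; the surrounding shape (`title`, `scenes`) and the helper name are assumptions, so check the tool's published schema before relying on them.

```python
from typing import Optional


def build_create_video_args(title: str, scenes: list[tuple[str, Optional[str]]]) -> dict:
    """Assemble hypothetical create_video arguments from (text, voice_id) pairs.

    voice_id may be None, in which case voiceId is omitted and the
    platform's voice fallback chain applies.
    """
    scene_args = []
    for text, voice_id in scenes:
        voiceover = {"text": text}
        if voice_id:
            voiceover["voiceId"] = voice_id  # per-scene voice override
        scene_args.append({"voiceover": voiceover})
    # "title" and "scenes" are assumed key names, not confirmed by this page.
    return {"title": title, "scenes": scene_args}
```

After create_video returns, the same client would call generate_voiceover and then render_video for the new video.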

Writing Effective Narration

Do

  • Write conversationally, as if speaking to a colleague
  • Keep sentences short and punchy
  • Narrate the story, don't read on-screen text
  • Add natural pauses with punctuation
  • Match the VO tone to the scene mood

Avoid

  • Reading headlines or button text verbatim
  • Long, complex sentences
  • Jargon your audience won't understand
  • Narrating for scenes that work better silent
  • Rushing; let auto-sizing handle timing