Voiceover is the spine: if the read breathes, the pictures feel real. Pick intent first: warm for explainers, bright for hooks, calm for product pages, neutral for training. Write for the ear: short sentences, contractions, and numbers the way people say them ("two thousand twenty-five", not "2025"). Add light stage directions in brackets only when needed: [smile], [quick pause], [softer].
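Spelling numbers out by hand gets tedious on longer scripts, so here is a minimal sketch of how it could be automated. The function names `say_number` and `spell_out_digits` are made up for illustration, and the sketch only covers whole numbers from 0 to 9999. Decimals, comma-grouped figures and anything longer are left alone for you to rewrite by ear.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def say_number(n: int) -> str:
    """Spell out a whole number from 0 to 9999 as it is read aloud."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0 to 9999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + say_number(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return ONES[thousands] + " thousand" + (" " + say_number(rest) if rest else "")


def spell_out_digits(script: str) -> str:
    """Replace standalone 1 to 4 digit numbers; skip decimals like 3.5 and 10,000."""
    pattern = r"(?<![\d.,])\b\d{1,4}\b(?![.,]?\d)"
    return re.sub(pattern, lambda m: say_number(int(m.group())), script)


print(spell_out_digits("Launching in 2025, with 3 plans."))
```

Read the result out loud before recording. A year might sound better as "twenty twenty-five" in your brand voice, and that is a judgment call this sketch does not make.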
Match words to picture: about 150 words per minute for explainers, 170 to 190 for short social hooks, 135 to 155 for tutorials. Record voice first when you can and cut visuals to it. If you must fit an existing cut, trim adjectives, split long clauses, and leave one second of silence at the start so the first word is not crushed by music.
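The pacing numbers turn into simple arithmetic: words divided by words per minute, times sixty, gives seconds. Below is a rough sketch of a word budget calculator built on those ranges. The names `PACE_WPM`, `word_count`, `read_time_seconds` and `word_budget` are illustrative. Skipping bracketed stage directions when counting words is an assumption about how you mark up your script.

```python
import re

# Words-per-minute ranges from the guidance above, as (low, high).
PACE_WPM = {
    "explainer": (150, 150),
    "social_hook": (170, 190),
    "tutorial": (135, 155),
}


def word_count(script: str) -> int:
    """Count spoken words, ignoring bracketed stage directions like [smile]."""
    spoken = re.sub(r"\[[^\]]*\]", " ", script)
    return len(spoken.split())


def read_time_seconds(script: str, wpm: float) -> float:
    """Estimate how long the script takes to read at a given pace."""
    return word_count(script) / wpm * 60


def word_budget(cut_seconds: float, style: str, lead_in_silence: float = 1.0):
    """Return (min_words, max_words) that fit a cut, after the silent lead-in."""
    low, high = PACE_WPM[style]
    speaking = max(cut_seconds - lead_in_silence, 0)
    return int(speaking * low / 60), int(speaking * high / 60)


print(word_budget(15, "social_hook"))   # budget for a 15 second hook
print(read_time_seconds("Hi there [quick pause] friend", 150))
```

Treat the result as a starting point. Real reads drift with pauses and emphasis, so time an actual take before locking the cut.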
Shape gently: a light high-pass filter, a soft 2:1 compressor, a touch of de-essing, a little room tone under the track, and music about 10 to 12 dB under the voice with a tiny dip under key lines. Protect pronunciation with saved phonetic hints or a brand dictionary.
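If your editor shows linear gain instead of decibels, the conversion is gain = 10^(dB / 20). This small sketch shows what "ten to twelve decibels under" works out to as a multiplier. It assumes the voice and music tracks start at roughly the same level. The function names and the 2 dB dip for key lines are illustrative guesses, not fixed rules.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def music_gain(offset_db: float = 11.0, duck_db: float = 0.0) -> float:
    """Linear gain for music sitting offset_db under the voice.

    duck_db is an extra dip applied under key lines (hypothetical value).
    """
    return db_to_gain(-(offset_db + duck_db))


for offset in (10, 12):
    print(f"{offset} dB under voice -> music gain {music_gain(offset):.3f}")

# Extra 2 dB dip under a key line (an assumed amount, adjust by ear).
print(f"key line -> music gain {music_gain(11, duck_db=2):.3f}")
```

So music at 10 to 12 dB under the voice has roughly a quarter to a third of the voice's amplitude. That is why it supports the read without competing with it.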
Fast tools and how to use them, in one line each:
- ElevenLabs: paste the script, pick a voice, set speed near 0.95 to 1.0, add commas for micro pauses, export WAV and mix it with the music underneath.
- PlayHT: choose a voice and speaking rate, add pause tags for breath, keep a pronunciation list, export high-quality WAV.
- Descript: generate TTS in a project, nudge emphasis and timing in the script, then export or finish the whole edit inside.
- CapCut: drop in your cut, use Text to Speech, slow it slightly for explainers, lower the music, export 1080 by 1920 for vertical.
- Synthesia or DeepBrain: paste the script, pick an avatar and pace, preview pronunciations, export audio or the full video.
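Each tool above handles pronunciation in its own way. One tool-agnostic fallback is to respell tricky words in the script before pasting it anywhere. The sketch below assumes a plain Python dictionary as the pronunciation list. The entries and the `apply_pronunciations` name are hypothetical, and the respellings are only examples of the idea.

```python
import re

# Hypothetical brand dictionary: written form -> how it should be read.
BRAND_DICTIONARY = {
    "nginx": "engine x",
    "SQL": "sequel",
    "Acmely": "ACK-muh-lee",
}


def apply_pronunciations(script: str, dictionary: dict) -> str:
    """Swap whole-word matches for their phonetic respelling (case sensitive)."""
    if not dictionary:
        return script
    # Longest entries first so a longer term wins over a shorter overlap.
    words = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda m: dictionary[m.group(1)], script)


print(apply_pronunciations("Install nginx, then open SQL.", BRAND_DICTIONARY))
```

Keep the original script as the source of truth and generate the respelled copy from it, so on-screen captions still show the correct spelling.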
Mini checklist: one idea per sentence, commas to mark breaths, contractions everywhere, numbers written how people speak them, pronunciation notes for any tricky word, and one line at the top with pace and mood.
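Parts of this checklist can be checked mechanically before you record. Here is a rough sketch of a script linter. The 20-word sentence limit, the short list of uncontracted phrases, and the bracketed top-line format are all assumptions made for the example. A long sentence is only a hint that it may hold more than one idea.

```python
import re

# A few uncontracted phrases worth flagging (an illustrative, partial list).
EXPANDED = ["do not", "it is", "you will", "we are", "you are", "that is"]


def lint_script(script: str, max_words: int = 20) -> list:
    """Return a list of warnings for a voiceover script."""
    warnings = []
    lines = script.strip().splitlines()

    # Expect a first line such as "[warm, 150 wpm]" with pace and mood.
    has_header = bool(lines) and re.match(r"^\[.*\]$", lines[0].strip()) is not None
    if not has_header:
        warnings.append("missing top line with pace and mood")

    body = " ".join(lines[1:] if has_header else lines)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", body) if s.strip()]

    for sentence in sentences:
        if len(sentence.split()) > max_words:
            warnings.append(f"long sentence, consider splitting: {sentence[:40]}")
        if re.search(r"\d", sentence):
            warnings.append(f"digits found, write the number as spoken: {sentence[:40]}")
        for phrase in EXPANDED:
            if re.search(r"\b" + phrase + r"\b", sentence, re.IGNORECASE):
                warnings.append(f"use a contraction for '{phrase}': {sentence[:40]}")
    return warnings


for warning in lint_script("It is ready in 2 minutes."):
    print(warning)
```

A clean result does not mean the script reads well. It only catches the mechanical items, so a read-aloud pass is still the real test.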



