Audio & voiceover
Layer music, voiceover, and sound effects with frame-accurate sync against your animation.
Layer voiceover over bed music, duck the music while the VO speaks, drop in a confirmation SFX at exactly the right beat. Twenty minutes from script to mixed MP4.
What you'll learn
- The <audio> model: data-start, data-trim-start, data-volume
- Ducking music under voiceover with timed volume changes
- A complete VO + music + SFX composition
The model
Audio tracks are <audio> elements with the same timing grammar as your visuals. The renderer picks them up, FFmpeg mixes them in.
| Attribute | Meaning |
|---|---|
| data-start | When the track enters the timeline (seconds) |
| data-duration | How long it plays before fade-out |
| data-trim-start | Offset into the source file |
| data-volume | Linear gain, 0.0 to 1.0 |
| data-fade | in:0.5,out:0.8 for envelope fades |
That's the whole API. The visual side of the document is unchanged; audio just lives in the same DOM.
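As a rough mental model of how these attributes combine, the sketch below computes one track's gain at a given timeline position. It assumes fades are linear ramps that run inside the data-duration window; this guide does not specify the fade curve or whether the fade-out extends past the duration, so treat `parse_fade` and `gain_at` as hypothetical helpers rather than the renderer's implementation.

```python
def parse_fade(spec: str) -> dict[str, float]:
    """Parse a data-fade string such as "in:0.5,out:0.8" into seconds."""
    fades = {"in": 0.0, "out": 0.0}
    for part in filter(None, (p.strip() for p in spec.split(","))):
        name, _, value = part.partition(":")
        fades[name.strip()] = float(value)
    return fades


def gain_at(t: float, start: float, duration: float,
            volume: float = 1.0, fade: str = "") -> float:
    """Linear gain of one track at timeline time t (seconds).

    Assumption: linear fades that sit inside [start, start + duration].
    """
    local = t - start
    if local < 0 or local > duration:
        return 0.0  # the track is not on the timeline at t
    fades = parse_fade(fade)
    ramp_in = min(1.0, local / fades["in"]) if fades["in"] > 0 else 1.0
    remaining = duration - local
    ramp_out = min(1.0, remaining / fades["out"]) if fades["out"] > 0 else 1.0
    return volume * min(ramp_in, ramp_out)


# A 10-second bed at 0.8 gain with a 0.6s fade-in and a 1.0s fade-out.
for t in (0.3, 5.0, 9.5):
    g = gain_at(t, start=0, duration=10, volume=0.8, fade="in:0.6,out:1.0")
    print(t, round(g, 3))
```

Under these assumptions the bed sits at half its target level midway through each fade and at 0.8 in between.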
A full composition
A 10-second promo with intro music, a voiceover line, a confirmation chime at the punchline, and music that ducks under the VO.
<!doctype html>
<html>
<head>
<style>
body { margin:0; height:100vh; background:#0f172a; color:#fff;
display:grid; place-items:center;
font: 700 64px/1.1 ui-sans-serif, system-ui;
letter-spacing:-0.02em; }
h2 { opacity:0; animation: rise .9s cubic-bezier(.2,.9,.2,1) 1.2s both; }
@keyframes rise { from { opacity:0; transform: translateY(12px); } to { opacity:1; transform:none; } }
</style>
</head>
<body>
<h2>Render in your sleep.</h2>
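<!-- Bed music, intro: full level until the VO enters -->
<audio src="bed.mp3"
data-start="0"
data-duration="1.5"
data-volume="0.8"
data-fade="in:0.6,out:0.3"></audio>
<!-- VO: enters at 1.2s, runs ~5s -->
<audio src="vo.wav"
data-start="1.2"
data-duration="5.2"
data-volume="1.0"
data-fade="in:0.1,out:0.2"></audio>
<!-- Music duck: bed at 0.25 while the VO is talking -->
<!-- Bed segments overlap by 0.3s so each fade-out crosses the next fade-in
(assumes fades run inside data-duration). A quieter copy layered on top of
a full-level bed would add to the music instead of ducking it. -->
<audio src="bed.mp3"
data-start="1.2"
data-duration="5.5"
data-volume="0.25"
data-trim-start="1.2"
data-fade="in:0.3,out:0.3"></audio>
<!-- Bed music, outro: back to full level after the VO -->
<audio src="bed.mp3"
data-start="6.4"
data-duration="3.6"
data-volume="0.8"
data-trim-start="6.4"
data-fade="in:0.3,out:1.0"></audio>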
<!-- SFX: confirmation chime on the punchline -->
<audio src="chime.wav"
data-start="6.6"
data-duration="0.8"
data-volume="0.9"></audio>
</body>
</html>

hyperframes render promo.html --out promo.mp4 --crf 18 --workers 4

Syncing SFX to the animation
The visual punchline ("Render in your sleep.") starts rising at t = 1.2s and settles at t = 2.1s (1.2s delay + 0.9s animation). If you want a soft tick exactly when the type lands:
<audio src="tick.wav" data-start="2.10" data-duration="0.3" data-volume="0.6"></audio>

Because the renderer is frame-pinned, that 2.10s is exact. There's no audio drift over the length of the clip: each frame stays locked to its audio sample position.
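To see where a cue like data-start="2.10" lands, the sketch below converts a timestamp into a frame index and an audio sample index. The 30 fps frame rate and 48 kHz sample rate are assumptions chosen for illustration (this guide does not state the render defaults), and `animation_end` and `cue_position` are hypothetical helpers.

```python
def animation_end(delay_s: float, duration_s: float) -> float:
    """End time of a CSS animation: its delay plus its duration."""
    return delay_s + duration_s


def cue_position(t_s: float, fps: int = 30,
                 sample_rate: int = 48_000) -> tuple[int, int]:
    """Return the (frame index, sample index) nearest to timeline time t_s.

    fps and sample_rate are illustrative assumptions, not documented defaults.
    """
    return round(t_s * fps), round(t_s * sample_rate)


# The headline animation: 1.2s delay + 0.9s duration.
landing = animation_end(1.2, 0.9)
frame, sample = cue_position(landing)
print(f"lands at {landing:.2f}s -> frame {frame}, sample {sample}")
```

At 30 fps the landing falls on frame 63. At 24 fps the same timestamp works out to 50.4 frames, so if you want a cue to sit on a frame boundary, choose a time that is a whole multiple of the frame duration.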
Trimming a source
If your bed music starts with two seconds of silence you don't want, trim it instead of re-editing:
<audio src="bed.mp3"
data-start="0"
data-duration="10"
data-trim-start="2.0"
data-volume="0.8"></audio>

data-trim-start shifts the playhead inside the source, leaving the timeline position untouched.
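The relationship between timeline time and source time can be sketched as a simple mapping. `source_time` and `required_source_length` are hypothetical helpers that follow the attribute descriptions in this guide; they are not part of the renderer.

```python
def source_time(t: float, start: float, trim_start: float = 0.0) -> float:
    """Position inside the source file that plays at timeline time t.

    start      -> data-start (timeline seconds)
    trim_start -> data-trim-start (offset into the source, seconds)
    """
    if t < start:
        raise ValueError("track has not started at this timeline position")
    return (t - start) + trim_start


def required_source_length(duration: float, trim_start: float = 0.0) -> float:
    """Minimum source length (seconds) needed to fill the whole window."""
    return trim_start + duration


# Trimmed bed: timeline 0s plays source 2.0s, and a 10s window
# needs at least 12s of audio in the file.
print(source_time(0.0, start=0.0, trim_start=2.0))
print(required_source_length(10.0, trim_start=2.0))
```

The same mapping explains the ducked bed layer in the composition: with data-start="1.2" and data-trim-start="1.2", source time equals timeline time, so that layer stays aligned with the rest of the music.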
Voiceover from TTS
If you're generating VO with a TTS provider (ElevenLabs, Azure, OpenAI), save the audio next to the template and reference it directly. The TTS step is the only non-deterministic part — once the WAV exists, every render of the composition produces identical output.
# 1. Generate VO (non-deterministic)
node tts.mjs --text "Render in your sleep." --out vo.wav
# 2. Render (deterministic, byte-identical across runs)
hyperframes render promo.html --out promo.mp4 --crf 18

For batch personalization, generate one VO per variant and reference it via a token:
<audio src="{{$VO_PATH}}" data-start="1.2" data-duration="{{$VO_DURATION}}" data-volume="1.0"></audio>

Mixing tips that survive social compression
- Keep VO peaks around -3 dBFS; platforms will normalize and a hotter master gets squashed harder.
- Duck music to about 25–30% of full level under VO. Less than that feels disconnected; more than that fights the voice.
- Add a short (200–400ms) tail of music after the VO ends — abrupt level jumps are jarring.
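Because data-volume is linear gain while mixing advice is usually given in decibels, a quick conversion helps when turning these tips into attribute values. The sketch uses the standard amplitude relation dB = 20 · log10(gain); the helper names are illustrative.

```python
import math


def gain_to_db(gain: float) -> float:
    """Convert linear amplitude gain (e.g. data-volume) to decibels."""
    if gain <= 0:
        return float("-inf")  # silence
    return 20.0 * math.log10(gain)


def db_to_gain(db: float) -> float:
    """Convert decibels to linear amplitude gain."""
    return 10.0 ** (db / 20.0)


for gain in (0.25, 0.30):
    print(f"gain {gain:.2f} = {gain_to_db(gain):.1f} dB")
print(f"-3 dB = gain {db_to_gain(-3):.3f}")
```

Ducking to 25–30% of the original level is therefore a cut of roughly 10.5 to 12 dB, and a -3 dBFS peak corresponds to a sample amplitude of about 0.708 of full scale.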
Tweak it
- Add data-fade="in:1.5" to a long musical bed for a slow fade-up under titles.
- Layer a second SFX (a low boom) under your existing chime to add weight without changing the visual cue.
- Use data-volume to mix down a stem you want present but not foregrounded.