Audio & voiceover

Layer music, voiceover, and sound effects with frame-accurate sync against your animation.

Layer voiceover over bed music, duck the music while the VO speaks, and drop in a confirmation SFX at exactly the right beat. Twenty minutes from script to mixed MP4.

What you'll learn

The <audio> model: data-start, data-trim-start, data-volume
Ducking music under voiceover with timed volume changes
A complete VO + music + SFX composition

The model

Audio tracks are <audio> elements with the same timing grammar as your visuals. The renderer picks them up, and FFmpeg mixes them into the output.

Attribute	Meaning
`data-start`	When the track enters the timeline (seconds)
`data-duration`	How long it plays before fade-out (seconds)
`data-trim-start`	Offset into the source file (seconds)
`data-volume`	Linear gain, `0.0` to `1.0`
`data-fade`	`in:0.5,out:0.8` for envelope fades (seconds)

That's the whole API. The visual side of the document is unchanged; audio just lives in the same DOM.
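
Since data-volume is a linear gain while mixing advice is usually given in decibels, a small converter can help when picking values. This is a minimal sketch of the standard amplitude conversion (dB = 20 · log10(gain)); the function names are illustrative and not part of the renderer.

```javascript
// Convert a linear gain (the data-volume scale) to decibels.
function gainToDb(gain) {
  if (gain < 0) throw new RangeError("gain must be non-negative");
  return gain === 0 ? -Infinity : 20 * Math.log10(gain);
}

// Convert decibels back to a linear gain.
function dbToGain(db) {
  return Math.pow(10, db / 20);
}

console.log(gainToDb(0.8).toFixed(1));  // -1.9
console.log(gainToDb(0.25).toFixed(1)); // -12.0
```

So a bed at 0.8 sits about 2 dB below full scale, and a duck to 0.25 is roughly a 12 dB cut from full scale.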

A full composition

A 10-second promo with intro music, a voiceover line, a confirmation chime at the punchline, and music that ducks under the VO.

html

<!doctype html>
<html>
<head>
<style>
  body { margin:0; height:100vh; background:#0f172a; color:#fff;
         display:grid; place-items:center;
         font: 700 64px/1.1 ui-sans-serif, system-ui;
         letter-spacing:-0.02em; }
  h2 { opacity:0; animation: rise .9s cubic-bezier(.2,.9,.2,1) 1.2s both; }
  @keyframes rise { from { opacity:0; transform: translateY(12px); } to { opacity:1; transform:none; } }
</style>
</head>
<body>
  <h2>Render in your sleep.</h2>

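  <!-- Bed music is split into three segments that crossfade, because
       overlapping copies of the same file add together in the mix
       (this assumes fades fall inside data-duration). -->

  <!-- Bed, part 1: full level until the VO enters -->
  <audio src="bed.mp3"
         data-start="0"
         data-duration="1.5"
         data-volume="0.8"
         data-fade="in:0.6,out:0.3"></audio>

  <!-- VO: enters at 1.2s, runs ~5s -->
  <audio src="vo.wav"
         data-start="1.2"
         data-duration="5.2"
         data-volume="1.0"
         data-fade="in:0.1,out:0.2"></audio>

  <!-- Bed, part 2: ducked to 0.25 while the VO is talking -->
  <audio src="bed.mp3"
         data-start="1.2"
         data-duration="5.5"
         data-volume="0.25"
         data-trim-start="1.2"
         data-fade="in:0.3,out:0.3"></audio>

  <!-- Bed, part 3: back to full level once the VO ends -->
  <audio src="bed.mp3"
         data-start="6.4"
         data-duration="3.6"
         data-volume="0.8"
         data-trim-start="6.4"
         data-fade="in:0.3,out:1.0"></audio>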

  <!-- SFX: confirmation chime on the punchline -->
  <audio src="chime.wav"
         data-start="6.6"
         data-duration="0.8"
         data-volume="0.9"></audio>
</body>
</html>

bash

hyperframes render promo.html --out promo.mp4 --crf 18 --workers 4

Syncing SFX to the animation

The visual punchline ("Render in your sleep.") starts rising at t = 1.2s and settles at t = 2.1s (1.2s delay + 0.9s animation). If you want a soft tick exactly when the type lands:

html

<audio src="tick.wav" data-start="2.10" data-duration="0.3" data-volume="0.6"></audio>

Because the renderer is frame-pinned, that 2.10s is exact. There's no audio drift over the length of the clip: every frame always lines up with the same audio sample offset.
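
To see what that alignment means in numbers, here is a minimal sketch that maps a timeline position to a frame index and a sample index. The 30 fps frame rate and 48 kHz sample rate are assumptions for illustration, not documented renderer defaults, and the helper names are hypothetical.

```javascript
// Hypothetical project settings: neither value comes from the renderer's docs.
const FPS = 30;
const SAMPLE_RATE = 48000;

// Nearest video frame for a timeline position in seconds.
function timeToFrame(seconds, fps = FPS) {
  return Math.round(seconds * fps);
}

// Nearest audio sample for a timeline position in seconds.
function timeToSample(seconds, sampleRate = SAMPLE_RATE) {
  return Math.round(seconds * sampleRate);
}

// First audio sample that plays under a given frame.
function frameToSample(frame, fps = FPS, sampleRate = SAMPLE_RATE) {
  return Math.round((frame * sampleRate) / fps);
}

console.log(timeToFrame(2.10));  // 63
console.log(timeToSample(2.10)); // 100800
console.log(frameToSample(63));  // 100800
```

Under these assumptions the tick at 2.10s lands exactly on a frame boundary, so the sound and the settled type share the same frame.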

Trimming a source

If your bed music starts with two seconds of silence you don't want, trim it instead of re-editing:

html

<audio src="bed.mp3"
       data-start="0"
       data-duration="10"
       data-trim-start="2.0"
       data-volume="0.8"></audio>

data-trim-start shifts the playhead inside the source, leaving the timeline position untouched.
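
The relationship between timeline time and source position can be sketched as a one-line mapping. This is an illustration only: the `sourcePosition` helper and the plain-object track shape are hypothetical, and it assumes a track is silent outside its start and duration window.

```javascript
// Where in the source file a track is reading at timeline time t (seconds).
// Returns null when the track is not playing at t.
function sourcePosition(track, t) {
  const { start, duration, trimStart = 0 } = track;
  if (t < start || t >= start + duration) return null;
  return trimStart + (t - start);
}

// Mirrors the trimmed bed above: starts at 0, skips 2.0s of the file.
const bed = { start: 0, duration: 10, trimStart: 2.0 };

console.log(sourcePosition(bed, 0));   // 2
console.log(sourcePosition(bed, 4.5)); // 6.5
console.log(sourcePosition(bed, 10));  // null
```

The same mapping explains why a second copy of a file that starts mid-timeline needs data-trim-start equal to its data-start if it should stay in step with an untrimmed copy that starts at 0.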

Voiceover from TTS

If you're generating VO with a TTS provider (ElevenLabs, Azure, OpenAI), save the audio next to the template and reference it directly. The TTS step is the only non-deterministic part. Once the WAV exists, every render of the composition produces identical output.

bash

# 1. Generate VO (non-deterministic)
node tts.mjs --text "Render in your sleep." --out vo.wav

# 2. Render (deterministic, byte-identical across runs)
hyperframes render promo.html --out promo.mp4 --crf 18
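
Because TTS output length varies from run to run, the VO's data-duration has to be read from the generated file rather than guessed. A minimal sketch, assuming an uncompressed PCM WAV with a standard RIFF header; `wavDurationSeconds` is a hypothetical helper, not part of hyperframes or any TTS SDK.

```javascript
// Return the duration in seconds of a RIFF/WAVE file held in a Node Buffer.
// Assumes uncompressed PCM, where duration = data bytes / byte rate.
function wavDurationSeconds(buf) {
  if (buf.toString("ascii", 0, 4) !== "RIFF" || buf.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let byteRate = null;
  let offset = 12; // chunk headers start right after the 12-byte RIFF header
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === "fmt ") {
      // The byte rate sits 8 bytes into the fmt chunk body.
      byteRate = buf.readUInt32LE(offset + 8 + 8);
    } else if (id === "data") {
      if (byteRate === null) throw new Error("data chunk found before fmt chunk");
      // Guard against headers that overstate the data size.
      const available = Math.min(size, buf.length - (offset + 8));
      return available / byteRate;
    }
    offset += 8 + size + (size % 2); // chunk bodies are padded to even length
  }
  throw new Error("no data chunk found");
}
```

In a Node script you would pass it the file contents, for example `wavDurationSeconds(fs.readFileSync("vo.wav"))`, and write the result into the VO's data-duration.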

For batch personalization, generate one VO per variant and reference it via a token:

html

<audio src="{{$VO_PATH}}" data-start="1.2" data-duration="{{$VO_DURATION}}" data-volume="1.0"></audio>
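
If you expand the tokens yourself before rendering, the substitution can be a few lines of string handling. This sketch assumes the `{{$NAME}}` syntax shown above and makes no claim about how hyperframes resolves tokens internally; `fillTokens`, the variant list, and the file paths are all hypothetical.

```javascript
// Replace {{$NAME}} tokens in a template string with values from `vars`.
// Unknown tokens throw so a missing VO path fails loudly before rendering.
function fillTokens(template, vars) {
  return template.replace(/\{\{\$([A-Z0-9_]+)\}\}/g, (match, name) => {
    if (!(name in vars)) throw new Error(`no value for token ${match}`);
    return String(vars[name]);
  });
}

const line =
  '<audio src="{{$VO_PATH}}" data-start="1.2" data-duration="{{$VO_DURATION}}" data-volume="1.0"></audio>';

// Hypothetical variants; paths and durations are placeholders.
const variants = [
  { VO_PATH: "vo/alice.wav", VO_DURATION: 5.2 },
  { VO_PATH: "vo/bob.wav", VO_DURATION: 4.8 },
];

for (const vars of variants) {
  console.log(fillTokens(line, vars));
}
```

Throwing on an unknown token is a deliberate choice here: a batch of renders with a literal `{{$VO_PATH}}` left in the markup is harder to spot than an early error.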

Keep VO peaks around -3 dBFS; platforms will normalize and a hotter master gets squashed harder.
Duck music to about 25–30% of full level under VO. Less than that feels disconnected; more than that fights the voice.
Add a short (200–400ms) tail of music after the VO ends — abrupt level jumps are jarring.
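
The rules of thumb above can be turned into numbers for the ducked bed. A minimal sketch, reading the "tail" as holding the ducked level slightly past the end of the VO; that reading, the `duckWindow` name, and the default values are assumptions rather than renderer behavior.

```javascript
// Derive the ducked-bed attributes from the VO timing.
// duckVolume and tail follow the rules of thumb above; both are adjustable.
function duckWindow(vo, { duckVolume = 0.25, tail = 0.3 } = {}) {
  if (duckVolume < 0 || duckVolume > 1) {
    throw new RangeError("duckVolume must be within 0..1");
  }
  // Round to milliseconds to avoid floating-point noise in the attribute value.
  const duration = Number((vo.duration + tail).toFixed(3));
  return {
    start: vo.start,
    duration,
    trimStart: vo.start, // stays aligned with an untrimmed bed that starts at 0
    volume: duckVolume,
  };
}

console.log(duckWindow({ start: 1.2, duration: 5.2 }));
// { start: 1.2, duration: 5.5, trimStart: 1.2, volume: 0.25 }
```

The returned values map one-to-one onto data-start, data-duration, data-trim-start, and data-volume for the ducked copy of the bed.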

Tweak it

Add data-fade="in:1.5" to a long musical bed for a slow fade-up under titles.
Layer a second SFX (a low boom) under your existing chime to add weight without changing the visual cue.
Use data-volume to mix down a stem you want present but not foregrounded.

