Animated captions, burned-in
Render animated, per-word captions directly into the MP4 from a transcript JSON file.
Ship per-word animated captions in fifteen minutes. No SRT muxing, no caption library, no font-loading flake — captions are just DOM nodes, animated by CSS, rendered into the frame.
What you'll learn
- Mapping a transcript JSON into timed
<span>s - Driving the animation with
data-startanddata-duration - A tiny build step that goes from Whisper output to ready-to-render HTML
The result
The transcript
Whisper, Deepgram, AssemblyAI — they all emit something close to this shape:
{
"words": [
{ "text": "Captions", "start": 0.10, "end": 0.42 },
{ "text": "are", "start": 0.45, "end": 0.58 },
{ "text": "just", "start": 0.80, "end": 1.02 },
{ "text": "DOM", "start": 1.15, "end": 1.40 },
{ "text": "nodes", "start": 1.50, "end": 1.78 },
{ "text": "now.", "start": 1.85, "end": 2.20 }
]
}The build step
A short script turns the transcript into spans with timing baked in:
// build-captions.mjs
import { readFileSync, writeFileSync } from "node:fs";
const { words } = JSON.parse(readFileSync("transcript.json", "utf8"));
const spans = words.map(w => {
const duration = (w.end - w.start).toFixed(3);
return `<span class="w"
data-start="${w.start}"
data-duration="${duration}"
style="animation-delay:${w.start}s">${w.text}</span>`;
}).join("\n ");
const html = readFileSync("template.html", "utf8").replace("{{CAPTIONS}}", spans);
writeFileSync("out.html", html);The template is a plain document with one slot:
<!doctype html>
<html>
<head>
<style>
body { margin:0; height:100vh; background:#000; color:#fff;
font: 700 56px/1.2 ui-sans-serif, system-ui;
display:grid; place-items:end center; padding-bottom:80px; }
.cap { max-width:80vw; text-align:center; text-wrap: balance; }
.w { opacity:0; display:inline-block; margin:0 6px;
animation: pop .35s cubic-bezier(.2,.9,.2,1) both; }
@keyframes pop {
from { opacity:0; transform: translateY(8px) scale(.96); }
to { opacity:1; transform:none; }
}
</style>
</head>
<body>
<p class="cap">{{CAPTIONS}}</p>
</body>
</html>Render:
node build-captions.mjs
hyperframes render out.html --out captioned.mp4 --crf 18 --workers 4Pairing captions with a clip
If the captions sit on top of a video clip, both share the same timeline. The clip element uses data-start / data-duration for its own timing; captions use data-start for their per-word reveal. No coordination required — both read from frame-pinned time.
<video class="clip" src="hero.webm"
data-start="0" data-duration="8"></video>
<p class="cap" data-start="0" data-duration="8">
<span class="w" data-start="0.10" style="animation-delay:0.10s">Captions</span>
...
</p>Styling that survives platform compression
Three things make burned-in captions readable after TikTok's compressor mangles them:
- A semi-opaque shadow under each word:
text-shadow: 0 2px 8px rgba(0,0,0,.6); - A minimum font size of ~48px at 1080p.
- Avoid pure white on bright backgrounds —
#f8fafcreads as white and bands less.
Tweak it
- Swap
popforslide-upif your brand voice is calmer. - Highlight the active word by giving it a different color via
:nth-childmatched to playback time. - Add
data-fade="out:0.2"to fade the whole caption block as the clip ends.