Personalized video at scale
Render 10,000 personalized clips overnight on a queue of stateless workers.
A queue, a pool of stateless workers, a shared cache, and an S3 bucket. That's the entire architecture for shipping six-figure-per-day personalized video volumes on a single mid-sized box.
What you'll learn
- How to size a worker pool against vCPU count
- A queue pattern that survives crashed workers and partial reruns
- How HyperFrames' frame cache turns repeat renders into near-free copies
The throughput target
The target assumes a 6-second 1080p clip at CRF 18, a template that needs under 200 ms of layout work, and a warm frame cache. Your numbers should land in the same order of magnitude.
The architecture

    [CSV / DB] → [variants chunker] → [SQS / Redis queue]
                                               │
                      ┌────────────────────────┼────────────────────────┐
                 [worker 1]               [worker 2]               [worker N]
             hyperframes render       hyperframes render       hyperframes render
                      │                        │                        │
                      └──────────► [S3 bucket: out/{user_id}.mp4] ◄─────┘
                                               │
                                         [done queue]

Workers are stateless. Crash one mid-job and the message goes back on the queue. No coordination, no leader election, no shared filesystem required (beyond the cache, which is optional).
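The chunker step in the diagram can be as small as a function that turns each input row into one queue message. The sketch below is a minimal version of that idea, assuming each row carries a `user_id` field and that the message body has the `template`, `vars`, and `out` fields the worker reads. `toMessages` and `chunk` are hypothetical names. The batch size is up to you; if you send through SQS, its batch call accepts at most 10 messages.

```javascript
// Hypothetical helpers: turn input rows into queue message bodies.
// Assumes each row has a user_id; every other field becomes a template variable.
function toMessages(rows, template) {
  return rows.map(({ user_id, ...vars }) =>
    JSON.stringify({ template, vars, out: `out/${user_id}.mp4` })
  );
}

// Split an array into batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const rows = [
  { user_id: "u1", name: "Ada" },
  { user_id: "u2", name: "Grace" },
  { user_id: "u3", name: "Edsger" },
];
const batches = chunk(toMessages(rows, "card.html"), 2);
console.log(batches.length); // 2 batches: two messages, then one
```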
Sizing the worker pool
A good first guess: one worker per 2 vCPUs, each worker invoked with --workers 2. That leaves headroom for FFmpeg's threads and the message-handling loop.
| vCPUs | Workers | --workers per render | Concurrent renders |
|---|---|---|---|
| 4 | 2 | 2 | 2 |
| 8 | 4 | 2 | 4 |
| 16 | 8 | 2 | 8 |
| 32 | 12 | 2 | 12 |
Past 16 vCPUs, memory bandwidth and disk I/O start to dominate. Add another box before adding more workers per box.
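The table reads as a rule of thumb: one worker per 2 vCPUs, capped once the box gets large. The sketch below encodes that arithmetic and estimates how long a batch would take. The cap of 12 is read off the 32-vCPU row, and `secondsPerClip` is a hypothetical measurement you would take from your own renders, not a figure from this guide.

```javascript
// Rule of thumb from the sizing table: one worker per 2 vCPUs,
// capped at 12 per box (assumption: the 32-vCPU row is the ceiling).
function workersFor(vcpus) {
  return Math.max(1, Math.min(Math.floor(vcpus / 2), 12));
}

// Wall-clock hours for a batch, assuming every worker stays busy
// and each render takes secondsPerClip on average.
function batchHours(clips, workers, secondsPerClip) {
  return (clips * secondsPerClip) / workers / 3600;
}

for (const vcpus of [4, 8, 16, 32]) {
  console.log(vcpus, workersFor(vcpus));
}

// Hypothetical: 10,000 clips at 20 s each on a 16-vCPU box.
console.log(batchHours(10000, workersFor(16), 20).toFixed(1)); // "6.9"
```

If the estimate does not fit your window, the guidance above still applies: add a box rather than pushing more workers onto one machine.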
The worker
A worker is twenty lines of code around hyperframes render:

    // worker.mjs
    // `queue`, `s3`, and BUCKET are assumed to be set up elsewhere
    // (for example, an SQS or Redis client and an S3 client).
    import fs from "node:fs";
    import { execFile } from "node:child_process";
    import { promisify } from "node:util";

    const run = promisify(execFile);

    while (true) {
      const msg = await queue.receive(); // long-poll
      if (!msg) continue;
      try {
        // parse inside the try so a malformed message cannot crash the loop
        const { template, vars, out } = JSON.parse(msg.Body);
        // render to local disk first; upload only after a successful render
        await run("hyperframes", [
          "render", template,
          "--out", `/tmp/${msg.MessageId}.mp4`,
          "--vars", ...Object.entries(vars).map(([k, v]) => `${k}=${v}`),
          "--workers", "2",
          "--quiet",
        ]);
        await s3.putObject({ Bucket: BUCKET, Key: out, Body: fs.createReadStream(...) });
        // delete only after the upload succeeds
        await queue.delete(msg);
      } catch (e) {
        // leave the message on the queue; visibility timeout will re-deliver
        console.error("render failed", msg.MessageId, e);
      }
    }

That's it. Run N copies under pm2 or as a systemd template unit.
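A failed render needs no cleanup because of the visibility timeout: a received message is hidden rather than removed, and it reappears if nobody deletes it in time. The in-memory queue below is a toy model of that behavior for local testing, not the SQS or Redis client. The class name, its methods, and the injectable clock are all illustrative assumptions.

```javascript
// Toy in-memory queue with SQS-style visibility timeouts (illustrative only).
class VisibilityQueue {
  constructor(timeoutMs, now = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;      // injectable clock keeps tests deterministic
    this.messages = [];  // entries look like { id, body, visibleAt }
    this.nextId = 1;
  }

  send(body) {
    this.messages.push({ id: this.nextId++, body, visibleAt: 0 });
  }

  // Return the first visible message and hide it for timeoutMs.
  receive() {
    const t = this.now();
    const msg = this.messages.find((m) => m.visibleAt <= t);
    if (!msg) return null;
    msg.visibleAt = t + this.timeoutMs;
    return { id: msg.id, body: msg.body };
  }

  // Only an explicit delete removes a message for good.
  delete(id) {
    this.messages = this.messages.filter((m) => m.id !== id);
  }
}

let clock = 0;
const q = new VisibilityQueue(30_000, () => clock);
q.send("job-1");

const first = q.receive();   // worker A picks up the job, then crashes
const hidden = q.receive();  // null: the message is invisible for now
clock += 30_000;             // the visibility timeout expires
const retry = q.receive();   // worker B receives the same job
q.delete(retry.id);          // success: remove it permanently
console.log(first.id === retry.id, hidden, q.receive()); // true null null
```

Because the same job can be delivered more than once, whatever the worker writes has to be safe to write twice.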
The frame cache
HyperFrames caches rendered frames keyed by the resolved input (HTML + tokens + viewport + codec settings). When you re-run a batch where most rows haven't changed — say, the template has a global "year" token and you've only updated a handful of names — unchanged frames come back from disk instead of being rendered again.

    # warm the cache for a representative subset, then run the full batch
    hyperframes render card.html --variants sample.json --workers 4
    hyperframes render card.html --variants full.json --workers 4

To bypass the cache for one run (debugging, mostly):

    hyperframes render card.html --variants variants.json --no-cache

Output to S3
Two patterns work. Either write to local disk and upload after a successful render, or mount an S3 filesystem (s3fs, mountpoint-s3) and let --out write straight to the bucket. The first is more reliable; the second is simpler.
For the upload pattern, key by a content hash of the variant so retries are idempotent:

    const key = `out/${sha256(JSON.stringify(vars))}.mp4`;

Observability
Pipe --json to your log shipper. Each render emits one JSON line with the input hash, render duration, and frame count. Grep for slow renders, alert on failures, dashboard the throughput.

    hyperframes render card.html --variants chunk.json --json | tee -a /var/log/hf-render.log

Tweak it
- Shard variants.json by content hash and route each shard to a dedicated worker; cache hit rates skyrocket.
- Use spot instances for the worker pool; the at-least-once queue semantics tolerate eviction.
- Render at 720p for the first pass, 1080p only for high-engagement segments.
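
Sharding by content hash needs two things: a serialization that does not depend on key order, and a stable hash to pick the shard. The sketch below shows both, assuming flat `vars` objects. `canonicalJson` and `shardFor` are hypothetical helpers, and FNV-1a is used here only because it needs no imports; any stable hash, including the SHA-256 used for output keys, would work as well.

```javascript
// Serialize a flat vars object with sorted keys so that
// { a: 1, b: 2 } and { b: 2, a: 1 } produce the same string.
function canonicalJson(vars) {
  const sorted = Object.keys(vars).sort().map((k) => [k, vars[k]]);
  return JSON.stringify(sorted);
}

// 32-bit FNV-1a over UTF-16 code units; stable across runs and machines.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Map a variant to one of shardCount shards (0-based).
function shardFor(vars, shardCount) {
  return fnv1a(canonicalJson(vars)) % shardCount;
}

const a = shardFor({ name: "Ada", year: 2025 }, 8);
const b = shardFor({ year: 2025, name: "Ada" }, 8);
console.log(a === b); // true: key order does not change the shard
```

The same canonical string is also a safer input for the output key than a plain `JSON.stringify(vars)`, since two rows with identical values but different key order would otherwise hash differently.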