Personalized video at scale

Render 10,000 personalized clips overnight on a queue of stateless workers.

A queue, a pool of stateless workers, a shared cache, and an S3 bucket. That's the entire architecture for shipping six-figure-per-day personalized video volumes on a single mid-sized box.

What you'll learn

How to size a worker pool against vCPU count
A queue pattern that survives crashed workers and partial reruns
How HyperFrames' frame cache turns repeat renders into near-free copies

The throughput target

Renders/day on 16 vCPUs

That number assumes a 6-second 1080p clip at CRF 18, a template with under 200 ms of layout work, and a warm frame cache. Expect your own numbers to land within the same order of magnitude.
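To sanity-check a target against your own render times, the arithmetic is just concurrency times wall-clock seconds divided by seconds per render. A rough sketch, assuming 8 concurrent renders (the 16-vCPU row in the sizing table later in this guide) and treating "overnight" as an 8-hour window; both function names are made up for illustration:

```javascript
const SECONDS_PER_DAY = 86_400;

// Upper bound on renders per day if every slot stays busy.
function rendersPerDay(concurrentRenders, secondsPerRender) {
  return Math.floor((concurrentRenders * SECONDS_PER_DAY) / secondsPerRender);
}

// Time budget per render needed to finish `targetRenders` inside a window.
function maxSecondsPerRender(targetRenders, concurrentRenders, windowSeconds) {
  return (concurrentRenders * windowSeconds) / targetRenders;
}

console.log(rendersPerDay(8, 6));                       // 6 s per render
console.log(maxSecondsPerRender(10_000, 8, 8 * 3600));  // 10,000 clips in 8 hours
```

Read the other way round: at 8 concurrent renders, a six-figure day needs each render to finish in under roughly 6.9 seconds on average.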

The architecture

[CSV / DB] → [variants chunker] → [SQS / Redis queue]
                                       │
              ┌────────────────────────┼────────────────────────┐
        [worker 1]                [worker 2]                [worker N]
   hyperframes render        hyperframes render        hyperframes render
              │                        │                        │
              └──────────► [S3 bucket: out/{user_id}.mp4] ◄─────┘
                                       │
                                 [done queue]

Workers are stateless. Crash one mid-job and the message goes back on the queue. No coordination, no leader election, no shared filesystem required (beyond the cache, which is optional).
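The chunker's job is to turn rows into self-contained messages, so any worker can pick up any job. A minimal sketch, assuming each row carries a `user_id` field (as in the diagram's output key) and that every other field is a template variable; `buildMessages` is a hypothetical helper, and the message shape mirrors what the worker below parses:

```javascript
// Turn source rows into queue message bodies: { template, vars, out }.
function buildMessages(rows, template) {
  return rows.map(({ user_id, ...vars }) =>
    JSON.stringify({ template, vars, out: `out/${user_id}.mp4` })
  );
}

const bodies = buildMessages(
  [
    { user_id: "u1", name: "Ada" },
    { user_id: "u2", name: "Grace" },
  ],
  "card.html"
);
console.log(bodies[0]);
```

Because each message names its own template, variables, and output key, a re-delivered message can be rendered by a different worker with no extra context.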

Sizing the worker pool

A good first guess: one worker per 2 vCPUs, each worker invoked with --workers 2. That leaves headroom for FFmpeg's threads and the message-handling loop.

vCPUs	Workers	`--workers` per render	Concurrent renders
4	2	2	2
8	4	2	4
16	8	2	8
32	12	2	12

Past 16 vCPUs, memory bandwidth and disk I/O start to dominate, which is why the 32-vCPU row stops at 12 workers instead of 16. Add another box before adding more workers per box.
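If you provision boxes from a script, the table reduces to a small function. A sketch that reproduces the four rows above; the slope past 16 vCPUs (one extra worker per 4 vCPUs) is an assumption picked to match the 32-vCPU row, not a measured rule, so treat it as a starting point:

```javascript
// Workers per box: one per 2 vCPUs up to 16 vCPUs,
// then (assumed) one more per 4 additional vCPUs.
function poolSize(vcpus) {
  const linear = Math.floor(Math.min(vcpus, 16) / 2);
  const extra = Math.floor(Math.max(vcpus - 16, 0) / 4);
  return Math.max(1, linear + extra);
}

for (const vcpus of [4, 8, 16, 32]) {
  console.log(vcpus, poolSize(vcpus));
}
```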

The worker

A worker is a short loop around hyperframes render:

// worker.mjs
// Assumes `queue`, `s3`, and `BUCKET` are created elsewhere (your SQS/Redis and S3 clients).
import { execFile } from "node:child_process";
import fs from "node:fs";
import { promisify } from "node:util";
const run = promisify(execFile);

while (true) {
  const msg = await queue.receive();         // long-poll
  if (!msg) continue;
  const { template, vars, out } = JSON.parse(msg.Body);
  const localPath = `/tmp/${msg.MessageId}.mp4`;

  try {
    await run("hyperframes", [
      "render", template,
      "--out", localPath,
      "--vars", ...Object.entries(vars).map(([k, v]) => `${k}=${v}`),
      "--workers", "2",
      "--quiet",
    ]);
    // upload only after a successful render, then acknowledge the message
    await s3.putObject({ Bucket: BUCKET, Key: out, Body: fs.createReadStream(localPath) });
    await queue.delete(msg);
  } catch (e) {
    // leave the message on the queue; visibility timeout will re-deliver
    console.error("render failed", msg.MessageId, e);
  } finally {
    // remove the local file either way so /tmp does not fill up
    await fs.promises.rm(localPath, { force: true });
  }
}

That's it. Run N copies under pm2 or as a systemd template unit.
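The catch block leans on the queue's visibility timeout: a received message is hidden rather than removed, and it reappears if nobody deletes it in time. A toy in-memory model of that behavior, not a real SQS or Redis client; the class name and the explicit `now` argument are made up for illustration:

```javascript
// In-memory stand-in for a queue with a visibility timeout.
class VisibilityQueue {
  constructor(visibilityMs) {
    this.visibilityMs = visibilityMs;
    this.messages = new Map(); // id -> { body, invisibleUntil }
    this.nextId = 1;
  }

  send(body) {
    const id = String(this.nextId++);
    this.messages.set(id, { body, invisibleUntil: 0 });
    return id;
  }

  // Hand out the first visible message and hide it for visibilityMs.
  receive(now = Date.now()) {
    for (const [id, m] of this.messages) {
      if (m.invisibleUntil <= now) {
        m.invisibleUntil = now + this.visibilityMs;
        return { MessageId: id, Body: m.body };
      }
    }
    return null;
  }

  // Acknowledge: only a worker that finished successfully calls this.
  delete(msg) {
    this.messages.delete(msg.MessageId);
  }
}

const q = new VisibilityQueue(1000);
q.send("job-1");
const first = q.receive(0);      // worker A takes the job, then crashes
const hidden = q.receive(500);   // still invisible to worker B
const retry = q.receive(1000);   // timeout elapsed: the same job is re-delivered
q.delete(retry);                 // worker B succeeds and acknowledges
```

Set the real visibility timeout comfortably longer than your slowest render, or a healthy worker's message will be handed to a second worker mid-render.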

The frame cache

HyperFrames caches rendered frames keyed by the resolved input (HTML + tokens + viewport + codec settings). When you re-run a batch where most rows haven't changed (say, the template has a global "year" token and you've only updated a handful of names), unchanged frames come back from disk instead of being rendered again.

bash

# warm the cache for a representative subset, then run the full batch
hyperframes render card.html --variants sample.json --workers 4
hyperframes render card.html --variants full.json   --workers 4
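To see why unchanged rows are nearly free, it helps to picture the cache as a lookup keyed by everything that affects the pixels. The following is a conceptual sketch only, not HyperFrames' actual implementation; `cacheKey` and `makeCachedRenderer` are hypothetical names, and the real cache lives on disk rather than in a Map:

```javascript
// Build a key from every input that affects the output.
// Token keys are sorted so property order does not change the key.
function cacheKey({ html, tokens, viewport, codec }) {
  const sortedTokens = Object.keys(tokens).sort().map((k) => [k, tokens[k]]);
  return JSON.stringify([html, sortedTokens, viewport, codec]);
}

// Wrap an expensive render function so identical inputs render once.
function makeCachedRenderer(renderFrames) {
  const cache = new Map();
  return (input) => {
    const key = cacheKey(input);
    if (!cache.has(key)) cache.set(key, renderFrames(input));
    return cache.get(key);
  };
}

let renderCalls = 0;
const render = makeCachedRenderer((input) => {
  renderCalls += 1;
  return `frames for ${input.tokens.name}`;
});

const base = { html: "<h1>{{name}}</h1>", viewport: "1920x1080", codec: "h264-crf18" };
render({ ...base, tokens: { name: "Ada", year: "2025" } });
render({ ...base, tokens: { year: "2025", name: "Ada" } }); // same input: cache hit
render({ ...base, tokens: { name: "Bob", year: "2025" } }); // changed token: new render
console.log(renderCalls);
```

The flip side: anything that feeds the key, such as a codec setting or viewport change, invalidates every entry at once.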

To bypass the cache for one run (debugging, mostly):

bash

hyperframes render card.html --variants variants.json --no-cache

Output to S3

Two patterns work. Either write to local disk and upload after a successful render, or mount an S3 filesystem (s3fs, mountpoint-s3) and let --out write straight to the bucket. The first is more reliable; the second is simpler.

For the upload pattern, key by a content hash of the variant so retries are idempotent:

const key = `out/${sha256(JSON.stringify(vars))}.mp4`;
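One catch with hashing `JSON.stringify(vars)` directly: the output depends on property insertion order, so two equivalent variants can end up under different keys. A sketch of a canonical serializer to feed the hash instead; `canonicalJson` is a hypothetical helper, and `sha256` is assumed to be whatever hashing helper you already use in the line above:

```javascript
// Serialize with object keys sorted at every level, so logically equal
// values always produce the same string.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const members = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`);
    return `{${members.join(",")}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

console.log(canonicalJson({ name: "Ada", city: "London" }));
console.log(canonicalJson({ city: "London", name: "Ada" }));
```

With that in place the key becomes `out/${sha256(canonicalJson(vars))}.mp4`, and a retry built from a re-ordered row still lands on the same object.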

Observability

Pass --json and pipe the output to your log shipper. Each render emits one JSON line with the input hash, render duration, and frame count. Grep for slow renders, alert on failures, dashboard the throughput.

bash

hyperframes render card.html --variants chunk.json --json | tee -a /var/log/hf-render.log
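Once the lines are in a file, finding slow renders is a filter over parsed JSON. A sketch under a loud assumption: the field names `input_hash` and `duration_ms` are placeholders invented here, so check a real log line and rename them before relying on this:

```javascript
// Parse newline-delimited JSON, skipping blank or non-JSON lines.
function parseLogLines(logText) {
  const entries = [];
  for (const line of logText.split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // ignore stray non-JSON output mixed into the log
    }
  }
  return entries;
}

// Return the input hashes of renders slower than the threshold.
// `duration_ms` and `input_hash` are hypothetical field names.
function slowRenders(logText, thresholdMs) {
  return parseLogLines(logText)
    .filter((entry) => entry.duration_ms > thresholdMs)
    .map((entry) => entry.input_hash);
}

const sampleLog = [
  '{"input_hash":"a1","duration_ms":900,"frames":180}',
  "not json",
  '{"input_hash":"b2","duration_ms":4200,"frames":180}',
].join("\n");
console.log(slowRenders(sampleLog, 3000));
```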

Tweak it

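Shard variants.json by content hash and route each shard to a dedicated worker, so identical inputs always land on the same worker's cache and hit rates climb.
Use spot instances for the worker pool; the queue's at-least-once delivery tolerates eviction, because an evicted worker's message is simply re-delivered.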
Render at 720p for the first pass, 1080p only for high-engagement segments.
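For the sharding tweak, the routing rule only has to be deterministic: the same content key must always map to the same shard. A minimal sketch using a 32-bit FNV-1a hash over UTF-16 code units, which is a swapped-in, non-cryptographic choice for illustration; `shardFor` is a hypothetical helper, and it assumes each worker keeps its own local cache:

```javascript
// FNV-1a, 32-bit: small and deterministic, not cryptographic.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash;
}

// Map a content key to one of `shardCount` shards.
function shardFor(contentKey, shardCount) {
  return fnv1a(contentKey) % shardCount;
}

console.log(shardFor("card.html|name=Ada", 8));
console.log(shardFor("card.html|name=Ada", 8)); // same key, same shard
```

Changing the shard count remaps most keys, so expect a cold cache for a while after resizing the pool.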
