Render 10,000 variants overnight
A walk through the actual infrastructure we use to ship ten thousand personalized video variants in a single overnight job. CI architecture, fanout, caching, and the math of variant pricing.
The first time a customer asked us to render ten thousand video variants, we said yes before we knew how. That was nine months ago. We have shipped that customer's overnight batch every weekday since, plus a half-dozen similar batches for other teams, and the architecture has hardened into something we can describe without hand-waving. I want to spend this post walking through it, with real numbers, because "render at scale" is one of those features that is easy to claim and hard to do correctly.
The setup: a fintech customer running a referral campaign. Each user gets a personalized 12-second video showing their referral count, their dollars earned, their next milestone, and a sign-off card. The variant is parameterized by user ID. They have around 10,000 active referrers. Every weeknight at 11pm Eastern, our system renders fresh videos for everyone whose stats changed that day — typically 4,000 to 7,000 variants — and uploads them to S3 by 6am.
The math: 5,000 variants × 12 seconds × 30 fps = 1.8 million frames per night. At ~10ms per frame on warm Chromium, that is around 5 CPU-hours of work. Spread across 32 cores, it is a little under ten minutes of wall time. In practice the batch takes 35-50 minutes, because the long pole is encoding and upload, not rendering. We will get to those.
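The arithmetic above is easy to re-run for your own batch sizes. This is a minimal sketch using only the figures quoted in this post; the function name is ours for illustration, and the wall-time figure assumes perfect parallelism across cores, which real batches never quite reach.

```javascript
// Back-of-envelope render budget for one nightly batch.
// Inputs are the figures quoted in the text; nothing here is measured.
function renderBudget({ variants, seconds, fps, msPerFrame, cores }) {
  const frames = variants * seconds * fps;
  const cpuSeconds = (frames * msPerFrame) / 1000;
  return {
    frames,
    cpuHours: cpuSeconds / 3600,
    // Assumes the work divides evenly across cores with no idle time.
    wallMinutes: cpuSeconds / cores / 60,
  };
}

const budget = renderBudget({
  variants: 5000,
  seconds: 12,
  fps: 30,
  msPerFrame: 10,
  cores: 32,
});
console.log(budget);
```

The gap between this idealized number and the 35-50 minutes we actually see is the encode and upload time, not rendering.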
The architecture, briefly
The system has four components. Here they are, in pipeline order.
The orchestrator is a small Node service (the same shape we document for the GitHub Actions integration) that wakes up on a schedule, queries the customer's data warehouse for the delta of users whose stats changed, and enqueues a render job per user into Redis. The job payload is the user's parameter bag: { name, referralCount, earnings, milestone, ... }. The orchestrator does not render anything itself. Its only job is to fan out.
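To make the fan-out concrete, here is a minimal sketch of the step that turns warehouse rows into queue payloads. The function names, the `jobId` format, and the `templateVersion` field are hypothetical, and the queue is a stand-in: anything with a `push(string)` method works in place of a real Redis list.

```javascript
// Hypothetical helper: turn the nightly delta into one job payload per user.
// Only the fields the template needs go into the parameter bag.
function buildRenderJobs(changedUsers, templateVersion) {
  return changedUsers.map((user) => ({
    jobId: `${templateVersion}:${user.id}`, // illustrative ID scheme
    userId: user.id,
    templateVersion,
    params: {
      name: user.name,
      referralCount: user.referralCount,
      earnings: user.earnings,
      milestone: user.milestone,
    },
  }));
}

// Stand-in for the queue: any object with push(string) will do here.
// With a real Redis client this would be a list push per job.
function fanOut(jobs, queue) {
  for (const job of jobs) queue.push(JSON.stringify(job));
  return jobs.length;
}
```

Keeping the payload to the parameter bag plus a couple of identifiers means the orchestrator never needs to know how rendering works.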
The renderer pool is a horizontally-scaled fleet of containers each running hyperframes serve --workers=4. Each container has four warm Chromium instances and pulls jobs from Redis. For a given job, the renderer fetches the parameter bag, materializes the composition (a single HTML template with substituted values), renders to MP4, uploads to S3, writes the result back to Redis. Average render time per 12s variant: 3.2 seconds.
The encoder layer is, surprisingly, not a separate service. ffmpeg runs in-process within each renderer worker, fed by frame capture. We tried separating it for theoretical clarity and reverted — the inter-process overhead exceeded the parallelism gain. Encode-while-capture is the right architecture for this scale.
The CDN layer is plain S3 with CloudFront in front of it. We write MP4s with content-hash filenames, set a far-future cache header, and let the orchestrator update a users/<id>.json pointer that downstream systems read. Cache invalidation is "don't" — the URL changes when the content changes.
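A minimal sketch of the naming scheme, assuming SHA-256 over the MP4 bytes. The `videos/` prefix, the pointer's JSON shape, and the CDN hostname in the usage note are illustrative assumptions, not our production layout.

```javascript
const crypto = require("node:crypto");

// Name the object after its own bytes: new content always gets a new URL.
function contentHashedKey(mp4Bytes) {
  const hash = crypto.createHash("sha256").update(mp4Bytes).digest("hex");
  return `videos/${hash}.mp4`; // "videos/" is an illustrative prefix
}

// The small mutable pointer that downstream systems read.
function pointerFor(userId, key, cdnBase) {
  return {
    path: `users/${userId}.json`,
    body: JSON.stringify({ userId, url: `${cdnBase}/${key}` }),
  };
}

// A far-future header is safe because the object at a given key never changes.
const CACHE_CONTROL = "public, max-age=31536000, immutable";
```

For example, `pointerFor("42", contentHashedKey(bytes), "https://cdn.example.com")` yields the path `users/42.json` and a body pointing at the hashed MP4. Only the pointer ever needs a short cache lifetime.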
The single template
Every variant in a 10,000-job batch is the same HTML template with different parameters. This is critical for two reasons.
The first reason is creative consistency. If you have 10,000 different files, you have 10,000 things that can drift. A change to the template — different easing, larger title, new sign-off — must propagate to every variant in the next batch. With one template, the change is one diff.
The second reason is rendering performance. We pre-warm the Chromium instances by loading the template once with placeholder parameters, then for each job we update the parameter values in-place via window.postMessage. Chromium does not have to reparse the document, reload fonts, or recompile CSS. The warm path is around 1.4 seconds per variant. The cold path (full reload) is around 3 seconds.
Here is the template's parameter contract, simplified:
<script>
  // Receive a parameter bag from the renderer and apply it without reloading the page.
  window.addEventListener("message", (e) => {
    if (e.data?.type !== "hf-params") return; // ignore unrelated messages
    const p = e.data.params;
    const root = document.documentElement;
    // CSS custom properties take strings, so convert numeric values explicitly.
    root.style.setProperty("--ref-count", String(p.referralCount));
    root.style.setProperty("--earnings", String(p.earnings));
    document.getElementById("name").textContent = p.name;
    document.getElementById("milestone").textContent = p.milestone;
    // Tell the renderer the DOM now reflects this variant.
    window.dispatchEvent(new Event("hf-params-ready"));
  });
</script>
The renderer worker sends the message before starting the seek loop, and waits for hf-params-ready to confirm the DOM is updated. That is about three milliseconds of overhead per variant, versus the roughly 1.6 seconds a cold reload adds (3 seconds cold against 1.4 warm). Worth it.
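The renderer side of that handshake can be sketched as follows. This assumes a Puppeteer-style `page` object whose `evaluate()` runs a function in the browser and awaits any promise it returns; the post does not depend on a particular driver, and the `applyParams` name and the timeout are ours for illustration.

```javascript
// Assumes `page` is a Puppeteer-style Page: evaluate(fn, ...args) runs fn
// inside the browser and resolves once the promise fn returns has settled.
async function applyParams(page, params, timeoutMs = 5000) {
  await page.evaluate(
    (p, ms) =>
      new Promise((resolve, reject) => {
        // Fail loudly instead of rendering a stale variant.
        const timer = setTimeout(
          () => reject(new Error("hf-params-ready not received")),
          ms
        );
        // Listen first, then post, so the ready event cannot be missed.
        window.addEventListener(
          "hf-params-ready",
          () => {
            clearTimeout(timer);
            resolve();
          },
          { once: true }
        );
        window.postMessage({ type: "hf-params", params: p }, "*");
      }),
    params,
    timeoutMs
  );
}
```

The timeout matters in a batch: a template that never acknowledges should fail the job, not silently produce a video with the previous user's numbers.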
The caching strategy
Here is a surprise: about 40% of our nightly batch never actually renders, because we cache aggressively.
The cache key for a variant is sha256(template_hash + parameter_bag). When a job comes in, we hash the parameters with the template version and check Redis. If the hash is present, we already rendered this variant — point the user at the existing MP4 and skip the work.
You might expect parameter bags to be unique enough that caching never hits. They are not. The fintech customer has many users with referralCount: 0, earnings: 0, milestone: "first referral". Those users get the same video. We render it once, serve it to all of them. The cache hit rate hovers around 38-44% depending on the day.
The lesson generalizes: when you parameterize a video, your parameter space is much smaller than your audience. A 10,000-user campaign might have 6,000 distinct parameter bags. The other 4,000 are duplicates. Cache them.
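Here is a minimal sketch of the cache key and the deduplication it enables. The canonical serialization (sorted keys) is our assumption about how to make the hash stable; without it, two bags with the same values in a different key order would miss the cache. The bag you hash should contain only the fields that change the rendered pixels. Function names are illustrative.

```javascript
const crypto = require("node:crypto");

// Serialize with sorted keys so { a, b } and { b, a } hash identically.
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value && typeof value === "object") {
    const entries = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// sha256 over the template hash plus the canonical parameter bag.
function variantCacheKey(templateHash, params) {
  return crypto
    .createHash("sha256")
    .update(templateHash)
    .update("\n") // separator so the two inputs cannot run together
    .update(canonicalize(params))
    .digest("hex");
}

// Group a batch by cache key: one render per key, shared by every user in the group.
function groupByCacheKey(templateHash, jobs) {
  const groups = new Map();
  for (const job of jobs) {
    const key = variantCacheKey(templateHash, job.params);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(job.userId);
  }
  return groups;
}
```

Because the template hash is part of the key, shipping a new template version invalidates every cached variant automatically.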
The failure modes at scale
Three things break when you go from rendering one video to rendering ten thousand. Each one took us a month or two to find.
Memory leaks in Chromium. A warm Chromium instance, rendering for hours, accumulates memory. Around the 300th variant, RSS creeps past 2GB. Around the 500th, the renderer crashes. The fix is to recycle workers after every N renders — we use N=200. Each recycle costs us ~800ms but prevents a far more expensive crash. The recycle is invisible to the orchestrator because Redis just rebalances pending jobs to other workers.
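The recycle rule is simple enough to sketch. This is an illustration of the policy, not our worker code: `nextJob`, `render`, and `relaunch` are hypothetical callbacks supplied by the caller, and the only number taken from this post is N=200.

```javascript
// Counts renders and signals when the browser should be restarted.
class RecyclePolicy {
  constructor(maxRenders = 200) {
    this.maxRenders = maxRenders;
    this.renders = 0;
  }

  // Call after each finished render; returns true when a recycle is due.
  recordRender() {
    this.renders += 1;
    return this.renders >= this.maxRenders;
  }

  // Call once the browser has been relaunched.
  reset() {
    this.renders = 0;
  }
}

// Hypothetical worker loop: all three callbacks are supplied by the caller.
async function runWorker({ nextJob, render, relaunch }, policy = new RecyclePolicy()) {
  for (let job = await nextJob(); job != null; job = await nextJob()) {
    await render(job);
    if (policy.recordRender()) {
      await relaunch(); // recycle between jobs, never mid-render
      policy.reset();
    }
  }
}
```

Recycling only between jobs is the important design choice: no render is ever interrupted, so the queue never sees a failure caused by the recycle itself.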
Thundering herd on S3. When the uploads for 5,000 variants all hit the same bucket prefix in a short window, you can run into S3's per-prefix request-rate limits and get 503s. We fixed this by hashing the user ID into the S3 key prefix, distributing writes across hundreds of prefixes. S3 scales by prefix; we just had to give it the prefixes.
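A minimal sketch of the prefix sharding, assuming SHA-256 of the user ID and two hex characters of shard (256 prefixes). The shard width and key layout are illustrative choices; the post only says "hundreds of prefixes."

```javascript
const crypto = require("node:crypto");

// Two hex characters of the user-ID hash give 256 roughly evenly used prefixes.
function shardedKey(userId, contentHash) {
  const shard = crypto
    .createHash("sha256")
    .update(String(userId))
    .digest("hex")
    .slice(0, 2);
  return `${shard}/${contentHash}.mp4`;
}
```

Hashing matters more than the exact width. Sequential user IDs used directly as prefixes would cluster writes, while a hash spreads them evenly no matter how the IDs are assigned.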
Font load failures under load. Our first deploys fetched fonts from Google Fonts on every render. With 5,000 renders in a single batch, Google Fonts started rate-limiting us and some font requests came back as 503s. The compositions rendered with fallback fonts and the customer was unhappy. We now bundle every font as a base64-embedded data URL in the template. The HTML is larger; the failure mode is gone.
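Embedding a font this way is a build-time step. A minimal sketch, assuming WOFF2 files; the function name and the font path in the usage note are hypothetical.

```javascript
// Build an @font-face rule with the font inlined as a base64 data URL,
// so rendering never makes a network request for it. Assumes WOFF2 input.
function inlineFontFace(family, fontBytes, weight = 400) {
  const b64 = Buffer.from(fontBytes).toString("base64");
  return [
    "@font-face {",
    `  font-family: "${family}";`,
    `  font-weight: ${weight};`,
    `  src: url("data:font/woff2;base64,${b64}") format("woff2");`,
    "}",
  ].join("\n");
}
```

At build time you would call something like `inlineFontFace("Inter", fs.readFileSync("fonts/Inter.woff2"))` and paste the result into the template's stylesheet. Base64 inflates the font by about a third, which is the "HTML is larger" cost mentioned above.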
The cost math
I want to talk about money because nobody else does, and the math is important for understanding when this approach makes sense.
A single 12-second variant on our infrastructure costs about $0.004 in compute. Add storage (negligible at MP4 sizes) and bandwidth (CloudFront at $0.085/GB, average MP4 is ~3MB, so $0.00026/variant). Total: roughly half a cent per variant.
At 5,000 variants per night, the batch costs us about $23 in cloud compute and bandwidth. For a campaign that runs every weekday for a month: ~$500 total. Compare to the labor cost of producing 5,000 personalized videos in After Effects: it does not exist as a category because it is impossible.
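If you want to plug in your own prices, the cost model is a few lines. This sketch uses only the per-variant inputs quoted above, ignores storage, and assumes 22 weekdays in a month. With those inputs it lands at about $21 a night and a little under $500 a month; small gaps against rounded figures are expected.

```javascript
// Per-variant cost from the quoted inputs. Storage is ignored as negligible.
function variantCost({ computeUsd, mp4Megabytes, cdnUsdPerGb }) {
  const bandwidthUsd = (mp4Megabytes / 1000) * cdnUsdPerGb;
  return { bandwidthUsd, totalUsd: computeUsd + bandwidthUsd };
}

const perVariant = variantCost({
  computeUsd: 0.004,
  mp4Megabytes: 3,
  cdnUsdPerGb: 0.085,
});
const nightlyUsd = perVariant.totalUsd * 5000;
const monthlyUsd = nightlyUsd * 22; // assumption: 22 weekdays in the month

console.log({ perVariant, nightlyUsd, monthlyUsd });
```

Compute dominates: bandwidth is roughly 6% of the per-variant total, so render time is the lever that moves the bill.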
This is the cost structure that changes the question. When personalized video at scale costs half a cent per recipient, marketers stop asking "should we make a video for this campaign" and start asking "should we make a hundred thousand videos for this campaign." The pricing is what unlocks the volume.
What you need to run this yourself
You do not need our infrastructure to render at scale. You need three things. The developer hub has working examples of each.
A queue. Redis, SQS, anything with at-least-once delivery and a sensible retry model. We use Redis because we already had it; you should use whatever your team operates.
Renderer workers. Containerized HyperFrames instances with hyperframes serve --workers=N — or, if you are already on Vercel, the Vercel integration handles the worker fleet for you. Run them on EC2, on Fly, on Kubernetes, on whatever you operate. The renderer is stateless; horizontal scaling is the same as for any web worker.
A storage and distribution layer. S3 + CloudFront is the obvious answer. Cloudflare R2 + Cloudflare CDN is cheaper. Whatever you pick, make sure your filenames are content-hashed so cache invalidation is trivial.
The orchestration is the place where every team customizes. Our orchestrator queries Postgres; yours might query Snowflake, Segment, or a SaaS marketing platform. The contract with the renderer is the same: enqueue a parameter bag, get back an MP4 URL.
The next bottleneck
At our current scale, the bottleneck is upload, not render. A 12-second 1080p MP4 is around 3MB. Five thousand variants is 15GB of upload from our render fleet to S3 every night. At our current network bandwidth that takes about 22 minutes — longer than the actual rendering. We are exploring two paths.
The first is rendering directly into S3 via multipart upload as the encoder produces bytes. The pipe goes straight from ffmpeg to S3 without ever touching the renderer's local disk. Early prototypes show a 30% latency reduction. We will ship it as --upload-direct once the failure modes are characterized.
The second is rendering smaller files. Most of our customers' viewers watch on phones; 720p is plenty. We are adding adaptive resolution per variant based on the destination platform. A variant going to email might render at 480p with a CTA to "watch in HD on the website." This is media engineering 101 but we have been late to it.
When neither of those is the bottleneck, the next one is going to be content generation — coming up with parameter bags interesting enough to justify the variants. That is a creative problem, and it is the one the agentic loop solves. But that is a different post.
A note on observability
One thing we underestimated when we started running production batches at this scale is how important observability becomes. When a single render fails, you read the error and fix it. When 1 in 200 of 5,000 renders fails, you cannot read 25 errors and fix them individually. You need aggregated metrics.
Our observability stack for the batch pipeline is, in priority order: a Prometheus exporter on every renderer that ships per-job timing and outcome; a dashboard that breaks failures down by error class; structured JSON logs for every failed render, ingested into a search-friendly store; and a daily Slack digest of the previous night's batch ("4,832 variants rendered in 38 minutes, 12 failed, here are the categories").
The dashboard is the artifact we look at every morning. It has three numbers that matter: total variants in the batch, p95 render time, and failure rate. If any of those drift, we investigate before the customer notices. Most mornings, all three are flat and the page is uninteresting. That is the goal.
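The three dashboard numbers, plus the breakdown by error class, can be computed from per-job records in a few lines. This is a sketch, not our exporter: the record shape (`ok`, `ms`, `errorClass`) is a hypothetical one, and p95 uses the nearest-rank method over successful renders.

```javascript
// Summarize one batch from per-job records.
// Assumed record shape: { ok: boolean, ms: number, errorClass?: string }.
function summarizeBatch(results) {
  const total = results.length;
  const failures = results.filter((r) => !r.ok);

  // Count failures by class so 25 errors read as a handful of categories.
  const byClass = {};
  for (const f of failures) {
    byClass[f.errorClass] = (byClass[f.errorClass] || 0) + 1;
  }

  // Nearest-rank p95 over successful renders, using integer math for the rank.
  const times = results
    .filter((r) => r.ok)
    .map((r) => r.ms)
    .sort((a, b) => a - b);
  const rank = Math.ceil((95 * times.length) / 100);
  const p95Ms = times.length ? times[rank - 1] : null;

  return {
    total,
    failed: failures.length,
    failureRate: total ? failures.length / total : 0,
    p95Ms,
    byClass,
  };
}
```

The same summary object can feed both the dashboard and the morning Slack digest, which keeps the two from disagreeing.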
When this approach is wrong
I want to be honest about a category of work where batch rendering at this scale is the wrong answer.
If your videos are bespoke — every one is a different composition, with different brand, different structure, different timing — you do not have a batch problem. You have a creative production problem. HyperFrames helps with that too, but the leverage is in agentic authoring, not in batch infrastructure. The customers I described in this post all have one template per campaign. They get leverage from variant volume, not from variant diversity.
If your videos need to be long-form, such as three-minute documentaries or five-minute explainers, the batch math changes. Three minutes at 30fps is 5,400 frames per variant. Five thousand variants would be 27 million frames, 15 times the frame count of the 12-second batch. The render cost climbs from a fraction of a cent to several cents per variant, and the wall time of the batch climbs from under an hour to most of a day. That is still tractable but it is no longer "overnight." Plan accordingly.
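A rough way to size a long-form batch is to scale the 12-second figures by duration. This sketch assumes cost and wall time grow linearly with frame count, which is our simplifying assumption and not a measurement; under it, compute lands at a few cents per variant. The function name is illustrative.

```javascript
// Scale the short-form batch figures to a longer duration.
// Assumes cost and wall time are linear in frame count (a simplification).
function scaleBatch({ baseSeconds, newSeconds, fps, variants, baseComputeUsd, baseWallMinutes }) {
  const factor = newSeconds / baseSeconds;
  return {
    factor,
    framesPerVariant: newSeconds * fps,
    totalFrames: newSeconds * fps * variants,
    computeUsdPerVariant: baseComputeUsd * factor,
    wallHours: baseWallMinutes.map((m) => (m * factor) / 60),
  };
}

const longForm = scaleBatch({
  baseSeconds: 12,
  newSeconds: 180,
  fps: 30,
  variants: 5000,
  baseComputeUsd: 0.004,
  baseWallMinutes: [35, 50], // the observed range for the 12-second batch
});
console.log(longForm);
```

Linear scaling is likely optimistic for wall time, since larger files also stretch the upload stage, so treat the result as a floor.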
If your videos have heavy 3D, particle systems, or complex shaders, the render time per variant climbs beyond what a pool of CPU workers can absorb. You start to want GPU acceleration. We are working on GPU renderer pools, but they are not what we run in production today. For the heavy 3D case, render in an AE-class tool and use HyperFrames for the templated overlays.
For everything else: yes, you can render ten thousand variants overnight. Yes, the math works. Yes, the failure modes are tractable. The infrastructure has caught up. Now go ship a campaign.
Cite this post
BibTeX:
@misc{team2026render,
  author = {HyperFrames Team},
  title = {Render 10,000 variants overnight},
  year = {2026},
  url = {https://hyperframes.video/blog/render-10k-variants-overnight},
  note = {HyperFrames blog}
}
APA:
HyperFrames Team. (2026, May 13). Render 10,000 variants overnight. HyperFrames. https://hyperframes.video/blog/render-10k-variants-overnight
Markdown:
[Render 10,000 variants overnight](https://hyperframes.video/blog/render-10k-variants-overnight) — HyperFrames Team, 2026
We build the deterministic HTML-to-video pipeline at HyperFrames. We write here when we have something concrete to say.