Build a Year-in-Review video generator (Spotify-Wrapped style)
The Wrapped formula — a per-user year-end recap video — fits inside an HTML template and a deterministic renderer. Here's the architecture.
Every December for the last five years, Spotify Wrapped has eaten a week of cultural attention. The format has now generalized — Strava, GitHub, Apple Music, Duolingo all ship their own per-user year-end recap. The mechanics are not magic: a small set of HTML scene templates, a CSV of per-user stats, a deterministic renderer, and a queue.
Here is how it actually fits together.
The architecture
Three pieces:
- A scene library. 6-10 HTML scenes, each parameterized by user stats. "Top artist of the year." "Minutes listened." "Top genre." "Compared to last year." Each scene is a 4-6 second render(t) animation.
- A per-user data pipeline. SQL or warehouse query that, for every user, produces a JSON blob: { topArtist, minutes, topGenre, vsLastYear, ... }.
- A renderer that joins the two. For each user, fill the templates with their data, render each scene, concatenate into one MP4.
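As a rough sketch of that join, the Python below expands one user's stats blob into one render job per scene. The Scene class, the scene names, and plan_jobs are hypothetical names made up for illustration; only the stats keys come from the blob above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    name: str                 # hypothetical scene id
    seconds: float            # scene length in seconds
    fields: tuple[str, ...]   # stats keys the scene needs


# Illustrative scene library; a real one would hold 6-10 scenes.
SCENES = (
    Scene("top_artist", 5.0, ("topArtist",)),
    Scene("minutes", 5.0, ("minutes",)),
    Scene("top_genre", 5.0, ("topGenre",)),
    Scene("vs_last_year", 5.0, ("vsLastYear",)),
)


def plan_jobs(user_id: str, stats: dict) -> list[dict]:
    """Return one render job per scene whose data is present for this user."""
    jobs = []
    for scene in SCENES:
        if all(key in stats for key in scene.fields):
            jobs.append({
                "user_id": user_id,
                "scene": scene.name,
                "seconds": scene.seconds,
                "params": {key: stats[key] for key in scene.fields},
            })
    return jobs
```

Scenes whose stats are missing are skipped instead of rendered with blanks, which is one possible policy; a stricter pipeline might reject the row.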
The fundamental insight: the scenes are the same; the data is different. Spotify's Wrapped looks personalized because it is personalized — but the underlying templates are the same for every user.
The per-scene template
A KPI-card-pop scene is parameterized by a handful of template knobs:
- TITLE: "Minutes listened in 2026"
- VALUE: the user's specific number
- DELTA: comparison to last year
- LABEL: small caps label
- ACCENT: brand color
The data joins at render time. The renderer pulls one row of CSV/JSON per user, substitutes the variables, and renders.
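A minimal sketch of that substitution step, using Python's standard string.Template. The HTML markup, class names, default accent color, and label text are placeholders invented for this example, not the actual scene; only the five knob names and the row fields come from the post.

```python
from html import escape
from string import Template

# Hypothetical KPI card markup; the real scene would also carry its animation.
KPI_SCENE = Template("""<div class="kpi-card" style="--accent: $ACCENT">
  <p class="label">$LABEL</p>
  <h1 class="title">$TITLE</h1>
  <p class="value">$VALUE</p>
  <p class="delta">$DELTA</p>
</div>""")


def fill_kpi_scene(row: dict, accent: str = "#ff5a5f") -> str:
    """Substitute one user's row into the KPI card template."""
    knobs = {
        "TITLE": "Minutes listened in 2026",
        "VALUE": f"{row['minutes_listened']:,}",          # thousands separator
        "DELTA": f"{row['vs_last_year_pct']:+.1f}% vs last year",
        "LABEL": "your year",
        "ACCENT": accent,
    }
    # Escape every value so user-supplied text cannot break the markup.
    return KPI_SCENE.substitute({k: escape(str(v)) for k, v in knobs.items()})
```

Template.substitute raises KeyError when a knob is missing, so a template that expects a column the data does not have fails loudly instead of rendering an empty card.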
The concatenation step
Six scenes × 5 seconds = 30 seconds of video. Either:
- Render each scene to MP4, then concatenate with ffmpeg concat. Simple, fault-tolerant, scales out naturally: each scene is an independent render job.
- Render the whole 30-second timeline as one render(t). More efficient (one Chromium load per user), but the template is more complex.
For a first version, go with the first option: per-scene renders plus ffmpeg concat. The cost is small (one extra ffmpeg invocation per user) and the debug story is much better. If scene 4 has a glitch, you re-render scene 4 only.
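Here is one way the per-scene concatenation could be wired up, assuming ffmpeg's concat demuxer and scene files that share the same codec, resolution, and frame rate (stream copy with -c copy needs that). The function names and file names are illustrative.

```python
def concat_list(scene_files: list[str]) -> str:
    """Build the list file the ffmpeg concat demuxer reads: one file line per clip."""
    lines = []
    for path in scene_files:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> list[str]:
    """Argument list for joining the clips without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow paths outside the list file's directory
        "-i", list_file,
        "-c", "copy",     # stream copy: no re-encode
        output,
    ]
```

Write the string from concat_list to a file, then pass the command to subprocess.run(cmd, check=True). Re-rendering a single scene only means replacing one file and re-running the concat.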
The scale problem
Spotify has 600 million users. You probably don't. But "year-end recap for our entire user base" is still a large enough number that the rendering has to be batched.
For 100,000 users at 30 seconds each, plan for:
- ~1.5 seconds of render per second of video at 1080p on a modern Chromium. So 45 render-seconds per user.
- 100,000 × 45s = 4.5 million seconds = 1,250 hours ≈ 52 days on one machine.
- Parallelize across 100 worker machines: 12.5 hours.
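To rerun this estimate for your own user count, a small calculator helps. This sketch assumes render time scales linearly with video length and that work divides evenly across workers, ignoring queue overhead and retries; the function and parameter names are made up for illustration.

```python
def estimate_render_time(users: int, video_seconds: float,
                         render_ratio: float = 1.5, workers: int = 1) -> dict:
    """Back-of-envelope render budget.

    render_ratio is render-seconds per second of output video.
    """
    total_seconds = users * video_seconds * render_ratio
    wall_hours = total_seconds / workers / 3600
    return {
        "total_render_seconds": total_seconds,
        "wall_clock_hours": wall_hours,
        "wall_clock_days": wall_hours / 24,
    }


one_machine = estimate_render_time(100_000, 30)
hundred_machines = estimate_render_time(100_000, 30, workers=100)
```

Measure render_ratio on your own templates before trusting the output; heavy scenes can run well above 1.5.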
The whole pipeline is described in "render 10k variants overnight": same shape, just more users.
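On a single machine, the batching can be approximated with a thread pool standing in for the real queue. In this sketch render_user is a placeholder, not a real renderer; a production setup would swap it for the actual per-user render and the pool for a distributed queue.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_user(user_id: str) -> str:
    """Placeholder for the real per-user render; returns the output path."""
    if not user_id:
        raise ValueError("empty user id")
    return f"out/{user_id}.mp4"


def run_batch(user_ids, render=render_user, max_workers=8):
    """Render every user, collecting failures instead of aborting the batch."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(render, uid): uid for uid in user_ids}
        for future in as_completed(futures):
            uid = futures[future]
            try:
                done[uid] = future.result()
            except Exception as exc:  # keep going; retry these later
                failed[uid] = repr(exc)
    return done, failed
```

Keeping a failed map means one bad row costs one video, not the whole overnight run.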
The variable choices that matter
The thing that turns a year-in-review video from "data dump" to "shareable" is the editorial pacing:
- Start small. Open on a single quiet stat. Not the biggest number.
- Build to a hero number. The fourth or fifth scene should be the one people screenshot.
- End on a tagline. "Your 2026, in 30 seconds" or similar. Brand mark.
- One color per year. Wrapped uses a different palette every year. Do the same — it makes the share-back instantly recognizable as "this year's Wrapped" vs last year's.
Editorial discipline matters more than render speed.
The CSV / JSON shape
For each user, one row:
{
  "user_id": "abc123",
  "name": "Jordan",
  "minutes_listened": 42384,
  "top_artist": "The National",
  "top_genre": "indie folk",
  "vs_last_year_pct": 18.4,
  "top_song": "Bloodbuzz Ohio"
}
The CSV is the contract between the data team and the render team. Add a column, add a scene. Remove a column, remove a scene. No coordinator meetings required.
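One way to make the "add a column, add a scene" contract concrete is to derive the scene list from the CSV header. The column names below match the sample row; the COLUMN_TO_SCENE mapping and the scene names are hypothetical.

```python
import csv
import io

# Hypothetical mapping from a data column to the scene it feeds.
COLUMN_TO_SCENE = {
    "minutes_listened": "minutes",
    "top_artist": "top_artist",
    "top_genre": "top_genre",
    "vs_last_year_pct": "vs_last_year",
    "top_song": "top_song",
}

# Columns that should be parsed as numbers, not left as strings.
NUMERIC = {"minutes_listened": int, "vs_last_year_pct": float}


def scenes_for(csv_text: str) -> list[str]:
    """Scenes to render, in header order, based on which columns exist."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [COLUMN_TO_SCENE[c] for c in (reader.fieldnames or [])
            if c in COLUMN_TO_SCENE]


def load_rows(csv_text: str) -> list[dict]:
    """Parse every user row, casting the numeric columns that are present."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, cast in NUMERIC.items():
            if column in row:
                row[column] = cast(row[column])
        rows.append(row)
    return rows
```

Dropping the top_song column from the export removes the top_song scene from every video without touching the renderer.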
The same pattern in plain CSV form is covered in "batch personalized videos from CSV". The KPI scene specifically is detailed in "animated KPI cards that look like money".
The headline
A year-in-review video generator is a CSV, a scene library, and a render queue. None of the three pieces are exotic. The hard part is the editorial taste of the scenes — and that part doesn't get easier with a bigger render farm. Build the scenes first; the renderer is small infrastructure work behind them.
Cite this post
BibTeX:
@misc{okafor2026build,
  author = {Marcus Okafor},
  title = {Build a Year-in-Review video generator (Spotify-Wrapped style)},
  year = {2026},
  url = {https://hyperframes.video/blog/year-in-review-video-generator},
  note = {HyperFrames blog}
}
APA:
Marcus Okafor. (2026, May 19). Build a Year-in-Review video generator (Spotify-Wrapped style). HyperFrames. https://hyperframes.video/blog/year-in-review-video-generator
Markdown:
[Build a Year-in-Review video generator (Spotify-Wrapped style)](https://hyperframes.video/blog/year-in-review-video-generator) — Marcus Okafor, 2026
Marcus leads design and motion at HyperFrames. Before that he shipped editorial motion for newsrooms and product launches. He thinks every easing curve has a personality.