Generative motion design: LLMs writing CSS animations
How well do current LLMs actually produce motion design? An honest field test of the major models on animation prompts, the failure modes that keep recurring, and the few-shot patterns that fix them.
I had a small, dispiriting moment last fall. A client asked if they could "just have the AI do the motion." I made the right professional noises, but I went home that night and ran the test myself: I gave the same brief to three different LLMs and graded the output the way I would grade a junior designer's first attempt. The result was more interesting than I expected. The models were not as bad as I had hoped (this is honest), and not nearly as good as the client thought (this is also honest).
This post is a longer version of that experiment. I have spent the last six months poking at how well frontier LLMs produce motion design — CSS keyframes, easing curves, timing, composition — and I want to share what I have found. The TLDR: models in 2026 can write competent animation. They cannot, yet, write tasteful animation. The gap is small in lines of code and very large in feeling. The good news is that the gap is closable with the right prompt.
The test rig
I asked four models — pick your current frontier favorites — to produce a 4-second CSS animation of a single headline word "ARRIVES" with subtle character stagger. The brief specified 1080p, 60fps, white text on near-black background, no JavaScript, brand-appropriate easing, "should feel premium." I rendered each one through HyperFrames (deterministic, so the model's output is what you see, no encoder noise) and graded.
The grades, on my private scale of 1-10:
- Model A: 7/10. Good defaults, sensible easing, one weird overshoot.
- Model B: 6/10. Technically correct, visually generic, easing was ease-in-out on everything.
- Model C: 8/10. Surprisingly tasteful default. Used a custom cubic-bezier I would have used.
- Model D: 5/10. Worked, but felt like 2017.
These numbers are not a benchmark. They are one designer's eye on one brief. But the patterns in the failures are stable enough that I want to spend the rest of the post on them.
Failure mode 1: ease-in-out everywhere
The single most common LLM motion failure is using ease-in-out (or worse, ease) for every transition. The keyword ease is the browser default, and ease-in-out is one of the most commonly seen values in training data. Both are the easing of least imagination.
I wrote a whole post about why this is the wrong default, "easing that looks like money," so I will not relitigate it here. But for LLMs specifically, the fix is simple: name a curve in the prompt.
A prompt like "use a settle easing of cubic-bezier(.16, 1, .3, 1) for elements arriving, and cubic-bezier(.7, 0, .84, 0) for elements departing" lifts the median output a full grade point. The model is happy to use specific curves; it just defaults to generic ones when you do not specify.
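For reference, here is roughly what a model tends to hand back once the curves are named. This is a minimal sketch, not output from any of the four models: the custom property names --settle and --depart, the keyframe names, and the .is-arriving / .is-departing classes are my own illustrative assumptions. Only the two cubic-bezier values come from the prompt sentence above.

```css
:root {
  /* Arriving elements: fast start, long soft landing (value from the prompt). */
  --settle: cubic-bezier(.16, 1, .3, 1);
  /* Departing elements: slow start, accelerating exit (value from the prompt). */
  --depart: cubic-bezier(.7, 0, .84, 0);
}

@keyframes enter {
  from { opacity: 0; transform: translateY(24px); }
  to   { opacity: 1; transform: translateY(0); }
}

@keyframes leave {
  from { opacity: 1; transform: translateY(0); }
  to   { opacity: 0; transform: translateY(-24px); }
}

/* Hypothetical class names, for illustration only. */
.is-arriving  { animation: enter 900ms var(--settle) both; }
.is-departing { animation: leave 700ms var(--depart) both; }
```

Defining each curve once and referring to it by variable also makes the one-line easing revision cheap: you change a single declaration instead of hunting through every rule.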
Failure mode 2: overshoot mismatch
The second most common failure is misplaced overshoot. The model will, unprompted, sprinkle cubic-bezier(.34, 1.56, .64, 1) (a bouncy overshoot) on every element. Or it will use overshoot on the wrong element: on a body subtitle instead of the headline.
The issue here is one of editorial judgment. Overshoot is a choice you make about which element gets the dramatic moment. A model has no concept of "which element matters most"; it sees all elements as roughly equal candidates for the dramatic treatment.
The fix is to be explicit: "the word ARRIVES is the hero. It gets overshoot. Everything else uses the settle curve." The model will respect this hierarchy if you state it.
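The same hierarchy can be written into the stylesheet so that overshoot is an exception rather than a default. A sketch under assumptions: the .scene container, the .hero class, and the rise keyframes are hypothetical names of mine, and the durations are picked from the editorial range discussed in this post rather than from any model's output.

```css
:root {
  --settle: cubic-bezier(.16, 1, .3, 1);
  --overshoot: cubic-bezier(.34, 1.56, .64, 1);
}

@keyframes rise {
  from { opacity: 0; transform: translateY(40px); }
  to   { opacity: 1; transform: translateY(0); }
}

/* Default treatment: every direct child of the scene settles. */
.scene > * {
  animation: rise 800ms var(--settle) both;
}

/* The single exception: the hero element gets the dramatic curve.
   This selector is more specific than the rule above, so it wins. */
.scene > .hero {
  animation-timing-function: var(--overshoot);
  animation-duration: 1000ms;
}
```

Because the overshoot lives in exactly one rule, it is easy to check in review that only one element carries it.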
Failure mode 3: timing that does not breathe
Models love to pack animation into 300ms. I think this is because most CSS animation examples in training data are micro-interactions (hover states, button presses), which are correctly short. For editorial motion — title reveals, cinematic title cards, hero animations — the duration that feels right is more like 800-1200ms for the lead element, with 60-100ms of stagger between characters or sub-elements.
Asked unprompted, models will write animation-duration: 0.4s for everything. Asked with the constraint "this is editorial motion, durations should be in the 700-1200ms range," the output gets noticeably better.
A trick that works particularly well: give the model a budget rather than a target. "The hero word should arrive over 900ms; the staggered characters should span the first 600ms; the final 300ms is the settle." The budget framing maps to how human designers think about timing, and the model picks it up.
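The budget can also be written down as arithmetic in the stylesheet. The sketch below is one possible reading of that budget, not the only one: I am assuming the seven characters of ARRIVES start at even intervals across the first 600ms, and that each character then takes the remaining 300ms to land. The variable names and the arrive keyframes are illustrative, and --i is assumed to be set per character as a zero-based index.

```css
:root {
  --budget: 900ms;        /* total time for the word to arrive */
  --stagger-span: 600ms;  /* window in which the characters start */
  --count: 7;             /* letters in ARRIVES */

  /* Gap between consecutive characters: 600ms over 6 gaps is 100ms. */
  --step: calc(var(--stagger-span) / (var(--count) - 1));

  /* Time each character has once it starts: 900ms minus 600ms is 300ms. */
  --char-duration: calc(var(--budget) - var(--stagger-span));
}

@keyframes arrive {
  from { opacity: 0; transform: translateY(0.4em); }
  to   { opacity: 1; transform: translateY(0); }
}

.word span {
  display: inline-block;
  animation: arrive var(--char-duration) cubic-bezier(.16, 1, .3, 1) both;
  /* Assumes --i is 0 for the first character, 1 for the second, and so on. */
  animation-delay: calc(var(--i) * var(--step));
}
```

Under this reading the last character starts at 600ms and finishes at 900ms, so the word as a whole stays inside the budget. If 300ms per character feels too quick, a different reading keeps longer per-character durations and lets the tails overlap.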
Failure mode 4: forgetting animation-fill-mode: both
A specific technical failure that costs grades. CSS animations default to animation-fill-mode: none, which means the element returns to its un-animated state after the animation ends. For static frames at the end of a render — which is most editorial motion — you want forwards or both.
LLMs omit this maybe 70% of the time. The fix in prompts is: "Set animation-fill-mode: both on all animations." That is the literal sentence I add. It works.
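A minimal sketch of what the fill mode actually changes, assuming a hypothetical .title element and keyframe names of my own. The value both covers two separate problems: the backwards half holds the first keyframe during any delay, and the forwards half holds the last keyframe after the animation ends.

```css
@keyframes fade-up {
  from { opacity: 0; transform: translateY(20px); }
  to   { opacity: 1; transform: translateY(0); }
}

@keyframes fade-out {
  from { opacity: 1; }
  to   { opacity: 0; }
}

/* Arrival: without a backwards fill, the element would sit fully visible
   for the 300ms delay and then snap to opacity 0 when the animation starts. */
.title {
  animation: fade-up 900ms cubic-bezier(.16, 1, .3, 1) 300ms both;
}

/* Departure: without a forwards fill, the element would pop back to
   opacity 1 as soon as the animation ends. */
.title.is-leaving {
  animation: fade-out 700ms cubic-bezier(.7, 0, .84, 0) both;
}
```

This is also why the omission hurts staggered text in particular: every character after the first has a delay, so every one of them flashes in its final state before animating unless the fill mode is set.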
Failure mode 5: no stagger, or wrong stagger
Models are weirdly bad at character or element staggers. They will either (a) animate everything simultaneously, which is dull, or (b) compute the stagger as a JavaScript loop with setTimeout, which violates the no-JS constraint I set.
The right pattern in 2026 CSS is to use animation-delay with an index variable. Pure CSS, no JS:
.word span {
  display: inline-block;
  animation: arrive 900ms var(--settle) both;
  animation-delay: calc(var(--i) * 60ms);
}
.word span:nth-child(1) { --i: 0; }
.word span:nth-child(2) { --i: 1; }
.word span:nth-child(3) { --i: 2; }
/* ... */
Show the model this pattern once, in the prompt, and it will apply it correctly from then on. Do not show it the pattern, and it will reach for JavaScript.
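The stagger pattern refers to an arrive animation and a --settle variable that it does not define. A minimal sketch of what they could look like follows. The keyframe values are my assumption, not part of the original brief; the curve is the settle curve quoted earlier in this post. The markup in the comment is likewise only an illustration of the structure the selectors expect.

```css
/* Assumed markup, one span per character:
   <h1 class="word"><span>A</span><span>R</span><span>R</span>...</h1> */

:root {
  /* The settle curve quoted earlier in the post. */
  --settle: cubic-bezier(.16, 1, .3, 1);
}

/* Hypothetical keyframes: each character fades in while rising slightly.
   The em unit keeps the travel distance proportional to the font size. */
@keyframes arrive {
  from { opacity: 0; transform: translateY(0.4em); }
  to   { opacity: 1; transform: translateY(0); }
}

.word {
  margin: 0;
  color: #f4f4f4;
  background: #0a0a0a;
}
```

With seven characters at 60ms apart, the last one starts at 360ms and, at 900ms per character, lands at 1260ms. If writing an nth-child rule per character feels verbose, setting the index inline on each span (style="--i: 3") is an equivalent pure-CSS option.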
What the prompt that fixed everything looks like
After enough iterations I converged on a "motion brief" prompt template that consistently produces 8-9/10 results. It is long but mostly a list of constraints:
Produce a CSS-only animation for the headline word "ARRIVES" rendered at
1920×1080, 60fps, intended for HyperFrames deterministic render.
Constraints:
- Editorial motion, not UI motion. Durations 700-1200ms range.
- One hero moment. The headline word gets `cubic-bezier(.7, -.5, .4, 1.4)`.
- Everything else uses the settle: `cubic-bezier(.16, 1, .3, 1)`.
- Character stagger of 60ms between characters, via animation-delay.
- animation-fill-mode: both on all animations.
- No JavaScript. No web fonts. No external resources.
- Background is #0a0a0a. Text is #f4f4f4. One accent in #ff5a5f if needed.
- The animation should resolve by t=2.5s, then hold.
That prompt, against current frontier models, produces output I would ship in a portfolio piece, with maybe one easing tweak. The model is doing the typing; I am doing the judgment.
The "easing taste gap" is real but narrow
A finding that surprised me: the gap between "model output" and "designer output" is not in creativity or in technique — it is almost entirely in easing. Two animations with identical keyframes and identical timing will feel completely different depending on the curve. The model gets the keyframes and timing roughly right; it gets the curve wrong in a specific, identifiable way.
This is good news for the field. It means the gap is closable not by waiting for better models, but by encoding the missing taste in tooling. We do this in HyperFrames by shipping a small library of named curves you can reference in MDX or HTML: --ease-settle, --ease-launch, --ease-anticipate, --ease-whisper. Models pick up named curves more reliably than four-number tuples. The cognitive load is lower.
If you are building any kind of LLM-assisted animation tool, this is the lever I would pull first: name the curves, document them well, prompt the model to use them by name. The output quality jumps.
What models still cannot do
Honesty section. There are things current LLMs cannot do well in motion, and I want to name them.
- Composition. "Make a hero shot for a fintech ad" produces something that looks like an LLM made it. Not bad; not distinct. The compositional choices — what is foregrounded, what is held back, what moves and what stays still — are where models still feel generic.
- Brand voice. A motion that "feels like Stripe" vs one that "feels like Square" requires the model to have internalized the brand. Some can, with explicit reference; most cannot, without.
- Pacing across longer sequences. A 4-second animation is easy. A 30-second sequence with multiple beats, breathing room, and rhythm is hard. Models struggle to hold an arc.
The first two are likely closable in the next year as models get better at style mimicry. The third one I am less sure about — pacing is editorial, and editorial is the slow part.
How HyperFrames fits
The reason we care about this at HyperFrames is simple: when an LLM writes the animation and HyperFrames renders it, the loop is tight. The model writes HTML. The render is deterministic. The reviewer (human or LLM) sees exactly what the model produced. Iteration is possible.
Compare this to "model writes a prompt, prompt goes to Sora, Sora produces something different each time." The loop is broken. The model cannot reliably tell whether its change helped, because the renderer added noise. We wrote about this in "why agents need deterministic rendering."
The practical workflow that has emerged in our own team: a model writes the first draft of an animation, we render it deterministically in the playground, we look at the result, and we either accept it or write a single-sentence note for the model to revise. The note is almost always about easing. Almost always.
Where this goes
My current bet is that within two years, an LLM-plus-deterministic-renderer pipeline will produce production-quality editorial motion for the majority of marketing content. The pieces are all there: the models can write the HTML, the renderer can produce the MP4, the only missing piece is the layer of taste that picks the right curve. That layer is mostly tooling and prompting, not model capability.
The pieces I care about, as a designer, are the ones above that layer: composition, brand voice, pacing. Those will remain human work for longer. Which is, to be honest, the way I want it.
If you are running these experiments yourself, the playground is free and deterministic — paste the model's output, render it, judge it. That is the whole workflow. Bring better prompts than I did and you will get better results.
Cite this post
@misc{okafor2026generative,
author = {Marcus Okafor},
title = {Generative motion design: LLMs writing CSS animations},
year = {2026},
url = {https://hyperframes.video/blog/llms-writing-css-animations},
note = {HyperFrames blog}
}
Marcus Okafor. (2026, May 15). Generative motion design: LLMs writing CSS animations. HyperFrames. https://hyperframes.video/blog/llms-writing-css-animations
[Generative motion design: LLMs writing CSS animations](https://hyperframes.video/blog/llms-writing-css-animations) — Marcus Okafor, 2026
Marcus leads design and motion at HyperFrames. Before that he shipped editorial motion for newsrooms and product launches. He thinks every easing curve has a personality.