PolygonalBeta
APR 02 · 7 MIN · STACK

We tried every video model. Here's what we shipped on.

An honest look at the model landscape in spring 2026, and what we learned from a year of pairing rendering with cognition in production.

The video model landscape has consolidated, but the gap between marketing demos and production reality is still wide.

What we tested

Every major open-weights video model. Several closed-source models behind partner agreements. Internal stacks built on diffusion, on autoregressive generation, and on hybrids of the two.

What we shipped on

A hybrid. The identity backbone is ours. The frame generator pulls from a base we tuned heavily. Rendering, identity, and cognition each get their own loop. Pairing them is the unsolved part of the field, and it is where most of our engineering effort goes.
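To make "each get their own loop" concrete, here is a minimal sketch of three loops ticking at different rates against one shared state. It is an illustration only, not our production code: the names `SharedState`, `Loop`, and `run`, and the periods of 40 ms, 200 ms, and 1000 ms, are all assumptions picked for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """State the three loops read and write (all fields hypothetical)."""
    identity_version: int = 0  # bumped by the identity loop
    intent: str = "idle"       # written by the cognition loop
    frames: list = field(default_factory=list)  # written by the rendering loop


@dataclass
class Loop:
    name: str
    period_ms: int  # assumed cadence, not a measured number

    def due(self, now_ms: int) -> bool:
        # A loop fires whenever the clock lands on a multiple of its period.
        return now_ms % self.period_ms == 0


def run(duration_ms: int, step_ms: int = 1) -> SharedState:
    """Simulate the three loops on one clock, each at its own period."""
    state = SharedState()
    rendering = Loop("rendering", 40)    # roughly 25 fps, illustrative
    identity = Loop("identity", 200)     # illustrative
    cognition = Loop("cognition", 1000)  # illustrative
    for now in range(0, duration_ms, step_ms):
        if cognition.due(now):
            state.intent = f"intent@{now}"
        if identity.due(now):
            state.identity_version += 1
        if rendering.due(now):
            # Each frame records the latest identity and intent it saw.
            state.frames.append((now, state.identity_version, state.intent))
    return state
```

Calling `run(2000)` simulates two seconds. The fast loop never waits on the slow ones; it renders against whatever identity and intent were most recently published. Keeping that hand-off coherent is the pairing problem in miniature.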

What we didn't ship on

Anything that couldn't hold a single character across a 30-second session. That ruled out much of the model landscape.
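One way to turn that bar into a check, sketched under loud assumptions: suppose every frame comes with an identity embedding, and "holding a character" means each frame in the first 30 seconds stays close to the first frame by cosine similarity. The function names, the 0.8 threshold, and the embedding source are hypothetical, and this is not the metric we actually used.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def holds_character(embeddings, fps, session_s=30.0, threshold=0.8):
    """True if every frame in the first `session_s` seconds stays at or
    above `threshold` cosine similarity to the first frame.

    `threshold` is an illustrative value, not a tuned one.
    """
    needed = int(session_s * fps)
    if len(embeddings) < needed:
        return False  # clip is shorter than the session length
    reference = embeddings[0]
    return all(cosine(reference, e) >= threshold for e in embeddings[:needed])
```

A clip whose embeddings never move passes. A clip that drifts to a different face on its last frame fails, and so does any clip shorter than the session length.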

We're not going to claim "powered by [model name]" — the rendering and cognition stack is its own thing, and we'd rather be honest about that than borrow the brand of an upstream model.