blizzard072@kaist.ac.kr
jchoo@kaist.ac.kr
"The rocket exhaust to glow softly as if traveling through space."
Vector graphics (SVGs) power the modern web. They are infinitely scalable with perfect clarity at any resolution, 54× smaller than video files, editable with CSS and code, and portable across any device. Yet animating them meaningfully is incredibly difficult, even for LLMs. This is because SVGs are made of thousands of low-level elements (paths, groups, shapes) that lack semantic structure. Naively animating these elements leads to chaotic, incoherent results.
Vector Prism reorganizes vector graphics into a semantically meaningful structure before animating them, enabling high-quality, user-controllable animations that were previously impossible.
When you say "make the buttons bounce," the code contains no "buttons," only scattered <path> elements with no semantic structure.
We need to identify which elements correspond to "buttons" first.
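To illustrate why identification comes first, the sketch below shows what becomes possible once every element has a label: elements sharing a label can be wrapped in a single group that an animation can target. This is only an illustration, not the Vector Prism restructuring procedure. The function name `group_by_label`, the `labels` mapping, and the toy SVG are assumptions made for the example, and the sketch ignores paint order, which regrouping can change in a real file.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix


def group_by_label(svg_text, labels):
    """Wrap labelled top-level elements in one <g class="..."> per label.

    `labels` is a hypothetical mapping from element id to semantic label.
    Note: moving elements into groups can change paint order; this sketch
    does not try to preserve it.
    """
    root = ET.fromstring(svg_text)
    groups = {}
    for elem in list(root):  # iterate over a snapshot, since we mutate root
        label = labels.get(elem.get("id"))
        if label is None:
            continue  # leave unlabelled elements where they are
        if label not in groups:
            group = ET.Element(f"{{{SVG_NS}}}g", {"class": label})
            # Place the new group where its first member used to be.
            root.insert(list(root).index(elem), group)
            groups[label] = group
        root.remove(elem)
        groups[label].append(elem)
    return ET.tostring(root, encoding="unicode")


# Toy input: three anonymous paths, two of which are "buttons".
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path id="p1" d="M0 0h10v10z"/>'
    '<path id="p2" d="M20 0h10v10z"/>'
    '<path id="p3" d="M0 20h30"/>'
    "</svg>"
)
labels = {"p1": "button", "p2": "button", "p3": "background"}
grouped = group_by_label(svg, labels)
```

After grouping, an instruction about "buttons" has a concrete target: a CSS selector such as `g.button` can drive one animation for both paths instead of addressing each path separately.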
We render each SVG element in multiple ways (highlighted, isolated, zoomed), collect (possibly incorrect) predictions from a vision-language model, then use the Dawid-Skene model to aggregate these noisy predictions into reliable semantic labels. This turns weak, contradictory signals into robust semantic decisions.
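The aggregation step can be sketched as a small expectation-maximization loop in the style of Dawid-Skene. This is a minimal illustration under stated assumptions, not the authors' implementation: each rendering view is treated as one "annotator" with its own confusion matrix, labels are integer class indices, and the function name `dawid_skene`, the `smoothing` constant, and the toy votes are all made up for the example.

```python
def dawid_skene(votes, n_classes, n_iter=50, smoothing=0.01):
    """Aggregate noisy labels with a Dawid-Skene style EM loop.

    votes: dict mapping item -> dict mapping view -> predicted class index.
    Returns (hard labels, posteriors, per-view confusion matrices).
    """
    items = sorted(votes)
    views = sorted({v for preds in votes.values() for v in preds})

    # Initialise posteriors from per-item vote fractions (soft majority vote).
    post = {}
    for i in items:
        counts = [smoothing] * n_classes
        for k in votes[i].values():
            counts[k] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    conf = {}
    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per view.
        prior = [sum(post[i][c] for i in items) / len(items)
                 for c in range(n_classes)]
        for v in views:
            mat = [[smoothing] * n_classes for _ in range(n_classes)]
            for i in items:
                if v in votes[i]:
                    k = votes[i][v]
                    for c in range(n_classes):
                        mat[c][k] += post[i][c]
            # Row c holds P(view predicts k | true class is c).
            conf[v] = [[x / sum(row) for x in row] for row in mat]

        # E-step: posterior over the true class of each item.
        for i in items:
            scores = []
            for c in range(n_classes):
                s = prior[c]
                for v, k in votes[i].items():
                    s *= conf[v][c][k]
                scores.append(s)
            total = sum(scores)
            post[i] = [s / total for s in scores]

    labels = {i: max(range(n_classes), key=lambda c: post[i][c]) for i in items}
    return labels, post, conf


# Toy votes: class 0 = "button", class 1 = "background" (hypothetical).
votes = {
    "path1": {"highlighted": 0, "isolated": 0, "zoomed": 0},
    "path2": {"highlighted": 0, "isolated": 0, "zoomed": 1},
    "path3": {"highlighted": 1, "isolated": 1, "zoomed": 1},
    "path4": {"highlighted": 1, "isolated": 1, "zoomed": 0},
    "path5": {"highlighted": 0, "isolated": 0, "zoomed": 0},
    "path6": {"highlighted": 1, "isolated": 1, "zoomed": 1},
}
labels, post, conf = dawid_skene(votes, n_classes=2)
```

Unlike a plain majority vote, the loop also estimates how reliable each view is. In the toy data the "zoomed" view disagrees with the other two on `path2` and `path4`, so its estimated confusion matrix ends up less diagonal and its votes carry less weight.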
Vector Prism achieves state-of-the-art performance across both instruction-following metrics and file efficiency, demonstrating that proper semantic structure unlocks superior animation quality.
@article{yun2026vectorprism,
title={Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure},
author={Yun, Jooyeol and Choo, Jaegul},
journal={arXiv preprint arXiv:2512.14336},
year={2026}
}