For the last few years, visual AI has largely been evaluated by its ability to produce aesthetically pleasing pixels. Diffusion models excelled at turning text prompts into beautiful images and realistic worlds. Yet, for specialized tasks such as graphics design or 3D modeling, the end user requires artifacts that support continuous iteration rather than just a final rendered output. The most advanced visual AI tools are now moving away from generating the final image itself, instead focusing on producing the underlying source code.
The Two Stacks of Visual Generation
There are two distinct approaches to visual generation. The first is pixel-native generation. These systems produce images or videos directly, typically within a latent space. They remain dominant when the goal is generating a cinematic shot, a moodboard, or a photorealistic image because they excel at texture, atmosphere, and lighting.
The second approach is code-native generation. According to A16z, these systems generate a structured representation that must then be executed or rendered by another engine. The model does not output the final pixels; it outputs the program responsible for creating them. This program can take many forms, including:
- SVG files (Scalable Vector Graphics)
- HTML/CSS layouts and React components
- Lottie JSON files for animation
- Blender scripts or USD scene graphs for 3D environments
The Advantage of Structured Code
This distinction is critical because professional production workflows prioritize what happens after the initial generation. A generated image serves as an output, but a generated visual program functions as an artifact. This artifact can be edited, reused, versioned, and integrated into existing software stacks while being validated against technical constraints.
The value of code-native generation becomes clearest when considering post-draft modifications. If a model generates a logo as a raster image and one curve is incorrect, the user must manually mask or redraw it. Conversely, if the output is SVG, the designer can precisely edit the path, primitive, gradient, or text element directly. In UI design, an HTML/CSS output allows designers to inspect the DOM, swap in real components, test responsive states, and check accessibility—capabilities impossible with a static screenshot.
Optimizing for Workflow and Compute
This structural approach also offers significant benefits for test-time compute efficiency. In pixel-native generation, seeking improvements often means sampling more outputs; generating twenty images and picking the best one is essentially a new roll of the dice. While diffusion models can respond to feedback, that feedback tends to be global and imprecise. By contrast, code-native systems allow for highly efficient improvements because they solve a well-defined and validatable coding problem. This shift reframes visual generation as a coding task, unlocking editability and iteration loops that pixel-based models cannot match.
Ultimately, the move toward generating source code represents a fundamental change in how AI interacts with creative and technical production pipelines, transforming AI from a mere image creator into an integrated engineering component.