Every three months, the industry is reborn. A model is released, timelines glow, benchmarks tumble, and in a thousand meetings it is decided to finally rethink everything from scratch. It is the most reliable ritual of the agent era — and its greatest misunderstanding. Because the faster the leaps come, the less the individual leap decides: what everyone gets, differentiates no one. The model, around which everything revolves, has meanwhile become the most interchangeable component in the stack — more fungible than any library, more short-lived than any framework. What is not interchangeable, on the other hand, lies one layer beneath and looks suspiciously like homework.
Those competing with agents in 2026 are not competing on models. They are competing on constitutions.
The Napkin
A web form over coffee and cake
Summer 2024, birthday party of a couple we're friends with — old friends from college, the food was phenomenal, and at some point the group sat outside in the courtyard on beer tent benches, between leftovers, glasses, and the children's coloring books. I took one of their colored pencils, wiped a few cake crumbs off a napkin, and roughly sketched a web form: a few fields, a few steps, arrows in between; in one of the boxes I simply scribbled: “Table with last name, first name, date of birth, six demo records.” That's how you tell friends what you are currently building.

Then I photographed the little work of art with my phone and sent the photo to my bot — an assistant who knew my first draft of a semantic design system: the components and the instructions on how a bot should use them. It read the sketch, built the data object from it, and passed it on to the UI builder. A few minutes later, a finished web form lay on the phone screen: multi-step, functional, compliant with the design system. The phone made the rounds, between coffee cups and a genuinely good, partially eaten cake.
What was noteworthy was less the speed than what the system left out: it did not copy the sketch. From the crooked lines to the colored-pencil aesthetic, none of it made its way into the form. The system recognized what it is — a wizard with these steps, fields with this meaning, a process with this skeleton — and recreated it according to the guidelines of its design system. The scribbled box with a single sentence became a real table: three columns, six demo rows, exactly as noted. The form remained on the napkin. The intent arrived.
In 2024, this left the bystanders amazed. Today it is — an assessment from my own practice, not industry statistics — the standard among those who seriously practice agentic coding: you build a framework including a design system, feed in specifications, and out comes a web product. “The Foreign Service” asked about the external law between agents, about passports and credentials; this one writes the internal law of the software — and begins where all law begins: with the question of what should apply. The answer from the evening in the courtyard has stood firm ever since: the system preserved the intent, not the form.
Two Valid Skepticisms
A defense of the eye-rollers
Anyone rolling their eyes at this point has good reasons — two kinds of them, actually.
The first skepticism is known to anyone who has survived a design system initiative. Not that the idea was wrong — on the contrary: I have been building design systems for years and consider them central for any product meant to scale. The skepticism was never directed at the idea. It was directed at what usually became of it in practice. For three waves, this genre was built, celebrated, and buried: style guides created for audits that shined in pitch decks; component libraries whose maintenance no one budgeted for after launch; token architectures understood by exactly one team, which then left. The magnificent volume on the shelf: printed, signed, and unread. (The term “living document” carried a burden of proof in those years that it rarely redeemed.) Whoever said “window-dressing fluff” was not describing the idea, but its standard case.
The second skepticism is more recent and sits deeper: the suspicion that the model is everything after all. It too was justified. As long as model leaps were the only source of new capabilities, model-hunting was simply the correct strategy — whoever had the better model won, and everything else was decoration. For a while, the fetish was simply true.
Both skepticisms were therefore completely right — until something shifted that had nothing to do with quality. The style guides did not get better, the models did not get more modest. What changed was something else: who reads.
The design system never had a quality problem. It had a reader problem.
The Second Reader
What has changed — and it was not the humans
The reader who was missing all those years has arrived. They are just not human. Before every task, a machine reads the artifacts of the project — design system, documentation, directives. Not by plowing through everything from front to back; that would be a waste. It reads the table of contents and finds its way from there — via ontologies, references, links — to exactly the place the task requires. No fatigue, no silent agreement that no one needs chapter three anyway; but also no blind trudging. Developer Experience and Agent Experience thus merge into one: what this second reader cannot find, no longer exists for production.
At the same time, the speed ratio upon which every engineering culture was built has reversed: generating is now faster than understanding. This creates massive software structures that no human can fully penetrate anymore. Generated in a single afternoon, they become the digital ruins of tomorrow.1
Let's take as a rule of thumb: models double their practical capabilities roughly every three months. If that is even approximately true, an uncomfortable calculation follows: whoever owns only code owns the artifact that loses value the fastest — a snapshot of what the generation before last considered a good solution. This is where the real divergence of this era arises: the model leap devalues code at the same rate that it revalues constitutions.
The Constitution
Preamble, norm, standing instructions, contract, case law
So what is this constitution that survives the leap? Let's read it the way lawyers read a constitution — from the beginning. It consists of four artifact classes and a preamble, and none of them are code. All five are aggregate states of the same substrate: Intent.
The Preamble — Intent
Every constitution begins with a sentence that enforces nothing and justifies everything. “We the People” becomes “I the user” in the software constitution: what is recorded here is the purpose that all norms serve — recorded word for word, not summarized. This applies in both directions of the same mechanism: the user articulates what they want, and the application unfolds around it; the developer articulates what should apply, and the agents build accordingly. How quickly the exact wording is lost is shown by a small experiment: tell an agent “build me a square,” and it builds a square. Tell a second one days later, and you get a rectangle — the four-sidedness was recognized, the actual condition got lost along the way. The preamble is the place where “square” remains until someone changes it with justification. The intent drives things forward; what makes it durable comes further down. Because a preamble enforces nothing — but without it, no norm knows what it is there for.
The Norm — the design system
Then the text of the constitution itself begins. The norm of the software is its design system, understood as a generative source: it determines what is even allowed to be generated — it shares little more than the name with the old style guide for humans. Exactly that was on display on the napkin in the courtyard — the system read the sketch against its norm and recreated it in compliance with the design system. The closest relative is older than one might think: WAI-ARIA has been providing screen readers for years with a purely semantic description of what an interface means, decoupled from its presentation.2 The design system of the agent era is the same promise to a new addressee. If you keep the norm purely semantic, you can ultimately even swap out its renderer without touching the application core. The norm includes the legal definitions: a hierarchical glossary in which every term is defined in exactly one layer and deeper layers only reference it — “for the purposes of this constitution, a wizard is …”. Without it, the terms drift, and with the terms drift the agents.
And the norm answers a question that never arose in the past: When is building actually done? Three cadences are available — pre-build, where specifications solidify into a static product; on the fly, where every user gets their own rendering; and a hybrid that interweaves both. Pre-build freezes, on the fly flows, the cache decides the aggregate state. As a rule of thumb for the assignment: texts are more volatile than layouts, layouts more volatile than semantic contracts. One of these three cadences, however, is more than a technical option — it is a value decision; more on that in a moment. What remains to be noted is the division of labor: the norm determines what software is allowed to become. What it looks like is a matter of interpretation afterward.
The Standing Instructions — Directives
Norms do not enforce themselves. Enforcement belongs to the directives: standing instructions,3 which every agent reads before every task. Their substance is use cases, networked documentation, and first principles — material in which an agent finds a reason to stop even when the conversation has long since flattered its way off course.
How to learn this discipline is something I can show with one of my own scars — not a single one, more of a pattern that repeated itself across many projects. A model's first plan was never the best. So I had it improved: “Make it watertight.” That worked — and reliably tipped into the opposite. The agent kept waterproofing, long past the actual task, and lost sight of the goal and proportionality. My one-sided watertight directive was a failed directive: it only knew one pole. This turned into a ritual in three prompts — first the plan, then “make it watertight,” then “go over it again and cut out everything that is over-engineered.” Trim the Fat. Whether two separate instructions really work better than one combined one is something I never cleanly tested in an A/B test. My impression says yes. The lesson, however, stands: directives are allowed to hold contradictory poles in the same manifest — the tension is not an inconsistency, it is built-in self-correction that the agent processes as a loop.
What something like this looks like productively is shown by a single language guideline from my learning portal: it determines what is translated and what is not, how proper names are handled, which character set applies, how technical terms remain consistent — written cleanly once, read by every translation agent before every task. Today, a stable five percent of visitors read the portal in Easy Language, which simply would not exist without this directive discipline. The scar and the portal teach the same thing: a directive with only one pole is a drift with an official seal.
The Contract — Tests
In software history, tests were the canary in the coal mine: tolerated, occasionally fed, and the first to be cut in times of doubt. In the constitution, they move into the position of the contract — the only artifact class that verifiably survives a model leap. The reasoning for this is older than any AI. Nietzsche let his Zarathustra know that one human at best almost understands another — machines included. Complete understanding is not to be had. What is to be had: the exact wording, recorded syllable for syllable — what the user said, what the developer said, every “I”. The wording turns into requirements, requirements turn into tests. And if something doesn't fit in the end, the cascade runs backward — from the red test via the ticket to the original statement and its subtext, the same mechanism that ran through the agent world in the previous article as “back-propagating”. This changes nothing about classic requirements engineering, and that is no side note: clarifying requirements remains the discipline. Tests are their executable final form4 — the step at which acceptance criteria take on machine form, whether as specification tests, broad property tests, mutation testing, or persona-driven end-to-end runs. Test and code are separate instances here — one holds the goal, the other executes and reports back. A red test is, if you will, a successful constitutional complaint.
A test is the only requirement that cannot lie.The Contract Clause
Case Law — Smart Caches
The Deletion Question
What remains of your software if you delete all the code tonight?The Rebuild Test, in one question
The Rebuild Test
Cities burn, land registries do not
Six Questions for Your Stack
- Can an agent read your design system — or only your frontend team? If the norm lives in Figma screenshots6 and convention folklore, it does not exist for the second reader.
- Does your test suite survive a model leap — or does it test implementation details? Contracts bind to behavior; whoever tests internals has signed the contract with the wrong partner.
- Does every agent read your latest directive change before every task — or does it lie in a wiki that no one opens? A standing instruction that doesn't hang at the workplace is an anecdote.
- Which part of your application is never allowed to have two faces — and where is that recorded? If the answer is “in the team's head,” it's a no.
- If a model twice as good appears tomorrow: how many days until your stack benefits from it? The number is your constitutional maturity in days.
- Do your decision justifications lie where the agent works — or where management searches? Precedent without findable justification is superstition with a version number.
Whoever answers yes to all six no longer needs a constitution — they have had one for a long time.
Compound Interest
Market, profession, and an uncomfortable question of ownership
The Discarded Document
You receive your next model as a gift. No one writes your constitution for you.The accounting rule of the agent era
- This is how I described it in AI Fundamentals (Chapter 6: The Art of Prompting).
- W3C, Accessible Rich Internet Applications (WAI-ARIA) — a W3C standard since 2014: a semantic description layer that communicates the role, state, and meaning of UI elements to screen readers independently of their presentation. Twenty-five years of accessibility work turn out, in retrospect, to be a dress rehearsal for machine-readable interfaces.
- This genre only got a name late in the game: agent frameworks now package such standing instructions as “skills” — reusable instruction packages that an agent pulls as needed. In 2024, there was neither the term nor the product for it.
- The nice derivation — “test” from Latin testis, the witness — is historically shaky, unfortunately: more likely the word comes from testum, the clay vessel in which precious metals were tested for authenticity in the fire. For our purposes, both readings win. A test is a witness that tests your gold in the fire.
- Anyone who feels a slight twinge when reading the six questions: that is not rhetoric. That is your backlog.
- My conviction: every software company should already be serving its design system via an in-house MCP server — machine-readable, versioned, queryable by the agent. Figma has understood this and provides its own MCP server right along with it — which only shifts the question: does anyone who prototypes directly with the real production components even need the drawn intermediate thing anymore?
