The Constitution of Software

Every three months a new model arrives — and yet, in 2026, it is not the model that decides, but what lies one layer beneath: design system, directives, tests, and Smart Caches. Part two of the series reads software like a constitution and asks the question that tests every architecture: What remains if all code is deleted tonight?

Series: Paradigm Shift, Part 2 of 5
June 16, 2026 · 17 min read
AI agents
AI development
Meta
This post was originally written in German. You are reading a translation.

Every three months, the industry is reborn. A model is released, timelines glow, benchmarks tumble, and in a thousand meetings someone decides to finally rethink everything from scratch. It is the most reliable ritual of the agent era, and its greatest misunderstanding. Because the faster the leaps come, the less any single leap decides: what everyone gets differentiates no one. The model around which everything revolves has meanwhile become the most interchangeable component in the stack: more fungible than any library, more short-lived than any framework. What is not interchangeable lies one layer beneath and looks suspiciously like homework.

Those competing with agents in 2026 are not competing on models. They are competing on constitutions.

The Napkin

A web form over coffee and cake

Summer 2024, birthday party of a couple we're friends with — old friends from college, the food was phenomenal, and at some point the group sat outside in the courtyard on beer tent benches, between leftovers, glasses, and the children's coloring books. I took one of their colored pencils, wiped a few cake crumbs off a napkin, and roughly sketched a web form: a few fields, a few steps, arrows in between; in one of the boxes I simply scribbled: “Table with last name, first name, date of birth, six demo records.” That's how you tell friends what you are currently building.

[Image: watercolor of a beer tent table-and-bench set in a garden. On the wooden table lies a paper napkin with a hand-scribbled form sketch (table with last name, first name, date of birth, and six demo records), next to colored pencils, an open notebook, a coffee cup, a partially eaten cake, and a champagne glass.]
The napkin that held everything: a form made of colored pencil, among coffee, cake, and a champagne glass. The form remained on the paper — the intent moved on.

Then I photographed the little work of art with my phone and sent the photo to my bot — an assistant who knew my first draft of a semantic design system: the components and the instructions on how a bot should use them. It read the sketch, built the data object from it, and passed it on to the UI builder. A few minutes later, a finished web form lay on the phone screen: multi-step, functional, compliant with the design system. The phone made the rounds, between coffee cups and a genuinely good, partially eaten cake.

What was noteworthy was less the speed than what the system left out: it did not copy the sketch. From the crooked lines to the colored-pencil aesthetic, none of it made its way into the form. The system recognized what the sketch was (a wizard with these steps, fields with this meaning, a process with this skeleton) and recreated it according to the guidelines of its design system. The scribbled box with a single sentence became a real table: three columns, six demo rows, exactly as noted. The form remained on the napkin. The intent arrived.

In 2024, this left the bystanders amazed. Today it is the standard among those who seriously practice agentic coding (an assessment from my own practice, not an industry statistic): you build a framework including a design system, feed in specifications, and out comes a web product. “The Foreign Service” asked about the external law between agents, about passports and credentials; this part writes the internal law of the software, and begins where all law begins: with the question of what should apply. The answer from the evening in the courtyard has stood firm ever since: the system preserved the intent, not the form.

Two Valid Skepticisms

A defense of the eye-rollers

Anyone rolling their eyes at this point has good reasons — two kinds of them, actually.

The first skepticism is known to anyone who has survived a design system initiative. Not that the idea was wrong. On the contrary: I have been building design systems for years and consider them central for any product meant to scale. The skepticism was never directed at the idea. It was directed at what usually became of it in practice. For three waves, this genre was built, celebrated, and buried: style guides created for audits that shone in pitch decks; component libraries whose maintenance no one budgeted for after launch; token architectures understood by exactly one team, which then left. The magnificent volume on the shelf: printed, signed, and unread. (The term “living document” carried a burden of proof in those years that it rarely redeemed.) Whoever said “window-dressing fluff” was not describing the idea, but its standard case.

The second skepticism is more recent and sits deeper: the suspicion that the model is everything after all. It too was justified. As long as model leaps were the only source of new capabilities, model-hunting was simply the correct strategy — whoever had the better model won, and everything else was decoration. For a while, the fetish was simply true.

Both skepticisms were therefore completely right — until something shifted that had nothing to do with quality. The style guides did not get better, the models did not get more modest. What changed was something else: who reads.

The design system never had a quality problem. It had a reader problem.

The Second Reader

What has changed — and it was not the humans

The reader who was missing all those years has arrived. They are just not human. Before every task, a machine reads the artifacts of the project — design system, documentation, directives. Not by plowing through everything from front to back; that would be a waste. It reads the table of contents and finds its way from there — via ontologies, references, links — to exactly the place the task requires. No fatigue, no silent agreement that no one needs chapter three anyway; but also no blind trudging. Developer Experience and Agent Experience thus merge into one: what this second reader cannot find, no longer exists for production.

At the same time, the speed ratio upon which every engineering culture was built has reversed: generating is now faster than understanding. This creates massive software structures that no human can fully penetrate anymore. Generated in a single afternoon, they become the digital ruins of tomorrow.[1]

Let's take as a rule of thumb: models double their practical capabilities roughly every three months. If that is even approximately true, an uncomfortable calculation follows: whoever owns only code owns the artifact that loses value the fastest — a snapshot of what the generation before last considered a good solution. This is where the real divergence of this era arises: the model leap devalues code at the same rate that it revalues constitutions.

The Constitution

Preamble, norm, standing instructions, contract, case law

So what is this constitution that survives the leap? Let's read it the way lawyers read a constitution: from the beginning. It consists of four artifact classes and a preamble, and none of them is code. All five are states of matter of the same substance: intent.

The Preamble — Intent

Every constitution begins with a sentence that enforces nothing and justifies everything. “We the People” becomes “I the user” in the software constitution: what is recorded here is the purpose that all norms serve — recorded word for word, not summarized. This applies in both directions of the same mechanism: the user articulates what they want, and the application unfolds around it; the developer articulates what should apply, and the agents build accordingly. How quickly the exact wording is lost is shown by a small experiment: tell an agent “build me a square,” and it builds a square. Tell a second one days later, and you get a rectangle — the four-sidedness was recognized, the actual condition got lost along the way. The preamble is the place where “square” remains until someone changes it with justification. The intent drives things forward; what makes it durable comes further down. Because a preamble enforces nothing — but without it, no norm knows what it is there for.

The Norm — the design system

Then the text of the constitution itself begins. The norm of the software is its design system, understood as a generative source: it determines what is even allowed to be generated, and it shares little more than the name with the old style guide for humans. Exactly that was on display on the napkin in the courtyard: the system read the sketch against its norm and recreated it in compliance with the design system. The closest relative is older than one might think: WAI-ARIA has been providing screen readers for years with a purely semantic description of what an interface means, decoupled from its presentation.[2] The design system of the agent era is the same promise to a new addressee. If you keep the norm purely semantic, you can ultimately even swap out its renderer without touching the application core. The norm includes the legal definitions: a hierarchical glossary in which every term is defined in exactly one layer and deeper layers only reference it (“for the purposes of this constitution, a wizard is …”). Without it, the terms drift, and with the terms drift the agents.
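The glossary rule above (every term defined in exactly one layer, deeper layers only referencing it) can be checked mechanically. What follows is a minimal sketch under stated assumptions: the `Layer` structure, the layer names, and `check_glossary` are hypothetical illustrations, not a description of the system from the courtyard.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One glossary layer; structure and names are illustrative assumptions."""
    name: str
    defines: dict[str, str] = field(default_factory=dict)  # term -> definition
    references: set[str] = field(default_factory=set)      # terms used, not defined


def check_glossary(layers: list[Layer]) -> list[str]:
    """Walk the layers top-down and return every violation of the rule."""
    problems: list[str] = []
    defined_in: dict[str, str] = {}  # term -> name of the layer that defined it
    for layer in layers:
        for term in layer.defines:
            if term in defined_in:
                problems.append(
                    f"'{term}' redefined in '{layer.name}' "
                    f"(already defined in '{defined_in[term]}')"
                )
            else:
                defined_in[term] = layer.name
        for term in sorted(layer.references):
            # A reference must resolve to this layer or to one above it
            if term not in defined_in:
                problems.append(
                    f"'{term}' referenced in '{layer.name}' but never defined"
                )
    return problems


# Hypothetical three-layer glossary with two deliberate mistakes
glossary = [
    Layer("core", defines={"wizard": "a multi-step form with a fixed step order"}),
    Layer("forms", defines={"step": "one screen of a wizard"}, references={"wizard"}),
    Layer("checkout", defines={"wizard": "our checkout flow"}, references={"step", "field"}),
]
problems = check_glossary(glossary)
for problem in problems:
    print(problem)
```

Run as part of a build, a check like this turns term drift from a slow surprise into a failed build: the second definition of "wizard" and the undefined "field" are both reported before any agent reads the glossary.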

And the norm answers a question that never arose in the past: when does building actually happen? Three cadences are available: pre-build, where specifications solidify into a static product; on the fly, where every user gets their own rendering; and a hybrid that interweaves both. Pre-build freezes, on the fly flows, and the cache decides which state of matter applies. As a rule of thumb for the assignment: texts are more volatile than layouts, layouts more volatile than semantic contracts. One of these three cadences, however, is more than a technical option. It is a value decision; more on that in a moment. What remains to be noted is the division of labor: the norm determines what software is allowed to become. What it looks like is a matter of interpretation afterward.

The Standing Instructions — Directives

Norms do not enforce themselves. Enforcement belongs to the directives: standing instructions[3] that every agent reads before every task. Their substance is use cases, networked documentation, and first principles: material in which an agent finds a reason to stop even when the conversation has long since flattered its way off course.

How to learn this discipline is something I can show with one of my own scars — not a single one, more of a pattern that repeated itself across many projects. A model's first plan was never the best. So I had it improved: “Make it watertight.” That worked — and reliably tipped into the opposite. The agent kept waterproofing, long past the actual task, and lost sight of the goal and proportionality. My one-sided watertight directive was a failed directive: it only knew one pole. This turned into a ritual in three prompts — first the plan, then “make it watertight,” then “go over it again and cut out everything that is over-engineered.” Trim the Fat. Whether two separate instructions really work better than one combined one is something I never cleanly tested in an A/B test. My impression says yes. The lesson, however, stands: directives are allowed to hold contradictory poles in the same manifest — the tension is not an inconsistency, it is built-in self-correction that the agent processes as a loop.
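The three-prompt ritual can be written down as a small loop. This is a minimal sketch, assuming an agent is simply a callable that takes a prompt and returns text; `Agent`, `refine_plan`, the planning prompt, and the stand-in `recording_agent` are hypothetical, and only the two directive sentences come from the ritual itself. The two poles stay separate prompts here because that is how the ritual is described, not because the split has been shown to work better.

```python
from typing import Callable

Agent = Callable[[str], str]  # assumption: an agent is "prompt in, text out"

HARDEN = "Make it watertight."
TRIM = "Go over it again and cut out everything that is over-engineered."


def refine_plan(agent: Agent, task: str, rounds: int = 1) -> str:
    """Plan first, then alternate the two opposing directives."""
    plan = agent(f"Write a plan for this task:\n{task}")
    for _ in range(rounds):
        # Pole 1: push toward robustness
        plan = agent(f"{HARDEN}\n\nTask:\n{task}\n\nPlan:\n{plan}")
        # Pole 2: pull back toward proportionality
        plan = agent(f"{TRIM}\n\nTask:\n{task}\n\nPlan:\n{plan}")
    return plan


# Stand-in for a real model call: it only records which directive it received
calls: list[str] = []


def recording_agent(prompt: str) -> str:
    calls.append(prompt.splitlines()[0])
    return f"plan v{len(calls)}"


final = refine_plan(recording_agent, "build a signup form")
print(final)
print(calls)
```

Passing the task along with every directive is a deliberate choice in this sketch: both the hardening and the trimming step see the original goal, which is the thing the one-sided directive lost sight of.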

What something like this looks like productively is shown by a single language guideline from my learning portal: it determines what is translated and what is not, how proper names are handled, which character set applies, how technical terms remain consistent — written cleanly once, read by every translation agent before every task. Today, a stable five percent of visitors read the portal in Easy Language, which simply would not exist without this directive discipline. The scar and the portal teach the same thing: a directive with only one pole is a drift with an official seal.

The Contract — Tests

In software history, tests were the canary in the coal mine: tolerated, occasionally fed, and the first to be cut in times of doubt. In the constitution, they move into the position of the contract, the only artifact class that verifiably survives a model leap. The reasoning for this is older than any AI. Nietzsche let his Zarathustra know that one human at best almost understands another, machines included. Complete understanding is not to be had. What is to be had is the exact wording, recorded syllable for syllable: what the user said, what the developer said, every “I”. The wording turns into requirements, requirements turn into tests. And if something doesn't fit in the end, the cascade runs backward, from the red test via the ticket to the original statement and its subtext: the same mechanism that ran through the agent world in the previous article as “back-propagating”. This changes nothing about classic requirements engineering, and that is no side note: clarifying requirements remains the discipline. Tests are their executable final form,[4] the step at which acceptance criteria take on machine form, whether as specification tests, broad property tests, mutation testing, or persona-driven end-to-end runs. Test and code are separate instances here: one holds the goal, the other executes and reports back. A red test is, if you will, a successful constitutional complaint.

A test is the only requirement that cannot lie. (The Contract Clause)

Case Law — Smart Caches

The question remains what happens to the judgments. Smart Caches are the case law of the constitution: the best possible valid solutions that are not changed without justification. What is preserved is what has been recognized — the wizard type, the field semantics, the process skeleton. The appearance, from the pixel layout to the raw input, stays out. A solution enters the cache like a case in court: only when frequency, tests, and security checks have been passed and a justification is present is it elevated to precedent — and it is only downgraded on the record. The justification for the judgment is provided by Architecture Decision Records, which sit alongside the code instead of in a distant system that only full-text search ever sees again: the cache is the executable half of the precedent, the ADR the justified one — and the agent reads both in the same place. Why this effort? Because there is pressure on the boiler from four sides: untested things must not roll out, human approval needs a stable state, the user's mental model needs recognizability, and every unnecessary regeneration burns money.
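The admission rule above (frequency, tests, security checks, and a justification, with downgrades only on the record) can be sketched as a small gate. Everything in this sketch is an assumption made for illustration: the `Candidate` fields, the `min_hits` threshold, the cache keys, and the ADR identifiers are hypothetical and do not describe an existing cache.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    key: str                   # semantic key, e.g. wizard type plus field semantics
    hits: int                  # how often this solution was requested
    tests_passed: bool
    security_passed: bool
    adr: Optional[str] = None  # reference to the justifying decision record


@dataclass
class SmartCache:
    min_hits: int = 3          # assumed threshold; tune per application
    precedents: dict[str, Candidate] = field(default_factory=dict)
    record: list[str] = field(default_factory=list)  # log of every ruling

    def promote(self, c: Candidate) -> bool:
        """Elevate a solution to precedent only if every check holds."""
        reasons = []
        if c.hits < self.min_hits:
            reasons.append("too rare")
        if not c.tests_passed:
            reasons.append("tests failed")
        if not c.security_passed:
            reasons.append("security check failed")
        if not c.adr:
            reasons.append("no justification")
        if reasons:
            self.record.append(f"rejected {c.key}: {', '.join(reasons)}")
            return False
        self.precedents[c.key] = c
        self.record.append(f"promoted {c.key} ({c.adr})")
        return True

    def downgrade(self, key: str, adr: str) -> None:
        """Remove a precedent; a recorded justification is mandatory."""
        if not adr:
            raise ValueError("a downgrade needs a recorded justification")
        if key not in self.precedents:
            raise KeyError(key)
        del self.precedents[key]
        self.record.append(f"downgraded {key} ({adr})")


cache = SmartCache()
accepted = cache.promote(
    Candidate("wizard/signup", hits=5, tests_passed=True, security_passed=True, adr="ADR-0012")
)
rejected = cache.promote(
    Candidate("wizard/export", hits=9, tests_passed=True, security_passed=True)
)
cache.downgrade("wizard/signup", adr="ADR-0031")
print(cache.record)
```

Note that the key carries only what was recognized (type and semantics), never pixels or raw input, and that the record grows with every ruling, including rejections. That log is what keeps forgetting regulated rather than accidental.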

This also settles the value decision announced earlier. The question is not when you need a cache. The question is in which parts of your application fluidity has no place, where two faces of the same function would be a breach of trust. The hybrid from the norm chapter gets its mechanics here: the cache is pre-build material on an on-the-fly substrate. And yes, forgetting needs rules too; otherwise the case law becomes the memory of old mistakes. What remains is the short legal form: a cache is a decision against fluidity, made, justified, and on the record.

The Deletion Question

Four artifact classes, one preamble. If this feels like a toolbox, a thought experiment helps that leaves no tools behind.

Tonight, let's say at three o'clock, the entire code of your organization is deleted. Every line. The implementation of months or years — gone. What burns is everything that a last-generation model wrote. What does not burn, if it exists: the preamble with the recorded intent, the norm, the standing instructions, the contract, the case law along with its justifications.

What remains of your software if you delete all the code tonight? (The Rebuild Test, in one question)

The Rebuild Test

Cities burn, land registries do not

Anyone who has a constitution answers the question surprisingly calmly: they rebuild — as a process with a plan, just as cities were rebuilt after fires, as long as land registries and building codes were preserved.

There is a name for this: Rebuild Affordance, the property of a system that it can be rebuilt from its constitution at any time, by any sufficiently good model. The Rebuild Test is the most honest standard of maturity a software organization can apply today: give the new model the norm, directives, tests, and intent, and have it rebuild the same thing. If the new build upholds the contracts, you have in days what used to be a migration roadmap spanning quarters: modern code that verifiably does the same thing. This is exactly where an old anti-pattern flips. The Big Bang Rewrite, the boogeyman that generations of architects have rightly warned against, becomes a real option: if a model twice as good is available every three months and a rebuild takes days instead of years, rebuilding is suddenly a decision you can make, no longer a fate you suffer.
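Reduced to its mechanical core, the Rebuild Test runs the same behavioral contracts against the old build and the rebuilt one. This is a minimal sketch on a deliberately tiny example: the slug function, both builds, and the `CONTRACTS` list are hypothetical stand-ins for a real product and its test suite.

```python
import re
from typing import Callable

# Contracts bind to observable behavior only: input -> expected output.
CONTRACTS: list[tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  Trim  me ", "trim-me"),
    ("Already-slugged", "already-slugged"),
]


def build_old(text: str) -> str:
    """The 'deleted' implementation: manual word splitting."""
    words = text.lower().split()
    return "-".join(words)


def build_new(text: str) -> str:
    """The rebuild: different internals, same contract."""
    return re.sub(r"\s+", "-", text.strip().lower())


def rebuild_test(build: Callable[[str], str]) -> list[str]:
    """Return the broken contracts; an empty list means the build holds."""
    failures = []
    for given, expected in CONTRACTS:
        actual = build(given)
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


old_failures = rebuild_test(build_old)
new_failures = rebuild_test(build_new)
print(old_failures, new_failures)
```

Neither contract mentions `split` or regular expressions, which is the point: a suite that asserted on internals would fail the rebuild even though the behavior is identical.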

And it goes further than it initially seems: if the rebuild becomes trivial for your own software, it also becomes trivial for the software you have bought so far. What that means for the providers — for entire business models that rested on the toil of rebuilding — is covered in the next part of this series: “Cold on demand”. Until then, the benchmark itself suffices: rebuildability is the proof that a constitution holds — and the only one that can be executed.

Six Questions for Your Stack

Monday morning, first coffee

Something like this is not checked in strategy workshops, but during a walkthrough with your own set of keys. Six questions for Monday morning:[5]

  1. Can an agent read your design system, or only your frontend team? If the norm lives in Figma screenshots[6] and convention folklore, it does not exist for the second reader.
  2. Does your test suite survive a model leap — or does it test implementation details? Contracts bind to behavior; whoever tests internals has signed the contract with the wrong partner.
  3. Does every agent read your latest directive change before every task — or does it lie in a wiki that no one opens? A standing instruction that doesn't hang at the workplace is an anecdote.
  4. Which part of your application is never allowed to have two faces — and where is that recorded? If the answer is “in the team's head,” it's a no.
  5. If a model twice as good appears tomorrow: how many days until your stack benefits from it? The number is your constitutional maturity in days.
  6. Do your decision justifications lie where the agent works — or where management searches? Precedent without findable justification is superstition with a version number.

Whoever answers yes to all six no longer needs a constitution — they have had one for a long time.

Compound Interest

Market, profession, and an uncomfortable question of ownership

Now zoom out. If constitutional maturity decides how much of every model leap arrives, then it compounds: the same new model generation triggers a wave of returns — rebuilds, modernization, freed-up capacity — for the team with a constitution, and a wave of rework for the team without one. The gap between the two grows with every leap — and does not ask about the talent of those involved. It is the same protection mechanism that in “The Foreign Service” protected the human from their agent, only built in one layer deeper: there, a recorded wording kept the proxy on course, here it keeps the product on course while the generator underneath is swapped out. And with the work, the profession shifts: whoever builds software today writes its code less and less often — and its constitution more and more often.

Which makes one question uncomfortably concrete: who actually owns the intent, if it is the only remaining asset — the one who had it, or the one who records it? The accounting rule, however, already stands: models are depreciated. Constitutions accrue interest.

The Discarded Document

By the way, the napkin from the courtyard no longer exists. It was crumpled up that same evening and accompanied cake scraps into the trash — nobody noticed, not even me. The form from back then has long been history, a showcase from one evening. What has remained is the way of building like this. What was on the napkin had left the paper long ago: the intent. It lives on in the design system — from there, the exact same form could be rebuilt on any given afternoon. The drawing was only its vehicle for one evening.

So go ahead and celebrate the next model leap — it will be remarkable again. But do not confuse the gift with the asset.

You receive your next model as a gift. No one writes your constitution for you. (The accounting rule of the agent era)

By the way, there has long been a name for writing and maintaining such constitutions: Code Context Engineering. Who carries this discipline, what role it becomes, and what that means for every career in software building — that is discussed in “The Book of Hours”, the final part of this series. Until then, it remains open, and it is bigger than it looks: who writes this constitution?

  1. This is how I described it in AI Fundamentals (Chapter 6: The Art of Prompting).
  2. W3C, Accessible Rich Internet Applications (WAI-ARIA) — a W3C standard since 2014: a semantic description layer that communicates the role, state, and meaning of UI elements to screen readers independently of their presentation. Twenty-five years of accessibility work turn out, in retrospect, to be a dress rehearsal for machine-readable interfaces.
  3. This genre only got a name late in the game: agent frameworks now package such standing instructions as “skills” — reusable instruction packages that an agent pulls as needed. In 2024, there was neither the term nor the product for it.
  4. The nice derivation — “test” from Latin testis, the witness — is historically shaky, unfortunately: more likely the word comes from testum, the clay vessel in which precious metals were tested for authenticity in the fire. For our purposes, both readings win. A test is a witness that tests your gold in the fire.
  5. Anyone who feels a slight twinge when reading the six questions: that is not rhetoric. That is your backlog.
  6. My conviction: every software company should already be serving its design system via an in-house MCP server — machine-readable, versioned, queryable by the agent. Figma has understood this and provides its own MCP server right along with it — which only shifts the question: does anyone who prototypes directly with the real production components even need the drawn intermediate thing anymore?