Crazy 800

Slop is not volume, but volume without selection pressure. A day in the field — five hundred generated thumbnails in the morning, four hundred reviewing constraint personas in the afternoon — shows why volume becomes a quality mechanism as soon as you review it as patiently as you generate it: the double coin of generation and verification.

June 18, 2026 18 min

AI agents

Practice

AI & society

Michael Überschär

Architect of AI Fundamentals

Has been dealing with human-computer interfaces for over two decades and with how design work changes when a human no longer draws, but a swarm designs and a second swarm reviews.

This post was originally written in German. You are reading a translation.

The industry has agreed on a swear word: Slop — the old word for pig feed, repurposed for mass-generated junk. The term lands, and it rings true. But the same designers who despise Slop continue to manually refine three drafts a week and call it diligence. Behind this lies a certainty that hardly anyone voices because it seems so self-evident: that volume is the opposite of quality.

This certainty is wrong. Volume is the new quality mechanism — under one condition that decides everything. Those who merely generate en masse truly produce Slop. Those who generate en masse and review en masse achieve a quality that manual labor could never reach.

Slop is not a property of volume — Slop is volume without selection pressure.

Image: Watercolor of a Victorian naturalist standing before a large glass cabinet full of lined-up, labeled cards, examining individual specimens with a pen and sample tray; next to it an empty birdcage, in the background bookshelves and a garden.

Caption: The breeder in front of the cabinet: hundreds of specimens, one hand that reviews. Volume only becomes quality when someone curates it.

The Thumbnail Hour

Five hundred drafts before lunch

What this looks like is shown by a morning from my own practice. My learning portal had crossed the mark of a hundred published pages, and each needed a thumbnail: a visual entrance door that condenses the content into a miniature. The assignment for the swarm fit into one sentence — read every page, develop five to six design ideas, build clickable miniatures from them. Behind each tile was a sub-agent that had actually read the content, the text, the diagrams, the tone, and derived its design idea from that, instead of rolling the dice for a generic icon.

Then, for an hour, something happened that appears in no textbook of my profession. A demo selection page filled up, a hundred pages times five variants. I clicked on favorites; the selection array moved into the clipboard and back to the agents (the most solemn form a judgment of taste still takes today). For everything that didn't convince me, the instruction was: another five, another ten. After about an hour, five hundred to eight hundred miniatures had fanned out, and from them I condensed the inventory that is online today.

What stays with me from this morning is less the volume than the silence. In the past, productivity had a sound — scratching pens in a workshop, thunderstorms of typing, the babble of voices in a review. This hour had no sound of its own. The most productive design hour of my week sounds like nothing — just the fan.

There is a reason I still don't trust this silence blindly. It wears Bengali characters and gets its say later in this text.

Eight Sketches in Eight Minutes

An appreciation of the tired hand

Before this morning seems self-evident, it is worth looking into the room where I learned my craft. You know it: workshop atmosphere, a ticking timer, felt-tip pens, a sheet of paper, folded three times, eight fields. Crazy 8 — eight sketches in eight minutes. The method was invented so designers wouldn't marry their first idea. It forces the hand to keep drawing when the head has long been satisfied (the most valuable sketch was rarely the first — mostly it was the sixth, shortly after the point where everyone groaned).

This discipline deserves appreciation, because under its conditions it was the smartest answer available. For thirty years, good design was convergence craftsmanship: one variant at a time, one reviewer pair at a time, and every additional round cost someone's afternoon. The methods of my guild — paper prototypes, hallway tests, the review before the release — were, upon closer inspection, all austerity programs for expensive variants. As a rule of thumb from my own practice: a single thumbnail classically cost about an hour of manual labor — reading the article, collecting ideas, designing, implementing. For a hundred pages, that is a hundred hours, two and a half work weeks for entrance doors. So we agreed on eight sketches and called it a creativity technique. That sounded better.

Eight was never the number of creativity — eight was the number of exhaustion.

The Double Price Drop

Two price tags fall at the same time

This exhaustion had an economic foundation, and precisely that has crumbled. Half the story is now told by everyone: generating has become almost free, ten drafts cost barely more than one, fifty barely more than ten. Whoever only sees this half lands unerringly at Slop. The second half is told less often, and it is the more consequential one: reviewing has become almost free in the same move. The same agents that design can review — clicking through, comparing, holding against rules, and justifying their judgments (a reviewer who doesn't get tired and isn't offended when you discard their draft). Two price tags have fallen at the same time: the one for the variant and the one for the reviewing eye upon it.

With this, the business foundation of convergence craftsmanship tips over. It was less a method out of conviction than one out of scarcity. The scarcity to which the entire craft was calibrated disappeared overnight — on both sides of the desk. The question remains that governs the rest of this day: If generating and reviewing both drop toward zero — who or what holds the quality?

The Aviary

Darwin at the desk, part one: variation

What takes the place of scarcity has a historical predecessor, and it coos. Charles Darwin kept pigeons¹, crossbred them, and compared the offspring: the dovecote was his laboratory, selective breeding his method of insight.² Exactly this method has moved into my professional life, with one difference I want to state clearly for once: nothing that breathes is bred here. Layouts, forms, entrance doors to learning pages are bred.

The breeding goal first. None of this begins with a swarm; it begins with a document. Requirements, style rules, target audiences, taboos: I define what a draft must fulfill before any agent starts; the breeding goal lives in intent and tests instead of pixels. Only then comes the staging: ten drafts, twenty, thirty, fifty, that is Crazy 8 expanded by two zeros. The swarm knows three tones here. Mute: "Make ten variants." Discursive: "Make ten and discuss pros and cons for each variant." Primed: "Give me a recommendation beforehand." The third tone is the most comfortable and, in critical decisions, the most dangerous, because a recommendation upfront narrows the exploration before it has begun. The swarm doesn't have to guess what I want: it is told before it moves its first feather.

Mutation and recombination. Three survive the first relay, let's say variants 2, 7, and 13. What follows is the actual process, and it consists of two parallel assignments: sub-variants of the survivors, meaning deep drilling in one direction, and recombinations between them, like the grid of 7 with the color logic of 13. Then select again, fan out again. The loop breathes: the solution space expands, contracts, expands anew along the survivors, contracts again. So that the density converges toward quality instead of toward the baroque, two judge passes with opposing objective functions run over the favorites at the end: the first checks whether everything is watertight; the second deletes everything that is over-engineered. Both poles are in the same manifesto.³ One more word on modality: the eye scans twenty layouts side by side in seconds, but it reads twenty code variants one after the other, which is why breeding unfolds its full effect in the visual. With every breath, the solution space becomes denser around the good and thinner around the arbitrary.
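The breathing loop can be sketched in a few lines. This is a minimal sketch under loud assumptions, not the pipeline from that morning: a draft is reduced to a hypothetical dictionary of three traits, `taste` stands in for the human click on favorites, and every name (`TRAITS`, `mutate`, `recombine`, `breed`) is invented for illustration.

```python
import random

# Hypothetical trait space: a draft is one choice per trait.
TRAITS = {
    "grid": ["single", "split", "mosaic", "banner"],
    "color": ["mono", "duotone", "warm", "cool"],
    "type": ["serif", "sans", "slab"],
}

def random_draft(rng):
    """Expand: one draft drawn from the whole solution space."""
    return {trait: rng.choice(options) for trait, options in TRAITS.items()}

def mutate(draft, rng):
    """Sub-variant: keep the survivor, change exactly one trait."""
    child = dict(draft)
    trait = rng.choice(list(TRAITS))
    child[trait] = rng.choice([o for o in TRAITS[trait] if o != draft[trait]])
    return child

def recombine(a, b, rng):
    """Recombination: each trait is taken from one of two survivors."""
    return {trait: rng.choice([a[trait], b[trait]]) for trait in TRAITS}

def breed(score, rounds=4, population=20, survivors=3, seed=0):
    """Expand, select, expand along the survivors, select again."""
    rng = random.Random(seed)
    drafts = [random_draft(rng) for _ in range(population)]
    for _ in range(rounds):
        # Contract: selection pressure keeps only the best few.
        kept = sorted(drafts, key=score, reverse=True)[:survivors]
        # Expand: survivors stay, the rest is drilling or crossing.
        drafts = list(kept)
        while len(drafts) < population:
            if rng.random() < 0.5:
                drafts.append(mutate(rng.choice(kept), rng))
            else:
                a, b = rng.sample(kept, 2)
                drafts.append(recombine(a, b, rng))
    return max(drafts, key=score)

# Stand-in for the judgment of taste: counts matches with a target.
TARGET = {"grid": "mosaic", "color": "warm", "type": "serif"}

def taste(draft):
    return sum(draft[t] == TARGET[t] for t in TRAITS)

best = breed(taste)
print(best, taste(best))
```

Because the survivors are carried into every new round, the best score never drops from one breath to the next. In real use the scoring function is the expensive part; here it is a toy on purpose.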

The feed costs. Before the next relay starts, my eye falls on an inconspicuous number: the token budget of the morning. An aviary of this size eats, and it eats money. Therefore, the variants combine against a library of reusable components, instead of building every subroutine from scratch — the variance lies in the arrangement, the foundation remains paid for and tested. This budget management is a survival condition: without it, the method dies in the roll-out from the token bill. It has a name: Generation Economy. If you can't feed the aviary, you aren't breeding lines: you are collecting beginnings.
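How much the library saves can be shown with toy numbers. The following is a rough sketch only: the component names and both cost constants are assumptions chosen for illustration, not measurements from the portal.

```python
from itertools import product

# Hypothetical library: each component is built and tested once.
LIBRARY = {
    "frame": ["card", "polaroid", "full-bleed"],
    "motif": ["diagram", "icon", "typographic", "photo-style"],
    "accent": ["corner-tag", "underline", "none"],
}

BUILD_COST = 1000   # assumed cost units to build one component from scratch
ARRANGE_COST = 50   # assumed cost units to arrange existing components

def variants(library):
    """Every arrangement that picks one component per slot."""
    slots = list(library)
    for combo in product(*(library[s] for s in slots)):
        yield dict(zip(slots, combo))

all_variants = list(variants(LIBRARY))
n_components = sum(len(options) for options in LIBRARY.values())

# From scratch: every variant rebuilds each of its slots.
cost_scratch = len(all_variants) * len(LIBRARY) * BUILD_COST
# With the library: components are paid once, variants pay only arrangement.
cost_library = n_components * BUILD_COST + len(all_variants) * ARRANGE_COST

print(len(all_variants), cost_scratch, cost_library)
```

The point of the sketch is the shape of the two formulas: without a library the bill grows with variants times slots, with a library it grows with the number of components plus a small fee per arrangement.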

Variation is only half of breeding — without an environment that sifts out, the aviary is merely full.

Four Hundred Eyes

Darwin at the desk, part two: the environment sifts

In the afternoon, the swarm reverses direction. In the morning, it designed; now it reviews. On the screen, instead of miniatures, there are now logs, and the same economy that allows fifty drafts at once allows four hundred reviewers at once — the four-eyes principle becomes a four-hundred-eyes principle.

Environments instead of reviewers. The first reflex would be to start the same reviewer four hundred times. That finds the same thing four hundred times. The variance is created by constraint sets (rulebooks that each lock in one condition of use): you may only use the keyboard; you operate everything by voice; you are sitting on a smartphone, in Dark Mode, with these personality traits and this patience. Each set is less a reviewer than an environment in which a draft must survive — an artificial habitat that sharpens exactly one survival condition. The load-bearing axes here are functional: language, cognitive load, tech familiarity, age — they are dictated to the model as explicit specifications and overwrite what it would assume on its own. Thus, a hundred, two hundred, four hundred agents go over the software like a shotgun blast, deliberately spread wide because collective coverage counts; a consolidation agent then condenses the findings into patterns. And so that no caricatures are clicking, role and reasoning are separate layers: beneath every persona role runs an internal monologue that argues every action against the rulebook and role plausibility before the persona executes it. Without this layer, the agent is acting in a play; with it, it simulates. The workshop poster has become an environment with mandatory logging.
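A constraint set of this kind can be written down as plain data. The sketch below is one possible shape, assuming four axes with illustrative values; the class `ConstraintSet`, its fields, and the wording of the briefing are hypothetical, not the format used in the runs described here.

```python
from dataclasses import dataclass
from itertools import product

# Axes in the spirit of the text; the concrete values are assumptions.
AXES = {
    "input_mode": ["keyboard only", "voice only", "touch"],
    "language": ["standard language", "Easy Language"],
    "tech_familiarity": ["low", "high"],
    "display": ["smartphone in dark mode", "desktop"],
}

@dataclass(frozen=True)
class ConstraintSet:
    input_mode: str
    language: str
    tech_familiarity: str
    display: str

    def briefing(self) -> str:
        """Explicit specification that overrides the model's own assumptions."""
        return (
            f"You operate the interface by {self.input_mode}. "
            f"You read {self.language}. "
            f"Your tech familiarity is {self.tech_familiarity}. "
            f"Your device: {self.display}. "
            "Before every action, argue in an internal monologue why it fits "
            "these rules and this role, then act."
        )

def environments(axes):
    """One environment per combination of axis values."""
    names = list(axes)
    return [
        ConstraintSet(**dict(zip(names, combo)))
        for combo in product(*(axes[n] for n in names))
    ]

envs = environments(AXES)
print(len(envs))
print(envs[0].briefing())
```

The full cross product is only one way to spread the shot; a real brigade might sample from it or weight some axes more heavily.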

The log as artifact. The actual test artifact of these swarms is the reasoning chain of the monologue, the reasoning trace — the click path is only its shadow. Friction patterns can be bundled from reasoning chains: where a persona hesitates, why it aborts, which phrasing it reads twice. This is the place for the primary evidence — an excerpt from a real run, anonymized, with date:

Persona run · March 11, 2026 · Constraint set: exclusively keyboard, no mouse

Task: open the term "Adversarial Examples" in the glossary.

"Tab: immediately 'Skip to content', good. But do I want the content or to search directly? Tabbing through the A–Z list costs 26 stops, I won't do that. So the search field, one tab further. Except the field carries no label, just the instruction 'Enter text to find pages. Use arrow keys to navigate.' Only by reading it do I understand that I switch to the hits with the arrow keys after entering. I type 'Adversarial', arrow down, Enter. Goal achieved, but I had to piece the path together myself."

Friction: The search field has no label of its own; how to get from the input to the hit list is only in the placeholder instruction. Whoever skims over it is stuck.

What you are reading there is the difference between a metric and a witness: the run documents its own hesitation, justifies its decision, and names the point where the interface lost it. That reviewing is structurally cheaper than generating is a claim I make beyond my own desk: AI safety research builds entire oversight processes on this asymmetry. One of them, the debate procedure, I described in my book: two AI systems compete against each other, each wanting to convince a human judge, the core assumption being that it is easier to recognize a lie than to find the truth yourself.⁴

The click says what happened — the reasoning chain says why it will happen again.

The stack of disputes. No human reads four hundred review reports, and nobody has to. The consolidation sorts the findings into two classes: things the brigade agrees on, which move bundled into the pipeline, and things it argues about. My reading time belongs to the argument. Where four hundred environments come to the same finding, my nod is enough; where they diverge, there is information. But this sounds like less work than it is: whoever decides which of two well-reasoned contradictions gets to change the product needs more judgment than before, and they need it more often. The bottleneck hasn't disappeared; it has migrated to the judgment over dissent.
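The split into agreement and dispute is, at its core, a grouping step. A minimal sketch, assuming findings arrive as simple tuples and that an 80 percent share counts as agreement; both the data shape and the threshold are assumptions for illustration.

```python
from collections import defaultdict

def consolidate(findings, agreement=0.8):
    """Split findings into consensus and disputes.

    findings: list of (environment, element, verdict) tuples, where
    verdict is "ok" or "friction". An element counts as consensus when
    at least `agreement` of its verdicts are the same.
    """
    by_element = defaultdict(list)
    for _env, element, verdict in findings:
        by_element[element].append(verdict)

    consensus, disputes = {}, {}
    for element, verdicts in by_element.items():
        top = max(set(verdicts), key=verdicts.count)
        share = verdicts.count(top) / len(verdicts)
        if share >= agreement:
            consensus[element] = top          # a nod is enough
        else:
            # Keep the tally: this is where the reading time goes.
            disputes[element] = {v: verdicts.count(v) for v in set(verdicts)}
    return consensus, disputes

findings = [
    ("keyboard", "search field", "friction"),
    ("voice", "search field", "friction"),
    ("smartphone", "search field", "friction"),
    ("keyboard", "A-Z list", "friction"),
    ("voice", "A-Z list", "ok"),
    ("smartphone", "A-Z list", "ok"),
]
consensus, disputes = consolidate(findings)
print(consensus)
print(disputes)
```

In this toy run the search field lands in the consensus pile and the A-Z list in the dispute pile. What the sketch cannot do is the part that stays human: deciding which side of a dispute gets to change the product.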

There remains the objection that comes up first in every discussion about simulated users: personas from a language model are supposedly shaped by its biases and therefore unrepresentative. The representation debate asks a real societal question; it just answers one that UX testing never asked.⁵ What is tested is whether a form survives under the condition "small screen, low tech familiarity, Easy Language"; the condition is dictated, set as a constraint for the model instead of read out of it. The hardest demonstration of this logic, however, waits somewhere else: in languages of which I cannot decipher three words. The swarm reviews even where I cannot read what it reviews.

Bengali, Latin

The story of a gap

My portal, basiswissen-ki.de, a freely accessible AI learning offering, appears in 28 languages (as of the publication of this article; it keeps growing whenever the backlog is idle and a few tokens are left). Every page exists twice: standard language and Easy Language, even where the target language has no Easy Language tradition of its own. The total volume today is around 35 million words, about 275 times as much as is in the book (350 pages, excluding the roughly 650 sources) that I sat at concurrently during those same years, by hand, night after night. A stable five percent or so of visitors have activated the simple language, according to the telemetry of my own portal. The whole thing was set up as a roadmap: each language prioritized by reach, complexity, writing direction, and the training data quality of the translation models; at the turn of the year 2025/26, when the translation wave ran, the portal stood at a good 250 book pages per language. This is only possible because the same breeding logic is at work here as with the thumbnails: agents generate, brigades sift. AIs review AIs, against guidelines that I set up in two variants per language, with idiom rules, glossary, and consistent naming of UI elements.

And exactly there sits the story I have owed you since this morning. A single paragraph was missing from the guidelines: "Use this character set for this language." No one had written it, so no one reviewed it. Bengali and Russian flipped into the Latin alphabet in places: transliteration instead of script, right in the middle of the body text. Neither the swarm nor I caught it, and it would have been visually easy to catch; exactly that makes the matter so instructively embarrassing. I only discovered it when everything seemed finished.⁶ The consequence was a massive refactoring in April 2026: all language guidelines revised, newer models sent over all 28 editions once more, this time with a character set clause. This second pass had a price that no design discussion ever mentions, and it varies widely by language: a Germanic language such as Dutch or Swedish runs cheapest, the model commands it in its sleep; Romance and Slavic are above that; an Indic one like Bengali costs three and a half times as much, because the model wrestles for every syllable there. In numbers: around 42 million tokens for the cheapest, a good 150 million for the most expensive; summed across all twenty-eight, a pass amounts to almost two billion tokens, at list prices about $2,700.⁷ And the most expensive languages are, of all things, the ones I don't read. The missing line about the character set was, counted in tokens, the most expensive paragraph I had never written.
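The figures in this paragraph can be checked against each other with a few divisions. The sketch uses only the rounded numbers from the text; the one added assumption is a single blended price across all languages and models, which the text does not state.

```python
# Figures from the text (rounded there, so results are approximate).
cheapest_tokens = 42e6      # cheapest language, one pass
priciest_tokens = 150e6     # most expensive language, one pass
total_tokens = 2e9          # "almost two billion" across all editions
total_cost_usd = 2700       # at list prices
languages = 28

# How much more the most expensive language costs than the cheapest.
ratio = priciest_tokens / cheapest_tokens

# Average pass size per language.
mean_tokens = total_tokens / languages

# Assumption: one blended price per million tokens for the whole pass.
usd_per_million = total_cost_usd / (total_tokens / 1e6)

print(f"ratio most expensive / cheapest: {ratio:.2f}")
print(f"mean tokens per language: {mean_tokens / 1e6:.1f} million")
print(f"blended price: ${usd_per_million:.2f} per million tokens")
print(f"cheapest language: about ${cheapest_tokens / 1e6 * usd_per_million:.0f}")
print(f"most expensive language: about ${priciest_tokens / 1e6 * usd_per_million:.0f}")
```

The ratio comes out at roughly 3.6, which matches the "three and a half times" in the text. The per-language dollar amounts hold only under the blended-price assumption.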

I will leave the lesson unvarnished: the swarm reviews the environments you have built for it — an environment that is missing does not sift. Scaling also scales mistakes; it just does it more quietly. A swarm of four hundred eyes is as farsighted as the environments you build for it — and as blind as the ones you forget.
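The environment that was missing can be very small. The following is a minimal sketch of a character set check, not the clause that went into the guidelines: it uses Python's standard `unicodedata` module and treats the first word of a character's Unicode name as its script, which is a simplification. The function names and the 20 percent tolerance for foreign-script terms are assumptions.

```python
import unicodedata

def script_shares(text):
    """Share of letters per script, keyed by the first word of the Unicode name."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}

def violates_charset(text, expected_script, max_foreign=0.2):
    """True if too few letters are in the expected script.

    max_foreign leaves room for legitimate Latin terms such as
    product names or URLs inside otherwise native text.
    """
    shares = script_shares(text)
    if not shares:
        return False  # nothing to judge, e.g. only digits
    return shares.get(expected_script, 0.0) < 1.0 - max_foreign

samples = [
    ("BENGALI", "কৃত্রিম বুদ্ধিমত্তা"),            # native script
    ("BENGALI", "kritrim buddhimatta"),          # transliterated
    ("CYRILLIC", "искусственный интеллект"),     # native script
    ("CYRILLIC", "iskusstvenny intellekt"),      # transliterated
]
for script, text in samples:
    print(script, violates_charset(text, script), text)
```

A check like this catches the wholesale flip into Latin letters; it says nothing about whether the text in the right script is any good. That remains the job of the other environments.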

The Embossing

What actually constituted work this day

In the early evening, the day lies spread out like a bank statement: in the morning five hundred generated miniatures, in the afternoon four hundred reviewing environments, in between the memory of two alphabets nobody ordered. Only in this sum does it become clear what connects the two movements.

The word hardly needs more coloring in; the day delivered the object lesson. What remains open is what occupies me as I shut down the computer: I was responsible for quality today like rarely before, but I drew, pushed, designed nothing. What was my work on this day, if not a single draft came from my hand?

From Draftsperson to Breeder

The method gets its name

The answer is in no job profile, but it has a name, and it is time to set it: Generative Design Exploration — not a new procedure, but the name for what this day was from start to finish. My work consisted of four things: the intent that preceded every relay; the constraints that turned review agents into environments; the gap in these environments for which I am liable; and the judgment on the disputes that the swarm submitted to me for decision. Darwin would have had a sober word for this: selective breeding — the selection takes place in the dovecote, the breeding goal is created at the breeder's desk. The UX professional breeds: they design requirement spaces, survival conditions, testing climates, and leave the drawing to the swarm.

If that sounds like loss, like a job that has coagulated into administration: the opposite is true. I wrote in my book about an artist who went through 900 prompt attempts for a single image: whoever wrestles for nuances that long isn't playing the lottery — they are creating. The threshold of originality lay in the relentlessness of the vision.⁸

The relentlessness has remained. Only its subject has moved — from the single stroke to the conditions that judge thousands of strokes.

The draftsperson designed solutions — the breeder designs the conditions under which solutions survive.

Border Control

The same swarm, a different front

One question remains, and it leads out of the studio. Interfaces are increasingly rendered in the moment of their use, tailored to the individual. In such a world, a manipulated variant is no longer an A/B artifact that could be found in aggregates: it only exists for its victim, for a single moment, and deletes itself with the page change. Adversarial Hyperpersonalization (personalization that works against the user's interests) leaves an impact, but no evidence.

"The Foreign Service", the kickoff of this series, left a question open exactly here: how do you recognize systems that work against their users when every manipulation only has a single witness? The operational answer has been on my screen since this afternoon. A brigade of hundreds of constraint personas each sees its own freshly rendered interface; the consolidation overlays the renderings and makes drift visible: systematic deviations that pile up where a provider treats tired users differently than wide-awake ones. No single human can provide this proof. A swarm can, as the only actor with enough eyes in enough places.
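What "overlaying the renderings" could mean in the simplest case: count how often a given interface element shows up per persona condition. This is a toy sketch with invented data; the record shape, the condition name `state`, and the feature labels are all hypothetical, and a real consolidation would need far more renderings and a proper significance test.

```python
from collections import defaultdict

def drift(renderings, condition, feature):
    """Rate of `feature` per value of `condition` across all renderings."""
    seen = defaultdict(lambda: [0, 0])  # value -> [renderings with feature, total]
    for r in renderings:
        bucket = seen[r["persona"][condition]]
        bucket[0] += feature in r["features"]
        bucket[1] += 1
    return {value: hits / total for value, (hits, total) in seen.items()}

# Invented example: what each persona was shown on the same page.
renderings = [
    {"persona": {"state": "tired"}, "features": {"countdown timer", "preselected upsell"}},
    {"persona": {"state": "tired"}, "features": {"preselected upsell"}},
    {"persona": {"state": "tired"}, "features": {"preselected upsell"}},
    {"persona": {"state": "tired"}, "features": set()},
    {"persona": {"state": "awake"}, "features": {"preselected upsell"}},
    {"persona": {"state": "awake"}, "features": set()},
    {"persona": {"state": "awake"}, "features": set()},
    {"persona": {"state": "awake"}, "features": {"countdown timer"}},
]

rates = drift(renderings, "state", "preselected upsell")
print(rates)
```

No single rendering in this list proves anything. Only the gap between the two rates, collected across many witnesses, is the kind of evidence the paragraph describes.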

Thus, this methodology hinges on a question that is larger than any design team. That swarms review will be as self-evident in a few years as the spam filter is today. Who directs them, who owns the breeding goal and the judgment — that is the actual distribution question, and a market will be decided by it as well: whoever can define selection pressure possesses the monopoly on quality that, just yesterday, resided in agencies and design departments. The same brigade that sifts out layouts today is tomorrow the only entity that even gets to see a manipulation at all.

End of the Workday

Who was missing today

Evening is here, the fan spins down, and for a few minutes the room sounds the way the end of the workday has always sounded. On the second monitor are still the miniatures from the morning, five hundred attempts, a curated inventory. It was a good day: fast, thorough, in a quality that I alone wouldn't have reached even in a month. The profession has switched sides in the process — bred instead of drawn, and the judgment was my only manual labor.

Only upon standing up do I notice who was missing from this scene all day. In every "studio"⁹ where I learned, a second person sat somewhere: the intern, the working student, the new one, someone who hung up the sketches, logged the reviews, and along the way, simply by being present, got the map of the profession into their head. Today there was nobody. The swarm needs no errand boy, and precisely for that reason, nobody stood next to me who could have learned by watching.

Nobody learned anything today — and that is the only line in the log that worries me.

Whoever considers this a footnote underestimates it. What becomes of a profession that no longer trains its beginners anywhere, and which software company survives a world without this foundation: "The Book of Hours" will explore that. Until then, what this day has shown holds true: volume became quality on the day we started reviewing it the way we generate it, en masse, patiently, and with a breeding goal written by a human.

Key terms

Generative Design Exploration: The name for the method of this day: a human sets the breeding goal and constraints, a swarm generates variants, a second swarm sifts them out. The designer no longer draws, they breed.
Double coin: Generating and reviewing as two sides of a single economic process. Volume without review is Slop; volume with mass review becomes quality.
Four-hundred-eyes principle: The escalation of the four-eyes principle: hundreds of review agents, each in its own constraint set, go over an interface widely spread; a consolidation agent condenses the findings into patterns.
Constraint set: A rulebook that locks in one condition of use each (only keyboard, only voice, small screen, low tech familiarity). Each set is an environment in which a draft must survive.
Generation Economy: The budget management of the method: combining variants against a library of paid, tested components instead of building each one anew. Without it, breeding dies from the token bill.
Slop: Originally the word for pig feed, today for mass-generated junk. Not a property of volume, but the consequence of missing selection pressure.
Adversarial Hyperpersonalization: Personalization that works against the user's interests: tailored per person, fleeting, invisible in the aggregate. Only a swarm of constraint personas makes the drift visible at all.