The industry has agreed on a swear word: Slop — the old word for pig feed, repurposed for mass-generated junk. The term lands, and it rings true. But the same designers who despise Slop continue to manually refine three drafts a week and call it diligence. Behind this lies a certainty that hardly anyone voices because it seems so self-evident: that volume is the opposite of quality.
This certainty is wrong. Volume is the new quality mechanism — under one condition that decides everything. Those who merely generate en masse truly produce Slop. Those who generate en masse and review en masse achieve a quality that manual labor could never reach.
Slop is not a property of volume — Slop is volume without selection pressure.

The Thumbnail Hour
Five hundred drafts before lunch
What this looks like is shown by a morning from my own practice. My learning portal had crossed the mark of a hundred published pages, and each needed a thumbnail: a visual entrance door that condenses the content into a miniature. The assignment for the swarm fit into one sentence — read every page, develop five to six design ideas, build clickable miniatures from them. Behind each tile was a sub-agent that had actually read the content, the text, the diagrams, the tone, and derived its design idea from that, instead of rolling the dice for a generic icon.
Then, for an hour, something happened that appears in no textbook of my profession. A demo selection page filled up, a hundred pages times five variants. I clicked on favorites; the selection array moved into the clipboard and back to the agents (the most solemn form a judgment of taste still takes today). For everything that didn't convince me, the instruction was: another five, another ten. After about an hour, five hundred to eight hundred miniatures had fanned out, and from them I condensed the inventory that is online today.
What stays with me from this morning is less the volume than the silence. In the past, productivity had a sound — scratching pens in a workshop, thunderstorms of typing, the babble of voices in a review. This hour had no sound of its own. The most productive design hour of my week sounds like nothing — just the fan.
Why I still don't trust this silence blindly has a reason. It wears Bengali characters and gets its say later in this text.
Eight Sketches in Eight Minutes
An appreciation of the tired hand
Before this morning seems self-evident, it is worth looking into the room where I learned my craft. You know it: workshop atmosphere, a ticking timer, felt-tip pens, a sheet of paper, folded three times, eight fields. Crazy 8 — eight sketches in eight minutes. The method was invented so designers wouldn't marry their first idea. It forces the hand to keep drawing when the head has long been satisfied (the most valuable sketch was rarely the first — mostly it was the sixth, shortly after the point where everyone groaned).
This discipline deserves appreciation, because under its conditions it was the smartest answer available. For thirty years, good design was convergence craftsmanship: one variant at a time, one reviewer pair at a time, and every additional round cost someone's afternoon. The methods of my guild — paper prototypes, hallway tests, the review before the release — were, upon closer inspection, all austerity programs for expensive variants. As a rule of thumb from my own practice: a single thumbnail classically cost about an hour of manual labor — reading the article, collecting ideas, designing, implementing. For a hundred pages, that is a hundred hours, two and a half work weeks for entrance doors. So we agreed on eight sketches and called it a creativity technique. That sounded better.
Eight was never the number of creativity — eight was the number of exhaustion.
The Double Price Drop
Two price tags fall at the same time
This exhaustion had an economic foundation, and precisely that has crumbled. Half the story is now told by everyone: generating has become almost free, ten drafts cost barely more than one, fifty barely more than ten. Whoever only sees this half lands unerringly at Slop. The second half is told less often, and it is the more consequential one: reviewing has become almost free in the same move. The same agents that design can review — clicking through, comparing, holding against rules, and justifying their judgments (a reviewer who doesn't get tired and isn't offended when you discard their draft). Two price tags have fallen at the same time: the one for the variant and the one for the reviewing eye upon it.
With this, the business foundation of convergence craftsmanship tips over. It was less a method out of conviction than one out of scarcity. The scarcity to which the entire craft was calibrated disappeared overnight — on both sides of the desk. The question remains that governs the rest of this day: If generating and reviewing both drop toward zero — who or what holds the quality?
The Aviary
Darwin at the desk, part one: variation
What takes the place of scarcity has a historical predecessor, and it coos. Charles Darwin kept pigeons,[1] crossbred them, and compared the broods; the dovecote was his laboratory, selective breeding his method of insight.[2] Exactly this method has moved into my professional life, with one difference I want to state clearly: nothing that breathes is bred here. What is bred are layouts, forms, and entrance doors to learning pages.
The breeding goal first. None of this begins with a swarm; it begins with a document. Requirements, style rules, target audiences, taboos: I define what a draft must fulfill before any agent starts; the breeding goal lives in intent and tests instead of pixels. Only then comes the staging: ten drafts, twenty, thirty, fifty. It is Crazy 8, expanded by two zeros. The swarm knows three tones here. Mute: "Make ten variants." Discursive: "Make ten and discuss pros and cons for each variant." Primed: "Give me a recommendation beforehand." The third tone is the most comfortable and, in critical decisions, the most dangerous, because a recommendation upfront narrows the exploration before it has begun. The swarm doesn't have to guess what I want: it is told before it moves its first feather.
Mutation and recombination. Three survive the first relay, let's say variants 2, 7, and 13. What follows is the actual process, and it consists of two parallel assignments: sub-variants of the survivors, meaning deep drilling in one direction, and recombinations between them, like the grid of 7 with the color logic of 13. Then select again, fan out again. The loop breathes: the solution space expands, contracts, expands anew along the survivors, contracts again. So that the density converges toward quality instead of toward the baroque, two judge passes with opposing objective functions run over the favorites at the end: the first checks whether everything is watertight; the second deletes everything that is over-engineered. Both poles are in the same manifesto.[3] One more word on modality: the eye scans twenty layouts next to each other in seconds, but it reads twenty code variants one after the other, which is why breeding unfolds its full effect in the visual. With every breath, the solution space becomes denser around the good and thinner around the arbitrary.
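The breathing loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline behind the portal: the two traits (`grid`, `palette`), the trait pools, and `toy_select` are hypothetical stand-ins, and in practice the selection step is a human click on favorites rather than a scoring function.

```python
import random
from dataclasses import dataclass, replace
from itertools import permutations

# Hypothetical trait pools; a real draft has far more dimensions.
GRIDS = ["single", "split", "mosaic", "banner"]
PALETTES = ["mono", "duotone", "warm", "cool"]

@dataclass(frozen=True)
class Variant:
    grid: str
    palette: str

def mutate(v, rng):
    """Sub-variant: change one trait, keep the other (deep drilling)."""
    if rng.random() < 0.5:
        return replace(v, grid=rng.choice(GRIDS))
    return replace(v, palette=rng.choice(PALETTES))

def recombine(a, b):
    """Recombination: the grid of one survivor with the palette of another."""
    return Variant(grid=a.grid, palette=b.palette)

def breed(population, select, rounds, keep=3, children=4, seed=0):
    """Alternate contraction (select) and expansion (mutate + recombine)."""
    rng = random.Random(seed)
    for _ in range(rounds):
        survivors = select(population)[:keep]  # contract
        offspring = [mutate(s, rng) for s in survivors for _ in range(children)]
        crosses = [recombine(a, b) for a, b in permutations(survivors, 2)]
        # Expand again along the survivors; dict.fromkeys drops duplicates
        # while keeping a stable order.
        population = list(dict.fromkeys(survivors + offspring + crosses))
    return select(population)[:keep]

def toy_select(population):
    """Assumed stand-in for the human judgment: likes mosaic grids and warm palettes."""
    return sorted(population,
                  key=lambda v: (v.grid == "mosaic") + (v.palette == "warm"),
                  reverse=True)

start = [Variant("mosaic", "mono"), Variant("single", "warm"), Variant("split", "cool")]
best = breed(start, toy_select, rounds=2)
print(best[0])  # Variant(grid='mosaic', palette='warm')
```

The winning variant does not exist in the starting population; it appears only through recombination of two survivors, which is the point of running both assignments in parallel.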
The feed costs. Before the next relay starts, my eye falls on an inconspicuous number: the token budget of the morning. An aviary of this size eats, and it eats money. Therefore, the variants are assembled from a library of reusable components instead of building every subroutine from scratch: the variance lies in the arrangement, while the foundation remains paid for and tested. This budget management is a survival condition: without it, the token bill kills the method during roll-out. It has a name: Generation Economy. If you can't feed the aviary, you aren't breeding lines; you are collecting beginnings.
Variation is only half of breeding — without an environment that sifts out, the aviary is merely full.
Four Hundred Eyes
Darwin at the desk, part two: the environment sifts
In the afternoon, the swarm reverses direction. In the morning, it designed; now it reviews. On the screen, instead of miniatures, there are now logs, and the same economy that allows fifty drafts at once allows four hundred reviewers at once — the four-eyes principle becomes a four-hundred-eyes principle.
Environments instead of reviewers. The first reflex would be to start the same reviewer four hundred times. That finds the same thing four hundred times. The variance is created by constraint sets (rulebooks that each lock in one condition of use): you may only use the keyboard; you operate everything by voice; you are sitting on a smartphone, in Dark Mode, with these personality traits and this patience. Each set is less a reviewer than an environment in which a draft must survive — an artificial habitat that sharpens exactly one survival condition. The load-bearing axes here are functional: language, cognitive load, tech familiarity, age — they are dictated to the model as explicit specifications and overwrite what it would assume on its own. Thus, a hundred, two hundred, four hundred agents go over the software like a shotgun blast, deliberately spread wide because collective coverage counts; a consolidation agent then condenses the findings into patterns. And so that no caricatures are clicking, role and reasoning are separate layers: beneath every persona role runs an internal monologue that argues every action against the rulebook and role plausibility before the persona executes it. Without this layer, the agent is acting in a play; with it, it simulates. The workshop poster has become an environment with mandatory logging.
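How a constraint set becomes an environment can be sketched as a small data structure. The field names, the axis values, and the wording of the instruction are assumptions made for illustration; the text above names the axes only loosely, and no model is called here.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class ConstraintSet:
    """One artificial habitat: every field is dictated, none is left to the model."""
    input_mode: str
    language_level: str
    tech_familiarity: str
    device: str

    def as_instruction(self) -> str:
        # The last sentence asks for the monologue layer beneath the persona role.
        return (
            f"You operate the interface only via {self.input_mode}. "
            f"You read {self.language_level}. "
            f"Your tech familiarity is {self.tech_familiarity}. "
            f"You are on a {self.device}. "
            "Before each action, justify it against these rules in an internal monologue."
        )

# Hypothetical axis values; real rulebooks would be longer and more specific.
AXES = {
    "input_mode": ["keyboard", "voice", "touch"],
    "language_level": ["standard language", "Easy Language"],
    "tech_familiarity": ["low", "high"],
    "device": ["smartphone in dark mode", "desktop"],
}

def build_environments(axes):
    """Cross all axes so that no two reviewers share the same survival condition."""
    keys = list(axes)
    return [ConstraintSet(**dict(zip(keys, combo))) for combo in product(*axes.values())]

environments = build_environments(AXES)
print(len(environments))  # 24
```

Crossing the axes is what prevents the swarm from finding the same thing four hundred times: each environment differs from every other in at least one dictated condition.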
The log as artifact. The actual test artifact of these swarms is the reasoning chain of the monologue, the reasoning trace — the click path is only its shadow. Friction patterns can be bundled from reasoning chains: where a persona hesitates, why it aborts, which phrasing it reads twice. This is the place for the primary evidence — an excerpt from a real run, anonymized, with date:
Persona run · March 11, 2026 · Constraint set: exclusively keyboard, no mouse
Task: open the term "Adversarial Examples" in the glossary.
"Tab: immediately 'Skip to content', good. But do I want the content or to search directly? Tabbing through the A–Z list costs 26 stops, I won't do that. So the search field, one tab further. Except the field carries no label, just the instruction 'Enter text to find pages. Use arrow keys to navigate.' Only by reading it do I understand that I switch to the hits with the arrow keys after entering. I type 'Adversarial', arrow down, Enter. Goal achieved, but I had to piece the path together myself."
Friction: The search field has no label of its own; how to get from the input to the hit list is only in the placeholder instruction. Whoever skims over it is stuck.
What you are reading there is the difference between a metric and a witness: the run documents its own hesitation, justifies its decision, and names the point where the interface lost it. That reviewing is structurally cheaper than generating is a claim that holds beyond my own desk: AI safety research builds entire oversight processes on this asymmetry. One of them, the debate procedure, I described in my book: two AI systems compete against each other, each trying to convince a human judge. The core assumption is that it is easier to recognize a lie than to find the truth yourself.[4]
The click says what happened — the reasoning chain says why it will happen again.
The stack of disputes. No human reads four hundred review reports, and nobody has to. The consolidation sorts the findings into two classes: things the brigade agrees on, which move bundled into the pipeline, and things it argues about. My reading time belongs to the argument. Where four hundred environments come to the same finding, my nod is enough; where they diverge, there is information. But this sounds like less work than it is: whoever decides which of two well-reasoned contradictions gets to change the product needs more judgment than before, and needs it more often. The bottleneck hasn't disappeared; it has migrated to the judgment over dissent.
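The split into agreement and dispute can be sketched as a simple vote count per issue. The tuple layout, the issue keys, and the 90 percent agreement threshold are assumptions for illustration; a real consolidation agent would first have to cluster free-text findings into issues, which this sketch skips.

```python
from collections import Counter, defaultdict

def consolidate(findings, agreement=0.9):
    """Split findings into consensus and disputes.

    findings: iterable of (environment, issue, verdict) tuples.
    agreement: assumed share of identical verdicts that counts as consensus.
    """
    by_issue = defaultdict(Counter)
    for _env, issue, verdict in findings:
        by_issue[issue][verdict] += 1

    consensus, disputes = {}, {}
    for issue, votes in by_issue.items():
        verdict, count = votes.most_common(1)[0]
        share = count / sum(votes.values())
        if share >= agreement:
            consensus[issue] = verdict      # moves bundled into the pipeline
        else:
            disputes[issue] = dict(votes)   # goes onto the human reading stack
    return consensus, disputes

# Hypothetical findings from three environments on two issues.
findings = [
    ("keyboard", "search-label", "problem"),
    ("voice", "search-label", "problem"),
    ("mobile", "search-label", "problem"),
    ("keyboard", "az-list", "problem"),
    ("voice", "az-list", "fine"),
    ("mobile", "az-list", "fine"),
]
consensus, disputes = consolidate(findings)
print(consensus)  # {'search-label': 'problem'}
print(disputes)   # {'az-list': {'problem': 1, 'fine': 2}}
```

The dispute entry keeps the full vote breakdown on purpose: the minority verdict from the keyboard environment is exactly the kind of dissent that deserves reading time.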
There remains the objection that comes up first in every discussion about simulated users: personas from a language model are supposedly shaped by its biases and therefore unrepresentative. The representation debate asks a real societal question; it just answers one that UX testing never asked.[5] What is tested is whether a form survives under the condition "small screen, low tech familiarity, Easy Language"; the condition is dictated, set as a constraint for the model instead of read out of it. The hardest demonstration of this logic, however, waits somewhere else: in languages of which I cannot decipher three words. The swarm reviews even where I cannot read what it reviews.
Bengali, Latin
The story of a gap
My portal, basiswissen-ki.de, a freely accessible AI learning offering, appears in 28 languages. That is the status at the publication of this article, because it keeps growing whenever the backlog is idle and a few tokens are left. Every page exists twice: in standard language and in Easy Language, even where the target language has no Easy Language tradition of its own. The total volume today is around 35 million words, about 275 times as much as is in the book (350 pages, excluding the roughly 650 sources) that I wrote concurrently during those same years, by hand, night after night. Roughly five percent of visitors, a stable share, have activated Easy Language, according to the telemetry of my own portal. The whole thing was set up as a roadmap: each language prioritized by reach, complexity, writing direction, and the training data quality of the translation models; at the turn of the year 2025/26, when the translation wave ran, the portal stood at a good 250 book pages per language. This is only possible because the same breeding logic is at work here as with the thumbnails: agents generate, brigades sift. AIs review AIs, against guidelines that I set up in two variants per language, with idiom rules, a glossary, and consistent naming of UI elements.
And exactly there sits the story I have owed you since this morning. A single paragraph was missing from the guidelines: "Use this character set for this language." No one had written it, so no one reviewed it. Bengali and Russian flipped into the Latin alphabet in places: transliteration instead of script, right in the middle of the body text. Neither the swarm nor I caught it, and it would have been visually easy to catch; exactly that makes the matter so instructively embarrassing. I only discovered it when everything seemed finished.[6] The consequence was a massive refactoring in April 2026: all language guidelines revised, newer models sent over all 28 editions once more, this time with a character set clause. This second pass had a price that no design discussion ever mentions, and it varies widely by language: a Germanic language such as Dutch or Swedish runs cheapest, because the model commands it in its sleep; Romance and Slavic languages are above that; an Indic one like Bengali costs about three and a half times as much as the cheapest, because the model wrestles for every syllable there. In numbers: around 42 million tokens for the cheapest, a good 150 million for the most expensive; summed across all twenty-eight, a pass amounts to almost two billion tokens, at list prices about $2,700.[7] And the most expensive languages are, of all things, the ones I don't read. The missing line about the character set was, counted in tokens, the most expensive paragraph I had never written.
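The missing clause is cheap to state as a check. The following sketch is not the review rule used for the portal; it is one assumed way to test "use this character set for this language" with Python's standard `unicodedata` module. The language-to-script mapping and the 80 percent threshold are hypothetical, and the threshold sits below 100 percent because product names and URLs legitimately stay in Latin script.

```python
import unicodedata

# Assumed mapping from language code to the Unicode name prefix of its script.
EXPECTED_SCRIPT = {"bn": "BENGALI", "ru": "CYRILLIC", "nl": "LATIN"}

def script_share(text, script_prefix):
    """Share of alphabetic characters whose Unicode name starts with script_prefix."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 1.0  # nothing to judge, so nothing to flag
    hits = sum(unicodedata.name(ch, "").startswith(script_prefix) for ch in letters)
    return hits / len(letters)

def violates_charset(text, lang, min_share=0.8):
    """True if too few letters are in the script expected for this language."""
    return script_share(text, EXPECTED_SCRIPT[lang]) < min_share

print(violates_charset("বাংলা", "bn"))          # False
print(violates_charset("bangla bhasha", "bn"))  # True
print(violates_charset("privet mir", "ru"))     # True
```

A check like this needs no knowledge of the language itself, which is what makes it suitable for exactly the editions the author cannot read.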
I will leave the lesson unvarnished: the swarm reviews the environments you have built for it — an environment that is missing does not sift. Scaling also scales mistakes; it just does it more quietly. A swarm of four hundred eyes is as farsighted as the environments you build for it — and as blind as the ones you forget.
The Embossing
What actually constituted work this day
In the early evening, the day lies spread out like a bank statement: in the morning five hundred generated miniatures, in the afternoon four hundred reviewing environments, in between the memory of two alphabets nobody ordered. Only in this sum does it become clear what connects the two movements.
The word hardly needs more coloring in; the day delivered the object lesson. What remains open is what occupies me as I shut down the computer: I was responsible for quality today like rarely before, yet I drew, pushed, designed nothing. What was my work on this day, if not a single draft came from my hand?
From Draftsperson to Breeder
The method gets its name
The answer is in no job profile, but it has a name, and it is time to set it: Generative Design Exploration — not a new procedure, but the name for what this day was from start to finish. My work consisted of four things: the intent that preceded every relay; the constraints that turned review agents into environments; the gap in these environments for which I am liable; and the judgment on the disputes that the swarm submitted to me for decision. Darwin would have had a sober word for this: selective breeding — the selection takes place in the dovecote, the breeding goal is created at the breeder's desk. The UX professional breeds: they design requirement spaces, survival conditions, testing climates, and leave the drawing to the swarm.
If that sounds like loss, like a job that has coagulated into administration: the opposite is true. I wrote in my book about an artist who went through 900 prompt attempts for a single image: whoever wrestles for nuances that long isn't playing the lottery; they are creating. The threshold of originality lay in the relentlessness of the vision.[8]
The relentlessness has remained. Only its subject has moved — from the single stroke to the conditions that judge thousands of strokes.
The draftsperson designed solutions — the breeder designs the conditions under which solutions survive.
Border Control
The same swarm, a different front
One question remains, and it leads out of the studio. Interfaces are increasingly rendered in the moment of their use, tailored to the individual. In such a world, a manipulated variant is no longer an A/B artifact that could be found in aggregates: it only exists for its victim, for a single moment, and deletes itself with the page change. Adversarial Hyperpersonalization (personalization that works against the user's interests) leaves an impact, but no evidence.
End of the Workday
Nobody learned anything today — and that is the only line in the log that worries me.
[1] In 1859, a reviewer for his publisher advised Darwin to write a book just about pigeons instead of one about the origin of species: "everyone is interested in pigeons", and it would thus get "on every table in the kingdom". Darwin didn't do it.
[2] The mechanics are significantly older than their current appearance: John Holland described variation, selection, and recombination in 1975 in Adaptation in Natural and Artificial Systems as a formal search procedure: genetic algorithms. So the algorithm is not new. What is new is the phenotype: finished software.
[3] This footnote originally had four sentences, an excursion on mutation testing and a second example. Then I treated it like my layouts. This is what survived.
[4] AI Fundamentals, Chapter 10: Alignment and Safety.
[5] The objection honors the discipline with a standard it never applied to itself. In classic usability practice, "representative" often enough meant: the eight test subjects who had time on Thursday afternoon and came to the lobby for a voucher. We took that seriously, and it worked. Representativeness is only seriously demanded from the simulation, which is a compliment disguised as an objection.
[6] It seemed.
[7] These dollar amounts are list prices, that is, what such a run would have cost via the API. I never paid them: I run the pipeline via a flat-rate subscription (Claude Max, 20×) into which the same tokens disappear. The effort is real, my bill was not.
[8] AI Fundamentals, Chapter 7: Creative Machines.
[9] "Studio" is sugarcoated; I learned in software sweatshops. The second chair stood there anyway.
