AI Timeline

A timeline showing that AI was declared dead at least three times — and came back every time.

1837 Milestones

Babbage's Analytical Engine: The Idea of the Computer

The history of AI begins not with computers, but with the idea of them. In the 1830s, the British mathematician Charles Babbage designed the Analytical Engine on paper and first described it in detail in 1837: the first design for a general-purpose, programmable computing machine. His design was a century ahead of its time. It already had a calculating unit that Babbage called the mill, a memory (the store), programming via punched cards, and even conditional branching, the basic building blocks of every computer today. The machine was never built in his lifetime; it was too complex for 19th-century mechanics. Yet it is the distant ancestor of every computing machine, and thus of the hardware on which artificial intelligence can run at all. For an honest assessment: the Analytical Engine remained an unfinished design, and it was a calculator, not a thinking machine. It provided the foundation of computation, not intelligence.

In the 1830s, the British mathematician Charles Babbage designed the Analytical Engine, first describing it in 1837 — the first design for a general-purpose, programmable computer.

His design already had the building blocks of today's computers: a calculating unit (the mill), a memory (the store), programming via punched cards, and even conditional branching.

Babbage's machine was the distant ancestor of every computer — and thus of the hardware on which AI can run in the first place.

Anti-hype: the Analytical Engine was never finished in Babbage's lifetime — it remained a design on paper. And it was a calculator, not an AI: the foundation, not thinking itself.

People: Charles Babbage

1843 Papers

Ada Lovelace: The First Program — and a Bold Vision

Charles Babbage had designed the machine — but it was Ada Lovelace who saw what it might truly be capable of. In 1843, the British mathematician translated an article on Babbage's Analytical Engine and added her own notes, which far surpassed the original in length and depth. In her Note G she described a procedure by which the machine would compute the so-called Bernoulli numbers — often called the first published computer program. Even more far-sighted was her second insight: the machine need not be limited to numbers but could process symbols of any kind, and even compose music. With this, Lovelace conceived the idea of general-purpose computing a century too early. For an honest assessment: whether she really was the first programmer is contested — Babbage himself had sketched programs earlier, and the Bernoulli routine arose in exchange with him. At the same time, Lovelace held that the machine could originate nothing truly new on its own — an objection that Alan Turing explicitly disputed in 1950.

In 1843, Ada Lovelace translated an article on Babbage's Analytical Engine and added her own extensive notes, which far surpassed the original text.

Her Note G contains a procedure for computing the Bernoulli numbers — often called the first published computer program.

With foresight she saw that the machine could do more than calculate: it could process symbols and even compose music — the idea of general-purpose computing.

Anti-hype: whether Lovelace was the first programmer is contested (Babbage wrote programs earlier; the Bernoulli routine arose in exchange with him). She also held the machine could originate nothing truly new — an objection Turing disputed in 1950.
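To make the Bernoulli computation concrete, here is a minimal sketch in modern Python. It uses the standard textbook recurrence and the convention B_1 = -1/2; it is not a transcription of the procedure in Note G, whose formulation and numbering of the Bernoulli numbers differ. The function name `bernoulli_numbers` is chosen here for illustration.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_n_max] as exact fractions (convention B_1 = -1/2)."""
    b = [Fraction(1)]
    for m in range(1, n_max + 1):
        # Recurrence: sum_{k=0}^{m} C(m+1, k) * B_k = 0, solved for B_m.
        acc = sum(comb(m + 1, k) * b[k] for k in range(m))
        b.append(-acc / (m + 1))
    return b


if __name__ == "__main__":
    for index, value in enumerate(bernoulli_numbers(8)):
        print(index, value)
```

Exact fractions are used because the values are rational and floating-point rounding would obscure them.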

People: Ada Lovelace

1936 Papers

The Turing Machine: What Computation Even Means

Before anyone could ask whether machines can think, it first had to be clarified what a machine can compute at all. The British mathematician Alan Turing answered this question in 1936 in his paper On Computable Numbers. In it he described a strikingly simple thought model (a tape, a read-write head, a few rules) that later came to be called the Turing machine. With it, Turing pinned down exactly what is computable and what is not. His most important insight: a single universal Turing machine can imitate any other. This is the theoretical blueprint of the general-purpose computer, a machine that, with the right program, can do anything computable. With this work, Turing became one of the founders of theoretical computer science and laid the groundwork on which the idea of thinking machines first became possible. For an honest assessment: the Turing machine is a mathematical idea, not a built device, and it was about computability, not intelligence. Turing posed the question of whether machines can think only in 1950. And the name Turing machine was coined by others.

In 1936, Alan Turing published the paper On Computable Numbers, describing a simple thought model of computation — what later came to be called the Turing machine.

With it, Turing pinned down what is computable at all. A universal Turing machine can imitate any other — the theoretical blueprint of the general-purpose computer.

With this work, Turing became one of the founders of theoretical computer science. That a single machine can compute anything computable is a precondition for the later idea that machines might think.

Anti-hype: the Turing machine is a mathematical idea, not a device, and it was about computability, not intelligence. Turing asked whether machines can think only in 1950. The name, too, was coined by others.
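The tape, head, and rule table can be sketched in a few lines of Python. This is an illustrative simulator, not Turing's original notation: the rule format `(state, symbol) -> (write, move, next_state)`, the state names `start` and `halt`, and the example machine `INVERT` are all assumptions made for this sketch.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Run a one-tape machine; rules map (state, symbol) -> (write, move, next_state)."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example machine: invert every bit, halt at the first blank.
INVERT = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

if __name__ == "__main__":
    print(run_turing_machine(INVERT, "1011"))
```

Because the rule table is ordinary data passed to one fixed interpreter, the sketch also hints at the universality idea: one machine can run any other machine's rules.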

People: Alan Turing

1943 Papers

McCulloch & Pitts: The First Artificial Neuron

Thirteen years before the Dartmouth Conference, in the middle of the war, the real birth certificate of artificial neural networks appeared. The neurophysiologist Warren McCulloch and the self-taught logician Walter Pitts — barely twenty and without any academic credentials — published “A Logical Calculus of the Ideas Immanent in Nervous Activity” in the Bulletin of Mathematical Biophysics in 1943. Their idea was radically simple: a neuron can be described as a binary switching element that fires on an all-or-none basis once the sum of its inputs crosses a threshold. Building on pure propositional logic, they proved that networks of such units can compute any logical function — and that networks with feedback loops even possess a form of memory. In their conclusion they noted that such networks can compute whatever a Turing machine can. With that they delivered the first mathematical model of the neuron as a logical computing unit. The catch that would shape the next decade: their neuron could not learn.

The first mathematical model of the neuron as a logical computing unit: McCulloch and Pitts cast the workings of the nervous system in formal propositional logic.

All or nothing: a neuron fires when the sum of its inputs exceeds a threshold. Networks of such units compute any logical function; feedback loops create memory.

The decisive limitation: no learning. Weights and thresholds were fixed, the network had to be designed by hand. Only Hebb (1949) and Rosenblatt's Perceptron (1957) brought learning rules.

The impact reached far beyond biology: von Neumann's computer architecture (EDVAC, 1945), Wiener's cybernetics and ultimately every artificial neural network rest on this work.
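The all-or-none unit described above can be sketched as follows. This is a simplified modern rendering: it uses a negative weight for inhibition (the 1943 paper treated inhibition as absolute) and fires when the sum reaches the threshold; the function names are chosen here for illustration. Weights and thresholds are fixed by hand, which is exactly the "no learning" limitation.

```python
def mp_neuron(inputs, weights, threshold):
    """All-or-none unit: fire (1) if the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Hand-designed units for three basic logic functions.
def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)


def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)


def NOT(a):
    return mp_neuron([a], [-1], threshold=0)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, AND(a, b), OR(a, b))
```

Since AND, OR, and NOT suffice to build any Boolean expression, composing such units gives networks for arbitrary logical functions.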

People: Warren S. McCulloch, Walter Pitts

Organizations: University of Illinois, College of Medicine, University of Chicago

1948 Papers

Shannon's Information Theory: The Bit Is Born

In 1948, Claude Shannon of Bell Labs published a paper that founded the digital world: A Mathematical Theory of Communication, which appeared in the Bell System Technical Journal. Shannon showed how information can be measured mathematically, independent of its meaning. He introduced the bit as the smallest unit of information and defined entropy as a measure of how much uncertainty a message resolves on average. With it he laid the foundation for data compression, reliable transmission over noisy channels, and ultimately every computer. For AI this is more than prehistory: concepts such as cross-entropy and Kullback-Leibler divergence, used today as training objectives for neural networks, come directly from Shannon's theory. For an honest assessment: Shannon described the transmission of messages, not thinking. Information theory is a mathematical tool on which AI builds; it is not itself artificial intelligence.

In 1948, Claude Shannon at Bell Labs published A Mathematical Theory of Communication, founding information theory.

He introduced the bit as the unit of information and defined entropy — how much uncertainty a message resolves on average.

Central to AI: cross-entropy and KL divergence — straight from Shannon's theory — are today's standard training objectives in machine learning.

Anti-hype: Shannon described message transmission, not intelligence. Information theory is a foundation AI builds on — not an AI result. (The term bit was suggested by colleague John Tukey.)
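The three quantities named above fit in a few lines. The sketch below works on discrete probability distributions given as lists and measures everything in bits (base-2 logarithm); it assumes that `q` is nonzero wherever `p` is nonzero. The function names are chosen here for illustration.

```python
import math


def entropy(p):
    """Average uncertainty of distribution p, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Average code length when events follow p but the code is built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Extra bits paid for assuming q when the truth is p."""
    return cross_entropy(p, q) - entropy(p)


if __name__ == "__main__":
    fair, biased = [0.5, 0.5], [0.25, 0.75]
    print(entropy(fair))                # a fair coin carries 1 bit per toss
    print(kl_divergence(fair, biased))  # positive: the model q is wrong
```

In machine learning, `p` typically plays the role of the true label distribution and `q` the model's prediction, so minimising cross-entropy pushes `q` toward `p`.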

People: Claude Shannon

Organizations: Bell Labs

1949 Papers

Hebbian Learning: How the Brain Might Learn

In 1949, the Canadian psychologist Donald Hebb published The Organization of Behavior and put forward a simple, far-reaching idea: when two connected nerve cells repeatedly fire together, their connection grows stronger. With it, Hebb gave the first concrete mechanism for how learning might work at the level of individual synapses. For AI this became a founding principle: learning means adjusting the strength of connections — exactly what artificial neural networks do, such as the later Hopfield networks. For an honest assessment: the famous slogan that neurons which fire together wire together did not come from Hebb at all — it is credited to the neuroscientist Carla Shatz (1992). And Hebb's rule alone does not yet explain modern deep learning, because it lacks targeted error correction.

In 1949, psychologist Donald Hebb published The Organization of Behavior, formulating how learning in the brain might work at the level of synapses.

Hebb's rule: when two connected nerve cells repeatedly fire together, their connection grows stronger.

The idea — that learning means adjusting connection strengths — became a founding principle of learning neural networks (e.g. Hopfield networks).

Anti-hype: the well-known slogan (cells that fire together wire together) did not come from Hebb but is credited to Carla Shatz (1992). Hebb's rule alone does not explain modern deep learning — it lacks error correction.
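Hebb stated his rule in words; a common modern formalisation is the update Δw = η · pre · post. The sketch below assumes that formalisation, with a hypothetical learning rate `learning_rate` and illustrative names.

```python
def hebbian_update(weights, pre, post, learning_rate=0.1):
    """Strengthen each connection in proportion to joint pre/post activity."""
    return [w + learning_rate * x * post for w, x in zip(weights, pre)]


if __name__ == "__main__":
    weights = [0.0, 0.0]
    # Input 0 is active together with the output; input 1 stays silent.
    for _ in range(3):
        weights = hebbian_update(weights, pre=[1, 0], post=1)
    print(weights)  # only the co-active connection has grown
```

Note what is missing: there is no target value and no error term, so the weights only ever grow with co-activity. That is the gap the anti-hype note points to.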

People: Donald Hebb

1950 Papers

The Turing Test: The Imitation Game

The philosophical foundation for machine intelligence and what is often called the first AI benchmark. In 1950, Alan Turing published the paper 'Computing Machinery and Intelligence' in the journal Mind and reframed the question 'Can machines think?' Rather than philosophical definitions, Turing proposed the practical 'Imitation Game': a human evaluator holds text conversations with a human and a machine and tries to identify the machine. The machine passes the test if the evaluator cannot reliably tell the two apart. What matters is not the correctness of the answers, but how closely they resemble human responses. This indistinguishability test can be generalized to all human capabilities, verbal as well as non-verbal (robotics). Turing's behavioral approach became a conceptual reference point for AI research and for conversational systems from ELIZA to ChatGPT.

Indistinguishability test: an evaluator attempts to tell a machine apart from a human via text conversation

Shifted focus from philosophical definitions to behavioral demonstrations of intelligence

Posed the fundamental question 'Can machines think?' and proposed an operational approach

Established what is often called the first AI benchmark and influenced later conversational AI developments

People: Alan Turing

Organizations: University of Manchester, Mind Journal

1956 Breakthroughs

Logic Theorist: The First Reasoning Program

In the summer of the 1956 Dartmouth workshop, the gathering that gave artificial intelligence its name, Allen Newell, Herbert Simon and the often-forgotten programmer Cliff Shaw demonstrated what is fondly called “the first AI program”, with a footnote. Their Logic Theorist proved mathematical theorems: it took on the propositional logic of Whitehead and Russell's “Principia Mathematica” and independently found proofs for 38 of the first 52 theorems. What was remarkable was how it did so: instead of blindly trying every possibility, the program searched heuristically, estimating which steps were promising and working backward from the goal. For one theorem it even found a shorter proof than the original; by some accounts Russell was pleased, while a journal rejected the submitted proof. It was all written in IPL, a list-processing language that prefigured McCarthy's LISP. The caveat: game-playing programs such as Samuel's checkers already existed. The Logic Theorist was the first program built to deliberately model human reasoning on an open intellectual task.

Often called “the first AI program” — more precisely: the first program built to model human reasoning on an open intellectual task (game-playing programs came earlier).

Heuristic search instead of brute force: working backward from the goal, estimating which steps (substitution, detachment, chaining) were worthwhile — inspired by Pólya's heuristics.

Proved 38 of the first 52 theorems of Chapter 2 of the “Principia Mathematica” — for one theorem even shorter than the original.

Written in the list-processing language IPL (chiefly by Shaw), which influenced McCarthy's LISP; the heuristic approach led directly to the General Problem Solver (1957).

People: Allen Newell, Herbert A. Simon, John Clifford Shaw

Organizations: RAND Corporation, Carnegie Institute of Technology

1956 Conferences

Dartmouth Conference: The Birth of AI

The historic moment when Artificial Intelligence was born as a field of research. From June 18 to August 17, 1956, the Dartmouth Summer Research Project on Artificial Intelligence took place at Dartmouth College. Its organizers John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon started from a bold conjecture: 'Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.' McCarthy had coined the term 'Artificial Intelligence' in the 1955 proposal for the workshop, and the eight-week gathering established it as the name of a new scientific discipline. Some participants attended for only a few weeks, others stayed throughout: Herbert Simon and Allen Newell, for example, demonstrated their Logic Theorist in the first weeks, while Ray Solomonoff was present for the full eight weeks. Discussions took place on the top floor of the Mathematics Department. Three historic AI centers emerged from this conference: Carnegie Mellon with Newell and Simon, MIT with Minsky, and Stanford with McCarthy.

The birth of AI as an independent research discipline through an 8-week workshop with leading thinkers

John McCarthy coined the term 'Artificial Intelligence' in the 1955 proposal for the workshop, naming a new field of research

Established the research program: machines that use language, form abstractions, solve problems, and improve themselves

Brought together the founding fathers of AI: McCarthy, Minsky, Shannon, Rochester, and future Nobel laureate Herbert Simon

People: John McCarthy, Marvin Minsky, Nathaniel Rochester, Claude Shannon

Organizations: Dartmouth College, IBM, Bell Labs

1957 Papers

Perceptron: The First Learning Neural Network

The birth of machine learning through the first trainable artificial neuron. In 1957, Frank Rosenblatt at the Cornell Aeronautical Laboratory developed the perceptron, the first neural network capable of learning from experience. In January 1957 he published the technical report 'The Perceptron: A Perceiving and Recognizing Automaton' (Project PARA, Report 85-460-1). The formal scientific publication followed in November 1958 in Psychological Review. Inspired by biological neurons, the perceptron summed its weighted inputs and passed the result through a Heaviside step function to produce a binary output. The perceptron learning rule corrected the weights whenever an example was misclassified, an early precursor to learning in modern neural networks (and not to be confused with the later delta rule by Widrow and Hoff, 1960). The perceptron was initially simulated on an IBM 704 and publicly announced in 1958; the Mark I Perceptron hardware was not completed until around 1960. Although limited to linearly separable problems, the perceptron laid conceptual groundwork for later neural architectures.

First trainable artificial neuron with weighted inputs and a Heaviside step function

Binary classification via threshold decision, effective for linearly separable patterns

Frank Rosenblatt's perceptron learning rule corrected weights with every misclassification, enabling automatic learning

The limitation to linearly separable problems later led to the XOR critique by Minsky and Papert
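The learning rule described above can be sketched as follows. This is a minimal modern rendering, not Rosenblatt's original implementation: the bias term `b`, the learning rate `lr`, the epoch limit, and the OR training set are assumptions made for illustration.

```python
def step(z):
    """Heaviside step function: binary output."""
    return 1 if z >= 0 else 0


def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Adjust weights only when an example is misclassified."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            y = predict(w, b, x)
            if y != target:
                delta = lr * (target - y)  # +lr or -lr
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                errors += 1
        if errors == 0:  # every example classified correctly
            break
    return w, b


# Logical OR is linearly separable, so the rule converges on it.
OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

if __name__ == "__main__":
    w, b = train_perceptron(OR_SAMPLES)
    print(w, b, [predict(w, b, x) for x, _ in OR_SAMPLES])
```

Run on XOR instead of OR, the same loop never reaches zero errors, which is the limitation noted in the last point above.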

People: Frank Rosenblatt

Organizations: Cornell Aeronautical Laboratory, US Navy

1958 Breakthroughs

LISP: The Language of AI

In 1958, John McCarthy at MIT designed a programming language that put symbolic computation at its centre: LISP, short for List Processing. Instead of mainly processing numbers, LISP manipulated lists of symbols — exactly what symbolic AI needed. For decades, LISP became the language of AI research: expert systems, language processing and planning systems were built in it. McCarthy's language also introduced ideas that are taken for granted today: recursion, automatic memory management (garbage collection), functions as data, and interactive evaluation. Steve Russell turned McCarthy's theoretical eval mechanism into the first interpreter — making LISP runnable. For an honest assessment: LISP was not the first high-level programming language (Fortran came in 1957), but it is the second-oldest still in use — and for AI the most influential.

John McCarthy designed LISP in 1958 at MIT for symbolic computation (lists instead of numbers) — for decades THE language of AI research (expert systems, NLP, planning).

Introduced ideas that are standard today: recursion, automatic garbage collection, functions as data, interactive evaluation (REPL).

Built on the list processing of IPL; Steve Russell implemented McCarthy's eval as the first interpreter and made LISP runnable.

Anti-hype: not the first high-level language (Fortran 1957 came earlier) — but the second-oldest still in use, and the most influential for AI.

People: John McCarthy, Steve Russell

Organizations: MIT

1959 Breakthroughs

Arthur Samuel: Self-Learning AI & the Term “Machine Learning”

Years before the Dartmouth Conference, Arthur Samuel at IBM taught a machine to play checkers and, at the same time, to learn. His program ran on the IBM 701 from 1952, but what mattered most appeared in 1959, in his paper “Some Studies in Machine Learning Using the Game of Checkers.” The program improved itself: it played tens of thousands of games against itself and adjusted the weights of its evaluation function based on the outcomes. The paper's title contains the first documented use of the term “machine learning” in its modern sense, and Samuel is conventionally credited with coining it. Richard Sutton later recognised Samuel's self-play as the earliest use of temporal-difference learning, which sits at the heart of modern reinforcement learning. The 1956 television demonstration and a much-cited victory over a supposed master made headlines, but both were heavily overstated: against genuinely strong players the program lost clearly, and checkers was not fully solved until decades later.

In the title of his 1959 paper Samuel used the term “machine learning” — the first documented use in its modern sense; he is conventionally credited as its originator.

The first publicly demonstrated self-learning program: it tuned the weights of its own evaluation function and memorised positions (rote learning).

By playing tens of thousands of games against itself it prefigured the self-play later perfected by AlphaZero — for Sutton, the earliest use of temporal-difference learning.

Anti-hype: the celebrated 1962 win was against an overrated opponent; against world-class players the program lost. Checkers was not fully solved until 2007 (Chinook).

People: Arthur Lee Samuel

Organizations: IBM

1965 Milestones

DENDRAL: Pioneer of Expert Systems

In the mid-1960s, AI took a decisive turn. At Stanford University, Edward Feigenbaum and the geneticist and Nobel laureate Joshua Lederberg began work on DENDRAL — a program often regarded as the first expert system, and in any case the first to apply AI to scientific reasoning. Instead of searching in general like earlier systems, DENDRAL used the knowledge of human chemists: from the data of a mass spectrometer it inferred the structure of organic molecules. The lesson shaped a decade of AI — knowledge is power. It is not the cleverest general algorithm that wins, but the one with the most domain expertise. DENDRAL thus paved the way for the expert-systems boom of the 1980s. For an honest assessment: DENDRAL itself was a research project that ran successfully for many years — not a single product. But its method of entering all knowledge painstakingly by hand later became an Achilles' heel: it made the commercial expert systems of the 1980s brittle and expensive, and so contributed to the AI winter.

From the mid-1960s, Edward Feigenbaum, Joshua Lederberg and colleagues at Stanford University built DENDRAL — often called the first expert system, and the first to apply AI to scientific reasoning.

DENDRAL inferred the structure of organic molecules from mass-spectrometry data — using the knowledge of human chemists rather than general search.

The lesson: knowledge is power. Instead of general problem-solvers, AI now bet on narrow, knowledge-rich domains — the start of expert systems.

Anti-hype: DENDRAL itself was a years-long, successful project. But its method — hand-coded knowledge — became the weakness of the 1980s commercial expert systems and contributed to the AI winter.

People: Edward Feigenbaum, Joshua Lederberg, Bruce Buchanan

Organizations: Stanford University

1965 Papers

Fuzzy Logic: The Logic of Imprecision

An important mathematical breakthrough for handling uncertainty and approximate reasoning. In 1965, Lotfi Zadeh at UC Berkeley published the landmark paper 'Fuzzy Sets', a response to the inability of classical logic to deal with vague and incomplete information. His key insight was that humans make decisions based on imprecise, non-numerical information. Fuzzy logic allows degrees of membership anywhere between 0 and 1, in contrast to binary yes/no logic. With over 100,000 citations to date, Zadeh's work became the foundation for soft computing and influenced modern AI approaches. The 'precise logic of imprecision' made it possible to model uncertainty, incompleteness, and contradictory information mathematically. Fuzzy logic found application in expert systems, control systems, and later in modern AI architectures for approximate decision-making.

Lotfi Zadeh's 1965 paper 'Fuzzy Sets,' with over 100,000 citations, substantially changed how uncertainty is handled

Enabled mathematical modeling of vagueness, incompleteness, and contradictory information

Found application in expert systems, control systems, and approximate decision-making processes

Laid the groundwork for soft computing and modern AI approaches to handling imperfect information
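Degrees of membership are easy to illustrate. The sketch below uses the min, max, and complement operations from Zadeh's paper for intersection, union, and negation; the membership function `tall` and its 160 cm and 190 cm breakpoints are invented for this example.

```python
def tall(height_cm):
    """Hypothetical membership function: 0 below 160 cm, 1 above 190 cm, linear between."""
    return min(1.0, max(0.0, (height_cm - 160) / 30))


def fuzzy_and(a, b):
    """Intersection of fuzzy sets: the smaller membership degree."""
    return min(a, b)


def fuzzy_or(a, b):
    """Union of fuzzy sets: the larger membership degree."""
    return max(a, b)


def fuzzy_not(a):
    """Complement of a fuzzy set."""
    return 1.0 - a


if __name__ == "__main__":
    for height in (150, 175, 200):
        print(height, tall(height), fuzzy_not(tall(height)))
```

A person of 175 cm is "tall" to degree 0.5 here, rather than being forced into yes or no.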

People: Lotfi Zadeh

Organizations: UC Berkeley, Information and Control

1966 Breakthroughs

ELIZA: The First Chatbot

The birth of human-machine conversation and an unintended experiment in human psychology. From roughly 1964 to 1966, Joseph Weizenbaum developed ELIZA at MIT — the first program explicitly designed for conversations with humans. Using remarkably lean code and simple pattern-matching technology, ELIZA simulated conversations, particularly in the DOCTOR variant as a Rogerian therapist. The surprise lay not in the technology but in the human response: users, including Weizenbaum's own secretary, developed emotional attachments to the program and even demanded privacy for their 'therapy sessions.' Weizenbaum described and criticized this phenomenon early on — the tendency to attribute human qualities to rudimentary programs. The term 'ELIZA effect' itself was coined and popularized only later, in the 1990s. ELIZA demonstrated the power of simple illusion and laid the foundation for all modern chatbots.

The first computer program explicitly developed for human-machine conversation, completed in 1966

Used simple pattern-matching and substitution methodology — the program got by with remarkably little code

Created the illusion of understanding and emotional intelligence without genuine language comprehension

Made visible what would later be called the 'ELIZA effect' and cautioned against projecting human qualities onto rudimentary programs
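How little machinery pattern matching and substitution need can be shown in a short sketch. The rules, responses, and pronoun table below are invented for illustration; they are not taken from Weizenbaum's DOCTOR script, and the original program was not written in Python.

```python
import re

# Swap first-person words for second-person ones when echoing the user.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

# Hypothetical rules: (pattern, response template).
RULES = [
    (re.compile(r"\bi feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.I), "Tell me more about your {0}."),
]


def reflect(fragment):
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(sentence):
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # fallback when no rule matches


if __name__ == "__main__":
    print(respond("I feel sad about my job."))
    print(respond("Hello there"))
```

The program never models what "sad" or "job" mean; it only rearranges the user's own words, which is why the impression of understanding is an illusion.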

People: Joseph Weizenbaum

Organizations: MIT, MIT AI Laboratory

1969 Papers

Perceptrons: The Book That Helped Trigger the AI Winter

In 1969, MIT researchers Marvin Minsky and Seymour Papert published the book Perceptrons. With mathematical rigour they showed what a single-layer perceptron, the simplest form of a neural network, can and cannot do. Their most famous result: such a network cannot even learn the simple XOR function, because no single straight line separates the inputs that should yield 1 from those that should yield 0. The impact was enormous: confidence in neural networks collapsed, and funding dried up for over a decade, a major contribution to the first AI winter. For an honest assessment: Minsky and Papert by no means disproved neural networks. They only analysed the single-layer variant; multilayer networks solve XOR easily, and training them became practical from 1986 onward with the backpropagation algorithm. The story that the book single-handedly killed the field is partly a myth. But the collapse in money and attention was very real.

In 1969, Marvin Minsky and Seymour Papert published Perceptrons, analysing mathematically what single-layer perceptrons can — and cannot — do.

Their famous result: a single-layer perceptron cannot learn the simple XOR function, because it is not linearly separable.

The book is seen as a co-trigger of the first AI winter: funding for neural networks dried up for over a decade.

Anti-hype: Minsky and Papert did not disprove neural networks as such — multilayer networks solve XOR (later via backpropagation, 1986). That the book alone killed the field is partly myth; the funding collapse, however, was real.
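Both halves of the XOR story can be checked in a small sketch. The grid search below is an illustration, not a proof: it tries a finite set of weights for a single threshold unit and finds none that reproduces XOR. The two-layer network uses hand-picked weights (an assumption of this sketch, not learned by backpropagation) to show that one hidden layer is enough.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
GRID = [k / 2 for k in range(-6, 7)]  # candidate weights from -3.0 to 3.0


def unit(x, w, b):
    """Single threshold unit: fires if the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def single_unit_solves_xor(grid):
    """Try every (w1, w2, b) on the grid; report whether any matches XOR."""
    return any(
        all(unit(x, (w1, w2), b) == y for x, y in XOR.items())
        for w1, w2, b in product(grid, repeat=3)
    )


def two_layer_xor(a, b):
    """Hidden units compute OR and AND; the output fires for OR but not AND."""
    h_or = unit((a, b), (1, 1), -0.5)
    h_and = unit((a, b), (1, 1), -1.5)
    return unit((h_or, h_and), (1, -1), -0.5)


if __name__ == "__main__":
    print(single_unit_solves_xor(GRID))
    print([two_layer_xor(a, b) for a, b in XOR])
```

The reason no grid point works is general: a single unit would need b ≤ 0, w1 + b > 0, w2 + b > 0, and w1 + w2 + b ≤ 0 at once, and adding the two middle inequalities contradicts the last one.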

People: Marvin Minsky, Seymour Papert

Organizations: MIT

1969 Breakthroughs

Shakey: The First Intelligent Mobile Robot

The birth of autonomous robotics through the integration of reasoning, planning, and physical action. From 1966 to 1972, Charles Rosen's team at SRI International developed Shakey, the first mobile robot capable of reasoning about its own actions. The 2-meter-tall robot combined a TV camera, sonar rangefinder, processors, and 'cat whiskers' as bump detectors into an autonomous system. Shakey's capabilities included environmental perception, drawing inferences from implicit facts, plan creation, and error compensation, all controllable via plain English commands. The project, funded by ARPA (now DARPA), was the first to unite logical reasoning with physical action, laying the groundwork for autonomous systems. Work on Shakey produced the A* search algorithm, visibility graph methods, and the influential computational variant of the Hough transform (Duda & Hart, SRI 1972). In 1970, Life Magazine called Shakey 'the first electronic person.'

First mobile robot capable of reasoning about its own actions and independently planning complex tasks

Combined TV camera, sonar, processors, and sensors into an autonomous mobile system

Developed the STRIPS planning system for automatic task decomposition and route finding

United computer vision, navigation, and logical reasoning in a single physical system

People: Charles Rosen, Nils Nilsson, Bertram Raphael

Organizations: SRI International, DARPA

1970 Milestones

SHRDLU: Understanding Language in the Blocks World

Around 1970, Terry Winograd at MIT built a program that astonished the field: SHRDLU. You could give it instructions in plain English — say, to put the red cube on the green block — and it carried them out in a virtual world of coloured blocks. SHRDLU understood more than just commands: it resolved ambiguous sentences, remembered what had been said, answered questions about its world, and could even explain why it had done something. For many it was the impressive high point of symbolic AI — proof that machines can understand language remarkably well. For an honest assessment: SHRDLU's understanding only worked in its tiny, closed blocks world. It could not be transferred to the real, messy world with its endless everyday knowledge. Over time, SHRDLU became a cautionary tale about the limits of such microworlds — Winograd himself later turned away from this approach.

Around 1970, Terry Winograd at MIT built SHRDLU — a program that understood commands in plain English and manipulated a virtual blocks world.

SHRDLU could resolve ambiguous sentences, remember what had been said, answer questions, and even explain why it had done something.

It was seen as the impressive high point of symbolic AI — proof that machines can understand language remarkably well within a limited world.

Anti-hype: SHRDLU's understanding only worked in its tiny blocks world. It could not transfer to the real world — a cautionary tale about the limits of such microworlds.

People: Terry Winograd

Organizations: MIT

1970 Papers

Hidden Markov Models Established

The mathematical foundation for speech recognition and sequence modeling. From the late 1960s through 1970, Leonard Baum, Lloyd Welch, and Ted Petrie at the Institute for Defense Analyses developed Hidden Markov Models and established the Baum-Welch algorithm. These statistical models describe an observed sequence as the output of hidden states that evolve over time, and they provided one of the first practical approaches to capturing latent states in time-dependent data. From the mid-1970s onward, HMMs found their first practical application in speech recognition through James Baker at Carnegie Mellon and later at IBM. The method moved automatic speech recognition from simple template-matching procedures to statistical approaches. HMMs became the standard for sequence modeling across numerous fields, from bioinformatics and financial analysis to gesture recognition. The Baum-Welch algorithm, later recognized as a special case of the Expectation-Maximization algorithm formalized in 1977, laid the foundation for modern probabilistic machine learning methods.

The Baum-Welch algorithm as a special case of Expectation-Maximization for HMM parameter estimation

First practical application in speech recognition from the mid-1970s at Carnegie Mellon and IBM

Transformed sequence modeling from template matching to statistical probabilistic approaches

Laid the mathematical foundation for modern probabilistic machine learning methods
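What "hidden states" means in practice can be shown with the forward algorithm, which computes how likely an observation sequence is under a given HMM. This is only the evaluation step; Baum-Welch, which estimates the parameters, builds on it and is not implemented here. The two-state weather model and all of its probabilities are invented for illustration.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Probability of an observation sequence under an HMM (forward algorithm)."""
    # alpha[s]: probability of the observations so far, ending in hidden state s.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: the weather is hidden, only the activity is observed.
STATES = ("Rainy", "Sunny")
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANS = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
EMIT = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

if __name__ == "__main__":
    print(forward_likelihood(["walk", "shop", "clean"], STATES, START, TRANS, EMIT))
```

The algorithm sums over all hidden state paths without enumerating them, which is what made HMMs computationally practical for speech.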

People: Leonard Baum, Lloyd Welch, Ted Petrie

Organizations: Institute for Defense Analyses

1972 Milestones

Prolog: Programming with Logic

In 1972, a programming language was born at the University of Marseille that thought quite unlike any other: Prolog, short for Programmation en Logique. Its creators Alain Colmerauer and Philippe Roussel — building on the theory of Robert Kowalski — pursued a compelling idea. Instead of telling the computer step by step how to do something, in Prolog you merely describe the facts and rules of a world. The system then draws the logical conclusions itself. Prolog became the most important language of symbolic AI: in expert systems, in language processing, and as the heart of Japan's ambitious Fifth Generation project. For an honest assessment: logic programming never became the dominant paradigm of AI. Japan's grand project, which bet entirely on Prolog, fell well short of its promises. And the breakthrough owes as much to Robert Kowalski's theory as to the language itself.

In 1972, Alain Colmerauer and Philippe Roussel at the University of Marseille created the language Prolog — short for Programmation en Logique.

Prolog is declarative: you describe facts and rules, and the system derives the logical conclusions itself — instead of prescribing step by step how.

Prolog became the most important language of logical, symbolic AI — in expert systems, language processing, and Japan's Fifth Generation project.

Anti-hype: logic programming never became the dominant AI paradigm; Japan's Fifth Generation project built on it fell short of expectations. Robert Kowalski's theory mattered too, not just the language.

People:Alain Colmerauer, Philippe Roussel, Robert Kowalski

Organizations:University of Aix-Marseille
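
The declarative idea behind Prolog (state facts and rules, let the system derive the conclusions) can be sketched in a few lines. This is only an illustration in Python, not Prolog itself: Prolog answers queries by backward chaining with unification, whereas the sketch below simply applies one rule to all facts. The family names are hypothetical.

```python
# Facts: ("anna", "ben") means anna is a parent of ben. Names are hypothetical.
PARENT_FACTS = {("anna", "ben"), ("ben", "clara"), ("ben", "david")}


def derive_grandparents(parent_facts):
    """Apply the rule: grandparent(X, Z) holds if parent(X, Y) and parent(Y, Z)."""
    return {
        (x, z)
        for (x, y1) in parent_facts
        for (y2, z) in parent_facts
        if y1 == y2
    }


grandparents = derive_grandparents(PARENT_FACTS)
```

Nothing in the rule says in which order to search; it only states what counts as a grandparent, which is the point of the declarative style.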

1974Milestones

The First AI Winter

A period of drastic cuts to research funding and declining confidence in Artificial Intelligence. After the exaggerated promises of the 1960s came bitter reality: AI programs could only solve trivial versions of the problems they were supposed to tackle. In the United Kingdom, the Lighthill Report of 1973 delivered a devastating critique, prompting the Science Research Council to scale back funding for undirected AI research. In the United States, DARPA, spurred by the Mansfield Amendment, turned away from undirected basic research over several years; the sharp cut to speech-understanding funding in 1974/75 hit the project at Carnegie Mellon and led to the cancellation of a 3-million-dollar contract. This winter lasted until around 1980 and taught the AI community an important lesson: realistic expectations are the key to sustainable progress.

DARPA in the US and the British Science Research Council drastically cut funding for undirected AI research in the mid-1970s

Professor James Lighthill sharply criticized AI research in 1973 for failing to meet its goals and pointed to the problem of combinatorial explosion

DARPA canceled the 3-million-dollar contract with Carnegie Mellon for speech-understanding systems after disappointing results

AI programs of the early 1970s were limited to trivial versions of real problems and functioned like intelligent 'toys'

People:James Lighthill, J.C.R. Licklider, Hans Moravec

Organizations:DARPA, British Science Research Council, Carnegie Mellon University

1980Papers

Neocognitron: The Ancestor of CNNs

In 1980, the Japanese researcher Kunihiko Fukushima introduced a neural network far ahead of its time: the Neocognitron. Its model was nature — specifically the visual cortex, as the Nobel laureates Hubel and Wiesel had studied it in cats. There, simple and complex cells process visual stimuli in stages. Fukushima rebuilt this principle: a multilayered network that recognises features layer by layer — regardless of where in the image they appear. With it, the Neocognitron anticipated the core ideas of today's convolutional neural networks, the networks that have dominated image recognition since 2012. For an honest assessment: the Neocognitron did not yet use backpropagation and could not be trained the way modern CNNs are. Only backpropagation (1986) and Yann LeCun's LeNet (1989) turned the architecture into networks that could practically learn. Fukushima's pioneering role is still often underestimated today.

In 1980, Kunihiko Fukushima introduced the Neocognitron — a multilayered neural network for pattern recognition.

Its model was the visual cortex (Hubel and Wiesel): simple and complex cells that recognise features in stages, independent of their position.

The Neocognitron thus anticipated the core ideas of today's convolutional neural networks — local feature filters and hierarchical processing. LeCun's LeNet (1989) built on it.

Anti-hype: the Neocognitron did not yet use backpropagation. Only backpropagation (1986) and LeNet (1989) made it into networks that could practically learn. Fukushima's pioneering role is often underestimated.

People:Kunihiko Fukushima

Organizations:NHK Broadcasting Science Research Laboratories
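
The Neocognitron's position tolerance can be illustrated with the two operations that modern CNNs inherited from it: a small filter slid across the input (feature detection, loosely the role of the simple cells) followed by taking the maximum response (loosely the role of the complex cells). This is a one-dimensional sketch with a hand-picked filter, not Fukushima's original learning procedure.

```python
def correlate(signal, kernel):
    """Slide the kernel over the signal and return the response at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def detect(signal, kernel):
    """Strongest filter response anywhere in the signal (global max pooling)."""
    return max(correlate(signal, kernel))


KERNEL = [1, 2, 1]                          # hand-picked feature, an assumption
LEFT = [0, 1, 2, 1, 0, 0, 0, 0]             # feature near the start
RIGHT = [0, 0, 0, 0, 1, 2, 1, 0]            # same feature, shifted right

same_strength = detect(LEFT, KERNEL) == detect(RIGHT, KERNEL)
```

The raw response maps differ (the peak sits at a different index), but after pooling both inputs yield the same detection strength, which is the sense in which the feature is recognised regardless of position.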

1980Milestones

The Expert Systems Era of the 1980s

The 1980s mark the heyday of expert systems, when AI became commercially successful for the first time. Companies worldwide adopted these rule-based AI programs, which replicate human expert knowledge in specialized domains. The AI industry grew from a few million dollars in 1980 to billions by 1988. Two-thirds of Fortune 500 companies deployed the technology. Systems like MYCIN achieved an acceptance rate of around 65% for their treatment recommendations in studies — on par with faculty experts, even though MYCIN was never used clinically. But the boom ended in the classic pattern of an economic bubble, as dozens of firms failed and the limitations of the technology became apparent.

The AI industry grew from a few million dollars (1980) to billions (1988)

Two-thirds of Fortune 500 companies deployed expert systems in their day-to-day business

MYCIN's treatment recommendations achieved around 65% acceptance — comparable to human faculty experts

Classic pattern of an economic bubble: boom followed by a massive crash

People:Edward Feigenbaum, Bruce Buchanan, Edward Shortliffe

Organizations:Stanford University, Fortune 500 Companies

1982Papers

Hopfield Networks: Associative Memory

The rebirth of neural networks through associative memory capabilities. In 1982, John Hopfield published the groundbreaking paper 'Neural networks and physical systems with emergent collective computational abilities' in PNAS. His innovation lay in connecting neurobiology with statistical physics: Hopfield networks function as content-addressable memory that reconstructs complete patterns from incomplete or noisy inputs. The recurrent architecture with symmetric bidirectional connections converges to fixed-point attractors through a Lyapunov energy function. The system 'rolls downhill' to the nearest stored memory. Hopfield's work reignited interest in neural networks and laid the theoretical foundation for modern RNNs. Hebbian learning enabled associative pattern storage – a breakthrough for understanding biological and artificial memory systems.

Content-addressable memory that reconstructs complete patterns from incomplete or noisy inputs

Recurrent architecture with symmetric bidirectional connections and emergent collective properties

Lyapunov energy function guides system to fixed-point attractors by 'rolling downhill' to stored memory

Reignited interest in neural networks and laid foundation for modern RNN development

People:John Hopfield

Organizations:California Institute of Technology, Bell Laboratories
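
The "rolling downhill" behaviour of a Hopfield network can be sketched as follows: patterns are stored with a Hebbian rule, and recall repeatedly updates one unit at a time until the state settles. The two stored patterns and the network size are arbitrary assumptions chosen to keep the example small.

```python
def train_hebbian(patterns):
    """Build symmetric weights from +1/-1 patterns (no self-connections)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def energy(weights, state):
    """Energy function that asynchronous updates never increase."""
    n = len(state)
    return -0.5 * sum(
        weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n)
    )


def recall(weights, state, sweeps=5):
    """Update units one at a time until the state has had time to settle."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if field >= 0 else -1
    return state


# Two illustrative patterns (assumptions, chosen to be orthogonal).
PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]
WEIGHTS = train_hebbian([PATTERN_A, PATTERN_B])

noisy = [-1] + PATTERN_A[1:]        # PATTERN_A with its first unit flipped
restored = recall(WEIGHTS, noisy)
```

Starting from the corrupted input, the network returns the stored pattern, and the energy of the restored state is lower than that of the noisy one.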

1986Papers

Backpropagation Algorithm

The birth of modern machine learning through an elegant training algorithm. In October 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published the paper 'Learning representations by back-propagating errors' in Nature. This algorithm considerably changed the training of neural networks by providing an efficient method for adjusting weights in multi-layer networks. The procedure repeatedly adjusts the connection weights to minimize the difference between actual and desired output. The decisive innovation lay in the ability to train hidden layers that automatically recognize important features of the task. The mathematical foundations had already been derived earlier — by Paul Werbos (1974) and Seppo Linnainmaa (1970), among others — but it was this paper that made backpropagation widely known and convincingly demonstrated its effectiveness. Backpropagation became the workhorse of machine learning and today enables all modern deep learning applications.

Published in Nature on October 9, 1986 as 'Learning representations by back-propagating errors'

Made efficient training of multi-layer neural networks practically usable and widely known through gradient computation

Hidden layers learned to automatically recognize important features — a significant advance over perceptrons

Laid the mathematical foundation for all modern deep learning applications and Transformer architectures

People:David Rumelhart, Geoffrey Hinton, Ronald Williams

Organizations:University of California San Diego, Carnegie Mellon University, Nature
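
The core of backpropagation is the chain rule applied from the output back to the hidden layer. The sketch below uses a deliberately tiny network (one input, one tanh hidden unit, one linear output) so that every step is visible; the architecture, the squared-error loss, and the parameter values are assumptions for illustration, not the setup of the 1986 paper.

```python
import math


def forward(params, x):
    """Return (hidden activation, output) for the tiny 1-1-1 network."""
    w1, b1, w2, b2 = params
    hidden = math.tanh(w1 * x + b1)
    output = w2 * hidden + b2
    return hidden, output


def loss(params, x, target):
    """Squared error between network output and target."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backprop(params, x, target):
    """Gradient of the loss with respect to [w1, b1, w2, b2]."""
    w1, b1, w2, b2 = params
    hidden, output = forward(params, x)
    d_output = output - target                  # dL/d(output)
    d_w2 = d_output * hidden
    d_b2 = d_output
    d_hidden = d_output * w2                    # error sent back to the hidden unit
    d_pre = d_hidden * (1.0 - hidden * hidden)  # through the tanh derivative
    d_w1 = d_pre * x
    d_b1 = d_pre
    return [d_w1, d_b1, d_w2, d_b2]


PARAMS = [0.5, -0.3, 0.8, 0.1]   # arbitrary starting weights (assumption)
gradients = backprop(PARAMS, 1.2, 0.4)
```

A training loop would subtract a small multiple of these gradients from the parameters and repeat; the hidden unit's weights receive a learning signal only because the output error is propagated back through `w2`.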

1987Milestones

The Second AI Winter

The collapse of the specialized AI hardware market and the failure of expert systems. In 1987 the market for Lisp machines collapsed, as Apple and IBM computers became cheaper and more powerful than the expensive AI-specific systems. Expert systems such as XCON proved too maintenance-intensive and inflexible for real-world applications. Jack Schwartz, the new IPTO director, described expert systems as 'clever programming' and cut AI funding 'deeply and brutally.' The decline of Lisp machine manufacturers dragged on for the following years — market leader Symbolics only filed for insolvency in 1993 — leading to a longer and more severe winter than the first one in 1974. This winter lasted until around 1993 and ended the commercial hype around expert systems and specialized AI hardware — though symbolic AI as a research direction persisted.

The market for specialized Lisp machines collapsed in 1987, as Apple and IBM computers became cheaper and more powerful

Expert systems such as XCON proved too maintenance-intensive, rigid, and unable to handle new data

Jack Schwartz cut AI funding at DARPA 'deeply and brutally' and described expert systems as 'clever programming'

The costs of AI-specific hardware far outweighed the promised business returns

People:Jacob T. Schwartz, Marvin Minsky, Roger Schank

Organizations:DARPA, IPTO, Symbolics, Lisp Machines Inc, XCON

1987Datasets

UCI ML Repository: The Dataset Library

The democratization of machine learning research through standardized benchmark datasets. In 1987, UCI PhD student David Aha and fellow students founded the UCI Machine Learning Repository as an FTP archive — a collection of databases, domain theories, and data generators for empirical ML algorithm analysis. This initiative addressed the critical shortage of standardized, freely available datasets for the growing ML community. The repository became the primary source of ML datasets worldwide and gave students, educators, and researchers access to high-quality benchmarks. Over the years it has been cited tens of thousands of times, making it one of the most widely used resources in all of computer science. Today managed by the Center for Machine Learning and Intelligent Systems, the UCI ML Repository offers datasets from healthcare, finance, and countless other domains. The repository fundamentally democratized ML education and research.

Founded in 1987 as an FTP archive by David Aha and UCI students for empirical ML algorithm analysis

Became the primary source of ML datasets for students, educators, and researchers worldwide

Cited tens of thousands of times — one of the most widely used dataset resources in all of computer science

Democratized ML research by providing access to standardized, high-quality benchmark datasets

People:David Aha, Patrick Murphy

Organizations:University of California Irvine, UCI

1988Papers

Bayesian Networks: Reasoning Under Uncertainty

While neural networks and expert systems competed for attention, Judea Pearl at UCLA was building a third great pillar of AI: reasoning under uncertainty. In his book Probabilistic Reasoning in Intelligent Systems (1988) he popularised Bayesian networks — graphs in which nodes are variables and edges are their probability-based dependencies. Instead of the rigid if-then rules and ad-hoc certainty factors of expert systems, they made it possible to combine knowledge and uncertainty cleanly and to infer from them efficiently. Bayesian networks shaped AI and machine learning of the 1990s and 2000s; Pearl received the Turing Award in 2011 and later turned to causal inference — the why behind the data. For an honest assessment: Bayes' theorem itself dates to the 18th century; Pearl's achievement was not to invent probability, but to make probabilistic reasoning structured and computable for AI.

Judea Pearl (UCLA) established reasoning under uncertainty as a third pillar of AI — alongside symbolic AI and neural networks.

Bayesian networks: graphs of variables (nodes) and probability-based dependencies (edges) — replacing ad-hoc certainty factors with clean, efficient inference.

Shaped 1990s/2000s machine learning; Pearl received the 2011 Turing Award and later founded modern causal inference.

Anti-hype: Bayes' theorem dates to the 18th century; Pearl's achievement was making probabilistic reasoning structured and computable for AI — not inventing probability.

People:Judea Pearl

Organizations:UCLA
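
A Bayesian network of the kind Pearl popularised can be shown with three variables: rain and a sprinkler both influence whether the grass is wet. The sketch below answers queries by brute-force enumeration, which is only practical for toy networks; the efficient inference algorithms are not shown. All probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical network: rain -> wet <- sprinkler. All numbers are assumptions.
P_RAIN = 0.2
P_SPRINKLER = 0.1
P_WET = {  # P(wet | rain, sprinkler)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factored along the edges of the graph."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)


def posterior_rain(evidence):
    """P(rain is True | evidence); evidence maps variable names to booleans."""
    numerator = denominator = 0.0
    for rain, sprinkler, wet in product([True, False], repeat=3):
        world = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(rain, sprinkler, wet)
        denominator += p
        if rain:
            numerator += p
    return numerator / denominator


rain_given_wet = posterior_rain({"wet": True})
rain_given_wet_and_sprinkler = posterior_rain({"wet": True, "sprinkler": True})
```

Seeing wet grass raises the probability of rain well above its prior of 0.2, while additionally learning that the sprinkler was on lowers it again. Rigid if-then rules with certainty factors handle this "explaining away" pattern poorly.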

1989Papers

Universal Approximation Theorem

The mathematical proof of the theoretical power of neural networks. In 1989, Kurt Hornik, Maxwell Stinchcombe, and Halbert White published the foundational paper 'Multilayer feedforward networks are universal approximators' in Neural Networks. Their rigorous proof showed that even a single hidden layer with sufficiently many neurons can approximate any Borel-measurable function to arbitrary precision. This theoretical foundation mathematically justified the use of neural networks and assured researchers that sufficiently large networks can, in principle, represent complex, non-linear relationships in real-world data. Parallel work by George Cybenko and Ken-ichi Funahashi arrived at similar results using different techniques. The theorem obtains universality by widening the hidden layer rather than by adding depth, and it guarantees only that suitable weights exist, not that training will find them. It nevertheless became a theoretical pillar for subsequent deep learning developments. Hornik et al. created the mathematical confidence that supported the neural network renaissance of the 1990s.

Rigorous mathematical proof of the universal approximation capabilities of neural networks

A single hidden layer with sufficiently many neurons can approximate any Borel-measurable function to arbitrary precision (Cybenko's parallel work showed this for continuous functions)

Proves the ability to model complex, non-linear relationships in real-world data

Provided mathematical justification for the use of neural networks and a theoretical basis for confidence

People:Kurt Hornik, Maxwell Stinchcombe, Halbert White

Organizations:University of California San Diego

1989Breakthroughs

World Wide Web: The Invention of the WWW

The invention that connected the world and created the foundation for modern AI data sources. On March 12, 1989, Tim Berners-Lee submitted his proposal for an 'Information Management System' at CERN — originally called 'Mesh,' later renamed 'World Wide Web.' As a British scientist, he recognized the need for automated information exchange between researchers worldwide. By the end of 1990, he had developed the three fundamental web technologies: HTML (Hypertext Markup Language), HTTP (Hypertext Transfer Protocol), and URI/URL. The first web server, info.cern.ch, ran on a NeXT computer, alongside the first browser/editor 'WorldWideWeb.app.' In 1991, the web became publicly accessible. The exponential growth from around 10 websites (1992) to several hundred thousand (1996) created the data foundation for later AI systems. Without the web, there would be no Common Crawl datasets and no large language models.

Hypertext project with linked documents, browsers, and 'hot spots' — building on earlier hypertext ideas (Ted Nelson, Vannevar Bush's Memex), but deliberately simpler than Nelson's Xanadu

Information Management Proposal submitted on March 12, 1989 at CERN for automated scientific information exchange

HTML, HTTP, and URI/URL developed as fundamental web technologies by the end of 1990

Created the data infrastructure for later Common Crawl collections and large language model training

People:Tim Berners-Lee

Organizations:CERN

1989Papers

LeNet and the Birth of CNNs

The first successful real-world application of Convolutional Neural Networks. In 1989, Yann LeCun at AT&T Bell Labs combined backpropagation with a CNN architecture for handwriting recognition for the first time. This system — later known as the ancestor of the LeNet family — recognized handwritten ZIP codes for the US Postal Service with notable accuracy: roughly 1% error on training data and about 5% on previously unseen test data; when the network was allowed to reject uncertain cases, the error on the remaining digits dropped to about 1%. This performance demonstrated the practical superiority of CNNs over conventional approaches and established the foundation for modern computer vision. It showed that neural networks were not merely theoretical constructs but could solve real business problems. The architecture went through several improvement iterations and culminated in LeNet-5 in 1998 with 99.05% accuracy on MNIST. This work laid the groundwork for all modern CNN architectures.

First successful combination of Convolutional Neural Networks with backpropagation training

Recognized handwritten ZIP codes for the US Postal Service: about 5% error on test data, roughly 1% when uncertain cases were allowed to be rejected

Yann LeCun's pioneering work at Bell Labs established CNNs as a practical computer vision solution

Laid the foundation for all modern CNN architectures from AlexNet to current vision systems

People:Yann LeCun, Bernhard Boser, John Denker

Organizations:AT&T Bell Labs, NIPS

1992Breakthroughs

TD-Gammon: Learning by Playing Against Itself

Long before AlphaGo, a program at IBM showed what reinforcement learning is capable of: in 1992, Gerald Tesauro introduced TD-Gammon, a neural network that learned to play backgammon. The remarkable part was how it learned. TD-Gammon trained almost entirely by playing hundreds of thousands of games against itself and learning from the outcomes — using the temporal-difference method, which corrects predictions step by step. No one had to show it good moves. The network reached near world-class level and even discovered opening moves that human professionals then adopted themselves. For an honest assessment: as impressive as the success was, for a long time it could not be transferred to other games. One reason lies in the dice: backgammon is a game of chance, and the randomness automatically creates variety during practice — an advantage for self-play that deterministic games like chess or Go do not offer.

In 1992, Gerald Tesauro at IBM introduced TD-Gammon — a neural network that learned to play backgammon.

It learned almost entirely by playing against itself, using the reinforcement-learning method temporal difference — with no human games as a template.

TD-Gammon reached near world-class level and discovered new opening moves that professionals adopted — a forerunner of AlphaGo, almost 25 years earlier.

Anti-hype: for a long time the success could not be transferred to other games. Backgammon's dice automatically create variety in practice — a self-play advantage that chess or Go lack.

People:Gerald Tesauro

Organizations:IBM
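
The temporal-difference idea behind TD-Gammon, correcting each prediction toward the next one, can be sketched in tabular form. TD-Gammon itself used a neural network and the TD(lambda) variant; the version below is the simpler TD(0) rule on a hypothetical three-step episode that always ends with a reward of 1.

```python
def td0_update(values, state, reward, next_state, alpha=0.5, gamma=1.0):
    """Move the value of `state` toward reward + gamma * value of `next_state`."""
    target = reward + gamma * values.get(next_state, 0.0)  # unknown/terminal -> 0
    values[state] += alpha * (target - values[state])


def run_episodes(n_episodes):
    """Replay the same toy episode 0 -> 1 -> 2 -> end, rewarding the last step."""
    values = {0: 0.0, 1: 0.0, 2: 0.0}
    for _ in range(n_episodes):
        for state in (0, 1, 2):
            reward = 1.0 if state == 2 else 0.0
            td0_update(values, state, reward, state + 1)
    return values


after_one = run_episodes(1)
after_many = run_episodes(100)
```

After a single episode only the last state has learned anything; with more episodes the information about the final reward travels backwards, one prediction at a time, until every state predicts the eventual outcome.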

1992Papers

Q-Learning: Foundation of Reinforcement Learning

In 1992, Chris Watkins and Peter Dayan published the mathematical proof for Q-learning — an algorithm that would significantly change the AI world. Watkins had already developed the core idea in 1989 in his doctoral thesis 'Learning from Delayed Rewards' at King's College Cambridge. Q-learning solved a fundamental problem: how can an agent act optimally without needing a model of its environment? The answer was elegant — through the stepwise optimization of a Q-function that assigns a value to each state-action pair. The 1992 convergence proof showed that, given infinite exploration, Q-learning is guaranteed to find the optimal policy for any finite Markov decision problem. This model-free method became the cornerstone of modern reinforcement learning. From robotics to financial markets, from games to autonomous systems — Q-learning is everywhere. In late 2013, DeepMind introduced a deep variant with Deep Q-Networks (DQN) (published in Nature in 2015), achieving human-level or superhuman performance on a large portion of Atari games. To this day, Q-learning — especially in its Deep Q-Network form — remains a core building block of countless AI systems.

1992 mathematical convergence proof: Q-learning is guaranteed to find optimal policies given infinite exploration

Innovative model-free approach: learning optimal actions without an environment model or transition probabilities

Elegant solution for Markov decision problems through stepwise Q-function optimization

Cornerstone of modern reinforcement learning — still at the core of Deep Q-Networks and countless AI systems today

People:Chris Watkins, Peter Dayan

Organizations:King's College Cambridge, University College London
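
The stepwise optimisation of the Q-function described in the Q-learning entry comes down to a one-line update rule. The sketch below applies it to a hypothetical three-cell corridor whose rightmost cell is the goal. Sweeping over every state-action pair stands in for exploration, which is an assumption made to keep the example deterministic.

```python
ACTIONS = ("left", "right")
GOAL = 2  # terminal cell of a hypothetical corridor with cells 0, 1, 2


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward)."""
    next_state = state + 1 if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward reward + gamma * max over a' of Q(s', a')."""
    future = 0.0 if next_state == GOAL else max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * future - q[state][action])


def train(sweeps=500):
    """Learn Q-values without ever giving the agent the transition model."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL)}
    for _ in range(sweeps):
        for state in range(GOAL):
            for action in ACTIONS:
                next_state, reward = step(state, action)
                q_update(q, state, action, reward, next_state)
    return q


q_table = train()
policy = {s: max(q_table[s], key=q_table[s].get) for s in q_table}
```

The update only ever sees sampled transitions and rewards, never the rules of the corridor, which is what "model-free" means here. The resulting greedy policy moves right in both cells.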

1993Datasets

Penn Treebank: Syntactic Annotation Transforms NLP

The creation of the fundamental corpus for modern parsing research. In 1993, Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz published the landmark paper 'Building a Large Annotated Corpus of English: The Penn Treebank' in Computational Linguistics. With over 4.5 million words of American English tagged with part-of-speech labels, and around 3 million of those with detailed syntactic (skeletal parsed) annotation, the Penn Treebank significantly transformed computational linguistics. The two-stage process combined automatic POS tagging with human correction to achieve exceptional annotation quality. Over the entire project span of roughly seven years (1989-1996) and in the extended Penn Treebank II, a total of 7 million words of POS-tagged text, 3 million words of skeletally parsed text, and 2 million words of text annotated with predicate-argument structure were produced. The Penn Treebank established empirical methods in computational linguistics and became the foundation of modern parsing algorithms. To this day, it serves modern NLP systems as an important evaluation benchmark for parsing and language modeling.

4.5+ million words with part-of-speech tagging, around 3 million with detailed syntactic annotation — produced through a two-stage semi-automatic process

Established empirical methods in computational linguistics and became the standard benchmark for parsing research

Significantly shifted parsing algorithms from rule-based to statistical approaches

Laid the groundwork for statistical parsing and continues to serve modern NLP systems as an evaluation benchmark

People:Mitchell Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz

Organizations:University of Pennsylvania, Linguistic Data Consortium

1995Papers

AdaBoost: Weak Learners Become Strong

In 1995, Yoav Freund and Robert Schapire developed AdaBoost (Adaptive Boosting), an algorithm that considerably changed machine learning. Their core idea: combine many 'weak learners' into a highly precise prediction model. A weak learner is only marginally better than chance, but hundreds of them together can achieve notable results. AdaBoost adapts itself: examples that were predicted incorrectly are weighted more heavily in the next iteration, so the system automatically focuses on difficult cases. The theoretical elegance was compelling: Freund and Schapire proved that the training error falls exponentially fast toward zero, as long as each weak learner performs better than chance. In 2003, they received the Gödel Prize for this foundation of boosting theory, one of the most prestigious awards in theoretical computer science. AdaBoost found practical applications in biology, computer vision, and speech recognition. The method laid the groundwork for modern ensemble methods and inspired a whole generation of boosting algorithms, up to and including XGBoost.

Adaptive weighting: difficult cases are weighted more heavily for focused learning on problem areas

Weak Learners principle: hundreds of simple classifiers together produce highly precise predictions

Gödel Prize 2003: one of the most prestigious awards in theoretical computer science, for founding boosting theory

Foundation of modern ensemble methods: inspired XGBoost and an entire generation of boosting algorithms

People:Yoav Freund, Robert Schapire

Organizations:AT&T Bell Laboratories
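
The adaptive reweighting at the heart of AdaBoost fits in a few lines. The sketch below performs a single boosting round for labels coded as +1 and -1: it measures the weighted error of one weak learner, derives that learner's vote strength, and shifts weight onto the examples it got wrong. The four-example dataset and the weak learner's predictions are made up for illustration.

```python
import math


def adaboost_round(weights, predictions, labels):
    """Return (alpha, new_weights) for one boosting round with +1/-1 labels."""
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    # Vote strength of this weak learner; assumes 0 < error < 0.5.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified examples (p * y == -1) are scaled up, correct ones down.
    raised = [
        w * math.exp(-alpha * p * y)
        for w, p, y in zip(weights, predictions, labels)
    ]
    total = sum(raised)
    return alpha, [w / total for w in raised]


LABELS = [1, 1, -1, -1]
PREDICTIONS = [1, 1, -1, 1]          # hypothetical weak learner, wrong on the last
START_WEIGHTS = [0.25, 0.25, 0.25, 0.25]

alpha, new_weights = adaboost_round(START_WEIGHTS, PREDICTIONS, LABELS)
```

After the update the single misclassified example carries half of the total weight, so the next weak learner cannot do better than chance unless it gets that example right.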

1995Papers

Support Vector Machines: Maximum Margin Classification

The establishment of elegant geometric approaches for robust classification. In 1995, Corinna Cortes and Vladimir Vapnik at AT&T Bell Labs published the foundational paper 'Support-Vector Networks' in Machine Learning. SVMs extended Vapnik and Chervonenkis' early maximum margin approach from 1964 (the 'Generalized Portrait') into a practical solution for non-separable training data through the 'soft margin' innovation. The core principle lies in constructing linear decision surfaces in very high-dimensional feature spaces via non-linear input transformations. The kernel trick from 1992 enabled efficient computation without explicit transformation. SVMs maximize the margin between classes, thereby offering high generalization capability. With tens of thousands of citations, the paper became one of the most cited works in machine learning and dominated classification tasks until the deep learning revolution. SVMs remained robust, interpretable, and effective for high-dimensional problems.

Vapnik and Chervonenkis' maximum margin approach from 1964 extended into a practical solution for non-separable data

The kernel trick enables non-linear classification through implicit high-dimensional transformations

The maximum margin principle maximizes the distance between classes for optimal generalization

Established a theoretically grounded alternative to neural networks with generalization guarantees

People:Vladimir Vapnik, Corinna Cortes

Organizations:AT&T Bell Labs
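
The kernel trick mentioned in the Support Vector Machines entry can be checked numerically: a kernel computes the dot product in a higher-dimensional feature space without ever building that space. The sketch uses the degree-2 polynomial kernel on two-dimensional inputs as an example; the choice of kernel and the sample points are assumptions for illustration.

```python
import math


def polynomial_kernel(x, z):
    """Degree-2 polynomial kernel: the squared dot product in input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def explicit_features(x):
    """Feature map whose ordinary dot product equals the kernel above."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


X = (1.0, 2.0)
Z = (3.0, 4.0)

implicit = polynomial_kernel(X, Z)                         # stays in 2 dimensions
explicit = dot(explicit_features(X), explicit_features(Z))  # goes through 3 dimensions
```

Both routes give the same number, but the kernel never leaves the original two dimensions. For kernels whose feature space is very large or infinite, only the implicit route is feasible.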

1995Datasets

WordNet: The Semantic Network of Language

The first lexical dictionary built as a semantic network for computational linguistics. In November 1995, George Miller published the foundational paper 'WordNet: A Lexical Database for English' in Communications of the ACM, presenting a project that he and his Princeton team had been developing since 1986. WordNet organizes English nouns, verbs, adjectives, and adverbs into synsets: cognitive synonym groups linked by semantic and lexical relations. This structure mirrors human semantic memory and enables navigation through meaningful word and concept networks. Machine-readable dictionaries had existed before, but WordNet was the first to model vocabulary consistently as a network of synsets and meaning relations, combining traditional lexicographic information with modern data processing. WordNet later supplied the category hierarchy for ImageNet and became a building block of modern NLP systems. Its semantic network structure influenced many subsequent knowledge graphs and embedding techniques.

First lexical dictionary built as a semantic network of synsets and meaning relations, with programmatic access

Synsets linked by semantic and lexical relations form a navigable meaning network

Mirrors human semantic memory and connects cognitive science with computational linguistics

Laid the groundwork for ImageNet hierarchies, knowledge graphs, and modern semantic NLP systems

People:George Miller, Christiane Fellbaum

Organizations:Princeton University, Cognitive Science Laboratory

1996Papers

PageRank: Google's Billion-Dollar Algorithm

In 1996, two Stanford PhD students developed an algorithm that would significantly change the internet. Larry Page and Sergey Brin launched the 'BackRub' project with a novel idea: the importance of a web page is measured not only by its content, but by the links pointing to it. As with academic citations, the more often a page is linked to, the more important it is. The PageRank algorithm simulates a 'random surfer' clicking randomly through the web. The more frequently the random surfer reaches a page via the link structure, the higher its importance ranking. Page's web crawler started in March 1996 from his own Stanford homepage. By August 1996, BackRub had already discovered around 75 million URLs, that is, addresses found via links, only a portion of which had actually been crawled. The formal publication of the PageRank paper followed in January 1998 as a Stanford Technical Report. Even the early Stanford prototype delivered more relevant results than contemporary search engines such as Excite or Yahoo!. Stanford received the patent and sold its 1.8 million Google shares in 2005 for $336 million. A university project became one of the most successful search engines and a foundation of modern web AI.

Stanford project 'BackRub' analyzed backlink data to determine web importance — the foundation for Google

Innovative link analysis: page importance determined by inbound links rather than keyword frequency alone

Random Surfer Model: a page's importance grows with how frequently the random surfer reaches it via the link structure

Stanford research became Google Inc. — PageRank as the foundation of the world's most valuable search engine

People:Larry Page, Sergey Brin, Rajeev Motwani, Terry Winograd

Organizations:Stanford University, Google Inc.
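
The random-surfer model from the PageRank entry can be sketched as a simple iteration: in each round every page passes its current rank along its outgoing links, and a small share is spread evenly to model the surfer jumping to a random page. The damping factor of 0.85 is the commonly quoted value; the four-page link graph is a made-up assumption.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return a dict mapping each page to its rank (ranks sum to 1)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Share that comes from the surfer jumping to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outgoing links: the surfer jumps anywhere.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(LINKS)
most_important = max(ranks, key=ranks.get)
```

Page C, which three other pages link to, ends up with the highest rank, while page D, which nobody links to, keeps only the baseline share from random jumps.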

1997Competitions

Deep Blue Defeats Kasparov

The first match victory of a machine over a reigning world chess champion under tournament conditions. On May 11, 1997, Deep Blue made history when the IBM supercomputer defeated Garry Kasparov in the rematch in New York with a score of 3.5 to 2.5. Following the 1996 defeat, IBM had fundamentally overhauled the system: new chess chips doubled the speed to 200 million positions per second, and improved endgame databases and grandmaster consultation refined its playing strength. The decisive sixth game lasted just one hour: after a knight sacrifice, Kasparov quickly found himself in an objectively lost position and resigned as early as move 19, an unprecedented moment in his career. The victory demonstrated for the first time the superiority of computers in complex strategic thinking and marked a turning point in public perception of AI. The winner's prize of $700,000 for the Deep Blue team underscored the historic significance of this triumph of machine intelligence.

First victory of a computer over a reigning world chess champion in a match under standard tournament conditions (Deep Blue had already won a single game in 1996)

200 million positions per second, improved endgame databases, and grandmaster consultation

IBM's technical triumph after years of development from ChipTest in 1985, through Deep Thought, to Deep Blue

A turning point in public perception of AI and proof of machine superiority in complex strategic thinking

People:Garry Kasparov, Murray Campbell, Joe Hoane, Feng-hsiung Hsu

Organizations:IBM, World Chess Championship

1997Papers

LSTM: Long Short-Term Memory

The solution to the vanishing gradient problem and the birth of effective sequence modeling. On November 15, 1997, Sepp Hochreiter and Jürgen Schmidhuber published the groundbreaking paper 'Long Short-Term Memory' in Neural Computation. Their innovation solved a fundamental problem of recurrent networks: the vanishing of gradients over longer sequences. LSTM introduced special memory cells with gate mechanisms that enable constant error flow across more than 1,000 time steps. The multiplicative gates learn to open and close access to the constant error carousel. With O(1) computational complexity per time step and weight, and learning that is local in space and time, LSTM clearly outperformed contemporary RNN methods. It was the first to solve complex long-time-lag tasks that earlier recurrent network algorithms had never solved. LSTM became the foundation for modern speech recognition, translation, and time series analysis.

Solved the vanishing gradient problem through constant error flow across more than 1,000 time steps

Special memory cells with constant error carousels for long-term information storage

Multiplicative gate units learn to open and close access to constant error flow

Enabled effective long-term sequence modeling for speech recognition and time series analysis

People:Sepp Hochreiter, Jürgen Schmidhuber

Organizations:Technical University of Munich, IDSIA

1998Datasets

MNIST: The Machine Learning Standard

The creation of one of the most important benchmark datasets for computer vision beginners. In 1998, Yann LeCun, Corinna Cortes, and Christopher Burges introduced the MNIST dataset — a curated collection of handwritten digits that became the 'Hello World' of machine learning. Based on NIST's Special Database 3 and Special Database 1, MNIST contains 70,000 normalized 28x28-pixel grayscale images: 60,000 for training and 10,000 for testing. Careful preprocessing and anti-aliasing made MNIST ideal for learning purposes without time-consuming data preparation. MNIST appeared in the paper 'Gradient-based learning applied to document recognition' (Proceedings of the IEEE, November 1998). The dataset became the standard benchmark for countless ML algorithms and allowed generations of students to experience their first successes in computer vision. MNIST democratized machine learning education worldwide.

70,000 handwritten digits as 28x28-pixel normalized grayscale images

Curated by Yann LeCun, Corinna Cortes, and Christopher Burges from NIST databases

Became the 'Hello World' of machine learning and the standard benchmark for ML algorithms

Democratized ML education through easy access without time-consuming data preparation

People:Yann LeCun, Corinna Cortes, Christopher Burges

Organizations:AT&T Labs, Courant Institute

2001Papers

Random Forest: Breakthrough of Ensemble Methods

In 2001, Leo Breiman of UC Berkeley published one of the most-cited machine learning papers of all time: 'Random Forests.' His algorithm significantly changed the concept of ensemble methods and became one of the most important tools in modern statistics. The core idea was elegantly simple: instead of a single decision tree, you train hundreds of randomized trees and let them vote. Each tree is grown on a bootstrap sample of the training data ('bagging'), and each split considers only a random subset of the features. The result: less overfitting than a single tree and strong predictive accuracy. Breiman also provided the theoretical foundation with generalization error bounds based on the strength of the individual trees and the correlation between them. Random Forest became one of the most low-maintenance 'plug-and-play' ML algorithms, usually performing well with little tuning. From bioinformatics to financial market analysis, Random Forest remains widely used and helped make ensemble methods a standard tool, in parallel with the boosting line, from which XGBoost later emerged.

Ensemble breakthrough: hundreds of random decision trees vote together for better predictions

Bagging + feature randomization: each tree sees different data and features for diversity

Theoretical grounding: generalization error bounds based on tree strength and correlation

Plug-and-play ML algorithm: minimal tuning with exceptional performance across all domains
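
The combination of bagging, feature randomization, and voting can be sketched as follows. This is a simplified illustration, not Breiman's algorithm: each "tree" is a single-split stump, the feature subset is drawn once per tree rather than at every split, and all function names and default values are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Find the one-split tree (feature, threshold) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority label
        label = majority(labels)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (bagging)
        feats = rng.sample(range(d), n_features)    # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Every stump votes; the most common label wins."""
    return majority([predict_stump(stump, row) for stump in forest])
```

Because each stump sees a different resample and different features, their individual mistakes are only partly correlated, which is what lets the vote outperform any single member.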

People:Leo Breiman, Adele Cutler

Organizations:UC Berkeley Statistics Department, Machine Learning Journal

2005Organizations

Future of Humanity Institute Founded

The institutionalization of AI safety research and existential risk assessment. In 2005, Nick Bostrom founded the Future of Humanity Institute at the University of Oxford as a multidisciplinary research group. Starting with just three researchers, FHI grew into an intellectual center of gravity for brilliant, often eccentric thinkers and expanded to around 40 staff. The institute established new fields of research: existential risks, AI alignment, AI governance, and longtermism. Bostrom's early publications, such as 'The fable of the dragon tyrant' (2005) and 'What is a singleton?' (2006), shaped thinking about AI safety. Despite its relatively brief 19-year existence before closing in 2024, FHI produced notable advances and a new way of thinking about humanity's biggest questions. Oxford's academic legitimization of AI safety research lent the field scientific credibility.

Founded at Oxford University in 2005, grew from 3 to around 40 researchers before closing in 2024

Pioneered existential risks, longtermism, and AI governance as new research fields

Established AI alignment and AI safety as legitimate academic disciplines with global impact

Lent AI safety research scientific credibility and respect through its Oxford affiliation

People:Nick Bostrom, Anders Sandberg

Organizations:Oxford University, Future of Humanity Institute

2005Competitions

DARPA Grand Challenge: The Birth of Autonomous Driving

On October 8, 2005, a blue Volkswagen Touareg named 'Stanley' made history. Led by Sebastian Thrun, the Stanford Racing Team won the DARPA Grand Challenge, the first autonomous vehicle competition in which any entrant finished the course. After the complete failure of all participants in 2004 (best result: 7.4 miles, or 11.9 km), Stanley completed the entire 212 km desert course in 6 hours and 53 minutes. Five vehicles reached the finish line, four of them within the time limit, a marked improvement over the zero finishers the year before. Stanley navigated through three narrow tunnels, over 100 sharp curves, and the dangerous Beer Bottle Pass with its precipices. The innovation was software, not hardware: LiDAR sensors, machine learning, and a log of human driving decisions gave Stanley capabilities no robot had previously possessed. The $2 million prize was just the beginning: Stanley laid the groundwork for Tesla Autopilot, Google's Waymo, and the entire autonomous vehicle industry. Today Stanley is on display at the Smithsonian.

Stanford's 'Stanley' became the first autonomous vehicle to complete a 212 km desert course in under 7 hours

Breakthrough from zero successful vehicles (2004) to five finishers (2005), four within the time limit, through better AI

Recognized as a software race: LiDAR, machine learning, and human driving data as the key

The birth of modern self-driving technology — inspired Tesla, Google, and an entire industry

People:Sebastian Thrun, Mike Montemerlo, Stanford Racing Team

Organizations:DARPA, Stanford University, Stanford AI Lab

2006Papers

Deep Belief Networks: The Renaissance of Deep Learning

Geoffrey Hinton reshaped the AI world in 2006 with his important paper on Deep Belief Networks. After years of neural networks being out of favor, he showed how deep neural networks could be trained efficiently. His innovation: layer-by-layer pre-training with Restricted Boltzmann Machines (RBMs). This 'greedy' learning strategy solved the problem of weight initialization and made deep learning practically applicable. The method stacks RBMs on top of each other and trains each layer individually before refining the entire network. Hinton's work ended the years-long obscurity of neural networks and initiated their renaissance. By 2009, DBNs had already significantly reduced error rates in speech recognition. In 2012, Hinton's team won the ImageNet Challenge (ILSVRC) with AlexNet — a deep convolutional neural network that used GPU training, ReLU, and dropout, and was no longer reliant on the RBM pre-training of DBNs. AlexNet achieved a top-5 error rate of 15.3% compared to 26.2% for the second-best team — a notable improvement. This moment marks the rebirth of neural networks and the beginning of today's AI boom.

A greedy layer-by-layer learning algorithm enabled efficient training of deep neural networks for the first time

Stacking Restricted Boltzmann Machines (RBMs) as building blocks for complex representations

Unsupervised pre-training solved the weight initialization problem of deep networks

Ended the obscurity of neural networks and established the modern deep learning revolution from 2006 onward

People:Geoffrey Hinton, Simon Osindero, Yee-Whye Teh

Organizations:University of Toronto, Neural Computation

2006Competitions

Netflix Prize: The Million-Dollar Algorithm

The democratization of machine learning through a crowdsourcing competition of unprecedented scale — with a public dataset and one million dollars in prize money. On October 2, 2006, Netflix launched this million-dollar challenge: who can improve the recommendation algorithm Cinematch by 10%? With over 100 million ratings from 480,000 users for 17,770 movies, Netflix provided one of the largest public ML datasets ever made available. Over 40,000 teams from 186 countries registered; more than 5,000 of them made it onto the qualifying leaderboard and together submitted around 44,000 valid solutions. When 'BellKor's Pragmatic Chaos' became the first team to hit the 10-percent mark on June 26, 2009, it triggered a 30-day last call that ended on July 26, 2009; the winner, with a 10.06% improvement, was officially crowned at the award ceremony on September 21, 2009. Their recipe for success: an ensemble combination of matrix factorization and Restricted Boltzmann Machines. The competition considerably advanced collaborative filtering and demonstrated the power of crowdsourcing for complex ML problems. Although Netflix never deployed the winning algorithms in production (implementation costs were too high), the competition had a lasting influence on the modern recommendation system industry.

$1,000,000 prize for a 10% improvement of the Cinematch algorithm over a 3-year competition

100+ million ratings from 480k users for 17,770 movies as a public ML dataset

Considerably advanced collaborative filtering through matrix factorization and Restricted Boltzmann Machines

40,000+ teams from 186 countries, over 5,000 on the qualifying leaderboard with around 44,000 submissions — crowdsourcing power for ML

People:Reed Hastings, Netflix Team, BellKor Pragmatic Chaos Team

Organizations:Netflix, BellKor, AT&T Research

2007Datasets

Common Crawl Foundation Established

The democratization of the internet as training data for artificial intelligence. In 2007, Gil Elbaz founded the Common Crawl Foundation with the mission of archiving the entire public internet and making it freely available. Systematic crawling activity began in 2008; since then the corpus has grown by billions of pages every month and now (as of 2024) stands at over 100 billion web pages and several petabytes of data. This collection became the most important training source for large language models and enabled the development of GPT-3, ChatGPT, LLaMA, and other modern AI systems. Common Crawl differed from commercial approaches through its non-profit nature and free availability. The unfiltered raw data collection does require post-processing, but it democratized access to comprehensive language data and made AI research less dependent on proprietary datasets.

Founded in 2007 with the mission of archiving the entire public internet and making it freely available

Has grown by billions of pages each month since crawling began in 2008 — now (as of 2024) over 100 billion web pages and several petabytes of data

Became the most important training source for GPT-3, ChatGPT, LLaMA, and other modern large language models

Non-profit approach democratized access to comprehensive language data for AI research worldwide

People:Gil Elbaz, Common Crawl Team

Organizations:Common Crawl Foundation, Internet Archive, Alexa Internet

2007Milestones

CUDA: The Graphics Card Becomes the AI Engine

The AI revolution of 2012 did not run only on algorithms — it ran on graphics cards. NVIDIA laid the groundwork in 2007 with CUDA: a platform that let developers run ordinary programs in a C-like language directly on the GPU — no longer just graphics. Announced with the G80 chip in late 2006, released as a public beta in February 2007 and as version 1.0 in June 2007, CUDA made the enormous parallelism of graphics processors broadly accessible for the first time. That is a perfect fit for neural networks, whose computation is at heart matrix multiplication — thousands of small operations at once. Five years later, Krizhevsky, Sutskever and Hinton trained AlexNet on two NVIDIA GTX 580 cards using CUDA — the breakthrough that ignited deep learning. From 2014, NVIDIA's cuDNN provided the optimised building blocks on which TensorFlow, PyTorch and others run today. The honest assessment: CUDA did not invent GPGPU (programmable shaders existed from 2001, BrookGPU from 2004) and did not single-handedly cause deep learning — but it made the necessary compute accessible, and without it the rest would not have been possible.

CUDA (2007, NVIDIA; architects Ian Buck — from the BrookGPU project — and John Nickolls) lets developers run general-purpose programs in a C-like language directly on the GPU — not just graphics.

GPUs compute thousands of operations in parallel. That fits neural networks exactly, whose core is matrix multiplication.

It became the engine of deep learning: AlexNet (2012) trained on two GTX 580 cards using CUDA; from cuDNN (2014) on, virtually every major framework runs on it.

Anti-hype: GPGPU existed before CUDA (shaders 2001, BrookGPU 2004); CUDA did not cause deep learning alone — it made the compute accessible (necessary, not sufficient).

People:Ian Buck, John Nickolls

Organizations:NVIDIA

2008Papers

Zero-Shot Learning: Learning Without Data

The formalization of learning unseen classes through semantic descriptions. In July 2008, Hugo Larochelle, Dumitru Erhan, and Yoshua Bengio published 'Zero-data Learning of New Tasks' at the AAAI conference, an early formalization of the idea. The name 'Zero-Shot Learning' became established in 2009 through two separate lines of work: Palatucci and colleagues with 'Zero-Shot Learning with Semantic Output Codes' at NIPS 2009, and Lampert and colleagues with their attribute-based approach at CVPR 2009. The fundamental problem: how can a model recognize classes for which no training data is available, only descriptions? The solution lay in semantic embeddings and transfer learning: a model trained on seen classes is reused for new ones by comparing its output with the description of each candidate class. Larochelle's formalization addressed very large class sets that cannot be fully covered by training data, and the paper's experiments showed that models could generalize to such unseen classes. This work laid the conceptual foundation for modern few-shot and zero-shot capabilities in GPT-3, GPT-4, and other large language models. Zero-shot learning became a key technology for scalable AI systems.

Classification of classes without training data — using only semantic descriptions of the target classes

Reuse of trained models for entirely new tasks through semantic embeddings

Semantic representations enable generalization to unseen concepts

Laid the foundation for few-shot and zero-shot capabilities in modern large language models
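
A minimal sketch of classification by semantic description, loosely in the spirit of the attribute-based approaches. Everything concrete here is an assumption for illustration: the attribute names, the class descriptions, and the idea that some upstream model (not shown) has already predicted attribute scores for an image.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def zero_shot_classify(predicted_attributes, class_descriptions):
    """Pick the class whose description is closest to the predicted attributes."""
    return min(
        class_descriptions,
        key=lambda name: squared_distance(predicted_attributes, class_descriptions[name]),
    )


# Hypothetical attribute order: (has_stripes, has_hooves, lives_in_water).
# "zebra" has a description but, by assumption, no training images.
CLASS_DESCRIPTIONS = {
    "horse": (0.0, 1.0, 0.0),
    "tiger": (1.0, 0.0, 0.0),
    "dolphin": (0.0, 0.0, 1.0),
    "zebra": (1.0, 1.0, 0.0),
}

# Attribute scores an upstream predictor might output for a zebra photo.
prediction = zero_shot_classify((0.9, 0.8, 0.1), CLASS_DESCRIPTIONS)
```

The attribute predictor only ever needs training data for attributes such as stripes and hooves, which it can learn from horses and tigers; the unseen class enters solely through its description.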

People:Hugo Larochelle, Dumitru Erhan, Yoshua Bengio

Organizations:University of Montreal

2009Datasets

CIFAR datasets established

The creation of a fundamental benchmark for computer vision. In 2009, Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton at the University of Toronto developed the CIFAR-10 and CIFAR-100 datasets. These emerged as labeled subsets of the 80-million-image 'Tiny Images' dataset. CIFAR-10 comprises 60,000 color 32x32-pixel images in ten categories like airplanes, cars, and animals, while CIFAR-100 distributes the same number of images across one hundred finer classes. The datasets became one of the most important benchmarks in computer vision research and enabled standardized comparisons between different algorithms. Notable is the connection to AlexNet: Krizhevsky used CIFAR-10 before 2011 for training small CNNs on single GPUs – a precursor to his later ImageNet success of 2012.

CIFAR-10 with 60,000 images in 10 categories, CIFAR-100 with 100 more detailed classes as computer vision benchmarks

Became one of the most important standardized benchmarks for computer vision algorithms worldwide

Enabled systematic evaluation and comparison of different machine learning approaches

Krizhevsky used CIFAR-10 before 2011 for CNN training – precursor to his AlexNet success in 2012

People:Alex Krizhevsky, Vinod Nair, Geoffrey Hinton

Organizations:University of Toronto, Canadian Institute for Advanced Research, CIFAR

2009Datasets

ImageNet: The Dataset That Changed Everything

The creation of the dataset that made Deep Learning possible. In 2009, Fei-Fei Li and her team presented the ImageNet paper, introducing a visual database that would transform computer vision — at launch it contained around 3.2 million hand-annotated images in approximately 5,200 categories. Expanded to its full size, ImageNet later comprised over 14 million hand-annotated images and around 22,000 categories, based on WordNet hierarchies, addressing the critical bottleneck: the shortage of large, high-quality training data. Annotation was carried out throughout the project by around 49,000 workers from 167 countries via Amazon Mechanical Turk — an unprecedentedly collaborative effort. What began as a poster in a corner of a Miami Beach convention center grew into the annual ImageNet Challenge (ILSVRC) and became one of the three drivers of modern AI development. ImageNet enabled AlexNet's 2012 breakthrough and laid the foundation for autonomous vehicles, facial recognition, and medical imaging.

At launch in 2009, around 3.2 million images; at full scale, over 14 million hand-annotated images in around 22,000 categories by around 49,000 workers from 167 countries

Based on WordNet hierarchies for structured categorization of visual objects

Provided the critical training data for AlexNet's 2012 breakthrough and the development of Deep Learning

Transformed computer vision research and enabled autonomous vehicles, facial recognition, and medical imaging

People:Fei-Fei Li, Jia Deng, Wei Dong, Richard Socher

Organizations:Stanford University, Princeton University

2010Milestones

DeepMind is founded

The birth of an AI lab that would make headlines worldwide. In September 2010, Demis Hassabis, Shane Legg, and Mustafa Suleyman founded DeepMind Technologies in London. Their goal: develop artificial general intelligence by combining insights from neuroscience and machine learning. Hassabis, a former chess prodigy and game developer, brought a unique vision: AI should learn like the human brain. In 2014, Google acquired the startup for an estimated $500 million – one of the largest AI acquisitions in history. DeepMind would later astonish the world with AlphaGo, AlphaFold, and other breakthroughs.

Founded in September 2010 in London as DeepMind Technologies

Demis Hassabis (neuroscientist, game developer), Shane Legg, and Mustafa Suleyman

Acquired by Google in 2014 for an estimated $500 million

Later responsible for AlphaGo, AlphaFold, and other groundbreaking AI systems

People:Demis Hassabis, Shane Legg, Mustafa Suleyman

Organizations:DeepMind, Google

2010Competitions

ImageNet Challenge: The Competition Begins

The establishment of the most important computer vision benchmark in AI history. In 2010, the first ImageNet Large Scale Visual Recognition Challenge (ILSVRC) launched, creating a standardized competition that would shape computer vision research for the next decade. With 1,000 object categories and 1.2 million training images, the Challenge far exceeded the benchmarks available at the time, such as PASCAL VOC with only 20 classes. Evaluation was based on Top-1 and Top-5 error rates — metrics that remain standard to this day. From 2010 to 2017, the top-5 accuracy of winners improved considerably, from 71.8% to 97.3%, eventually surpassing human performance. The annual Challenge attracted over 50 institutions from around the world and catalyzed advances that culminated in AlexNet's notable breakthrough in 2012 — a Top-5 error rate of just 15.3% (approximately 84.7% accuracy).

First ILSVRC in 2010 with 1,000 categories and 1.2 million training images — far beyond PASCAL VOC

Established Top-1 and Top-5 error rates as standard metrics for computer vision evaluation

Annual competition since 2010 attracted over 50 institutions worldwide and drove research advances

Created the competition structure that enabled AlexNet's breakthrough in 2012: a Top-5 error rate of just 15.3% (approximately 84.7% accuracy)
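
How a Top-k error rate is computed can be shown with a short sketch. The function name and the toy scores are hypothetical; the official evaluation tooling is not reproduced here.

```python
def top_k_error(score_rows, true_labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for scores, label in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Toy example with 3 classes and 2 images (made-up scores).
scores = [[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]
labels = [2, 1]
top1 = top_k_error(scores, labels, k=1)  # both best guesses are wrong
top2 = top_k_error(scores, labels, k=2)  # both true labels are within the top two
```

Accuracy is simply one minus the error rate, which is how a Top-5 error of 15.3% corresponds to roughly 84.7% Top-5 accuracy.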

People:Fei-Fei Li, Olga Russakovsky, Alexander Berg

Organizations:Stanford University, ImageNet Team

2011Competitions

Watson Defeats Jeopardy! Champions

IBM's triumph in natural language processing and proof of machine language understanding. On February 16, 2011, IBM's Watson system defeated the two most successful champions of all time in a televised Jeopardy! challenge: Ken Jennings (74 consecutive wins) and Brad Rutter ($3.25 million in winnings through 2005). Watson, developed by David Ferrucci's DeepQA team, consisted of 90 IBM Power 750 servers (in 10 racks) with 16 terabytes of RAM and 2,880 POWER7 processor cores. The innovation lay in natural language processing: Watson understood questions in natural language and responded more precisely than any standard search technology, without an internet connection. With $77,147 in winnings (donated to charity), Watson finished more than $50,000 ahead of each of its human competitors. Ken Jennings' famous closing remark, 'I for one welcome our new computer overlords,' underscored the historic significance of this NLP milestone.

Defeated Jeopardy! legends Ken Jennings and Brad Rutter in a televised challenge

First TV demonstration of advanced natural language processing capabilities for millions of viewers

DeepQA system combined knowledge retrieval with complex reasoning without an internet connection

Ken Jennings' 'computer overlords' comment underscored the cultural significance of AI progress

People:David Ferrucci, Ken Jennings, Brad Rutter

Organizations:IBM Research, Jeopardy!, Sony Pictures Television

2011Products

Siri Launch: Voice Assistant Goes Mainstream

On October 4, 2011, Apple significantly changed human-computer interaction with the introduction of Siri on the iPhone 4S. As the first mass-market voice assistant deeply integrated into a smartphone, Siri brought AI into the pockets of millions. 'What's the weather like today?' or 'Find me a good Greek restaurant' — suddenly users could speak naturally to their phones. Siri was not an entirely new invention: it had existed since 2010 as a standalone iOS app by Siri Inc. (acquired by Apple), and Google already offered voice search with Voice Actions. But Apple's seamless integration into the operating system turned the voice assistant into a mass phenomenon. Siri was built on decades of research at SRI International and DARPA's CALO project. Susan Bennett had unknowingly recorded the original voice back in 2005. Steve Jobs, gravely ill in his final days, did not appear at the launch event himself — Tim Cook presented the iPhone 4S. One day after Siri's debut, Jobs passed away. Siri was not perfect — critics noted its rigid commands and limited flexibility. But the goal was achieved: AI had gone mainstream. Siri inspired Amazon Alexa, Google Assistant, and Microsoft Cortana. The era of voice assistants had begun.

First mass-market voice assistant deeply integrated into a smartphone, reaching millions of users worldwide

Advanced natural language processing enabled intuitive human-computer communication

One of Steve Jobs' last major products before his death on October 5, 2011

Founded the modern era of voice assistants and inspired all competitors

People:Steve Jobs, Susan Bennett, Tom Gruber, Adam Cheyer

Organizations:Apple, SRI International, DARPA

2012Papers

Dropout Regularization

In July 2012, Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov substantially changed the training of neural networks by introducing dropout regularization. This elegant technique reduces overfitting by randomly disabling roughly half of all neurons during training, thereby preventing complex co-adaptations. Instead of relying on specific combinations of other neurons, each neuron learns robust, broadly useful features. The method was published on arXiv on July 3, 2012. A few months later it was one of the building blocks of AlexNet's ImageNet triumph at ILSVRC 2012, whose results were presented in October 2012, alongside GPU training, ReLU activation, and network depth. Dropout set new records in speech and object recognition, became the standard in most modern deep learning architectures, and greatly eased the central overfitting problem of deep networks.

Solves the central overfitting problem of deep neural networks

Randomly disabling half of all neurons during training

One of the building blocks of AlexNet's ImageNet breakthrough — alongside GPU training, ReLU, and network depth

Became the standard in most modern deep learning architectures
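
The mechanism fits in a few lines. Note one assumption: this sketch uses the "inverted" variant common in current frameworks, which rescales the surviving activations during training, rather than adjusting the weights at test time. The function name and defaults are hypothetical.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Zero each activation with probability p_drop and rescale the survivors."""
    if not training:
        return list(activations)  # at test time every neuron stays active
    keep = 1.0 - p_drop
    # Dividing by `keep` leaves the expected value of each activation unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

With `p_drop=0.5`, about half of the values come back as zero and the rest are doubled, so the layer's average output stays roughly the same while no neuron can rely on a particular neighbor being present.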

People:Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov

Organizations:University of Toronto

2012Breakthroughs

AlexNet Success

The turning point for deep learning and modern AI. On September 30, 2012, the results of the ImageNet Challenge were published: AlexNet won by such a wide margin that computer vision was fundamentally changed. Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton from the University of Toronto developed a CNN architecture that beat the runner-up by 10.9 percentage points in top-5 error (15.3% versus 26.2%), an improvement considered extraordinary in the scientific community. With 60 million parameters and techniques such as ReLU activations and dropout layers, AlexNet impressively demonstrated the practical superiority of deep learning. That was the moment when an interesting theory became a dominant technology. Yann LeCun called it 'an undeniable turning point in the history of computer vision.' The GPU-based implementation paved the way for modern AI development.

AlexNet won the ImageNet 2012 Challenge with a 15.3% top-5 error rate, 10.9 percentage points better than the second-place participant (26.2%)

60 million parameters, ReLU activations, dropout layers, and GPU training established new technical standards

Impressively demonstrated the practical superiority of deep learning and ended skepticism toward neural networks

Launched modern AI development and made CNN architectures the standard in computer vision

People:Alex Krizhevsky, Geoffrey Hinton, Ilya Sutskever

Organizations:University of Toronto, ImageNet Challenge, NIPS

2012Breakthroughs

The Deep Learning Revolution

The year that ushered in the modern AI era through the convergence of datasets, GPU power, and neural architectures. 2012 marked the rise of deep learning as the dominant AI technology, catalyzed by AlexNet's impressive ImageNet victory. The convergence of three developments made this possible: Fei-Fei Li's ImageNet dataset provided massive labeled training data, GPU computing delivered the processing power needed for deep networks, and improved training methods such as ReLU activations and dropout regularization overcame longstanding limitations. Geoffrey Hinton's team — Alex Krizhevsky, Ilya Sutskever, and Hinton himself — proved in Krizhevsky's parents' house with two Nvidia cards that deep neural networks were practical. AlexNet proved to be a turning point for computer vision. This success greatly increased interest in deep learning and paved the way for VGG, ResNet, and ultimately today's development of generative AI.

Deep learning established itself as the dominant AI technology and ended the dominance of traditional machine learning approaches

AlexNet's ImageNet victory demonstrated for the first time the practical superiority of deep neural networks

GPU computing enabled the training of large neural networks and fundamentally changed AI research methods

Triggered massive investment in deep learning research and the industrial adoption of neural architectures

People:Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Alex Krizhevsky, Ilya Sutskever

Organizations:University of Toronto, NYU, University of Montreal

2013Papers

Word2Vec: Words as Vectors

The transformation of word representation through semantic vector spaces. On January 16, 2013, Tomas Mikolov and his Google team published the influential paper 'Efficient Estimation of Word Representations in Vector Space'. Word2Vec transformed NLP by representing words as dense, low-dimensional vectors (typically 100 to 300 dimensions) that capture semantic and syntactic relationships — a break from the large, sparse one-hot vectors used in earlier methods. The two architectural variants, CBOW (Continuous Bag of Words) and Skip-Gram, learned from large text corpora that similar words appear in similar contexts. The famous example demonstrated vector arithmetic: king - man + woman = queen. With over 49,000 citations, Mikolov's work became one of the most influential NLP papers. Word2Vec laid the foundation for all modern embedding techniques and enabled semantic reasoning in vector spaces. This innovation paved the way for transformer architectures and modern large language models.

First efficient dense, low-dimensional vector representations of words with semantic relationships

Semantic and syntactic patterns through vector arithmetic: king - man + woman = queen

Enabled analogical reasoning in vector spaces through cosine similarity and distance metrics

Laid the foundation for modern embedding techniques and transformer-based large language models
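
The analogy arithmetic can be reproduced on toy data. The vectors below are made up by hand for illustration (two axes, loosely "royalty" and "gender"); real Word2Vec embeddings are learned from text and have hundreds of dimensions, and the result there is the nearest neighbor rather than an exact match.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the inputs."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Hand-made 2-D vectors, not trained embeddings.
TOY_VECTORS = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "prince": [0.9, 0.8],
    "apple": [-1.0, 0.1],
}

answer = analogy("man", "king", "woman", TOY_VECTORS)  # man is to king as woman is to ?
```

Excluding the three input words from the candidates mirrors common practice, since the nearest vector to the result is otherwise often one of the inputs themselves.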

People:Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean

Organizations:Google, Google Research

2013Papers

VAE: Variational Autoencoders

The development of probabilistic generative models through latent space modeling. On December 20, 2013, Diederik Kingma and Max Welling published the paper 'Auto-Encoding Variational Bayes'. VAEs connect encoder and decoder networks through a probabilistic latent space — typically a multivariate Gaussian distribution. Unlike deterministic autoencoders, the encoder encodes data as distributions rather than single points, enabling continuous interpolation and data generation. The Reparameterization Trick makes randomness differentiable as a model input, enabling standard gradient optimization. In their experiments, VAEs generated handwritten digits (MNIST) and small face images (Frey Faces) — still blurry, but groundbreaking as a proof of concept for variational inference. This work laid the foundation for modern generative AI and shaped later probabilistic approaches through to diffusion models.

Variational inference for efficient approximation of intractable posterior distributions in continuous latent variables

Probabilistic latent space enables continuous interpolation and generation of new data points

Pioneering combination of autoencoder architecture with scalable probabilistic generative modeling through amortized variational inference

Encoder-decoder architecture with Reparameterization Trick for differentiable randomness
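
A minimal sketch of the reparameterization trick and of the closed-form KL term for a diagonal Gaussian latent space, in plain Python. The function names are hypothetical, and the encoder and decoder networks that would produce `mu` and `log_var` are omitted.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    The randomness lives entirely in eps, so z is a deterministic and
    differentiable function of mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))
```

In training, the KL term is added to a reconstruction loss; it pulls the encoded distributions toward the standard normal prior, which is what makes sampling new points from the latent space meaningful.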

People:Diederik P. Kingma, Max Welling

Organizations:University of Amsterdam

2014Papers

Adam: Deep Learning's Default Optimizer

For a neural network to learn, an optimizer must turn its millions of knobs, step by step, in the right direction. In 2014, Diederik Kingma and Jimmy Ba introduced a method for this that quickly became the most widely used in the field: Adam, a name derived from the phrase Adaptive Moment Estimation (and not an acronym). Adam's trick is to keep a separate, automatically adjusted learning rate for every single parameter. To do so it combines two proven ideas — momentum, which carries along the previous direction, and adaptive step sizes in the style of RMSProp. The result: networks train robustly and without tedious fiddling with the learning rate. The paper became one of the most cited in machine learning. For an honest assessment: Adam is no miracle cure. In some cases plain SGD generalizes better to new data. Adam also builds on predecessors such as AdaGrad and RMSProp, and later variants like AdamW (2017) had to iron out weaknesses of the original.

In 2014, Diederik Kingma and Jimmy Ba introduced the Adam optimizer — the name is derived from Adaptive Moment Estimation (not an acronym).

Adam adjusts the learning rate for each parameter automatically, combining two ideas: momentum and adaptive step sizes (as in RMSProp).

Adam became the standard tool for training neural networks — robust and without tedious learning-rate tuning. The paper is among the most cited in machine learning.

Anti-hype: Adam is no miracle cure — in some cases plain SGD generalizes better. It builds on predecessors (AdaGrad, RMSProp); later variants like AdamW (2017) fixed weaknesses.
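
How momentum and adaptive step sizes combine can be shown with a single update step. The sketch below follows the update rule from the Adam paper with its default hyperparameters; the function name `adam_step` and the toy gradients are assumptions for illustration.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of scalar parameters (t starts at 1)."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # momentum: running mean of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)             # bias correction for the zero start
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Two parameters whose gradients differ by a factor of 100.
params, m, v = adam_step([1.0, 1.0], [2.0, 200.0], [0.0, 0.0], [0.0, 0.0], t=1, lr=0.01)
print(params)
```

Although the second gradient is 100 times larger, both parameters move by roughly the learning rate in this first step, because each step is divided by that parameter's own gradient magnitude. Plain SGD would instead move the second parameter 100 times further.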

People:Diederik Kingma, Jimmy Ba

2014Datasets

MS COCO: The Computer Vision Gold Standard

In 2014, a research team including Microsoft Research, Cornell University, and UC Berkeley advanced computer vision research with the COCO dataset (Common Objects in Context). Unlike ImageNet with its largely isolated objects, COCO showed objects in their natural context, as they appear in the real world. The dataset contains 2.5 million labeled instances across 328,000 images, organized into 91 categories in the original paper, of which 80 form the detection benchmark still in use today; all are everyday objects a 4-year-old could recognize. The innovation lay in the detail: pixel-precise segmentation masks instead of only bounding boxes. This made it possible to evaluate precise object localization and complex scene understanding at scale. The dataset became the gold standard for object detection, instance segmentation, and image captioning. From YOLO to Mask R-CNN, most major detection and segmentation models are measured against COCO. Standardized metrics such as mean Average Precision (mAP) made objective model comparisons possible. More than a decade later, COCO remains one of the most important benchmarks in the CV community, and it helped shape the object recognition systems used in autonomous vehicles, surveillance, and augmented reality.

Objects in natural context rather than isolated: shifted computer vision from artificial to real-world scenes

2.5 million pixel-precise annotations across 328k images — unprecedented annotation quality and depth

Gold standard with mAP metrics for objective model comparisons — defined computer vision evaluation

Benchmark for YOLO, Mask R-CNN, and most modern detection systems, from autonomous vehicles to AR
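
The mAP metric mentioned above rests on one simple quantity: the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below is an illustration only, not the official COCO evaluation code; the function names are assumptions. It computes IoU and counts how many of the ten COCO IoU thresholds (0.50 to 0.95 in steps of 0.05) a prediction would pass.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# COCO averages precision over these ten IoU thresholds.
COCO_IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]


def thresholds_passed(pred, truth):
    """Number of COCO IoU thresholds at which pred counts as a match."""
    iou = box_iou(pred, truth)
    return sum(1 for t in COCO_IOU_THRESHOLDS if iou >= t)


truth = (0.0, 0.0, 10.0, 10.0)
print(box_iou((5.0, 0.0, 15.0, 10.0), truth))        # half-overlapping box
print(thresholds_passed((0.0, 0.0, 10.0, 10.0), truth))
```

Averaging over strict thresholds rewards detectors that localize objects tightly, not just roughly.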

People:Tsung-Yi Lin, Michael Maire, Serge Belongie

Organizations:Microsoft Research, Cornell University, UC Berkeley

2014Papers

GANs - Generative Adversarial Networks

Ian Goodfellow invented Generative Adversarial Networks (GANs) in 2014 during a single night in Montreal after a visit to a bar. His landmark framework pits two neural networks against each other in a minimax game: a generator creates synthetic data, while a discriminator tries to distinguish real from fake. This adversarial training fundamentally transformed generative AI. The original GAN from 2014 only produced small, blurry images (such as digits and faces), but it paved the way for the photorealistic image generation that followed. The paper, published on arXiv in 2014, became one of the most influential AI papers ever and made Goodfellow an AI celebrity. Hundreds of GAN variants followed.

Two neural networks in a minimax game: generator vs. discriminator

Invented in a single night in 2014 in Montreal after a bar visit - it worked immediately

A mathematically elegant framework for adversarial optimization

Fundamentally transformed generative AI - paving the way for the photorealistic image generation that followed
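
The minimax game can be made concrete through its value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The sketch below only evaluates this quantity on made-up discriminator outputs; it is not a training loop, and the function name is an assumption.

```python
import math


def value_function(d_real, d_fake):
    """Estimate V(D, G) from discriminator outputs in (0, 1).

    d_real: D(x) on real samples; d_fake: D(G(z)) on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from fake well scores high ...
print(value_function([0.9, 0.95], [0.1, 0.05]))
# ... while one that has been fully fooled outputs 0.5 everywhere.
print(value_function([0.5, 0.5], [0.5, 0.5]))
```

When the generator matches the data distribution, the best the discriminator can do is output 0.5 for every input, which gives V = -log 4. That is the equilibrium value derived in the original paper.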

People:Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

Organizations:University of Montreal, NIPS Conference

2014Papers

Attention Mechanism: The Key to Modern LLMs

September 2014: Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio published a paper that would lastingly change the NLP world. 'Neural Machine Translation by Jointly Learning to Align and Translate' solved a fundamental problem in sequence-to-sequence models. Previous encoder-decoder architectures compressed every input sentence into a single fixed-length vector — an information bottleneck for long sentences. Bahdanau attention was a significant step forward: instead of a fixed vector, the model used dynamic attention over different parts of the input sentence. Like the human eye jumping while reading, AI attention moves between relevant words. This 'additive attention' became the conceptual precursor to modern NLP systems. The later Transformer (2017) built on the attention idea, but replaced the additive variant with the more efficient Scaled Dot-Product Attention. Without Bahdanau's attention concept, no Transformers; without Transformers, no GPT family or BERT. This breakthrough happened three years before 'Attention Is All You Need.'

Eased the encoder-decoder bottleneck: the decoder sees all encoder states instead of one fixed-length vector

Dynamic attention instead of static encoding: adaptive focus on relevant parts of the input

Learns alignment between languages: which words correspond to each other during translation?

Conceptual precursor to the Transformer: Bahdanau's attention idea paved the way for GPT, BERT, and ChatGPT
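
Additive attention can be written down compactly: each encoder state h_j gets a score v . tanh(W s + U h_j) against the current decoder state s, the scores are normalized with a softmax, and the context vector is the weighted sum of the encoder states. The sketch below uses plain Python lists and tiny hand-picked matrices; all names and values are assumptions for illustration.

```python
import math


def matvec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state, encoder_states, W, U, v):
    """Return (attention weights, context vector) for one decoder step."""
    ws = matvec(W, decoder_state)
    scores = []
    for h in encoder_states:
        hidden = [math.tanh(a + b) for a, b in zip(ws, matvec(U, h))]
        scores.append(sum(v_k * x for v_k, x in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return weights, context


identity = [[1.0, 0.0], [0.0, 1.0]]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # one vector per source word
weights, context = additive_attention([1.0, 0.0], encoder_states, identity, identity, [1.0, 1.0])
print(weights, context)
```

The weights are recomputed at every decoder step, which is what lets the model look at different source words for each target word.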

People:Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Organizations:University of Montreal, Jacobs University Bremen

2014Products

Amazon Alexa and Echo Launch

Amazon considerably changed the interaction between people and technology with the introduction of Alexa and the Echo smart speaker on November 6, 2014. The Echo launched initially by invitation only and exclusively for Prime members; it was only with the public sale in 2015 that voice AI became accessible to a broad consumer audience, transforming the home into a voice-controlled environment. Building on the Polish speech synthesis technology Ivona — acquired on January 24, 2013 — Amazon created an entirely new user experience. The Echo started as a music control device, but quickly evolved into a universal smart home hub. This innovation established a mass-market category and marked the beginning of a far-reaching development in the smart speaker market that inspired numerous competitors.

Established the mass-market smart speaker category with always-on voice readiness

Made voice AI accessible to millions of consumers through the public sale starting in 2015 — not just tech enthusiasts

Transformed living rooms into voice-controlled smart home hubs

Marked the beginning of a far-reaching market development — Google, Apple, and others followed

People:Jeff Bezos, Amazon Alexa Team

Organizations:Amazon, Ivona (acquired 2013)

2015Breakthroughs

Deep Q-Networks: AI Learns Atari from Pixels

Long before AlphaGo made headlines, in 2015 DeepMind got an AI to learn Atari video games from raw pixels alone — laying the foundation of deep reinforcement learning. In February 2015 the team led by Volodymyr Mnih presented “Human-level control through deep reinforcement learning” in Nature (an earlier precursor had appeared in 2013). A single neural network that saw only the screen and the score learned 49 different Atari games — with the same architecture, without per-game tuning. Technically, DeepMind combined a convolutional network with Q-learning, an experience-replay memory (introduced by Lin in the early 1990s) and a stabilising target network. Precision matters in the assessment: the system reached human level on about half of the games and beat all prior methods on 43 of 49 — yet on sparse-reward games such as Montezuma's Revenge it failed almost completely. Even so, it was the proof that deep networks and reinforcement learning fit together at scale — the bridge from the Q-learning of the 1990s to AlphaGo and AlphaZero.

Learning from raw pixels: the system saw only the screen and the score — no hand-crafted features, no per-game knowledge.

Convolutional network + Q-learning + an experience-replay memory (Lin, early 1990s) + a target network added in 2015 that stabilised training.

Anti-hype: human level on about half of the 49 games (43/49 better than prior methods) — near zero on sparse-reward games (Montezuma's Revenge).

The launch of deep reinforcement learning; it made DeepMind famous before AlphaGo — the bridge from Q-learning to AlphaGo and AlphaZero.
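
Two of the ingredients named above, the experience-replay memory and the target network, are easy to sketch without any neural network code. The snippet below is a simplified illustration with assumed names: it stores transitions in a bounded buffer and computes the Q-learning target y = r + gamma * max Q_target(s', a') from Q-values that a frozen target network would supply.

```python
import random
from collections import deque


class ReplayMemory:
    """Bounded buffer of past transitions; old ones are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Random sampling breaks the correlation between consecutive frames.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values_target, done, gamma=0.99):
    """Q-learning target using Q-values from the (frozen) target network."""
    if done:
        return reward
    return reward + gamma * max(next_q_values_target)


memory = ReplayMemory(capacity=1000)
for step in range(5):
    memory.push((step, 0, 1.0, step + 1, False))   # (state, action, reward, next_state, done)
batch = memory.sample(2, random.Random(0))
print(batch, td_target(1.0, [0.5, 2.0], done=False, gamma=0.9))
```

Because the target network is updated only occasionally, the target y does not shift with every gradient step, which is the stabilizing effect described in the entry.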

People:Volodymyr Mnih, David Silver, Demis Hassabis

Organizations:Google DeepMind

2015Papers

Batch Normalization: A Key Advance in Neural Network Training

On February 11, 2015, Sergey Ioffe and Christian Szegedy of Google published a paper that fundamentally changed the training of deep neural networks. Their diagnosis: the 'Internal Covariate Shift' — the input distribution of each layer shifts during training, making learning unstable. Their elegant solution: Batch Normalization normalizes the activations of each layer for every mini-batch. The effect was notable: roughly 14 times fewer training steps to reach the same accuracy. Higher learning rates became possible, dropout was often unnecessary, and initialization became less critical. The technique acted simultaneously as a regularizer and an accelerator. Their ImageNet ensemble achieved a 4.8% top-5 error rate, surpassing human raters (approximately 5.1%). Interesting to note: later research (Santurkar et al. 2018) showed that the actual mechanism relies less on taming covariate shift than on smoothing the loss landscape — the original explanation is thus considered revised today. With well over 60,000 citations, the paper inspired countless normalization methods: GroupNorm, LayerNorm, InstanceNorm. Today Batch Normalization is standard in many modern architectures from ResNet to modern CNNs — Transformers, however, mostly rely on the Layer Normalization it helped inspire.

Targeted 'Internal Covariate Shift' by normalizing activations in every mini-batch; later work attributes the benefit mainly to a smoother loss landscape

Roughly 14 times fewer training steps to reach the same accuracy — enabling higher learning rates and robust initialization

Dual benefit: acceleration AND regularization — often a dropout replacement in modern architectures

4.8% ImageNet top-5 error with ensemble — surpassed human raters (approximately 5.1%) and set a new standard
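
The normalization step itself is short. The sketch below shows the training-time forward pass for a single feature over one mini-batch, with the learnable scale gamma and shift beta from the paper; the function name is an assumption, and the running statistics used at inference time are left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n   # biased variance over the batch
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = [1.0, 2.0, 3.0, 4.0]
print(batch_norm(activations))
```

Whatever the scale of the incoming activations, the output has roughly zero mean and unit variance before gamma and beta are applied, which is why higher learning rates become usable.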

People:Sergey Ioffe, Christian Szegedy

Organizations:Google Inc., ICML Conference

2015Papers

YOLO: You Only Look Once

The transformation of real-time object detection through unified single-pass architecture. On June 8, 2015, Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi presented the paper 'You Only Look Once: Unified, Real-Time Object Detection'. YOLO broke with the traditional two-stage paradigm of object detection and formulated detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from complete images in one evaluation. The base model ran at 45 fps and Fast YOLO at 155 fps, far faster than the R-CNN-style detectors of the time, which needed from a fraction of a second to many seconds per image. The grid-based architecture divided each image into cells, with each cell responsible for the objects whose center falls inside it. YOLO learned generalizable object representations and outperformed other methods in domain transfer, for example from natural images to artwork.

45 fps base performance, Fast YOLO 155 fps: real-time speeds far beyond the R-CNN-style detectors of the time

Single-pass architecture formulates object detection as regression problem instead of two-stage paradigm

Grid-based cell division with direct bounding box and class probability prediction

Enabled real-time computer vision for autonomous vehicles, surveillance, and mobile applications
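
The grid idea can be illustrated with the bookkeeping alone. In the original paper the image is divided into an S x S grid (S = 7), each cell predicts B = 2 boxes with 5 numbers each plus C = 20 class probabilities, and the cell containing an object's center is responsible for that object. The function names below are assumptions for this sketch.

```python
def responsible_cell(box, image_w, image_h, S=7):
    """Return (row, col, x_offset, y_offset) for the cell holding the box center.

    box is (x_min, y_min, x_max, y_max) in pixels; the offsets locate the
    center inside its cell, each in the range [0, 1).
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0 / image_w * S
    cy = (y_min + y_max) / 2.0 / image_h * S
    col = min(int(cx), S - 1)
    row = min(int(cy), S - 1)
    return row, col, cx - col, cy - row


def output_size(S=7, B=2, C=20):
    """Length of the prediction tensor: S * S * (B * 5 + C)."""
    return S * S * (B * 5 + C)


print(responsible_cell((192, 192, 256, 256), 448, 448))
print(output_size())
```

Because the whole image is mapped to one fixed-size tensor, a single forward pass yields all detections, which is where the speed comes from.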

People:Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

Organizations:University of Washington, Allen Institute, Facebook AI Research

2015Breakthroughs

DeepMind AlphaGo Development

In October 2015, DeepMind achieved a historic breakthrough: AlphaGo became the first AI system to defeat a professional Go player on a full board without a handicap. AlphaGo beat European Go champion Fan Hui 5 to 0 in a game long considered the most complex classic board game, about a decade earlier than many experts had predicted. The match remained secret at first; the success was publicly announced only on January 27, 2016, together with the publication in the journal Nature. Go is astronomically more complex than chess: it has roughly 10^170 legal board positions, more than the number of atoms in the known universe, and DeepMind described it as a googol (10^100) times more complex than chess. The achievement demonstrated the power of combining deep neural networks with Monte Carlo tree search.

First computer victory against a professional Go player on a full board without a handicap (Fan Hui 5 to 0)

Novel approach combining deep neural networks with Monte Carlo tree search instead of hand-crafted evaluation rules

Mastering 10^170 possible board configurations — more than atoms in the universe

The breakthrough came a decade earlier than predicted by AI experts

People:Demis Hassabis, David Silver, DeepMind Team

Organizations:DeepMind, Google

2015Products

Tesla Autopilot: Driver Assistance for the Mass Market

On October 14, 2015, Tesla released software version 7.0, activating Autopilot for the first time on Model S vehicles. The hardware had already been installed in the vehicles since September 2014 — a full year before the software was enabled. The system used Mobileye technology with a front-facing camera, radar, and 12 ultrasonic sensors. Drivers could now use adaptive cruise control, lane-keeping assist, and automatic parking — features that had previously been reserved for luxury vehicles. Tesla described it as Level 2 autonomy: the system assists the driver but does not replace them. Musk emphasized at the release: 'We advise drivers to keep their hands on the steering wheel.' In the very first year, the Tesla fleet logged hundreds of millions of kilometers with Autopilot active — by the end of 2016, Tesla reported approximately 222 million miles driven. The concept — pre-installing hardware and unlocking features via software update — showed the automotive industry a new way forward. From Mercedes to pure tech providers like Mobileye, numerous players were advancing their own driver assistance systems.

Software update of October 14, 2015 activated pre-installed hardware — a new concept for the automotive industry

Mobileye-based sensor suite: front-facing camera, radar, and 12 ultrasonic sensors for Level 2 driver assistance

Adaptive cruise control, lane-keeping assist, and automatic parking — previously premium-only features

Hundreds of millions of kilometers in the first year alone — demonstrating mass-market readiness for driver assistance systems

People:Elon Musk, Tesla Engineering Team

Organizations:Tesla Inc., Mobileye

2015Products

TensorFlow: Google's ML framework goes open source

The democratization of machine learning through Google's powerful internal tool. On November 9, 2015, Google open-sourced TensorFlow under the Apache 2.0 license and made its second-generation ML system available to everyone. TensorFlow replaced the internal DistBelief system and offered double the speed with improved scalability and production readiness. As a general dataflow-graph engine, TensorFlow supported not only deep learning but any computation that can be expressed as a graph of differentiable operations. The flexible Python interface, automatic differentiation, and built-in optimizers made ML development considerably easier. Google's strategy: community-based development accelerates AI progress for everyone. Developed by more than 30 authors from the Google Brain team, TensorFlow became one of the leading ML platforms and enabled millions of developers to create advanced AI applications.

Apache 2.0 license made Google's powerful internal ML system freely available to everyone

Replaced DistBelief with double speed and improved scalability

Flexible Python interface and auto-differentiation significantly improved ML development

Gave millions of developers access to advanced AI technology

People:Martín Abadi, Ashish Agarwal, Paul Barham, Jeff Dean

Organizations:Google, Google Brain

2015Papers

ResNet: Residual Networks Transform Deep Learning

The solution to the degradation problem of very deep networks and the birth of ultra-deep architectures. On December 10, 2015, Kaiming He's team at Microsoft Research published the paper 'Deep Residual Learning for Image Recognition' and significantly changed deep learning. Until then, training accuracy deteriorated as networks were stacked ever deeper — not primarily due to vanishing gradients, but because deep networks were simply harder to optimize. ResNet introduced residual connections — skip connections that pass inputs directly to later layers, enabling the training of ultra-deep networks. With 152 layers, ResNet was eight times deeper than VGG but less complex. The noteworthy result: a 3.57% top-5 error rate of the model ensemble on ImageNet — a triumph that dominated all categories. ResNet won ImageNet Classification, Detection, and Localization as well as COCO Detection and Segmentation in 2015. The residual learning framework reformulated layers as learning residual functions rather than unreferenced functions. This innovation enabled the training of networks with hundreds of layers.

Skip connections pass inputs directly forward, enabling the training of ultra-deep networks

152 layers — 8x deeper than VGG but less complex through the residual learning framework

3.57% top-5 error rate (ensemble) on ImageNet; first place in the 2015 ILSVRC classification, detection, and localization tasks and in COCO detection and segmentation

Established residual connections as the standard for modern deep learning architectures
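
The residual idea fits in one line: a block outputs relu(F(x) + x) instead of relu(F(x)). The sketch below uses small fully connected layers in place of the paper's convolutions, so it illustrates the principle and is not the actual architecture; all names are assumptions.

```python
def matvec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, x) for x in vec]


def residual_block(x, W1, W2):
    """y = relu(F(x) + x) with F(x) = W2 . relu(W1 . x)."""
    f = matvec(W2, relu(matvec(W1, x)))
    return relu([f_i + x_i for f_i, x_i in zip(f, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], zeros, zeros))   # the skip path carries x through unchanged
```

With all weights at zero the block simply passes its (non-negative) input through. A very deep stack can therefore fall back to the identity where extra layers are not useful, and only has to learn the residual F(x) on top of it.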

People:Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Organizations:Microsoft Research

2015Milestones

OpenAI Is Founded

The organization that wanted to make AI accessible to everyone — and changed the world. On December 11, 2015, Sam Altman, Elon Musk, and other prominent tech figures announced the founding of OpenAI. With a pledged one billion dollars — a multi-year funding commitment from investors, only a fraction of which actually flowed at first — and the goal of developing safe artificial general intelligence that benefits all of humanity, OpenAI entered the stage as a nonprofit research organization. What began as an idealistic endeavor grew into the world's most influential AI lab. In 2019, a for-profit subsidiary was established. With GPT-3 and ChatGPT, OpenAI redefined what AI can do.

Founded on December 11, 2015 in San Francisco

Mission: Develop safe artificial general intelligence that benefits all of humanity

Pledged: $1 billion from Elon Musk, Peter Thiel, Reid Hoffman, and others — a multi-year funding commitment, not immediately available

GPT-1 (2018) and GPT-2 (2019) were created during the purely nonprofit phase; the capped-profit structure followed in 2019, under which GPT-3 (2020) and ChatGPT (2022) were developed

People:Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba, John Schulman

Organizations:OpenAI, Y Combinator

2016Competitions

AlphaGo defeats Lee Sedol

The historic moment when AI first defeated a world champion in the most complex board game. From March 9 to 15, 2016, the DeepMind Challenge Match took place in Seoul – five games between Lee Sedol, one of the world's best Go players, and AlphaGo. The result astonished the world: 4:1 for the machine. Particularly the famous 'Move 37' in game two demonstrated machine creativity – a move with a 1:10,000 probability that overturned centuries of Go wisdom. AlphaGo combined deep learning with Monte Carlo tree search and trained both with human games and through self-play. Lee Sedol's response in game four with his 'divine Move 78' showed, however, that human intuition can still surprise. Over 200 million people worldwide followed these matches.

AlphaGo defeated Lee Sedol 4:1 and demonstrated AI superiority in the most complex board game for the first time

The famous 'Move 37' with 1:10,000 probability showed machine creativity and challenged Go traditions

Combination of deep learning and Monte Carlo tree search enabled mastering Go's complexity

Over 200 million people followed the matches – a turning point for public AI perception

People:Lee Sedol, Demis Hassabis, David Silver, Aja Huang

Organizations:DeepMind, Google, Korean Baduk Association

2016Papers

XGBoost: Extreme gradient boosting dominates ML

The refinement of gradient boosting and the conquest of structured data problems. On March 9, 2016, Tianqi Chen and Carlos Guestrin published the paper 'XGBoost: A Scalable Tree Boosting System' on arXiv; it was presented in August 2016 at the KDD conference. Developed from Chen's PhD project at the University of Washington, XGBoost improved traditional gradient boosting through a series of optimizations: L1 and L2 regularization counteracted overfitting, second-order gradients provided more precise direction information, and parallelization significantly accelerated tree construction. XGBoost dominated machine learning competitions of the 2010s and became the standard choice for winning teams on Kaggle. At the Higgs Boson ML Challenge, Tianqi Chen won a special prize and XGBoost was adopted by many top participants, establishing its reputation for structured data. The scalable end-to-end tree boosting system supports C++, Java, Python, R, and other languages. XGBoost showed the continued relevance of traditional ML methods alongside the deep learning revolution.

Extreme optimization of gradient boosting with L1/L2 regularization and second-order gradients

Dominated ML competitions of the 2010s and became the standard choice for Kaggle-winning teams

Parallelized tree construction and scalable end-to-end architecture for large datasets

Go-to algorithm for structured data alongside the deep learning revolution
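
The role of second-order gradients and regularization shows up directly in the formulas from the XGBoost paper. For a leaf with summed gradients G and summed second derivatives H, the optimal leaf weight is -G / (H + lambda), and a split is scored by how much it improves G^2 / (H + lambda) across the two children. The sketch below covers only the L2 term lambda and the per-leaf penalty gamma; the function names are assumptions.

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value for summed gradients G and second derivatives H."""
    return -G / (H + lam)


def split_gain(G_left, H_left, G_right, H_right, lam=1.0, gamma=0.0):
    """Loss reduction from splitting one leaf into a left and right child."""
    def score(G, H):
        return G * G / (H + lam)

    parent = score(G_left + G_right, H_left + H_right)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right) - parent) - gamma


# With squared-error loss, each sample contributes g = prediction - label and h = 1.
print(leaf_weight(G=-4.0, H=3.0, lam=1.0))
print(split_gain(-4.0, 2.0, 4.0, 2.0, lam=0.0))
```

A larger lambda shrinks leaf values toward zero, and a larger gamma makes the tree refuse splits whose gain is too small, which is how the regularization counteracts overfitting.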

People:Tianqi Chen, Carlos Guestrin

Organizations:University of Washington

2016Products

Google Assistant: AI-First Strategy Becomes Reality

On May 18, 2016, Sundar Pichai introduced Google Assistant at Google I/O - Google's answer to Siri and Alexa. After years of falling behind in the voice assistant space, Google came back with full force. The Assistant was more than an upgrade to Google Now - it was the foundation of Pichai's 'AI-First' strategy. 'We want users to have an ongoing dialogue with Google,' Pichai explained. 'We're building an individual Google for every user.' The Assistant was meant to become an 'ambient experience' spanning all devices - from smartphones to Google Home to cars. Unlike command-based rivals, Google focused on natural conversation and contextual understanding. Initially the Assistant was just announced; its first home came a few months later in the messaging app Allo, followed by the Google Home speaker at the end of 2016. The launch marked Google's serious entry into voice AI development and laid the foundation for the company's AI dominance today.

Natural conversation instead of commands - 'ongoing dialogue' as the goal for voice AI

Foundation of Pichai's AI-First strategy - 'an individual Google' for every user

Ambient experience vision - seamless AI interaction across all devices and platforms

Google's comeback against Siri and Alexa - from latecomer to serious contender in voice AI

People:Sundar Pichai, Google Assistant Team

Organizations:Google Inc., Google I/O Conference

2016Organizations

Partnership on AI: Tech Giants Unite

A significant alliance of leading tech companies for responsible AI development. On September 28, 2016, Amazon, Facebook, Google, DeepMind, IBM, and Microsoft founded the 'Partnership on Artificial Intelligence to Benefit People and Society', an unusual coalition of direct competitors. With Eric Horvitz (Microsoft Research) and Mustafa Suleyman (DeepMind) as interim co-chairs, the Partnership launched with an entirely corporate board and announced plans to expand it into a parity body with an equal number of non-corporate members. The mission encompasses research and best practices on ethics, fairness, transparency, privacy, and human-AI collaboration. Notably, Apple was initially absent but joined in 2017. The Partnership deliberately avoids lobbying activities and focuses on research collaboration. This initiative marked the beginning of structured industry self-regulation in AI development.

Significant alliance of Amazon, Facebook, Google, DeepMind, IBM, and Microsoft for AI ethics

Mission: AI for the benefit of people and society through ethics, fairness, and transparency

Planned parity board: initially all-corporate, later to be expanded with an equal number of non-corporate members

Focus on research collaboration and best practices without lobbying activities

People:Mustafa Suleyman, Eric Horvitz, Partnership Team

Organizations:Amazon, Apple, Facebook, Google, IBM, Microsoft

2016Breakthroughs

Speech Recognition Reaches Human-Level Performance

On October 18, 2016, Microsoft announced a historic milestone: its speech recognition system became the first to reach human-level performance on the Switchboard benchmark for conversational speech. After 25 years of research, the goal had been achieved: a 5.9% word error rate, matching professional transcriptionists on this task. (In 2017, other researchers measured a lower human error rate of about 5.1% with a more careful transcription process, and Microsoft had to close the gap again.) Xuedong Huang, Microsoft's Chief Speech Scientist, announced: 'We have achieved human parity. This is a historic achievement.' The system used the latest deep learning technology: convolutional neural networks, LSTM architectures, and neural language models with continuous word vectors. Its strength lay in the systematic combination of proven building blocks: an ensemble of CNN and BLSTM acoustic models, i-vector speaker adaptation, and rescoring via language model. This was made possible by the convergence of three developments: large datasets (Switchboard Corpus), GPU computing, and improved training methods. This achievement paved the way for modern voice assistants, though it demonstrates parity only on a narrowly defined transcription task, not general human cognitive abilities.

5.9% word error rate reaches human level on Switchboard: as accurate as professional transcriptionists

Historic milestone: the lowest word error rate reported on the Switchboard benchmark at that time

CNN + LSTM + neural language models: systematic combination of state-of-the-art deep learning technology

25-year research goal achieved: human parity on a narrowly defined transcription task
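
The word error rate behind the 5.9% figure is the word-level edit distance between the reference transcript and the system output, divided by the number of reference words. The sketch below is a generic implementation with an assumed function name; it is not Microsoft's scoring pipeline and it assumes a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))        # distance from empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A 5.9% word error rate therefore means roughly six word-level mistakes per hundred reference words.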

People:Xuedong Huang, Microsoft AI Research Team

Organizations:Microsoft AI and Research, Switchboard Corpus

2017Regulation

Asilomar Principles: The Field Sets Its Own Guardrails

In early 2017, long before ChatGPT, leading AI researchers met at Asilomar on the California coast — the very place where biologists had debated the risks of genetic engineering back in 1975. The Future of Life Institute had convened the conference on beneficial AI. The result was the 23 Asilomar AI Principles: guidelines on research, on values such as safety and transparency, and on long-term risks. More than a thousand AI experts and prominent signatories such as Stephen Hawking and Elon Musk stood behind them. It was one of the first broad attempts by the field to give itself guardrails — years before governments discovered the topic. For an honest assessment: the principles were voluntary and non-binding. They shaped the debate but carried no legal force.

January 2017: the Future of Life Institute gathered leading AI researchers at Asilomar (California) — at the site of the historic 1975 genetic-engineering conference.

Result: the 23 Asilomar AI Principles on research, values (safety, transparency) and long-term risks — one of the field's first broad self-commitments.

Over a thousand AI researchers and other signatories (incl. Stephen Hawking, Elon Musk) — an early consensus that AI should serve the common good.

Anti-hype: the principles were voluntary and non-binding — pioneering as a framework for discussion, but with no enforcement.

People:Stephen Hawking, Elon Musk

Organizations:Future of Life Institute

2017Papers

MobileNet - AI for Smartphones

In April 2017, Google Research advanced mobile AI with MobileNet, one of the early deep learning models designed specifically for smartphones, IoT devices, and embedded systems (predecessors such as SqueezeNet already existed). Its depthwise separable convolutions split a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise convolution. For 3x3 kernels this cuts the computational cost to roughly one-eighth to one-ninth of a conventional convolution, at only a small reduction in accuracy. This efficiency paved the way for real-time image processing on mobile devices. MobileNet brought computer vision to billions of smartphones and helped establish on-device (edge) inference as an alternative to cloud-based solutions.

One of the early deep learning models, designed specifically for smartphones and IoT devices

Depthwise separable convolutions: roughly eight to nine times less computation for 3x3 kernels, at only a small reduction in accuracy

Enables AI processing directly on devices instead of in the cloud — edge computing

Reduces computational cost to approximately one-eighth of conventional convolutions at comparable accuracy
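
The eight-to-nine-times figure follows from counting multiply-adds, as in the MobileNet paper. A standard convolution with kernel size k, M input channels, N output channels, and an f x f feature map costs k * k * M * N * f * f; the depthwise separable version costs k * k * M * f * f for the depthwise step plus M * N * f * f for the 1x1 pointwise step. The layer sizes below are arbitrary example values.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a standard k x k convolution."""
    return k * k * m * n * f * f


def separable_conv_cost(k, m, n, f):
    """Multiply-adds of a depthwise k x k plus pointwise 1 x 1 convolution."""
    return k * k * m * f * f + m * n * f * f


k, m, n, f = 3, 512, 512, 14   # hypothetical layer: 3x3 kernel, 512 -> 512 channels
ratio = separable_conv_cost(k, m, n, f) / standard_conv_cost(k, m, n, f)
print(ratio, 1 / n + 1 / (k * k))
```

The ratio simplifies to 1/N + 1/k^2, so for 3x3 kernels and many output channels it approaches 1/9.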

People:Andrew Howard, Menglong Zhu, Bo Chen, Google Research Team

Organizations:Google, Google Research

2017Papers

RLHF research paper published

The technique that made ChatGPT possible, years before the breakthrough. In June 2017, researchers from OpenAI and DeepMind published the paper 'Deep Reinforcement Learning from Human Preferences'. The idea: instead of training AI systems with hand-specified reward functions, they learn from human feedback. Humans compare pairs of short behavior clips and say which one they prefer; a reward model is fitted to these comparisons, and the agent is then trained with reinforcement learning against that learned reward. This method, later known as RLHF (Reinforcement Learning from Human Feedback), became a key technology behind ChatGPT and other modern language models. RLHF is used to make AI systems more helpful, honest, and safe.

Paper 'Deep Reinforcement Learning from Human Preferences' published in June 2017

Core idea: AI learns from human preferences instead of predefined rewards

Joint research by OpenAI and DeepMind, including Paul Christiano and Dario Amodei

RLHF became the key technology for ChatGPT and modern AI assistants
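
Learning from preferences can be sketched with the comparison model used in the 2017 paper: the probability that a human prefers clip A over clip B is modeled as a softmax over the summed predicted rewards of the two clips, and the reward model is trained with a cross-entropy loss on the human's choice. The function names and reward values below are assumptions for illustration.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Modeled probability that clip A is preferred over clip B."""
    diff = sum(rewards_a) - sum(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))   # softmax over two summed rewards


def preference_loss(rewards_a, rewards_b, human_prefers_a):
    """Cross-entropy between the modeled preference and the human label."""
    p_a = preference_probability(rewards_a, rewards_b)
    return -math.log(p_a if human_prefers_a else 1.0 - p_a)


# Predicted per-step rewards for two short clips (hypothetical values).
clip_a, clip_b = [1.0, 2.0], [0.5, 0.5]
print(preference_probability(clip_a, clip_b))
print(preference_loss(clip_a, clip_b, human_prefers_a=True))
```

Minimizing this loss pushes the reward model to assign higher total reward to the behavior people actually chose; the agent is then optimized against that learned reward.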

People:Paul Christiano, Jan Leike, Dario Amodei, Tom Brown

Organizations:OpenAI, DeepMind

2017Papers

Transformer: 'Attention Is All You Need'

On June 12, 2017, eight researchers — mostly at Google, including an intern from the University of Toronto — published the paper 'Attention Is All You Need' on arXiv, laying the foundation for modern large language models. Ashish Vaswani, Noam Shazeer, and colleagues proposed a new architecture: the Transformer. Unlike previous sequence models, the Transformer dispenses with recurrent and convolutional layers. Instead, it relies purely on attention mechanisms. Self-attention captures relationships between all positions in a sequence in parallel — no sequential processing required. Multi-head attention uses multiple parallel attention heads that learn different aspects of word relationships. On WMT 2014, the model achieved 28.4 BLEU for English-German and 41.8 BLEU for English-French — new state-of-the-art results. The architecture proved far-reaching: GPT, BERT, ChatGPT, and many other models are built on Transformer variants. With well over 100,000 citations — a number that continues to rise — the paper ranks among the most cited of the 21st century.

Self-attention mechanism captures dependencies between all sequence positions simultaneously

Eliminating recurrence enables parallel processing — significantly faster than sequential models

28.4 BLEU WMT English-German, 41.8 BLEU English-French — new translation benchmarks

Became the foundation of virtually all modern LLMs: GPT, BERT, and ChatGPT are built on Transformer variants
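
The core operation of the paper, scaled dot-product attention, is softmax(Q K^T / sqrt(d_k)) V. The sketch below implements it with plain Python lists for a single head, without masking or learned projections; the function names and toy matrices are assumptions.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Each query attends to all keys at once; returns one output row per query."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        scores = [sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return outputs


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```

Every query row is handled independently of the others, which is why the computation parallelizes across sequence positions instead of stepping through them one at a time.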

People:Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

Organizations:Google Brain, Google Research

2017Regulation

China's AI Master Plan: The Race for Global Leadership

On July 20, 2017, China's State Council announced the 'New Generation Artificial Intelligence Development Plan', the first comprehensive national AI strategy of this scale. The goal: to become the world's leading AI power by 2030. The plan set out three steps: globally competitive by 2020, world-leading in selected areas with major breakthroughs in foundational AI theory by 2025, and then the leading AI superpower with 1 trillion yuan in industry output by 2030. China explicitly identified AI as a 'focus of international competition' and a 'strategic technology for national security.' The investments are substantial, with tens of billions of dollars flowing into research, infrastructure, and talent development. The plan covers military and civilian applications: from autonomous weapons to smart cities. Open-source principles are meant to foster international collaboration, while China simultaneously pursues technological independence. This strategy substantially changed the global AI landscape and triggered a wave of national AI initiatives in the United States and Europe.

First national AI strategy of this scale: coordinated government planning for global technology leadership

Three-step timeline: competitive by 2020, world-leading in selected areas by 2025, leading AI superpower by 2030

Trillion-yuan target: 1 trillion yuan in AI industry output by 2030, backed by massive state funding for research, infrastructure, and talent

Global leadership ambition: the starting shot for the worldwide AI race between China, the US, and Europe

People:State Council of China, Chinese AI Research Community

Organizations:State Council of China, Chinese Academy of Sciences

2017Regulation

Montreal Declaration for Responsible AI

The first international initiative to develop ethical AI principles through democratic citizen participation. On November 3, 2017, the Université de Montréal launched the co-design process for the Montreal Declaration for Responsible AI Development. The Forum on the Socially Responsible Development of AI brought together over 400 participants from various sectors and disciplines. In 15 deliberation workshops over three months, more than 500 citizens, experts, and stakeholders discussed the societal challenges of AI. The declaration, published on December 4, 2018, presents 10 principles and 59 recommendations based on values such as well-being, autonomy, justice, privacy, and democracy. With over 500 signatories, the Montreal Declaration established a participatory approach to AI governance and influenced subsequent international efforts toward responsible AI development.

10 ethical principles and 59 recommendations for responsible AI development with democratic legitimacy

Focus on well-being, autonomy, justice, privacy, democracy, and ecological sustainability

Initiated by the Université de Montréal with over 400 participants from various sectors

Over 500 signatories; influenced international AI governance and subsequent regulatory initiatives

People:Yoshua Bengio, Montreal AI Ethics Team

Organizations:Université de Montréal, Montreal Institute for Learning Algorithms

2017Breakthroughs

AlphaZero Masters Three Games

The birth of a universal game AI through pure self-learning. In December 2017, DeepMind presented AlphaZero — a system that mastered three completely different strategy games without any prior knowledge: chess, shogi, and Go. The tabula rasa approach meant: no opening databases, no human strategies, only the rules of the game as a starting point. Within 24 hours, AlphaZero achieved superhuman performance — in chess after just 4 hours, in shogi after 2 hours. In the 100-game match against Stockfish, it won 28 games, lost none, and drew 72. Its distinctiveness lay in efficient search behavior: while Stockfish evaluates 60 million positions per second, AlphaZero analyzes only 60,000 — but far more purposefully through its deep neural network. This performance impressively demonstrated the generalizability and domain-independence of pure reinforcement learning.

Learned three complex games entirely from scratch — with only the rules of the game, without human prior knowledge or databases

Achieved superhuman performance in chess (4h), shogi (2h), and Go (~8h) through pure self-play

Learned through millions of self-play games and reinforcement learning without external inputs

Evaluated only 60,000 positions per second vs. Stockfish's 60 million — but far more purposefully

People:David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou

Organizations:DeepMind, Google, Science Magazine, ArXiv

2018Milestones

Turing Award for Deep Learning

In 2019, AI received computing's highest honour: the 2018 A.M. Turing Award, often called the Nobel Prize of computing, went to Yoshua Bengio, Geoffrey Hinton and Yann LeCun, the three godfathers of deep learning. The ACM honoured their conceptual and engineering breakthroughs that made deep neural networks a critical component of computing, from backpropagation through convolutional networks to the ideas that carried the 2012 breakthrough. The award was the late, official accolade for a revolution that had been dismissed for decades. For an honest assessment: deep learning has many parents, and researchers such as Jürgen Schmidhuber publicly objected that important contributions went under-credited. The prize honours the trio's central role, not sole authorship.

Yoshua Bengio, Geoffrey Hinton and Yann LeCun — the three godfathers of deep learning — for the breakthroughs behind modern neural networks.

The A.M. Turing Award (announced March 2019) is computing's highest honour; it recognised deep neural networks as a critical component of computing.

The official accolade for the 2012 deep-learning revolution — and a forerunner of the 2024 Physics Nobel for the same line of research.

Anti-hype: deep learning has many contributors (e.g. Schmidhuber, who publicly objected); the prize honours the trio's central role, not sole authorship.

People:Yoshua Bengio, Geoffrey Hinton, Yann LeCun

Organizations:ACM

2018Papers

GPT-1: The Birth of Generative Pre-Training

The foundation of all modern large language models through unsupervised pre-training. On June 11, 2018, Alec Radford and his OpenAI team published the landmark paper 'Improving Language Understanding by Generative Pre-Training.' This work combined the Transformer architecture with unsupervised pre-training for the first time and established a two-stage paradigm: first, generative training on large text corpora, then fine-tuning for specific tasks. With 117 million parameters and training on the BooksCorpus dataset of over 7,000 unpublished books across various genres, GPT-1 demonstrated that transfer learning works for language understanding. The twelve-layer, decoder-only Transformer architecture with masked self-attention set the template for the entire GPT series. This innovation turned the 2017 Transformer architecture into a practical tool for a wide range of NLP tasks and launched the era of large language models.

Established unsupervised pre-training on large text corpora as the foundation for language models

Demonstrated the successful application of transfer learning across a wide range of NLP tasks

Twelve-layer, decoder-only Transformer architecture became the template for the entire GPT series

Launched the era of large language models and established the pre-training/fine-tuning paradigm

People:Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

Organizations:OpenAI

2018Papers

BERT Significantly Improves Language Understanding

A key advance in bidirectional language models and the birth of modern NLP. In October 2018, Jacob Devlin and his team at Google Research published the paper on BERT — Bidirectional Encoder Representations from Transformers. This innovation substantially changed language processing by training deep bidirectional representations from unlabeled text for the first time. Unlike earlier models, BERT considers both left and right context simultaneously across all layers. The results were noteworthy: BERT achieved new state-of-the-art results on eleven NLP tasks and improved the GLUE score by a notable 7.7 percentage points to 80.5%. The pre-training itself took several days on many TPUs — but the open-source release democratized cutting-edge technology: the ready-trained model could be fine-tuned for a custom task on a single cloud TPU in about 30 minutes. BERT established the pre-training/fine-tuning paradigm that today forms the foundation of all large language models.

First deep bidirectional language model to consider left and right context simultaneously across all layers

Achieved new state-of-the-art results on 11 NLP tasks and improved the GLUE score by 7.7 percentage points to 80.5%

Open-source release enabled fine-tuning of the pre-trained model for custom tasks in about 30 minutes on a single cloud TPU

Established the pre-training/fine-tuning paradigm for all modern language models

People:Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Organizations:Google Research, Google AI Language

2019Papers

GPT-2 - "Too Dangerous to Release"

OpenAI releases GPT-2 in February 2019 but makes the surprising decision to withhold the full 1.5-billion-parameter model, citing concerns about malicious use; the stance is widely summarized as "too dangerous" to release. This unprecedented decision splits the AI community: supporters praise the responsible stance given misuse risks like fake news and automated spam, while critics accuse OpenAI of "closing off" research and fueling unfounded fears. In November 2019, after nine months without strong evidence of misuse, OpenAI releases the complete model, marking a turning point in the debate about responsible AI development.

Unprecedented decision: OpenAI withholds complete 1.5B-parameter model

Fears of fake news, identity impersonation, and automated social media spam

AI community split: ethics progress vs. accusation of research closure

Full release after 9 months due to lack of misuse evidence

People:Alec Radford, Jeffrey Wu, Rewon Child, David Luan

Organizations:OpenAI

2019Competitions

AlphaStar Reaches Grandmaster Level

The conquest of the most complex real-time strategy game by artificial intelligence. In July and August 2019, DeepMind's AlphaStar competed anonymously in ranked mode on Battle.net; on October 30, 2019, DeepMind reported in the journal Nature that the system had become the first AI to reach Grandmaster level in StarCraft II, a game long considered too complex for machines. AlphaStar ranked above 99.8% of all active Battle.net players and mastered all three races: Protoss, Terran, and Zerg. Before that, AlphaStar had already defeated professional players Grzegorz 'MaNa' Komincz and Dario 'TLO' Wünsch 5:0 each. Its distinctiveness lay in the multi-agent reinforcement learning architecture, which trained various strategies and counter-strategies within a league. With an average of 280 actions per minute, AlphaStar acted less often than human professionals, but executed more precisely. This achievement marked a milestone for AI in video games and real-time decision-making.

AlphaStar reached Grandmaster level in all three StarCraft II races and ranked above 99.8% of all Battle.net players

Defeated professional players MaNa and TLO 5:0 each before the public success

Multi-agent reinforcement learning with league-based training of various strategies and counter-strategies

The first AI to reach the top league of a popular esports game without a simplified version of the game

People:Oriol Vinyals, Igor Babuschkin, Wojciech Czarnecki, Grzegorz Komincz, Dario Wünsch

Organizations:DeepMind, Team Liquid, Blizzard Entertainment, Battle.net

2019Papers

T5 - Text-to-Text Transfer Transformer

In October 2019, Google AI reframes NLP with T5, the Text-to-Text Transfer Transformer, which casts all natural language processing tasks into a unified "text-to-text" format. With this "Everything is Text" approach, translation, summarization, question answering, and classification can be handled with the same model, loss function, and hyperparameters. T5 introduces the comprehensive C4 dataset and achieves near-human performance on the SuperGLUE benchmark. As a foundation model with up to 11 billion parameters, T5 paves the way for modern large language models and establishes the unified text-to-text paradigm as a standard.
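
To illustrate the unified format, the sketch below turns three different tasks into plain (input text, target text) pairs by prepending a task prefix. The prefixes and sample sentences are modeled on the examples shown in the T5 paper as best recalled here; the helper name and dictionary are hypothetical, and a real pipeline would also tokenize the strings.

```python
# Task prefixes in the style of the T5 paper's examples (treat as illustrative).
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "acceptability": "cola sentence: ",
}


def to_text_to_text(task: str, source_text: str, target_text: str) -> dict:
    """Express any task as a pair of strings: model input and expected output."""
    return {"input": TASK_PREFIXES[task] + source_text, "target": target_text}


examples = [
    to_text_to_text("translate_en_de", "That is good.", "Das ist gut."),
    to_text_to_text("acceptability", "The course is jumping well.", "not acceptable"),
]
```

Because even a classification label such as "not acceptable" is emitted as text, the same model and loss function can serve every task.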

Innovative unified approach: All NLP tasks as text-to-text problems

"Everything is Text" - paradigm unifies translation, summarization, Q&A

Establishes foundation model paradigm for modern large language models

Introduces comprehensive C4 dataset - Colossal Clean Crawled Corpus

People:Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee

Organizations:Google AI, Google Research

2020Papers

RAG: When Language Models Look Things Up First

A language model only knows what was in its training data — and when in doubt, it confidently makes something up. In 2020, Patrick Lewis and colleagues at Facebook AI showed a way out: Retrieval-Augmented Generation, or RAG. The idea is compellingly simple. Before the model answers, it searches an external knowledge source — Wikipedia, say — for relevant passages and then bases its answer on what it found. This lets knowledge be updated without retraining the model, and makes the answer verifiable. After the success of ChatGPT, RAG became the standard method for tying language models to current, checkable sources — the foundation of almost every application that lets you chat with your own documents. For an honest assessment: RAG reduces hallucinations but does not eliminate them. If what it retrieves is wrong, or the model misreads it, it still errs. It provides evidence, not genuine understanding — and it builds on earlier retrieval research.
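
The retrieve-then-answer idea described above can be sketched without any model at all. Note the swap: the original RAG work used a learned dense retriever and a neural generator, while this sketch uses simple word overlap for retrieval and stops at building the prompt that a language model would receive. All names and the tiny document store are hypothetical.

```python
import re

# A tiny stand-in for an external knowledge source such as Wikipedia.
DOCUMENTS = [
    "Charles Babbage first described the Analytical Engine in 1837.",
    "Ada Lovelace published her notes on the Analytical Engine in 1843.",
    "The Transformer paper appeared on arXiv in June 2017.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list, k: int = 1) -> list:
    """Return the k documents sharing the most words with the question."""
    query = tokenize(question)
    ranked = sorted(documents, key=lambda doc: len(query & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list, k: int = 1) -> str:
    """Put the retrieved passages in front of the question."""
    passages = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return f"Answer using only these passages:\n{passages}\n\nQuestion: {question}\nAnswer:"
```

Updating the system's knowledge then means editing `DOCUMENTS`, not retraining a model, and the retrieved passage can be shown to the user as evidence.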

In 2020, Patrick Lewis and colleagues at Facebook AI introduced Retrieval-Augmented Generation (RAG).

Instead of answering only from memory, the language model first searches for relevant documents (e.g. from Wikipedia) and bases its answer on them.

After ChatGPT, RAG became the standard method for tying language models to current, checkable sources — the basis of almost every chat-with-your-documents application.

Anti-hype: RAG reduces hallucinations but does not remove them — if the retrieval is wrong or misread, the model still errs. It provides evidence, not genuine understanding, and builds on earlier retrieval research (e.g. DPR, REALM).

People:Patrick Lewis

Organizations:Facebook AI Research, University College London, New York University

2020Papers

Neural Scaling Laws

In January 2020, Jared Kaplan, Sam McCandlish, Tom Brown, Dario Amodei, and colleagues published empirical scaling laws for neural language models, considerably advancing the development of large language models. The landmark work from OpenAI and Johns Hopkins University shows that model loss follows power laws in model size, dataset size, and training compute, with trends spanning seven orders of magnitude. For the first time, the equations made it possible to predict systematically how a compute budget should be allocated, and they established the 'bigger is better' paradigm. These findings fed directly into GPT-3 and moved AI development from experimental trial and error toward scientifically grounded, predictable scaling. The specific allocation rule from Kaplan (scale model size aggressively, data volume only modestly) was corrected in 2022 by DeepMind's Chinchilla paper: compute-optimal training requires considerably more training data than initially recommended.
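
What a power law in model size means in practice can be shown with a short calculation. The sketch assumes the form L(N) = (N_c / N)^alpha for loss as a function of parameter count N; the constants below are approximately those reported for the parameter-count law as recalled here and should be treated as illustrative assumptions, not authoritative values.

```python
# Illustrative constants (assumptions): critical scale N_c and exponent alpha.
N_C = 8.8e13
ALPHA = 0.076


def loss_from_params(n_params: float, n_c: float = N_C, alpha: float = ALPHA) -> float:
    """Predicted loss for a model with n_params parameters under a pure power law."""
    return (n_c / n_params) ** alpha


def loss_ratio_for_scale_up(factor: float, alpha: float = ALPHA) -> float:
    """Multiplicative change in loss when the parameter count grows by `factor`."""
    return factor ** (-alpha)
```

Under these assumptions, every tenfold increase in parameters multiplies the loss by the same constant, `loss_ratio_for_scale_up(10)`, regardless of the starting size. That regularity is what made extrapolation to much larger models possible.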

Discovery of fundamental power laws spanning seven orders of magnitude

Elegant equations enable systematic predictions of resource allocation; refined in 2022 by Chinchilla

Established the 'bigger is better' paradigm for systematic LLM development

Transforms AI development from trial and error to a scientific methodology

People:Jared Kaplan, Sam McCandlish, Tom Brown, Dario Amodei

Organizations:OpenAI, Johns Hopkins University

2020Papers

GPT-3: The 175-Billion-Parameter Model

The breakthrough to few-shot learning and emergent AI capabilities. On May 28, 2020, OpenAI's team led by Tom Brown presented the significant paper 'Language Models are Few-Shot Learners' - GPT-3 with 175 billion parameters, more than 100 times larger than GPT-2. The scale revealed emergent capabilities: the model could solve new tasks with just a few examples, without any fine-tuning. From translations to word puzzles to three-digit arithmetic, GPT-3 demonstrated impressive versatility. Human evaluators could barely distinguish news articles generated by GPT-3 from real ones. Through in-context learning alone, GPT-3 approached the state of the art on individual SuperGLUE subtasks - though on the overall benchmark it remained well behind the fine-tuned top models (around 89 points) with a score of roughly 71.8. Thirty-one OpenAI researchers (Tom Brown and 30 co-authors) demonstrated that massive parameter scaling can produce qualitatively new capabilities. GPT-3 laid the foundation for ChatGPT and the modern LLM era.

175 billion parameters - more than 100 times larger than GPT-2, with notable scaling effects

Emergent few-shot capabilities without fine-tuning: new tasks solvable with just a few examples

Showed broad versatility: translation, word puzzles, three-digit arithmetic, and news articles that human evaluators could barely tell from real ones

Laid the foundation for ChatGPT and commercialized large language models through API access

People:Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah

Organizations:OpenAI

2020Papers

DDPM: Diffusion models established

The mathematical foundation of modern image generation through denoising processes. In June 2020, Jonathan Ho, Ajay Jain, and Pieter Abbeel published the influential paper 'Denoising Diffusion Probabilistic Models', which presents a class of latent variable models inspired by non-equilibrium thermodynamics. Their innovation lay in a weighted variational bound and the connection between diffusion models and denoising score matching with Langevin dynamics. The results were impressive: an FID score of 3.17 and an Inception score of 9.46 on CIFAR-10. DDPMs established a progressive lossy decompression approach that can be interpreted as a generalization of autoregressive decoding. This work laid the mathematical foundation for Stable Diffusion and for modern text-to-image generation as a whole.
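
The denoising idea rests on a forward process that gradually turns data into noise, which the model then learns to reverse. The sketch below shows only that forward step, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps, where a_t is the running product of (1 - beta). It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, the setting commonly associated with the paper; the function names are hypothetical and the learned denoising network is omitted.

```python
import math
import random


def linear_beta_schedule(steps: int = 1000, start: float = 1e-4, end: float = 0.02) -> list:
    """Noise variances beta_t, rising linearly from start to end."""
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas: list) -> list:
    """Running product of (1 - beta_t): the share of signal left at step t."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def noise_sample(x0: list, t: int, alpha_bars: list, rng: random.Random) -> list:
    """Jump straight to step t: scaled original plus scaled Gaussian noise."""
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    return [keep * value + spread * rng.gauss(0.0, 1.0) for value in x0]
```

Early steps leave the input almost untouched, while at the final step almost no signal remains, so generation can start from pure noise and denoise step by step.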

New class of generative models based on non-equilibrium thermodynamics and denoising processes

Progressive lossy decompression approach as generalization of autoregressive decoding

Laid mathematical foundation for Stable Diffusion and modern text-to-image generation

FID score 3.17 on CIFAR-10 demonstrated image quality rivaling GANs and established diffusion as standard

People:Jonathan Ho, Ajay Jain, Pieter Abbeel

Organizations:UC Berkeley

2020Papers

Vision Transformer: 'An Image is Worth 16x16 Words'

Transformer architecture in computer vision. On October 22, 2020, Alexey Dosovitskiy's team at Google Research published the paper 'An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale'. The Vision Transformer (ViT) showed that CNNs are not necessary — pure transformers can be applied directly to sequences of image patches. The key finding (the 'at Scale' part): only after large-scale pre-training on massive datasets (ImageNet-21k or JFT-300M) does ViT achieve comparable or better results than state-of-the-art CNNs; on medium-sized datasets without this pre-training, ViT performs worse by comparison. The system splits images into patches — typically 16x16 pixels, though other sizes are used depending on the variant — treats them as token sequences, and applies standard transformer architecture. The universality of the transformer architecture became clear: the same technology that transformed NLP also works in computer vision. ViT inspired a new generation of attention-based vision models and demonstrated the power of unified architectures.
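
The patch step described above can be sketched directly. The code assumes a single-channel image stored as a list of rows and a side length that divides evenly by the patch size; the function name is hypothetical, and the learned linear projection and position embeddings that follow in a real Vision Transformer are left out.

```python
def patchify(image: list, patch_size: int = 16) -> list:
    """Split a 2D image into flattened, non-overlapping square patches.

    Patches are returned in reading order (left to right, top to bottom),
    each as a flat list of patch_size * patch_size pixel values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ])
    return patches
```

For a 224x224 input (an assumed, commonly used resolution), 16x16 patches give a sequence of 14 x 14 = 196 tokens, a length a standard transformer can process like a sentence.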

First scalable, patch-based application of pure transformer architecture to computer vision without CNN components

Image patches (typically 16x16 pixels) are flattened and treated as a token sequence, so a standard transformer can process them

Self-attention for image processing proved the universality of transformer architecture

Matched or exceeded state-of-the-art CNNs after large-scale pre-training and inspired attention-based vision models

People:Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov

Organizations:Google Research, Google Brain

2020Breakthroughs

AlphaFold Achievement

The solution to a 50-year-old biological puzzle through artificial intelligence. In November 2020, DeepMind's AlphaFold 2 dominated the CASP14 competition with accuracy that scientists described as 'astounding' and 'transformational'. The system achieved a median GDT score of 92.4 out of 100 in protein structure prediction, a precision comparable to experimental methods like X-ray crystallography. AlphaFold clearly beat 145 other teams and solved a problem that had occupied biology since the 1970s. The attention-based neural network can predict within days how a protein folds, a process fundamental to understanding life. For this achievement, Demis Hassabis and John Jumper received the 2024 Nobel Prize in Chemistry, shared with David Baker.

AlphaFold 2 dominated CASP14 with a 92.4 GDT score, clearly beating 145 other teams

Solved the 50-year-old protein folding problem and fundamentally changed structural biology

Attention-based architecture achieved experimental accuracy in protein structure prediction

Demis Hassabis and John Jumper received the 2024 Nobel Prize in Chemistry for this achievement

People:Demis Hassabis, John Jumper

Organizations:DeepMind, Google, CASP, University of Washington

2021Breakthroughs

CLIP: The Bridge Between Image and Language

On the very day OpenAI unveiled DALL-E, January 5, 2021, perhaps the more consequential model arrived: CLIP. It did not learn to generate images, but to represent images and language in the same space. From around 400 million image-text pairs scraped from the web, the team led by Alec Radford trained an image encoder and a text encoder jointly with a contrastive objective, until matching images and captions landed close together in a shared vector space and mismatched pairs far apart. The effect was striking: CLIP could classify images zero-shot; you simply described the categories in words, with no training on the task. It reached 76.2% on ImageNet, on par with a ResNet-50 trained on 1.28 million labelled examples, none of which CLIP had ever seen. Crucially for the big picture: CLIP became the foundation of the text-to-image wave. DALL-E 2 builds on CLIP embeddings, and Stable Diffusion uses CLIP's text encoder directly. The caveat: contrastive image-text models were not new (ConVIRT came months earlier). CLIP's contribution was the scale, the breadth of zero-shot transfer, and the open weights that set off an entire ecosystem.
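
Zero-shot classification in a shared vector space comes down to a nearest-caption lookup. In the sketch below, the three-dimensional vectors are made-up stand-ins for what CLIP's image and text encoders would produce, and the function names are hypothetical; only the comparison by cosine similarity reflects the mechanism described above.

```python
import math


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding: list, caption_embeddings: dict) -> str:
    """Pick the caption whose embedding lies closest to the image embedding."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
    )


# Hypothetical embeddings; a real system would compute these with trained encoders.
CAPTION_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
IMAGE_EMBEDDING = [0.8, 0.3, 0.1]
```

Adding a new category requires only a new caption and its embedding, which is why no task-specific training is needed.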

Contrastive training: two encoders (image + text) learn from about 400M web pairs to place matching images and texts close together in one vector space.

Zero-shot: categories are described in words, no task-specific training — 76.2% on ImageNet, on par with a ResNet-50 that needed 1.28M labelled images.

Foundation of the text-to-image wave: DALL-E 2 builds on CLIP embeddings; Stable Diffusion uses CLIP's text encoder directly.

Anti-hype: contrastive image-text models already existed (ConVIRT, Oct 2020). CLIP's contribution: scale, zero-shot breadth, open weights — but it also inherited the biases of web data.

People:Alec Radford, Jong Wook Kim, Ilya Sutskever

Organizations:OpenAI

2021Products

DALL-E Creates Images from Text

A landmark breakthrough in text-to-image generation and a meaningful advance in AI creativity. On January 5, 2021, OpenAI unveiled DALL-E — a system that generates coherent and often remarkably creative images from text descriptions. Text-to-image models had existed before (such as alignDRAW in 2015 or GAN-based approaches like StackGAN and AttnGAN), but DALL-E raised coherence and versatility to a new level. Based on a 12-billion-parameter version of GPT-3, DALL-E demonstrated that the boundary between language and visual understanding can be crossed. The system trained on 250 million image-text pairs from the internet and developed noteworthy capabilities in the process: it can anthropomorphize animals, plausibly combine unrelated concepts, and even render text within images. Mark Riedl of Georgia Tech commented that the results were 'remarkably more coherent' than any previous text-to-image system. DALL-E successfully extended GPT's language understanding into the visual domain and opened an entirely new dimension of AI creativity.

Raised text-to-image generation to a new level — coherent, creative images from natural-language descriptions (predecessors like alignDRAW or StackGAN already existed)

Developed noteworthy creative capabilities: anthropomorphization, concept combination, text rendering

12-billion-parameter version of GPT-3, trained on 250 million image-text pairs from the internet

Opened a new dimension of AI creativity and inspired the generative AI movement

People:Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray

Organizations:OpenAI, DALL-E Team

2021Milestones

Anthropic Is Founded

When former OpenAI executives wanted to realize their own vision of safe AI. In January 2021, Dario and Daniela Amodei, along with five other former OpenAI researchers — including Tom Brown, Jared Kaplan, and Chris Olah — founded Anthropic; seven co-founders in total. The siblings had previously held key positions at OpenAI — Dario as VP of Research. Their new company was to focus on AI safety and the development of reliable, interpretable systems. With Constitutional AI, Anthropic developed an innovative approach to training AI systems through principles rather than human feedback alone. Claude, their AI assistant, became one of the leading competitors to ChatGPT.

Founded in January 2021 in San Francisco

Dario Amodei (CEO, former VP Research at OpenAI) and Daniela Amodei (President) — part of a seven-person founding team

Focus on AI safety, interpretability, and Constitutional AI

Developed Claude, one of the leading AI assistants

People:Dario Amodei, Daniela Amodei, Tom Brown, Jared Kaplan, Sam McCandlish, Jack Clark, Chris Olah

Organizations:Anthropic, OpenAI

2021Products

GitHub Copilot: The AI pair programmer

The democratization of AI-assisted software development for millions of developers. On June 29, 2021, GitHub announced the technical preview of Copilot – the first AI pair programmer, powered by OpenAI Codex. Based on a GPT-3 variant trained with billions of lines of public code from GitHub repositories, Copilot could generate code completions and entire functions from comments. The underlying Codex model achieved a 28.8% success rate on first attempt in the HumanEval benchmark – significantly better than GPT-3's 0%. Particularly impressive: With 100 sampling attempts, the success rate increased to 70.2%. Copilot worked especially well with Python, JavaScript, TypeScript, Ruby, and Go. The limited technical preview generated enormous interest and established AI-assisted programming as a viable tool. Copilot fundamentally changed the developer experience and paved the way for a new generation of AI-powered coding tools.

Technical preview on June 29, 2021 with limited access via waitlist for selected developers

28.8% success rate on first attempt (HumanEval), 70.2% with 100 sampling attempts

Established AI-assisted programming as viable tool and inspired new coding tools

People:Nat Friedman, GitHub Team, OpenAI Team

Organizations:GitHub, OpenAI, Microsoft

2021Products

OpenAI Codex: AI Programs for People

On August 10, 2021, OpenAI released Codex, a large-scale AI model for code generation, via an API and significantly changed software development. Based on GPT-3, but trained on 159 gigabytes of Python code from 54 million GitHub repositories, Codex transformed natural language into functional code. 'Create a function for prime numbers' became real Python code in seconds. Earlier, on June 29, 2021, the partnership with GitHub had already produced the Technical Preview of Copilot, an AI programming assistant running on an early version of Codex. Codex supported over a dozen programming languages: Python, JavaScript, Go, Ruby, Swift, and more. In the HumanEval benchmark, the fine-tuned Codex-S solved around 37% of tasks on the first attempt (pass@1), while the base model achieved just under 29%: noteworthy, but a benchmark score is not a measure of performance on arbitrary requests. GitHub Copilot proved to be a meaningful productivity gain for developers. Codex demonstrated that AI can support creative, complex cognitive work. From code generation to code understanding, Codex opened the door to AI-assisted software development.
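
The pass@1 figures above belong to the pass@k family of metrics: the chance that at least one of k sampled solutions passes the unit tests. The sketch below follows the unbiased estimator as described in the Codex paper, to the best of this summary: draw n samples per task, count the c correct ones, and compute 1 - C(n-c, k) / C(n, k). The function name and the sample counts are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: samples generated for the task, c: samples that passed the tests,
    k: number of attempts the metric allows (k <= n).
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k samples include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 3 correct samples out of 10, `pass_at_k(10, 3, 1)` is 0.3, and the value rises as k grows. That is why a score with 100 attempts is much higher than the first-attempt score for the same model.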

Natural language to code: 'Write a sorting function' becomes functional Python/JavaScript

GitHub Copilot (Technical Preview from June 29, 2021): prominent AI programming assistant, trained on 54 million code repositories

12+ programming languages: from Python to Swift — AI understands developer intent in natural language

Meaningful productivity gains: Codex demonstrated AI's potential for creative cognitive work

People:OpenAI Team, GitHub Development Team

Organizations:OpenAI, GitHub, Microsoft

2022Papers

InstructGPT: The Bridge to ChatGPT

Between the method and the global hit lay a crucial intermediate step — and it was called InstructGPT. In early 2022, in the paper Training language models to follow instructions with human feedback, OpenAI showed how to get GPT-3 to actually do what users want: through reinforcement learning from human feedback (RLHF). The striking result: an InstructGPT with only 1.3 billion parameters was preferred by people over the answers of the hundred-times-larger GPT-3 (175 billion). It was not raw size but alignment with intent that made the difference. InstructGPT was the direct technical bridge between the RLHF idea (2017) and ChatGPT, which popularised the same method at the end of 2022. For an honest assessment: InstructGPT did not invent RLHF — a 2017 paper did — but it showed for the first time at scale how much alignment makes a language model more useful.

OpenAI applied RLHF (reinforcement learning from human feedback) to GPT-3 so it follows instructions and matches users' intent.

Striking: a 1.3B-parameter InstructGPT was preferred over the 100x larger GPT-3 (175B) — alignment beats raw size.

The direct bridge between the RLHF idea (2017) and ChatGPT (late 2022) — it explains why ChatGPT worked so well.

Anti-hype: InstructGPT did not invent RLHF (a 2017 paper did); it first showed at scale how strongly alignment makes a model more useful.

People:Long Ouyang

Organizations:OpenAI

2022Papers

Chinchilla: Scaling Rethought

In 2022, DeepMind asked an uncomfortable question: are we actually building our AI models the wrong way? In the paper Training Compute-Optimal Large Language Models, the team led by Jordan Hoffmann showed that the largest language models of the time — GPT-3, Gopher — had many parameters but too little training data. Their correction, now called the Chinchilla scaling laws: for a given compute budget, model size and amount of data should grow roughly in step. As proof, they trained Chinchilla with 70 billion parameters on 1.4 trillion tokens — and with it beat the four-times-larger Gopher (280 billion). This shifted how practically every later frontier model is trained. For an honest assessment: Chinchilla did not invent scaling laws, but corrected the earlier ones by Kaplan (2020); later models such as Llama even deliberately went beyond the compute-optimal ratio to be more efficient in use.
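
The numbers in this entry imply a simple rule of thumb: 1.4 trillion tokens for 70 billion parameters is 20 tokens per parameter. The sketch below combines that ratio with the common approximation that training costs about 6 x parameters x tokens floating-point operations. Both the approximation and the fixed ratio are assumptions made for illustration; the paper fits its laws empirically and the function names are hypothetical.

```python
import math

# From the figures in the text: 1.4e12 tokens / 70e9 parameters = 20.
TOKENS_PER_PARAM = 1.4e12 / 70e9


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, assuming C ~ 6 * N * D."""
    return 6.0 * n_params * n_tokens


def compute_optimal(flops: float, tokens_per_param: float = TOKENS_PER_PARAM) -> tuple:
    """Split a compute budget into (parameters, tokens) at a fixed tokens-per-parameter ratio."""
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params
```

Under these assumptions, quadrupling the compute budget doubles both the parameter count and the token count, which is what "grow roughly in step" means.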

The Chinchilla scaling laws: for a fixed compute budget, model size and training data should grow roughly in step.

The largest models (GPT-3, Gopher) were oversized and under-trained. Chinchilla (70B, 1.4T tokens) beat the 4x larger Gopher (280B).

Shifted how practically every later frontier model is trained (data/parameter ratio); influenced Llama, among others.

Anti-hype: Chinchilla did not invent scaling laws but corrected Kaplan (2020); later models deliberately over-train for more efficient use.

People:Jordan Hoffmann

Organizations:Google DeepMind
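The arithmetic behind the Chinchilla result can be checked with a short sketch. It assumes the widely used approximation that training costs about 6 FLOPs per parameter per token, and it takes Gopher's roughly 300 billion training tokens from the Gopher paper rather than from this timeline. The constant and function names are illustrative.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6   # approximation: training FLOPs ~ 6 * params * tokens
TOKENS_PER_PARAM = 20       # rule of thumb implied by Chinchilla (1.4T / 70B)

def training_flops(params: float, tokens: float) -> float:
    return FLOPS_PER_PARAM_TOKEN * params * tokens

def compute_optimal(flops: float) -> tuple[float, float]:
    """Split a FLOP budget so that tokens = TOKENS_PER_PARAM * params."""
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params

chinchilla_ratio = 1.4e12 / 70e9   # tokens per parameter for Chinchilla
gopher_ratio = 300e9 / 280e9       # tokens per parameter for Gopher (assumed 300B tokens)
gopher_budget = training_flops(280e9, 300e9)
opt_params, opt_tokens = compute_optimal(gopher_budget)

print(f"Chinchilla: {chinchilla_ratio:.0f} tokens/param, Gopher: {gopher_ratio:.1f}")
print(f"Gopher's budget rebalanced: {opt_params / 1e9:.0f}B params, {opt_tokens / 1e12:.1f}T tokens")
```

Under these assumptions, Gopher's compute budget rebalanced at 20 tokens per parameter lands at roughly 65 billion parameters and 1.3 trillion tokens, close to the configuration DeepMind actually chose for Chinchilla.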

2022Products

PaLM: Google's Giant with 540 Billion Parameters

In 2022, Google showed how far language models could be scaled up: PaLM, the Pathways Language Model, had 540 billion parameters and was trained with Google's Pathways system across thousands of TPU chips. What impressed was less the sheer size than what PaLM could do with it. Using so-called chain-of-thought prompts, in which the model writes out its reasoning step by step, it solved multi-step word problems and even explained the punchlines of jokes. PaLM thus became the poster child for the idea of emergent abilities — skills that appear suddenly only above a certain model size. It was a high point of Google's scaling era and a forerunner of PaLM 2 and Gemini. For an honest assessment: 540 billion parameters were extremely expensive, and PaLM was never released as an open model. The emergent-abilities thesis is also contested — some of those jumps are partly an artifact of the chosen measurement metric.

In 2022, Google introduced PaLM — a language model with 540 billion parameters, trained on thousands of TPU chips.

PaLM excelled at multi-step reasoning: with chain-of-thought prompts it solved word problems and even explained jokes.

It fueled the idea of emergent abilities — skills that appear suddenly only above a certain model size.

Anti-hype: 540 billion parameters were enormously expensive, and PaLM was never released openly. The emergent-abilities thesis is also contested — some jumps are partly an artifact of the metric (Schaeffer et al. 2023).

Organizations:Google

2022Products

Stable Diffusion: Open-source image generation

The democratization of AI image generation through the first powerful open-source model. On August 22, 2022, Stability AI released Stable Diffusion and significantly transformed access to advanced text-to-image technology. As the first open-source model of its class, Stable Diffusion could generate photorealistic 512x512-pixel images on consumer GPUs, an important advance in speed and accessibility. Based on Latent Diffusion Models (LDMs), the system removes noise step by step in a compressed latent space instead of manipulating pixels directly. With 860 million parameters in the U-Net and 123 million in the text encoder, it remained relatively lightweight despite high performance. Because the source code was published on GitHub, a rapidly growing community developed countless variants and tools. Stable Diffusion broke the monopoly of proprietary systems and made high-quality AI image generation accessible to everyone.

First powerful open-source text-to-image model with GitHub-available source code

Latent diffusion models with iterative de-noising in latent spaces instead of direct pixel manipulation

Explosive community growth with countless variants, tools, and applications

Broke monopoly of proprietary systems and democratized high-quality AI image generation

People:Emad Mostaque, Robin Rombach, Andreas Blattmann

Organizations:Stability AI, CompVis, Runway
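Why working in latent space makes consumer GPUs sufficient can be shown with simple arithmetic. The sketch assumes the figures from the public Stable Diffusion v1 model description (an autoencoder that shrinks each side by a factor of 8 and uses 4 latent channels); these numbers are not stated in the entry above.

```python
def tensor_size(height: int, width: int, channels: int) -> int:
    """Number of values in an image-like tensor."""
    return height * width * channels

DOWNSAMPLE = 8        # assumed autoencoder downsampling factor per side
LATENT_CHANNELS = 4   # assumed number of latent channels

pixel_size = tensor_size(512, 512, 3)  # RGB image the user finally sees
latent_size = tensor_size(512 // DOWNSAMPLE, 512 // DOWNSAMPLE, LATENT_CHANNELS)
reduction = pixel_size / latent_size

print(f"pixels: {pixel_size}, latent: {latent_size}, reduction: {reduction:.0f}x")
```

Every denoising step therefore touches about 48 times fewer values than it would in pixel space; the image is decoded back to full resolution only once, at the end.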

2022Breakthroughs

OpenAI Releases Whisper

When speech recognition finally became reliable — and available to everyone. On September 21, 2022, OpenAI released Whisper, a speech recognition system trained to work robustly across different languages, accents, and background noise. Unlike earlier systems trained on clean audio data, Whisper used 680,000 hours of multilingual data from the internet. The result: a system capable of transcribing in 99 languages while competing with commercial solutions. OpenAI released Whisper as open source — a gift to developers worldwide that enabled countless applications.

Released on September 21, 2022, as open source

Covers 99 languages and transcribes robustly even with accents and background noise — strongest in English, as the majority of training data is in English

Trained on 680,000 hours of multilingual audio data from the internet

Democratized high-quality speech recognition through open-source availability

People:Alec Radford, Jong Wook Kim, Tao Xu

Organizations:OpenAI

2022Products

ChatGPT Marks a Turning Point in AI Adoption

The moment AI became accessible to everyone and a new era began. On November 30, 2022, OpenAI released ChatGPT as a free research preview — without major marketing, with few expectations. What followed exceeded all forecasts: after 5 days ChatGPT reached one million users, after two months 100 million — at the time the fastest user growth any consumer application had ever achieved (surpassed in July 2023 by Meta's Threads). Built on GPT-3.5, ChatGPT gave a broad audience direct access to a powerful AI for the first time, without any technical barriers. Kevin Roose of the New York Times called it the 'best AI chatbot ever released to the public.' ChatGPT democratized artificial intelligence, turning a research field into an everyday tool. This release marked the beginning of the current generative AI wave.

Released on November 30, 2022, as a free research preview accessible to the general public

Reached 1 million users in 5 days, 100 million in 2 months — at the time the fastest growth of any consumer app (later surpassed by Threads)

First powerful AI without technical barriers — direct web access for any internet user

Democratized AI and triggered the current generative AI wave across society and the economy

People:Sam Altman, Greg Brockman, Ilya Sutskever, John Schulman

Organizations:OpenAI, Microsoft, ChatGPT

2022Papers

Constitutional AI — Safety Through Principles

In December 2022, Anthropic introduced Constitutional AI (CAI), a new method for developing harmless, helpful, and honest AI systems. A 'constitution' of ethical principles allows the AI to critique and improve its own responses to harmful content — without human labels for exactly those harm evaluations. (Anthropic only described the explicit anchoring of these principles in the UN Declaration of Human Rights and other foundational rights documents in May 2023 in 'Claude's Constitution'; the original paper used a pragmatically assembled set of principles.) The innovative RLAIF method (Reinforcement Learning from AI Feedback) replaces human feedback only for harmlessness through AI self-critique — helpfulness continued to be trained via human preference data (RLHF). CAI thus establishes a safety-first approach as an alternative to ChatGPT's pure performance focus and paves the way for responsible AI development.

The AI critiques and improves its own responses to harmful content — without human harm labels for these evaluations

Safety-first alternative to pure performance approaches such as ChatGPT

Triple goal: helpful, honest, and harmless through ethical principles

RLAIF: Reinforcement Learning from AI Feedback replaces human evaluations for harmlessness (helpfulness continues via RLHF)

People:Yuntao Bai, Andy Jones, Kamal Ndousse, Dario Amodei, Anthropic Team

Organizations:Anthropic

2023Regulation

NIST AI Framework: USA Defines Trustworthy AI

On January 26, 2023, the US National Institute of Standards and Technology published the first comprehensive AI Risk Management Framework (AI RMF 1.0), America's response to global AI regulation. After 18 months of development with 240+ organizations from industry, academia, and civil society, NIST set out a federal reference framework for trustworthy AI for the first time. The framework establishes four core functions (Govern, Map, Measure, Manage) and seven characteristics of trustworthy AI: valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed. As a voluntary standard, it aims to minimize AI risks for individuals, organizations, and society. The publication followed the Biden administration's Blueprint for an AI Bill of Rights (2022) and was later complemented by the AI Executive Order (October 2023). The AI RMF was created under the statutory mandate of the National AI Initiative Act of 2020, with NIST continuing its established role as the federal standards agency. The framework became the foundation for industry standards and international coordination, a counterweight to China's state-controlled AI approach and Europe's regulatory model.

Four core functions: Govern, Map, Measure, Manage for systematic AI risk management

Seven characteristics of trustworthy AI: valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed

Voluntary multi-stakeholder approach: 240+ organizations jointly developed the standards

Federal standards agency: NIST developed the AI RMF under the mandate of the National AI Initiative Act of 2020

People:NIST AI Team, 240+ Contributing Organizations

Organizations:NIST, US Department of Commerce, Biden Administration

2023Products

LLaMA: Open-Source Foundation Model

The democratization of Large Language Models through open research models. On February 24, 2023, Meta AI released LLaMA (Large Language Model Meta AI), a collection of foundation models ranging from 7B to 65B parameters, trained exclusively on publicly available data. The landmark paper 'LLaMA: Open and Efficient Foundation Language Models' demonstrated that state-of-the-art performance is achievable without proprietary datasets. LLaMA enabled researchers without access to large infrastructure to study advanced language models. The inference code was released under a GPLv3 license, while access to the model weights was granted on a case-by-case basis for academic research. The models were trained on 1.0 to 1.4 trillion tokens, depending on size, and the range of sizes addressed diverse hardware requirements. This work catalyzed a wave of open LLM research and inspired numerous follow-up models in the open-source community.

Inference code under GPLv3 license; model weights were released on a case-by-case basis and exclusively for non-commercial research

Models from 7B to 65B parameters trained exclusively on publicly available datasets

Enabled researchers without large infrastructure to study advanced language models

Various model sizes for different hardware requirements and research purposes

People:Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet

Organizations:Meta AI, FAIR

2023Products

Claude and Constitutional AI

The introduction of an AI with built-in value system and ethical principles. In March 2023, Anthropic introduced Claude – an AI assistant based on Constitutional AI that established a novel approach to AI safety. Unlike conventional systems, Claude learns through a two-phase method: first the model critiques and improves its own responses based on a constitution of ethical principles, then it is refined through AI-generated feedback – without human evaluations for harm prevention. The result is a system that acts both helpfully and harmlessly. Anthropic released Claude and Claude Instant simultaneously, with the latter being a faster, more cost-effective variant. This Constitutional AI method proved to be a Pareto improvement over human feedback and opened new paths for scalable AI oversight.

Constitutional AI framework with two-phase training: self-critique based on ethical principles, then AI feedback-based refinement

Novel safety approach without human harm evaluations – purely through AI supervision

Simultaneous release of Claude and Claude Instant for different application requirements

Established 'helpful, harmless, honest' as core values for responsible AI development

People:Dario Amodei, Daniela Amodei, Tom Brown, Chris Olah

Organizations:Anthropic, Constitutional AI, AI Safety

2023Products

GPT-4: Multimodal AI Model

The breakthrough to human-level performance on professional and academic benchmarks. On March 14, 2023, OpenAI unveiled GPT-4, a large multimodal model that processes both text and image inputs and reaches human-level performance across a range of disciplines. The improvements were substantial: while GPT-3.5 scored in the bottom 10% of test takers on a simulated bar exam, GPT-4 reached the top 10%. On the SAT Math test, performance improved from the 70th to the 89th percentile. OpenAI spent six months iteratively aligning the model, drawing on insights from its adversarial testing program and ChatGPT feedback, and reported having rebuilt its entire deep learning stack in the two years before. The multimodal capabilities allow GPT-4 to process documents, diagrams, and screenshots with capabilities similar to those it shows on text-only inputs. GPT-4 established new standards for AI safety and performance.

Large multimodal model with text and image inputs, vision capabilities for documents and diagrams

Bar exam top 10% vs. GPT-3.5 bottom 10%; SAT Math improved from the 70th to the 89th percentile

6 months of iterative alignment with adversarial testing and ChatGPT feedback for improved safety

Integration into ChatGPT Plus made advanced multimodal AI accessible to consumers

People:Sam Altman, OpenAI Team

Organizations:OpenAI

2023Products

Midjourney V5: Photorealistic AI Art

Photorealistic AI image generation reaches a new quality level, considerably transforming the creative industry. On March 15, 2023, Midjourney released Version 5 and achieved a quality leap that users described as 'uncanny' and 'too perfect.' The alpha version could for the first time generate photorealistic images that were barely distinguishable from real photographs. Particularly noteworthy: the chronic problem of malformed hands was considerably improved — V5 could correctly render five fingers in most cases. Julie Wieland, a graphic designer, compared the experience to 'finally getting glasses after ignoring poor vision for too long' — suddenly seeing everything in 4K quality [Source: Ars Technica, March 2023]. Improved prompt sensitivity enabled more precise creative control, while automatic upscaling scaled 1024x1024-pixel base images without additional GPU costs. V5 sparked intense debates about the future of human creativity.

Photorealistic image quality barely distinguishable from real photographs

Sparked intense reactions in the creative community — from enthusiasm to existential concerns

Considerably advanced AI art through precise hand rendering and improved prompt sensitivity

Set new standards for commercial AI image generation with considerable impact on the creative industry

People:David Holz, Midjourney Team

Organizations:Midjourney Inc

2023Regulation

Biden AI Executive Order — First Comprehensive US Regulation

President Biden signed Executive Order 14110 on October 30, 2023, on the 'safe, secure, and trustworthy development and use of artificial intelligence' — the first comprehensive AI regulation in the United States and, at 110 pages, the longest executive order in history. The far-reaching order requires developers of powerful AI systems to disclose safety test results and establishes strict red-team standards through NIST. It guards against AI-based fraud through content authentication and watermarking, and addresses risks in critical infrastructure and biological threats. At the moment of its signing, this order set global standards for responsible AI development and positioned the United States as a leader in AI governance. Its effect, however, did not last: on January 20, 2025, President Trump revoked EO 14110 via Executive Order 14148 — the document thus marks the regulatory state of 2023.

Most comprehensive AI governance ever — 110 pages, the longest executive order in history

Mandatory safety tests and red-team results for powerful AI systems

Defense Production Act: mandatory reporting for AI systems posing national security risks

Positioned the United States in 2023 as a leader in responsible AI governance — revoked again in 2025

People:Joe Biden, Kamala Harris

Organizations:White House, NIST, Department of Homeland Security

2023Regulation

Pause Letter & Bletchley: AI Safety Goes Global

In 2023, in the first shock after ChatGPT, the world wrestled with rules for a suddenly powerful technology. In March, thousands of signatories — including Yoshua Bengio and Elon Musk — called, in an open letter from the Future of Life Institute, for a six-month pause on training AI systems more powerful than GPT-4. No pause happened, but the letter put the topic on the world's agenda. In November came the first global AI Safety Summit at Britain's Bletchley Park — deliberately at the place where Turing once cracked codes. 28 states and the EU, including the US and China, signed the Bletchley Declaration on the risks of advanced AI. It was the first time rival powers spoke together about AI safety — the start of a summit series (Seoul 2024, Paris 2025). For an honest assessment: the pause never came, and the Bletchley Declaration was non-binding — both set agendas but created no enforceable rules.

March 2023: an open letter from the Future of Life Institute (thousands of signatories, incl. Bengio, Musk) called for a 6-month pause on training AI more powerful than GPT-4.

November 2023: the first global AI Safety Summit at Britain's Bletchley Park — where Turing cracked codes during the war.

28 states and the EU — including the US and China — signed the Bletchley Declaration on the risks of advanced AI; the start of a summit series (Seoul 2024, Paris 2025).

Anti-hype: the pause never came; the declaration was non-binding. Both set agendas but created no enforceable rules.

Organizations:Future of Life Institute, UK Government

2023Products

Mistral & Mixtral: Europe's Open Models

While US corporations dominated the 2023 headlines, a challenger emerged from Paris: Mistral AI, founded in spring 2023 by Arthur Mensch (formerly at Google DeepMind) together with Guillaume Lample and Timothée Lacroix (formerly at Meta). As early as September, the small Mistral 7B model surprised the field — freely available under an Apache 2.0 licence and stronger than the much larger Llama 2 13B. December brought Mixtral 8x7B: an open Mixture-of-Experts model that matched GPT-3.5 on many tasks while activating only a fraction of its parameters per query (about 13 of 47 billion). Mistral became Europe's flagship for open models and raised billions. For an honest assessment: open weights are not the same as open source — training data and code stay locked away. And Mixtral matched GPT-3.5, not the then-leading GPT-4; Mixture-of-Experts itself is also considerably older.

Spring 2023: in Paris, Arthur Mensch (ex-Google-DeepMind) plus Guillaume Lample and Timothée Lacroix (ex-Meta) founded Mistral AI — Europe's answer to the US labs.

September 2023: Mistral 7B — a small, open-weights model (Apache 2.0) that beat the larger Llama 2 13B.

December 2023: Mixtral 8x7B, an open Mixture-of-Experts model — matching GPT-3.5 on many benchmarks, yet efficient (only ~13B active of ~47B parameters).

Anti-hype: open weights is not open source (training data/code stay closed); Mixtral matched GPT-3.5, not GPT-4. Mixture-of-Experts is also older (e.g. Shazeer 2017).

People:Arthur Mensch, Guillaume Lample, Timothée Lacroix

Organizations:Mistral AI
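How a Mixture-of-Experts layer activates only part of its parameters can be illustrated with a toy router. The sketch assumes the scheme described in the Mixtral paper (8 experts per layer, of which the 2 highest-scoring are run for each token); the experts here are trivial stand-in functions and the router scores are made up.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, router_scores, experts, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Gate weights are a softmax over the selected experts' scores only.
    weights = softmax([router_scores[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen

# Eight toy "experts": each just scales its input by a different factor.
experts = [lambda x, k=k: (k + 1) * x for k in range(8)]
router_scores = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]  # hypothetical router logits

output, chosen = moe_layer(1.0, router_scores, experts)
print(f"experts used: {chosen}, output: {output}")
```

Six of the eight experts are never evaluated for this input. In the real model the active share (about 13 of 47 billion parameters, roughly 28 percent) is slightly above 2/8 because non-expert components such as attention are used by every token.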

2023Products

Google Gemini: Multimodal AI Family

Google's response to ChatGPT and a breakthrough in native multimodality. On December 6, 2023, Google announced Gemini 1.0 - an AI family built for multimodality from the ground up. The collaboration between DeepMind and Google Brain produced three model sizes: Gemini Ultra for highly complex tasks, Gemini Pro as a balanced solution, and Gemini Nano for on-device applications. Unlike systems that added multimodal capabilities as an afterthought, Gemini was designed natively with understanding of language, audio, code, and video. In six out of eight benchmarks, Gemini Pro outperformed the GPT-3.5 standard, including MMLU tests. On the day of the announcement, the regular Bard received new capabilities with Gemini Pro; Google announced the more powerful Bard Advanced with Gemini Ultra for early 2024. Gemini marked Google's strategic response to OpenAI's dominance and established multimodal AI as the new standard for large language models.

Built for multimodality from the ground up: language, audio, code, and video understanding natively integrated

Outperformed GPT-3.5 in 6 out of 8 standard benchmarks and established Google as a serious ChatGPT alternative

Three model sizes: Ultra (complex), Pro (balanced), Nano (on-device) for different use cases

Regular Bard received Gemini Pro on the day of the announcement; Bard Advanced with Gemini Ultra was announced for early 2024

People:Sundar Pichai, Demis Hassabis, Gemini Team

Organizations:Google, DeepMind, Google AI

2024Products

Embodied AI: The Models Get a Body

For years, the big AI models lived only on screens — they wrote texts, painted pictures, held conversations. In 2024 that began to change: it became the year of embodied AI. The idea is to put the same foundation models that understand language and images into real bodies — above all into humanoid robots. The company Figure teamed up with OpenAI and showed a robot that talks, sees, and handles objects. NVIDIA introduced Project GR00T, a foundation model specifically for humanoids, and young companies such as Physical Intelligence were valued in the billions. Many were already speaking of robotics' ChatGPT moment. For an honest assessment: most of this was so far demonstrations and announcements, not machines that work reliably in everyday life. The physical world is incomparably harder for a robot to master than the screen — dexterity, safety, and reliability remain unsolved problems.

2024 became the year of embodied AI: language models that had lived only in chat moved into robots — especially humanoid ones.

Figure teamed up with OpenAI and showed a talking, acting humanoid; NVIDIA introduced Project GR00T, a foundation model for humanoids; startups like Physical Intelligence were valued in the billions.

The hope: a robot that unites language, vision, and action in one foundation model could learn general tasks in the real world — a ChatGPT moment for robotics.

Anti-hype: much of this was so far demos and announcements, not reliably working products. The real world is incomparably harder for robots than the screen — dexterity, safety, and reliability remain unsolved.

Organizations:Figure AI, NVIDIA, Physical Intelligence

2024Products

Waymo: The Driverless Taxi Becomes Everyday Reality

For more than a decade, autonomous driving was the prime example of AI promises that kept slipping. In 2024 it became tangible: Waymo, the robotaxi subsidiary of Google's parent company Alphabet, made driverless taxis available to the public at scale for the first time, in San Francisco, Los Angeles, and Phoenix. In the summer of 2024, the company reported more than 100,000 paid rides per week, entirely without a safety driver at the wheel. After years of announcements, this was the first solid proof that autonomous driving can work as a real, everyday service. For an honest assessment: Waymo only drives in tightly bounded, painstakingly mapped urban areas, not everywhere and not in all weather. There are still breakdowns and stalled vehicles, and operations are expensive. Fully autonomous driving everywhere remains unsolved; the retreat of rival Cruise after a serious 2023 accident showed how fragile the technology still is.

In 2024, Waymo, the robotaxi subsidiary of Google's parent company Alphabet, became the first to offer driverless taxis at scale, open to the public in several US cities.

In the summer of 2024, Waymo reported more than 100,000 paid rides per week, entirely without a safety driver at the wheel.

After more than a decade of promises, it was the first solid proof that autonomous driving can work as a real service.

Anti-hype: Waymo only drives in tightly bounded, mapped urban areas — not everywhere. There are still breakdowns, and operations are expensive. Fully autonomous driving everywhere remains unsolved (rival Cruise's retreat showed the fragility).

Organizations:Waymo, Alphabet

2024Products

Sora: AI-Generated Video from Text

The leap to photorealistic AI-generated video and its implications for the film industry. On February 15, 2024, OpenAI unveiled Sora, a text-to-video model that generates detailed HD videos up to one minute long from brief descriptions. Named after the Japanese word for 'sky,' Sora symbolizes 'boundless creative potential.' As a Diffusion Transformer, Sora adapts DALL-E 3 technology for temporal consistency and often, though not reliably, simulates physically plausible motion. The demo videos surpassed all existing text-to-video systems and set new standards for AI creativity. Director Tyler Perry halted an $800 million studio expansion out of concern about Sora's impact on the industry. OpenAI pursued a cautious approach with red team testing for misinformation and bias before a broader release.

Photorealistic text-to-video generation producing HD videos up to a minute long, surpassing existing systems

Diffusion Transformer based on DALL-E 3 technology for temporal consistency

Often simulates physically plausible motion and maintains consistency throughout the full video length

Potential disruption of the film industry: Tyler Perry halted an $800 million studio expansion

People:Tim Brooks, Bill Peebles, Connor Holmes, Will DePue

Organizations:OpenAI

2024Products

Claude 3 family with multimodal capabilities

The introduction of an AI family with vision and three specialized models. On March 4, 2024, Anthropic introduced the Claude 3 family: Opus, Sonnet, and Haiku – three models with different strengths for various use cases. The central feature was sophisticated vision processing that can analyze photos, charts, diagrams, and technical drawings. Claude 3 Opus achieved new best results in cognitive tasks and surpassed competitors in benchmarks like MMLU and GPQA. Sonnet offered the ideal balance between intelligence and speed for enterprises, while Haiku impressed with near-instant response times. With a context window of 200,000 tokens (expandable to 1 million) and availability in 159 countries, Claude 3 set new benchmark standards for multimodal AI systems.

Sophisticated vision processing for photos, charts, diagrams, and technical drawings

Opus (highest intelligence), Sonnet (balance), Haiku (speed) for different use cases

Multimodal capabilities enable processing visual formats alongside text processing

Claude 3 Opus achieved new best results in MMLU, GPQA, and other cognitive benchmarks

People:Dario Amodei, Daniela Amodei, Tom Brown, Claude 3 Team

Organizations:Anthropic, Claude API, Amazon Bedrock

2024Products

Devin: The First Autonomous AI Software Engineer

The birth of fully autonomous software development through artificial intelligence. On March 12, 2024, Cognition Labs introduced Devin, marketed by the company as the world's first fully autonomous AI software engineer. The system can independently plan, clone repositories, write code, debug, test, and even deploy. On the demanding SWE-Bench, Devin resolved 13.86% of real-world GitHub issues, a massive leap over the previous best of 1.96%. The startup was valued at around $350 million in an early funding round; shortly after launch, reports of a valuation of around $2 billion circulated. Despite impressive successes, tests also revealed limitations: in one test, only 3 out of 20 tasks were completed successfully, often with unpredictable failures.

Fully autonomous software development: planning, coding, debugging, testing, and deployment without human intervention

Handles complex engineering tasks from code migration to complete app development

13.86% success rate on SWE-Bench — 7x better than the previous state of the art of 1.96%

Sparked debate about the future of software development and inspired open-source alternatives such as OpenHands

People:Scott Wu, Steven Hao, Walden Yan

Organizations:Cognition Labs, SWE-Bench

2024Breakthroughs

AlphaFold 3: AI Predicts How Molecules Interact

Four years after the breakthrough of AlphaFold 2, Google DeepMind followed up in May 2024 — together with its sister company Isomorphic Labs. AlphaFold 2 had predicted how a single protein folds into its three-dimensional shape. AlphaFold 3 takes a decisive step further: it models how proteins interact with other molecules — with DNA, RNA, ions, and small drug molecules. It is precisely this interplay that is crucial for drug discovery, because it lets researchers estimate on a computer how a drug binds to its target protein. For an honest assessment: the predictions are impressive but not error-free — their accuracy varies by molecule type, and they still need to be verified in the lab. Moreover, AlphaFold 3 was initially released without open source code, only as a limited web service, which drew criticism in the research community about reproducibility.

In May 2024, Google DeepMind and Isomorphic Labs introduced AlphaFold 3.

While AlphaFold 2 predicted the folding of single proteins, AlphaFold 3 models their interactions — with DNA, RNA, drug molecules, and ions.

Especially valuable for drug discovery: researchers can estimate on a computer how a drug binds to its target protein.

Anti-hype: the predictions are not error-free and must be verified in the lab. AlphaFold 3 was also initially released without open code — only as a limited web service, which drew criticism about reproducibility.

Organizations:Google DeepMind, Isomorphic Labs

2024Competitions

AlphaProof: AI Wins Silver at the Math Olympiad

Mathematics was long seen as the supreme discipline where AI fails: too creative, too dependent on genuine understanding. In July 2024, Google DeepMind made a statement: its system AlphaProof, together with AlphaGeometry 2, solved four of the six problems at the International Mathematical Olympiad. That matched the level of a silver medal, just a single point below gold. What stands out is the method: AlphaProof formulates its proofs in the formal language Lean, which makes every step machine-checkable, so the AI cannot cheat. It learned through reinforcement learning. For the first time, an AI reached medal level at this highly respected competition. For an honest assessment: these were not real contest conditions. Where humans have only two sessions of four and a half hours each, the AI computed for days in some cases, and experts first had to translate the problems by hand into the formal language. The two combinatorics problems went unsolved.

In July 2024, Google DeepMind's AlphaProof, with AlphaGeometry 2, solved four of the six International Mathematical Olympiad problems — at silver-medal level.

AlphaProof formulates proofs in the formal language Lean, where every step is checked mechanically; it learned via reinforcement learning. AlphaGeometry 2 handled the geometry problem.

For the first time, an AI reached medal level at this prestigious competition — a milestone for machine reasoning with verifiable proofs.

Anti-hype: not contest conditions. The AI took days in some cases instead of two 4.5-hour sessions, and humans first translated the problems into formal language. The two combinatorics problems went unsolved.

Organizations:Google DeepMind
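What "machine-checkable" means in Lean can be seen in a miniature example. The two statements below are far simpler than any Olympiad problem and are not taken from AlphaProof; they only illustrate, in Lean 4 syntax, that a file is accepted only if every proof actually establishes its statement.

```lean
-- A statement and its proof: addition of natural numbers is commutative.
-- Lean accepts this only because `Nat.add_comm a b` has exactly the stated type.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, verified by computation.
example : 2 + 2 = 4 := rfl
```

If a proof term were wrong, for instance claiming `2 + 2 = 5`, Lean would reject the file, which is why a system that outputs Lean proofs cannot pass off an invalid argument as a solution.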

2024Regulation

EU AI Act: The First Comprehensive AI Law

The world's first comprehensive regulation of artificial intelligence enters into force. On August 1, 2024, the EU AI Act became legally binding: a risk-based regulatory framework with 180 recitals and 113 articles covering the entire AI lifecycle. The law categorizes AI systems according to four risk levels: prohibited applications are banned, high-risk systems in education, employment, and the justice sector are subject to detailed compliance obligations, systems with limited risk must meet transparency requirements, and the large remainder with minimal risk remains largely unrestricted. In addition, separate rules apply to general-purpose AI (GPAI) models, such as the GPT models that power ChatGPT. The extraterritorial reach also covers providers outside the EU with European users. Violations can result in fines of up to 35 million euros or 7% of global annual turnover, whichever is higher. Like the GDPR in 2018, the AI Act could set global standards and determine how AI affects our lives. The obligations phase in from 2025, and the law is fully in effect by 2027.

The world's first comprehensive AI law with 180 recitals and 113 articles covering the entire AI lifecycle

Four risk levels: prohibited, high-risk, limited, and minimal risk — plus separate rules for GPAI foundation models

Extraterritorial reach like the GDPR could set global AI standards and influence worldwide compliance

Fines of up to 35 million euros or 7% of global annual turnover, with phased implementation from 2025 to 2027

People:Ursula von der Leyen, Thierry Breton

Organizations:European Union, European Parliament, European Commission

2024Products

OpenAI o1 — Advances in Reasoning

On September 12, 2024, OpenAI released o1-preview (and o1-mini), significantly expanding AI reasoning with a chain of thought that is trained via reinforcement learning. o1 is the first widely available language model that systematically 'thinks' before responding: using a private chain of thought, it analyzes problems step by step. This new approach opens up an additional scaling dimension: test-time scaling, where longer 'thinking' leads to better results. The full o1 model achieves PhD-level performance in physics, chemistry, and biology in benchmark tests and solves 83% of the problems on the American Invitational Mathematics Examination (GPT-4o: 13%). The technology demonstrates that AI can develop significantly improved problem-solving capabilities through structured reasoning.

First model whose chain-of-thought is trained and scaled via reinforcement learning — enabling structured reasoning

New scaling dimension: the longer it thinks, the better the results

New approach: from pattern reproduction to improved problem solving

Notable advance in complex reasoning — improved problem-solving capabilities

People:Sam Altman, Noam Brown, OpenAI Team

Organizations:OpenAI

2024Milestones

The AI Nobel Prizes of 2024

In October 2024 something unprecedented happened: two science Nobel Prizes honoured the foundations of modern AI. On 8 October the Physics Nobel went to John Hopfield and Geoffrey Hinton — for foundational discoveries that enable machine learning with artificial neural networks. That physics, of all fields, should crown neural networks sparked debate — yet Hopfield's physics-inspired networks (1982) and Hinton's learning methods genuinely laid the groundwork. A day later the Chemistry Nobel was shared by David Baker (for computational protein design) and Demis Hassabis and John Jumper of DeepMind — for AlphaFold, which cracked the 50-year-old protein-folding problem. For the first time, foundational AI research was honoured at the very top of science. Notably, Hinton — a freshly minted laureate — used the stage to warn about the risks of the technology he had helped create.

8 October 2024: the Physics Nobel to John Hopfield and Geoffrey Hinton for the foundations of machine learning with neural networks — a physics prize for AI.

9 October 2024: the Chemistry Nobel to David Baker (protein design) and Demis Hassabis and John Jumper of DeepMind (AlphaFold, protein folding).

For the first time two science Nobel Prizes in one year honoured the foundations of AI — a turning point in the field's standing.

Debated: are neural networks really physics? The prizes honour decades-old foundations (Hopfield networks 1982, Hinton's Boltzmann machine). Hinton simultaneously warned of AI risks.

People:John Hopfield, Geoffrey Hinton, Demis Hassabis, John Jumper, David Baker

Organizations:Royal Swedish Academy of Sciences

2024Breakthroughs

OpenAI o3: Breakthrough on ARC-AGI

Just before the end of 2024, on 20 December, OpenAI announced o3, the successor to o1 and the proof that thinking at run-time (test-time scaling) keeps scaling. One number drew the most attention: o3 reached 87.5% on ARC-AGI, a test deliberately built so that it cannot be passed by memorisation, and on which earlier models without run-time reasoning, such as GPT-4o, had stayed in the single digits. With that, o3 moved into near-human territory on this benchmark for the first time, while also excelling at mathematics and coding. Together with o1 and DeepSeek's R1, o3 marked the era of reasoning models (o3-mini followed at the end of January 2025, full o3 in April). For an honest assessment: the 87.5% came in a high-compute mode with enormous, and very expensive, computation per task; the ARC Prize organisers stressed explicitly that o3 is not AGI and drops sharply on the harder successor test ARC-AGI-2.

o3 (announced 20 Dec 2024) extends o1's test-time scaling: more thinking at run-time → better results, top scores in mathematics and code.

87.5% on ARC-AGI, a test built to resist memorisation, on which earlier models without run-time reasoning stayed in the single digits: a much-noted leap toward near-human adaptivity.

With o1 and DeepSeek-R1, the era of reasoning models; o3-mini end of Jan 2025, full o3 in April 2025.

Anti-hype: the 87.5% came in the expensive high-compute December preview (the later released o3 scored lower); the ARC organisers stress that o3 is NOT AGI and drops to ~3% on the harder ARC-AGI-2.

Organizations:OpenAI

2025Products

Agentic AI Goes Mainstream

In 2024 and 2025, what AI actually does shifted: from answering to acting. Anthropic opened the chapter in October 2024 with Computer Use, becoming the first of the major AI labs to ship a model that operates a computer itself: looking at the screen, moving the mouse, clicking, typing. In January 2025 came OpenAI's Operator, an agent that browses the web and completes tasks on its own, soon followed by Deep Research, which researches in multiple steps and writes cited reports. The chatbot that outputs text became a system that acts on the user's behalf, the qualitative turn that Devin (2024) had already hinted at. For an honest assessment: the first versions were slow, error-prone and often limited to narrowly defined tasks; in 2025 the systems marketed as agents were promoted more strongly than their reliability could yet justify.

Anthropic, Computer Use (Oct 2024): the first frontier model to offer computer use in public beta — screen, mouse, keyboard.

OpenAI: Operator (Jan 2025) browses the web on its own; Deep Research (Feb 2025) researches in multiple steps and writes cited reports.

The turn from chatbot (output text) to agent (act) — hinted at by Devin (2024), product mainstream in 2025.

Anti-hype: early versions were slow, error-prone and narrow; the systems were marketed more strongly than their 2025 reliability warranted.

Organizations:Anthropic, OpenAI

2025Products

DeepSeek-R1: The AI Shock from China

In late January 2025, an AI model visibly moved the world's stock markets for the first time. The Chinese lab DeepSeek released R1 on 20 January 2025: a reasoning model on par with OpenAI's o1, but with open weights (MIT licence) and trained at a fraction of the expected cost. What made it possible was large-scale reinforcement learning on the base model DeepSeek-V3. When the DeepSeek app topped the US charts a week later, sentiment flipped: on 27 January, Nvidia lost about 17% of its value, roughly 600 billion dollars in a single day and the largest one-day loss of market value by any company in US stock-market history, as investors feared that frontier AI might not need endlessly expensive chips after all. R1 shook several certainties at once: that only US hyperscalers can compete at the top, that reasoning models stay closed, and that more compute is the only way forward. For an honest assessment: the widely-cited figure of a few million dollars refers only to the final training run of the V3 base model (not R1 itself, nor research and hardware overall), and R1 was not better than o1 in every discipline.

R1 (20 Jan 2025): a reasoning model at o1 level with open weights (MIT licence), trained via large-scale reinforcement learning on DeepSeek-V3.

Trained at a fraction of the expected cost — challenging the assumption that frontier AI necessarily needs huge compute budgets.

27 Jan 2025: Nvidia down ~17% (about $600B in one day, a US record for a single company); with China at the AI frontier, AI became visibly a question of markets and geopolitics.

Anti-hype: the cited few million dollars refers only to the final training run of the V3 base model — not R1 itself, nor research/hardware overall; R1 was not uniformly better than o1.

People:Liang Wenfeng

Organizations:DeepSeek

2025Milestones

Stargate: AI as Nation-Scale Infrastructure

On 21 January 2025, artificial intelligence took the stage at the White House — as a nation-scale infrastructure project. OpenAI, SoftBank, Oracle and the investor MGX announced the Stargate project: up to 500 billion dollars over four years for AI data centres in the United States, with deployment of 100 billion to begin immediately. It made visible that the next phase of AI is less an algorithms question than an energy-and-construction one: computing power on the scale of power plants and industrial parks. For a field whose through-line since AlexNet has been compute (see CUDA 2007), this was the logical but enormous next step — and a signal that AI had become a national, geopolitical priority. For an honest assessment: an announcement is not a finished data centre. Whether the full 500 billion actually materialises was contested from the start — even participants and observers publicly doubted the financing.

Up to $500B over four years for AI data centres in the US (OpenAI, SoftBank, Oracle, MGX); deployment of $100B to begin immediately.

Unveiled at the White House: AI became visibly a question of national infrastructure and geopolitics.

The next phase of AI is an energy-and-construction question — compute on the scale of power plants (the through-line since CUDA/AlexNet).

Anti-hype: an announcement is not a finished data centre; whether the full $500B materialises was contested from the start.

People:Sam Altman, Masayoshi Son, Larry Ellison

Organizations:OpenAI, SoftBank, Oracle

2025Regulation

Paris AI Action Summit

On 10 and 11 February 2025, heads of state and government, tech companies and researchers gathered at the Grand Palais in Paris for the AI Action Summit — the third major AI summit after Bletchley (2023) and Seoul (2024), co-chaired by France's President Macron and India's Prime Minister Modi. The change of tone was striking: where the first summit had put AI safety at the centre, Paris was mostly about opportunity, investment and competitiveness — the US Vice President openly argued against too much regulation. In the end, 58 countries plus the EU and the African Union signed a declaration for inclusive and sustainable AI — but the United States and the United Kingdom declined to sign. The summit thus laid bare the transatlantic rift in AI governance. For an honest assessment: the declaration was non-binding, and critics called the summit a missed opportunity for safety.

Third global AI summit (after Bletchley 2023, Seoul 2024): 10-11 February 2025, Grand Palais, co-chaired by Macron and Modi.

A shift from safety to opportunity and competition: Paris stressed investment over risks; the US Vice President argued against too much regulation.

58 countries plus the EU and the African Union signed the final declaration — the US and UK declined to sign (an open transatlantic rift).

Anti-hype: the declaration was non-binding; critics called the summit a missed opportunity for safety.

People:Emmanuel Macron, Narendra Modi

2025Products

The Frontier Models of 2025

In 2025 the reasoning capability that o1 and R1 had kicked off became the standard of frontier models — at a pace that was hard to keep up with. In March Google presented Gemini 2.5 Pro, in May Anthropic followed with Claude 4 (Opus 4 and Sonnet 4), in August OpenAI with GPT-5; in between came Claude 3.7 (the first hybrid model that either answers quickly or thinks for longer), GPT-4.5, Meta's Llama 4 and xAI's Grok. The new generation combined two lines: the step-by-step thinking of reasoning models and the ability to act on their own (agency). Long-horizon autonomous coding in particular moved to the centre. For an honest assessment: the labs outdid each other with benchmark records week after week, and each claimed the top spot for itself — real progress, but the much-invoked word AGI remained marketing more than reality.

In 2025 reasoning (step-by-step thinking) and agency (acting on its own) became standard in frontier models; Claude 3.7 introduced the hybrid model that answers fast or thinks longer on demand.

A tight race: Gemini 2.5 Pro (March), Claude 4 / Opus 4 (May), GPT-5 (August) — plus Llama 4, Grok, DeepSeek. Several labs at the frontier.

At the centre: long-horizon autonomous coding (e.g. Claude Code) — models that work through whole tasks on their own.

Anti-hype: benchmark records week after week, every lab claims the top spot; real progress, but AGI remained marketing more than reality.

Organizations:Anthropic, OpenAI, Google DeepMind

1837Milestones

Babbage's Analytical Engine: The Idea of the Computer

In the 1830s, the British mathematician Charles Babbage designed the Analytical Engine, first describing it in 1837 — the first design for a general-purpose, programmable computer.

His design already had the building blocks of today's computers: a calculating unit (the mill), a memory (the store), programming via punched cards, and even conditional branching.

Babbage's machine was the distant ancestor of every computer — and thus of the hardware on which AI can run in the first place.

Anti-hype: the Analytical Engine was never finished in Babbage's lifetime — it remained a design on paper. And it was a calculator, not an AI: the foundation, not thinking itself.

People:Charles Babbage

1843Papers

Ada Lovelace: The First Program — and a Bold Vision

In 1843, Ada Lovelace translated an article on Babbage's Analytical Engine and added her own extensive notes, which far surpassed the original text.

Her Note G contains a procedure for computing the Bernoulli numbers — often called the first published computer program.

With foresight she saw that the machine could do more than calculate: it could process symbols and even compose music — the idea of general-purpose computing.

People:Ada Lovelace

1936Papers

The Turing Machine: What Computation Even Means

In 1936, Alan Turing published the paper On Computable Numbers, describing a simple thought model of computation — what later came to be called the Turing machine.

With it, Turing pinned down what is computable at all. A universal Turing machine can imitate any other — the theoretical blueprint of the general-purpose computer.

With it, Turing became the founder of computer science. That a single machine can compute anything computable is the basis for machines later learning to think.

People:Alan Turing

1943Papers

McCulloch & Pitts: The First Artificial Neuron

The first mathematical model of the neuron as a logical computing unit: McCulloch and Pitts cast the workings of the nervous system in formal propositional logic.

All or nothing: a neuron fires when the sum of its inputs exceeds a threshold. Networks of such units compute any logical function; feedback loops create memory.

The decisive limitation: no learning. Weights and thresholds were fixed, the network had to be designed by hand. Only Hebb (1949) and Rosenblatt's Perceptron (1957) brought learning rules.

The impact reached far beyond biology: von Neumann's computer architecture (EDVAC, 1945), Wiener's cybernetics and ultimately every artificial neural network rest on this work.

People:Warren S. McCulloch, Walter Pitts

Organizations:University of Illinois, College of Medicine, University of Chicago
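The threshold unit described above can be sketched in a few lines of Python. This is a minimal illustration, not the notation of the 1943 paper: the weights of +1 and -1, the specific thresholds and the function names are assumptions chosen for this example.

```python
def mp_neuron(inputs, weights, threshold):
    """All-or-nothing unit: fire (1) when the weighted input sum exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Hand-designed units with fixed weights and thresholds: nothing here is learned.
def and_gate(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)

def or_gate(a, b):
    return mp_neuron([a, b], [1, 1], threshold=0)

def not_gate(a):
    return mp_neuron([a], [-1], threshold=-1)

def xor_gate(a, b):
    # A small network of units: (a OR b) AND NOT (a AND b)
    return and_gate(or_gate(a, b), not_gate(and_gate(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b), "XOR:", xor_gate(a, b))
```

Note that XOR needs a network of three layers of units here; every weight and threshold had to be picked by hand, which is exactly the limitation the entry names.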

1948Papers

Shannon's Information Theory: The Bit Is Born

In 1948, Claude Shannon at Bell Labs published A Mathematical Theory of Communication, founding information theory.

He introduced the bit as the unit of information and defined entropy — how much uncertainty a message resolves on average.

Central to AI: cross-entropy and KL divergence — straight from Shannon's theory — are today's standard training objectives in machine learning.

Anti-hype: Shannon described message transmission, not intelligence. Information theory is a foundation AI builds on — not an AI result. (The term bit was suggested by colleague John Tukey.)

People:Claude Shannon

Organizations:Bell Labs
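The quantities named above are short formulas, sketched here in Python with base-2 logarithms so that results are in bits. The function names are illustrative, and the sketch assumes that the model distribution q assigns non-zero probability wherever p does.

```python
import math

def entropy(p):
    """Average uncertainty of distribution p, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average bits needed to encode samples from p using a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Extra bits paid for using q instead of the true distribution p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair_coin = [0.5, 0.5]
biased_model = [0.9, 0.1]
print("entropy of a fair coin:", entropy(fair_coin))
print("cross-entropy under the biased model:", cross_entropy(fair_coin, biased_model))
print("KL divergence:", kl_divergence(fair_coin, biased_model))
```

The three values are linked: cross-entropy equals entropy plus KL divergence. Minimising cross-entropy during training therefore pushes the model distribution toward the data distribution.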

1949Papers

Hebbian Learning: How the Brain Might Learn

In 1949, psychologist Donald Hebb published The Organization of Behavior, formulating how learning in the brain might work at the level of synapses.

Hebb's rule: when two connected nerve cells repeatedly fire together, their connection grows stronger.

The idea — that learning means adjusting connection strengths — became a founding principle of learning neural networks (e.g. Hopfield networks).

People:Donald Hebb
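Hebb stated his rule in words; a common later formalisation, assumed here, is that the weight change is proportional to the product of the two cells' activity. The learning rate of 0.1 and the activity sequence are made up for this sketch.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Strengthen the connection only when both cells are active together."""
    return weight + learning_rate * pre * post

weight = 0.0
# (pre, post) activity pairs: three joint firings, two cases where only one cell fires
activity = [(1, 1), (1, 1), (1, 0), (0, 1), (1, 1)]
for pre, post in activity:
    weight = hebbian_update(weight, pre, post)

print("connection strength after five steps:", weight)
```

Only the three joint firings change the weight, so it ends near 0.3. The plain rule can only strengthen connections, which is why later models added decay or normalisation.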

1950Papers

The Turing Test: The Imitation Game

Indistinguishability test: an evaluator attempts to tell a machine apart from a human via text conversation

Shifted focus from philosophical definitions to behavioral demonstrations of intelligence

Posed the fundamental question 'Can machines think?' and proposed an operational approach

Established the first AI benchmark and influenced all subsequent conversational AI developments

People:Alan Turing

Organizations:University of Manchester, Mind Journal

1956Breakthroughs

Logic Theorist: The First Reasoning Program

Often called “the first AI program” — more precisely: the first program built to model human reasoning on an open intellectual task (game-playing programs came earlier).

Heuristic search instead of brute force: working backward from the goal, estimating which steps (substitution, detachment, chaining) were worthwhile — inspired by Pólya's heuristics.

Proved 38 of the first 52 theorems of Chapter 2 of the “Principia Mathematica”; for one theorem, its proof was even shorter than the original.

Written in the list-processing language IPL (chiefly by Shaw), which influenced McCarthy's LISP; the heuristic approach led directly to the General Problem Solver (1957).

People:Allen Newell, Herbert A. Simon, John Clifford Shaw

Organizations:RAND Corporation, Carnegie Institute of Technology

1956Conferences

Dartmouth Conference: The Birth of AI

The birth of AI as an independent research discipline through an 8-week workshop with leading thinkers

John McCarthy coined the term 'Artificial Intelligence,' defining a new field of research

Established the research program: getting machines to use language, form abstractions, solve problems, and improve themselves

Brought together the founding fathers of AI: McCarthy, Minsky, Shannon, Rochester, and future Nobel laureate Herbert Simon

People:John McCarthy, Marvin Minsky, Nathaniel Rochester, Claude Shannon

Organizations:Dartmouth College, IBM, Bell Labs

1957Papers

Perceptron: The First Learning Neural Network

First trainable artificial neuron with weighted inputs and a Heaviside step function

Binary classification via threshold decision, effective for linearly separable patterns

Frank Rosenblatt's perceptron learning rule corrected weights with every misclassification, enabling automatic learning

The limitation to linearly separable problems later led to the XOR critique by Minsky and Papert

People:Frank Rosenblatt

Organizations:Cornell Aeronautical Laboratory, US Navy
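The learning rule described above fits in a short Python sketch. It is a textbook-style version, not Rosenblatt's original hardware or notation; the learning rate, the epoch limit and the choice of the AND function as training data are assumptions for illustration.

```python
def heaviside(z):
    """Step activation: 1 when the input is non-negative, else 0."""
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    return heaviside(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust weights after every misclassification; stop once an epoch is error-free."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + learning_rate * delta * xi for w, xi in zip(weights, x)]
                bias += learning_rate * delta
        if errors == 0:
            break
    return weights, bias

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_weights, and_bias = train_perceptron(AND_DATA)
print("AND learned:", all(predict(and_weights, and_bias, x) == t for x, t in AND_DATA))

xor_weights, xor_bias = train_perceptron(XOR_DATA)
print("XOR learned:", all(predict(xor_weights, xor_bias, x) == t for x, t in XOR_DATA))
```

AND is linearly separable, so the rule settles on a correct set of weights. XOR is not, so no amount of training lets a single unit classify all four points.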

1958Breakthroughs

LISP: The Language of AI

John McCarthy designed LISP in 1958 at MIT for symbolic computation (lists instead of numbers) — for decades THE language of AI research (expert systems, NLP, planning).

Introduced ideas that are standard today: recursion, automatic garbage collection, functions as data, interactive evaluation (REPL).

Built on the list processing of IPL; Steve Russell implemented McCarthy's eval as the first interpreter and made LISP runnable.

Anti-hype: not the first high-level language (Fortran 1957 came earlier) — but the second-oldest still in use, and the most influential for AI.

People:John McCarthy, Steve Russell

Organizations:MIT

1959Breakthroughs

Arthur Samuel: Self-Learning AI & the Term “Machine Learning”

In the title of his 1959 paper Samuel used the term “machine learning” — the first documented use in its modern sense; he is conventionally credited as its originator.

The first publicly demonstrated self-learning program: it tuned the weights of its own evaluation function and memorised positions (rote learning).

By playing tens of thousands of games against itself it prefigured the self-play later perfected by AlphaZero; Richard Sutton regards it as the earliest use of temporal-difference learning.

Anti-hype: the celebrated 1962 win was against an overrated opponent; against world-class players the program lost. Checkers was not fully solved until 2007 (Chinook).

People:Arthur Lee Samuel

Organizations:IBM

1965Milestones

DENDRAL: Pioneer of Expert Systems

DENDRAL inferred the structure of organic molecules from mass-spectrometry data — using the knowledge of human chemists rather than general search.

The lesson: knowledge is power. Instead of general problem-solvers, AI now bet on narrow, knowledge-rich domains — the start of expert systems.

People:Edward Feigenbaum, Joshua Lederberg, Bruce Buchanan

Organizations:Stanford University

1965Papers

Fuzzy Logic: The Logic of Imprecision

Lotfi Zadeh's 1965 paper 'Fuzzy Sets,' with over 100,000 citations, substantially changed how uncertainty is handled

Enabled mathematical modeling of vagueness, incompleteness, and contradictory information

Found application in expert systems, control systems, and approximate decision-making processes

Laid the groundwork for soft computing and modern AI approaches to handling imperfect information

People:Lotfi Zadeh

Organizations:UC Berkeley, Information and Control

1966Breakthroughs

ELIZA: The First Chatbot

The first computer program explicitly developed for human-machine conversation, completed in 1966

Used simple pattern matching and substitution rules; the program got by with remarkably little code

Created the illusion of understanding and emotional intelligence without genuine language comprehension

Made visible what would later be called the 'ELIZA effect' and cautioned against projecting human qualities onto rudimentary programs

People:Joseph Weizenbaum

Organizations:MIT, MIT AI Laboratory
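How little machinery the pattern-matching-and-substitution idea needs can be shown with a short Python sketch. The rules, the pronoun table and the function names below are hypothetical examples written for this illustration; they are not taken from Weizenbaum's original script.

```python
import re

# Swap first and second person so that an echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}

# (pattern, response template) pairs; the first matching rule wins.
RULES = [
    (re.compile(r"i need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r".*\bmother\b.*", re.IGNORECASE), "Tell me more about your family."),
]
DEFAULT_REPLY = "Please go on."

def reflect(fragment):
    """Replace pronouns word by word, leaving all other words untouched."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())

def respond(text):
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(cleaned)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return DEFAULT_REPLY

print(respond("I need my coffee."))
print(respond("My mother hates me"))
print(respond("Hello"))
```

The program never models what coffee or a mother is. It only rearranges the user's own words, which is why the impression of understanding is an illusion.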

1969Papers

Perceptrons: The Book That Helped Trigger the AI Winter

In 1969, Marvin Minsky and Seymour Papert published Perceptrons, analysing mathematically what single-layer perceptrons can — and cannot — do.

Their famous result: a single-layer perceptron cannot learn the simple XOR function, because it is not linearly separable.

The book is seen as a co-trigger of the first AI winter: funding for neural networks dried up for over a decade.

People:Marvin Minsky, Seymour Papert

Organizations:MIT
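Why XOR defeats a single unit can be shown in four inequalities. The convention assumed here is that the unit outputs 1 exactly when w_1 x_1 + w_2 x_2 + b > 0; the symbols are chosen for this sketch and are not the book's notation.

```latex
\begin{align*}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{align*}
% Adding the second and third lines gives w_1 + w_2 + 2b > 0,
% hence w_1 + w_2 + b > -b \ge 0 by the first line,
% which contradicts the fourth line.
```

No choice of weights and bias satisfies all four conditions at once, so no single straight line separates the two XOR classes.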

1969Breakthroughs

Shakey: The First Intelligent Mobile Robot

First mobile robot capable of reasoning about its own actions and independently planning complex tasks

Combined TV camera, sonar, processors, and sensors into an autonomous mobile system

The STRIPS planning system was developed for it, for automatic task decomposition and route finding

United computer vision, navigation, and logical reasoning in a single physical system

People:Charles Rosen, Nils Nilsson, Bertram Raphael

Organizations:SRI International, DARPA

1970Milestones

SHRDLU: Understanding Language in the Blocks World

Around 1970, Terry Winograd at MIT built SHRDLU — a program that understood commands in plain English and manipulated a virtual blocks world.

SHRDLU could resolve ambiguous sentences, remember what had been said, answer questions, and even explain why it had done something.

It was seen as the impressive high point of symbolic AI — proof that machines can understand language remarkably well within a limited world.

Anti-hype: SHRDLU's understanding only worked in its tiny blocks world. It could not transfer to the real world — a cautionary tale about the limits of such microworlds.

People:Terry Winograd

Organizations:MIT

1970Papers

Hidden Markov Models Established

The Baum-Welch algorithm as a special case of Expectation-Maximization for HMM parameter estimation

First practical application in speech recognition from the mid-1970s at Carnegie Mellon and IBM

Transformed sequence modeling from template matching to statistical probabilistic approaches

Laid the mathematical foundation for modern probabilistic machine learning methods

People:Leonard Baum, Lloyd Welch, Ted Petrie

Organizations:Institute for Defense Analyses

1972Milestones

Prolog: Programming with Logic

In 1972, Alain Colmerauer and Philippe Roussel at the University of Marseille created the language Prolog — short for Programmation en Logique.

Prolog is declarative: you describe facts and rules, and the system derives the logical conclusions itself — instead of prescribing step by step how.

Prolog became the most important language of logical, symbolic AI — in expert systems, language processing, and Japan's Fifth Generation project.

People:Alain Colmerauer, Philippe Roussel, Robert Kowalski

Organizations:University of Aix-Marseille

1974Milestones

The First AI Winter

DARPA in the US and the British Science Research Council drastically cut funding for undirected AI research in the mid-1970s

Professor James Lighthill sharply criticized AI research in 1973 for failing to meet its goals and pointed to the problem of combinatorial explosion

DARPA canceled the 3-million-dollar contract with Carnegie Mellon for speech-understanding systems after disappointing results

AI programs of the early 1970s were limited to trivial versions of real problems and functioned like intelligent 'toys'

People:James Lighthill, J.C.R. Licklider, Hans Moravec

Organizations:DARPA, British Science Research Council, Carnegie Mellon University

1980Papers

Neocognitron: The Ancestor of CNNs

In 1980, Kunihiko Fukushima introduced the Neocognitron — a multilayered neural network for pattern recognition.

Its model was the visual cortex (Hubel and Wiesel): simple and complex cells that recognise features in stages, independent of their position.

The Neocognitron thus anticipated the core ideas of today's convolutional neural networks — local feature filters and hierarchical processing. LeCun's LeNet (1989) built on it.

People:Kunihiko Fukushima

Organizations:NHK Broadcasting Science Research Laboratories

1980Milestones

The Expert Systems Era of the 1980s

The AI industry grew from a few million dollars (1980) to billions (1988)

Two-thirds of Fortune 500 companies deployed expert systems in their day-to-day business

MYCIN's treatment recommendations achieved around 65% acceptance — comparable to human faculty experts

Classic pattern of an economic bubble: boom followed by a massive crash

People:Edward Feigenbaum, Bruce Buchanan, Edward Shortliffe

Organizations:Stanford University, Fortune 500 Companies

1982Papers

Hopfield Networks: Associative Memory

Content-addressable memory that reconstructs complete patterns from incomplete or noisy inputs

Recurrent architecture with symmetric bidirectional connections and emergent collective properties

A Lyapunov energy function guides the system to fixed-point attractors by 'rolling downhill' to a stored memory

Reignited interest in neural networks and laid the foundation for modern RNN development

People:John Hopfield

Organizations:California Institute of Technology, Bell Laboratories
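The recall-by-rolling-downhill idea can be sketched in plain Python. This is a minimal illustration using the standard Hebbian storage rule with states of +1 and -1; the two stored patterns, the scaling by the number of units and the function names are assumptions chosen for this example.

```python
def train_hopfield(patterns):
    """Hebbian storage: symmetric weights with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights

def energy(weights, state):
    """Energy of a state; asynchronous updates never increase it."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def recall(weights, state, max_sweeps=10):
    """Update one unit at a time until no unit changes (a fixed point)."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * s[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s

STORED = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
WEIGHTS = train_hopfield(STORED)

noisy = list(STORED[0])
noisy[0] = -noisy[0]  # corrupt one unit of the first memory
restored = recall(WEIGHTS, noisy)

print("recovered first memory:", restored == STORED[0])
print("energy fell:", energy(WEIGHTS, restored) < energy(WEIGHTS, noisy))
```

The corrupted input acts as an address by content: the network settles into the nearest stored pattern, and the energy of the final state is lower than that of the starting state.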

1986Papers

Backpropagation Algorithm

Published in Nature on October 9, 1986 as 'Learning representations by back-propagating errors'

Made efficient training of multi-layer neural networks practically usable and widely known through gradient computation

Hidden layers learned to automatically recognize important features — a significant advance over perceptrons

Laid the mathematical foundation for all modern deep learning applications and Transformer architectures

People:David Rumelhart, Geoffrey Hinton, Ronald Williams

Organizations:University of California San Diego, Carnegie Mellon University, Nature
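The gradient computation at the heart of backpropagation is the chain rule applied layer by layer, from the output back to the input. The sketch below uses the smallest possible case, one input, one hidden unit and one output unit with sigmoid activations and a squared-error loss; this tiny architecture and all names are assumptions for illustration, not the networks from the 1986 paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    """Input -> hidden unit -> output unit."""
    hidden = sigmoid(w1 * x)
    output = sigmoid(w2 * hidden)
    return hidden, output

def loss(w1, w2, x, target):
    _, output = forward(w1, w2, x)
    return 0.5 * (output - target) ** 2

def backprop(w1, w2, x, target):
    """Propagate the error signal backward, reusing it for the earlier layer."""
    hidden, output = forward(w1, w2, x)
    delta_output = (output - target) * output * (1.0 - output)
    grad_w2 = delta_output * hidden
    delta_hidden = delta_output * w2 * hidden * (1.0 - hidden)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2

def numerical_gradient(w1, w2, x, target, eps=1e-6):
    """Slow reference: nudge each weight and measure the change in loss."""
    g1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return g1, g2

print("backprop: ", backprop(0.5, -0.3, 1.5, 1.0))
print("numerical:", numerical_gradient(0.5, -0.3, 1.5, 1.0))
```

Both methods give the same gradients, but the numerical check needs two extra forward passes per weight. Backpropagation gets all gradients from one forward and one backward pass, which is what made multi-layer training practical.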

1987Milestones

The Second AI Winter

The market for specialized Lisp machines collapsed in 1987, as Apple and IBM computers became cheaper and more powerful

Expert systems such as XCON proved too maintenance-intensive, rigid, and unable to handle new data

Jack Schwartz cut AI funding at DARPA 'deeply and brutally' and described expert systems as 'clever programming'

The costs of AI-specific hardware far outweighed the promised business returns

People:Jacob T. Schwartz, Marvin Minsky, Roger Schank

Organizations:DARPA, IPTO, Symbolics, Lisp Machines Inc, XCON

1987Datasets

UCI ML Repository: The Dataset Library

Founded in 1987 as an FTP archive by David Aha and UCI students for empirical ML algorithm analysis

Became the primary source of ML datasets for students, educators, and researchers worldwide

Cited tens of thousands of times — one of the most widely used dataset resources in all of computer science

Democratized ML research by providing access to standardized, high-quality benchmark datasets

People:David Aha, Patrick Murphy

Organizations:University of California Irvine, UCI

1988Papers

Bayesian Networks: Reasoning Under Uncertainty

Judea Pearl (UCLA) established reasoning under uncertainty as a third pillar of AI — alongside symbolic AI and neural networks.

Bayesian networks: graphs of variables (nodes) and probability-based dependencies (edges) — replacing ad-hoc certainty factors with clean, efficient inference.

Shaped 1990s/2000s machine learning; Pearl received the 2011 Turing Award and later founded modern causal inference.

Anti-hype: Bayes' theorem dates to the 18th century; Pearl's achievement was making probabilistic reasoning structured and computable for AI — not inventing probability.

People:Judea Pearl

Organizations:UCLA
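A three-node network makes the idea concrete: rain and a sprinkler both influence whether the grass is wet. The network and all probabilities below are invented for illustration, and the inference is brute-force enumeration over the joint distribution, not Pearl's more efficient message-passing algorithms.

```python
from itertools import product

# Hypothetical numbers: priors for the two causes, and P(wet | rain, sprinkler).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
P_WET_GIVEN = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """The graph structure lets the joint factor into small local tables."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)

def posterior_rain(evidence):
    """P(rain | evidence) by summing the joint over all consistent worlds."""
    totals = {True: 0.0, False: 0.0}
    for rain, sprinkler, wet in product([True, False], repeat=3):
        world = {"sprinkler": sprinkler, "wet": wet}
        if all(world[name] == value for name, value in evidence.items()):
            totals[rain] += joint(rain, sprinkler, wet)
    return totals[True] / (totals[True] + totals[False])

print("P(rain):", P_RAIN[True])
print("P(rain | wet):", posterior_rain({"wet": True}))
print("P(rain | wet, sprinkler on):", posterior_rain({"wet": True, "sprinkler": True}))
```

Seeing wet grass raises the belief in rain; learning that the sprinkler was on lowers it again, because the sprinkler already explains the evidence. Rule-based certainty factors could not handle this kind of "explaining away" cleanly.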

1989Papers

Universal Approximation Theorem

Rigorous mathematical proof of the universal approximation capabilities of neural networks

A single hidden layer with sufficiently many neurons can approximate any Borel-measurable function to arbitrary precision (Cybenko's parallel work showed this for continuous functions)

Proves the ability to model complex, non-linear relationships in real-world data

Provided mathematical justification for the use of neural networks and a theoretical basis for confidence

People:Kurt Hornik, Maxwell Stinchcombe, Halbert White

Organizations:University of California San Diego

1989Breakthroughs

World Wide Web: The Invention of the WWW

Hypertext project with linked documents, browsers, and 'hot spots' — building on earlier hypertext ideas (Ted Nelson, Vannevar Bush's Memex), but deliberately simpler than Nelson's Xanadu

Information Management Proposal submitted on March 12, 1989 at CERN for automated scientific information exchange

HTML, HTTP, and URI/URL developed as fundamental web technologies by the end of 1990

Created the data infrastructure for later Common Crawl collections and large language model training

People:Tim Berners-Lee

Organizations:CERN

1989Papers

LeNet and the Birth of CNNs

First successful combination of Convolutional Neural Networks with backpropagation training

Recognized handwritten ZIP codes for the US Postal Service: about 5% error on test data, roughly 1% when uncertain cases were allowed to be rejected

Yann LeCun's pioneering work at Bell Labs established CNNs as a practical computer vision solution

Laid the foundation for all modern CNN architectures from AlexNet to current vision systems

People:Yann LeCun, Bernhard Boser, John Denker

Organizations:AT&T Bell Labs, NIPS

1992Breakthroughs

TD-Gammon: Learning by Playing Against Itself

In 1992, Gerald Tesauro at IBM introduced TD-Gammon — a neural network that learned to play backgammon.

It learned almost entirely by playing against itself, using the reinforcement-learning method temporal difference — with no human games as a template.

TD-Gammon reached near world-class level and discovered new opening moves that professionals adopted — a forerunner of AlphaGo, almost 25 years earlier.

Anti-hype: for a long time the success could not be transferred to other games. Backgammon's dice automatically create variety in practice — a self-play advantage that chess or Go lack.

People:Gerald Tesauro

Organizations:IBM

1992Papers

Q-Learning: Foundation of Reinforcement Learning

1992 mathematical convergence proof: Q-learning converges to the optimal action values, provided every state-action pair is tried infinitely often and the learning rate decays appropriately

Innovative model-free approach: learning optimal actions without an environment model or transition probabilities

Elegant solution for Markov decision problems through stepwise Q-function optimization

Cornerstone of modern reinforcement learning — still at the core of Deep Q-Networks and countless AI systems today

People:Chris Watkins, Peter Dayan

Organizations:King's College Cambridge, University College London
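
The model-free update described in this entry can be sketched on a toy problem. The chain environment, its reward, and the parameter values below are assumptions made for illustration, not taken from the 1992 paper.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = step left, 1 = step right. Reaching the last state
    gives reward 1 and ends the episode; every other step gives 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy exploration: mostly greedy, sometimes random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Model-free update: only the observed transition is used,
            # no transition probabilities are needed.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

q_table = q_learning()
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
```

In this sketch the greedy policy ends up choosing "right" in every non-terminal state, which is the optimal behaviour for the toy chain.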

1993Datasets

Penn Treebank: Syntactic Annotation Transforms NLP

4.5+ million words with part-of-speech tagging, around 3 million with detailed syntactic annotation — produced through a two-stage semi-automatic process

Established empirical methods in computational linguistics and became the standard benchmark for parsing research

Significantly shifted parsing algorithms from rule-based to statistical approaches

Laid the groundwork for statistical parsing and continues to serve modern NLP systems as an evaluation benchmark

People:Mitchell Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz

Organizations:University of Pennsylvania, Linguistic Data Consortium

1995Papers

AdaBoost: Weak Learners Become Strong

Adaptive weighting: difficult cases are weighted more heavily for focused learning on problem areas

Weak Learners principle: hundreds of simple classifiers together produce highly precise predictions

Gödel Prize 2003: one of the most prestigious awards in theoretical computer science, for founding boosting theory

Foundation of modern ensemble methods: inspired XGBoost and an entire generation of boosting algorithms

People:Yoav Freund, Robert Schapire

Organizations:AT&T Bell Laboratories
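
The adaptive weighting described in this entry can be illustrated with a minimal sketch. It assumes one-dimensional inputs, labels in {-1, +1}, and threshold "stumps" as weak learners; the function names and the toy data are hypothetical.

```python
import math

def train_adaboost(xs, ys, rounds=10):
    """Return a list of (alpha, threshold, sign) stumps for 1-D data."""
    n = len(xs)
    weights = [1.0 / n] * n
    thresholds = sorted(set(xs))
    ensemble = []
    for _ in range(rounds):
        best = None
        # Weak learner: pick the stump with the lowest weighted error.
        for thr in thresholds:
            for sign in (1, -1):
                preds = [sign if x >= thr else -sign for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, sign, preds)
        err, thr, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Misclassified examples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * (sign if x >= thr else -sign) for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1

toy_xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_ys = [1, 1, -1, -1, 1, 1]
model = train_adaboost(toy_xs, toy_ys, rounds=3)
```

No single stump can separate this toy data, but the weighted vote of three stumps classifies all six points correctly.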

1995Papers

Support Vector Machines: Maximum Margin Classification

Vapnik and Chervonenkis' maximum margin approach from 1964 extended into a practical solution for non-separable data

The kernel trick enables non-linear classification through implicit high-dimensional transformations

The maximum margin principle maximizes the distance between classes for optimal generalization

Established a theoretically grounded alternative to neural networks with generalization guarantees

People:Vladimir Vapnik, Corinna Cortes

Organizations:AT&T Bell Labs
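
A small worked example may help with the kernel trick mentioned above. It assumes a degree-2 polynomial kernel on two-dimensional inputs, chosen only because its explicit feature map is short enough to write out.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def explicit_features(x):
    """Explicit 3-D feature map whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, y)                               # computed in 2-D
explicit = dot(explicit_features(x), explicit_features(y))  # computed in 3-D
```

Both values agree up to rounding, so the classifier can work in the higher-dimensional space without ever constructing it. For kernels such as the Gaussian kernel the explicit space is infinite-dimensional, which is where the implicit route becomes essential.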

1995Datasets

WordNet: The Semantic Network of Language

First lexical dictionary built as a semantic network of synsets and meaning relations, with programmatic access

Synsets linked by semantic and lexical relations form a navigable meaning network

Mirrors human semantic memory and connects cognitive science with computational linguistics

Laid the groundwork for ImageNet hierarchies, knowledge graphs, and modern semantic NLP systems

People:George Miller, Christiane Fellbaum

Organizations:Princeton University, Cognitive Science Laboratory

1996Papers

PageRank: Google's Billion-Dollar Algorithm

Stanford project 'BackRub' analyzed backlink data to determine web importance — the foundation for Google

Innovative link analysis: page importance determined by inbound links rather than keyword frequency alone

Random Surfer Model: a page's importance grows with how frequently the random surfer reaches it via the link structure

Stanford research became Google Inc. — PageRank as the foundation of the world's most valuable search engine

People:Larry Page, Sergey Brin, Rajeev Motwani, Terry Winograd

Organizations:Stanford University, Google Inc.
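
The random surfer model above can be sketched as a simple power iteration. The four-page toy web is hypothetical, every link target is assumed to be a key of the dictionary, and the damping factor of 0.85 is a commonly used value rather than a requirement.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: treat it as linking to every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
```

In this toy graph page C collects links from three pages and ends up with the highest rank, while D, which nobody links to, keeps only the random-jump share.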

1997Competitions

Deep Blue Defeats Kasparov

First victory of a computer over a reigning world chess champion in a match under standard tournament conditions (Deep Blue had already won a single game in 1996)

200 million positions per second, improved endgame databases, and grandmaster consultation

IBM's technical triumph after years of development from ChipTest in 1985, through Deep Thought, to Deep Blue

A turning point in public perception of AI and proof of machine superiority in complex strategic thinking

People:Garry Kasparov, Murray Campbell, Joe Hoane, Feng-hsiung Hsu

Organizations:IBM, World Chess Championship

1997Papers

LSTM: Long Short-Term Memory

Addressed the vanishing gradient problem through constant error flow, bridging time lags of more than 1,000 steps

Special memory cells with constant error carousels for long-term information storage

Multiplicative gate units learn to open and close access to constant error flow

Enabled effective long-term sequence modeling for speech recognition and time series analysis

People:Sepp Hochreiter, Jürgen Schmidhuber

Organizations:Technical University of Munich, IDSIA

1998Datasets

MNIST: The Machine Learning Standard

70,000 handwritten digits as 28x28-pixel normalized grayscale images

Curated by Yann LeCun, Corinna Cortes, and Christopher Burges from NIST databases

Became the 'Hello World' of machine learning and the standard benchmark for ML algorithms

Democratized ML education through easy access without time-consuming data preparation

People:Yann LeCun, Corinna Cortes, Christopher Burges

Organizations:AT&T Labs, Courant Institute

2001Papers

Random Forest: Breakthrough of Ensemble Methods

Ensemble breakthrough: hundreds of random decision trees vote together for better predictions

Bagging + feature randomization: each tree sees different data and features for diversity

Theoretical grounding: generalization error bounds based on tree strength and correlation

Plug-and-play ML algorithm: little tuning needed, with strong performance across many domains

People:Leo Breiman, Adele Cutler

Organizations:UC Berkeley Statistics Department, Machine Learning Journal

2005Organizations

Future of Humanity Institute Founded

Founded at Oxford University in 2005, grew from 3 to around 40 researchers before closing in 2024

Pioneered existential risks, longtermism, and AI governance as new research fields

Established AI alignment and AI safety as legitimate academic disciplines with global impact

Lent AI safety research scientific credibility and respect through its Oxford affiliation

People:Nick Bostrom, Anders Sandberg

Organizations:Oxford University, Future of Humanity Institute

2005Competitions

DARPA Grand Challenge: The Birth of Autonomous Driving

Stanford's 'Stanley' became the first autonomous vehicle to complete a 212 km desert course in under 7 hours

Breakthrough from zero successful vehicles (2004) to five finishers (2005), four within the time limit, through better AI

Recognized as a software race: LiDAR, machine learning, and human driving data as the key

The birth of modern self-driving technology — inspired Tesla, Google, and an entire industry

People:Sebastian Thrun, Mike Montemerlo, Stanford Racing Team

Organizations:DARPA, Stanford University, Stanford AI Lab

2006Papers

Deep Belief Networks: The Renaissance of Deep Learning

A greedy layer-by-layer learning algorithm enabled efficient training of deep neural networks for the first time

Stacking Restricted Boltzmann Machines (RBMs) as building blocks for complex representations

Unsupervised pre-training solved the weight initialization problem of deep networks

Ended the obscurity of neural networks and established the modern deep learning revolution from 2006 onward

People:Geoffrey Hinton, Simon Osindero, Yee-Whye Teh

Organizations:University of Toronto, Neural Computation

2006Competitions

Netflix Prize: The Million-Dollar Algorithm

$1,000,000 prize for a 10% improvement of the Cinematch algorithm over a 3-year competition

100+ million ratings from 480k users for 17,770 movies as a public ML dataset

Considerably advanced collaborative filtering through matrix factorization and Restricted Boltzmann Machines

40,000+ teams from 186 countries, over 5,000 on the qualifying leaderboard with around 44,000 submissions — crowdsourcing power for ML

People:Reed Hastings, Netflix Team, BellKor Pragmatic Chaos Team

Organizations:Netflix, BellKor, AT&T Research

2007Datasets

Common Crawl Foundation Established

Founded in 2007 with the mission of archiving the entire public internet and making it freely available

Has grown by billions of pages each month since crawling began in 2008 — now (as of 2024) over 100 billion web pages and several petabytes of data

Became the most important training source for GPT-3, ChatGPT, LLaMA, and other modern large language models

Non-profit approach democratized access to comprehensive language data for AI research worldwide

People:Gil Elbaz, Common Crawl Team

Organizations:Common Crawl Foundation, Internet Archive, Alexa Internet

2007Milestones

CUDA: The Graphics Card Becomes the AI Engine

GPUs compute thousands of operations in parallel. That fits neural networks exactly, whose core is matrix multiplication.

It became the engine of deep learning: AlexNet (2012) trained on two GTX 580 cards using CUDA; from cuDNN (2014) on, virtually every major framework runs on it.

Anti-hype: GPGPU existed before CUDA (shaders 2001, BrookGPU 2004); CUDA did not cause deep learning alone — it made the compute accessible (necessary, not sufficient).

People:Ian Buck, John Nickolls

Organizations:NVIDIA

2008Papers

Zero-Shot Learning: Learning Without Data

Classification of classes without training data — using only semantic descriptions of the target classes

Reuse of trained models for entirely new tasks through semantic embeddings

Semantic representations enable generalization to unseen concepts

Laid the foundation for few-shot and zero-shot capabilities in modern large language models

People:Hugo Larochelle, Dumitru Erhan, Yoshua Bengio

Organizations:University of Montreal

2009Datasets

CIFAR datasets established

CIFAR-10 with 60,000 images in 10 categories, CIFAR-100 with 100 more detailed classes as computer vision benchmarks

Became one of the most important standardized benchmarks for computer vision algorithms worldwide

Enabled systematic evaluation and comparison of different machine learning approaches

Krizhevsky used CIFAR-10 before 2011 for CNN training – precursor to his AlexNet success in 2012

People:Alex Krizhevsky, Vinod Nair, Geoffrey Hinton

Organizations:University of Toronto, Canadian Institute for Advanced Research, CIFAR

2009Datasets

ImageNet: The Dataset That Changed Everything

At launch in 2009, around 3.2 million images; at full scale, over 14 million hand-annotated images in around 22,000 categories by around 49,000 workers from 167 countries

Based on WordNet hierarchies for structured categorization of visual objects

Provided the critical training data for AlexNet's 2012 breakthrough and the development of Deep Learning

Transformed computer vision research and enabled autonomous vehicles, facial recognition, and medical imaging

People:Fei-Fei Li, Jia Deng, Wei Dong, Richard Socher

Organizations:Stanford University, Princeton University

2010Milestones

DeepMind is founded

Founded in September 2010 in London as DeepMind Technologies

Demis Hassabis (neuroscientist, game developer), Shane Legg, and Mustafa Suleyman

Acquired by Google in 2014 for an estimated $500 million

Later responsible for AlphaGo, AlphaFold, and other groundbreaking AI systems

People:Demis Hassabis, Shane Legg, Mustafa Suleyman

Organizations:DeepMind, Google

2010Competitions

ImageNet Challenge: The Competition Begins

First ILSVRC in 2010 with 1,000 categories and 1.2 million training images — far beyond PASCAL VOC

Established Top-1 and Top-5 error rates as standard metrics for computer vision evaluation

Annual competition since 2010 attracted over 50 institutions worldwide and drove research advances

Created the competition structure that enabled AlexNet's breakthrough in 2012: a Top-5 error rate of just 15.3% (approximately 84.7% accuracy)

People:Fei-Fei Li, Olga Russakovsky, Alexander Berg

Organizations:Stanford University, ImageNet Team
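
The Top-1 and Top-5 metrics above are easy to state in code. The following is a minimal sketch; the function name and the toy scores are assumptions for illustration.

```python
def top_k_error(score_rows, true_labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    errors = 0
    for scores, label in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(true_labels)

# Two toy examples with three classes each; the true class is 2 in both.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
toy_labels = [2, 2]
top1 = top_k_error(toy_scores, toy_labels, k=1)
top2 = top_k_error(toy_scores, toy_labels, k=2)
```

A larger k is more forgiving: a prediction counts as correct as long as the true class appears anywhere among the k best guesses, which suits datasets where several labels could plausibly fit one image.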

2011Competitions

Watson Defeats Jeopardy! Champions

Defeated Jeopardy! legends Ken Jennings and Brad Rutter in a televised challenge

First TV demonstration of advanced natural language processing capabilities for millions of viewers

DeepQA system combined knowledge retrieval with complex reasoning without an internet connection

Ken Jennings' 'computer overlords' comment underscored the cultural significance of AI progress

People:David Ferrucci, Ken Jennings, Brad Rutter

Organizations:IBM Research, Jeopardy!, Sony Pictures Television

2011Products

Siri Launch: Voice Assistant Goes Mainstream

First mass-market voice assistant deeply integrated into a smartphone, reaching millions of users worldwide

Advanced natural language processing enabled intuitive human-computer communication

One of Steve Jobs' last major products before his death on October 5, 2011

Founded the modern era of voice assistants and inspired all competitors

People:Steve Jobs, Susan Bennett, Tom Gruber, Adam Cheyer

Organizations:Apple, SRI International, DARPA

2012Papers

Dropout Regularization

Mitigates overfitting, a central problem of deep neural networks

Randomly disables a fraction of the neurons (typically half of the hidden units) at each training step

One of the building blocks of AlexNet's ImageNet breakthrough — alongside GPU training, ReLU, and network depth

Became the standard in most modern deep learning architectures

People:Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov

Organizations:University of Toronto
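
The random disabling of neurons can be sketched in a few lines. This sketch assumes the "inverted dropout" variant that is common in current frameworks, where surviving activations are scaled up during training; the original work instead rescaled the weights at test time.

```python
import random

def dropout(activations, drop_prob=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations (assumed variant)."""
    if not training or drop_prob == 0.0:
        # At inference time the layer passes values through unchanged.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Each unit is kept with probability keep_prob and scaled by 1 / keep_prob
    # so that the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

hidden = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(hidden, drop_prob=0.5, rng=random.Random(0))
eval_out = dropout(hidden, training=False)
```

Because a different random subset of units is silenced on every step, no single neuron can rely on specific partners, which is the intuition behind the regularizing effect.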

2012Breakthroughs

AlexNet Success

AlexNet won the ImageNet 2012 Challenge with a 15.3% error rate — 10.9 percentage points better than the second-place participant (26.2%)

60 million parameters, ReLU activations, dropout layers, and GPU training established new technical standards

Convincingly demonstrated the practical superiority of deep learning in image recognition and dispelled much of the skepticism toward neural networks

Launched modern AI development and made CNN architectures the standard in computer vision

People:Alex Krizhevsky, Geoffrey Hinton, Ilya Sutskever

Organizations:University of Toronto, ImageNet Challenge, NIPS

2012Breakthroughs

The Deep Learning Revolution

Deep learning established itself as the dominant AI technology and ended the dominance of traditional machine learning approaches

AlexNet's ImageNet victory demonstrated for the first time the practical superiority of deep neural networks

GPU computing enabled the training of large neural networks and fundamentally changed AI research methods

Triggered massive investment in deep learning research and the industrial adoption of neural architectures

People:Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Alex Krizhevsky, Ilya Sutskever

Organizations:University of Toronto, NYU, University of Montreal

2013Papers

Word2Vec: Words as Vectors

First efficient dense, low-dimensional vector representations of words with semantic relationships

Semantic and syntactic patterns through vector arithmetic: king - man + woman ≈ queen

Enabled analogical reasoning in vector spaces through cosine similarity and distance metrics

Laid the foundation for modern embedding techniques and transformer-based large language models

People:Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean

Organizations:Google, Google Research
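
The analogy arithmetic above can be illustrated with a toy example. The three-dimensional vectors below are hand-made for illustration and are not trained Word2Vec embeddings, which typically have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: dimensions loosely read as (royalty, male, female).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.2, 0.5, 0.5],
}

def analogy(a, b, c, vectors):
    """Word closest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "man", "woman", toy_vectors)
```

The result is the nearest neighbour of the computed vector, not an exact match, which is why the relation is better read as "approximately equal".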

2013Papers

VAE: Variational Autoencoders

Variational inference for efficient approximation of intractable posterior distributions in continuous latent variables

Probabilistic latent space enables continuous interpolation and generation of new data points

Pioneering combination of autoencoder architecture with scalable probabilistic generative modeling through amortized variational inference

Encoder-decoder architecture with Reparameterization Trick for differentiable randomness

People:Diederik P. Kingma, Max Welling

Organizations:University of Amsterdam
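
The reparameterization trick named above can be sketched without any deep learning library. The sketch assumes a diagonal Gaussian posterior described by a mean and a log-variance per latent dimension; the function names are illustrative.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    The randomness lives only in eps, so z is a differentiable
    function of mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

z = reparameterize([0.0, 1.0], [0.0, -2.0], rng=random.Random(0))
kl = kl_to_standard_normal([1.0], [0.0])
```

In a full VAE the encoder would output `mu` and `log_var`, the decoder would reconstruct the input from `z`, and the KL term would be added to the reconstruction loss.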

2014Papers

Adam: Deep Learning's Default Optimizer

In 2014, Diederik Kingma and Jimmy Ba introduced the Adam optimizer — the name is derived from Adaptive Moment Estimation (not an acronym).

Adam adjusts the learning rate for each parameter automatically, combining two ideas: momentum and adaptive step sizes (as in RMSProp).

Adam became the standard tool for training neural networks — robust and without tedious learning-rate tuning. The paper is among the most cited in machine learning.

Anti-hype: Adam is no miracle cure — in some cases plain SGD generalizes better. It builds on predecessors (AdaGrad, RMSProp); later variants like AdamW (2017) fixed weaknesses.

People:Diederik Kingma, Jimmy Ba
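
The combination of momentum and adaptive step sizes can be written out as a short update loop. This is a sketch in plain Python with the default hyperparameters commonly quoted for Adam; the function name and the quadratic test problem are assumptions for illustration.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a function given its gradient, using the Adam update rule."""
    x = list(x0)
    m = [0.0] * len(x)  # first moment: running mean of gradients (momentum)
    v = [0.0] * len(x)  # second moment: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
            # Bias correction compensates for the zero initialization.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Per-parameter step size: large past gradients shrink the step.
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy problem: f(x) = (x - 3)^2 with gradient 2 * (x - 3).
solution = adam_minimize(lambda p: [2.0 * (p[0] - 3.0)], [0.0], lr=0.1, steps=500)
```

On this toy problem the iterate settles close to the minimum at 3. Real training loops apply the same rule to millions of parameters at once.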

2014Datasets

MS COCO: The Computer Vision Gold Standard

Objects in natural context rather than isolated — considerably shifted computer vision from artificial to real-world scenes

2.5 million pixel-precise annotations across 328k images — unprecedented annotation quality and depth

Gold standard with mAP metrics for objective model comparisons — defined computer vision evaluation

Foundation for YOLO, Mask R-CNN, and all modern CV systems — from autonomous vehicles to AR

People:Tsung-Yi Lin, Michael Maire, Serge Belongie

Organizations:Microsoft Research, Cornell University, UC Berkeley

2014Papers

GANs - Generative Adversarial Networks

Two neural networks in a minimax game: generator vs. discriminator

Invented in a single night in 2014 in Montreal after a bar visit - it worked immediately

A mathematically elegant framework for adversarial optimization

Fundamentally transformed generative AI - paving the way for the photorealistic image generation that followed

People:Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

Organizations:University of Montreal, NIPS Conference

2014Papers

Attention Mechanism: The Key to Modern LLMs

Solved the encoder-decoder bottleneck: variable sentence lengths instead of fixed vector compression

Dynamic attention instead of static encoding: adaptive focus on relevant parts of the input

Learns alignment between languages: which words correspond to each other during translation?

Conceptual precursor to the Transformer: Bahdanau's attention idea paved the way for GPT, BERT, and ChatGPT

People:Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Organizations:University of Montreal, Jacobs University Bremen

2014Products

Amazon Alexa and Echo Launch

Established the mass-market smart speaker category with always-on voice readiness

Made voice AI accessible to millions of consumers through the public sale starting in 2015 — not just tech enthusiasts

Transformed living rooms into voice-controlled smart home hubs

Marked the beginning of a far-reaching market development — Google, Apple, and others followed

People:Jeff Bezos, Amazon Alexa Team

Organizations:Amazon, Ivona (acquired 2013)

2015Breakthroughs

Deep Q-Networks: AI Learns Atari from Pixels

Learning from raw pixels: the system saw only the screen and the score — no hand-crafted features, no per-game knowledge.

Convolutional network + Q-learning + an experience-replay memory (Lin, early 1990s) + a target network added in 2015 that stabilised training.

Anti-hype: human level on about half of the 49 games (43/49 better than prior methods) — near zero on sparse-reward games (Montezuma's Revenge).

The launch of deep reinforcement learning; it made DeepMind famous before AlphaGo — the bridge from Q-learning to AlphaGo and AlphaZero.

People:Volodymyr Mnih, David Silver, Demis Hassabis

Organizations:Google DeepMind

2015Papers

Batch Normalization: A Key Advance in Neural Network Training

Solved the Internal Covariate Shift problem by normalizing activations in every mini-batch

Roughly 14 times fewer training steps to reach the same accuracy — enabling higher learning rates and robust initialization

Dual benefit: acceleration AND regularization — often a dropout replacement in modern architectures

4.8% ImageNet top-5 error with ensemble — surpassed human raters (approximately 5.1%) and set a new standard

People:Sergey Ioffe, Christian Szegedy

Organizations:Google Inc., ICML Conference
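
The normalization step can be sketched for a single feature. The sketch covers only the training-time computation on one mini-batch; the running statistics used at inference time are left out, and the function name is illustrative.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    # gamma and beta are the learnable scale and shift parameters.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

After this step the feature has roughly zero mean and unit variance within the batch, whatever scale the previous layer produced.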

2015Papers

YOLO: You Only Look Once

45 fps for the base model and 155 fps for Fast YOLO: orders of magnitude faster than region-proposal detectors such as R-CNN

Single-pass architecture formulates object detection as a regression problem instead of following the two-stage paradigm

Grid-based cell division with direct bounding box and class probability prediction

Enabled real-time computer vision for autonomous vehicles, surveillance, and mobile applications

People:Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

Organizations:University of Washington, Allen Institute, Facebook AI Research

2015Breakthroughs

DeepMind AlphaGo Development

First computer victory against a professional Go player on a full board without a handicap (Fan Hui 5 to 0)

Novel approach using deep neural networks instead of hard-coded algorithms

Mastering 10^170 possible board configurations — more than atoms in the universe

The breakthrough came a decade earlier than predicted by AI experts

People:Demis Hassabis, David Silver, DeepMind Team

Organizations:DeepMind, Google

2015Products

Tesla Autopilot: Driver Assistance for the Mass Market

Software update of October 14, 2015 activated pre-installed hardware — a new concept for the automotive industry

Mobileye-based sensor suite: front-facing camera, radar, and 12 ultrasonic sensors for Level 2 driver assistance

Adaptive cruise control, lane-keeping assist, and automatic parking — previously premium-only features

Hundreds of millions of kilometers in the first year alone — demonstrating mass-market readiness for driver assistance systems

People:Elon Musk, Tesla Engineering Team

Organizations:Tesla Inc., Mobileye

2015Products

TensorFlow: Google's ML framework goes open source

Apache 2.0 license made Google's powerful internal ML system freely available to everyone

Replaced DistBelief, with up to double the speed and improved scalability

Flexible Python interface and auto-differentiation significantly improved ML development

Gave millions of developers access to advanced AI technology

People:Martín Abadi, Ashish Agarwal, Paul Barham, Jeff Dean

Organizations:Google, Google Brain

2015Papers

ResNet: Residual Networks Transform Deep Learning

Skip connections pass inputs directly forward, enabling the training of ultra-deep networks

152 layers — 8x deeper than VGG but less complex through the residual learning framework

3.57% top-5 error rate (ensemble) on ImageNet, won all 2015 ILSVRC and COCO categories

Established residual connections as the standard for modern deep learning architectures

People:Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Organizations:Microsoft Research

2015Milestones

OpenAI Is Founded

Founded on December 11, 2015 in San Francisco

Mission: Develop safe artificial general intelligence that benefits all of humanity

Pledged: $1 billion from Elon Musk, Peter Thiel, Reid Hoffman, and others — a multi-year funding commitment, not immediately available

GPT-1 (2018) and GPT-2 (2019) were created during the purely nonprofit phase; the capped-profit structure followed in 2019, under which GPT-3 (2020) and ChatGPT (2022) were developed

People:Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba, John Schulman

Organizations:OpenAI, Y Combinator

2016Competitions

AlphaGo defeats Lee Sedol

AlphaGo defeated Lee Sedol 4:1 and demonstrated AI superiority in the most complex board game for the first time

The famous 'Move 37', which AlphaGo estimated a human would play with a probability of only about 1 in 10,000, showed machine creativity and challenged Go traditions

Combination of deep learning and Monte Carlo tree search enabled mastering Go's complexity

Over 200 million people followed the matches – a turning point for public AI perception

People:Lee Sedol, Demis Hassabis, David Silver, Aja Huang

Organizations:DeepMind, Google, Korean Baduk Association

2016Papers

XGBoost: Extreme gradient boosting dominates ML

Extreme optimization of gradient boosting with L1/L2 regularization and second-order gradients

Dominated ML competitions of the 2010s and became a standard choice for Kaggle-winning teams

Parallelized tree construction and scalable end-to-end architecture for large datasets

Go-to algorithm for structured data, in parallel with the deep learning revolution

People:Tianqi Chen, Carlos Guestrin

Organizations:University of Washington

2016Products

Google Assistant: AI-First Strategy Becomes Reality

Natural conversation instead of commands - 'ongoing dialogue' as the goal for voice AI

Foundation of Pichai's AI-First strategy - 'an individual Google' for every user

Ambient experience vision - seamless AI interaction across all devices and platforms

Google's comeback against Siri and Alexa - from latecomer to serious contender in voice AI

People:Sundar Pichai, Google Assistant Team

Organizations:Google Inc., Google I/O Conference

2016Organizations

Partnership on AI: Tech Giants Unite

Significant alliance of Amazon, Facebook, Google, DeepMind, IBM, and Microsoft for AI ethics

Mission: AI for the benefit of people and society through ethics, fairness, and transparency

Planned parity board: initially all-corporate, later to be expanded with an equal number of non-corporate members

Focus on research collaboration and best practices without lobbying activities

People:Mustafa Suleyman, Eric Horvitz, Partnership Team

Organizations:Amazon, Apple, Facebook, Google, IBM, Microsoft

2016Breakthroughs

Speech Recognition Reaches Human-Level Performance

5.9% word error rate reaches human level on Switchboard: as accurate as professional transcriptionists

Historic milestone: lowest error rate ever measured on the Switchboard standard

CNN + LSTM + neural language models: systematic combination of state-of-the-art deep learning technology

25-year research goal achieved: human parity on a narrowly defined transcription task

People:Xuedong Huang, Microsoft AI Research Team

Organizations:Microsoft AI and Research, Switchboard Corpus

2017Regulation

Asilomar Principles: The Field Sets Its Own Guardrails

January 2017: the Future of Life Institute gathered leading AI researchers at Asilomar (California) — at the site of the historic 1975 genetic-engineering conference.

Result: the 23 Asilomar AI Principles on research, values (safety, transparency) and long-term risks — one of the field's first broad self-commitments.

Over a thousand AI researchers and other signatories (incl. Stephen Hawking, Elon Musk) — an early consensus that AI should serve the common good.

Anti-hype: the principles were voluntary and non-binding — pioneering as a framework for discussion, but with no enforcement.

People:Stephen Hawking, Elon Musk

Organizations:Future of Life Institute

2017Papers

MobileNet - AI for Smartphones

One of the early deep learning models designed specifically for smartphones and IoT devices

Depthwise separable convolutions: roughly eight to nine times less computation than standard 3x3 convolutions

Enables AI processing directly on devices instead of in the cloud (edge computing)

Keeps accuracy comparable to conventional convolutions, with only a small reduction on ImageNet

People:Andrew Howard, Menglong Zhu, Bo Chen, Google Research Team

Organizations:Google, Google Research
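
The savings from depthwise separable convolutions follow from counting multiply-add operations. The sketch below uses the standard cost formulas for a square kernel and a square feature map; the layer sizes in the example are assumptions.

```python
def standard_conv_cost(kernel, in_channels, out_channels, feature_map):
    """Multiply-adds of a standard convolution layer."""
    return kernel * kernel * in_channels * out_channels * feature_map * feature_map

def separable_conv_cost(kernel, in_channels, out_channels, feature_map):
    """Multiply-adds of a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = kernel * kernel * in_channels * feature_map * feature_map
    pointwise = in_channels * out_channels * feature_map * feature_map
    return depthwise + pointwise

def cost_ratio(kernel, in_channels, out_channels, feature_map):
    """Separable cost divided by standard cost: 1/out_channels + 1/kernel^2."""
    return (separable_conv_cost(kernel, in_channels, out_channels, feature_map)
            / standard_conv_cost(kernel, in_channels, out_channels, feature_map))

# Hypothetical layer: 3x3 kernel, 512 input and output channels, 14x14 feature map.
ratio = cost_ratio(3, 512, 512, 14)
speedup = 1 / ratio
```

For 3x3 kernels the ratio is dominated by the 1/9 term, so the computed speedup falls between 8 and 9 once the channel count is large.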

2017Papers

RLHF research paper published

Paper 'Deep Reinforcement Learning from Human Preferences' published in June 2017

Core idea: AI learns from human preferences instead of predefined rewards

Joint research by OpenAI and DeepMind, including Paul Christiano and Dario Amodei

RLHF became the key technology for ChatGPT and modern AI assistants

People:Paul Christiano, Jan Leike, Dario Amodei, Tom Brown

Organizations:OpenAI, DeepMind

2017Papers

Transformer: 'Attention Is All You Need'

Self-attention mechanism captures dependencies between all sequence positions simultaneously

Eliminating recurrence enables parallel processing — significantly faster than sequential models

28.4 BLEU on WMT 2014 English-German and 41.8 BLEU on English-French: new state-of-the-art translation results

Became the foundation of all modern LLMs: GPT, BERT, and ChatGPT are all built on the Transformer architecture

People:Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

Organizations:Google Brain, Google Research
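
The self-attention mechanism above rests on scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The following is a minimal single-head sketch in plain Python, without the learned projections, masking, or multiple heads of the full architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Compare the query with every key at once; no recurrence is involved.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

toy_keys = [[1.0, 0.0], [1.0, 0.0]]
toy_values = [[0.0, 0.0], [2.0, 4.0]]
out = attention([[1.0, 0.0]], toy_keys, toy_values)
```

Because the two toy keys are identical, the query weights both values equally and the output is their average. Each query is handled independently, which is what allows all positions to be processed in parallel.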

2017Regulation

China's AI Master Plan: The Race for Global Leadership

First national AI strategy of this scale: coordinated government planning for global technology leadership

Three-step timeline: competitive by 2020, world-leading in selected areas by 2025, leading AI superpower by 2030

Trillion-yuan investment: massive state funding in AI research, infrastructure, and talent

Global leadership ambition: the starting shot for the worldwide AI race between China, the US, and Europe

People:State Council of China, Chinese AI Research Community

Organizations:State Council of China, Chinese Academy of Sciences

2017Regulation

Montreal Declaration for Responsible AI

10 ethical principles and 59 recommendations for responsible AI development with democratic legitimacy

Focus on well-being, autonomy, justice, privacy, democracy, and ecological sustainability

Initiated by the Université de Montréal with over 400 participants from various sectors

Over 500 signatories; influenced international AI governance and subsequent regulatory initiatives

People:Yoshua Bengio, Montreal AI Ethics Team

Organizations:Université de Montréal, Montreal Institute for Learning Algorithms

2017Breakthroughs

AlphaZero Masters Three Games

Learned three complex games entirely from scratch — with only the rules of the game, without human prior knowledge or databases

Achieved superhuman performance in chess (4h), shogi (2h), and Go (~8h) through pure self-play

Learned through millions of self-play games and reinforcement learning without external inputs

Evaluated only 60,000 positions per second vs. Stockfish's 60 million — but far more purposefully

People:David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou

Organizations:DeepMind, Google, Science Magazine, ArXiv

2018Milestones

Turing Award for Deep Learning

Yoshua Bengio, Geoffrey Hinton and Yann LeCun — the three godfathers of deep learning — for the breakthroughs behind modern neural networks.

The A.M. Turing Award (announced March 2019) is computing's highest honour; it recognised deep neural networks as a critical component of computing.

The official accolade for the 2012 deep-learning revolution — and a forerunner of the 2024 Physics Nobel for the same line of research.

Anti-hype: deep learning has many contributors (e.g. Schmidhuber, who publicly objected); the prize honours the trio's central role, not sole authorship.

People:Yoshua Bengio, Geoffrey Hinton, Yann LeCun

Organizations:ACM

2018Papers

GPT-1: The Birth of Generative Pre-Training

Established unsupervised pre-training on large text corpora as the foundation for language models

Demonstrated the successful application of transfer learning across a wide range of NLP tasks

Twelve-layer, decoder-only Transformer architecture became the template for the entire GPT series

Launched the era of large language models and established the pre-training/fine-tuning paradigm

People:Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

Organizations:OpenAI

2018Papers

BERT Significantly Improves Language Understanding

First deep bidirectional language model to consider left and right context simultaneously across all layers

Achieved new state-of-the-art results on 11 NLP tasks and improved the GLUE score by 7.7 percentage points to 80.5%

Open-source release enabled fine-tuning of the pre-trained model for custom tasks in about 30 minutes on a single cloud TPU

Established the pre-training/fine-tuning paradigm for all modern language models

People:Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Organizations:Google Research, Google AI Language

2019Papers

GPT-2 - "Too Dangerous to Release"

Unprecedented decision: OpenAI withholds complete 1.5B-parameter model

Fears of fake news, identity impersonation, and automated social media spam

AI community split: some praised the caution as progress in research ethics, others accused OpenAI of closing off its research

Full release after 9 months, once OpenAI reported no strong evidence of misuse

People:Alec Radford, Jeffrey Wu, Rewon Child, David Luan

Organizations:OpenAI

2019Competitions

AlphaStar Reaches Grandmaster Level

AlphaStar reached Grandmaster level in all three StarCraft II races and ranked above 99.8% of all Battle.net players

Defeated professional players MaNa and TLO 5:0 each before the public success

Multi-agent reinforcement learning with league-based training of various strategies and counter-strategies

The first AI to master a popular esports game at the highest level without restrictions

People:Oriol Vinyals, Igor Babuschkin, Wojciech Czarnecki, Grzegorz Komincz, Dario Wünsch

Organizations:DeepMind, Team Liquid, Blizzard Entertainment, Battle.net

2019Papers

T5 - Text-to-Text Transfer Transformer

Innovative unified approach: All NLP tasks as text-to-text problems

"Everything is Text" - paradigm unifies translation, summarization, Q&A

Establishes foundation model paradigm for modern large language models

Introduces comprehensive C4 dataset - Colossal Clean Crawled Corpus

People:Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee

Organizations:Google AI, Google Research

2020Papers

RAG: When Language Models Look Things Up First

In 2020, Patrick Lewis and colleagues at Facebook AI introduced Retrieval-Augmented Generation (RAG).

Instead of answering only from memory, the language model first searches for relevant documents (e.g. from Wikipedia) and bases its answer on them.

After ChatGPT, RAG became the standard method for tying language models to current, checkable sources — the basis of almost every chat-with-your-documents application.

People:Patrick Lewis

Organizations:Facebook AI Research, University College London, New York University
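
The look-things-up-first idea can be sketched in a few lines. This is a toy illustration under loud assumptions, not the method from the paper: it ranks documents by word overlap (cosine similarity of bag-of-words counts) instead of a learned retriever, and it stops at assembling the prompt rather than calling a language model. The names `retrieve` and `build_prompt` and the three sample passages are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Put the retrieved passages in front of the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical mini corpus standing in for Wikipedia passages.
DOCS = [
    "The Analytical Engine was designed by Charles Babbage.",
    "Ada Lovelace wrote notes on the Analytical Engine in 1843.",
    "AlphaFold predicts protein structures.",
]
```

Calling `build_prompt("Who predicts protein structures?", DOCS)` places the AlphaFold passage ahead of the question, so a model answering from that prompt can point to a checkable source.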

2020Papers

Neural Scaling Laws

Discovery of fundamental power laws spanning seven orders of magnitude

Simple equations predict how loss falls as model size, data, and compute grow, guiding how to allocate resources; revised in 2022 by Chinchilla

Established the 'bigger is better' paradigm for systematic LLM development

Transforms AI development from trial and error to a scientific methodology

People:Jared Kaplan, Sam McCandlish, Tom Brown, Dario Amodei

Organizations:OpenAI, Johns Hopkins University
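
Why a power law allows prediction: on log-log axes it is a straight line, so an exponent fitted on small models can be extrapolated to a much larger one. The sketch below assumes a loss of the form L(N) = (N_c / N)^alpha; the constants `N_C` and `ALPHA` are illustrative placeholders, not the values fitted in the paper, and the "observed" losses are generated from the formula itself rather than measured.

```python
import math


def power_law_loss(n_params, n_c, alpha):
    """Loss predicted by a power law L(N) = (n_c / N) ** alpha."""
    return (n_c / n_params) ** alpha


def fit_exponent(sizes, losses):
    """Least-squares slope of log(loss) against log(size); returns alpha."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope  # loss falls as size grows, so the raw slope is negative


# Illustrative constants only; they are not the values fitted in the paper.
N_C, ALPHA = 1e14, 0.08
small_models = [1e6, 1e7, 1e8]
observed = [power_law_loss(n, N_C, ALPHA) for n in small_models]

alpha_hat = fit_exponent(small_models, observed)
# Extrapolate from the small models to one 1000 times larger than the biggest.
predicted = observed[-1] * (1e11 / small_models[-1]) ** (-alpha_hat)
```

With real training runs the points scatter around the line, which is why the fitted exponent and any extrapolation carry uncertainty.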

2020Papers

GPT-3: The 175-Billion-Parameter Model

175 billion parameters - more than 100 times larger than GPT-2, with notable scaling effects

Emergent few-shot capabilities without fine-tuning: new tasks solvable with just a few examples

Showed broad capabilities without task-specific training: translation, simple arithmetic, and news articles that human readers struggled to tell apart from human-written ones

Laid the foundation for ChatGPT and commercialized large language models through API access

People:Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah

Organizations:OpenAI

2020Papers

DDPM: Diffusion models established

New class of generative models based on non-equilibrium thermodynamics and denoising processes

Progressive lossy decompression approach as generalization of autoregressive decoding

Laid mathematical foundation for Stable Diffusion and modern text-to-image generation

FID score 3.17 on CIFAR-10 demonstrated image quality rivaling GANs and established diffusion as standard

People:Jonathan Ho, Ajay Jain, Pieter Abbeel

Organizations:UC Berkeley
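
The denoising idea rests on a forward process that gradually buries an image in Gaussian noise; the network is then trained to predict that noise so the process can be run in reverse. A minimal sketch of the forward half only, assuming a linear noise schedule from 1e-4 to 0.02 over 1000 steps and a made-up four-pixel "image". The function names are hypothetical and the neural network is left out entirely.

```python
import math
import random


def linear_betas(steps=1000, start=1e-4, end=0.02):
    """Noise variances that grow linearly from start to end."""
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to step t: the share of signal left."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_sample(x0, t, alpha_bar, rng):
    """Jump straight to step t: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps  # eps is the target a denoising network would learn to predict


alpha_bar = cumulative_alphas(linear_betas())
rng = random.Random(0)
pixels = [0.5, -0.25, 1.0, 0.0]  # a made-up four-pixel "image"
slightly_noisy, _ = noise_sample(pixels, 0, alpha_bar, rng)
pure_noise, _ = noise_sample(pixels, 999, alpha_bar, rng)
```

At step 0 almost all of the original signal survives; by the last step `alpha_bar` is close to zero and the sample is essentially pure noise, which is where generation starts.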

2020Papers

Vision Transformer: 'An Image is Worth 16x16 Words'

First scalable, patch-based application of pure transformer architecture to computer vision without CNN components

Image patches (typically 16x16 pixels) treated as token sequences, transforming the image-to-sequence pipeline

Self-attention for image processing proved the universality of transformer architecture

Matched or exceeded state-of-the-art CNNs after large-scale pre-training and inspired attention-based vision models

People:Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov

Organizations:Google Research, Google Brain
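
Treating patches as tokens is mostly bookkeeping: the image is cut into a grid of squares and each square is flattened into one vector. A minimal sketch, assuming a single-channel image stored as nested lists; `patchify` is a hypothetical helper, and the linear projection and position embeddings that a real model applies afterwards are omitted.

```python
def patchify(image, patch=16):
    """Split an H x W grid of pixel values into flattened patch x patch tokens."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                image[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            ]
            tokens.append(token)
    return tokens


# A made-up 224 x 224 single-channel image whose pixel value encodes its position.
image = [[row * 224 + col for col in range(224)] for row in range(224)]
tokens = patchify(image)
```

A 224 x 224 image yields a 14 x 14 grid, so the transformer sees a sequence of 196 tokens with 256 values each, much like a short sentence.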

2020Breakthroughs

AlphaFold Achievement

AlphaFold 2 dominated CASP14 with a 92.4 GDT score, clearly beating 145 other teams

Widely regarded as a solution to the 50-year-old challenge of predicting protein structure; it fundamentally changed structural biology

Attention-based architecture reached accuracy competitive with experimental methods in protein structure prediction

Demis Hassabis and John Jumper received the 2024 Nobel Prize in Chemistry for this achievement

People:Demis Hassabis, John Jumper

Organizations:DeepMind, Google, CASP, University of Washington

2021Breakthroughs

CLIP: The Bridge Between Image and Language

Contrastive training: two encoders (image + text) learn from about 400M web pairs to place matching images and texts at the same point in one vector space.

Zero-shot: categories are described in words, no task-specific training — 76.2% on ImageNet, on par with a ResNet-50 that needed 1.28M labelled images.

Foundation of the text-to-image wave: DALL-E 2 builds on CLIP embeddings; Stable Diffusion uses CLIP's text encoder directly.

Anti-hype: contrastive image-text models already existed (ConVIRT, Oct 2020). CLIP's contribution: scale, zero-shot breadth, open weights — but it also inherited the biases of web data.

People:Alec Radford, Jong Wook Kim, Ilya Sutskever

Organizations:OpenAI
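
Zero-shot classification in a shared vector space comes down to a nearest-neighbour lookup: embed the image, embed one sentence per category, and pick the closest sentence. The sketch below uses made-up three-dimensional embeddings in place of real encoder outputs; `zero_shot_classify` is a hypothetical helper, and the temperature of 0.07 is an assumption for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def softmax(scores):
    """Turn similarity scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_vec, text_vecs, labels, temperature=0.07):
    """Pick the label whose text embedding lies closest to the image embedding."""
    img = normalize(image_vec)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_vecs]
    probs = softmax([s / temperature for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Made-up 3-d embeddings; a real model would produce them from pixels and prompts.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_vec = [0.8, 0.3, 0.1]

label, probs = zero_shot_classify(image_vec, text_vecs, labels)
```

Adding a new category needs no retraining, only another sentence and its embedding, which is what makes the approach zero-shot.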

2021Products

DALL-E Creates Images from Text

Raised text-to-image generation to a new level — coherent, creative images from natural-language descriptions (predecessors like alignDRAW or StackGAN already existed)

Developed noteworthy creative capabilities: anthropomorphization, concept combination, text rendering

12-billion-parameter version of GPT-3, trained on 250 million image-text pairs from the internet

Opened a new dimension of AI creativity and inspired the generative AI movement

People:Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray

Organizations:OpenAI, DALL-E Team

2021Milestones

Anthropic Is Founded

Founded in January 2021 in San Francisco

Dario Amodei (CEO, former VP Research at OpenAI) and Daniela Amodei (President) — part of a seven-person founding team

Focus on AI safety, interpretability, and Constitutional AI

Developed Claude, one of the leading AI assistants

People:Dario Amodei, Daniela Amodei, Tom Brown, Jared Kaplan, Sam McCandlish, Jack Clark, Chris Olah

Organizations:Anthropic, OpenAI

2021Products

GitHub Copilot: The AI pair programmer

Technical preview on June 29, 2021 with limited access via waitlist for selected developers

The underlying Codex model solved 28.8% of HumanEval problems with a single sample and 70.2% with 100 samples per problem

Established AI-assisted programming as a viable tool and inspired new coding tools

People:Nat Friedman, GitHub Team, OpenAI Team

Organizations:GitHub, OpenAI, Microsoft
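
Success rates of this kind are usually reported as pass@k: the chance that at least one of k sampled solutions passes the unit tests. A minimal sketch of the standard unbiased estimator, which generates n samples per problem, counts the c that pass, and computes 1 - C(n-c, k) / C(n, k). The function names and the tallies in `results` are made up for illustration.

```python
from math import comb


def pass_at_k(n, c, k):
    """Chance that at least one of k samples, drawn from n generated ones, is correct.

    n: samples generated per problem, c: how many of them passed the tests.
    """
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical tallies for three problems, 10 samples each.
results = [(10, 0), (10, 3), (10, 10)]
```

The gap between the two figures follows directly from the formula: a problem that is solved in 3 of 10 samples counts as 0.3 at k=1 but as a certain pass at k=10.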

2021Products

OpenAI Codex: AI Programs for People

Natural language to code: 'Write a sorting function' becomes functional Python/JavaScript

GitHub Copilot (Technical Preview from June 29, 2021): prominent AI programming assistant, trained on 54 million code repositories

12+ programming languages: from Python to Swift — AI understands developer intent in natural language

Meaningful productivity gains: Codex demonstrated AI's potential for creative cognitive work

People:OpenAI Team, GitHub Development Team

Organizations:OpenAI, GitHub, Microsoft

2022Papers

InstructGPT: The Bridge to ChatGPT

OpenAI applied RLHF (reinforcement learning from human feedback) to GPT-3 so it follows instructions and matches users' intent.

Striking: a 1.3B-parameter InstructGPT was preferred over the 100x larger GPT-3 (175B) — alignment beats raw size.

The direct bridge between the RLHF idea (2017) and ChatGPT (late 2022) — it explains why ChatGPT worked so well.

Anti-hype: InstructGPT did not invent RLHF (a 2017 paper did); it first showed at scale how strongly alignment makes a model more useful.

People:Long Ouyang

Organizations:OpenAI

2022Papers

Chinchilla: Scaling Rethought

The Chinchilla scaling laws: for a fixed compute budget, model size and training data should grow roughly in step.

The largest models (GPT-3, Gopher) were oversized and under-trained. Chinchilla (70B, 1.4T tokens) beat the 4x larger Gopher (280B).

Shifted how practically every later frontier model is trained (data/parameter ratio); influenced Llama, among others.

Anti-hype: Chinchilla did not invent scaling laws but corrected Kaplan (2020); later models are deliberately over-trained beyond the Chinchilla ratio to make inference cheaper.

People:Jordan Hoffmann

Organizations:Google DeepMind
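
The "grow in step" rule can be turned into back-of-the-envelope arithmetic. The sketch below rests on two assumptions that are not stated in the entry: the common rule of thumb that training costs about 6 FLOPs per parameter per token, and a fixed ratio of 20 tokens per parameter, which is simply what 1.4T tokens divided by 70B parameters gives. The function names are hypothetical.

```python
import math


def training_flops(params, tokens):
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


def compute_optimal(flops, tokens_per_param=20.0):
    """Split a FLOP budget so that tokens = tokens_per_param * params."""
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    return params, tokens_per_param * params


# Figures from the entry above: Chinchilla has 70B parameters and saw 1.4T tokens.
budget = training_flops(70e9, 1.4e12)
params, tokens = compute_optimal(budget)

# Quadrupling the budget doubles both the model and the data.
bigger_params, bigger_tokens = compute_optimal(4 * budget)
```

Under these assumptions parameters and tokens each scale with the square root of compute, which is the sense in which they grow in step.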

2022Products

PaLM: Google's Giant with 540 Billion Parameters

In 2022, Google introduced PaLM — a language model with 540 billion parameters, trained on thousands of TPU chips.

PaLM excelled at multi-step reasoning: with chain-of-thought prompts it solved word problems and even explained jokes.

It fueled the idea of emergent abilities — skills that appear suddenly only above a certain model size.

Organizations:Google

2022Products

Stable Diffusion: Open-source image generation

First powerful open-source text-to-image model with GitHub-available source code

Latent diffusion models with iterative de-noising in latent spaces instead of direct pixel manipulation

Explosive community growth with countless variants, tools, and applications

Broke the monopoly of proprietary systems and democratized high-quality AI image generation

People:Emad Mostaque, Robin Rombach, Andreas Blattmann

Organizations:Stability AI, CompVis, Runway

2022Breakthroughs

OpenAI Releases Whisper

Released on September 21, 2022, as open source

Covers 99 languages and transcribes robustly even with accents and background noise — strongest in English, as the majority of training data is in English

Trained on 680,000 hours of multilingual audio data from the internet

Democratized high-quality speech recognition through open-source availability

People:Alec Radford, Jong Wook Kim, Tao Xu

Organizations:OpenAI

2022Products

ChatGPT Marks a Turning Point in AI Adoption

Released on November 30, 2022, as a free research preview accessible to the general public

Reached 1 million users in 5 days, 100 million in 2 months — at the time the fastest growth of any consumer app (later surpassed by Threads)

First powerful AI without technical barriers — direct web access for any internet user

Democratized AI and triggered the current generative AI wave across society and the economy

People:Sam Altman, Greg Brockman, Ilya Sutskever, John Schulman

Organizations:OpenAI, Microsoft, ChatGPT

2022Papers

Constitutional AI — Safety Through Principles

The AI critiques and improves its own responses to harmful content — without human harm labels for these evaluations

Safety-first alternative to pure performance approaches such as ChatGPT

Triple goal: helpful, honest, and harmless through ethical principles

RLAIF: Reinforcement Learning from AI Feedback replaces human evaluations for harmlessness (helpfulness continues via RLHF)

People:Yuntao Bai, Andy Jones, Kamal Ndousse, Dario Amodei, Anthropic Team

Organizations:Anthropic

2023Regulation

NIST AI Framework: USA Defines Trustworthy AI

Four core functions: Govern, Map, Measure, Manage for systematic AI risk management

Seven characteristics of trustworthy AI: valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed

Voluntary multi-stakeholder approach: 240+ organizations jointly developed the standards

Federal standards agency: NIST developed the AI RMF under the mandate of the National AI Initiative Act of 2020

People:NIST AI Team, 240+ Contributing Organizations

Organizations:NIST, US Department of Commerce, Biden Administration

2023Products

LLaMA: Open-Source Foundation Model

Inference code under GPLv3 license; model weights were released on a case-by-case basis and exclusively for non-commercial research

Models from 7B to 65B parameters trained exclusively on publicly available datasets

Enabled researchers without large infrastructure to study advanced language models

Various model sizes for different hardware requirements and research purposes

People:Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet

Organizations:Meta AI, FAIR

2023Products

Claude and Constitutional AI

Constitutional AI framework with two-phase training: self-critique based on ethical principles, then AI feedback-based refinement

Novel safety approach: harmlessness is trained through AI supervision rather than human harm labels

Simultaneous release of Claude and Claude Instant for different application requirements

Established 'helpful, harmless, honest' as core values for responsible AI development

People:Dario Amodei, Daniela Amodei, Tom Brown, Chris Olah

Organizations:Anthropic, Constitutional AI, AI Safety

2023Products

GPT-4: Multimodal AI Model

Large multimodal model with text and image inputs, vision capabilities for documents and diagrams

Bar exam top 10% vs. GPT-3.5 bottom 10%; SAT Math improved from the 70th to the 89th percentile

6 months of iterative alignment with adversarial testing and ChatGPT feedback for improved safety

Integration into ChatGPT Plus made advanced multimodal AI accessible to consumers

People:Sam Altman, OpenAI Team

Organizations:OpenAI

2023Products

Midjourney V5: Photorealistic AI Art

Photorealistic image quality barely distinguishable from real photographs

Sparked intense reactions in the creative community — from enthusiasm to existential concerns

Considerably advanced AI art through precise hand rendering and improved prompt sensitivity

Set new standards for commercial AI image generation with considerable impact on the creative industry

People:David Holz, Midjourney Team

Organizations:Midjourney Inc

2023Regulation

Biden AI Executive Order — First Comprehensive US Regulation

Most comprehensive AI governance ever — 110 pages, the longest executive order in history

Mandatory safety tests and red-team results for powerful AI systems

Defense Production Act: mandatory reporting for AI systems posing national security risks

Positioned the United States in 2023 as a leader in responsible AI governance; the order was revoked in January 2025

People:Joe Biden, Kamala Harris

Organizations:White House, NIST, Department of Homeland Security

2023Regulation

Pause Letter & Bletchley: AI Safety Goes Global

March 2023: an open letter from the Future of Life Institute (thousands of signatories, incl. Bengio, Musk) called for a 6-month pause on training AI more powerful than GPT-4.

November 2023: the first global AI Safety Summit at Britain's Bletchley Park — where Turing cracked codes during the war.

28 states and the EU — including the US and China — signed the Bletchley Declaration on the risks of advanced AI; the start of a summit series (Seoul 2024, Paris 2025).

Anti-hype: the pause never came; the declaration was non-binding. Both set agendas but created no enforceable rules.

Organizations:Future of Life Institute, UK Government

2023Products

Mistral & Mixtral: Europe's Open Models

Spring 2023: in Paris, Arthur Mensch (ex-Google-DeepMind) plus Guillaume Lample and Timothée Lacroix (ex-Meta) founded Mistral AI — Europe's answer to the US labs.

September 2023: Mistral 7B — a small, open-weights model (Apache 2.0) that beat the larger Llama 2 13B.

December 2023: Mixtral 8x7B, an open Mixture-of-Experts model — matching GPT-3.5 on many benchmarks, yet efficient (only ~13B active of ~47B parameters).

Anti-hype: open weights is not open source (training data/code stay closed); Mixtral matched GPT-3.5, not GPT-4. Mixture-of-Experts is also older (e.g. Shazeer 2017).

People:Arthur Mensch, Guillaume Lample, Timothée Lacroix

Organizations:Mistral AI
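
Why only a fraction of the parameters is active: a Mixture-of-Experts layer holds several expert networks, and a small router sends each token to just a few of them. A toy sketch, assuming 8 experts with the top 2 chosen per token; the experts here are trivial multipliers and the router scores are made up, so this shows the routing arithmetic only, not a real model.

```python
import math


def top_k_route(router_logits, k=2):
    """Keep the k highest-scoring experts and softmax their scores into weights."""
    chosen = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )[:k]
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Eight toy "experts": expert i just multiplies its input by i + 1.
experts = [lambda x, scale=i + 1: x * scale for i in range(8)]
# Made-up router scores for one token; a real router computes them from the token.
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

weights = top_k_route(logits)
output = moe_layer(10.0, experts, logits)
```

The six experts that were not selected are never evaluated for this token, which is where the compute saving comes from even though all of them must be kept in memory.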

2023Products

Google Gemini: Multimodal AI Family

Built for multimodality from the ground up: language, audio, code, and video understanding natively integrated

Gemini Pro outperformed GPT-3.5 in 6 out of 8 standard benchmarks and established Google as a serious ChatGPT alternative

Three model sizes: Ultra (complex), Pro (balanced), Nano (on-device) for different use cases

Regular Bard received Gemini Pro on the day of the announcement; Bard Advanced with Gemini Ultra was announced for early 2024

People:Sundar Pichai, Demis Hassabis, Gemini Team

Organizations:Google, DeepMind, Google AI

2024Products

Embodied AI: The Models Get a Body

2024 became the year of embodied AI: language models that had lived only in chat moved into robots — especially humanoid ones.

Figure teamed up with OpenAI and showed a talking, acting humanoid; NVIDIA introduced Project GR00T, a foundation model for humanoids; startups like Physical Intelligence were valued in the billions.

The hope: a robot that unites language, vision, and action in one foundation model could learn general tasks in the real world — a ChatGPT moment for robotics.

Organizations:Figure AI, NVIDIA, Physical Intelligence

2024Products

Waymo: The Driverless Taxi Becomes Everyday Reality

In 2024, Waymo, the robotaxi subsidiary of Google's parent company Alphabet, became the first to offer driverless taxis at scale, open to the public in several US cities.

In the summer of 2024, Waymo reported more than 100,000 paid rides per week, entirely without a safety driver at the wheel.

After more than a decade of promises, it was the first solid proof that autonomous driving can work as a real service.

Organizations:Waymo, Alphabet

2024Products

Sora: AI-Generated Video from Text

Photorealistic text-to-video generation producing HD videos up to a minute long, surpassing existing systems

Diffusion Transformer that operates on spacetime patches of video and reuses the recaptioning technique from DALL-E 3

Often simulates physically plausible motion and maintains consistency throughout the full video length

Potential disruption of the film industry: Tyler Perry halted an $800 million studio expansion

People:Tim Brooks, Bill Peebles, Connor Holmes, Will DePue

Organizations:OpenAI

2024Products

Claude 3 family with multimodal capabilities

Sophisticated vision processing for photos, charts, diagrams, and technical drawings

Opus (highest intelligence), Sonnet (balance), Haiku (speed) for different use cases

Images and text can be combined in a single prompt

Claude 3 Opus achieved new best results in MMLU, GPQA, and other cognitive benchmarks

People:Dario Amodei, Daniela Amodei, Tom Brown, Claude 3 Team

Organizations:Anthropic, Claude API, Amazon Bedrock

2024Products

Devin: The First Autonomous AI Software Engineer

Fully autonomous software development: planning, coding, debugging, testing, and deployment without human intervention

Handles complex engineering tasks from code migration to complete app development

13.86% success rate on SWE-Bench — 7x better than the previous state of the art of 1.96%

Sparked debate about the future of software development and inspired open-source alternatives such as OpenHands

People:Scott Wu, Steven Hao, Walden Yan

Organizations:Cognition Labs, SWE-Bench

2024Breakthroughs

AlphaFold 3: AI Predicts How Molecules Interact

In May 2024, Google DeepMind and Isomorphic Labs introduced AlphaFold 3.

While AlphaFold 2 predicted the folding of single proteins, AlphaFold 3 models their interactions — with DNA, RNA, drug molecules, and ions.

Especially valuable for drug discovery: researchers can estimate on a computer how a drug binds to its target protein.

Organizations:Google DeepMind, Isomorphic Labs

2024Competitions

AlphaProof: AI Wins Silver at the Math Olympiad

In July 2024, Google DeepMind's AlphaProof, with AlphaGeometry 2, solved four of the six International Mathematical Olympiad problems — at silver-medal level.

AlphaProof formulates proofs in the formal language Lean and checks them itself; it learned via reinforcement learning. AlphaGeometry 2 handled the geometry problem.

For the first time, an AI reached medal level at this prestigious competition — a milestone for machine reasoning with verifiable proofs.

Anti-hype: not contest conditions — the AI took days in some cases instead of 4.5 hours, and humans first translated the problems into formal language. The two combinatorics problems went unsolved.

Organizations:Google DeepMind

2024Regulation

EU AI Act: The First Comprehensive AI Law

The world's first comprehensive AI law with 180 recitals and 113 articles covering the entire AI lifecycle

Four risk levels: prohibited, high-risk, limited, and minimal risk — plus separate rules for GPAI foundation models

Extraterritorial reach like the GDPR could set global AI standards and influence worldwide compliance

Fines of up to 35 million euros or 7% of annual turnover, with phased implementation from 2025 to 2027

People:Ursula von der Leyen, Thierry Breton

Organizations:European Union, European Parliament, European Commission

2024Products

OpenAI o1 — Advances in Reasoning

First model whose chain-of-thought is trained and scaled via reinforcement learning — enabling structured reasoning

New scaling dimension: the longer it thinks, the better the results

New approach: from pattern reproduction to improved problem solving

Notable advance in complex reasoning — improved problem-solving capabilities

People:Sam Altman, Noam Brown, OpenAI Team

Organizations:OpenAI

2024Milestones

The AI Nobel Prizes of 2024

8 October 2024: the Physics Nobel to John Hopfield and Geoffrey Hinton for the foundations of machine learning with neural networks — a physics prize for AI.

9 October 2024: the Chemistry Nobel to David Baker (protein design) and Demis Hassabis and John Jumper of DeepMind (AlphaFold, protein folding).

For the first time two science Nobel Prizes in one year honoured the foundations of AI — a turning point in the field's standing.

Debated: are neural networks really physics? The prizes honour decades-old foundations (Hopfield networks 1982, Hinton's Boltzmann machine). Hinton simultaneously warned of AI risks.

People:John Hopfield, Geoffrey Hinton, Demis Hassabis, John Jumper, David Baker

Organizations:Royal Swedish Academy of Sciences

2024Breakthroughs

OpenAI o3: Breakthrough on ARC-AGI

o3 (announced 20 Dec 2024) extends o1's test-time scaling: more thinking at run-time → better results, top scores in mathematics and code.

87.5% on ARC-AGI — a test built to resist memorisation, on which predecessors scored near zero: a much-noted leap toward near-human adaptivity.

With o1 and DeepSeek-R1, the era of reasoning models; o3-mini end of Jan 2025, full o3 in April 2025.

Anti-hype: the 87.5% came in the expensive high-compute December preview (the later released o3 scored lower); the ARC organisers stress that o3 is NOT AGI and drops to ~3% on the harder ARC-AGI-2.

Organizations:OpenAI

2025Products

Agentic AI Goes Mainstream

Anthropic: Computer Use (Oct 2024) made Claude 3.5 Sonnet the first frontier model to offer computer use in public beta, operating screen, mouse, and keyboard.

OpenAI: Operator (Jan 2025) browses the web on its own; Deep Research (Feb 2025) researches in multiple steps and writes cited reports.

The turn from chatbot (output text) to agent (act) — hinted at by Devin (2024), product mainstream in 2025.

Anti-hype: early versions were slow, error-prone and narrow; the systems were marketed more strongly than their 2025 reliability warranted.

Organizations:Anthropic, OpenAI

2025Products

DeepSeek-R1: The AI Shock from China

R1 (20 Jan 2025): a reasoning model at o1 level with open weights (MIT licence), trained via large-scale reinforcement learning on DeepSeek-V3.

Trained at a fraction of the expected cost — challenging the assumption that frontier AI necessarily needs huge compute budgets.

27 Jan 2025: Nvidia down ~17% (about $600B in one day, a US record); China at the AI frontier — AI became visibly a market and geopolitics question.

Anti-hype: the cited few million dollars refers only to the final training run of the V3 base model — not R1 itself, nor research/hardware overall; R1 was not uniformly better than o1.

People:Liang Wenfeng

Organizations:DeepSeek

2025Milestones

Stargate: AI as Nation-Scale Infrastructure

Up to $500B over four years for AI data centres in the US (OpenAI, SoftBank, Oracle, MGX); deployment of $100B to begin immediately.

Unveiled at the White House: AI became visibly a question of national infrastructure and geopolitics.

The next phase of AI is an energy-and-construction question — compute on the scale of power plants (the through-line since CUDA/AlexNet).

Anti-hype: an announcement is not a finished data centre; whether the full $500B materialises was contested from the start.

People:Sam Altman, Masayoshi Son, Larry Ellison

Organizations:OpenAI, SoftBank, Oracle

2025Regulation

Paris AI Action Summit

Third global AI summit (after Bletchley 2023, Seoul 2024): 10-11 February 2025, Grand Palais, co-chaired by Macron and Modi.

A shift from safety to opportunity and competition: Paris stressed investment over risks; the US Vice President argued against too much regulation.

58 countries plus the EU and the African Union signed the final declaration — the US and UK declined to sign (an open transatlantic rift).

Anti-hype: the declaration was non-binding; critics called the summit a missed opportunity for safety.

People:Emmanuel Macron, Narendra Modi

2025Products

The Frontier Models of 2025

In 2025 reasoning (step-by-step thinking) and agency (acting on its own) became standard in frontier models; Claude 3.7 introduced the hybrid model that answers fast or thinks longer on demand.

A tight race: Gemini 2.5 Pro (March), Claude 4 / Opus 4 (May), GPT-5 (August) — plus Llama 4, Grok, DeepSeek. Several labs at the frontier.

At the centre: long-horizon autonomous coding (e.g. Claude Code) — models that work through whole tasks on their own.

Anti-hype: benchmark records week after week, every lab claims the top spot; real progress, but AGI remained marketing more than reality.

Organizations:Anthropic, OpenAI, Google DeepMind
