Embeddings & Latent Space
The mathematical space where similar words are neighbors — without anyone telling them.
Your computer reads the word "apple" and sees token ID 402. That number is meaningless until the machine learns to place it in a space where similar concepts cluster together. That space is the embedding space, and a word's position in it is its Embedding. Without it, ChatGPT, image generators, and most of modern AI would not exist.
In this article, you will learn three things: how Word2Vec transforms words into dense vectors, what the Latent Space is and why proximity equals semantic similarity, and how RAG pipelines build on Embeddings to retrieve knowledge from documents.
Words as Vectors: The Word2Vec Breakthrough
Before Word2Vec, a common baseline was One-Hot Encoding: with a vocabulary of 50,000 words, each word is represented as a 50,000-dimensional vector with exactly one 1 and 49,999 zeros. The fatal problem: the distance between "apple" and "banana" is identical to the distance between "apple" and "car engine." One-Hot has no concept of similarity.
One-Hot Encoding: 50,000 dimensions. Exactly one 1, rest zeros. No similarity measurable. "Apple" and "banana" are as far apart as "apple" and "engine."
Word2Vec Embedding: 100-300 dimensions. All values active. Similarity directly measurable. "Apple" and "banana" are close together.
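The One-Hot problem can be checked in a few lines. The sketch below assumes a hypothetical five-word vocabulary instead of 50,000 words; the vocabulary and the helper names `one_hot` and `euclidean` are made up for illustration.

```python
import math

# Hypothetical five-word vocabulary; real vocabularies hold tens of thousands of words.
vocab = ["apple", "banana", "engine", "king", "queen"]

def one_hot(word):
    """Return a vector with a single 1 at the word's index and 0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]

def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d_fruit = euclidean(one_hot("apple"), one_hot("banana"))
d_engine = euclidean(one_hot("apple"), one_hot("engine"))

# Any two different one-hot vectors differ in exactly two positions,
# so every pair of distinct words is the same distance apart.
print(d_fruit, d_engine)
```

Both distances come out as the square root of 2, regardless of which two words are compared. That is the sense in which One-Hot carries no similarity information.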
In 2013, Tomas Mikolov and his team at Google introduced Word2Vec. The core idea comes from linguist J.R. Firth (1957): "You shall know a word by the company it keeps." Words that appear in similar contexts receive similar vectors. Word2Vec trained on billions of words of text and compressed meaning into dense vectors with typically 100 to 300 dimensions.
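To make "the company it keeps" concrete: one of Word2Vec's two training setups, skip-gram, turns raw text into (center word, context word) pairs and learns vectors that predict the context from the center. The sketch below only generates those pairs. The example sentence, the window size, and the function name `skipgram_pairs` are assumptions for illustration, and the neural network that learns from the pairs is left out.

```python
def skipgram_pairs(tokens, window=2):
    """Collect (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical one-sentence corpus.
sentence = "i eat an apple every morning".split()
pairs = skipgram_pairs(sentence, window=1)

for center, context in pairs:
    print(center, "->", context)
```

If "banana" shows up in other sentences next to the same context words as "apple", training pushes the two vectors toward each other, even though the words share no letters.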
Word Embedding — When Words Get Coordinates
Where the analogy breaks: the student also sees faces, hears tones, and reads body language. Word2Vec only sees text — it has zero sensory grounding. It knows that "apple" and "banana" co-occur with "fruit" but has no idea what either tastes like.
The most famous example: Vector("King") - Vector("Man") + Vector("Woman") ≈ Vector("Queen"). The direction from "Man" to "Woman" in the vector space roughly parallels the shift from "King" to "Queen." Similarly: Vector("Paris") - Vector("France") + Vector("Germany") ≈ Vector("Berlin"). These equations are fascinating — but also idealized. In practice, Embedding spaces are messier than textbook examples suggest.
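The arithmetic can be tried on toy data. The vectors below are hand-made 3-D values, not trained embeddings, and were chosen so that the analogy works; real spaces are noisier, as noted above. The name `analogy` is hypothetical. Excluding the three input words from the candidates mirrors common practice when evaluating analogies.

```python
import math

# Hand-made toy vectors, NOT output of a trained model.
vectors = {
    "king":  [0.90, 0.90, 0.00],
    "queen": [0.85, 0.15, 0.05],
    "man":   [0.10, 0.90, 0.00],
    "woman": [0.10, 0.10, 0.00],
    "apple": [0.00, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Compute vector(a) - vector(b) + vector(c) and return the nearest other word."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman")
print(result)
```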
Common Misconception: "Embeddings Understand Meaning"
Nearly every modern language model starts by converting its input into Embeddings. In a Transformer, the very first layer is the Embedding layer: it converts token IDs into dense vectors before any attention mechanism can operate.
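Mechanically, an Embedding layer is a lookup table with one row per token ID. The sketch below fills the table with random numbers, whereas a real model learns these values during training. `VOCAB_SIZE`, `EMBED_DIM`, and the token IDs other than 402 are assumptions chosen to keep the example small.

```python
import random

random.seed(0)
VOCAB_SIZE = 1000  # hypothetical vocabulary size
EMBED_DIM = 8      # hypothetical width; real models use hundreds or thousands

# One row of EMBED_DIM numbers per token ID (random here, learned in a real model).
embedding_table = [
    [random.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids):
    """Replace each token ID with its row from the embedding table."""
    return [embedding_table[token_id] for token_id in token_ids]

token_ids = [402, 17, 402]  # 402 is the "apple" ID from the introduction
token_vectors = embed(token_ids)

print(len(token_vectors), "vectors of width", len(token_vectors[0]))
```

The same token ID always maps to the same row, so both occurrences of 402 get an identical vector at this stage. Any dependence on context has to come from the layers that follow.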
The Latent Space: Where Meaning Lives
Latent Space — The Hidden Space of Representations
"Latent" means hidden: the dimensions of the space don't correspond to human-readable properties. Dimension 47 might partially encode "formality" and partially "topic domain" — you can't label it cleanly. That's why researchers use mathematical tools for dimensionality reduction to visualize high-dimensional spaces in 2D.
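Where the analogy breaks: a city is two-dimensional. The Latent Space has hundreds or thousands of dimensions. In 2D, a point can have only a few direct neighbors. In 12,288 dimensions (the embedding width reported for the largest GPT-3 model), "dog" can simultaneously be close to "cat" along directions related to pets, to "wolf" along directions related to biology, and to "loyalty" along directions related to character. These directions are not single labeled axes, but the effect is the same: this multi-directional proximity is impossible on a flat map.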
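Flattening a high-dimensional space into 2D for a plot is therefore always lossy. The text does not name a specific tool; PCA is one common choice (t-SNE and UMAP are others). The sketch below is a minimal pure-Python PCA using power iteration, run on synthetic 5-D points rather than real embeddings. All function names are hypothetical, and a production workflow would normally use a numerical library instead.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(matrix, v):
    return [dot(row, v) for row in matrix]

def power_iteration(matrix, iters=300):
    """Approximate the dominant eigenvector of a symmetric matrix."""
    v = [random.random() + 0.1 for _ in matrix]
    for _ in range(iters):
        w = mat_vec(matrix, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v

def pca_2d(data):
    """Project rows of `data` onto their two strongest directions of variance."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    pc1 = power_iteration(cov)
    eigenvalue1 = dot(pc1, mat_vec(cov, pc1))
    # Deflation: subtract the first component so the next run finds the second.
    deflated = [[cov[i][j] - eigenvalue1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2 = power_iteration(deflated)
    points = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return points, pc1, pc2

random.seed(42)
# Synthetic 5-D points: most of the variance sits in the first two coordinates.
data = [[random.gauss(0, 5), random.gauss(0, 2)] +
        [random.gauss(0, 0.1) for _ in range(3)]
        for _ in range(60)]

points, pc1, pc2 = pca_2d(data)
print(points[:3])
```

Each 5-D point becomes an (x, y) pair that can be plotted. Everything that varied along the three discarded directions is invisible in the plot, which is why 2D maps of embedding spaces should be read as rough sketches.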
A concrete example: a text analysis tool embeds movie reviews. "The film was brilliant, outstanding acting" and "Fantastic movie, superb cast" land nearly at the same point — despite sharing no significant words. "The film was terribly boring" lands at the opposite end of the space.
Common Misconception: "Each Dimension Has a Readable Meaning"
RAG: The Killer App of Embeddings
Retrieval-Augmented Generation connects a pre-trained LLM with external knowledge. The technical foundation: Embeddings. Without translating text into vectors, RAG would not exist.
The process: documents are split into chunks of 500-1000 tokens. Each chunk is converted into a vector by an Embedding model and stored in a vector database. When a user asks a question, the query is also embedded, the most similar chunks are found via similarity search, and passed to the LLM as context.
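The steps above can be sketched end to end. Two loud assumptions: `toy_embed` is a word-count stand-in for a real Embedding model and only matches shared words, not meaning, and `VectorStore` is a hypothetical in-memory list standing in for a vector database. Chunks are nine words long here purely to keep the example readable.

```python
import math
import re
from collections import Counter

def chunk(text, size):
    """Split text into pieces of `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def toy_embed(text):
    """Stand-in for an Embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class VectorStore:
    """Hypothetical in-memory replacement for a vector database."""
    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((toy_embed(text), text))

    def search(self, query, k=1):
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

document = ("To reset your password open the account settings page. "
            "The printer on floor two needs toner every month. "
            "Expense reports are due on the last Friday monthly.")

store = VectorStore()
for piece in chunk(document, size=9):   # 1. split into chunks
    store.add(piece)                    # 2. embed and store each chunk

question = "How do I reset my password?"
context = store.search(question, k=1)   # 3. embed the query, find similar chunks

# 4. hand the retrieved chunks to the LLM as context (the LLM call itself is omitted)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

Swapping `toy_embed` for a trained Embedding model is what turns this from word matching into search by meaning; the surrounding pipeline stays the same.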
Keyword search: Only finds documents containing the exact search words. "Reset password" does not find "renew access credentials."
Semantic search: Finds documents by meaning. "Reset password" also finds "renew access credentials" and "account recovery."
A company manages 10,000 technical PDF manuals. An employee asks: "How do I reset my password?" Keyword search only finds documents containing "password." Embedding-based search also finds chunks about "renewing access credentials," "credential reset procedure," and "account recovery" — because these phrases sit close together in the Latent Space.
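The contrast can be reproduced with toy data. The 2-D vectors below are hand-assigned to stand in for the output of a trained Embedding model; they are not real model output, and the similarity threshold of 0.9 is equally arbitrary. The chunk titles are hypothetical.

```python
import math

# Hypothetical chunk titles with hand-assigned vectors (NOT real model output).
chunks = {
    "Renewing access credentials": [0.9, 0.1],
    "Account recovery procedure":  [0.8, 0.2],
    "Replacing the printer toner": [0.1, 0.9],
}
query_text = "reset password"
query_vector = [0.95, 0.05]  # assumed embedding of the query

def keyword_search(query, docs):
    """Return documents that share at least one exact word with the query."""
    words = set(query.lower().split())
    return [d for d in docs if words & set(d.lower().split())]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_search(vector, docs, threshold=0.9):
    """Return documents whose vector points in nearly the same direction as the query."""
    return [d for d, v in docs.items() if cosine(vector, v) >= threshold]

keyword_hits = keyword_search(query_text, chunks)
semantic_hits = semantic_search(query_vector, chunks)

print("keyword: ", keyword_hits)
print("semantic:", semantic_hits)
```

Keyword search comes back empty because none of the titles contains "reset" or "password". The vector comparison returns the two account-related chunks and skips the printer one.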
Common Misconception: "RAG Replaces Fine-Tuning"
Deep Dive: From Static to Contextual Embeddings
BERT significantly improves language understanding
In October 2018, Jacob Devlin and his colleagues at Google published the paper on BERT (Bidirectional Encoder Representations from Transformers). BERT pre-trains deep bidirectional representations from unlabeled text: unlike earlier left-to-right language models, it conditions on both left and right context in every layer. As a result, the same word receives a different vector depending on the sentence around it, whereas Word2Vec assigns one fixed vector per word. BERT set new state-of-the-art results on eleven NLP tasks and raised the GLUE score to 80.5%, an absolute improvement of 7.7 percentage points. Google released the code and pre-trained models as open source; according to the release announcement, anyone could fine-tune their own question-answering system in about 30 minutes on a single Cloud TPU. Pre-training the model itself takes far longer. BERT popularized the pre-train-then-fine-tune paradigm that most large language models build on today.
Deep Dive: Cosine Similarity
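Cosine similarity measures the angle between two vectors and ignores their length: it is the dot product divided by the product of the two vector lengths. The result is 1 for vectors pointing the same way, 0 for perpendicular vectors, and -1 for opposite ones. A minimal sketch, with made-up example vectors:

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # same direction, different length
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # no shared direction
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # exactly opposite

print(same_direction, perpendicular, opposite)
```

Because length is ignored, a short chunk and a long chunk about the same topic can still score as highly similar, which is one reason this measure is a popular default for Embedding search.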
Interactive: Words in Embedding Space
[Interactive figure: words plotted in a 2D embedding space. Semantically similar words sit close together. The axes run in parallel: King relates to Queen as Man relates to Woman, which is exactly the vector arithmetic that made Word2Vec famous.]
Warnings and Pitfalls
Key Takeaways
Now you understand how AI models represent meaning. The next article introduces Computer Vision and CNNs — how machines learn to understand images.
Quiz: Embeddings & Latent Space
Did you understand everything?
- Why are the vectors for "apple" and "banana" close together despite sharing no letters?
- What does it mean when a Latent Space has 12,288 dimensions?
- Why does a RAG search for "reset password" also find documents about "renewing access credentials"?