When software talks to models, it needs a shared vocabulary. APIs and MCP provide it.
Architectures 10 min Intermediate April 26, 2026
You have seen what a language model can do when you type into a chat window. But software does not type — it sends structured requests through an API.
This article takes you from your first API call to the protocol that could standardize how all AI tools connect: MCP. Along the way, you will learn what these calls cost, why 50 custom integrations are a problem, and how four building blocks turn a single model into an acting agent.
From Chat Window to Code
The moment you move from a chat UI to code, the model becomes a service you call — like any other web API.
LLM API
Analogy:
Think of a restaurant counter. You fill out an order form (the JSON request) specifying what you want: model, messages, temperature. You hand it to the counter staff along with your membership card (API key). The kitchen (the model) processes your order and sends back a tray (JSON response). You pay per item on the tray, not per visit.
Definition:
An HTTP endpoint that accepts a structured JSON request (model name, message history, parameters) and returns the model's response as JSON. Authentication uses an API key — a secret token tied to a billing account.
Important: you do not choose the ingredients — you have no access to the model's internal workings.
A minimal Python call looks like this:
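from openai import OpenAI

# Create a client; the API key authenticates you and ties usage to billing.
client = OpenAI(api_key="sk-...")

# Send a single user message to the model.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is DNA?"}],
)

# The reply text lives in the first choice of the response.
print(response.choices[0].message.content)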
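Under the hood, a call like the one above is an HTTP POST with a JSON body. The following sketch builds such a request with Python's standard library but deliberately does not send it, so the individual parts stay visible. The `temperature` value is illustrative, and the environment-variable name `OPENAI_API_KEY` is an assumption about where the key is stored.

```python
import json
import os
import urllib.request

# Assumption: the key is read from an environment variable, not hard-coded.
api_key = os.environ.get("OPENAI_API_KEY", "sk-...")

# The "order form": model, messages, and optional parameters.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is DNA?"}],
    "temperature": 0.7,  # illustrative value
}

# Build the request object; nothing is sent until urlopen() is called.
request = urllib.request.Request(
    url="https://api.openai.com/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # the "membership card"
    },
    method="POST",
)

print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Reading the key from the environment keeps the secret out of source control, which matters because the key is tied to a billing account.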
What Does an API Call Cost?
Billing is split between input and output tokens.
Example with GPT-4o pricing: 30 input tokens × USD 2.50 per million = USD 0.000075. 100 output tokens × USD 10.00 per million = USD 0.001. Total: USD 0.001075, roughly one tenth of a cent.
At 1,000 calls per day: about USD 1.08. At 100,000 calls per day: about USD 107.50.
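The arithmetic above can be wrapped in a small helper. This is a minimal sketch; the function name `call_cost` and its parameters are made up for illustration, and the prices are the ones quoted in the example, not a live price list.

```python
def call_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the cost in USD of one call, given prices per million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# The example from the text: 30 input tokens, 100 output tokens.
per_call = call_cost(30, 100, 2.50, 10.00)
print(f"per call: USD {per_call:.6f}")

# Scale the per-call cost to daily volume.
for calls_per_day in (1_000, 100_000):
    print(f"{calls_per_day:>7,} calls/day: USD {per_call * calls_per_day:,.3f}")
```

Changing the token counts shows quickly how long prompts or long answers, multiplied by volume, dominate the bill.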
Per call: ~0.1 cents (cost of a single GPT-4o API call)
At 100k calls/day: ~107 USD (daily cost at high volume)
Fewer integrations: 70% (savings through MCP: N+M instead of N×M)
Common Misconception: APIs Are Expensive
Individual calls cost fractions of a cent. The cost risk lies in volume and long context windows — not in single requests. Think of API costs like a water bill: the per-liter price is tiny, but leaving the tap open drains the budget.
Interactive: The Journey of an HTTP Request
An animation walks through the six stations every API call passes through, from the click in the browser through DNS and TLS to the JSON response.
Why Does This Matter?
Every API call — whether to OpenAI, Claude, or your own REST API — goes through exactly these steps. Understanding what happens helps you diagnose errors (timeouts, 404, CORS) much faster.
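To make the diagnosis idea concrete, here is a rough sketch that maps Python exceptions to the stage of the request that probably failed. The function name `classify_failure` and the wording of the messages are illustrative assumptions. CORS is left out because browsers enforce it, so it does not surface as an exception in server-side Python.

```python
import socket
import ssl
import urllib.error


def classify_failure(exc):
    """Map an exception from an HTTP call to the stage that likely failed."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP: server answered with status {exc.code}"
    # urllib wraps low-level errors in URLError; unwrap to see the real cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, socket.gaierror):
        return "DNS: hostname could not be resolved"
    if isinstance(exc, ssl.SSLError):
        return "TLS: handshake or certificate check failed"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "Timeout: no answer within the deadline"
    return f"Other: {type(exc).__name__}"


# Simulated failures, so the sketch runs without network access.
dns_problem = urllib.error.URLError(socket.gaierror(-2, "Name or service not known"))
not_found = urllib.error.HTTPError("https://example.invalid", 404, "Not Found", None, None)
print(classify_failure(dns_problem))
print(classify_failure(not_found))
print(classify_failure(TimeoutError("timed out")))
```

In real code the function would be called inside an `except` block around the request; the simulated exceptions here only stand in for that.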
One Plug for Everything — MCP
You have just seen how one app talks to one model. Now imagine five apps, each needing ten different tools. Without a standard, that is 50 integrations. This is the N-times-M problem.
Model Context Protocol (MCP)
Analogy:
USB-C for AI. Before USB-C, every device had its own cable — phone charger, camera cable, external drive connector. With 5 devices and 10 accessories, you needed up to 50 different cables. USB-C reduced this to one standard plug: each device and each accessory just needs one USB-C port. MCP does the same for AI integrations.
Definition:
An open protocol (introduced by Anthropic in late 2024) for standardizing bidirectional connections between AI applications and external tools. An MCP server exposes tools, an MCP client manages the connection, and the LLM decides which tool to call and when.
Without MCP
5 apps × 10 tools = 50 custom integrations to build and maintain.
With MCP
5 MCP clients + 10 MCP servers = 15 implementations. That is a 70% reduction.
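The comparison is simple enough to compute for any team size. The helper below is an illustrative sketch (the name `integration_counts` is made up) that contrasts the N×M and N+M counts.

```python
def integration_counts(apps, tools):
    """Return (custom integrations, protocol implementations, relative saving)."""
    point_to_point = apps * tools   # every app wired to every tool
    with_protocol = apps + tools    # one client per app, one server per tool
    reduction = 1 - with_protocol / point_to_point
    return point_to_point, with_protocol, reduction


for apps, tools in [(5, 10), (2, 2)]:
    custom, shared, saving = integration_counts(apps, tools)
    print(f"{apps} apps, {tools} tools: {custom} custom vs {shared} shared ({saving:.0%} fewer)")
```

The second case shows the flip side: with two apps and two tools, both approaches need four implementations, so the saving only appears as the numbers grow.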
Key difference: USB-C is a mature, universal standard. MCP is young (2024) and not yet universally adopted.
Architecturally, MCP forms a triangle: the MCP server exposes tools, the client manages the connection, and the LLM decides when to use which tool. Concrete example: Claude Desktop connects to filesystem, SQLite, and GitHub through three MCP servers — one protocol, three capabilities.
Important: MCP Is Not a Universal Replacement
MCP does not replace all APIs. Direct integrations remain necessary for specialized SaaS services where no MCP server exists yet. MCP standardizes how AI discovers and invokes tools — not how the tool works internally.
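The "discover and invoke" idea can be illustrated with a toy, in-process example. This is not the MCP SDK and not the real wire format: the method names `tools/list` and `tools/call` are loosely modeled on MCP's JSON-RPC vocabulary, while the class `ToyToolServer`, the message shapes, and the `word_count` tool are simplified assumptions made up for this sketch.

```python
import json


class ToyToolServer:
    """Exposes tools behind two generic operations: list and call."""

    def __init__(self):
        self._tools = {}

    def register(self, name, description, func):
        self._tools[name] = {"description": description, "func": func}

    def handle(self, message):
        """Handle one JSON request string and return a JSON response string."""
        request = json.loads(message)
        method, params = request["method"], request.get("params", {})
        if method == "tools/list":    # discovery: what can this server do?
            result = [{"name": name, "description": tool["description"]}
                      for name, tool in self._tools.items()]
        elif method == "tools/call":  # invocation: run one tool by name
            func = self._tools[params["name"]]["func"]
            result = func(**params.get("arguments", {}))
        else:
            return json.dumps({"id": request["id"], "error": f"unknown method {method}"})
        return json.dumps({"id": request["id"], "result": result})


server = ToyToolServer()
server.register("word_count", "Count the words in a text", lambda text: len(text.split()))

# The client side only needs the two generic messages, whatever the tool is.
listing = json.loads(server.handle(json.dumps({"id": 1, "method": "tools/list"})))
call = json.loads(server.handle(json.dumps({
    "id": 2,
    "method": "tools/call",
    "params": {"name": "word_count", "arguments": {"text": "one protocol, three capabilities"}},
})))
print(listing["result"])
print(call["result"])
```

The point of the sketch is the division of labor: the server decides how `word_count` works internally, while the client only relies on the shared list and call messages.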
Anatomy of an Agent
An API call is a question with one answer. An agent is a loop — it plans, acts, observes, and decides whether to continue. This loop needs four building blocks.
Brain (LLM): Plans and interprets; decides which tool to use and when.
Hands (Tools): Execute actions; each tool has a specific function.
Memory (Context): Stores history, documents, and intermediate results across steps.
Conductor (Orchestration): Controls flow, handles errors, and determines stopping conditions.
Think of a head chef managing dinner service.
The chef's brain is the LLM — reading orders, deciding what to cook, and prioritizing. The chef's hands are the tools — knife, stove, oven, each with a specific function. The order tickets above the pass are the context — they track what has been ordered, what is in progress, what is done. The kitchen workflow is the orchestration: starters first, then mains, then desserts, check timing.
Like a real chef, an agent can also make mistakes — picking the wrong tool or misreading an order.
1. Receive Task
2. Plan (LLM chooses tool)
3. Execute (tool runs)
4. Observe (result returns)
5. Decide (continue or stop)
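The five steps can be sketched as a short loop. This is a minimal illustration under stated assumptions, not a framework: `run_agent` and `scripted_policy` are hypothetical names, and the scripted policy stands in for a real LLM call so the example runs offline.

```python
def run_agent(task, choose_action, tools, max_steps=5):
    """Plan-act-observe loop; `choose_action` stands in for the LLM."""
    context = [("task", task)]                 # Memory: grows with each step
    for _ in range(max_steps):                 # Orchestration: hard step limit
        action = choose_action(context)        # Brain: plan the next step
        if action["tool"] is None:             # Decide: stop with an answer
            return action["answer"], context
        try:
            observation = tools[action["tool"]](**action["args"])  # Hands: execute
        except Exception as exc:               # Orchestration: error handling
            observation = f"error: {exc}"
        context.append((action["tool"], observation))  # Observe: record the result
    return "stopped: step limit reached", context


# Hypothetical stand-in for the LLM: a scripted policy instead of a model call.
def scripted_policy(context):
    if len(context) == 1:                      # nothing observed yet, so use a tool
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    last_observation = context[-1][1]
    return {"tool": None, "answer": f"The sum is {last_observation}"}


answer, history = run_agent("Add 2 and 3", scripted_policy, {"add": lambda a, b: a + b})
print(answer)
print(history)
```

The `max_steps` limit is one simple stopping condition; without it, a policy that never returns an answer would loop forever.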
Example: A Travel-Booking Agent
Step 1: The user says: "Book me a flight to Berlin next Tuesday."
Step 2: The LLM reads the calendar (context): Tuesday is March 24.
Step 3: The LLM calls the flight-search tool; three options are found.
Step 4: The LLM compares prices and times and selects the cheapest morning flight.
Step 5: The LLM asks the user for confirmation (safety gate).
Step 6: The user confirms.
Step 7: The LLM calls the booking tool.
Step 8: Orchestration checks the result (booking confirmed) and the loop ends.
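The safety gate in Step 5 is the part most worth getting right, so here is a small sketch of one way to enforce it. All names (`guarded_call`, `search_flights`, `book_flight`) and the flight data are hypothetical; a real agent would call external services instead of returning canned values.

```python
def guarded_call(tool_name, args, tools, needs_confirmation, confirm):
    """Run a tool, but ask for confirmation first if it has side effects."""
    if tool_name in needs_confirmation and not confirm(tool_name, args):
        return {"status": "cancelled", "tool": tool_name}
    return {"status": "done", "tool": tool_name, "result": tools[tool_name](**args)}


# Hypothetical tools with canned data.
tools = {
    "search_flights": lambda destination: [
        {"flight": "A1", "price": 120, "departs": "07:40"},
        {"flight": "B2", "price": 95, "departs": "09:15"},
        {"flight": "C3", "price": 80, "departs": "18:30"},
    ],
    "book_flight": lambda flight: f"booking confirmed for {flight}",
}
needs_confirmation = {"book_flight"}  # searching is harmless, booking is not

options = guarded_call("search_flights", {"destination": "Berlin"}, tools,
                       needs_confirmation, confirm=lambda *_: False)["result"]

# Cheapest morning flight: departs before 12:00, lowest price.
choice = min((o for o in options if o["departs"] < "12:00"), key=lambda o: o["price"])

# The same booking request, once declined and once approved by the user.
declined = guarded_call("book_flight", {"flight": choice["flight"]}, tools,
                        needs_confirmation, confirm=lambda *_: False)
approved = guarded_call("book_flight", {"flight": choice["flight"]}, tools,
                        needs_confirmation, confirm=lambda *_: True)
print(choice["flight"], declined["status"], approved["result"])
```

Placing the gate in the orchestration layer rather than in the prompt means the booking tool cannot run without confirmation, even if the model plans the call on its own.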
Common Misconception: Agents Are Autonomous AI
Today's agents are only as capable as their tools, permissions, and safety boundaries. They do not improvise — they follow a plan-act-observe loop within defined guardrails. An agent without tools is a chatbot. An agent without guardrails is a liability.
Overview: Agent Frameworks
LangChain: Broad integration ecosystem — many connectors, many possibilities.
LlamaIndex: RAG-focused — document processing and knowledge retrieval as core strengths.
CrewAI: Multi-agent collaboration — agents with defined roles working together.
Claude Agent SDK: Lean and MCP-native — minimal overhead, maximum protocol integration.
This overview is a signpost, not a benchmark. Each framework has its own emphasis.
Key Takeaways
An API call is an HTTP POST with JSON — the same mental model you already know from web development, applied to language models.
MCP turns many-to-many integration chaos (N times M) into a manageable many-plus-many structure (N plus M) by standardizing how tools expose their capabilities.
An agent is not magic — it is a loop of four components (LLM, Tools, Context, Orchestration) repeating until a stopping condition is met.
Knowledge Check: APIs & MCP
1. What is the primary role of an API key when calling an LLM API?
☐ A) It determines which language the model responds in.
☐ B) It authenticates your application and ties usage to a billing account.
☐ C) It encrypts the content of your prompt so nobody can read it.
☐ D) It selects the specific GPU that processes your request.
2. Which statement best describes what MCP standardizes?
☐ A) The internal architecture of language models.
☐ B) The pricing structure across different AI providers.
☐ C) How AI applications discover and invoke external tools through a shared protocol.
☐ D) How training data is collected and labeled for fine-tuning.
3. A startup runs 50,000 API calls per day. Each call uses 200 input tokens and 500 output tokens. The provider charges USD 3.00 per million input tokens and USD 15.00 per million output tokens. What is the approximate daily cost?
☐ A) About USD 4
☐ B) About USD 40
☐ C) About USD 405
☐ D) About USD 4,050
4. A company has 3 AI applications that each need to work with 8 different tools. How many integrations are needed without MCP — and how many with MCP?
☐ A) Without: 11, With: 24
☐ B) Without: 24, With: 11
☐ C) Without: 24, With: 8
☐ D) Without: 11, With: 3
5. An AI agent is tasked with summarizing a 200-page PDF but keeps producing incomplete summaries. Which component of the agent architecture is most likely the bottleneck?
☐ A) Orchestration — the loop stops too early.
☐ B) Tools — the PDF reader cannot handle large files.
☐ C) LLM — the context window cannot fit all pages.
☐ D) Any of the above could be the cause — diagnosis requires checking each component individually.
6. A developer argues: "We do not need MCP; direct API integrations are fine." In which scenario is this argument strongest?
☐ A) The team has 2 apps using 2 stable, rarely-changing tools.
☐ B) The team has 10 apps sharing 15 tools.
☐ C) The team adds new tools every quarter across multiple apps.
☐ D) The team builds open-source AI tools for broad community use.
Answer Key: 1) B · 2) C · 3) C · 4) B · 5) D · 6) A
Self-Check: APIs & MCP
Can you explain what happens technically when software sends a prompt to an LLM — from request to response?
Why does MCP reduce the integration count from 50 to 15 in a five-app, ten-tool scenario?
Name the four building blocks of an agent and explain why removing any one of them breaks the system.