When AI no longer just talks but presses the button — how that works technically.
Architectures 10 min Intermediate April 26, 2026
In the previous article you saw how RAG gives a language model access to fresh documents. But reading documents is still passive — like giving someone a reference library while chaining them to a desk. What if the model could actually do things — check live data, trigger workflows, send messages?
Function Calling is the mechanism that makes this possible. It does not make the AI smarter — it gives the AI hands. This article shows exactly how that works: why the model needs tools, how tools are defined, and how the complete loop from question to real-world action operates under the hood.
The Limitation of Text — Why LLMs Have No Arms
Picture a superintelligent brain floating in a nutrient jar on a laboratory table. It can think brilliantly — compose strategies, write poetry, solve equations. But it has no eyes, no ears, no hands. When you tell it "Send an email to Sarah," it can draft the perfect message but cannot press the send button. It has no connection to the outside world.
That is exactly how large language models work: text in, text out. They predict the next token based on statistical patterns from their training data. They have no mechanism to access external systems, execute code, or perceive real-time information. RAG partially addressed this by injecting relevant documents into the context — but RAG is passive: it provides knowledge, not agency.
LLM without tools
User asks about the Bitcoin price. The model has no internet access. It generates: "$42,350" — a plausible-sounding but completely fabricated number.
LLM with Function Calling
The same model recognizes an available get_crypto_price tool, generates the JSON call, the developer code queries a live API, and the model responds with the actual current price.
Same question, same model — radically different reliability. In the first case, the model hallucinates a statistically plausible answer because its training signal rewards plausible text continuation, not epistemic honesty. In the second case, the hallucination is replaced by real data.
Unlike a real brain in a jar that would recognize its own limitations ("I cannot see — I am in a jar"), an LLM lacks this self-awareness. It hallucinates because its training signal rewards producing plausible answers — not admitting ignorance.
Common Misconception
"Function Calling means the AI executes code on my server."
Wrong. The AI generates a JSON object — nothing more. It never touches your server, your API, or your database directly. Your code receives the JSON, validates it, decides whether to execute it, and performs the actual call. The AI is the architect drawing blueprints; your code is the construction crew.
The Tool Inventory — JSON Schema as Contract
Before a model can use tools, a developer must declare them using a JSON schema (JSON is a standardized text format that computers use to exchange data) sent alongside every API request. Each tool definition has three components: a unique technical identifier (name), a natural-language explanation (description), and a parameter definition (parameters). The model reads these definitions and dynamically decides — for each user message — whether any tool is needed and which one.
2023 Products
GPT-4: Multimodal AI model
The breakthrough to human performance in professional and academic benchmarks. On March 14, 2023, OpenAI unveiled GPT-4 – a Large Multimodal Model that processes text and image inputs and reaches human level in various disciplines. The improvements were substantial: while GPT-3.5 passed the Bar Exam in the bottom 10%, GPT-4 reached the top 10%. In SAT tests, performance increased from the 82nd to the 94th percentile. After six months of iterative alignment with insights from the adversarial testing program and ChatGPT feedback, the entire deep learning stack was rebuilt. The multimodal capabilities enable processing of documents, diagrams, and screenshots with the same quality as pure text inputs. GPT-4 established new standards for AI safety and performance.
JSON Schema — The Tool Contract
AnalogyDefinition
Imagine a master craftsman who is completely blind. You hand him a toolbox where every tool has a detailed label in Braille. The hammer's label reads: "For driving nails into wood. Requires: nail size in mm, material type." The screwdriver's label reads: "For turning screws. Requires: screw head type, size." When a customer says "Hang this picture on the wall," the craftsman runs his fingers over the labels, selects the hammer, and announces: "I need the hammer with a 40mm nail for wood." He does not swing the hammer himself — he tells you exactly which tool with which settings he needs.
Analogy:
Imagine a master craftsman who is completely blind. You hand him a toolbox where every tool has a detailed label in Braille. The hammer's label reads: "For driving nails into wood. Requires: nail size in mm, material type." The screwdriver's label reads: "For turning screws. Requires: screw head type, size." When a customer says "Hang this picture on the wall," the craftsman runs his fingers over the labels, selects the hammer, and announces: "I need the hammer with a 40mm nail for wood." He does not swing the hammer himself — he tells you exactly which tool with which settings he needs.
Definition:
A JSON schema defines a contract between developer and model. The name is a unique technical identifier (e.g., get_weather). The description is the most critical element — it is the model's only way to understand the tool's purpose. The parameters define expected input types, required fields, and allowed values. The model uses the description for tool selection and the parameter schema for argument generation.
Example: Tool Definition
{
"tools": [{
"name": "get_weather",
"description": "Returns current weather
for a city. Use when user asks about
weather, temperature, or outdoor
conditions.",
"parameters": {
"type": "object",
"properties": {
"city": { "type": "string" },
"unit": { "enum": ["celsius","fahrenheit"] }
},
"required": ["city"]
}
}]
}
User asks: "What is the temperature in Berlin?" — The model generates: {"name": "get_weather", "arguments": {"city": "Berlin", "unit": "celsius"}}. Note: the user never mentioned Celsius — the model inferred the appropriate unit from the German-language context.
The quality of the description determines everything. A vague description like "database function" causes the model to ignore the tool even when the user asks a relevant question. A precise description like "Executes a read-only query against the orders database. Use when the user asks about order counts, revenue, or customer data" enables correct tool selection.
Common Misconception
"The AI can call any function it wants."
Wrong. The model can only invoke tools explicitly declared in the schema. It is a whitelist, not an open playground. If you define three tools, the model can only choose among those three (or choose none). This is a critical safety property — the developer controls the action space entirely.
Terminology Note
Function Calling (OpenAI, Google) and Tool Use (Anthropic) refer to exactly the same mechanism. The JSON-schema-based declaration, the structured output, and the execution loop are architecturally identical across all providers. Do not let different marketing terms confuse you.
Deep Dive: The ReAct Pattern
In 2022, Yao et al. formalized the ReAct approach (Reasoning and Acting), which unifies reasoning and action in a single loop. Instead of thinking completely first and then acting, the model alternates between thinking ("I need to check the weather") and acting (tool call).
This approach was a turning point for agent architectures because it showed that language models can plan their own actions and feed the results back into further decisions — exactly what Function Calling enables in practice.
The ReAct pattern is now the foundation for most AI agent frameworks. When an agent calls tools across multiple steps, a variant of this pattern runs under the hood.
The Execution Loop — From JSON to Agent
Function Calling alone is only half the story. The complete architecture is a four-step loop that can repeat — and it is precisely this repetition that transforms a chatbot into an agent.
1
User sends message
2
Model generates JSON
3
Code executes function
4
Result back to model
A simple example: The user asks "What is the weather in Berlin?" The model decides it needs the get_weather tool and generates the appropriate JSON call. The developer code receives the JSON, calls the real weather API, and gets "15 degrees, cloudy." This result is passed back to the model as a new message. The model formulates: "It is currently 15 degrees and cloudy in Berlin."
The real power shows in chaining: "Plan my day in Berlin tomorrow." The model calls get_weather ("12 degrees, light rain"), check_calendar ("No appointments"), and search_indoor_activities ("Museum Island, Philharmonie, ...") in sequence. Each loop iteration provides the model with new information that shapes its next decision.
Check weather Retrieve live weather data for any city
Check calendar View appointments and create new ones
Search information Research restaurants, activities, or facts
Send emails Compose and send messages
The AI is like a CEO sitting in a corner office. Your developer code is the executive assistant in the outer office. The CEO says: "Call client Mueller and ask about the delivery date." The assistant picks up the phone, dials, gets the answer "Delivery on Friday," and relays it back. The CEO turns to the visitor and says: "Mueller delivers Friday." The visitor thinks the CEO knows everything — but the assistant did the real work behind the scenes.
Deep Dive: Security Considerations
Function Calling gives the model the ability to trigger real-world actions. This makes validation not optional but mandatory. The model can hallucinate incorrect parameters, select the wrong tool, or be tricked into unintended calls through prompt engineering.
Developer code must validate every generated call before execution: Are the parameters plausible? Is the request within allowed limits? Does the user have permission for this action? A model that generates "Delete all customer data" as a tool call must not get through.
The golden rule: Treat every AI-generated tool call like untrusted user input. Validate, limit, log — and execute critical actions only after human confirmation.
This article showed how Function Calling gives a language model the ability to act — from passive text generation to active world interaction. The next article in the path shows how APIs and MCP standardize and scale the tool ecosystem.
Interactive: How a Function Call Works
Step through a complete Function Calling flow — from the user question through tool selection to the final answer. Watch how the call stack grows and shrinks at each step.
1
2
3
4
5
6
7
Code
1query = "Wetter in Berlin?"
2resp = llm.generate(query, tools)
3# LLM: {"name":"get_weather","city":"Berlin"}
4result = get_weather(city="Berlin")
5# Tool returns: "12°C, bewölkt"
6final = llm.generate(query, result)
7# "In Berlin: 12°C, bewölkt."
Call Stack
user_query
query="Wetter in Berlin?"
Step 1 / 7Start
Python encounters the line result = greet("Ada"). It recognizes a function call and prepares to execute the function greet.
Key Takeaways
An LLM without tools is a brain in a jar — brilliant at language but unable to interact with the real world. When asked about anything dynamic, it is forced to hallucinate.
The JSON schema is a contract, not a suggestion. The quality of the tool description determines whether the AI picks the right tool. Sloppy descriptions lead to wrong actions or missed opportunities.
The AI never executes anything itself — it generates a structured request, your code does the real work, and the result flows back. This separation is both the safety mechanism and the key architectural insight.
In the next article you will learn how APIs and the Model Context Protocol (MCP) standardize this tool infrastructure — so that not every developer has to reinvent the wheel.
Knowledge Check: Function Calling
Question 1 / 6
Not completed
What are the three components of a tool definition in Function Calling?
1. What are the three components of a tool definition in Function Calling?
☐ A) Input, Output, Error Handling
☐ B) Name, Description, Parameters
☐ C) Request, Response, Callback
☐ D) Model, Schema, Endpoint
2. Why does an LLM without tools hallucinate an answer to "What is the current Bitcoin price?" instead of saying "I don't know"?
☐ A) The model is programmed to lie
☐ B) The model's training signal rewards plausible text continuation, and it has no mechanism to recognize that it lacks real-time data
☐ C) The model intentionally deceives users for engagement
☐ D) The model has internet access but chooses not to use it
3. A developer creates a tool with the description "database" (nothing else). Users ask "How many orders did we get today?" but the model never invokes the tool. Apply your knowledge of tool descriptions to diagnose and fix the problem.
☐ A) The model is broken and needs retraining
☐ B) The parameters schema is wrong
☐ C) The description is too vague for the model to match the user intent — it should be specific, e.g., "Queries the orders database. Use when users ask about order counts or revenue."
☐ D) The tool name should be changed to match the question
4. A startup builds an AI assistant with Function Calling for customer support. The model can call refund_order, escalate_ticket, and send_email. A customer writes: "Refund my last 500 orders and email me confirmation." The model generates correct JSON for all 500 refunds. Evaluate the risks and identify the missing architectural safeguard.
☐ A) No risk — the model correctly understood the request
☐ B) The model should refuse because 500 is too many
☐ C) The developer code must validate the request before execution — batch-refunding 500 orders without human approval is dangerous, and the missing safeguard is server-side validation with business-logic constraints
☐ D) The JSON schema should limit refunds to one at a time
5. A chatbot has access to the tools search_flights and book_flight. A user asks: "Find flights to Rome on Saturday." The chatbot calls search_flights, shows results, and asks: "Should I book one of these?" The user replies: "Yes, the cheapest one." Describe the execution loop steps for the entire interaction.
☐ A) A single book_flight call is sufficient for the entire interaction
☐ B) Two loop iterations: First search_flights (user query → JSON → API call → results → response), then book_flight (user confirmation → JSON → API call → booking confirmation → response)
☐ C) The model executes both calls simultaneously in parallel
☐ D) The user must manually enter the flight details because the model cannot remember previous results
6. A company gives its AI assistant 50 tools. The tools get_invoice and get_customer_info both have only the description "database query." Users report that the assistant frequently displays invoice information when they ask about customer data, and vice versa. Analyze the cause and propose an architectural solution.
☐ A) 50 tools are too many — the maximum is 10
☐ B) The database is misconfigured and mixing tables
☐ C) Both tools have identical, unspecific descriptions. The model cannot distinguish them because the description is its only differentiator. Solution: Precise, distinguishable descriptions like "Retrieves invoice details. Use for questions about invoices, amounts, payment status" vs. "Retrieves customer master data. Use for questions about contact details, customer history, contract status"
☐ D) The model needs fine-tuning to correctly assign the tools
Answer Key: 1) B · 2) B · 3) C · 4) C · 5) B · 6) C
Checkpoint
Why must a language model without tools hallucinate when asked real-time questions?
Why is the natural-language description the most critical element of a tool definition?
Who actually executes an action — the AI or the developer code?