Function Calling — When AI Presses Buttons

When AI no longer just talks but presses the button — how that works technically.

Architectures 10 min Intermediate April 26, 2026

In the previous article you saw how RAG gives a language model access to fresh documents. But reading documents is still passive — like giving someone a reference library while chaining them to a desk. What if the model could actually do things — check live data, trigger workflows, send messages?

Function Calling is the mechanism that makes this possible. It does not make the AI smarter — it gives the AI hands. This article shows exactly how that works: why the model needs tools, how tools are defined, and how the complete loop from question to real-world action operates under the hood.

The Limitation of Text — Why LLMs Have No Arms

Picture a superintelligent brain floating in a nutrient jar on a laboratory table. It can think brilliantly — compose strategies, write poetry, solve equations. But it has no eyes, no ears, no hands. When you tell it "Send an email to Sarah," it can draft the perfect message but cannot press the send button. It has no connection to the outside world.

That is exactly how large language models work: text in, text out. They predict the next token based on statistical patterns from their training data. They have no mechanism to access external systems, execute code, or perceive real-time information. RAG partially addressed this by injecting relevant documents into the context — but RAG is passive: it provides knowledge, not agency.

LLM without tools

User asks about the Bitcoin price. The model has no internet access. It generates: "$42,350" — a plausible-sounding but completely fabricated number.

LLM with Function Calling

The same model recognizes an available get_crypto_price tool, generates the JSON call, the developer code queries a live API, and the model responds with the actual current price.

Same question, same model — radically different reliability. In the first case, the model hallucinates a statistically plausible answer because its training signal rewards plausible text continuation, not epistemic honesty. In the second case, the hallucination is replaced by real data.

Unlike a real brain in a jar that would recognize its own limitations ("I cannot see — I am in a jar"), an LLM lacks this self-awareness. It hallucinates because its training signal rewards producing plausible answers — not admitting ignorance.

Common Misconception

"Function Calling means the AI executes code on my server."

Wrong. The AI generates a JSON object — nothing more. It never touches your server, your API, or your database directly. Your code receives the JSON, validates it, decides whether to execute it, and performs the actual call. The AI is the architect drawing blueprints; your code is the construction crew.

The Tool Inventory — JSON Schema as Contract

Before a model can use tools, a developer must declare them using a JSON schema (JSON is a standardized text format that computers use to exchange data) sent alongside every API request. Each tool definition has three components: a unique technical identifier (name), a natural-language explanation (description), and a parameter definition (parameters). The model reads these definitions and dynamically decides — for each user message — whether any tool is needed and which one.

JSON Schema — The Tool Contract

AnalogyDefinition
Imagine a master craftsman who is completely blind. You hand him a toolbox where every tool has a detailed label in Braille. The hammer's label reads: "For driving nails into wood. Requires: nail size in mm, material type." The screwdriver's label reads: "For turning screws. Requires: screw head type, size." When a customer says "Hang this picture on the wall," the craftsman runs his fingers over the labels, selects the hammer, and announces: "I need the hammer with a 40mm nail for wood." He does not swing the hammer himself — he tells you exactly which tool with which settings he needs.

Example: Tool Definition

{
  "tools": [{
    "name": "get_weather",
    "description": "Returns current weather
      for a city. Use when user asks about
      weather, temperature, or outdoor
      conditions.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": { "type": "string" },
        "unit": { "enum": ["celsius","fahrenheit"] }
      },
      "required": ["city"]
    }
  }]
}

User asks: "What is the temperature in Berlin?" — The model generates: {"name": "get_weather", "arguments": {"city": "Berlin", "unit": "celsius"}}. Note: the user never mentioned Celsius — the model inferred the appropriate unit from the German-language context.

The quality of the description determines everything. A vague description like "database function" causes the model to ignore the tool even when the user asks a relevant question. A precise description like "Executes a read-only query against the orders database. Use when the user asks about order counts, revenue, or customer data" enables correct tool selection.

Common Misconception

"The AI can call any function it wants."

Wrong. The model can only invoke tools explicitly declared in the schema. It is a whitelist, not an open playground. If you define three tools, the model can only choose among those three (or choose none). This is a critical safety property — the developer controls the action space entirely.

Terminology Note

Function Calling (OpenAI, Google) and Tool Use (Anthropic) refer to exactly the same mechanism. The JSON-schema-based declaration, the structured output, and the execution loop are architecturally identical across all providers. Do not let different marketing terms confuse you.

In 2022, Yao et al. formalized the ReAct approach (Reasoning and Acting), which unifies reasoning and action in a single loop. Instead of thinking completely first and then acting, the model alternates between thinking ("I need to check the weather") and acting (tool call).

This approach was a turning point for agent architectures because it showed that language models can plan their own actions and feed the results back into further decisions — exactly what Function Calling enables in practice.

The ReAct pattern is now the foundation for most AI agent frameworks. When an agent calls tools across multiple steps, a variant of this pattern runs under the hood.

The Execution Loop — From JSON to Agent

Function Calling alone is only half the story. The complete architecture is a four-step loop that can repeat — and it is precisely this repetition that transforms a chatbot into an agent.

1
User sends message
2
Model generates JSON
3
Code executes function
4
Result back to model

A simple example: The user asks "What is the weather in Berlin?" The model decides it needs the get_weather tool and generates the appropriate JSON call. The developer code receives the JSON, calls the real weather API, and gets "15 degrees, cloudy." This result is passed back to the model as a new message. The model formulates: "It is currently 15 degrees and cloudy in Berlin."

The real power shows in chaining: "Plan my day in Berlin tomorrow." The model calls get_weather ("12 degrees, light rain"), check_calendar ("No appointments"), and search_indoor_activities ("Museum Island, Philharmonie, ...") in sequence. Each loop iteration provides the model with new information that shapes its next decision.

Check weather Retrieve live weather data for any city
Check calendar View appointments and create new ones
Search information Research restaurants, activities, or facts
Send emails Compose and send messages

The AI is like a CEO sitting in a corner office. Your developer code is the executive assistant in the outer office. The CEO says: "Call client Mueller and ask about the delivery date." The assistant picks up the phone, dials, gets the answer "Delivery on Friday," and relays it back. The CEO turns to the visitor and says: "Mueller delivers Friday." The visitor thinks the CEO knows everything — but the assistant did the real work behind the scenes.

Function Calling gives the model the ability to trigger real-world actions. This makes validation not optional but mandatory. The model can hallucinate incorrect parameters, select the wrong tool, or be tricked into unintended calls through prompt engineering.

Developer code must validate every generated call before execution: Are the parameters plausible? Is the request within allowed limits? Does the user have permission for this action? A model that generates "Delete all customer data" as a tool call must not get through.

The golden rule: Treat every AI-generated tool call like untrusted user input. Validate, limit, log — and execute critical actions only after human confirmation.

This article showed how Function Calling gives a language model the ability to act — from passive text generation to active world interaction. The next article in the path shows how APIs and MCP standardize and scale the tool ecosystem.

Interactive: How a Function Call Works

Step through a complete Function Calling flow — from the user question through tool selection to the final answer. Watch how the call stack grows and shrinks at each step.

1
2
3
4
5
6
7
Code
1query = "Wetter in Berlin?"
2resp = llm.generate(query, tools)
3# LLM: {"name":"get_weather","city":"Berlin"}
4result = get_weather(city="Berlin")
5# Tool returns: "12°C, bewölkt"
6final = llm.generate(query, result)
7# "In Berlin: 12°C, bewölkt."
Call Stack
user_query
query="Wetter in Berlin?"
Step 1 / 7Start

Python encounters the line result = greet("Ada"). It recognizes a function call and prepares to execute the function greet.

Key Takeaways

  1. An LLM without tools is a brain in a jar — brilliant at language but unable to interact with the real world. When asked about anything dynamic, it is forced to hallucinate.
  2. The JSON schema is a contract, not a suggestion. The quality of the tool description determines whether the AI picks the right tool. Sloppy descriptions lead to wrong actions or missed opportunities.
  3. The AI never executes anything itself — it generates a structured request, your code does the real work, and the result flows back. This separation is both the safety mechanism and the key architectural insight.

In the next article you will learn how APIs and the Model Context Protocol (MCP) standardize this tool infrastructure — so that not every developer has to reinvent the wheel.

Knowledge Check: Function Calling

Question 1 / 6
Not completed

What are the three components of a tool definition in Function Calling?

Select one answer
Answer Key: 1) B · 2) B · 3) C · 4) C · 5) B · 6) C

Checkpoint

  • Why must a language model without tools hallucinate when asked real-time questions?
  • Why is the natural-language description the most critical element of a tool definition?
  • Who actually executes an action — the AI or the developer code?