Glossary

Tool Use (Function Calling)

Tool use (also called function calling) is the ability of an AI model to invoke external functions, APIs, or services as part of generating a response. Instead of only producing text, the model can decide to call a tool, receive the result, and use it to complete the task.


How It Works

A language model can reason and generate text, but it can't check your inventory, send an email, or look up a flight on its own. Tool use changes that. It gives the model a list of available tools (described as function signatures with parameters) and lets it decide when to call them.

Here's how it works in practice. You define a set of tools, each with a name, description, and expected parameters declared as a JSON Schema. When the model receives a user request, it can choose to emit a tool call (in a structured format) instead of (or in addition to) generating text. The system executes the tool call, returns the result to the model in a follow-up message, and the model incorporates that result into its response.

For example, a customer support agent might have tools for looking up order status, checking return eligibility, and initiating a refund. When a customer asks "where is my order?", the model calls the order lookup tool with the customer ID, gets the tracking information, and responds with it.
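A minimal sketch of what one such tool definition can look like, using an Anthropic-style tools format (a name, a description that says when to use the tool, and an input_schema given as JSON Schema). The order-lookup tool and its fields are illustrative, not a real API.

```python
# A hypothetical order-lookup tool in the Anthropic-style tools format:
# a name, a description that says when to use it, and a JSON Schema for inputs.
get_order_status_tool = {
    "name": "get_order_status",
    "description": (
        "Look up the current status and tracking information for a customer's "
        "order. Use this whenever the customer asks where an order is or when "
        "it will arrive."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The order identifier, e.g. 'ORD-12345'.",
            },
        },
        "required": ["order_id"],
    },
}
```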

Modern APIs support parallel tool calls (the model requests multiple tools at once when they're independent) and forced tool use (the developer requires the model to call a specific tool, or any tool, rather than answer in text). Parallel tool use cuts latency significantly for agents that need several pieces of information to answer. Anthropic, OpenAI, and Google all support parallel calls in their current APIs.
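A sketch of both features against Anthropic's Messages API, reusing the order-lookup tool from the sketch above; the model id is just an example. With tool_choice left on auto, a single response can carry several independent tool_use blocks.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Force the model to call one specific tool on this turn.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example model id
    max_tokens=1024,
    tools=[get_order_status_tool],       # from the sketch above
    tool_choice={"type": "tool", "name": "get_order_status"},
    messages=[{"role": "user", "content": "Where is order ORD-12345?"}],
)

# With tool_choice={"type": "auto"}, a response to a broader question may
# contain several independent tool_use blocks; run them concurrently.
tool_calls = [block for block in response.content if block.type == "tool_use"]
```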

Tool use is what turns a chatbot into an agent. Without tools, the model can only talk about things. With tools, it can actually do things. This is the foundation of every production AI agent system.

The reliability of tool use depends on how well the tools are described and how clearly the model understands when to use each one. Good tool descriptions act like documentation for the model; vague descriptions lead to incorrect tool calls. Tool names should be verbs or verb-noun pairs like "get_order_status" or "send_invoice_reminder", descriptions should explain both what the tool does and when to use it, and parameter schemas should be strict. Loose schemas (everything optional, no constraints) let the model invent bad arguments.
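As an illustration, here is a hypothetical refund tool with a deliberately strict schema: every field required, an enum-constrained reason, and a bounded amount, so the model has little room to invent arguments.

```python
# A hypothetical refund tool with a strict schema: all fields required,
# the reason enum-constrained, the amount bounded, no extra properties.
initiate_refund_tool = {
    "name": "initiate_refund",
    "description": (
        "Start a refund for a single order. Use only after the order has been "
        "confirmed as return-eligible; do not use for exchanges."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "amount_usd": {"type": "number", "minimum": 0.01, "maximum": 10000},
            "reason": {
                "type": "string",
                "enum": ["damaged", "not_as_described", "late_delivery", "other"],
            },
        },
        "required": ["order_id", "amount_usd", "reason"],
        "additionalProperties": False,
    },
}
```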

Tool misuse is a real risk. A model can call the right tool with the wrong arguments, or the wrong tool at the wrong time. Production systems validate every tool call against an allow-list and a parameter schema before executing. High-impact tools (payments, deletions, emails to customers) add a human-approval step.
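A simplified sketch of that gate, reusing the tool definitions from the earlier sketches. The handlers and approval hook are stubs; a real system would route approvals to a review queue rather than blocking outright.

```python
from jsonschema import ValidationError, validate

# Stub handlers and approval hook standing in for real integrations.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

def start_refund(order_id: str, amount_usd: float, reason: str) -> dict:
    return {"order_id": order_id, "refunded_usd": amount_usd, "reason": reason}

def request_human_approval(name: str, arguments: dict) -> bool:
    return False  # stand-in: a real system routes this to a review queue

# Allow-list: tool name -> (input schema, handler, needs human approval)
TOOL_REGISTRY = {
    "get_order_status": (get_order_status_tool["input_schema"], lookup_order, False),
    "initiate_refund": (initiate_refund_tool["input_schema"], start_refund, True),
}

def execute_tool_call(name: str, arguments: dict) -> dict:
    """Validate a model-issued tool call before running anything."""
    if name not in TOOL_REGISTRY:                    # allow-list check
        return {"error": f"unknown tool: {name}"}
    schema, handler, needs_approval = TOOL_REGISTRY[name]
    try:
        validate(instance=arguments, schema=schema)  # parameter schema check
    except ValidationError as exc:
        return {"error": f"invalid arguments: {exc.message}"}
    if needs_approval and not request_human_approval(name, arguments):
        return {"error": "blocked pending human approval"}
    return handler(**arguments)
```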

In Practice

Tool use support is now standard across frontier models. Anthropic's tool-use API (on Claude 3.5 Sonnet, Claude Opus 4, and Haiku) accepts a tools array with JSON Schema definitions and returns structured tool_use blocks. OpenAI's function calling and the newer Responses API work similarly. Gemini's function calling and open models served through vLLM's tool-calling support follow the same pattern. MCP servers provide cross-provider tool definitions.

Typical configuration: 5-15 tools per agent (more than 20 and model accuracy on tool selection drops measurably), strict JSON Schema with required fields and enum-constrained values, detailed tool descriptions of 50-150 words each, and parallel tool use enabled when tools are independent. Tool call timeouts: 5-10 seconds for simple API calls, 30-60 seconds for complex operations. Retries with exponential backoff for transient failures. Structured output validation on every tool response before feeding it back to the model.
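A minimal sketch of the timeout-and-retry part, assuming the tool is backed by an HTTP API; the retry policy here (back off only on transport errors) is deliberately simple.

```python
import random
import time

import httpx

def call_tool_api(url: str, payload: dict, timeout_s: float = 10.0,
                  max_attempts: int = 3) -> dict:
    """Call a tool's backing HTTP API with a timeout and exponential backoff."""
    for attempt in range(max_attempts):
        try:
            resp = httpx.post(url, json=payload, timeout=timeout_s)
            resp.raise_for_status()  # non-2xx responses surface immediately
            return resp.json()
        except httpx.TransportError:
            # Covers timeouts and connection failures, the transient cases
            # worth retrying; give up after the last attempt.
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt + random.random())  # ~1s, ~2s, ~4s + jitter
```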

A working pattern. Define tools with Pydantic (Python) or Zod (TypeScript) schemas, which then generate JSON Schema for the model. On each user turn, send tools and messages to the model. If the model emits tool_use blocks, execute each tool (in parallel where possible), validate the response, and send results back as tool_result messages. Continue the loop until the model responds with text only. Log every tool call with arguments, result, latency, and any errors to Langfuse or LangSmith. Set a max_tool_uses budget per turn (often 10-15) to prevent runaway loops.
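A compressed sketch of that loop against Anthropic's Messages API, reusing the tool definitions and the execute_tool_call gate from the earlier sketches; Pydantic schema generation, parallel execution, and logging are left out for brevity.

```python
import anthropic

client = anthropic.Anthropic()
TOOLS = [get_order_status_tool, initiate_refund_tool]  # from earlier sketches
MAX_TOOL_USES = 10                                      # per-turn budget

def run_turn(messages: list[dict]) -> str:
    tool_uses = 0
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # example model id
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        if not tool_calls or tool_uses >= MAX_TOOL_USES:
            # Text-only response (or budget exhausted): return the text blocks.
            return "".join(b.text for b in response.content if b.type == "text")

        # Echo the assistant turn, then answer each tool call with a tool_result.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for call in tool_calls:
            tool_uses += 1
            output = execute_tool_call(call.name, call.input)  # validated execution
            results.append({
                "type": "tool_result",
                "tool_use_id": call.id,
                "content": str(output),
            })
        messages.append({"role": "user", "content": results})
```

The budget check here is deliberately crude; production loops typically also cap wall-clock time and token spend per turn.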

Worked Example

A B2B software company builds a sales-operations agent that handles account-research requests from AEs. A rep asks, "what's the deal history and current usage with Acme Corp, and when was our last exec touch?"

The agent, powered by Claude 3.5 Sonnet with 6 tools registered, decides to call 3 tools in parallel: get_salesforce_account("Acme Corp"), get_product_usage("Acme Corp", days=90), and get_exec_meetings("Acme Corp", months=12). Each tool hits an internal API backed by Salesforce, Segment, and Gong respectively. All three responses return in about 900ms (they run concurrently).
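A sketch of the concurrent fan-out, with async stubs standing in for the three internal APIs; the function names mirror the tools above, and the stub data echoes the figures in this example.

```python
import asyncio

# Hypothetical async stubs standing in for the Salesforce-, Segment-, and
# Gong-backed internal APIs named above.
async def get_salesforce_account(account: str) -> dict:
    return {"account": account, "arr_usd": 240_000, "tier": "Enterprise"}

async def get_product_usage(account: str, days: int) -> dict:
    return {"account": account, "usage_change_pct": -22, "window_days": days}

async def get_exec_meetings(account: str, months: int) -> dict:
    return {"account": account, "last_exec_meeting": "2026-02-03"}

async def research_account(account: str) -> list[dict]:
    # Independent calls run concurrently, so wall-clock latency is roughly
    # the slowest single call (~900 ms here) rather than the sum of all three.
    return await asyncio.gather(
        get_salesforce_account(account),
        get_product_usage(account, days=90),
        get_exec_meetings(account, months=12),
    )

results = asyncio.run(research_account("Acme Corp"))
```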

The model reads all three tool results and composes a reply: "Acme Corp is a $240k ARR customer on the Enterprise tier since 2023. Usage on the core product dropped 22% in the last 30 days (see attached chart). The last exec meeting was your VP Sales with their CRO on Feb 3, 2026, per Gong. Recommend scheduling a check-in before quarter end." The response cites specific numbers that came from each tool, not from the model's training data.

Before the response is delivered, a guardrail node verifies that every number in the reply appears in one of the tool results. Numbers that don't match are flagged. In production, this system handles about 400 rep requests per day, averaging 2.3 tool calls per request and a per-request cost of $0.04 on Claude Sonnet. The tools are defined once and shared across 4 other internal agents via an MCP server, which means the sales ops team didn't rebuild integrations when the customer success team launched their own agent.
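The guardrail itself can be as simple as extracting numbers from the reply and checking each one against the raw tool results. This sketch skips the normalization (currencies, percentages, "240k" vs "240000") a production version needs.

```python
import re

def unsupported_numbers(reply: str, tool_results: list[str]) -> set[str]:
    """Flag numbers in the reply that don't appear in any tool result."""
    number_pattern = re.compile(r"\d+(?:\.\d+)?")
    reply_numbers = set(number_pattern.findall(reply))
    supported: set[str] = set()
    for result in tool_results:
        supported |= set(number_pattern.findall(result))
    return reply_numbers - supported  # anything left here gets flagged
```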

What People Get Wrong

Myth

Tool use and function calling are different things.

Reality

They're the same mechanism with different vendor names. OpenAI called it function calling first; Anthropic and Google call it tool use. The underlying concept is identical: the model returns a structured call against a declared function signature, your system executes it, and the result goes back to the model. Some modern APIs add parallel tool use or structured outputs on top, but the core pattern is shared across providers.

Myth

More tools always make an agent more capable.

Reality

Past 15-20 tools, model accuracy on tool selection starts to degrade noticeably. The model gets confused about which tool to use when descriptions overlap or when the tool set is too broad. Better: fewer, well-scoped tools with sharp descriptions. If an agent genuinely needs 50 capabilities, structure them as a hierarchy (a few high-level routing tools that then reveal sub-tools) or split into specialized sub-agents.
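One way to sketch that hierarchy: the model always sees a small routing tool, and domain-specific sub-tools are only registered after it routes. All names here are hypothetical.

```python
# Hypothetical two-level catalog: a router tool the model always sees,
# plus domain-specific sub-tools revealed only after routing.
TOOL_GROUPS = {
    "billing": ["get_invoice", "initiate_refund", "update_payment_method"],
    "shipping": ["get_order_status", "update_shipping_address"],
    "account": ["get_account_profile", "reset_password"],
}

open_toolset_tool = {
    "name": "open_toolset",
    "description": "Load the tools for one domain before acting in it.",
    "input_schema": {
        "type": "object",
        "properties": {
            "domain": {"type": "string", "enum": list(TOOL_GROUPS)},
        },
        "required": ["domain"],
    },
}

def tool_names_for_turn(active_domain: str | None) -> list[str]:
    # The router is always available; sub-tools appear only after routing.
    return [open_toolset_tool["name"]] + TOOL_GROUPS.get(active_domain, [])
```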

Myth

If the model calls a tool correctly, the answer is automatically right.

Reality

The model can call the right tool with wrong arguments, misinterpret the tool's response, or ignore parts of the response it doesn't understand. Production systems validate tool outputs before the model uses them and often run a reflection step where the model (or a second model) checks that the final answer is consistent with the tool results. Tool use reduces hallucination on factual lookups but doesn't eliminate it.

Related Solutions

AI Agent Development
Enterprise AI Integration
Agentic Automation

Need help implementing this?

We build production AI systems for enterprises. Tell us what you are working on and we will scope it in 30 minutes.