Glossary

Multi-Agent Systems

A multi-agent system is an architecture where multiple AI agents, each with a specific role or skill, collaborate to accomplish a larger task. Agents can delegate work to each other, share context, and coordinate their actions.

How It Works

A single AI agent handles one workflow well. But some tasks are too complex or too broad for one agent. Multi-agent systems split the work across specialized agents that each handle a piece of the problem.

Consider an example: processing a complex insurance claim. One agent reads and extracts data from the claim documents. Another agent checks the extracted data against policy rules. A third agent calculates the payout. A supervisor agent coordinates the whole flow and handles exceptions. Each agent is simpler and more reliable than one giant agent trying to do everything.

The coordination layer is what makes multi-agent systems different from just running multiple independent agents. Agents pass messages, share state, and follow a defined workflow. Architectures vary: centralized (a supervisor agent assigns tasks to specialists), hierarchical (supervisors of supervisors for deep workflows), and decentralized (peer agents negotiate and hand off work directly). Centralized designs are easier to debug. Decentralized designs can tolerate individual agent failures better but are harder to reason about.
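
As a minimal sketch of the centralized pattern, here is the claim example above expressed as a supervisor that calls specialists in a fixed order and handles exceptions itself. The specialist functions are placeholder stubs standing in for LLM-backed agents, and the ClaimState fields are illustrative rather than taken from any framework.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimState:
    """Shared state the supervisor passes between specialists."""
    documents: list[str]
    extracted: dict = field(default_factory=dict)
    policy_check: dict = field(default_factory=dict)
    payout: float | None = None
    exceptions: list[str] = field(default_factory=list)


def extract_claim_data(documents: list[str]) -> dict:
    return {"claimed_amount": 1200.0}                # stub: a real document agent parses the docs


def check_policy(extracted: dict) -> dict:
    return {"covered": True, "deductible": 500.0}    # stub: a real policy agent checks coverage rules


def calculate_payout(extracted: dict, policy: dict) -> float:
    return max(extracted["claimed_amount"] - policy["deductible"], 0.0)


def supervisor(state: ClaimState) -> ClaimState:
    """Centralized coordinator: calls each specialist in a fixed order and
    handles exceptions itself instead of letting agents negotiate."""
    state.extracted = extract_claim_data(state.documents)
    state.policy_check = check_policy(state.extracted)
    if not state.policy_check.get("covered", False):
        state.exceptions.append("not covered; route to human review")
        return state
    state.payout = calculate_payout(state.extracted, state.policy_check)
    return state
```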

Multi-agent systems work well for enterprise workflows that cross departments or require different types of expertise. Procurement processes, audit workflows, loan underwriting, and customer onboarding journeys typically involve multiple checks and approvals in sequence, which map cleanly onto specialized agents.

The tradeoff is complexity. Multi-agent systems are harder to build, test, and debug than single agents. You need clear contracts between agents (often defined as JSON schemas for messages), good observability to understand what's happening when something goes wrong, and careful cost management since every handoff adds LLM calls. A multi-agent flow with 5 agents and 3 turns each burns 15 LLM calls per task instead of 3.
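
A contract between agents might look like the following Pydantic sketch. The field names and the HandoffMessage envelope are illustrative, not a standard.

```python
from pydantic import BaseModel, Field


class ExtractedClaim(BaseModel):
    claim_id: str
    policy_number: str
    incident_date: str
    claimed_amount: float = Field(ge=0)


class HandoffMessage(BaseModel):
    """Envelope every agent-to-agent message must fit; the receiving
    agent validates it before doing any work."""
    sender: str
    recipient: str
    task: str
    payload: ExtractedClaim


# Receiving side: HandoffMessage.model_validate(raw_json) raises on a malformed handoff.
```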

When not to use multi-agent: when a single agent with a few tools can do the job. Teams often reach for multi-agent architectures because they look sophisticated, then spend months debugging agent-to-agent communication that a single well-scoped agent would have handled in half the code. The bar to add another agent should be "this specialization has a clear, testable boundary," not "this sounds cooler as a multi-agent system." Start single-agent. Split only when the single agent is genuinely struggling or when you need parallel execution across independent subtasks.

In Practice

Multi-agent frameworks include CrewAI (role-based agents with task assignments), AutoGen from Microsoft (conversational agents with group chat patterns), LangGraph (state-graph-based multi-agent flows), and OpenAI Swarm (experimental lightweight handoffs). Anthropic's Claude Sonnet and Claude Opus and OpenAI's GPT-4o are common backbones, with Claude Haiku or GPT-4o mini used for simpler specialist agents to control cost.

Typical configuration: 3-7 specialized agents per system (more than that usually means you haven't factored the problem cleanly), explicit handoff protocols defined as Pydantic schemas, shared state stored in a Redis or Postgres-backed memory layer, and per-agent step budgets of 5-10 iterations. Cost per multi-agent task often runs 5-20x a single-agent baseline, so teams set budget caps in the orchestrator and terminate tasks that exceed them.
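
A budget configuration along these lines could look like the sketch below. The class names are hypothetical and the dollar values are illustrative; the step budget reflects the 5-10 iteration range mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBudget:
    max_steps: int = 8        # per-agent iteration cap (5-10 is typical)
    max_usd: float = 0.50     # per-agent spend cap (illustrative value)


@dataclass
class OrchestratorConfig:
    max_agents: int = 7                   # more usually means the problem isn't factored cleanly
    task_budget_usd: float = 5.00         # hard cap for the whole task (illustrative value)
    default_budget: AgentBudget = field(default_factory=AgentBudget)


def should_terminate(spent_usd: float, config: OrchestratorConfig) -> bool:
    """Checked after every handoff; stops runaway multi-agent runs."""
    return spent_usd >= config.task_budget_usd
```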

A typical working architecture: a supervisor agent reads the inbound task and decomposes it into subtasks. Each subtask is dispatched to a specialist agent via a typed message. Specialists complete their subtasks in parallel where possible and write results to shared state. The supervisor reads the results, checks for conflicts or gaps, and either dispatches follow-up work or composes the final output. Everything is traced in LangSmith or Langfuse with agent-level spans so you can see who called whom, how many tokens were spent, and where a run went wrong. Failure handling at agent boundaries matters: a specialist timing out should not bring the whole system down.
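
Stripped of any particular framework, that dispatch loop might look like the plain asyncio sketch below. The specialist coroutines are illustrative placeholders for LLM-backed calls, and a real supervisor would compose a final output rather than just collecting results.

```python
import asyncio


async def run_specialist(name: str, coro, timeout_s: float = 60.0) -> dict:
    """Run one specialist with a timeout so a hung agent cannot stall the whole run."""
    try:
        return {"agent": name, "ok": True, "result": await asyncio.wait_for(coro, timeout_s)}
    except asyncio.TimeoutError:
        return {"agent": name, "ok": False, "error": "timeout"}
    except Exception as exc:  # contain the failure at the agent boundary
        return {"agent": name, "ok": False, "error": str(exc)}


async def supervise(subtasks: dict) -> dict:
    """subtasks maps an agent name to a coroutine produced by the supervisor's
    decomposition step (an LLM call in a real system)."""
    results = await asyncio.gather(
        *[run_specialist(name, coro) for name, coro in subtasks.items()]
    )
    failed = [r for r in results if not r["ok"]]
    if failed:
        return {"status": "needs_followup", "failed": failed, "results": results}
    return {"status": "done", "results": results}


# Example wiring with stub specialists:
async def _stub(result):
    await asyncio.sleep(0.01)
    return result

# asyncio.run(supervise({"documents": _stub({"amount": 1200}), "policy": _stub({"covered": True})}))
```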

Worked Example

A commercial insurance carrier deploys a multi-agent system for small-business claims under $25k. A claim arrives with scanned documents, a description, and photos. The orchestrator (built on LangGraph) dispatches the claim to a supervisor agent.

The supervisor calls four specialists in parallel. Document Agent (Claude Haiku) extracts structured data from the scanned forms using OCR plus LLM parsing. Policy Agent (Claude Sonnet) retrieves the policyholder's coverage from the policy DB and checks deductibles and exclusions. Photo Agent (Claude Opus with vision) analyzes the damage photos and produces a preliminary damage estimate. Fraud Agent (a fine-tuned smaller model) scores the claim against historical fraud patterns.

The supervisor receives all four results, reconciles them, and applies a decision rule. Claims with consistent specialist outputs and a fraud score under 0.2 are auto-approved and sent to the payment agent. Claims with conflicts or fraud scores above 0.5 are routed to a human adjuster with a summary of what each specialist found. Claims in the middle go to a second-pass review with an additional document agent checking for missing paperwork.
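
Written out as code, the supervisor's triage rule might look like this. The thresholds come from the description above; the outputs_consistent flag stands in for whatever reconciliation check the supervisor runs.

```python
def route_claim(fraud_score: float, outputs_consistent: bool) -> str:
    if outputs_consistent and fraud_score < 0.2:
        return "auto_approve"        # hand off to the payment agent
    if (not outputs_consistent) or fraud_score > 0.5:
        return "human_adjuster"      # include a summary of each specialist's findings
    return "second_pass_review"      # extra document agent checks for missing paperwork
```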

On 4,200 pilot claims, roughly 38% were auto-approved in under 3 minutes, averaging $0.42 in compute. Another 22% were auto-rejected with clear reasons. The remaining 40% went to human review, but with all specialist outputs already prepared, which cut adjuster time per claim from 28 minutes to 9. Cycle time dropped from 6 days to 18 hours for the auto-approved tier.

What People Get Wrong

Myth

Multi-agent systems are always better than single agents.

Reality

They're often worse. More agents means more LLM calls, more places to fail, and harder debugging. Single well-scoped agents with good tools beat over-engineered multi-agent flows on most enterprise tasks. The rule: reach for multi-agent only when you have genuine specialization (different skills, different contexts, different models) or a need for parallel execution. If one agent can do the work sequentially, use one agent.

Myth

Agents can negotiate and figure out coordination on their own.

Reality

Emergent coordination between LLM agents is unreliable in practice. Production multi-agent systems use explicit protocols: defined message schemas, named roles, clear handoff rules, and a supervisor or state graph that controls the flow. Agents that chat to decide who does what tend to loop, get confused, or both. The coordination layer needs to be engineered, not hoped for.

Myth

Adding agents makes the system more reliable through redundancy.

Reality

Usually the opposite. Every agent handoff is another place for errors to creep in. Messages get misinterpreted, specialists work on stale state, and a single failure can cascade. Reliability in multi-agent systems comes from good contracts, strict input validation at every boundary, and explicit fallback paths, not from just having more agents.
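
A sketch of what strict validation plus an explicit fallback can look like at one handoff boundary. The SpecialistResult schema and the escalate callback are illustrative, not a standard.

```python
from pydantic import BaseModel, ValidationError


class SpecialistResult(BaseModel):
    agent: str
    claim_id: str
    payout_estimate: float


def accept_result(raw: dict, escalate):
    """Validate a specialist's output before the next agent consumes it; fall back
    instead of letting a malformed or stale message cascade downstream."""
    try:
        return SpecialistResult.model_validate(raw)
    except ValidationError as err:
        return escalate(reason="invalid specialist output", detail=str(err))
```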

Related Solutions

Multi-Agent Systems
Agentic Automation

Need help implementing this?

We build production AI systems for enterprises. Tell us what you are working on and we will scope it in 30 minutes.