Glossary

Token (LLM)

A token is the basic unit of text that a large language model processes. It can be a word, part of a word, or a punctuation mark. Language models read, process, and generate text as sequences of tokens, and API pricing is based on the number of tokens used.

How It Works

Language models do not read text the way humans do. They break text into tokens, which are typically 3-4 characters long. The word "understanding" might be two tokens: "under" and "standing." Common words like "the" are a single token. Rare or technical terms get split into more tokens.
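The splitting described above can be sketched with a toy greedy longest-match tokenizer. The vocabulary below is invented purely for illustration; real tokenizers (typically byte-pair encoding) learn their vocabularies from large corpora.

```python
# Toy greedy longest-match tokenizer over a tiny, hand-picked vocabulary.
# Real tokenizers (e.g. BPE) learn merges from data; this is illustrative only.
VOCAB = {"under", "standing", "stand", "ing", "the", "token"}

def tokenize(word: str) -> list[str]:
    """Split a word greedily into the longest known vocabulary pieces."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own single-character token.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("understanding"))  # → ['under', 'standing']
print(tokenize("the"))            # → ['the']
```

A rare word falls through to many small pieces, which is exactly why technical terms consume more tokens than common ones.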

This matters for two practical reasons. First, pricing. API providers charge per token, with separate rates for input tokens (your prompt) and output tokens (the model's response). A long prompt with lots of context costs more than a short one. Understanding your token usage is essential for managing AI costs at scale.
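The pricing arithmetic is simple to put in code. The per-1,000-token rates below are made-up placeholders, not any provider's actual prices; substitute your provider's current rates.

```python
# Sketch of per-token cost accounting. The rates are hypothetical
# placeholders; check your provider's current pricing page.
INPUT_RATE_PER_1K = 0.003    # USD per 1,000 input tokens (assumed)
OUTPUT_RATE_PER_1K = 0.015   # USD per 1,000 output tokens (assumed)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one API call at the assumed rates."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_RATE_PER_1K

# A typical call: 3,000 input tokens, 1,000 output tokens.
print(f"${call_cost(3000, 1000):.4f}")  # → $0.0240
```

Note that output tokens usually cost several times more than input tokens, so verbose responses dominate the bill faster than long prompts do.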

Second, token limits. Every model has a maximum number of tokens it can handle in a single request (its context window). This includes both your input and the model's output. If your prompt is too long, you need to shorten it, use a model with a larger context window, or redesign your approach.
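Because the window covers input and output together, a fit check has to account for both. A minimal sketch, using a hypothetical 8,192-token model:

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Both the prompt and the reserved output must fit in the window."""
    return prompt_tokens + max_output_tokens <= context_window

# Hypothetical 8,192-token context window:
print(fits_context(6000, 2000, 8192))  # → True
print(fits_context(7000, 2000, 8192))  # → False
```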

In RAG systems, token management is a key design decision. You need to fit the user's question, the retrieved documents, the system prompt, and leave room for the model's response, all within the context window. This is why chunking strategy and retrieval count matter: you want to maximize relevant context without hitting the token limit.
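One common way to enforce that budget is to reserve tokens for the fixed parts (system prompt, question, response) and then pack retrieved chunks, in relevance order, until the remainder is exhausted. The numbers below are stand-ins; in practice you would count tokens with your model's actual tokenizer.

```python
# Sketch of a RAG token budget: pack as many retrieved chunks as fit
# after reserving room for the system prompt, question, and response.
# All token counts here are illustrative stand-ins.

def select_chunks(chunk_token_counts: list[int], context_window: int,
                  system_tokens: int, question_tokens: int,
                  reserved_output: int) -> list[int]:
    """Return indices of chunks (in retrieval order) that fit the budget."""
    budget = context_window - system_tokens - question_tokens - reserved_output
    chosen, used = [], 0
    for i, n in enumerate(chunk_token_counts):
        if used + n > budget:
            break  # stop at the first chunk that would overflow the budget
        chosen.append(i)
        used += n
    return chosen

# 8,192-token window, 500-token system prompt, 100-token question,
# 1,000 tokens reserved for the answer → 6,592 tokens left for chunks.
print(select_chunks([3000, 2500, 1500, 1000], 8192, 500, 100, 1000))  # → [0, 1]
```

This is also where chunk size shows up as a design lever: with 1,000-token chunks instead of 3,000-token ones, the same budget holds more distinct documents.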

As a rough rule of thumb, 1 token is about 0.75 words in English. A 1000-word document is roughly 1300 tokens. A typical enterprise LLM call might use 2000-5000 tokens for input and 500-2000 tokens for output.
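The rule of thumb above translates directly into a back-of-envelope estimator. For real counts, use the tokenizer that ships with your model; this is only an approximation for English prose.

```python
# Back-of-envelope token estimate using the ~0.75 words-per-token rule.
# Approximate and English-only; real counts require the model's tokenizer.

def estimate_tokens(word_count: int) -> int:
    """Approximate token count from an English word count."""
    return round(word_count / 0.75)

print(estimate_tokens(1000))  # → 1333
```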

Related Solutions

Generative AI Applications
AI Agent Development

Need help implementing this?

We build production AI systems for enterprises. Tell us what you are working on and we will scope it in 30 minutes.