Embedding (AI)
An embedding is a numerical representation of data (text, images, audio) as a vector of numbers. Embeddings capture the semantic meaning of content so that similar items have similar vectors, enabling AI systems to search, compare, and cluster information by meaning.

How It Works
Computers work with numbers, not meaning. Embeddings bridge that gap. When you pass a sentence through an embedding model, you get back a list of numbers (typically 768 to 3072 dimensions) that represents what that sentence means. Similar sentences get similar numbers.
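A minimal sketch of that call, assuming the OpenAI Python SDK and an API key in the environment; the model name and input are examples, not a recommendation:

```python
# Minimal sketch: turn a sentence into an embedding vector.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",   # returns 1536-dim vectors
    input="What was our annual revenue last year?",
)

vector = response.data[0].embedding
print(len(vector))  # 1536 numbers encoding what the sentence means
```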
This is what makes semantic search possible. Instead of matching keywords, you compare the embedding of a query against the embeddings of your documents. "Annual revenue" and "yearly income" would match because their embeddings are close together, even though they share no words.
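To make "close together" concrete, here is a sketch comparing those phrases with cosine similarity (NumPy for the math; exact scores depend on the model, so the comments are illustrative):

```python
# Sketch: compare phrases by cosine similarity of their embeddings.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embed("annual revenue"), embed("yearly income")))  # high: near-synonyms
print(cosine(embed("annual revenue"), embed("cat photos")))     # much lower
```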
Embedding models are trained on large datasets to learn these representations. Commercial options include OpenAI's text-embedding-3-large (3072-dim), Cohere embed-v3 (1024-dim), and Voyage AI's voyage-3 (1024-dim). Open models include BGE-large (1024-dim), E5-mistral (4096-dim), and Nomic Embed. The choice of model affects the quality of your search and retrieval. Better models capture more nuance but may be slower or more expensive to run.
In practice, embeddings are the foundation of any RAG system. Your documents get embedded once and stored in a vector database. Each user query gets embedded at runtime and compared against the stored vectors using cosine similarity or dot product. The quality of this embedding step directly affects whether the right documents get retrieved.
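A compressed sketch of that pipeline, with the vector database replaced by an in-memory NumPy matrix to keep it self-contained; the documents and query are placeholders:

```python
# Sketch: embed documents once at ingestion, embed each query at runtime,
# retrieve by cosine similarity. A real system stores doc_matrix in a vector DB.
import numpy as np
from openai import OpenAI

client = OpenAI()
MODEL = "text-embedding-3-small"

docs = [
    "Q3 revenue grew 12% year over year.",
    "The office closes early on Fridays.",
    "Net income declined due to one-time charges.",
]

def embed_batch(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model=MODEL, input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize

doc_matrix = embed_batch(docs)                 # ingestion: run once, store

query_vec = embed_batch(["yearly income"])[0]  # runtime: per query
scores = doc_matrix @ query_vec                # dot product = cosine (normalized)
for i in np.argsort(scores)[::-1][:2]:
    print(f"{scores[i]:.3f}  {docs[i]}")
```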
Beyond search, embeddings are used for clustering (grouping similar documents), classification (categorizing content with a small classifier on top), anomaly detection (finding outliers in vector space), and recommendation systems (suggesting similar items by nearest-neighbor lookup).
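As one example beyond search, a sketch of clustering with scikit-learn's KMeans, reusing the embed_batch helper from the retrieval sketch above; the cluster count is a guess you would tune:

```python
# Sketch: group documents by meaning with k-means over their embeddings.
# Reuses embed_batch from the retrieval sketch; n_clusters is an assumption.
from sklearn.cluster import KMeans

docs = [
    "Q3 revenue grew 12% year over year.",
    "Net income declined due to one-time charges.",
    "The office closes early on Fridays.",
    "Parking passes renew in January.",
]
doc_matrix = embed_batch(docs)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_matrix)
for doc, label in zip(docs, labels):
    print(label, doc)  # expect financial docs in one cluster, logistics in the other
```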
One nuance that trips teams up: embedding models have their own context limits (often 512 or 8192 tokens) and quirks. Two embedding models from different providers produce incompatible vectors. If you switch embedding models, you have to re-embed your entire corpus. Pick carefully upfront because the migration cost is real. Also, embeddings trained on general web text may not capture domain-specific meaning well. For legal, medical, or highly technical corpora, a domain-tuned embedding model often beats a general one by a wide margin on retrieval benchmarks.
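A sketch of guarding the context limit before ingestion, assuming tiktoken's cl100k_base encoding and an 8192-token limit; both the tokenizer and the limit vary by model, so check your provider's documentation:

```python
# Sketch: reject or split chunks that exceed the embedding model's token limit.
# cl100k_base and 8192 are assumptions for OpenAI's text-embedding-3 family.
import tiktoken

MAX_TOKENS = 8192
enc = tiktoken.get_encoding("cl100k_base")

def token_count(chunk: str) -> int:
    return len(enc.encode(chunk))

chunk = "Annual revenue for fiscal 2023, as reported in the 10-K filing..."
if token_count(chunk) > MAX_TOKENS:
    print("Chunk too long: split it before embedding.")
```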
In Practice
Most production RAG stacks use OpenAI's text-embedding-3-large or text-embedding-3-small (1536-dim at $0.02 per million input tokens); Cohere embed-v3, which ships dedicated variants for English, multilingual, and code; or open models like BGE-large and E5 served via Hugging Face Text Embeddings Inference on a single GPU. Specialty embeddings from Voyage AI are popular for code, legal, and financial documents.
Typical configuration: 1024 or 1536 dimensions (matryoshka models let you truncate at query time), normalized vectors for cosine similarity, batch sizes of 64-256 at ingestion for throughput, and a re-embed cadence of once per document version rather than on every change. Cost at scale: embedding 10 million chunks at 512 tokens each with text-embedding-3-small runs about $100 (5.12 billion tokens at $0.02 per million). The same corpus on Voyage code embeddings costs more but can lift retrieval recall by 5-15% on code-heavy queries.
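A sketch of two of those settings together, assuming OpenAI's text-embedding-3 models, which accept a dimensions parameter for matryoshka-style truncation at embed time:

```python
# Sketch: batched ingestion with truncated (matryoshka) embeddings.
# The text-embedding-3 models accept a `dimensions` parameter; 1024 and
# a batch size of 128 follow the ranges suggested above, not hard rules.
from openai import OpenAI

client = OpenAI()
BATCH_SIZE = 128

def embed_corpus(chunks: list[str]) -> list[list[float]]:
    vectors = []
    for i in range(0, len(chunks), BATCH_SIZE):
        resp = client.embeddings.create(
            model="text-embedding-3-small",
            input=chunks[i : i + BATCH_SIZE],
            dimensions=1024,  # truncate at embed time instead of query time
        )
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```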
A common evaluation workflow: build a labeled test set of 200-500 query-document pairs representative of real user questions. Run the same set through two or three candidate embedding models. Measure recall@5 and MRR (mean reciprocal rank). Pick the model that wins the most questions, not just the highest average score. Re-run the eval quarterly as embedding models get updated, and never switch models in production without re-embedding the full index.
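A sketch of that measurement loop, assuming each query in the test set has one labeled relevant document and a retrieve(query, k) function backed by the candidate model; both are placeholders for your own setup:

```python
# Sketch: recall@5 and MRR over a labeled eval set. `retrieve` is a placeholder
# for search backed by one candidate embedding model; `eval_set` maps each
# query string to the id of its labeled relevant document.
def evaluate(eval_set: dict[str, str], retrieve, k: int = 5):
    hits, reciprocal_ranks = 0, []
    for query, relevant_id in eval_set.items():
        results = retrieve(query, k)  # ranked list of document ids
        if relevant_id in results:
            hits += 1
            reciprocal_ranks.append(1.0 / (results.index(relevant_id) + 1))
        else:
            reciprocal_ranks.append(0.0)
    return hits / len(eval_set), sum(reciprocal_ranks) / len(eval_set)

# recall_at_5, mrr = evaluate(eval_set, retrieve)  # run once per candidate model
```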
Worked Example
A legal tech company builds an internal search over 2.3 million case documents for litigation discovery. The first build uses text-embedding-3-small (1536-dim) across the corpus, with LlamaIndex-generated chunks of 512 tokens and 50-token overlap.
Initial retrieval on a curated eval set of 400 attorney queries gets recall@5 of 0.68. Not great. The team suspects the general-purpose embedding model is missing legal-specific meaning, especially around procedural terms and jurisdictional phrasing. They run the same 400 queries against Voyage AI's voyage-law-2 embeddings (1024-dim, trained on legal text). Recall@5 jumps to 0.84. MRR improves from 0.51 to 0.72.
The tradeoff: voyage-law-2 costs more per call than OpenAI's model, and the team has to re-embed all 2.3 million chunks. Total re-embedding cost: about $3,800 and 6 hours on a batched ingestion job. Ongoing query costs increase by roughly 30%. The quality gain is worth it: attorneys find the right case on the first page for 84% of queries instead of 68%, which is the difference between the tool being used and being ignored. The re-embedding cost was recovered in the first month of higher engagement.
What People Get Wrong
Myth
Higher-dimension embeddings are always better.
Reality
Dimensionality is a trade-off between precision and cost. 1536-dim often matches or beats 3072-dim on standard retrieval benchmarks while using half the storage and compute. Matryoshka embeddings let you truncate a trained vector to a smaller size at negligible quality cost. Start at 1024 or 1536 and only go higher if your eval shows a measurable gain.
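A sketch of that truncation done manually, assuming the source model was matryoshka-trained; truncating a non-matryoshka embedding this way loses much more quality, so verify against your eval set:

```python
# Sketch: truncate a matryoshka embedding and renormalize for cosine similarity.
# Only sound for matryoshka-trained models; the 1536 -> 256 sizes are examples.
import numpy as np

def truncate(vec: np.ndarray, dims: int) -> np.ndarray:
    short = vec[:dims]
    return short / np.linalg.norm(short)

full = np.random.randn(1536)        # stand-in for a real embedding
full /= np.linalg.norm(full)
small = truncate(full, 256)         # 6x less storage and faster search
```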
Myth
Embedding models from different providers are interchangeable.
Reality
They're not. Vectors from OpenAI and Cohere live in different spaces. You can't query one with the other. Switching embedding models means re-embedding your entire corpus, which costs real money and engineering time for large indexes. Pick your embedding model with migration cost in mind.
Myth
General-purpose embeddings work fine for any domain.
Reality
They work okay for most domains and poorly for some. Code, legal text, medical notes, and scientific papers all have specialized terminology that general models capture weakly. Domain-tuned embeddings (voyage-law, voyage-code, PubMedBERT) typically beat general models by 5-20% on domain-specific retrieval tasks. Always eval against a domain test set before committing.