Glossary

Vector Database

A vector database stores data as high-dimensional numerical vectors (embeddings) and enables fast similarity search across them. It is the core infrastructure behind RAG systems, semantic search, and recommendation engines in AI applications.

How It Works

Traditional databases search by exact matches. You query for a specific ID, keyword, or value. Vector databases work differently. They search by meaning. You give them a vector representing a concept, and they find the most similar vectors in the database.

This works because AI embedding models convert text (or images, or audio) into numerical vectors where similar items end up close together in the vector space. The sentence "how do I reset my password?" and "I forgot my login credentials" would have vectors that are near each other, even though they share no keywords.
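
To make that concrete, here is a minimal sketch using the OpenAI Python SDK (any embedding provider works the same way; the model name matches the one used in the worked example further down):

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["how do I reset my password?", "I forgot my login credentials"],
)
a, b = (np.array(d.embedding) for d in resp.data)

# Cosine similarity: close to 1.0 for near-identical meaning, near 0 for unrelated text
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"{similarity:.3f}")  # lands well above typical unrelated-pair scores
```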

In a RAG system, your documents get split into chunks, each chunk gets converted to a vector, and those vectors get stored in the database. When a user asks a question, the question gets converted to a vector too, and the database returns the closest matching chunks. Those chunks then go to the LLM as context for generating the answer.
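
The whole loop fits in a few lines. The sketch below uses a toy stand-in for the embedding model so it runs on its own; in a real system, both chunks and queries go through the same embedding API:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in so the sketch is self-contained; in production this is a
    # call to an embedding API, and the same model must embed chunks and queries.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(1536)
    return v / np.linalg.norm(v)

chunks = [
    "To reset your password, open Settings > Security.",
    "Invoices are emailed on the first of each month.",
    "Two-factor authentication lives under Security as well.",
]
index = np.stack([embed(c) for c in chunks])  # one row per chunk vector

query = embed("I forgot my login credentials")
scores = index @ query                   # cosine scores (all vectors are unit-norm)
top_k = np.argsort(scores)[::-1][:2]     # the two closest chunks
context = [chunks[i] for i in top_k]     # sent to the LLM alongside the question
```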

Under the hood, vector databases use approximate nearest neighbor (ANN) algorithms to find similar vectors without comparing the query against every vector in the database. HNSW (Hierarchical Navigable Small World) is the dominant algorithm, used by Pinecone, Qdrant, Weaviate, and Milvus. IVF (Inverted File Index) and product quantization are used for larger-scale deployments where memory matters. The tradeoff: faster search means slightly lower recall. Most systems tune this by adjusting parameters like ef_construction and M (for HNSW) against a labeled eval set.
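
Here is where those knobs live, shown against the standalone hnswlib library (an illustration of the algorithm, not any particular database's API):

```python
import hnswlib
import numpy as np

dim, n = 1536, 10_000
vectors = np.random.rand(n, dim).astype(np.float32)  # stand-ins for real embeddings

index = hnswlib.Index(space="cosine", dim=dim)
# M = graph connectivity, ef_construction = build-time search width.
# Higher values mean better recall but slower builds and more memory.
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(vectors, np.arange(n))

index.set_ef(64)  # query-time width: raise it to trade latency for recall
labels, distances = index.knn_query(vectors[:1], k=10)  # approximate top-10
```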

Popular vector databases include Pinecone (managed, simple API), Weaviate (open source with hybrid search built in), Qdrant (open source, Rust-based, fast), Milvus (open source, high-scale), and pgvector (a PostgreSQL extension, great when you already run Postgres). Each has different tradeoffs around scale, speed, hosting options, metadata filtering, and cost.

For enterprise deployments, the key decisions are: how many vectors you need to store (thousands vs billions changes the architecture), what latency you can tolerate (single-digit ms vs 100ms), whether you need hybrid search (combining vector and keyword search), whether you need strict multi-tenancy (isolated namespaces per customer), and whether the data needs to stay on your own infrastructure. For teams already running Postgres, pgvector avoids adding a second database and handles workloads up to about 10 million vectors comfortably. Above that, dedicated vector databases win on performance.
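
For the pgvector route, a minimal sketch, assuming Postgres with the extension available plus the psycopg and pgvector-python packages (table and column names are illustrative):

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

query_embedding = np.zeros(1536)  # stand-in for the real query embedding

with psycopg.connect("dbname=app") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    register_vector(conn)  # lets psycopg send numpy arrays as vector values
    conn.execute("""
        CREATE TABLE IF NOT EXISTS chunks (
            id        bigserial PRIMARY KEY,
            tenant_id text NOT NULL,
            body      text NOT NULL,
            embedding vector(1536)
        )""")
    # HNSW index on cosine distance, with the m / ef_construction knobs from above
    conn.execute("""
        CREATE INDEX IF NOT EXISTS chunks_embedding_idx
        ON chunks USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 200)""")

    # Filter-then-search: restrict to one tenant, rank by cosine distance (<=>)
    rows = conn.execute(
        "SELECT body FROM chunks WHERE tenant_id = %s "
        "ORDER BY embedding <=> %s LIMIT 5",
        ("acme", query_embedding),
    ).fetchall()
```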

In Practice

The vector database market has consolidated around five main players. Pinecone is the managed SaaS choice, with serverless and dedicated tiers. Weaviate ships with hybrid search (BM25 + vector) and strong multi-tenancy. Qdrant is among the fastest open-source options for single-node deployments. Milvus targets billion-scale workloads with distributed deployments via Zilliz Cloud. pgvector extends PostgreSQL with vector types and HNSW indexing.

Typical configuration: an HNSW index with M=16 and ef_construction=200 for general workloads; cosine similarity as the distance metric for normalized embeddings; metadata fields for tenant_id, document_id, and source for filtering; and namespaces or collections for strict multi-tenant isolation. Typical dimensions: 1024 or 1536. Latency budgets: sub-50ms for vector search at up to 10M vectors on a single node, 100-200ms at 100M+ vectors across a distributed cluster.
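
Expressed against the Qdrant Python client, one of several equivalent APIs, that configuration looks roughly like this (the collection name is illustrative):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, HnswConfigDiff

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="support_chunks",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=200),  # Qdrant spells it ef_construct
)

# Indexing the tenant field keeps filtered searches fast under multi-tenancy
client.create_payload_index(
    collection_name="support_chunks",
    field_name="tenant_id",
    field_schema="keyword",
)
```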

A production ingestion workflow looks like this. Documents arrive via a queue (SQS, Pub/Sub, Kafka). A worker parses, chunks, and embeds them. Embeddings are upserted to the vector DB in batches of 100-500 vectors per request for throughput. Each vector is tagged with metadata (tenant, source, ingested_at) for filtering. On the query side, the user question is embedded, a filter is built from request context (tenant_id at minimum), and a top-k search runs against the filtered namespace. Results are passed to the LLM or to a re-ranker before the LLM. Indexes are rebuilt or compacted weekly on high-churn corpora.
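
A condensed sketch of the worker's upsert step using the Pinecone Python client, with queue handling, parsing, and chunking omitted (the index name and placeholder data are illustrative):

```python
import itertools
import time
from pinecone import Pinecone

pc = Pinecone(api_key="...")              # assumes a Pinecone API key
index = pc.Index("support-chunks")        # illustrative index name

# Placeholder output of the parse/chunk/embed step
chunk_texts = ["To reset your password, open Settings > Security."]
chunk_embeddings = [[0.0] * 1536]         # one 1536-dim vector per chunk

records = [
    {
        "id": f"doc-42-chunk-{i}",
        "values": vec,
        "metadata": {
            "tenant_id": "acme",
            "source": "helpdesk",
            "ingested_at": int(time.time()),
            "text": text,                 # storing chunk text here simplifies retrieval
        },
    }
    for i, (text, vec) in enumerate(zip(chunk_texts, chunk_embeddings))
]

def batched(items, size=200):             # 100-500 per request, per the guidance above
    it = iter(items)
    while batch := list(itertools.islice(it, size)):
        yield batch

for batch in batched(records):
    index.upsert(vectors=batch, namespace="acme")  # one namespace per tenant
```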

Worked Example

A SaaS customer-support platform serves 800 business customers, each with their own knowledge base. Total corpus: about 14 million chunks across all tenants. Vector dimension: 1536 using OpenAI text-embedding-3-small. Query volume: roughly 40 searches per second at peak, spread across all tenants.

The team starts on pgvector because their core app is already on Postgres. At 3 million vectors, p95 search latency sits at 120ms with filter-then-search against tenant_id. At 8 million vectors, latency drifts to 280ms and index rebuilds start taking too long during ingestion. They migrate to Pinecone using per-tenant namespaces.

Post-migration: p95 search drops to 42ms. Tenant isolation is strict (one tenant can't query another's namespace). Ingestion runs separately from query load. They configure a serverless Pinecone index with 1,536 dimensions and metric=cosine. Monthly vector DB spend: about $1,800, compared to the scaling pain they'd have faced with pgvector above 10 million vectors on their current Postgres tier.
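
Continuing the ingestion sketch above on the query side, with per-tenant namespaces doing the isolation (the placeholder vector stands in for the embedded user question):

```python
question_embedding = [0.0] * 1536  # stand-in for the embedded user question

results = index.query(             # index: the Pinecone index from the sketch above
    vector=question_embedding,
    top_k=20,                      # wide net; the re-ranker narrows it next
    namespace="acme",              # a tenant can only ever search its own namespace
    include_metadata=True,
)
candidates = [m.metadata["text"] for m in results.matches]
```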

On top of vector search, each query goes through a Cohere Rerank v3 pass on the top-20 before passing the top-5 to Claude Sonnet. End-to-end retrieval (embed + search + rerank) runs in about 310ms p95. Tenant admins never see cross-tenant leakage, which matters because some of them are direct competitors.
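
And the re-rank step, sketched with the Cohere Python SDK and picking up the candidates from the query above:

```python
import cohere

co = cohere.Client("...")                   # assumes a Cohere API key

reranked = co.rerank(
    model="rerank-english-v3.0",
    query="I forgot my login credentials",  # the raw user question, not its embedding
    documents=candidates,                   # the top-20 chunks from the search above
    top_n=5,                                # only these reach the LLM prompt
)
top_chunks = [candidates[r.index] for r in reranked.results]
```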

What People Get Wrong

Myth

Vector databases replace traditional databases.

Reality

They complement them. Vector databases handle similarity search. Relational and document databases still handle everything else: transactions, aggregations, exact lookups, joins, and authoritative source-of-truth data. Most AI apps run both, with the vector database as a search-and-retrieval layer and the traditional DB as the system of record. Using a vector DB for general-purpose storage is a common mistake.

Myth

You always need a dedicated vector database.

Reality

pgvector handles up to roughly 10 million vectors comfortably on standard Postgres instances, with the benefit of keeping your data in one database. Elasticsearch, Redis, and MongoDB also support vector search now. If you're already running one of those and your scale is moderate, skip the extra dependency. Migrate to a dedicated vector DB when scale or performance demands it, not before.

Myth

Higher-dimension vectors always give better search quality.

Reality

Beyond a point, higher dimensions slow search, increase storage cost, and don't improve recall. 1024 or 1536 dimensions is the sweet spot for most enterprise workloads. Matryoshka embeddings let you truncate a trained vector to a smaller dimension with minimal quality loss, which is useful when storage or latency matters. Always measure on your domain test set before picking a dimension.
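
The truncation itself is trivial, as the sketch below shows; the catch is that it only preserves quality for models trained with a Matryoshka-style objective (OpenAI's text-embedding-3 models expose the same idea through a dimensions parameter):

```python
import numpy as np

def truncate(v: np.ndarray, dim: int) -> np.ndarray:
    # Keep the first `dim` coordinates, then re-normalize so cosine
    # similarity remains meaningful on the shortened vectors.
    t = v[:dim]
    return t / np.linalg.norm(t)

full = np.random.randn(1536)   # stand-in for a Matryoshka-trained embedding
full /= np.linalg.norm(full)
short = truncate(full, 512)    # 3x less storage and faster distance math
```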

Related Solutions

Multimodal RAG Systems →
AI Knowledge Base →

Need help implementing this?

We build production AI systems for enterprises. Tell us what you are working on and we will scope it in 30 minutes.