
Vector Databases Explained: The Infrastructure Behind RAG Systems

January 20, 2026 · 8 min read · Contra Collective

Retrieval-Augmented Generation (RAG) has become the default architecture for AI systems that need to reason about your specific data — not just general world knowledge. At the center of most RAG architectures is a vector database, a technology that many engineers encounter for the first time when building AI systems.

Here's a clear-eyed explanation of what vector databases do, when you need one, and how to choose the right one for your use case.

What Is a Vector Database?

A vector database stores data as high-dimensional numerical vectors (embeddings) and allows efficient similarity search over those vectors. Instead of finding records that exactly match a query, vector databases find records that are semantically similar — closest in the high-dimensional space.

Why this matters for AI: When you convert text to embeddings using a model like OpenAI's text-embedding-3-large, you get a vector where semantically similar text is geometrically close. "What are your return policies?" and "How do I return a product?" will have similar vectors — even though they share no exact words. A vector search finds this semantic similarity.
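To make "geometrically close" concrete, here is a minimal sketch of cosine similarity, the distance measure most vector databases default to. The four-dimensional vectors below are made-up stand-ins for real embeddings, which have thousands of dimensions (3,072 for text-embedding-3-large):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 for vectors pointing the same way,
    near 0.0 for unrelated (orthogonal) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for real embeddings.
returns_q1 = [0.81, 0.52, 0.10, 0.05]  # "What are your return policies?"
returns_q2 = [0.78, 0.55, 0.12, 0.08]  # "How do I return a product?"
shipping_q = [0.10, 0.15, 0.90, 0.40]  # "How long does shipping take?"

print(cosine_similarity(returns_q1, returns_q2))  # high: the two return questions
print(cosine_similarity(returns_q1, shipping_q))  # much lower: unrelated topic
```

The two return-policy questions score close to 1.0 despite sharing no words in the vector's "source" text; the shipping question scores much lower. That gap is what nearest-neighbor search exploits.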

The RAG Architecture

A typical RAG system works like this:

  1. Ingestion: Your documents (product descriptions, knowledge base articles, order history) are chunked into pieces and embedded into vectors. Those vectors, along with the original text, are stored in the vector database.
  2. Query: When a user asks a question, the query is also embedded into a vector.
  3. Retrieval: The vector database finds the chunks most similar to the query vector (nearest neighbor search).
  4. Generation: The retrieved chunks are included in the LLM prompt as context. The LLM generates an answer grounded in your actual data.

This architecture solves two fundamental LLM limitations: knowledge cutoff dates and context window size. Your vector database can index millions of documents; the LLM only needs to see the most relevant ones.
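The four steps above can be sketched end to end in a few lines. The embed function here is a deliberately crude bag-of-words stand-in for a real embedding model, and the vocabulary, documents, and prompt are invented for illustration — the point is the shape of the pipeline, not the quality of the retrieval:

```python
import math

def embed(text):
    """Toy 'embedding': count matches against a tiny fixed vocabulary,
    then unit-normalize. A real model replaces this entire function."""
    vocab = ["return", "day", "ship", "gift", "card", "fee", "expire"]
    words = text.lower().replace(".", "").replace(",", "").split()
    vec = [float(sum(1 for w in words if w.startswith(stem))) for stem in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# 1. Ingestion: embed each chunk, store the vector alongside the text.
documents = [
    "Returns are accepted within 30 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "Gift cards never expire and carry no fees.",
]
index = [(embed(doc), doc) for doc in documents]

# 2. Query: embed the user's question the same way.
query_vec = embed("how many days do I have to return an item")

# 3. Retrieval: nearest-neighbor search (brute force here; a vector
# database does this at scale with an approximate index).
def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))  # dot product; vectors are unit-normalized

top_chunk = max(index, key=lambda pair: similarity(query_vec, pair[0]))[1]

# 4. Generation: splice the retrieved chunk into the LLM prompt as context.
prompt = f"Answer using this context:\n{top_chunk}\n\nQuestion: ..."
print(top_chunk)
```

Even with this crude embedder, the return-policy chunk wins retrieval for a return-related question, and only that chunk reaches the prompt.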

When Do You Need a Vector Database?

You need a vector database when:

  • You're building a system that must answer questions about a large corpus of proprietary documents
  • You need semantic search (find similar, not exact match)
  • You're implementing product recommendation systems
  • You need to detect duplicate or near-duplicate records
  • Your knowledge base is too large to fit in a single LLM context window

You probably don't need a dedicated vector database when:

  • You're working with a small corpus (< 10,000 documents) where in-memory search is sufficient
  • You just need keyword search
  • You're using an LLM for generation only and don't need to retrieve from your own data
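As a rough illustration of why a small corpus doesn't need a dedicated database: exact nearest-neighbor search over 10,000 embeddings is a single matrix-vector product. The corpus here is random data standing in for real embeddings, with one document perturbed slightly to play the role of a near-match:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend corpus: 10,000 documents, 256-dim unit-normalized embeddings.
corpus = rng.normal(size=(10_000, 256)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# Query: document 42 with a little noise added.
query = corpus[42] + 0.01 * rng.normal(size=256).astype(np.float32)
query /= np.linalg.norm(query)

# Exact search: one matrix-vector product, then sort. No index structure,
# no approximation, and still fast at this scale.
scores = corpus @ query
top_k = np.argsort(scores)[::-1][:5]
print(int(top_k[0]))  # 42 — the document we perturbed ranks first
```

At this scale the whole search runs in milliseconds in memory; approximate indexes (and the databases built around them) start paying off when the corpus grows by orders of magnitude or must survive restarts and concurrent writes.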

The Options

Pinecone

Pinecone is the fully-managed option: no infrastructure to operate, autoscaling handled for you, and a clean API that's easy to start with.

Best for: Teams that want to move fast without managing vector database infrastructure. Pinecone's simplicity has a cost: it's not the cheapest option at scale, and the fully-managed model means less control.

pgvector

An extension to PostgreSQL that adds vector storage and similarity search. If you're already running Postgres, pgvector lets you add vector capabilities without a new database.

Best for: Organizations already on PostgreSQL that want to avoid adding a new system to their stack. Works well for medium-scale use cases. Performance doesn't match purpose-built vector databases at very large scale, but for most commercial RAG applications, it's entirely adequate.

Weaviate

Open-source, self-hostable, with multi-modal support (text, images, video). More configuration than Pinecone but more flexibility and no per-vector pricing.

Best for: Organizations with mature infrastructure teams who want to self-host and have multi-modal requirements. Also works well as a managed offering via Weaviate Cloud.

Qdrant

Qdrant is a high-performance, open-source vector database written in Rust, with excellent filter performance for hybrid search (vector similarity combined with metadata filtering).

Best for: Production workloads where performance matters, especially those that combine vector search with metadata filters.
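A rough sketch of what hybrid search does, using invented items and metadata. A real engine like Qdrant pushes the filter into the index itself rather than pre-filtering in application code, but the observable behavior is the same: only entries matching the metadata conditions are ranked by similarity.

```python
import math

# Toy index entries: (embedding, metadata, text).
items = [
    ([0.9, 0.1], {"category": "shoes", "in_stock": True},  "trail runner"),
    ([0.8, 0.2], {"category": "shoes", "in_stock": False}, "road racer"),
    ([0.1, 0.9], {"category": "hats",  "in_stock": True},  "sun hat"),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(query_vec, metadata_filter, k=2):
    # 1. Metadata filter: keep only entries matching every condition.
    candidates = [it for it in items
                  if all(it[1].get(key) == val
                         for key, val in metadata_filter.items())]
    # 2. Vector ranking over the survivors.
    candidates.sort(key=lambda it: cosine(query_vec, it[0]), reverse=True)
    return [it[2] for it in candidates[:k]]

# Only the in-stock shoe survives the filter, then ranking is trivial.
print(filtered_search([1.0, 0.0], {"category": "shoes", "in_stock": True}))
```

The hard engineering problem, and the reason filter performance differentiates these systems, is doing step 1 without scanning every record.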

Chunking Strategy

Vector databases are only as good as the chunks you put into them. Poor chunking strategy is the most common reason RAG systems underperform.

Fixed-size chunking: Divide documents into fixed-length chunks with overlap. Simple, predictable, works well for uniform documents.

Semantic chunking: Use a model to identify natural semantic boundaries and chunk at those points. More expensive but produces better retrieval quality.

Hierarchical chunking: Store document summaries alongside paragraph-level chunks. Query returns summaries first, then retrieves relevant paragraphs. Works well for long documents.

The right chunking strategy depends on your document types. Long-form content (documentation, legal documents) benefits from hierarchical chunking. Short-form content (product descriptions, FAQ answers) often works fine with fixed-size.
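As a sketch of the simplest strategy, here is fixed-size character chunking with overlap. The size and overlap values are arbitrary, and production systems often chunk by tokens rather than characters; the overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk:

```python
def chunk_fixed(text, size=200, overlap=40):
    """Split text into fixed-size character chunks, each sharing
    `overlap` trailing characters with the start of the next chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)
            if text[i:i + size]]

doc = "word " * 200  # 1,000-character stand-in document
chunks = chunk_fixed(doc, size=200, overlap=40)

print(len(chunks))                             # 7 chunks for 1,000 chars
print(chunks[1][:40] == chunks[0][160:200])    # True: overlap is shared
```

Each chunk (plus its source text) is what gets embedded and stored in step 1 of the RAG pipeline; larger overlap improves boundary recall at the cost of more vectors to store and search.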
