
Vector Databases Explained: The Infrastructure Behind RAG Systems

January 20, 2026 · 8 min read · Contra Collective

Retrieval-Augmented Generation (RAG) has become the default architecture for AI systems that need to reason about your specific data, not just general world knowledge. At the center of most RAG architectures sits a vector database, a technology many engineers encounter for the first time when building AI systems.

Here's a clear-eyed explanation of what vector databases do, when you need one, and how to choose the right one for your use case.

What Is a Vector Database?

A vector database stores data as high-dimensional numerical vectors (embeddings) and supports efficient similarity search over those vectors. Instead of finding records that exactly match a query, a vector database finds records that are semantically similar, meaning closest in the high-dimensional space.

Why this matters for AI: When you convert text to embeddings using a model like OpenAI's text-embedding-3-large, you get a vector where semantically similar text is geometrically close. "What are your return policies?" and "How do I return a product?" will have similar vectors, even though they share no exact words. A vector search finds this semantic similarity.
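To make "geometrically close" concrete, here is a minimal sketch using cosine similarity, the standard closeness measure for embeddings. The four-dimensional vectors are toy stand-ins for real embeddings (which have hundreds or thousands of dimensions), with values chosen so the two return-related questions point in nearly the same direction:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional stand-ins for real embeddings; the two "return"
# questions point in nearly the same direction, the shipping
# question does not.
returns_a = [0.9, 0.1, 0.05, 0.0]   # "What are your return policies?"
returns_b = [0.85, 0.15, 0.1, 0.0]  # "How do I return a product?"
shipping  = [0.05, 0.9, 0.0, 0.3]   # "How fast is shipping?"

print(cosine_similarity(returns_a, returns_b))  # close to 1.0
print(cosine_similarity(returns_a, shipping))   # much lower
```

A real system computes the same score, just over vectors produced by an embedding model rather than hand-picked numbers.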

The RAG Architecture

A typical RAG system works like this:

  1. Ingestion: Your documents (product descriptions, knowledge base articles, order history) are chunked into pieces and embedded into vectors. Those vectors, along with the original text, are stored in the vector database.
  2. Query: When a user asks a question, the query is also embedded into a vector.
  3. Retrieval: The vector database finds the chunks most similar to the query vector (nearest neighbor search).
  4. Generation: The retrieved chunks are included in the LLM prompt as context. The LLM generates an answer grounded in your actual data.

This architecture solves two fundamental LLM limitations: knowledge cutoff dates and context window size. Your vector database can index millions of documents; the LLM only needs to see the most relevant ones.
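The four steps above can be sketched end to end. Everything here is a toy stand-in: `embed()` is a bag-of-words counter rather than a real embedding model, and the "database" is a plain Python list searched by brute force, but the shape of the pipeline is the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedder: bag-of-words counts. A real system would call an
    embedding model (e.g. text-embedding-3-large) here instead."""
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())

def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    return dot / (math.sqrt(sum(v * v for v in a.values()))
                  * math.sqrt(sum(v * v for v in b.values())))

# 1. Ingestion: chunk, embed, and store (here, a plain list).
chunks = [
    "Returns are accepted within 30 days of delivery.",
    "Standard shipping takes 3-5 business days.",
    "Gift cards are non-refundable.",
]
index = [(embed(c), c) for c in chunks]

# 2 + 3. Query and retrieval: embed the question, find the nearest chunk.
question = "How many days do I have for returns?"
query_vec = embed(question)
top_chunk = max(index, key=lambda pair: similarity(query_vec, pair[0]))[1]

# 4. Generation: the retrieved chunk becomes context in the LLM prompt.
prompt = f"Context: {top_chunk}\n\nQuestion: {question}"
```

Swapping the list for a vector database and the word counter for a real embedding model turns this sketch into a production pipeline without changing its structure.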

When Do You Need a Vector Database?

You need a vector database when:

  • You're building a system that must answer questions about a large corpus of proprietary documents
  • You need semantic search (find similar, not exact match)
  • You're implementing product recommendation systems
  • You need to detect duplicate or near-duplicate records
  • Your knowledge base is too large to fit in a single LLM context window

You probably don't need a dedicated vector database when:

  • You're working with a small corpus (< 10,000 documents) where in-memory search is sufficient
  • You just need keyword search
  • You're using an LLM for generation only and don't need to retrieve from your own data
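The small-corpus case is worth demonstrating: exact brute-force search over 10,000 vectors is a single matrix-vector product and runs in milliseconds, which is why a dedicated database buys you little at this scale. A sketch assuming NumPy is available, using random 384-dimensional vectors (a common size for small open-source embedding models) as stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 unit-length vectors standing in for document embeddings.
corpus = rng.normal(size=(10_000, 384)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# A query that is a slightly perturbed copy of document 42.
query = corpus[42] + rng.normal(scale=0.01, size=384).astype(np.float32)
query = query / np.linalg.norm(query)

# Exact nearest-neighbor search: cosine similarity via one
# matrix-vector product, then a sort. No index, no database.
scores = corpus @ query
top5 = np.argsort(scores)[-5:][::-1]  # indices of the 5 most similar docs
```

Only when the corpus grows well beyond this, or you need metadata filtering, persistence, and concurrent writes, does a dedicated vector database start to pay for itself.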

The Options

Pinecone

Pinecone is the fully managed option: no infrastructure to operate, autoscaling handled for you, and a clean API that's easy to start with.

Best for: Teams that want to move fast without managing vector database infrastructure. Pinecone's simplicity has a cost: it's not the cheapest option at scale, and the fully managed model means less control.

pgvector

An extension to PostgreSQL that adds vector storage and similarity search. If you're already running Postgres, pgvector lets you add vector capabilities without a new database.

Best for: Organizations already on PostgreSQL that want to avoid adding a new system to their stack. Works well for medium-scale use cases. Performance doesn't match purpose-built vector databases at very large scale, but for most commercial RAG applications, it's entirely adequate.
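As a sketch of what the pgvector workflow looks like, here is the SQL a Python service might issue, assuming `CREATE EXTENSION vector` has been run and using illustrative table and column names (the snippet only assembles the statements; in a real system you would execute them with a driver such as psycopg):

```python
# pgvector sketch: DDL for an embeddings table and a nearest-neighbor
# query. Table and column names are illustrative; the embedding
# dimension must match your model (1536 is assumed here).

DDL = """
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1536)
);
"""

# `<=>` is pgvector's cosine-distance operator; ordering by it
# returns the closest vectors first. The parameter placeholder is
# psycopg-style.
NEAREST = """
SELECT content
FROM documents
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT 5;
"""
```

Because the embeddings live next to your other tables, the nearest-neighbor `ORDER BY` can be combined with ordinary `WHERE` clauses, which is pgvector's main practical advantage.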

Weaviate

Open source, self-hostable, with multi-modal support (text, images, video). More configuration than Pinecone, but more flexibility and no per-vector pricing.

Best for: Organizations with mature infrastructure teams that want to self-host and have multi-modal requirements. Also available as a managed offering via Weaviate Cloud.

Qdrant

High-performance, open-source, and Rust-based. Excellent filter performance (hybrid search: vector similarity plus metadata filtering). A strong choice for production workloads where performance matters.

Chunking Strategy

Vector databases are only as good as the chunks you put into them. A poor chunking strategy is the most common reason RAG systems underperform.

Fixed-size chunking: Divide documents into fixed-length chunks with overlap. Simple, predictable, and works well for uniform documents.

Semantic chunking: Use a model to identify natural semantic boundaries and split at those points. More expensive, but produces better retrieval quality.

Hierarchical chunking: Store document summaries alongside paragraph-level chunks. A query matches summaries first, then retrieves the relevant paragraphs. Works well for long documents.

The right chunking strategy depends on your document types. Long-form content (documentation, legal documents) benefits from hierarchical chunking. Short-form content (product descriptions, FAQ answers) often works fine with fixed-size chunks.
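Fixed-size chunking with overlap is simple enough to sketch in a few lines; the overlap ensures that a sentence cut at one chunk's boundary still appears whole in the next chunk. Sizes here are in characters, and the defaults are illustrative, not recommendations:

```python
def chunk_fixed(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share
    `overlap` characters so boundary sentences survive intact."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]

doc = "".join(str(i % 10) for i in range(1200))
parts = chunk_fixed(doc)
# 3 chunks: doc[0:500], doc[450:950], doc[900:1200]
```

Tuning `size` and `overlap` against your own retrieval-quality metrics usually matters more than the exact splitting code.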

