Engineering, June 1, 2026

RAG Over Shopify Plus Product Catalogs: Architecture for Enterprise Storefronts (2026)

RAG over a Shopify Plus product catalog is not the same problem as RAG over documents, knowledge bases, or codebases. The data is structured, it mutates constantly through orders and inventory updates, it has hard relevance signals from sales velocity and margin, and it lives behind a platform with strong opinions about how you read and write it. The generic LangChain tutorial that embeds your documents into Pinecone and calls it done falls apart at enterprise catalog scale within the first week of production traffic.

This is the architecture pattern we ship for enterprise Shopify Plus clients running AI-powered search, conversational commerce, and personalized recommendations on top of catalogs with 50K to 5M SKUs. It covers the embedding pipeline, the vector store choice, the query path, the ranking layer, and the integration points that keep the system consistent with the underlying Shopify data.

What the Catalog Actually Looks Like

A Shopify Plus catalog is more than the product table. The data that matters for retrieval lives across:

The product object itself: title, description, vendor, product type, tags, options, status. This is the surface.

Variants: SKU, price, compare-at price, inventory, weight, custom variant titles. A product without variant data is unsellable; recommendations without variant context are wrong.

Metafields: the actual content layer. Marketing copy, technical specs, fit guides, structured attributes, related products, collection memberships beyond what Shopify's collection logic captures. Most enterprise catalogs put 60 to 80 percent of the differentiating product information in metafields.

Inventory and locations: real-time availability per location. A perfect semantic match for a product that is out of stock at the customer's nearest fulfillment center is a worse recommendation than a slightly less relevant in-stock alternative.

Sales velocity, conversion rate, return rate: the signals that turn semantic retrieval into commerce retrieval. A semantically excellent product that converts at 0.2 percent is not a good recommendation; the model does not know that without the signal being engineered in.

A useful embedding for commerce RAG has to capture text, structured attributes, and merchandising context in one representation. That shapes everything downstream.

The Embedding Pipeline

The naive pattern is: dump product titles and descriptions into an embedding model, store the vectors, query. That ships in a day, breaks within a week of real traffic, and is impossible to debug because the failure modes blur together.

The pattern we use:

# Conceptual structure. The actual implementation lives behind an admin API.
def build_product_document(product, variants, metafields, signals):
    return {
        "id": product["id"],
        "title": product["title"],
        "vendor": product["vendor"],
        "product_type": product["product_type"],
        "tags": product["tags"],
        "description_text": clean_html(product["body_html"]),
        "structured_attributes": extract_attributes(metafields),
        "variant_summary": summarize_variants(variants),
        "price_range": (min_price(variants), max_price(variants)),
        "collections": product.get("collections", []),
        "sales_velocity_30d": signals.get("sales_velocity_30d", 0),
        "conversion_rate_30d": signals.get("conversion_rate_30d", 0),
        "in_stock_locations": sorted({v["location_id"] for v in variants if v.get("inventory", 0) > 0}),
        "updated_at": product["updated_at"],
    }

def build_embedding_text(doc):
    # The text that goes into the embedding model is composed,
    # not just the raw description
    parts = [
        f"Product: {doc['title']}",
        f"Vendor: {doc['vendor']}",
        f"Type: {doc['product_type']}",
        f"Tags: {', '.join(doc['tags'])}",
        f"Description: {doc['description_text']}",
        f"Attributes: {format_attributes(doc['structured_attributes'])}",
        f"Variants: {doc['variant_summary']}",
    ]
    return "\n".join(parts)

The embedding text is composed deliberately. The order matters: embedding models can be sensitive to field position, and any truncation to fit the model's input limit usually drops the tail, so starting with the title and vendor keeps the most identifying fields intact and gives the right priors. Structured attributes get serialized into a stable format because consistency across the catalog matters more than prose quality.
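A minimal sketch of what stable serialization could look like. The helper names `format_attributes` and `summarize_variants` come from the pipeline above, but the bodies here are assumptions for illustration: attributes are assumed to be a flat dict, and each variant is assumed to carry hypothetical `option_values` and `price` keys.

```python
def format_attributes(attributes: dict) -> str:
    """Serialize attributes in sorted key order so the same attribute set
    always produces the same text (hypothetical body, flat dict assumed)."""
    parts = []
    for key in sorted(attributes):
        value = attributes[key]
        if isinstance(value, (list, tuple, set)):
            # Sort multi-valued attributes so their order is stable too.
            value = ", ".join(sorted(str(v) for v in value))
        if value is None or value == "":
            continue  # skip empty values rather than embedding "None"
        parts.append(f"{key.strip().lower()}={value}")
    return "; ".join(parts)


def summarize_variants(variants: list) -> str:
    """Collapse variants into distinct option values plus a price range
    (assumes `option_values` and `price` keys on each variant)."""
    if not variants:
        return "no variants"
    options = sorted({opt for v in variants for opt in v.get("option_values", [])})
    prices = [float(v["price"]) for v in variants]
    return f"options: {', '.join(options)}; price {min(prices):.2f} to {max(prices):.2f}"
```

Because keys and multi-valued attributes are sorted, two products with the same attributes entered in a different order produce identical embedding text, which keeps re-embedding diffs meaningful.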

The model choice is workload-specific. For most Shopify Plus catalogs we ship OpenAI text-embedding-3-large or Cohere embed-multilingual-v3.0. Open-weight options (Stella, BGE-M3) are competitive on quality and meaningfully cheaper per embedding at catalog scale, but the operational cost of self-hosting embedding inference rarely pencils out below 5M SKUs.

The Vector Store Decision

For Shopify Plus scale, the practical choices are Pinecone, Qdrant, and Weaviate. We covered the head-to-head in our vector database comparison for commerce. The short version for catalog RAG:

Pinecone wins when the team does not want to operate infrastructure. It is managed and predictable; its filtering ergonomics are weaker than competitors' but adequate for most catalogs.

Qdrant wins on cost and on filtering. A single self-hosted instance of moderate size handles 5M SKUs comfortably, and hybrid sparse+dense search is first class. It is the right answer for cost-sensitive enterprise deployments with infrastructure capacity.

Weaviate wins when filtering complexity is the bottleneck. Multi-tenant catalogs (multiple brands, multiple regions) and catalogs with deep faceted filtering benefit from Weaviate's collection-based schema model (called classes in older versions) and its native multi-tenancy.

Typesense and Meilisearch are also viable for the lexical + vector hybrid pattern. We have shipped Typesense as the primary search layer on Hydrogen storefronts and used the vector store as a secondary signal rather than the primary retrieval path. That hybrid pattern is often the right architecture even when you have a dedicated vector store available.

The Query Path

The inference-time architecture for catalog RAG has three stages.

Stage one: query understanding. The user input ("waterproof hiking boots under $200 size 10") is parsed into a semantic embedding plus a filter expression. We do this with a fast model (Haiku 4.5, GPT-5.5 Mini, or a fine-tuned small model) and a structured output schema. The filter expression hits exact catalog facets (price, size, in-stock); the embedding hits semantic similarity. Skipping the structured parse and relying on the embedding alone produces terrible commerce results because price and size are exact constraints, not semantic properties.
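As a rough illustration of the structured parse, the sketch below validates a model's JSON output and turns the facets into a filter. Everything named here is an assumption: the field names (`semantic_text`, `max_price`, `size`, `in_stock_only`), the generic filter dict shape, and the hard-coded JSON string that stands in for the fast model's response. It is not any vector store's actual filter syntax.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedQuery:
    semantic_text: str                 # the only part that gets embedded
    max_price: Optional[float] = None  # exact facet, never embedded
    size: Optional[str] = None
    in_stock_only: bool = True


def parse_model_output(raw_json: str) -> ParsedQuery:
    """Validate the model's JSON before it reaches the retrieval layer."""
    data = json.loads(raw_json)
    text = str(data.get("semantic_text", "")).strip()
    if not text:
        raise ValueError("semantic_text is required")
    max_price = data.get("max_price")
    if max_price is not None:
        max_price = float(max_price)
        if max_price <= 0:
            raise ValueError("max_price must be positive")
    size = data.get("size")
    return ParsedQuery(
        semantic_text=text,
        max_price=max_price,
        size=None if size is None else str(size),
        in_stock_only=bool(data.get("in_stock_only", True)),
    )


def to_filter(query: ParsedQuery) -> dict:
    """Translate facets into a generic filter dict (hypothetical shape)."""
    conditions = {}
    if query.max_price is not None:
        conditions["price"] = {"lte": query.max_price}
    if query.size is not None:
        conditions["size"] = {"eq": query.size}
    if query.in_stock_only:
        conditions["in_stock"] = {"eq": True}
    return conditions


# Stand-in for the model response to "waterproof hiking boots under $200 size 10".
raw = '{"semantic_text": "waterproof hiking boots", "max_price": 200, "size": "10"}'
parsed = parse_model_output(raw)
print(parsed.semantic_text, to_filter(parsed))
```

Only `semantic_text` is sent to the embedding model; the price and size constraints travel separately as filters, so a $240 boot cannot rank well just because its description is a close semantic match.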

Stage two: hybrid retrieval. Vector similarity gives you semantically relevant products. Lexical search (Typesense, Elasticsearch) gives you exact term matches that the embedding misses (specific SKUs, vendor names, model numbers). The fusion pattern is reciprocal rank fusion or weighted score combination, tuned per category. For categories with strong brand affinity (fashion, electronics), lexical weight is higher. For exploratory categories (home decor, gifts), vector weight is higher.
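For reference, reciprocal rank fusion can be sketched in a few lines. Each result list contributes `weight / (k + rank)` to a product's score. The per-list weights are an assumed extension that models the per-category tuning described above (plain RRF uses equal weights), and `k=60` is the conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked lists of product IDs into a single ranking.

    rankings: list of ID lists, best match first (e.g. [vector_hits, lexical_hits])
    weights:  optional per-list weights; equal weighting when omitted
    """
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, product_id in enumerate(ranking, start=1):
            scores[product_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


vector_hits = ["p1", "p2", "p3"]
lexical_hits = ["p3", "p1", "p4"]

print(reciprocal_rank_fusion([vector_hits, lexical_hits]))
# Doubling the lexical weight, as one might for a brand-driven category:
print(reciprocal_rank_fusion([vector_hits, lexical_hits], weights=[1.0, 2.0]))
```

RRF works on ranks rather than raw scores, so it avoids having to normalize cosine similarities against lexical scores, which live on different scales.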

Stage three: ranking. The retrieval set comes back with 50 to 200 candidates. The ranking layer applies commerce signals: in-stock availability at the customer's location, sales velocity, margin, personalization scores, business rules (promoted vendors, current campaigns). This is where a lot of generic RAG implementations fail because they treat retrieval as the final answer. In commerce, retrieval is the input to ranking, and ranking is what determines what the customer actually sees.
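A minimal sketch of a ranking pass, assuming a simple linear blend of signals. The `Candidate` fields, the weights, the velocity cap, and the out-of-stock penalty are all illustrative assumptions, not tuned production values; a real ranker would likely add personalization scores and richer business rules.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    retrieval_score: float      # fused retrieval score, normalized to 0..1
    in_stock_nearby: bool       # available at the customer's location
    sales_velocity_30d: float   # units per day (hypothetical signal)
    margin: float               # 0..1
    promoted: bool = False      # business rule: current campaign or vendor


def rerank(candidates, weights=None, velocity_cap=50.0, out_of_stock_penalty=0.3):
    """Order candidates by a weighted blend of retrieval and commerce signals."""
    w = weights or {"retrieval": 0.5, "velocity": 0.25, "margin": 0.15, "promoted": 0.10}

    def score(c):
        # Cap velocity so one runaway bestseller cannot dominate every query.
        velocity = min(c.sales_velocity_30d, velocity_cap) / velocity_cap
        base = (
            w["retrieval"] * c.retrieval_score
            + w["velocity"] * velocity
            + w["margin"] * c.margin
            + w["promoted"] * (1.0 if c.promoted else 0.0)
        )
        # Demote, rather than hide, items the customer cannot get quickly.
        return base if c.in_stock_nearby else base * out_of_stock_penalty

    return sorted(candidates, key=score, reverse=True)


ranked = rerank([
    Candidate("A", retrieval_score=0.95, in_stock_nearby=False, sales_velocity_30d=10, margin=0.4),
    Candidate("B", retrieval_score=0.80, in_stock_nearby=True, sales_velocity_30d=25, margin=0.3),
    Candidate("C", retrieval_score=0.85, in_stock_nearby=True, sales_velocity_30d=0.5, margin=0.2),
])
print([c.product_id for c in ranked])
```

In this toy input the best semantic match (A) ends up last because it is out of stock nearby, while B overtakes C on sales velocity despite a lower retrieval score.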

Keeping It Consistent with Shopify

Shopify is the source of truth. The RAG stack is downstream. The two get out of sync without explicit infrastructure to keep them aligned.

The patterns that work:

Webhook-driven incremental updates. Subscribe to products/create, products/update, products/delete, inventory_levels/update. Process them through a queue (we usually use Pub/Sub on GCP or SQS on AWS) that fans out to the embedding pipeline. End-to-end latency from a Shopify admin change to a refreshed vector is usually 5 to 30 seconds.

Periodic full reconciliation. Webhooks miss events under failure scenarios. A nightly job that paginates the full catalog and reconciles vector store state against Shopify is non-negotiable for catalogs above 100K SKUs.

Cache invalidation tied to inventory. If a product goes out of stock at a location, downstream caches (Hydrogen's sub-request and full-page caches, edge KV, application caches) need to know. The cleanest pattern routes inventory updates through the same event bus as product updates.

Soft deletes. Hard-deleting vectors can leave a product missing from search until the next reconciliation restores it, for example when events arrive out of order. Marking the product as deleted in the vector store and filtering at query time is more forgiving.
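The sync patterns above can be sketched as three small functions: verifying a webhook, mapping its topic to a queue message, and diffing state for the nightly reconciliation. Shopify signs webhooks with a base64-encoded HMAC-SHA256 of the raw request body, sent in the `X-Shopify-Hmac-Sha256` header. The message shapes, the function names, and the `updated_at` comparison are assumptions for illustration, and the payload field names should be checked against the webhook reference for your API version.

```python
import base64
import hashlib
import hmac
import json


def verify_shopify_hmac(raw_body: bytes, header_hmac: str, secret: str) -> bool:
    """Check the X-Shopify-Hmac-Sha256 header against our own digest of the raw body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    return hmac.compare_digest(expected, header_hmac.encode("utf-8"))


def plan_action(topic: str, raw_body: bytes) -> dict:
    """Map a webhook topic to a queue message (message shape is hypothetical)."""
    payload = json.loads(raw_body)
    if topic in ("products/create", "products/update"):
        return {"action": "reembed", "product_id": payload["id"]}
    if topic == "products/delete":
        # Soft delete: flag the vector and filter it out at query time.
        return {"action": "mark_deleted", "product_id": payload["id"]}
    if topic == "inventory_levels/update":
        # Stock goes to the fast key-value store, not the embedding pipeline.
        return {
            "action": "update_stock",
            "inventory_item_id": payload["inventory_item_id"],
            "location_id": payload["location_id"],
            "available": payload["available"],
        }
    return {"action": "ignore"}


def reconcile(shopify_updated_at: dict, indexed_updated_at: dict) -> dict:
    """Nightly diff: both arguments map product ID to an updated_at string."""
    stale = [pid for pid, ts in shopify_updated_at.items()
             if indexed_updated_at.get(pid) != ts]
    orphaned = [pid for pid in indexed_updated_at if pid not in shopify_updated_at]
    return {"reembed": sorted(stale), "mark_deleted": sorted(orphaned)}
```

Verification has to run against the raw bytes of the request, before any JSON parsing or re-serialization, or the digest will not match.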

When This Applies to Your Stack

This architecture is right when:

The catalog has more than 50K SKUs and product copy is rich enough that lexical search is missing relevant results.

The brand is shipping conversational commerce, AI search, or personalized recommendations that need to ground in real product data rather than hallucinated answers.

There is engineering capacity to operate an embedding pipeline and a vector store, or budget for the managed alternatives.

It is overkill when the catalog is small (under 10K SKUs), the team has not exhausted Shopify Search & Discovery or Algolia, or the AI feature is a proof-of-concept rather than a production commitment. Those cases are better served by Shopify's native tools or a managed search vendor with semantic search add-ons.

Where Contra Collective Comes In

We build the AI integration layer that sits between Shopify Plus and customer-facing applications: embedding pipelines, vector store architecture, hybrid retrieval with ranking, conversational commerce backends, and the operational tooling that keeps the AI layer consistent with the catalog source of truth. If you are evaluating RAG, AI search, or recommendation upgrades for a Shopify Plus storefront and want to skip the 6-month learning curve, we have shipped the pattern above for enterprise brands across fashion, electronics, and home goods.

FAQ

Why not just use Shopify's native search with semantic search add-ons? For catalogs under 50K SKUs and standard use cases, that is the right answer. For larger catalogs, conversational commerce, or AI features that need to reason over product context, a dedicated RAG layer gives meaningfully more control and better outcomes.

What embedding model should we use? For most Shopify Plus catalogs, OpenAI text-embedding-3-large or Cohere embed-multilingual-v3.0. Open-weight alternatives like BGE-M3 are competitive on quality, but the operational cost of self-hosting embedding inference rarely pencils out below 5M SKUs.

How do we handle inventory and stock changes in the vector store? Inventory does not live in the vector itself. The vector represents the product. Stock status is a filter applied at query time, sourced from a fast key-value store updated by inventory_levels/update webhooks. This separation keeps embedding regeneration costs low while preserving real-time availability.

Does this work with Hydrogen? Yes, and it slots in well. A Hydrogen storefront running on Oxygen's edge runtime can call your RAG backend directly. We typically deploy the RAG service on Cloud Run or Fargate with a thin GraphQL layer that Hydrogen consumes alongside the Storefront API.
