Tagged · 18 Articles

#infrastructure

Posts tagged with #infrastructure from the Contra Collective team.

Jun 1, 2026 · AI

MLX Continuous Batching: Throughput Architecture on Apple Silicon (2026)

Continuous batching is the single largest throughput unlock for transformer inference on any hardware. NVIDIA stacks have spent three years optimizing around it: vLLM, TensorRT-LLM, and SGLang all converge on the same pattern of paged KV cache, request scheduling, and prefill/decode interleaving. MLX is younger, the runtime is different, and most NVIDIA intuitions do not survive contact with Apple's unified memory architecture.

Jun 1, 2026 · Engineering

RAG Over Shopify Plus Product Catalogs: Architecture for Enterprise Storefronts (2026)

RAG over a Shopify Plus product catalog is not the same problem as RAG over documents, knowledge bases, or codebases. The data is structured, it mutates constantly through orders and inventory updates, it has hard relevance signals from sales velocity and margin, and it lives behind a platform with strong opinions about how you read and write it. The generic LangChain tutorial that embeds your documents into Pinecone and calls it done falls apart at enterprise catalog scale within the first week of production traffic.

May 31, 2026 · Engineering

Sanity on Shopify Hydrogen: Headless CMS Integration Guide (2026)

Shopify's native content management was never the strength of the platform. Custom metafields and Online Store 2.0 sections solve the simple cases. Once you have marketing teams who want to ship landing pages weekly, brand campaigns that span multiple regions, or editorial content that lives alongside the catalog, the native tooling runs out. Hydrogen makes the gap obvious because content rendering moves into your application code and your CMS choice becomes a first-class architectural decision.

May 28, 2026 · AI

GGUF vs MLX Quantization Formats on Apple Silicon: A Practical Comparison (2026)

If you run LLMs locally on a Mac, you have probably been asked to choose between a GGUF file from Hugging Face and an MLX version of the same model. The default advice is to pick whichever your runtime supports and move on. That advice is wrong often enough to matter. The two formats quantize weights differently, store metadata differently, and behave differently under load. The right choice depends on your model size, your hardware, and what you are optimizing for.

May 27, 2026 · Engineering

Typesense on Shopify Hydrogen: Headless Search Architecture (2026)

Shopify's built-in Storefront API search works fine for catalogs under roughly 5,000 SKUs and shoppers who arrive with a clear query in mind. Once you cross 10,000 SKUs, add faceted filtering on more than three attributes, or need ranking customization (boost in-stock items, demote slow movers, surface new arrivals on certain queries), the native search path stops being sufficient. The enterprise Hydrogen storefronts we work on at Contra Collective almost always reach for a dedicated search index by the time the catalog gets serious.

May 26, 2026 · Engineering

Shopify Hydrogen vs Next.js Commerce: Headless Frameworks Compared (2026)

The headless Shopify decision in 2026 has narrowed to two serious frameworks: Shopify Hydrogen and Next.js Commerce. Both ship a React storefront detached from Liquid. Both target Shopify Plus merchants who want full design control and a modern frontend stack. They make meaningfully different trade-offs on data fetching, rendering, hosting, and how deep the integration with Shopify's primitives runs.

May 25, 2026 · Engineering

Sanity vs Contentful vs Strapi: Headless CMS Compared for Commerce (2026)

The headless CMS market in 2026 has consolidated around three names that show up on every enterprise commerce evaluation: Sanity, Contentful, and Strapi. Each one wins on a different axis. Contentful is the safe managed bet with the deepest enterprise feature set. Sanity is the developer-experience pick with the best structured content tooling in the market. Strapi is the open-source self-hosted option for teams that want full control of their content infrastructure.

May 25, 2026 · AI

vLLM on Apple Silicon: Does MLX Integration Actually Work in 2026?

vLLM is the production-grade inference engine that won the throughput conversation on CUDA hardware. Continuous batching, PagedAttention, prefix caching, speculative decoding: none of that, historically, ran on Apple Silicon. Search data for terms like "omlx vs llama.cpp", "vmlx", and "vllm vs mlx" reveals real demand for a bridge between the two stacks, much of it expressed as typos because the integration story is genuinely confusing.

May 24, 2026 · Engineering

Meilisearch vs Typesense vs Algolia: Headless Commerce Search Tested (2026)

The search infrastructure decision for headless commerce has become more interesting, not less, since Algolia stopped being the only serious answer. In 2026, three options, open-source or managed, dominate the category: Algolia (managed, mature, expensive), Typesense (open-source, simple, fast), and Meilisearch (open-source, developer-experience-led, increasingly capable). Pick the wrong one and you are either paying too much, operating too much, or fighting the engine's defaults for the next two years.

May 24, 2026 · AI

MLX-LM Server vs llama-server: Which Local Inference Server for Apple Silicon (2026)

If you have decided to host an LLM on Apple Silicon and you have already picked your runtime (MLX or llama.cpp), the next question is the server. Both projects ship an HTTP server that speaks the OpenAI API: mlx_lm.server from the MLX-LM project, and llama-server from llama.cpp. Either one will turn a loaded model into a /v1/chat/completions endpoint your backend can call. The interesting question is which one belongs in your production stack.

May 24, 2026 · AI

MLX vs vLLM: Architecture and Performance for Apple Silicon Production Inference

When teams deploy local inference on M-series hardware, they face an architectural fork. MLX is native: it targets Apple Silicon directly, uses Metal acceleration, and integrates tightly with the platform. vLLM is portable: it brings GPU serving patterns and production-grade batching to Apple Silicon through Metal support, and treats your Mac like a server. Both will run models. Only one fits your workload.