
OpenAstra: Agent Infrastructure Built for Production

February 25, 2026 · 7 min read · Contra Collective

The Contra Collective team kept hitting the same wall on agentic projects: the infrastructure layer was consuming more engineering time than the actual agent logic. Whether it was memory management, multi-channel routing, tool execution, or swarm coordination, each project meant rebuilding the same scaffolding from scratch, or contorting a framework that wasn't designed for production use.

So we built OpenAstra.

OpenAstra is a self-hosted agent runtime. It runs on Node 20+ and ships with the production infrastructure we engineered to solve these problems internally. Those problems aren't unique to us: every engineering team building serious agentic systems runs into the same gaps.

The Problem We Were Actually Solving

The honest answer is that most agent tooling is built for demos. It works beautifully in a Jupyter notebook and falls apart under real operational conditions.

The specific failure modes we kept hitting:

Memory without structure. Every agent framework offers a vector store. None of them has a coherent answer for how to handle the difference between what an agent is reasoning about right now, what it did last week, and what it knows about your business domain. We were hand-rolling memory tiering on every project.

Channel sprawl. Our internal agents needed to operate across Slack, email, GitHub, and other surfaces. Building a separate integration per channel, per project, was unsustainable. We needed a runtime that treated multi-channel as a first-class concern, not an afterthought.

No permission model. In our own workflows, agents touching customer data should never have access to deployment infrastructure. Agents doing research shouldn't have payment system access. Most swarm implementations have no answer for this. We needed real permission sandboxing baked into the runtime, not bolted on.

Inference cost opacity. Running multiple agents across multiple LLM providers gave us no visibility into where spend was going, which became a recurring operational problem. We needed cost tracking built into the runtime, not an external spreadsheet.

OpenAstra was engineered to solve all four.

What We Shipped

A 5-Tier Memory Architecture

The most significant engineering investment in OpenAstra is the memory system. Instead of a single vector store, we built five distinct memory layers:

  • Working memory - current reasoning context
  • Episodic memory - timestamped record of past actions and outcomes
  • Semantic memory - domain knowledge in vector form
  • Procedural memory - learned tool usage patterns
  • Shared memory - cross-agent state for swarm coordination
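As a rough sketch, the five tiers can be modeled as a discriminated union, with each variant carrying the data that tier needs. The type and field names here are illustrative, not OpenAstra's actual schema:

```typescript
// Illustrative shapes for the five memory tiers; names are this sketch's own.
type MemoryTier =
  | { tier: "working"; context: string }                         // current reasoning
  | { tier: "episodic"; at: Date; action: string; outcome: string } // timestamped history
  | { tier: "semantic"; embedding: number[]; text: string }      // domain knowledge
  | { tier: "procedural"; tool: string; pattern: string }        // learned tool usage
  | { tier: "shared"; swarmId: string; key: string; value: unknown }; // swarm state

// Narrowing on `tier` gives each layer its own typed payload.
const episode: MemoryTier = {
  tier: "episodic",
  at: new Date(),
  action: "web_search",
  outcome: "found 3 relevant docs",
};
```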

Retrieval across these layers uses Reciprocal Rank Fusion (RRF), merging results from pgvector (semantic search) and Typesense (keyword and structured search) into a single relevance-ranked output. This matters because real-world agent queries are never purely semantic or purely keyword. They're usually both. RRF handles the blend correctly.
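A minimal sketch of the RRF merge itself, assuming each backend returns an ordered list of document IDs. The constant k = 60 is the conventional default from the RRF literature, not an OpenAstra-specific setting:

```typescript
// Reciprocal Rank Fusion: each backend's ranking contributes
// 1 / (k + rank + 1) per document, and totals decide the fused order.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Higher fused score means more relevant across backends.
  return [...scores.keys()].sort(
    (a, b) => (scores.get(b) ?? 0) - (scores.get(a) ?? 0),
  );
}

// A doc ranked highly by both backends outranks one favored by only one.
const fused = reciprocalRankFusion([
  ["doc-a", "doc-b", "doc-c"], // semantic (e.g. pgvector) order
  ["doc-a", "doc-b", "doc-d"], // keyword (e.g. Typesense) order
]);
```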

Multi-Channel Integration as a First-Class Primitive

OpenAstra normalizes inputs from Slack, GitHub, X, email, and other channels into a consistent event format at the runtime level. The same agent handles a Slack message, a GitHub webhook, and an incoming email without channel-specific code paths.
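As a rough sketch of what that normalization looks like, assuming one adapter per channel maps its payload into a single event shape before agents see it. Field names are illustrative, not OpenAstra's actual schema:

```typescript
type Channel = "slack" | "github" | "email" | "x";

// One normalized shape for every inbound surface.
interface InboundEvent {
  channel: Channel;
  sender: string;    // normalized author identity
  text: string;      // plain-text body extracted from the payload
  threadId?: string; // conversation key, where the channel has one
  raw: unknown;      // original payload, kept for channel-specific needs
}

// Example adapter: Slack messages thread on thread_ts, falling back to ts.
function fromSlack(msg: {
  user: string;
  text: string;
  ts: string;
  thread_ts?: string;
}): InboundEvent {
  return {
    channel: "slack",
    sender: msg.user,
    text: msg.text,
    threadId: msg.thread_ts ?? msg.ts,
    raw: msg,
  };
}
```

With adapters like this at the edge, agent code consumes `InboundEvent` and never branches on where a message came from.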

This was one of the first things we built, because it was one of the first things we needed. Our internal agents don't live in a single channel. They operate wherever work happens.

Hierarchical Swarms with Permission Sandboxing

The swarm model in OpenAstra allows a root agent to spawn sub-agents with explicitly scoped permissions. Sub-agents can only access the tools and data their parent grants them, nothing more.

We built this because we needed it ourselves. Running agents with access to everything is fine for a prototype. It's not acceptable in a production environment where agents are touching real business data.
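Conceptually, scoped spawning reduces to a subset check at spawn time: a parent can never delegate access it does not hold itself. The class and method names below are hypothetical, not OpenAstra's API:

```typescript
class Agent {
  constructor(
    readonly name: string,
    private readonly allowedTools: ReadonlySet<string>,
  ) {}

  // Reject any grant that exceeds the parent's own permissions.
  spawn(name: string, grant: string[]): Agent {
    for (const tool of grant) {
      if (!this.allowedTools.has(tool)) {
        throw new Error(`${this.name} cannot grant "${tool}"`);
      }
    }
    return new Agent(name, new Set(grant));
  }

  canUse(tool: string): boolean {
    return this.allowedTools.has(tool);
  }
}

const root = new Agent("root", new Set(["web_search", "crm_read", "deploy"]));
// Research sub-agent gets web search only; it can never touch deploys,
// and asking for a tool the parent lacks (e.g. "payments") throws.
const researcher = root.spawn("researcher", ["web_search"]);
```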

106 Skills, 64 Tools, 10 LLM Providers

We packaged everything our team had built and reused across projects: 106 skills (higher-level capabilities like research, summarization, and content generation) and 64 core tools (web search, file operations, API calls, and more). These are the building blocks that let you deploy a capable agent without writing custom tool integrations from scratch.

LLM support covers 10 providers: OpenAI, Anthropic Claude, Google Gemini, local models via Ollama, and others. The provider abstraction means different agents can use different models based on what the task warrants, which is how you keep inference costs reasonable at scale.
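As a sketch of how a provider abstraction enables per-task routing, cheap high-volume work can go to small or local models while frontier models are reserved for tasks that warrant the cost. The provider and model names here are illustrative examples, not OpenAstra's configuration:

```typescript
type TaskKind = "summarize" | "code_review" | "bulk_classify";

interface ModelChoice {
  provider: string;
  model: string;
}

// Route by task: local model for bulk work, larger models where quality pays.
const routing: Record<TaskKind, ModelChoice> = {
  bulk_classify: { provider: "ollama", model: "llama3" },
  summarize: { provider: "openai", model: "gpt-4o-mini" },
  code_review: { provider: "anthropic", model: "claude-sonnet" },
};

function pickModel(task: TaskKind): ModelChoice {
  return routing[task];
}
```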

Dream Mode

One of our own workflow needs that became a feature: running agents unattended overnight for long-horizon tasks. Research pipelines, batch data processing, analysis jobs that don't need real-time supervision. We called it Dream Mode: agents working while you sleep. It's a first-class runtime configuration, not a hack.
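A configuration for an unattended overnight run might look something like the sketch below. Every field name here is invented for illustration, not OpenAstra's actual config schema, but an explicit time window and a hard spend cap are the kind of guardrails unattended runs need:

```typescript
// Hypothetical shape for an unattended overnight run.
interface DreamModeConfig {
  enabled: boolean;
  window: { start: string; end: string }; // local time, e.g. "22:00" to "06:00"
  maxBudgetUsd: number;                   // hard spend cap for the window
  tasks: string[];                        // long-horizon jobs to run unsupervised
}

const overnight: DreamModeConfig = {
  enabled: true,
  window: { start: "22:00", end: "06:00" },
  maxBudgetUsd: 25,
  tasks: ["research_pipeline", "batch_summarize"],
};
```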

Hot Reload and Cost Tracking

Two things that matter the moment you're running agents in production rather than development:

Hot reload: code changes apply without a runtime restart. We needed this for fast iteration on our own agents, and it stays in.

Cost tracking dashboards: every LLM call is tracked by agent, task, and provider over time. When you're running multiple agents across multiple models, this is the instrumentation that keeps the economics manageable.
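The attribution such a dashboard needs can be sketched as per-call records aggregated along any dimension. Types and names below are illustrative, not OpenAstra's schema:

```typescript
// One record per LLM call, tagged with the dimensions that matter.
interface LlmCallRecord {
  agent: string;
  task: string;
  provider: string;
  costUsd: number;
}

// Sum spend by agent, task, or provider for a dashboard breakdown.
function spendBy(
  records: LlmCallRecord[],
  key: "agent" | "task" | "provider",
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r[key], (totals.get(r[key]) ?? 0) + r.costUsd);
  }
  return totals;
}

const byProvider = spendBy(
  [
    { agent: "researcher", task: "summarize", provider: "openai", costUsd: 0.12 },
    { agent: "researcher", task: "classify", provider: "ollama", costUsd: 0 },
    { agent: "reviewer", task: "code_review", provider: "openai", costUsd: 0.4 },
  ],
  "provider",
); // openai total ≈ 0.52, ollama total 0
```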

Why We Built It

We build agentic systems for clients, and we built OpenAstra to power that work more efficiently. Self-hosted means your data stays yours, with no black boxes in your production infrastructure.

If you're building agents that need to run in the real world, OpenAstra is where we'd start.
