The 2026 Agentic Stack Survey: What Teams Are Actually Running

A note up front. We are not going to give you percentages. Every “AI stack survey” we read in the last twelve months reported precise market-share numbers — eighty-three percent of teams use X, forty-one percent are evaluating Y — and not one of them published a methodology section we could check. We will not be adding to the pile. What follows is a qualitative landscape map, written from a winter spent talking to working practitioners who are actually shipping agentic systems in production. Treat it like a field report, not a chart.

The shape of the stack has stabilized faster than most of us expected. Two years ago, a team standing up an agentic project had to pick a runtime, an orchestration pattern, a memory store, an evals harness, a deployment surface, and a UI primitive — and most of those slots had three or four competing answers. In 2026, most of those slots have a clear default and a credible second choice, and the live arguments have moved up the stack: not “which framework” but “how should the framework be configured for our case.” That is good news. It also means the next wave of pieces in this publication will be about what to do with the defaults, not how to choose them.

The model layer is decided. The orchestration layer is the new battleground.

Almost every team we spoke to is running a two-or-three model setup. A frontier model from one of the big labs handles reasoning-heavy work. A cheaper or faster model handles classification, routing, and the high-volume calls. A smaller open-weights model — usually self-hosted — handles the calls that need to stay inside a customer’s perimeter or the calls that fire often enough that the cost-per-call matters more than the absolute quality.
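
The three-tier split above tends to end up as a small routing function in the team's own code. What follows is a minimal sketch of that idea, not any particular team's implementation. The tier names, the `Call` fields, the task labels, and the volume threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FRONTIER = "frontier"        # reasoning-heavy work
    FAST = "fast"                # classification, routing, high-volume calls
    SELF_HOSTED = "self_hosted"  # perimeter-bound or cost-sensitive calls


@dataclass(frozen=True)
class Call:
    task: str                            # hypothetical labels, e.g. "plan", "classify"
    must_stay_in_perimeter: bool = False
    calls_per_day: int = 0


# Hypothetical values; tune them against your own traffic and pricing.
HIGH_VOLUME_CALLS_PER_DAY = 100_000
LIGHTWEIGHT_TASKS = {"classify", "route", "extract"}


def pick_tier(call: Call) -> Tier:
    # Perimeter constraints win over every other consideration.
    if call.must_stay_in_perimeter:
        return Tier.SELF_HOSTED
    # At very high volume, cost per call matters more than absolute quality.
    if call.calls_per_day >= HIGH_VOLUME_CALLS_PER_DAY:
        return Tier.SELF_HOSTED
    # Cheap, fast model for lightweight tasks.
    if call.task in LIGHTWEIGHT_TASKS:
        return Tier.FAST
    # Everything else goes to the frontier model.
    return Tier.FRONTIER
```

The order of the checks is the design decision: data-residency rules are evaluated before cost, and cost before task type, so a cheap task never leaks outside the perimeter by accident.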

What changed in the last twelve months is not the model layer. It is the orchestration layer. Two years ago, “agentic” meant a single LLM call with tools. One year ago, “agentic” meant a graph of LLM calls with shared scratch memory. Now “agentic” means a system with named roles, owners, handoffs between specialists, persistent task state, and a surface that lets a human jump in without breaking the flow. The vocabulary has caught up with the engineering. Teams talk about “the CEO agent” and “the specialists” because that is, in practice, how they have organized the code.

The clearest trend is that orchestration is being pulled out of the framework layer and into the product layer. A year ago, you picked LangGraph or CrewAI or AutoGen and let the framework’s mental model shape your product. Now the most thoughtful teams treat those frameworks as libraries, not architectures. They keep the routing, the state machine, and the human-facing surface in their own code, and they reach for a framework only when they need a specific primitive. The handful of teams running on packaged agentic-workforce platforms go a step further: their orchestration is the platform, and their product is whatever the agents produce. That is a meaningfully different architecture from “we built our own thin wrapper on LangGraph.”
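
Keeping "the routing, the state machine, and the human-facing surface" in your own code can be as small as an explicit transition table and a task record with a named owner. The sketch below assumes four task states and free-form role names; none of it comes from LangGraph, CrewAI, or AutoGen, and the state names are ours.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    QUEUED = "queued"
    IN_PROGRESS = "in_progress"
    NEEDS_HUMAN = "needs_human"   # a human operator has been asked to step in
    DONE = "done"


# Allowed transitions. Anything not listed here raises, on purpose.
TRANSITIONS = {
    State.QUEUED: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.IN_PROGRESS, State.NEEDS_HUMAN, State.DONE},
    State.NEEDS_HUMAN: {State.IN_PROGRESS, State.DONE},
    State.DONE: set(),
}


@dataclass
class Task:
    task_id: str
    owner: str                                   # role currently responsible
    state: State = State.QUEUED
    history: list = field(default_factory=list)  # (owner, state) audit trail

    def move(self, new_state: State, new_owner: Optional[str] = None) -> None:
        """Change state and, optionally, hand the task to another role."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.history.append((self.owner, self.state))
        self.state = new_state
        if new_owner is not None:
            self.owner = new_owner
```

A handoff between specialists is then `task.move(State.IN_PROGRESS, new_owner="writer")`, and a human jumping in is a move to `State.NEEDS_HUMAN` with the operator as owner. A framework primitive can still do the work inside each step.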

LangGraph vs AutoGen — what the working teams actually pick

Aspect                    | LangGraph (LangChain-stewarded, graph-shaped orchestration) | AutoGen (Microsoft-stewarded, multi-agent conversation patterns)
--------------------------+-------------------------------------------------------------+-----------------------------------------------------------------
Orchestration model       | Explicit DAG / state-graph                                  | Conversational multi-agent
Persistence               | Built-in checkpointer + thread state                        | Bring-your-own
Tool calling              | First-class, schema-driven                                  | First-class, function-driven
Mental model              | Workflow engineer's                                         | Conversation designer's
Best fit                  | Stateful branching pipelines                                | Cooperating agent ensembles
Production posture (2026) | Used as a library, not an architecture                      | Used as a library, not an architecture
Common complaint          | Verbose for simple cases                                    | Multi-agent loops can stall
Reach for it when...      | You need durable state across long-running runs             | You need explicit specialist-to-specialist handoff

A practitioner-side read. No vendor-supplied benchmarks. See editorial guidelines for our sourcing standards.

The tools-and-protocols slot is the most volatile

If the model layer has settled and the orchestration layer is consolidating, the tools-and-protocols slot is still wide open. MCP — Anthropic’s Model Context Protocol — is the most-asked-about line item in every conversation we had. About half of the teams we spoke to are using it for at least one integration. A meaningful chunk of those teams are using it for every integration and have rewritten their internal tool calls behind MCP servers. A small but vocal minority told us they evaluated MCP, decided the additional indirection cost was not worth it for their use case, and rolled their own JSON-RPC layer instead.

We are not going to declare a winner. We are going to say something more useful, which is that MCP’s value is highly dependent on what kind of team you are. If you are building one product and you control all the integrations, MCP is roughly a wash against a hand-rolled interface: you trade some bespoke convenience for protocol-level uniformity. If you are an agency or a platform that has to integrate with arbitrary external systems on a rolling basis, the calculation flips. The protocol does real work for you because the cost of writing the (n+1)th integration drops.
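
For a sense of what "rolled their own" involves, here is a minimal sketch of a hand-rolled JSON-RPC 2.0 tool dispatcher. It is not an MCP implementation, although MCP itself exchanges JSON-RPC 2.0 messages. The `tool` decorator, the `crm.lookup` tool, and its return payload are hypothetical; the error codes are the standard JSON-RPC 2.0 ones.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Register a function under a JSON-RPC method name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("crm.lookup")  # hypothetical tool, for illustration only
def crm_lookup(customer_id: str) -> dict:
    return {"customer_id": customer_id, "status": "active"}


def _error(req_id, code: int, message: str) -> str:
    return json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}
    )


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    req = json.loads(raw)
    req_id = req.get("id")
    fn = TOOLS.get(req.get("method"))
    if fn is None:
        return _error(req_id, -32601, "Method not found")
    try:
        result = fn(**req.get("params", {}))
    except TypeError as exc:
        # Wrong or missing arguments. Note that this also catches a
        # TypeError raised inside the tool body itself.
        return _error(req_id, -32602, f"Invalid params: {exc}")
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```

The dispatcher is the easy part. Discovery, schemas, auth, and transport are what a team writes again for each integration, which is where a shared protocol starts to pay for its indirection.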

The agency model is also where the most interesting stack patterns are emerging right now. The teams that are doing repeat work across a portfolio of clients — content systems, internal-operations agents, lead-gen funnels, sales-ops automations — are the ones who have ironed out the abstractions first. The patterns are familiar to anyone who has worked at a service company: you start by building everything bespoke, you notice the same shape three or four times, and you extract it into a framework. The shops that have committed to a single orchestration platform across their entire delivery practice are the ones whose maintenance load grows sub-linearly with client count.

State, memory, and the case for “boring”

The most boring slot in the stack, state, is also the slot where most production incidents originate. Memory in an agentic system means two things at once. One is the working memory inside a single task: scratch notes, intermediate results, the conversation buffer. The other is the long-running memory across tasks: what the system knows about this customer, this project, this preference, this history. Teams that conflate the two ship bugs.

The convergence we are seeing is that the working memory lives in the framework’s session object or in a small Redis-backed scratch store, and the long-running memory lives in a Postgres table with a regular schema. Vector databases still have a role, but it is narrower than the 2024 marketing implied. Vectors are for retrieval-augmented generation over unstructured corpora. They are not the system’s memory of what the user said yesterday — that belongs in a relational store with proper indexing and proper auditability. The teams who internalized that early ended up shipping faster than the teams who tried to put everything behind a vector index.

A pseudocode pattern that came up in three separate conversations:

# Pseudocode: llm, sessions, customer_facts, vector_search, embed_and_store,
# prompt, and extract_durable_facts stand in for your own components.

# Bad: conflating ephemeral and durable memory.
def handle_turn_conflated(user_msg: str) -> str:
    embed_and_store(user_msg)  # everything goes to the vector DB
    context = vector_search(user_msg, k=10)  # assumed to return a list of strings
    return llm.complete("\n".join(context) + "\n" + user_msg)

# Better: separate scratch state from durable state, store
# durable facts as structured rows, and let RAG be RAG.
def handle_turn(session_id: str, user_msg: str) -> str:
    session = sessions.get(session_id)                  # ephemeral
    facts   = customer_facts.for_user(session.user_id)  # durable, structured
    docs    = vector_search(user_msg, k=5)              # RAG over corpora only
    out     = llm.complete(prompt(session, facts, docs, user_msg))
    # Record both sides of the exchange in the scratch buffer.
    sessions.update(session_id, append_turns=[user_msg, out])
    # Write to the durable store only when a typed fact was extracted.
    if extracted := extract_durable_facts(out):
        customer_facts.upsert(session.user_id, extracted)
    return out

The pattern is unglamorous on purpose. The point is that the durable facts are extracted intentionally, with a typed schema, and the vector store is doing the job it was designed for. We have seen teams cut their production incident rate noticeably after making this split — though we will not give you a percentage because we did not measure it across enough teams to claim one.
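
For readers who want something they can run, here is a self-contained sketch of the same split. It makes several assumptions: an in-process dict stands in for the session or Redis scratch store, SQLite stands in for Postgres, the `customer_facts` schema is ours, the model is passed in as a plain callable, and the extractor is a toy rule rather than a real typed-schema extractor. Retrieval is left out to keep it short.

```python
import sqlite3
from collections import defaultdict

# Ephemeral working memory: one turn buffer per session.
SESSIONS = defaultdict(list)

# Durable memory: a regular table with a primary key and a timestamp.
db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE customer_facts (
           user_id    TEXT NOT NULL,
           key        TEXT NOT NULL,
           value      TEXT NOT NULL,
           updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
           PRIMARY KEY (user_id, key)
       )"""
)


def upsert_fact(user_id: str, key: str, value: str) -> None:
    db.execute(
        """INSERT INTO customer_facts (user_id, key, value) VALUES (?, ?, ?)
           ON CONFLICT(user_id, key)
           DO UPDATE SET value = excluded.value, updated_at = CURRENT_TIMESTAMP""",
        (user_id, key, value),
    )
    db.commit()


def facts_for(user_id: str) -> dict:
    rows = db.execute(
        "SELECT key, value FROM customer_facts WHERE user_id = ?", (user_id,)
    )
    return dict(rows.fetchall())


def extract_durable_facts(user_msg: str) -> dict:
    # Toy extractor that reads the user's message. A real one would
    # validate against a typed schema before anything is written.
    prefix = "my timezone is "
    if user_msg.lower().startswith(prefix):
        return {"timezone": user_msg[len(prefix):].strip(" .")}
    return {}


def handle_turn(session_id: str, user_id: str, user_msg: str, llm) -> str:
    history = SESSIONS[session_id]          # ephemeral
    facts = facts_for(user_id)              # durable, structured
    reply = llm(f"facts={facts}\nhistory={history}\nuser={user_msg}")
    history.extend([user_msg, reply])       # scratch buffer only
    for key, value in extract_durable_facts(user_msg).items():
        upsert_fact(user_id, key, value)    # intentional durable write
    return reply
```

A fact stated in one session shows up in the prompt of the next session for the same user, while the turn buffer of the first session does not.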

Deployment is the slot where the agency model wins

Most agentic-stack writing skips deployment, which is a mistake. The teams shipping fastest are not the ones with the most clever runtime; they are the ones whose deployment story is boring. GitHub for source, Railway or Fly or Render for the long-running services, Vercel or Netlify for the user-facing surfaces, a managed Postgres somewhere, a managed object store somewhere, and a CI pipeline that runs evals on every PR. None of that is novel. The novelty is in not getting distracted by the more exotic options.
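
"A CI pipeline that runs evals on every PR" can start very small. The sketch below is one way to shape it, assuming the agent under test is exposed as a plain callable from prompt to output. `EvalCase`, `run_evals`, and `gate` are hypothetical names, and the pass-rate threshold is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]   # returns True when the output is acceptable


def run_evals(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against the agent and record pass/fail by case name."""
    return {case.name: bool(case.check(agent(case.prompt))) for case in cases}


def gate(results: Dict[str, bool], min_pass_rate: float = 1.0) -> int:
    """Print a per-case report and return a process exit code (0 = pass)."""
    passed = sum(results.values())
    rate = passed / len(results) if results else 0.0
    for name, ok in results.items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
    print(f"pass rate: {rate:.0%} (required: {min_pass_rate:.0%})")
    return 0 if rate >= min_pass_rate else 1

# In a CI script, the final line would be something like:
#   sys.exit(gate(run_evals(my_agent, CASES)))
```

An empty result set fails the gate on purpose, so a misconfigured pipeline that collects no cases cannot pass silently.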

A working team we spoke to ships their full agentic stack out of a single monorepo with a typical web-app deploy pipeline. They run the orchestration as one service, the specialist agents as a worker pool behind a queue, and a thin React surface for the human operator. They could have run it on Kubernetes. They chose not to, on the explicit grounds that the team is small enough that “Kubernetes” would be a third full-time job nobody had. That kind of restraint is the difference between a team that ships and a team that infrastructures.
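
The shape that team described, one orchestrator feeding specialists through a queue, fits in a page. This is an in-process sketch using threads and the standard-library queue. A production version would more likely use a real broker and separate worker processes, and the specialist functions here are stand-ins.

```python
import queue
import threading

# Hypothetical specialists; each takes a payload and returns a result.
SPECIALISTS = {
    "research": lambda payload: f"notes on {payload}",
    "write": lambda payload: f"draft about {payload}",
}


def worker(jobs: queue.Queue, results: dict) -> None:
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut this worker down
            jobs.task_done()
            return
        job_id, role, payload = job
        try:
            results[job_id] = SPECIALISTS[role](payload)
        except Exception as exc:  # one bad job must not kill the pool
            results[job_id] = f"error: {exc!r}"
        finally:
            jobs.task_done()


def run(jobs_to_do, n_workers: int = 2) -> dict:
    """Orchestrator: enqueue jobs, wait for the pool to drain, collect results."""
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for job in jobs_to_do:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)           # one sentinel per worker
    jobs.join()
    for t in threads:
        t.join()
    return results
```

Calling `run([(1, "research", "MCP"), (2, "write", "MCP")])` returns a dict keyed by job id. Nothing here needs a cluster scheduler, which is roughly the point that team was making.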

What we would tell you to do if you are starting today

This is the part where we are going to be unfashionable. If you are standing up a new agentic project in 2026, the highest-leverage decision is not which framework you pick — it is what shape you give your team. The teams who ship are the teams who have a clear owner for the orchestration layer, a clear owner for the integrations, a clear owner for the evals, and an editor-of-the-system who can override all three when the system is not behaving. The teams who do not ship are the ones who treat the agentic stack as “what the framework gives us.”

If you do nothing else from this piece, write the org chart of your agentic system on a single page before you write the code. Specialists. Owners. Handoffs. The frameworks will fall in line behind a clear structure. They will not save you from an unclear one.
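
The one-page org chart can also live in the repository as data, which lets a test catch a role with no owner or a handoff to a specialist that does not exist. A minimal sketch follows, with hypothetical role names and fields.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Role:
    name: str
    owner: str                          # the human accountable for this role
    hands_off_to: Tuple[str, ...] = ()  # names of roles it may hand work to


def validate(roles: List[Role]) -> List[str]:
    """Return a list of problems; an empty list means the chart is coherent."""
    problems = []
    names = [r.name for r in roles]
    known = set(names)
    if len(names) != len(known):
        problems.append("duplicate role names")
    for role in roles:
        if not role.owner.strip():
            problems.append(f"{role.name}: no owner")
        for target in role.hands_off_to:
            if target not in known:
                problems.append(f"{role.name}: hands off to unknown role {target!r}")
    return problems


# Hypothetical chart, for illustration only.
CHART = [
    Role("orchestrator", owner="alex", hands_off_to=("researcher", "writer")),
    Role("researcher", owner="sam", hands_off_to=("writer",)),
    Role("writer", owner="sam"),
]
```

Running `validate(CHART)` in the test suite keeps the written structure and the code from drifting apart.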

What we are watching next

Three threads we are tracking for the next issue. First, the MCP-in-production debate is moving from “is it useful” to “is it stable under load.” We are talking to teams that have been on MCP for nine to twelve months and have opinions. Second, the agency model — vertical AI agencies running on a single orchestration platform — is producing the most interesting patterns in the market, and we are profiling more of them. Third, the auditability conversation is heating up, especially in regulated industries; we have an upcoming piece on what an “auditable agentic stack” actually looks like in 2026.

If you operate one of the stacks we should be covering and we have not reached out, write to the address on the contributors page. We try to talk to everyone before we write the landscape piece, but the agentic-stack world is wide.

For now, the short version: the model layer is decided, the orchestration layer is consolidating, the tools layer is volatile, the state layer rewards discipline, the deployment layer rewards restraint, and the team shape matters more than the framework. We expect most of those statements to be true twelve months from now. We will tell you if any of them stop being true.

— The Editorial Team