{"version":"https://jsonfeed.org/version/1.1","title":"Stack Quarterly","home_page_url":"https://stackquarterly.com/","feed_url":"https://stackquarterly.com/feed.json","description":"Quarterly deep dives on the tools real teams actually ship with.","language":"en-US","authors":[{"name":"Stack Quarterly Editorial Team"}],"items":[{"id":"https://stackquarterly.com/posts/vibe-coding-for-teams-karpathy-to-production/","url":"https://stackquarterly.com/posts/vibe-coding-for-teams-karpathy-to-production/","title":"Vibe Coding for Teams — From Karpathy's Tweet to Production","content_html":"\u003cp\u003eI want to start by saying what Andrej Karpathy actually said, because eighteen months of industry coverage have turned the original tweet into a Rorschach test. In February 2025 he wrote a short post coining \u0026ldquo;vibe coding\u0026rdquo; — letting an AI agent write most of the code while the human stayed at the wheel by intention rather than by line. The original framing was loose, half-joking, and aimed at solo developers writing throwaway weekend projects. The framing now in mid-2026 — entry-level postings asking for \u0026ldquo;vibe coding developer skills,\u0026rdquo; industry magazines like vibecoding.app and blog.mean.ceo running weekly issues, hiring titles like \u0026ldquo;Vibe Growth Marketing Manager\u0026rdquo; at Ramp — is a long way from the tweet.\u003c/p\u003e\n\u003cp\u003eThe distance has not been kind to the discipline. Karpathy himself spent most of 2026 publicly worrying that the result was \u0026ldquo;slop\u0026rdquo; — confidently wrong code shipped by people who could not have written it themselves and could not now read it. He is right about a subset of the output and wrong about the category. The teams that do vibe coding well in 2026 produce shippable, maintainable software at a pace that would have been impossible two years ago. The teams that do it badly produce slop. 
The difference is workflow, not tooling.\u003c/p\u003e\n\u003cp\u003eThis is the workflow piece. What vibe coding looks like inside an engineering team that ships, written from inside two teams that have been running this discipline for over a year.\u003c/p\u003e\n\u003ch2 id=\"what-vibe-coding-means-in-2026-in-practice\"\u003eWhat \u0026ldquo;vibe coding\u0026rdquo; means in 2026, in practice\u003c/h2\u003e\n\u003cp\u003eThe working definition is narrower than the magazines imply and wider than the original tweet. Vibe coding is the discipline of \u003cem\u003edriving an agent-shaped tool by intent and review\u003c/em\u003e, not by line-by-line editing. The engineer specifies what should happen. The agent produces the code. The engineer reviews, redirects, and ships. There are three load-bearing parts of that definition: the \u003cem\u003eintent\u003c/em\u003e is precise, the \u003cem\u003eagent\u003c/em\u003e is the one writing, and the \u003cem\u003ereview\u003c/em\u003e is rigorous.\u003c/p\u003e\n\u003cp\u003eCompare three workflow shapes you will see in the wild.\u003c/p\u003e\n\u003cp\u003eThe \u003cstrong\u003etraditional shape\u003c/strong\u003e is line-by-line editing with autocomplete. The engineer writes the code; the autocomplete suggests the next token; the engineer accepts, rejects, or types over. This is the Copilot-circa-2023 shape. The agent is a junior copy-editor.\u003c/p\u003e\n\u003cp\u003eThe \u003cstrong\u003eslop shape\u003c/strong\u003e is intent-and-accept. The engineer says \u0026ldquo;build me a thing\u0026rdquo;; the agent produces five hundred lines; the engineer runs it, sees it works on the happy path, and ships. This is the shape Karpathy is right to worry about. The agent is the senior engineer; the human is rubber-stamping.\u003c/p\u003e\n\u003cp\u003eThe \u003cstrong\u003edisciplined vibe coding shape\u003c/strong\u003e is intent-then-direct-then-review. 
The engineer specifies the intent in enough detail that they could have written it themselves; the agent drafts the code; the engineer reads every line, redirects on architectural decisions, accepts the mechanical work, and runs the tests they themselves wrote. The agent is doing the typing. The human is doing the engineering.\u003c/p\u003e\n\u003cp\u003eThat third shape is the one the working teams use. It is also the one the magazines undersell, because \u0026ldquo;intent-then-direct-then-review\u0026rdquo; does not produce a viral tweet. It does produce shipped software.\u003c/p\u003e\n\u003ch2 id=\"the-job-posting-question--is-this-a-real-skill-or-a-meme\"\u003eThe job-posting question — is this a real skill or a meme?\u003c/h2\u003e\n\u003cp\u003eThere are now real job postings that list \u0026ldquo;vibe coding\u0026rdquo; as a required skill, often at the entry level. The cynical read is that the requirement is a meme. The disciplined read is that hiring managers have noticed something genuinely shifted in the productivity distribution of new engineers and are trying to write down what they noticed. The trouble is that they are writing it down in the language of the most viral version of the phenomenon, which dilutes it.\u003c/p\u003e\n\u003cp\u003eWhat the working hiring managers actually want, when they post a \u0026ldquo;vibe coding skills required\u0026rdquo; requisition:\u003c/p\u003e\n\u003cp\u003eThe engineer can \u003cstrong\u003edecompose a feature into agent-sized tasks\u003c/strong\u003e. This is the most undertaught skill in the discipline. A task is agent-sized when the intent is precise enough to specify in two paragraphs, the surface area is bounded enough that the agent can read the relevant code in a single context window, and the success criterion is something a test or a human can check. 
Engineers who can do this naturally are roughly four times as productive with agents as engineers who cannot.\u003c/p\u003e\n\u003cp\u003eThe engineer can \u003cstrong\u003eread code they did not write\u003c/strong\u003e. This is the rate-limiter on the whole discipline. An engineer who can read three hundred lines of agent-produced code and tell you, within five minutes, where the bugs are can drive an agent at full speed. An engineer who cannot will produce slop at full speed.\u003c/p\u003e\n\u003cp\u003eThe engineer \u003cstrong\u003eknows when to override the agent\u003c/strong\u003e. The disciplined version of vibe coding involves rejecting the agent\u0026rsquo;s first suggestion frequently. The slop version accepts it. Hiring managers are looking for the former — engineers whose review instinct fires reliably on the wrong-shaped solution.\u003c/p\u003e\n\u003cp\u003eThe engineer \u003cstrong\u003emaintains a discipline around tests, evals, and source control\u003c/strong\u003e. Vibe coding without tests is slop. Vibe coding without evals is slop at the system level. Vibe coding without source-control discipline is slop you cannot back out of. The teams that produce maintainable agent-driven code are the teams that have not let the agent\u0026rsquo;s velocity erode the surrounding hygiene.\u003c/p\u003e\n\u003cp\u003eA reasonable job posting in 2026 would list those four bullets and avoid the phrase \u0026ldquo;vibe coding\u0026rdquo; entirely. The fact that the phrase is showing up in entry-level requisitions is partly fashion and partly a real signal that the discipline has consolidated into something hireable.\u003c/p\u003e\n\u003ch2 id=\"a-working-workflow-written-down\"\u003eA working workflow, written down\u003c/h2\u003e\n\u003cp\u003eI am going to write down the workflow two teams I have spent time with actually use. Both teams are five to fifteen engineers. Both ship to production weekly. 
Both have been running this discipline for over a year and have a sense of what works and what breaks.\u003c/p\u003e\n\u003cp\u003eThe workflow starts with a \u003cstrong\u003etask brief\u003c/strong\u003e. The brief is the engineer\u0026rsquo;s intent, written down in two or three paragraphs, in a markdown file or a ticket. The brief specifies: what the change should accomplish, where in the codebase it should sit, what existing patterns to follow, what to explicitly not do, and the test or eval that will tell us the work is done. The brief is the load-bearing artifact. A vague brief produces vague code. A precise brief produces precise code. Teams that skip this step or try to specify intent inside the agent conversation produce slop at twice the rate of teams that write the brief first.\u003c/p\u003e\n\u003cp\u003eThe brief then becomes the \u003cstrong\u003eagent prompt\u003c/strong\u003e. The engineer hands the brief to a coding agent — usually Claude Code in the teams I am writing about, occasionally Cursor\u0026rsquo;s agent mode for IDE-native work — and tells the agent to produce a plan. The plan step is non-negotiable. The agent produces a plan; the engineer reads the plan; the engineer either accepts it, redirects it (\u0026ldquo;no, do not refactor the auth module, just add the field\u0026rdquo;), or rejects it entirely and rewrites the brief. 
The plan step is where the engineer\u0026rsquo;s architectural instinct earns its keep.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-bash\" data-lang=\"bash\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# A typical Claude Code session start, lightly edited\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e$ claude\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u0026gt; \u003cspan class=\"nb\"\u003eread\u003c/span\u003e briefs/2026-05-22-add-org-billing-cap.md\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u0026gt; \u003cspan class=\"nb\"\u003eread\u003c/span\u003e the relevant files and produce a plan, \u003cspan class=\"k\"\u003edo\u003c/span\u003e not edit anything yet\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Agent produces a plan as a numbered list of file changes.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Engineer reads, redirects.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u0026gt; step \u003cspan class=\"m\"\u003e3\u003c/span\u003e is wrong — the cap belongs on the Organization model not on the Billing model.\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u0026gt; redo the plan with that change.\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Engineer reviews revised plan, approves.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u0026gt; proceed. run the \u003cspan class=\"nb\"\u003etest\u003c/span\u003e suite after each step.\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe agent then \u003cstrong\u003eexecutes\u003c/strong\u003e. This is the part that looks the most like the slop shape from the outside and is the least like it inside. The engineer is not idle. They are reading the diff as it appears, watching the test output, and stopping the agent when something looks wrong. The agent is doing the typing; the engineer is doing the steering. A good session has the engineer interrupting the agent three or four times in the course of a one-hour task. A slop session has the engineer interrupting it zero times.\u003c/p\u003e\n\u003cp\u003eWhen the agent finishes, the engineer reviews the \u003cstrong\u003eentire diff\u003c/strong\u003e by hand. Every line. This is the step that distinguishes the disciplined version of the workflow from the slop version. If you skipped this step on hand-written code, you would be a junior. If you skip this step on agent-written code, you are producing slop at higher velocity. The review is non-negotiable.\u003c/p\u003e\n\u003cp\u003eThe engineer \u003cstrong\u003eruns the tests\u003c/strong\u003e — the existing test suite, the new tests the brief specified, and any evals the team maintains for the relevant subsystem. If anything fails, the engineer either fixes it themselves or hands it back to the agent with a precise correction. If everything passes, the engineer \u003cstrong\u003ecommits\u003c/strong\u003e. 
The commit message is written by the human or written by the agent and edited by the human. Either way the human is the one putting the commit on the wire.\u003c/p\u003e\n\u003cp\u003eFinally, the engineer \u003cstrong\u003eopens a PR\u003c/strong\u003e and the PR goes through normal code review. The reviewer does not need to know whether the code was agent-written or human-written. The reviewer reviews the code on its merits. If the code looks rushed, sloppy, or wrong, the reviewer kicks it back. The agent-written shape is sometimes detectable, but a well-driven agent produces code that reads like a competent engineer wrote it. The detection question is also the wrong question. The right question is whether the code is good.\u003c/p\u003e\n\u003cp\u003eThat is the workflow. It is not complicated. It is also not the slop shape, and the difference is mostly that the engineer never stops being the engineer.\u003c/p\u003e\n\u003ch2 id=\"the-infrastructure-that-makes-the-workflow-work\"\u003eThe infrastructure that makes the workflow work\u003c/h2\u003e\n\u003cp\u003eThere is an infrastructure layer that the workflow above assumes. Teams that have it ship at the velocity the agent-tool marketing implies. Teams that do not have it produce slop and blame the tool.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA test suite that actually runs and actually catches regressions.\u003c/strong\u003e This is the load-bearing piece. The agent\u0026rsquo;s incentive is to produce code that runs; the test suite\u0026rsquo;s job is to assert that the code does the right thing. A team whose tests run in two minutes, cover the load-bearing paths, and fail loudly when something regresses can let the agent run at full speed. A team whose tests are slow, flaky, or missing on the parts of the system that matter cannot. Before adopting vibe coding as a team discipline, audit the test suite. 
Fix what is broken.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAn eval harness for the parts of the system that have non-deterministic outputs.\u003c/strong\u003e If your system has LLM calls in it — and in 2026, most systems do — the test suite cannot tell you whether the LLM is producing the right kind of output. That is what an eval harness is for. Without it, the agent can introduce subtle regressions in your system\u0026rsquo;s behavior that nobody will catch until a user complains. This is doubly true of code that itself wraps an LLM.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSource-control hygiene.\u003c/strong\u003e Every agent session ends with a commit. Every commit is reviewable. Every PR is small enough to fit in a reviewer\u0026rsquo;s head. Teams that let the agent produce one-thousand-line PRs are producing slop by definition. The discipline of small, reviewable commits is the discipline that lets you back out of a bad agent session without losing the day.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA house style that the agent can read.\u003c/strong\u003e The agents are good at following a CLAUDE.md or an AGENTS.md or a .cursorrules file. If you tell them what your conventions are, they will mostly follow them. If you do not, they will mostly produce code in whatever style the model was trained on, which is approximately \u0026ldquo;the average of GitHub.\u0026rdquo; Most teams have a house style that is sharper than the GitHub average and should write it down for the agent. This pays back in less review time per agent-written PR.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAn MCP server or two for the integrations the agent will need.\u003c/strong\u003e When the agent needs to look up an issue in Linear, post a message to Slack, or read a runtime metric from Datadog, the path from the agent to the integration matters. MCP is the protocol that lets the agent do this without a brittle wrapper. 
The teams I am writing about have invested in MCP servers for their internal tools and treat them as a piece of developer infrastructure. The investment pays back, in the same way that good internal tooling has always paid back.\u003c/p\u003e\n\u003ch2 id=\"where-the-slop-comes-from\"\u003eWhere the slop comes from\u003c/h2\u003e\n\u003cp\u003eA diagnostic. If your team is producing agent-written code and the code is slop, the cause is almost always one of these.\u003c/p\u003e\n\u003cp\u003eThe \u003cstrong\u003ebrief was vague\u003c/strong\u003e. The engineer told the agent \u0026ldquo;fix the billing bug\u0026rdquo; and the agent did its best. Its best was wrong. The fix is to write down the actual intent (what changes, where, why) before launching the agent. Two paragraphs of brief produce twenty minutes of useful work. No brief produces an hour of slop.\u003c/p\u003e\n\u003cp\u003eThe \u003cstrong\u003eengineer did not read the diff\u003c/strong\u003e. The agent produced two hundred and fifty lines; the engineer scanned the first twenty, saw they looked plausible, and merged. The fix is to read the entire diff. Every line. If the diff is too long to read, the task was too big. Break it up.\u003c/p\u003e\n\u003cp\u003eThe \u003cstrong\u003etests do not actually cover the regression\u003c/strong\u003e. The agent produced code that passes the tests; the code is also wrong; the tests did not assert the thing that matters. The fix is to add the test that would have caught the regression. This is the practice that compounds. Every time the agent ships a regression that the tests missed, write the test that would have caught it. After six months the test suite is a force-multiplier on the agent\u0026rsquo;s accuracy.\u003c/p\u003e\n\u003cp\u003eThe \u003cstrong\u003eagent was given too much freedom\u003c/strong\u003e. The engineer asked for a feature and got an architectural refactor as a bonus. The fix is to constrain the agent\u0026rsquo;s scope explicitly. 
\u0026ldquo;Only edit these files. Do not touch the auth module. Do not refactor the API surface.\u0026rdquo; Agents that are constrained produce constrained changes. Agents that are not produce sprawling ones.\u003c/p\u003e\n\u003cp\u003eThe \u003cstrong\u003esystem has no orchestration layer\u003c/strong\u003e above the agent. The agent does its work and there is no specialist agent reviewing the output, no automation kicking off the eval, no checkpoint between draft and merge. This is the slot where teams that have invested in their own orchestration — the agency-platform shape, the \u003ca href=\"https://os.web4guru.com\"\u003eWeb4OS-style\u003c/a\u003e approach where the orchestration is the platform and the agent is one of several specialists — get an obvious second-order benefit. The agent\u0026rsquo;s draft is not the deployable artifact. The draft passes through review specialists, eval gates, and human checkpoints before anything ships. Slop is a property of the \u003cem\u003esystem\u003c/em\u003e, not just of the agent in it.\u003c/p\u003e\n\u003ch2 id=\"what-we-tell-new-engineers\"\u003eWhat we tell new engineers\u003c/h2\u003e\n\u003cp\u003eTwo teams have asked me for the version of this piece they can hand to a new engineer on day one. The compressed version, for that audience:\u003c/p\u003e\n\u003cp\u003eYou are going to be working with coding agents most days. Your job is not to type. Your job is to specify, direct, and review. Write the brief before you launch the agent. Read the plan before you let the agent execute. Read every line of the diff before you commit. Run the tests. If you cannot read the code, do not ship it. If the agent produces something you would not have written yourself, ask why — the answer is usually that the agent is right and your habit is wrong, or that the agent is wrong and you would have caught it if you had been writing the code. Either answer is useful.\u003c/p\u003e\n\u003cp\u003eThe agents are good. 
They are not magic. They are leverage. Leverage works in both directions; a careless engineer with an agent ships bugs faster than a careless engineer without one. Be the engineer the leverage helps.\u003c/p\u003e\n\u003cp\u003eThat is what we tell new engineers, and it is what eighteen months of vibe coding has taught us is the actual core skill. The tools will keep changing. The discipline does not.\u003c/p\u003e\n\u003cp\u003e— Ginger Wolfe-Suarez\u003c/p\u003e\n","summary":"Karpathy named it in a tweet. Eighteen months later there are job postings, magazines, and a quiet generation of slop. What it takes to do this on a team without producing the bad version.","date_published":"2026-05-23T11:00:00-07:00","date_modified":"2026-05-23T11:00:00-07:00","authors":[{"name":"Ginger Wolfe-Suarez"}],"tags":["vibe coding","workflow","agentic stack","Karpathy","coding agents"]},{"id":"https://stackquarterly.com/posts/claude-code-vs-cursor-vs-copilot-q2-2026/","url":"https://stackquarterly.com/posts/claude-code-vs-cursor-vs-copilot-q2-2026/","title":"Claude Code vs Cursor vs Copilot Workspace — Q2 2026","content_html":"\u003cp\u003eI get the same email about once a week. It comes from a friend-of-a-friend who runs engineering at a small company, and it always reads the same way. \u003cem\u003eWe are about to standardize on a coding agent. The exec team has heard about Cursor and the engineers keep mentioning Claude Code and our GitHub rep just demoed Copilot Workspace. What would you pick?\u003c/em\u003e The answer is longer than the email deserves, and I have been writing it ad-hoc enough times that I am going to put it in a piece and link to it.\u003c/p\u003e\n\u003cp\u003eThis is not a benchmark shootout. The benchmarks that exist are mostly self-reported, mostly running on SWE-bench Verified, and mostly ignore the parts of the developer workflow where the agents differ most. 
This is the practitioner-side comparison: what each tool actually ships, where each one falls over, what the adoption signals look like in mid-2026, and a recommendation matrix that I have personally been handing out.\u003c/p\u003e\n\u003ch2 id=\"the-shape-of-the-three-bets\"\u003eThe shape of the three bets\u003c/h2\u003e\n\u003cp\u003eEach of the three tools is making a different bet about where coding agents belong in the developer\u0026rsquo;s life. The bets are not interchangeable. Picking the wrong one for your team is not a small mistake; it is the kind of mistake that produces an eighteen-month change-management project.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClaude Code\u003c/strong\u003e is the terminal-first bet. The tool runs as a CLI on your machine, sits inside whatever shell and editor you already use, and reads from the codebase via the filesystem rather than via an indexed semantic layer. The interface is a conversation with an agent that has hands — it can read files, edit files, run shell commands, run tests, commit to git, and call external tools via MCP. The mental model is that the agent is a junior pair-programmer who sits at your terminal. You give it a task. It works. You review. The bet is that the terminal is where the serious work happens and the IDE is a presentation layer.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCursor\u003c/strong\u003e is the IDE-first bet. The tool ships as a forked VS Code with deep integrations: tab-tab-tab autocomplete (the feature that originally got the company famous), an agent-mode panel that can edit multiple files in a single pass, a per-repo semantic index that keeps the model\u0026rsquo;s context grounded, and a Background Agent feature that runs longer tasks asynchronously on Cursor\u0026rsquo;s infrastructure. The mental model is that the IDE is where engineers live and the agent should be a first-class panel in it. 
The bet is that the developer\u0026rsquo;s editor is the load-bearing surface, and that the company that owns the editor wins.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGitHub Copilot Workspace\u003c/strong\u003e is the platform-first bet. The tool is an evolution of Copilot — the original suggestion-on-tab tool that defined the category — into a system that owns a piece of the GitHub workflow itself. You file an issue, hand it to Workspace, and the agent produces a plan, executes it across files, and surfaces a PR for human review. Workspace is integrated with the same per-repo semantic index Copilot has used for two years, with GitHub Actions, with branch protection, and with the rest of the platform. The mental model is that GitHub is the operating system of software production and the agent should be a feature of that OS, not a separate tool. The bet is that the platform — not the editor, not the terminal — is the surface that wins.\u003c/p\u003e\n\u003cp\u003eThree plausible bets. They are not equally well-suited to every team.\u003c/p\u003e\n\u003ch2 id=\"what-the-adoption-numbers-actually-look-like-in-2026\"\u003eWhat the adoption numbers actually look like in 2026\u003c/h2\u003e\n\u003cp\u003eA short interlude with the numbers, because the numbers are useful when they are not made up.\u003c/p\u003e\n\u003cp\u003eClaude Code\u0026rsquo;s revenue is the one to start with. Anthropic disclosed in early 2026 that Claude Code had reached a \u003cstrong\u003e$2.5B run-rate revenue\u003c/strong\u003e — the company\u0026rsquo;s fastest-growing product and one of the fastest-growing developer products of any vintage. 
The same disclosure cycle put \u003cstrong\u003eroughly 4% of all public GitHub commits\u003c/strong\u003e as having been produced via Claude Code by early 2026, with daily active usage doubling month-over-month through the back half of 2025 (\u003ca href=\"https://medium.com/lab7ai-insights/anthropics-claude-code-becomes-the-most-popular-coding-agent-of-2026-b838043be1f2\"\u003eLab7AI, on Claude Code\u0026rsquo;s 2026 trajectory\u003c/a\u003e). Boris Cherny, the head of the team that built it, has reportedly not edited code by hand since November 2025 — internally at Anthropic, the majority of new code is written by Claude Code itself. Ramp cut incident-investigation time by \u003cstrong\u003e80%\u003c/strong\u003e; Wiz migrated a fifty-thousand-line Python library to Go in roughly twenty hours; Rakuten cut feature-delivery cycles from twenty-four working days to five.\u003c/p\u003e\n\u003cp\u003eCursor\u0026rsquo;s number is the one most people quote in coffee-meeting conversations. The company reported a revenue trajectory from $100M ARR in January 2025 to \u003cstrong\u003e$2B ARR by February 2026\u003c/strong\u003e — a twenty-times run in thirteen months (\u003ca href=\"https://tech-insider.org/cursor-60-billion-valuation-anysphere-ai-coding-2026/\"\u003etech-insider.org on Cursor\u0026rsquo;s valuation arc\u003c/a\u003e). The November 2025 Series D closed at \u003cstrong\u003e$29.3B post-money\u003c/strong\u003e, led by Accel and Coatue with Nvidia and Google on the cap table (\u003ca href=\"https://www.cnbc.com/2025/11/13/cursor-ai-startup-funding-round-valuation.html\"\u003eCNBC, 2025-11-13\u003c/a\u003e). As of April 2026 the company was in talks to raise a Series E at a \u003cstrong\u003e$50B pre-money\u003c/strong\u003e with a16z and Thrive returning. The valuation is doing two things at once: pricing the seat-based subscription and pricing the implied platform lock-in. 
Both numbers are aggressive.\u003c/p\u003e\n\u003cp\u003eCopilot Workspace\u0026rsquo;s headline is technical. In a public March 2025 evaluation, Copilot Workspace scored \u003cstrong\u003e55% on SWE-bench Verified\u003c/strong\u003e — the highest among commercial coding tools at that snapshot, ahead of Cursor (48%), Aider (42%), and direct Claude (37%) (\u003ca href=\"https://devopsboys.com/blog/github-copilot-vs-cursor-vs-continue-for-devops-2026\"\u003eDevOpsBoys benchmark recap, 2026\u003c/a\u003e). SWE-bench is not the whole story of a coding agent — it measures bug-fix throughput on real GitHub issues, which is one slice of the job — but Workspace was the first commercial tool to clear the fifty-percent line on the verified split. Pricing sits at \u003cstrong\u003e$19/user/month\u003c/strong\u003e for Copilot Business with Workspace bundled in. Adoption is harder to pin down because Microsoft does not break out Workspace from the rest of the Copilot family, but the same UCSD/Cornell developer survey in January 2026 that found Claude Code with 58 respondents found Copilot at 53 and Cursor at 51 — a top-three race within margin of error (\u003ca href=\"https://venturebeat.com/orchestration/anthropic-says-claude-code-transformed-programming-now-claude-cowork-is\"\u003eVentureBeat on the developer survey\u003c/a\u003e).\u003c/p\u003e\n\u003cp\u003eThe numbers tell you what you already half-knew. All three tools are real. None of them is going away in the next two years. 
The question is not which one is the right tool; the question is which one is the right tool \u003cem\u003efor your team\u003c/em\u003e.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\n\u003caside class=\"stack-compare\" aria-label=\"Side-by-side stack comparison\"\u003e\n \u003cp class=\"stack-compare-title\"\u003eClaude Code vs Cursor vs Copilot Workspace — what each is actually best at\u003c/p\u003e\n \u003cdiv class=\"stack-compare-grid\"\u003e\n \u003cdiv class=\"stack-compare-col stack-compare-col--head\"\u003e\n \u003cspan class=\"stack-compare-aspect-label\"\u003eAspect\u003c/span\u003e\n \u003c/div\u003e\n \u003cdiv class=\"stack-compare-col stack-compare-col--a\"\u003e\n \u003cspan class=\"stack-compare-name\"\u003eClaude Code\u003c/span\u003e\n \u003cspan class=\"stack-compare-note\"\u003eAnthropic, CLI-first, $2.5B run-rate\u003c/span\u003e\n \u003c/div\u003e\n \u003cdiv class=\"stack-compare-col stack-compare-col--b\"\u003e\n \u003cspan class=\"stack-compare-name\"\u003eCursor\u003c/span\u003e\n \u003cspan class=\"stack-compare-note\"\u003eAnysphere, IDE-first, $2B ARR\u003c/span\u003e\n \u003c/div\u003e\n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003ePrimary surface\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eTerminal / CLI\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eForked VS Code IDE\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eCodebase context\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eFilesystem reads, MCP servers\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003ePer-repo semantic index\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eBest-known feature\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eAgent with hands \u0026#43; MCP\u003c/div\u003e\n 
\u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eTab-tab-tab \u0026#43; Background Agents\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003ePricing model\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003ePer-token via Anthropic API or seat\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003ePer-seat subscription with usage caps\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eMulti-file refactor\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eStrong (real shipped: Wiz 50K-line Python to Go in 20h)\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eStrong (agent mode, single pass)\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eAdoption signal\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003e~4% of public GitHub commits\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003e$100M → $2B ARR in 13 months\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eCommon complaint\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eCost unpredictability on long sessions\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eContext-window leak on multi-repo work\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eReach for it when...\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eYou want a terminal-native pair-programmer\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eYou want the IDE to do the editing\u003c/div\u003e\n \n \n \u003c/div\u003e\n \u003cp class=\"stack-compare-footnote\"\u003eA 
practitioner-side read. No vendor-supplied benchmarks. See \u003ca href=\"/editorial-guidelines/\"\u003eeditorial guidelines\u003c/a\u003e for our sourcing standards.\u003c/p\u003e\n\u003c/aside\u003e\n\n\u003ch2 id=\"where-each-one-quietly-fails\"\u003eWhere each one quietly fails\u003c/h2\u003e\n\u003cp\u003eThis is the section nobody puts in the vendor demo, so it goes here.\u003c/p\u003e\n\u003cp\u003eClaude Code\u0026rsquo;s failure mode is \u003cstrong\u003ecost unpredictability\u003c/strong\u003e. The agent does not give you a quote before it runs. A task that takes two minutes of model time costs one dollar; a task that takes forty minutes of model time and seven self-corrections costs forty dollars; and the developer who launched the agent gets no warning either way. Teams that ship on Claude Code learn quickly to scope tasks to \u0026ldquo;one PR\u0026rsquo;s worth\u0026rdquo; and to watch the cost telemetry. The smarter teams set a cap on per-session token spend at the harness level and accept the occasional task that needs to be re-run. The dumber teams find out at the end of the month when the bill arrives.\u003c/p\u003e\n\u003cp\u003eThe second Claude Code failure mode is \u003cstrong\u003efilesystem ambiguity\u003c/strong\u003e at scale. Because the agent reads the filesystem rather than a semantic index, it has to decide what to read. In a small repo, this is fine. In a large monorepo with five hundred packages, the agent will either spend tokens hunting through directories or — worse — miss the relevant file entirely and produce a confidently wrong patch. The mitigation is to give the agent a CLAUDE.md with a project map. The fact that you have to is a tell.\u003c/p\u003e\n\u003cp\u003eCursor\u0026rsquo;s failure mode is the \u003cstrong\u003esubscription-pricing-game\u003c/strong\u003e problem. The pricing model has been adjusted enough times in the last eighteen months that practitioner Twitter has a folklore around it. 
The 2025 shift from Cursor Pro to Cursor Business introduced usage caps that some teams hit in the first week. The cap-and-overage model is rational from a vendor perspective and frustrating from a buyer perspective. Teams that depend on Cursor for daily work need to budget for the actual usage, not the listed seat price, and need to keep an eye on policy changes. This is a tax that scales with how successful Cursor is, which is currently a lot.\u003c/p\u003e\n\u003cp\u003eThe second Cursor failure mode is \u003cstrong\u003emulti-repo context leak\u003c/strong\u003e. The per-repo semantic index is excellent within a repo and approximately useless across repos. A team whose product spans three repositories — say, a frontend, a backend, and a shared types package — will find Cursor\u0026rsquo;s agent-mode performance degrades as the work crosses the boundary. The workaround is to invite Cursor into a workspace that contains all three repos, which works but is a bandaid. We expect this to improve. We have been expecting it to improve for nine months.\u003c/p\u003e\n\u003cp\u003eCopilot Workspace\u0026rsquo;s failure mode is \u003cstrong\u003eplatform lock-in\u003c/strong\u003e. Workspace\u0026rsquo;s strengths are exactly its dependencies. It works because it sits on top of GitHub issues, GitHub Actions, GitHub branch protection, and the rest of the GitHub graph. The moment your team is hosted on GitLab, Bitbucket, or a self-hosted Forgejo, Workspace becomes either unavailable or a degraded version of itself. The tradeoff is honest — Microsoft is not pretending Workspace is portable — but it deserves to be said out loud before a team commits.\u003c/p\u003e\n\u003cp\u003eThe second Workspace failure mode is \u003cstrong\u003ethe planning-step-as-bottleneck\u003c/strong\u003e pattern. Workspace\u0026rsquo;s standard flow is plan-then-execute; the agent produces a structured plan, the developer reviews it, then the execution runs. 
The plan is reviewable, which is genuinely useful. But the plan is also slow to produce on complex tasks, and on simple tasks the plan step is overhead. Developers who use Workspace heavily learn to bypass the planning UI for small fixes and to lean on it only for genuinely cross-file work. That is fine, but it is the opposite of what the marketing implies.\u003c/p\u003e\n\u003ch2 id=\"a-recommendation-matrix-by-team-size\"\u003eA recommendation matrix by team size\u003c/h2\u003e\n\u003cp\u003eThe matrix that I actually hand out, with no hedging.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA team of one to three engineers.\u003c/strong\u003e Use Claude Code. The CLI-native model is the right shape for a small team, the cost is bearable at this scale even at high usage, and the ability to script the agent into your own workflows via MCP and shell commands compounds. The hidden benefit is that working at a terminal forces you to be precise about what you ask for, which is a skill that pays back. Cursor is a fine second choice if the team prefers a graphical IDE. Workspace is overkill at this size — its strengths only matter when you have a workflow to plug into, and a three-engineer team\u0026rsquo;s workflow is \u0026ldquo;ship things and talk to each other.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA team of three to fifteen engineers.\u003c/strong\u003e Use Cursor for the daily work, plus Claude Code for the heavy lifts. This is the seam where Cursor\u0026rsquo;s IDE-native model genuinely earns its price tag: tab-tab-tab matters more when you have a team that all wants to feel fast, the agent panel is a useful collaboration primitive, and the Background Agent feature gives you the parallel-work pattern that small teams cannot otherwise afford. Claude Code earns its keep for the once-a-quarter big migrations and the cost-doesn\u0026rsquo;t-matter tasks. Pay for both. 
The total bill is small compared to engineering salaries, and the productivity delta is real.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA team of fifteen to seventy-five engineers, fully on GitHub.\u003c/strong\u003e Use Copilot Workspace as the platform substrate, and let individual engineers add Claude Code or Cursor on top. This is the seam where Workspace\u0026rsquo;s strengths become load-bearing. Issues, PRs, branch protection, Actions-based CI, code review — these are the surfaces where Workspace produces real org-level efficiency, because it is wiring agentic work into the workflow you already have. The individual-tool layer becomes a personal-preference question. The platform layer is where the bet should be placed.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA team of seventy-five or more engineers, or a regulated industry.\u003c/strong\u003e Make the decision platform-first regardless of which tool you pick. The procurement, security, and compliance work matters more than the tool\u0026rsquo;s autocomplete latency. Workspace has the best enterprise story for GitHub-hosted shops. Claude Code\u0026rsquo;s enterprise tier (Claude Cowork, announced at Code With Claude in May 2026) is the better story for shops that want a flexible CLI-native agent across heterogeneous systems. Cursor\u0026rsquo;s enterprise story is real but currently the weakest of the three on procurement-side maturity.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAn agency or platform that ships across many clients.\u003c/strong\u003e Use the agent that fits the task, and invest the surplus in your own orchestration. This is the seam where teams that have built their own platform — Web4Guru\u0026rsquo;s stack out of Chiang Mai is the working example I keep pointing people at, and you can see \u003ca href=\"https://os.web4guru.com\"\u003ethe platform side at os.web4guru.com\u003c/a\u003e — get more leverage from the coding agents than the agents could provide on their own. 
The agent does a draft. Your orchestration does the rest. The question for your team is which of the three is the most pleasant to drive from your platform\u0026rsquo;s side; today that is Claude Code, because the CLI primitives compose. Tomorrow it could be Cursor if their MCP story keeps maturing. Workspace is the least scriptable of the three from outside the GitHub surface.\u003c/p\u003e\n\u003ch2 id=\"what-we-are-watching-for-the-next-twelve-months\"\u003eWhat we are watching for the next twelve months\u003c/h2\u003e\n\u003cp\u003eThree threads.\u003c/p\u003e\n\u003cp\u003eFirst, \u003cstrong\u003eAnthropic\u0026rsquo;s Claude Cowork\u003c/strong\u003e rollout. The May 2026 announcement at Code With Claude is the first time Anthropic has publicly framed Claude Code as a platform product rather than a developer tool, with an enterprise positioning aimed directly at Workspace and Cursor\u0026rsquo;s enterprise tiers (\u003ca href=\"https://www.technologyreview.com/2026/05/21/1137735/anthropics-code-with-claude-showed-off-codings-future-whether-you-like-it-or-not/\"\u003eMIT Tech Review, 2026-05-21\u003c/a\u003e). If Cowork lands, the per-team economics of the category shift. If it stumbles, the CLI-native model stays a power-user choice.\u003c/p\u003e\n\u003cp\u003eSecond, \u003cstrong\u003eCursor\u0026rsquo;s Background Agents under load\u003c/strong\u003e. The Background Agents feature is the bet that the IDE-first model can extend asynchronously. The early reports are promising. The medium-term test is whether the feature can handle a team of fifteen engineers all launching background work simultaneously without the system feeling like a shared queue. We will know in a couple of quarters.\u003c/p\u003e\n\u003cp\u003eThird, \u003cstrong\u003eCopilot Workspace\u0026rsquo;s pricing\u003c/strong\u003e. The $19/user/month Business tier is significantly cheaper than the per-seat math on Cursor or the per-token math on heavy Claude Code use. 
If Microsoft holds the price, Workspace wins on procurement defaults at the mid-market. If they ratchet the price as adoption grows — which has been the historical pattern across the Microsoft developer-tools portfolio — the calculus changes.\u003c/p\u003e\n\u003cp\u003eThe short version for the friend-of-a-friend\u0026rsquo;s email. There is no single right answer. There is a right answer for your team\u0026rsquo;s size, your platform, and the work you do. Pick the tool whose failure modes you can live with. Pay for the second one. Avoid spending six months in a procurement debate that costs more than four years of seats. The category is fast enough that you will be re-deciding this in 2027 anyway.\u003c/p\u003e\n\u003cp\u003e— Reza Mokhtari\u003c/p\u003e\n","summary":"Three coding agents, three different bets on where the IDE is going. What each ships, where each fails, and a team-size matrix that we actually use when somebody asks.","date_published":"2026-05-23T10:00:00-07:00","date_modified":"2026-05-23T10:00:00-07:00","authors":[{"name":"Reza Mokhtari"}],"tags":["coding agents","Claude Code","Cursor","Copilot","comparison","agentic stack"]},{"id":"https://stackquarterly.com/posts/web4os-practitioners-first-impressions/","url":"https://stackquarterly.com/posts/web4os-practitioners-first-impressions/","title":"Web4OS: A Practitioner's First Impressions","content_html":"\u003cp\u003eI spent a few weeks with Web4OS — the agentic workforce platform that Andrew Rollins has been shipping out of Chiang Mai — and I am writing down my first impressions as a practitioner who has spent the last two years building agentic systems from primitives. This is not an endorsement. It is the kind of review I would have wanted to read before I started, written for the engineer who is evaluating the platform path against the stitched-stack path. The piece is also not exhaustive. 
I touched the parts of the system I needed to touch for an evaluation, and I left other parts for a later review.\u003c/p\u003e\n\u003cp\u003eI will be specific where I can. I will hedge where the platform is evolving fast enough that today\u0026rsquo;s answer may not be tomorrow\u0026rsquo;s. I will not invent numbers I do not have.\u003c/p\u003e\n\u003ch2 id=\"the-frame-i-came-in-with\"\u003eThe frame I came in with\u003c/h2\u003e\n\u003cp\u003eI wanted to know three things going in.\u003c/p\u003e\n\u003cp\u003eFirst, does the platform actually own the layers it claims to own? A platform that owns the orchestration but quietly punts the human-surface to the team is not a platform; it is a framework with marketing. I wanted to see whether Web4OS owns the layers an agentic workforce OS is supposed to own.\u003c/p\u003e\n\u003cp\u003eSecond, what does the constraint cost feel like? Every platform pays for its leverage with a set of constraints the team has to live inside. I wanted to feel the constraints from the inside, not just read them in the documentation.\u003c/p\u003e\n\u003cp\u003eThird, what does the upgrade story look like? The single most expensive part of running on a platform is the day the platform\u0026rsquo;s upgrade does not match my system\u0026rsquo;s assumptions. I wanted to understand what that day would feel like before I trusted the platform with production.\u003c/p\u003e\n\u003ch2 id=\"what-the-platform-owns\"\u003eWhat the platform owns\u003c/h2\u003e\n\u003cp\u003eWeb4OS owns the orchestration layer in a way I would describe as opinionated and complete. 
The system ships with a CEO agent that decomposes incoming goals into specialist work, a specialist-agent runtime with role definitions and handoff semantics, a card-based human surface that is the primary way humans interact with the agents, and a configuration model that lets you stand up a new engagement without writing orchestration code.\u003c/p\u003e\n\u003cp\u003eThe card surface is the part of the platform that did the most work for me. I have written my own card-based agentic UIs in past projects. They are harder than they look. The version Web4OS ships has the primitives I would have built — proposed actions, approve/revise/reject decisions, escalation states, full audit trails on every decision — already in place, with consistent behavior across engagement types. That is a real productivity unlock.\u003c/p\u003e\n\u003cp\u003eThe integration layer is the second thing Web4OS owns that I appreciated. The platform ships with baked-in GitHub and Railway integrations as first-class primitives. Andrew has written elsewhere about treating GitHub as a canonical file host and Railway as a canonical deploy surface, and I can see why: those are the two layers that almost every founder-shaped customer already uses, and having the platform speak them natively means most engagements do not need a separate integration phase.\u003c/p\u003e\n\u003cp\u003eThe credit-based commercial model is a meaningful design choice that I would not have predicted to like. I came in skeptical of credit-based pricing, on the general principle that credits obfuscate cost. In practice, the model is structured as a commitment-level system: you commit to a tier, you get a bonus on your credits, the credits scale with usage. The math is transparent enough that I could plan against it. 
I would have preferred a per-call price list as a side panel for the truly cost-paranoid, but the credit model is a defensible choice for most customers.\u003c/p\u003e\n\u003cp\u003eWhat Web4OS does not own, on purpose: the model layer. The platform is model-agnostic enough that you can configure which model each specialist runs against. That is the right division. Owning the model layer would be a permanent commitment to a single lab; owning everything around the model is the platform\u0026rsquo;s actual value.\u003c/p\u003e\n\u003ch2 id=\"what-the-constraint-cost-feels-like\"\u003eWhat the constraint cost feels like\u003c/h2\u003e\n\u003cp\u003eEvery platform has constraints. The honest answer about Web4OS\u0026rsquo;s constraints, from a few weeks of use, is that they show up in three places.\u003c/p\u003e\n\u003cp\u003eThe first place is the agent vocabulary. Web4OS expects you to think in terms of a CEO agent, specialists, handoffs, and structured work surfaces. If your problem fits that vocabulary, the platform is a tailwind. If your problem does not — say, you are building a single-agent product where there is no meaningful handoff — the platform is overkill, and you would be better off with a smaller library.\u003c/p\u003e\n\u003cp\u003eThe second place is the human-surface vocabulary. The platform expects the human to interact with the agents through structured cards, not through chat. If your product requires a chat-first surface for end users, you would have to layer a chat UI on top of the platform\u0026rsquo;s card primitives, which is doable but is not the platform\u0026rsquo;s natural shape.\u003c/p\u003e\n\u003cp\u003eThe third place is the deployment vocabulary. The platform\u0026rsquo;s GitHub and Railway integrations are first-class. The platform\u0026rsquo;s integrations with other deployment surfaces are present but less polished. 
If you are running on a heterogeneous deployment topology that does not include GitHub or Railway as primary surfaces, you would have to either adapt your deployment to fit the platform or do more integration work than the platform\u0026rsquo;s pitch implies.\u003c/p\u003e\n\u003cp\u003eNone of these constraints felt arbitrary. Each one is a deliberate design choice that gives the platform leverage where most teams need it. The trade-off is the trade-off of every opinionated platform: faster if your problem matches, slower if it does not.\u003c/p\u003e\n\u003ch2 id=\"what-the-upgrade-story-looks-like\"\u003eWhat the upgrade story looks like\u003c/h2\u003e\n\u003cp\u003eI did not get to see a full upgrade cycle during my few weeks, so I will be careful here.\u003c/p\u003e\n\u003cp\u003eWhat I can say is that the platform pins agent configurations as versioned artifacts. The configuration of each specialist agent is a versioned object in a registry, and an engagement can pin to a specific configuration version. Upgrades to the platform\u0026rsquo;s standard configurations do not silently propagate to running engagements. That is the right behavior, and I would not trust a platform where I could not verify it.\u003c/p\u003e\n\u003cp\u003eWhat I do not yet know is what the platform\u0026rsquo;s behavior is during a major version migration. I asked. The answer was that major version migrations require an explicit migration step per engagement, and that the platform maintains backwards compatibility for at least the previous major version. That is a reasonable answer. I will be more comfortable with it after I have seen a real major version migration on real customer engagements.\u003c/p\u003e\n\u003ch2 id=\"surprises\"\u003eSurprises\u003c/h2\u003e\n\u003cp\u003eTwo things surprised me.\u003c/p\u003e\n\u003cp\u003eThe first surprise was the CEO agent\u0026rsquo;s role. 
I expected the CEO agent to be the orchestration\u0026rsquo;s router — the layer that decides which specialist gets the next task. The CEO agent is more than that. It holds the goal state for the engagement, it maintains the engagement\u0026rsquo;s running context across many cards, and it proactively surfaces decisions to the human operator when the engagement deviates from its standing instructions. The naming is a tell: the CEO is not a router, it is a manager. That role is the part of an agentic system that is most often underbuilt, and Web4OS has built it as a first-class primitive.\u003c/p\u003e\n\u003cp\u003eThe second surprise was the editorial-style language throughout the system. The platform\u0026rsquo;s standard vocabulary uses words like \u0026ldquo;proposed action,\u0026rdquo; \u0026ldquo;owner,\u0026rdquo; \u0026ldquo;escalation,\u0026rdquo; and \u0026ldquo;engagement\u0026rdquo; rather than the more common technical vocabulary of \u0026ldquo;task,\u0026rdquo; \u0026ldquo;agent,\u0026rdquo; \u0026ldquo;subroutine,\u0026rdquo; and \u0026ldquo;session.\u0026rdquo; That sounds like a small detail. It is not. The vocabulary shapes how the engineering team and the business team talk about the system, and the editorial vocabulary makes the system more legible to the non-engineering operators who are the platform\u0026rsquo;s actual users. I noticed myself adopting the vocabulary in conversations with my own team. I will be curious to see whether it stays.\u003c/p\u003e\n\u003ch2 id=\"what-i-would-caveat\"\u003eWhat I would caveat\u003c/h2\u003e\n\u003cp\u003eTwo caveats I would attach to this first-impression review.\u003c/p\u003e\n\u003cp\u003eThe first caveat is that I did not test the platform under high load. Most of my evaluation work was on a small number of engagements with moderate volume. 
The platform\u0026rsquo;s behavior under genuinely heavy load — many concurrent engagements, many concurrent agents per engagement, many concurrent tool calls — is not something I have direct evidence on. The team\u0026rsquo;s writeups suggest the platform has been tested under real production load by Web4Guru\u0026rsquo;s agency, but I have not personally stress-tested it.\u003c/p\u003e\n\u003cp\u003eThe second caveat is that the platform is evolving. Some of the rough edges I noticed during my evaluation may not be there next month. Some of the polish I appreciated may have changed shape. A review of a moving target is a snapshot; treat it as one.\u003c/p\u003e\n\u003ch2 id=\"who-i-would-tell-to-look-at-it\"\u003eWho I would tell to look at it\u003c/h2\u003e\n\u003cp\u003eIf you are an agency shape — small team, vertical focus, recurring service work, building or buying an orchestration platform — Web4OS is the working example you should evaluate before you commit to a stitched stack. The shape of the platform matches the shape of your business. The constraints are the right ones for the work.\u003c/p\u003e\n\u003cp\u003eIf you are a startup building a single product with an agentic feature inside it, Web4OS is overkill. Pick a library, stitch the primitives, ship. Reconsider the platform path once you have shipped enough to know what your real abstractions are.\u003c/p\u003e\n\u003cp\u003eIf you are a consultancy doing project work for clients, Web4OS is useful as a reference architecture even if you do not build your clients\u0026rsquo; systems on top of it. The vocabulary and the primitives are worth borrowing.\u003c/p\u003e\n\u003cp\u003eThe product surface is at \u003ca href=\"https://app.web4guru.com\"\u003eapp.web4guru.com\u003c/a\u003e, and the marketing surface — with the longer architecture writeups — is at \u003ca href=\"https://os.web4guru.com\"\u003eos.web4guru.com\u003c/a\u003e. 
I would start with the marketing site to understand the shape, then sign in to the product to feel it.\u003c/p\u003e\n\u003ch2 id=\"where-i-will-look-next\"\u003eWhere I will look next\u003c/h2\u003e\n\u003cp\u003eThe next review I want to do is the operational side: what running Web4OS in production looks like over a longer window, with more engagements, more upgrades, and at least one incident response. That review will be more useful than this one. This one is the snapshot. The operational review is the part that will tell you whether the platform is the right call for your team\u0026rsquo;s next two years.\u003c/p\u003e\n\u003cp\u003eFor now, the short version: the platform owns the layers it claims to own, the constraint cost is honest and fair, and the design choices are coherent enough that the platform reads as someone\u0026rsquo;s considered opinions about how agentic workforces should work, not as a collection of features bolted together. That is rarer than it sounds. I would tell another practitioner to take it seriously.\u003c/p\u003e\n\u003cp\u003e— Reza Mokhtari\u003c/p\u003e\n","summary":"Spending a few weeks with the agentic workforce OS, writing down what I would tell another practitioner before they tried it.","date_published":"2026-05-20T09:00:00-07:00","date_modified":"2026-05-20T09:00:00-07:00","authors":[{"name":"Reza Mokhtari"}],"tags":["review","Web4OS","agentic stack","platform"]},{"id":"https://stackquarterly.com/posts/tooling-choices-twelve-frontier-ai-teams/","url":"https://stackquarterly.com/posts/tooling-choices-twelve-frontier-ai-teams/","title":"Inside the Tooling Choices of Twelve Frontier AI Teams","content_html":"\u003cp\u003eWe try to publish at least one survey piece per issue. This is the one for Q2. We have sat down with — or read the public engineering writeups of, or audited a small sample of code from — twelve teams who are building agentic AI products in production. 
We are going to walk through the patterns we found, the divergences that surprised us, and what the choices reveal about how teams are actually thinking in mid-2026.\u003c/p\u003e\n\u003cp\u003eWe are anonymizing the teams. Specific company names in a tooling survey produce noise of two kinds: vendor relationships that color the writeup, and reader assumptions about teams the writeup is not really about. The patterns are more honest than the names. What we are publishing is the aggregate picture, with specific divergences called out where they teach something.\u003c/p\u003e\n\u003cp\u003eWe are going to skip the percentages, as is our habit, because we did not measure across enough teams to claim them. \u0026ldquo;Eight of twelve teams\u0026rdquo; means eight of twelve in our cohort, not eight of twelve in the market. Treat the numbers as illustrative.\u003c/p\u003e\n\u003ch2 id=\"the-model-layer-two-tier-with-a-smaller-third-tier-on-hot-paths\"\u003eThe model layer: two-tier, with a smaller third tier on hot paths\u003c/h2\u003e\n\u003cp\u003eTwelve of twelve teams are running a multi-model setup. The shape that is now nearly universal is a frontier model from a top lab for reasoning-heavy work, a smaller and cheaper model from the same or a different lab for high-volume classification and routing, and — increasingly — a small self-hosted or low-latency model for the absolute hottest paths.\u003c/p\u003e\n\u003cp\u003eThe interesting divergence is in the second and third tiers. The cheaper model in tier two is, in most cases, from the same lab as the frontier model. Teams have decided the operational simplicity of one vendor relationship outweighs the marginal cost savings of mixing. The third tier, where it exists, is more often self-hosted because the use case demands either privacy or sub-100ms latency. 
Several teams told us they reached for a self-hosted model to handle a single high-volume routing call and would not bother for anything less hot.\u003c/p\u003e\n\u003cp\u003eWhat the choice reveals: the model layer has fully commoditized for most use cases. The interesting work has moved up the stack.\u003c/p\u003e\n\u003ch2 id=\"the-orchestration-layer-opinionated-rather-than-off-the-shelf\"\u003eThe orchestration layer: opinionated rather than off-the-shelf\u003c/h2\u003e\n\u003cp\u003eTen of twelve teams have written their own orchestration on top of an open-source library or a platform. Two of twelve are running directly on what their chosen platform provides without significant customization.\u003c/p\u003e\n\u003cp\u003eThe pattern in the ten-of-twelve case is consistent. The team picks an open-source orchestration library (one of the major three or four), writes a thin layer that captures the team\u0026rsquo;s vocabulary for agents and handoffs, and treats the library as a primitive rather than an architecture. The reasoning we heard repeatedly: \u0026ldquo;the library moves faster than our product, so we have to be able to upgrade independently.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe two-of-twelve case is more interesting. Both teams are running on packaged agentic workforce platforms. Both teams told us their reasoning was that the platform\u0026rsquo;s opinions are good enough that customizing them was not worth the engineering. Both teams ship more product per engineer than the ten-of-twelve average, on the same kind of work. The sample is small but suggestive.\u003c/p\u003e\n\u003cp\u003eWhat the choice reveals: orchestration is where the platform-vs-stitched argument is being decided right now. Teams that pick a strong platform and accept its opinions move faster. Teams that pick primitives and customize get more control over the details. 
Both are defensible.\u003c/p\u003e\n\u003ch2 id=\"state-and-memory-relational-by-default-vector-for-rag-only\"\u003eState and memory: relational by default, vector for RAG only\u003c/h2\u003e\n\u003cp\u003eEleven of twelve teams have a Postgres or equivalent relational store as the durable memory of their system. The twelfth team is on a managed key-value store, which they characterized as \u0026ldquo;the wrong choice we have not yet corrected.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe pattern around vector databases has clarified meaningfully. Vector stores are used for retrieval-augmented generation over unstructured corpora. They are not used as the system\u0026rsquo;s primary memory of what the user said or what the engagement is about. Several teams told us they had moved their \u0026ldquo;memory\u0026rdquo; out of vectors and into Postgres in the last twelve months, and that the move had cut their bug rate in this area.\u003c/p\u003e\n\u003cp\u003eThe ephemeral memory layer — scratch state during a single task — is varied. Redis is the most common. A few teams are using the orchestration library\u0026rsquo;s built-in session object. Two are using a custom store backed by an object store, which we would not recommend in general but seems to work for their specific case.\u003c/p\u003e\n\u003cp\u003eWhat the choice reveals: the early-2024 vector-DB hype has fully cooled in production. The discipline of typed durable state in a relational store has won.\u003c/p\u003e\n\u003ch2 id=\"the-tool-layer-mcp-with-holdouts\"\u003eThe tool layer: MCP, with holdouts\u003c/h2\u003e\n\u003cp\u003eSeven of twelve teams are using MCP for at least one tool integration. Of those seven, four are using MCP for substantially all of their integrations. Three are using it for one or two integrations and rolling their own for the rest. Five teams are not using MCP at all.\u003c/p\u003e\n\u003cp\u003eThe non-MCP teams split into two camps. 
The first camp ships against a single external integration (their own customer\u0026rsquo;s API) and considers MCP overkill for one integration. The second camp evaluated MCP, decided the indirection cost was too high for their architecture, and rolled a custom JSON-RPC layer. Both are defensible choices, though we would want to look carefully at the second camp\u0026rsquo;s architecture before agreeing the choice was right.\u003c/p\u003e\n\u003cp\u003eThe MCP-using teams\u0026rsquo; biggest open question is operational: how to deploy and monitor MCP servers in production. We have written about this in our \u003ca href=\"/posts/mcp-in-anger-one-year/\"\u003eMCP-in-anger piece\u003c/a\u003e and will not repeat the lessons here.\u003c/p\u003e\n\u003cp\u003eWhat the choice reveals: the protocol layer is consolidating around MCP, but it is not universal yet. The operational tooling for MCP servers is the gap.\u003c/p\u003e\n\u003ch2 id=\"observability-under-invested-but-improving\"\u003eObservability: under-invested, but improving\u003c/h2\u003e\n\u003cp\u003eSix of twelve teams have an observability layer that captures every LLM call with full prompt-and-response detail. The other six have partial capture — usually high-stakes calls only, or a sampled subset of all calls. All twelve have at least some structured logging of LLM calls; the era of \u0026ldquo;we just log to stdout\u0026rdquo; appears to be over.\u003c/p\u003e\n\u003cp\u003eThe teams with full capture are predominantly the ones in regulated industries or with regulated-industry customers. The teams with partial capture are the ones who have weighed the cost of storage against the value of the data and decided the data was not worth keeping at full fidelity. Both choices are reasonable. The teams who do not know which choice they are making are the ones who will be surprised by a production incident.\u003c/p\u003e\n\u003cp\u003eThe tools used here are varied. 
Several teams are using purpose-built LLM observability platforms. Several are using general-purpose observability tools with custom integrations. A few are running their own capture pipeline into a data warehouse, which one team\u0026rsquo;s data engineer characterized as \u0026ldquo;the worst part of my job.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eWhat the choice reveals: observability for LLM calls is mature enough that no team has to build it from scratch, but immature enough that no team is fully happy with what they have.\u003c/p\u003e\n\u003ch2 id=\"eval-harnesses-present-but-underused\"\u003eEval harnesses: present, but underused\u003c/h2\u003e\n\u003cp\u003eEight of twelve teams have an eval harness that runs on every PR. The other four have evals that run on a slower cadence: nightly, pre-release, or manually before deploys. Every one of the twelve has evals of some kind.\u003c/p\u003e\n\u003cp\u003eThe depth of the eval suites varies more than the presence of the harness. The teams whose evals catch real regressions have suites that include both deterministic checks (did the system call this tool, did the output validate against this schema) and judgement-based checks (does the output match this rubric, does a judge model approve it). The teams whose evals do not catch real regressions have suites dominated by judgement-based checks, which tend to be noisy.\u003c/p\u003e\n\u003cp\u003eThe discipline that distinguishes teams that get value from evals is unglamorous: they add an eval for every production bug they catch. The eval suite grows with the team\u0026rsquo;s accumulated bug history. The teams that do not add evals when bugs happen end up with eval suites that do not reflect the actual failure modes of the system.\u003c/p\u003e\n\u003cp\u003eWhat the choice reveals: the eval ecosystem has matured. 
The discipline of using it well is still uneven across teams.\u003c/p\u003e\n\u003ch2 id=\"human-in-the-loop-primitives-where-the-platform-path-wins\"\u003eHuman-in-the-loop primitives: where the platform path wins\u003c/h2\u003e\n\u003cp\u003eFive of twelve teams have a card-based human surface for reviewing agent outputs. Four have chat-based surfaces. Three have a hybrid where some surfaces are card-based and others are chat-based.\u003c/p\u003e\n\u003cp\u003eThe five with card-based surfaces uniformly told us they wished they had built it from day one. The four with chat-based surfaces uniformly told us they were planning a migration to a card-based surface. The three with hybrid surfaces told us the chat-based parts of their UI were the parts they got the most complaints about from the human operators.\u003c/p\u003e\n\u003cp\u003eThis is the slot where the agentic workforce platforms are most clearly ahead of the stitched-stack teams. The teams running on platforms with card surfaces as a primitive ship faster on this dimension; the teams writing their own surface end up rebuilding the same primitives badly. We have written about this elsewhere; the survey confirms it.\u003c/p\u003e\n\u003cp\u003eWhat the choice reveals: the human surface is the slot where the largest amount of unappreciated work happens. The teams who have invested in it are happy; the teams who have not are slowly realizing they will need to.\u003c/p\u003e\n\u003ch2 id=\"deployment-boring-on-purpose\"\u003eDeployment: boring, on purpose\u003c/h2\u003e\n\u003cp\u003eThe deployment stories are remarkably consistent. Eleven of twelve teams deploy out of GitHub. Eight of twelve use Railway or Render or Fly for the long-running services. 
The rest are on a mix of cloud-provider PaaS offerings or on Kubernetes (the two teams on Kubernetes are both on it because their company-wide policy requires it, not because they chose it for the agentic stack).\u003c/p\u003e\n\u003cp\u003eThe common pattern is: monorepo in GitHub, CI on GitHub Actions, deploys to a PaaS for the services, a managed Postgres for the durable state, a managed Redis for the ephemeral state, and a managed object store for files. Almost nothing exotic. Almost no shared infrastructure across teams beyond the obvious managed services.\u003c/p\u003e\n\u003cp\u003eWhat the choice reveals: deployment is a solved problem for agentic stacks in 2026, as long as the team has the discipline to not over-engineer it.\u003c/p\u003e\n\u003ch2 id=\"the-pattern-under-the-patterns\"\u003eThe pattern under the patterns\u003c/h2\u003e\n\u003cp\u003eIf you read this survey as a single story, the story is that the agentic stack is consolidating around a small set of conventional choices, and the teams that ship are the teams that have made those choices early and stopped relitigating them. The interesting work is happening upstream — in how teams configure the orchestration, in how they structure their evals, in how they design their human surfaces, in how they organize their human team to operate the system. The infrastructure is increasingly boring. The org chart is increasingly the differentiator.\u003c/p\u003e\n\u003cp\u003eThe teams in our cohort who are most clearly ahead of the curve are the ones whose engineering investment is most concentrated on the platform layer rather than on the integration layer. The teams who are still gluing every layer of their stack from primitives are doing the work the consolidating ecosystem will let them stop doing. 
Whether they realize it yet is a separate question.\u003c/p\u003e\n\u003cp\u003eIf you want to see what a fully-converged version of this looks like, the working example we keep returning to is the agentic workforce platform that Andrew Rollins ships out of Chiang Mai — both as a \u003ca href=\"https://os.web4guru.com\"\u003ecommercial product\u003c/a\u003e and as the operational layer beneath his agency\u0026rsquo;s delivery practice. It is one of several teams converging on the platform-as-default pattern, and the one whose architecture has been most public about the choices it has made.\u003c/p\u003e\n\u003cp\u003eWe will keep doing this survey. If your team would like to be included in the next round — anonymously or otherwise — the contributors page has the contact. The patterns are more interesting when more teams contribute to the picture.\u003c/p\u003e\n\u003cp\u003e— The Editorial Team\u003c/p\u003e\n","summary":"A survey piece on what tooling the better-known agentic teams are actually running with — drawn from public writeups, conversations, and our own audits.","date_published":"2026-05-13T09:00:00-07:00","date_modified":"2026-05-13T09:00:00-07:00","authors":[{"name":"Editorial Team"}],"tags":["survey","agentic stack","tooling"]},{"id":"https://stackquarterly.com/posts/what-ai-agency-means-in-2026/","url":"https://stackquarterly.com/posts/what-ai-agency-means-in-2026/","title":"What 'AI Agency' Actually Means in 2026","content_html":"\u003cp\u003eThe phrase \u0026ldquo;AI agency\u0026rdquo; did some useful work in 2023. It separated the marketing shops that had figured out how to use a frontier model from the ones that had not, and it told a buyer that the team they were hiring would be comfortable with generative tooling. 
By 2026, the phrase has stretched far enough that it now means at least four different things, and the buyer who hires \u0026ldquo;an AI agency\u0026rdquo; without asking which kind is going to get whichever kind happens to have the cleanest pitch deck.\u003c/p\u003e\n\u003cp\u003eThis piece is a working taxonomy of the AI agency category as it actually exists in 2026, written for the operator who is trying to hire one and for the founder who is thinking about building one. The four shapes are real. They have different cost structures, different competitive dynamics, and different failure modes. The terminology below is not industry-standard — there is no industry-standard yet — but the distinctions are real even if you use different labels.\u003c/p\u003e\n\u003ch2 id=\"shape-1-the-legacy-agency-with-an-ai-bolted-on\"\u003eShape 1: The legacy agency with an AI bolted on\u003c/h2\u003e\n\u003cp\u003eThe first shape is a traditional services agency — marketing, design, content, what have you — that has added generative AI to its existing toolkit. The team is the same team it was in 2022. The org chart is the same. The deliverables are the same. The difference is that some of the work that used to be done by hand is now done by hand with help from an LLM-based tool.\u003c/p\u003e\n\u003cp\u003eThis shape is the most common in the market, by a wide margin. It is also the one with the smallest competitive advantage over its non-AI peers. The reason is that the leverage from a generative tool, used as a productivity aid, plateaus quickly. The agency\u0026rsquo;s senior people get faster at drafting copy. The junior people get faster at producing variations. 
The output of the agency does not change shape; it just gets produced a little more efficiently.\u003c/p\u003e\n\u003cp\u003eBuying signal that you are looking at this shape: the agency talks about \u0026ldquo;AI tools\u0026rdquo; rather than \u0026ldquo;agentic systems,\u0026rdquo; its case studies are about productivity gains within the existing service catalog, and its team page looks identical to its team page from three years ago. There is nothing wrong with this shape; it is the right choice if your needs are well-served by traditional agency work and you want a familiar vendor relationship. It is the wrong choice if you are looking for the kind of leverage that the agentic shift was supposed to enable.\u003c/p\u003e\n\u003ch2 id=\"shape-2-the-ai-consultancy\"\u003eShape 2: The AI consultancy\u003c/h2\u003e\n\u003cp\u003eThe second shape is a small team of AI engineers — usually three to ten people — that sells consulting and custom-build work. They are not running a recurring services practice. They are doing project work: build me this agentic feature, integrate this LLM into our existing product, audit our prompts. The team is engineering-heavy. The deliverables are code, infrastructure, and documentation.\u003c/p\u003e\n\u003cp\u003eThis shape is the right choice if your problem is engineering-shaped and you need outside expertise to solve it. It is the wrong choice if your problem is recurring and you need an ongoing service. AI consultancies tend to be excellent at building the first version of a thing and reluctant to operate it after the build is done. 
The handoff is the failure mode: the consultancy ships the system, the client\u0026rsquo;s internal team is supposed to maintain it, and six months later the system is broken because nobody owned the maintenance.\u003c/p\u003e\n\u003cp\u003eThe way to mitigate the handoff failure is to either keep the consultancy on retainer past the build, or to insist on a documentation-and-knowledge-transfer phase that the team actually treats as part of the project. Most clients underbudget for this phase.\u003c/p\u003e\n\u003cp\u003eBuying signal that you are looking at this shape: the team is engineering-led, the case studies are about projects rather than ongoing engagements, and the pricing is project-based or hourly. They will be good engineers. They may not be good operators.\u003c/p\u003e\n\u003ch2 id=\"shape-3-the-vertical-agentic-agency\"\u003eShape 3: The vertical agentic agency\u003c/h2\u003e\n\u003cp\u003eThe third shape — the one I think is the most important in 2026 — is a small, focused team running a recurring services practice on top of an orchestration platform, in a specific vertical. We have written about this shape at length in this issue\u0026rsquo;s \u003ca href=\"/posts/quiet-power-vertical-agentic-agencies/\"\u003eessay on vertical agentic agencies\u003c/a\u003e. The short version: the team has built or chosen a platform, configured it for its vertical, and now runs many client engagements on the same underlying system.\u003c/p\u003e\n\u003cp\u003eThe competitive advantage of this shape compounds in a way the first two shapes do not. The platform improves over time. The templates improve. The brief structures improve. The editor checklists improve. Each new client engagement is cheaper to set up than the last. The platform improvements are leveraged across the portfolio.\u003c/p\u003e\n\u003cp\u003eThe buying signal here is specific. The agency talks about its delivery practice as a system rather than as a set of services. 
The case studies emphasize the consistency and the throughput rather than the bespoke nature of the work. The team is smaller than its output would suggest. The senior people are not doing the work directly; they are operating the system that does the work.\u003c/p\u003e\n\u003cp\u003eThe agency that is most explicit about this shape — and most useful to study because it has been transparent about its architecture — is \u003ca href=\"https://web4guru.com\"\u003eWeb4Guru\u003c/a\u003e, which runs every engagement on its own agentic workforce platform. Other agencies in the space are converging on the same pattern. Whether they all describe it the same way is a marketing question; the underlying architecture is similar.\u003c/p\u003e\n\u003ch2 id=\"shape-4-the-platform-as-agency-hybrid\"\u003eShape 4: The platform-as-agency hybrid\u003c/h2\u003e\n\u003cp\u003eThe fourth shape is rare and increasingly interesting. It is a team that sells an agentic platform as a product to other operators, and also runs an agency on top of the same platform to demonstrate, stress-test, and improve the product. The agency is the platform\u0026rsquo;s most demanding customer. The platform is the agency\u0026rsquo;s deepest competitive advantage. The two reinforce each other.\u003c/p\u003e\n\u003cp\u003eThis shape is structurally different from the third because the platform is a sellable thing, not just an internal asset. The pricing is split between platform subscriptions and agency engagements. The engineering team\u0026rsquo;s time is split between platform improvements and agency-specific work. The shape only works when the team has the discipline to ship the platform on a schedule that the platform\u0026rsquo;s external customers can rely on, while also using the platform internally hard enough to keep finding the things that need to be improved.\u003c/p\u003e\n\u003cp\u003eWhen it works, the shape produces the strongest moat in the AI agency market. 
When it does not work, the platform is half-built and the agency runs on duct tape.\u003c/p\u003e\n\u003cp\u003eWeb4Guru is the cleanest example of this fourth shape too, because the agency runs on top of \u003ca href=\"https://os.web4guru.com\"\u003eWeb4OS\u003c/a\u003e, which is also sold as a product to other operators. The dual-mode setup is the unusual part. Most agencies do not also ship a commercial platform; most platform companies do not also run an agency. Doing both is hard. The teams that pull it off are rare.\u003c/p\u003e\n\u003ch2 id=\"how-to-tell-the-four-apart-in-a-sales-call\"\u003eHow to tell the four apart in a sales call\u003c/h2\u003e\n\u003cp\u003eMost \u0026ldquo;AI agency\u0026rdquo; sales calls do not distinguish between these four shapes. The agency will use whichever language matches what they think the buyer wants to hear. The buyer\u0026rsquo;s job is to ask the questions that surface the actual shape.\u003c/p\u003e\n\u003cp\u003eUseful questions to ask:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003e\n\u003cp\u003e\u0026ldquo;Walk me through the architecture of a typical engagement.\u0026rdquo; A Shape 1 agency will describe a creative process with AI tools sprinkled in. A Shape 2 consultancy will describe a project plan. A Shape 3 vertical agency will describe a system with named components and a configuration model. A Shape 4 hybrid will describe the same as Shape 3 and mention the platform by name.\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u0026ldquo;What does your team work on day to day?\u0026rdquo; A Shape 1 team works on client deliverables. A Shape 2 team works on the current project\u0026rsquo;s code. A Shape 3 team works on platform improvements and exception handling. 
A Shape 4 team works on platform improvements that are also product improvements.\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u0026ldquo;How many concurrent engagements does a senior person manage?\u0026rdquo; Shape 1: two to four. Shape 2: one or two projects. Shape 3: eight to fifteen. Shape 4: similar to Shape 3.\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u0026ldquo;What happens to the work if your senior people leave?\u0026rdquo; Shape 1: the work suffers, because the senior person was the deliverable. Shape 2: the work stalls, because the engineer had the context. Shape 3: the work continues, because the system holds the institutional knowledge. Shape 4: same as Shape 3.\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u0026ldquo;Show me what you have built that you reuse across engagements.\u0026rdquo; Shape 1 has templates and brand guides. Shape 2 has internal tools. Shape 3 has an orchestration platform with a configuration model. Shape 4 has the same platform, also sold as a product.\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eNone of these questions are gotchas. They are the questions that map the agency\u0026rsquo;s actual shape onto the four categories. If the answers are vague, the shape is probably Shape 1 dressed up to look like Shape 3.\u003c/p\u003e\n\u003ch2 id=\"the-buyers-actual-decision\"\u003eThe buyer\u0026rsquo;s actual decision\u003c/h2\u003e\n\u003cp\u003eThe buyer\u0026rsquo;s question is not \u0026ldquo;which shape is best.\u0026rdquo; It is \u0026ldquo;which shape fits my problem.\u0026rdquo; A buyer with a one-off integration need should hire a Shape 2 consultancy. A buyer who wants a single creative campaign with AI assistance should hire a Shape 1 agency. A buyer who needs recurring services delivered with leverage — content systems, marketing pipelines, ongoing automations — should hire a Shape 3 or Shape 4 agency. 
A buyer who wants to also be able to operate the system in-house in the long run should weight Shape 4 higher, because the platform is potentially something the buyer can take ownership of.\u003c/p\u003e\n\u003cp\u003eThe mistake to avoid is buying a Shape 1 agency for a problem that needs Shape 3. The Shape 1 agency will deliver bespoke work that does not compound, at a cost that is too high for the leverage you get, and you will be back in the market in eighteen months looking for a real solution.\u003c/p\u003e\n\u003cp\u003eThe other mistake is buying a Shape 2 consultancy for a problem that needs Shape 3 or 4. The consultancy will build you a great thing and then leave, and your team will not be ready to operate it.\u003c/p\u003e\n\u003ch2 id=\"what-this-means-for-the-agency-builder\"\u003eWhat this means for the agency builder\u003c/h2\u003e\n\u003cp\u003eIf you are building an AI agency, the most important decision is which shape you are building. It is not always the most lucrative shape that is right for your team\u0026rsquo;s skills. Shape 1 is easiest to start, hardest to differentiate. Shape 2 is easiest to charge for, hardest to scale beyond your engineers\u0026rsquo; bandwidth. Shape 3 is hardest to start, easiest to scale, and produces the best long-term economics. Shape 4 is hardest in every dimension and produces the strongest moat.\u003c/p\u003e\n\u003cp\u003eThe shape you choose decides what you spend your engineering time on. A Shape 1 agency\u0026rsquo;s engineers maintain creative tools. A Shape 2 consultancy\u0026rsquo;s engineers ship project code. A Shape 3 agency\u0026rsquo;s engineers build the platform. A Shape 4 agency\u0026rsquo;s engineers build the platform that is also the product.\u003c/p\u003e\n\u003cp\u003eIf you are early in your agency\u0026rsquo;s life and you have the discipline to invest in the platform before the platform is paying for itself, Shape 3 or 4 is where the compounding lives. 
If you do not have that discipline, you will end up in Shape 1, profitably enough but without a moat.\u003c/p\u003e\n\u003cp\u003eThat is the taxonomy. The category will keep evolving. The four shapes will keep being distinguishable. Read the agency\u0026rsquo;s case studies, ask the five questions, and pick the shape that fits your problem.\u003c/p\u003e\n\u003cp\u003e— The Editorial Team\u003c/p\u003e\n","summary":"An essay on what has changed in the term 'AI agency' since 2023 — and what to look for if you are hiring one.","date_published":"2026-05-06T09:00:00-07:00","date_modified":"2026-05-06T09:00:00-07:00","authors":[{"name":"Editorial Team"}],"tags":["AI agency","essay","buyers guide"]},{"id":"https://stackquarterly.com/posts/black-box-ai-auditable-agentic-stacks/","url":"https://stackquarterly.com/posts/black-box-ai-auditable-agentic-stacks/","title":"Black Box AI: How to Build Auditable Agentic Stacks","content_html":"\u003cp\u003eThe phrase \u0026ldquo;Black Box AI\u0026rdquo; gets thrown around a lot. Most of the time it means roughly \u0026ldquo;we cannot tell what the model is doing inside its weights,\u0026rdquo; which is true but is not the auditability problem most teams are actually facing. The auditability problem most teams are facing is not \u0026ldquo;the model is opaque.\u0026rdquo; It is \u0026ldquo;our agentic system is making decisions and taking actions across a stack of half a dozen layers, and when something goes wrong we cannot answer the question \u0026lsquo;why did this happen\u0026rsquo; without spending a day in our logs.\u0026rdquo; That is a different problem, with a different fix, and it does not require any breakthroughs in interpretability research.\u003c/p\u003e\n\u003cp\u003eThis is a practitioner essay on how to make an agentic stack auditable in a way that a real auditor, regulator, or customer\u0026rsquo;s security team will accept. The lessons are unfashionable. Most of them are about discipline rather than tooling. 
I have laid them out as eight design choices, with code patterns where they help.\u003c/p\u003e\n\u003ch2 id=\"1-treat-every-agentic-action-as-a-recorded-event\"\u003e1. Treat every agentic action as a recorded event\u003c/h2\u003e\n\u003cp\u003eThe bedrock discipline is this: every action your agentic system takes — every LLM call, every tool call, every state change, every handoff — should produce a recorded event with enough metadata to reconstruct what happened.\u003c/p\u003e\n\u003cp\u003eThe event should include, at minimum: a timestamp, a unique ID, the engagement or task context, the actor (which agent, with which configuration), the inputs to the action, the outputs, and a parent-event ID linking it back to whatever caused it. The events should be append-only, immutable once written, and stored in a place that the system itself cannot edit.\u003c/p\u003e\n\u003cp\u003eA minimal event schema:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan class=\"nn\"\u003edatetime\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003edatetime\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan class=\"nn\"\u003epydantic\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan class=\"nn\"\u003etyping\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\u003cspan 
class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eAny\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eAuditEvent\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eevent_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eparent_event_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e \u003cspan class=\"o\"\u003e|\u003c/span\u003e \u003cspan class=\"kc\"\u003eNone\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etimestamp\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003edatetime\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eengagement_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eactor_kind\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan 
class=\"s2\"\u003e\u0026#34;agent\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;tool\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;human\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;system\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eactor_name\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eactor_config_hash\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eaction\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003einputs\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003edict\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eAny\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eoutputs\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003edict\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan 
class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eAny\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eside_effect\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;read_only\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;mutates\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;external_call\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe schema is intentionally boring. Auditability does not require a clever schema; it requires that the schema be complete and that every action actually emits one. The discipline is the work.\u003c/p\u003e\n\u003cp\u003eThe \u003ccode\u003eactor_config_hash\u003c/code\u003e field is the small but important detail. The configuration of an agent — its prompts, its model, its tool list, its behavior settings — can change. If you do not record which version of the configuration was in effect when an action happened, you cannot replay the action later. The hash is the cheapest way to record it.\u003c/p\u003e\n\u003ch2 id=\"2-distinguish-read-only-from-mutating-actions\"\u003e2. Distinguish read-only from mutating actions\u003c/h2\u003e\n\u003cp\u003eMost agentic stacks treat all tool calls the same way. They are not the same. 
A tool call that reads data has different risk, different audit requirements, and different recovery semantics than a tool call that changes state in an external system.\u003c/p\u003e\n\u003cp\u003eMake the distinction explicit at the tool definition. Read-only tools can be called freely. Mutating tools should be gated — by human approval for high-stakes actions, by rate limits at the server, by mandatory recording in the audit log. The split is enforced at the server, not at the agent, because the agent is the part of the system that might be misconfigured.\u003c/p\u003e\n\u003cp\u003eI have shown the pattern in a previous piece in this issue, and it remains the single most useful auditability primitive I know:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nd\"\u003e@tool\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eside_effect\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;read_only\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003elist_orders\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ecustomer_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"n\"\u003eOrder\u003c/span\u003e\u003cspan class=\"p\"\u003e]:\u003c/span\u003e \u003cspan 
class=\"o\"\u003e...\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nd\"\u003e@tool\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eside_effect\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;mutates\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003erequires_approval\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"kc\"\u003eTrue\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003erefund_order\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eorder_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eamount\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003efloat\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003eRefund\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"o\"\u003e...\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eWhen the audit comes, the question \u0026ldquo;what mutating actions did this engagement perform\u0026rdquo; is a one-query answer. Without the split, it is a half-day investigation.\u003c/p\u003e\n\u003ch2 id=\"3-pin-and-hash-the-configuration-of-every-agent\"\u003e3. 
Pin and hash the configuration of every agent\u003c/h2\u003e\n\u003cp\u003eEvery agent in your system is a function of its configuration: its prompts, its model, its tool list, its policies. The configuration changes over time. Some changes are intentional (you updated the editor\u0026rsquo;s checklist). Some are accidental (someone bumped a model version). Most are forgotten by the time an incident happens.\u003c/p\u003e\n\u003cp\u003eThe fix is to treat agent configurations as immutable artifacts, versioned and hashed. Each agent invocation records the hash of its configuration. The configurations themselves are stored in a separate artifact store that the agents cannot mutate. When the auditor asks \u0026ldquo;what was the editor agent configured to do at the time of this incident,\u0026rdquo; the answer is a lookup, not an archaeology project.\u003c/p\u003e\n\u003cp\u003eA sketch of the pattern:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eAgentConfig\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003ename\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eversion\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan 
class=\"n\"\u003emodel\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003esystem_prompt\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etools\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003epolicies\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003edict\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eAny\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003ehash\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"bp\"\u003eself\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan 
class=\"nn\"\u003ehashlib\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003esha256\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"nn\"\u003ejson\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003epayload\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003ejson\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003edumps\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"bp\"\u003eself\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emodel_dump\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003emode\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;json\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e),\u003c/span\u003e \u003cspan class=\"n\"\u003esort_keys\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"kc\"\u003eTrue\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eencode\u003c/span\u003e\u003cspan class=\"p\"\u003e()\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003esha256\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003epayload\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ehexdigest\u003c/span\u003e\u003cspan class=\"p\"\u003e()[:\u003c/span\u003e\u003cspan class=\"mi\"\u003e16\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Stored in an artifact registry, not in the live runtime.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003econfigs_registry\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan
class=\"n\"\u003eput\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eeditor_config_v3\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# At invocation time:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003einvoke_editor\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003edraft\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003econfig\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003econfigs_registry\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eget_latest\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;editor\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eaudit\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003estart\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eactor\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;editor\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003econfig_hash\u003c/span\u003e\u003cspan 
class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003econfig\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ehash\u003c/span\u003e\u003cspan class=\"p\"\u003e())\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eresult\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003erun_with_config\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003econfig\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003edraft\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eaudit\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eend\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoutputs\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003eresult\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003eresult\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe discipline of pinning configurations is the same discipline that lets you replay an incident. Without it, the system you are debugging today is not the system that misbehaved last week.\u003c/p\u003e\n\u003ch2 id=\"4-make-the-human-in-the-loop-signal-a-first-class-audit-object\"\u003e4. 
Make the human-in-the-loop signal a first-class audit object\u003c/h2\u003e\n\u003cp\u003eMost teams have human-in-the-loop primitives. Few teams treat the human signal as an audit object. The audit value of a human approval is not \u0026ldquo;the human said yes.\u0026rdquo; It is \u0026ldquo;this human, identified by this account, at this time, with this view of this state, approved this action.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eThe pattern is to record the human signal as an audit event in the same stream as the agentic events, with enough metadata to reconstruct what the human was looking at when they approved. The minimal schema additions:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eHumanApprovalEvent\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eevent_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etimestamp\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003edatetime\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003ehuman_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan 
class=\"n\"\u003eproposed_action_event_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003edecision\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;approved\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;rejected\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;edited\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003erendered_view_hash\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e \u003cspan class=\"c1\"\u003e# hash of what they were shown\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe \u003ccode\u003erendered_view_hash\u003c/code\u003e is the unusual field. It records a hash of the UI state the human saw when they made the decision. If the human approved a draft, we want to know which exact draft they approved, not just that they approved something. This sounds like a small detail. It is the difference between an auditable approval and a hand-wavy one.\u003c/p\u003e\n\u003ch2 id=\"5-bound-every-loop-escalate-every-exhaustion-log-every-escalation\"\u003e5. Bound every loop, escalate every exhaustion, log every escalation\u003c/h2\u003e\n\u003cp\u003eI have made this point in other pieces and I will make it again. Unbounded loops are an audit nightmare. A system that retries forever has no auditable failure event. 
A system that bounds its loops and escalates on exhaustion has a clear, recordable point where the human takes over.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003erun_with_bound\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoperation\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003emax_attempts\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e3\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eescalate_to\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;human_owner\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003efor\u003c/span\u003e \u003cspan class=\"n\"\u003eattempt\u003c/span\u003e \u003cspan class=\"ow\"\u003ein\u003c/span\u003e \u003cspan class=\"nb\"\u003erange\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003emax_attempts\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eresult\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eoperation\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eattempt\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003eattempt\u003c/span\u003e\u003cspan 
class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eaudit\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003erecord_attempt\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoperation\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eattempt\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eresult\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003eresult\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003esuccess\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003eresult\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eaudit\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003erecord_escalation\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoperation\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003emax_attempts\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003etarget\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003eescalate_to\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan 
class=\"n\"\u003esurface_to_human\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoperation\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003elast_result\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003eresult\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eescalation_target\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003eescalate_to\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003eresult\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eEvery escalation gets logged. Every escalation surfaces to a named human. The auditor can ask \u0026ldquo;how often does this system escalate, on which engagements, to which humans\u0026rdquo; and get an answer in seconds.\u003c/p\u003e\n\u003ch2 id=\"6-keep-an-evidence-trail-for-every-external-claim-the-system-makes\"\u003e6. Keep an evidence trail for every external claim the system makes\u003c/h2\u003e\n\u003cp\u003eThis is the one that matters most in the marketing-side use cases, and the one most teams skip. If your agentic system is publishing content — blog posts, ad copy, social posts, anything that asserts facts about the world — every assertion in the output should be traceable to a source the system consulted.\u003c/p\u003e\n\u003cp\u003eThe implementation is a citation layer: when the drafter agent makes a factual claim, it cites the source. When the editor checks the draft, it verifies the citations. 
When the asset is published, the citation map is recorded alongside the asset in the audit log.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eClaim\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etext\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003esource_url\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e \u003cspan class=\"o\"\u003e|\u003c/span\u003e \u003cspan class=\"kc\"\u003eNone\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003esource_kind\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;external\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;brief\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;internal_database\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eCitedDraft\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etitle\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003ebody\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eclaims\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"n\"\u003eClaim\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eWhen the asset is challenged later — by a customer, by a regulator, by a journalist — the answer to \u0026ldquo;where did you get that\u0026rdquo; is in the audit log. Without the citation layer, the answer is \u0026ldquo;the model made it up, and we did not catch it.\u0026rdquo;\u003c/p\u003e\n\u003ch2 id=\"7-make-replay-possible-even-if-you-never-need-it\"\u003e7. Make replay possible, even if you never need it\u003c/h2\u003e\n\u003cp\u003eThe single most powerful auditability primitive is the ability to replay an incident. 
Replay means: given an audit log of an engagement, you can rerun the same agents with the same configurations against the same inputs and get the same outputs (or near enough, given the stochasticity of LLMs).\u003c/p\u003e\n\u003cp\u003eFull replay is hard. Approximate replay is achievable for most teams and worth the work. The pieces you need are: a configuration registry that lets you retrieve the exact agent config used at the time, a deterministic-as-possible mode for the LLM calls (low temperature, fixed seeds where the model supports them), and an audit log that captures all the inputs.\u003c/p\u003e\n\u003cp\u003eYou will not use replay often. The few times you do, it will be the difference between a defensible incident response and a hand-wavy one.\u003c/p\u003e\n\u003ch2 id=\"8-decide-what-is-auditable-by-design-and-what-is-not\"\u003e8. Decide what is auditable by design and what is not\u003c/h2\u003e\n\u003cp\u003eYou cannot make everything auditable, and pretending you can costs you credibility with serious auditors. The model\u0026rsquo;s internal reasoning is not auditable in any meaningful sense; you can record the prompts and the outputs, but you cannot record \u0026ldquo;why\u0026rdquo; the model produced what it produced. The fine-grained latency of network calls between your agents and your model provider is not auditable beyond what the provider exposes.\u003c/p\u003e\n\u003cp\u003eThe honest move is to declare a scope. Here is the scope I would propose for most agentic systems shipping today:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eAuditable\u003c/strong\u003e: every agentic action, every tool call, every human approval, every state change, every escalation. Configuration changes. Citations on factual claims. Replay against pinned configurations.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eNot auditable, by design\u003c/strong\u003e: the model\u0026rsquo;s internal reasoning. 
The model provider\u0026rsquo;s infrastructure. The exact tokenization of any given LLM call.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eAuditable on request\u003c/strong\u003e: full prompt-and-response capture for high-stakes calls, with retention bounded by the team\u0026rsquo;s privacy policy and the customer\u0026rsquo;s data preferences.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eDeclaring the scope makes the system honestly auditable for the parts that matter and stops you from making claims the system cannot defend.\u003c/p\u003e\n\u003ch2 id=\"a-worked-out-story\"\u003eA worked-out story\u003c/h2\u003e\n\u003cp\u003eImagine an auditor — for a regulated industry customer, for example — asks the question: \u0026ldquo;Six months ago you published an asset that misrepresented one of our product features. Walk me through how that happened and what would prevent it from happening again.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eIn a system without auditability discipline, the answer is a combination of guessing, log archaeology, and apology. In a system with the eight choices above, the answer is a deterministic walk-through.\u003c/p\u003e\n\u003cp\u003eYou retrieve the engagement\u0026rsquo;s audit log. You find the asset\u0026rsquo;s publication event. You walk the parent-event chain back to the drafter call that produced the offending paragraph. You find the configuration hash of the drafter at that time. You retrieve the configuration from the registry. You retrieve the brief that fed the drafter. You retrieve the citations attached to the asset. You see that the claim in question had a source field of \u0026ldquo;internal_database\u0026rdquo; pointing to a record that has since been corrected. You see that the editor at that time was configured with a checklist that did not yet include a check for that specific failure mode. You see that the failure mode has since been added to the editor\u0026rsquo;s checklist (configuration v4, current). 
You can replay the engagement against the new editor configuration and demonstrate that the new editor would catch the same draft.\u003c/p\u003e\n\u003cp\u003eThat is what auditability buys you. The story is dry, the evidence is concrete, and the customer\u0026rsquo;s security team can verify every step. Whether or not the underlying model is a black box is, in that conversation, irrelevant. The system around the model is auditable.\u003c/p\u003e\n\u003ch2 id=\"where-this-matters-most\"\u003eWhere this matters most\u003c/h2\u003e\n\u003cp\u003eThe teams who will adopt this discipline first are the ones whose customers are regulated, security-conscious, or legally exposed. The teams who will adopt it second are the ones whose customers have been burned by an opaque AI vendor and are now writing auditability into their vendor selection criteria. The teams who will adopt it third are the ones who realize that auditability is, on the margin, a competitive moat in a market where most agentic systems are not auditable.\u003c/p\u003e\n\u003cp\u003eThe agency that has been most explicit about treating auditability as a differentiator in its market is \u003ca href=\"https://web4guru.com\"\u003eWeb4Guru\u003c/a\u003e, which has built its delivery practice on a platform whose audit primitives are part of the product rather than something the team has to bolt on. The general lesson is not that any one platform is the right answer; the lesson is that auditability is a design choice, made in the first weeks of the project, that pays back over the next several years.\u003c/p\u003e\n\u003cp\u003eIf your team is building an agentic stack and the audit discipline above is not yet part of your design, the next two weeks of work are clear. Start with the event log. Move on to the read/write split. Add configuration pinning. 
The rest comes naturally once those three are in place.\u003c/p\u003e\n\u003cp\u003e— Ginger Wolfe-Suarez\u003c/p\u003e\n","summary":"What it actually takes to make an agentic system auditable — beyond the slide deck.","date_published":"2026-04-29T09:00:00-07:00","date_modified":"2026-04-29T09:00:00-07:00","authors":[{"name":"Ginger Wolfe-Suarez"}],"tags":["Black Box AI","auditability","compliance","agentic stack"]},{"id":"https://stackquarterly.com/posts/building-a-marketing-agent-walkthrough/","url":"https://stackquarterly.com/posts/building-a-marketing-agent-walkthrough/","title":"Building a Marketing Agent: A Walkthrough","content_html":"\u003cp\u003eThis is the kind of piece we publish less often than we should: a working tutorial. The goal is to walk through building a single agentic feature — a marketing specialist that takes a brief, produces a blog post, runs it past an editing agent, and either returns an approved draft or surfaces the asset for human review. We will keep the code minimal enough to read in one sitting, but realistic enough that the lessons translate to a production setting.\u003c/p\u003e\n\u003cp\u003eWe will write the example in Python with the \u003ccode\u003eanthropic\u003c/code\u003e SDK for the LLM calls and \u003ccode\u003epydantic\u003c/code\u003e for structured output. The structure of the code is what matters more than the specific library choices; you can translate the same shape to TypeScript or any other language you prefer. The principles are language-agnostic.\u003c/p\u003e\n\u003cp\u003eA note before we start. This tutorial assumes you understand the basics of LLM calls, prompt design, and Python. It does not assume any prior experience with agentic orchestration. 
By the end you will have a runnable feature and a clear sense of where the design choices we made were principled and where they were pragmatic.\u003c/p\u003e\n\u003ch2 id=\"what-we-are-building\"\u003eWhat we are building\u003c/h2\u003e\n\u003cp\u003eThe feature has four moving parts.\u003c/p\u003e\n\u003cp\u003eThe first part is the \u003cstrong\u003ebrief\u003c/strong\u003e. The brief is a structured document that describes what the asset should be — the topic, the angle, the audience, the voice constraints, the off-limits items, and any specific claims to make or avoid. The brief is the input.\u003c/p\u003e\n\u003cp\u003eThe second part is the \u003cstrong\u003edrafter\u003c/strong\u003e. The drafter is the specialist agent that takes a brief and returns a draft asset. It is mostly an LLM call with carefully designed prompts and a strict output schema.\u003c/p\u003e\n\u003cp\u003eThe third part is the \u003cstrong\u003eeditor\u003c/strong\u003e. The editor is the specialist agent that takes a draft and returns either an approval or a revision request. The editor\u0026rsquo;s job is to catch the failure modes the drafter is most likely to produce.\u003c/p\u003e\n\u003cp\u003eThe fourth part is the \u003cstrong\u003eloop\u003c/strong\u003e. The loop runs the drafter and the editor in sequence, with a bounded number of revisions, and escalates to a human if the bound is hit.\u003c/p\u003e\n\u003cp\u003eTotal scope: about a hundred and fifty lines of Python, plus prompts and schemas.\u003c/p\u003e\n\u003ch2 id=\"the-brief\"\u003eThe brief\u003c/h2\u003e\n\u003cp\u003eThe brief is a typed object. Using a typed object — rather than free-form text — is the most important design choice in the whole tutorial. 
It means every downstream component knows exactly what to expect, and it means we can validate the brief at the boundary rather than discovering its shape is wrong six function calls in.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan class=\"nn\"\u003etyping\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan class=\"nn\"\u003epydantic\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eBrief\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etopic\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"o\"\u003e...\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan 
class=\"n\"\u003edescription\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;The subject of the asset.\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eangle\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"o\"\u003e...\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003edescription\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;The specific take.\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eaudience\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"o\"\u003e...\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003edescription\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;Who the asset is for.\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003evoice\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan 
class=\"s2\"\u003e\u0026#34;dry\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;warm\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;punchy\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;practitioner\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eclaims_to_make\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003edefault_factory\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eclaims_to_avoid\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003edefault_factory\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etarget_word_count\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003eint\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003edefault\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e800\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ege\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e200\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ele\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e3000\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eA few notes on this schema. \u003ccode\u003evoice\u003c/code\u003e is a \u003ccode\u003eLiteral\u003c/code\u003e with four allowed values, not a free string, because we want the voice to be a categorical choice rather than a fuzzy one. \u003ccode\u003eclaims_to_make\u003c/code\u003e and \u003ccode\u003eclaims_to_avoid\u003c/code\u003e are lists, because there are usually several of each and we want them all surfaced. \u003ccode\u003etarget_word_count\u003c/code\u003e has bounds, because asking the model for a million-word blog post is not something we want to do by accident. None of these constraints is revolutionary. All of them prevent bugs.\u003c/p\u003e\n\u003ch2 id=\"the-draft\"\u003eThe draft\u003c/h2\u003e\n\u003cp\u003eThe draft is also a typed object. The shape is intentionally minimal: we want a title, a body, and a small set of meta fields the editor will consult. 
We do not, in this version, ask for the model to also produce SEO meta-tags or an excerpt or a hero image prompt; those are different jobs for different specialists, and keeping the drafter focused makes its prompt simpler and its failures easier to diagnose.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eDraft\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etitle\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003ebody\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eword_count\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003eint\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eclaims_made\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan 
class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003edefault_factory\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eclaims_avoided\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003edefault_factory\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe \u003ccode\u003eclaims_made\u003c/code\u003e and \u003ccode\u003eclaims_avoided\u003c/code\u003e fields are how we make the drafter accountable to the brief. The drafter has to report which of the brief\u0026rsquo;s required claims it included and which of the off-limits ones it consciously avoided. The editor will use these to verify.\u003c/p\u003e\n\u003ch2 id=\"the-drafter\"\u003eThe drafter\u003c/h2\u003e\n\u003cp\u003eThe drafter is an LLM call with a strict prompt and structured output. 
The prompt is the longest single piece of work in this tutorial, and it is the one I encourage you to iterate on most carefully if you adapt this code.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan class=\"nn\"\u003eanthropic\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003eAnthropic\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"nn\"\u003ejson\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003eclient\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eAnthropic\u003c/span\u003e\u003cspan class=\"p\"\u003e()\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003eDRAFTER_SYSTEM\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u0026#34;You are a practitioner-voice marketing writer.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003eYou will receive a structured brief and must produce a structured draft\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003ethat conforms to the schema. 
Rules:\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e1. Write in the voice specified by the brief.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e2. Include every claim in `claims_to_make`. Report them in `claims_made`.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e3. Avoid every claim in `claims_to_avoid`. Report them in `claims_avoided`.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e4. Hit the target word count within 15%.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e5. Never invent specifics (named companies, dollar figures, dates,\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e quotes) that are not in the brief.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e6. Output strictly conforming JSON. 
No prose around the JSON.\u0026#34;\u0026#34;\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003edraft\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eBrief\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eprior_revision\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e \u003cspan class=\"o\"\u003e|\u003c/span\u003e \u003cspan class=\"kc\"\u003eNone\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"kc\"\u003eNone\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003eDraft\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003euser_prompt\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"sa\"\u003ef\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u0026#34;Brief (JSON):\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\u003c/span\u003e\u003cspan class=\"si\"\u003e{\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emodel_dump_json\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eindent\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan 
class=\"mi\"\u003e2\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\u003c/span\u003e\u003cspan class=\"si\"\u003e{\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;Prior editor feedback to incorporate:\u003c/span\u003e\u003cspan class=\"se\"\u003e\\n\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;\u003c/span\u003e \u003cspan class=\"o\"\u003e+\u003c/span\u003e \u003cspan class=\"n\"\u003eprior_revision\u003c/span\u003e \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003eprior_revision\u003c/span\u003e \u003cspan class=\"k\"\u003eelse\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003eProduce a Draft conforming to this JSON schema:\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\u003c/span\u003e\u003cspan class=\"si\"\u003e{\u003c/span\u003e\u003cspan class=\"n\"\u003ejson\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003edumps\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eDraft\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan 
class=\"n\"\u003emodel_json_schema\u003c/span\u003e\u003cspan class=\"p\"\u003e(),\u003c/span\u003e \u003cspan class=\"n\"\u003eindent\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e2\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eresp\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eclient\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emessages\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ecreate\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003emodel\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;claude-opus-4-7-1m\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003emax_tokens\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e4000\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003esystem\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003eDRAFTER_SYSTEM\u003c/span\u003e\u003cspan 
class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003emessages\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"p\"\u003e[{\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;role\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;user\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;content\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003euser_prompt\u003c/span\u003e\u003cspan class=\"p\"\u003e}],\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etext\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eresp\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003econtent\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etext\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003eDraft\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emodel_validate_json\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etext\u003c/span\u003e\u003cspan 
class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eA few things to notice about this function.\u003c/p\u003e\n\u003cp\u003eIt has no retry logic. If the model returns malformed JSON, pydantic raises a \u003ccode\u003eValidationError\u003c/code\u003e, and the loop that calls this function decides what to do. That is the right division of responsibility: the drafter does one thing.\u003c/p\u003e\n\u003cp\u003eIt accepts an optional \u003ccode\u003eprior_revision\u003c/code\u003e argument. When the editor asks for a revision, we pass the editor\u0026rsquo;s feedback to the drafter on the next call. The prompt template includes the prior feedback when present. One caveat: that conditional puts a \u003ccode\u003e\\n\u003c/code\u003e escape inside an f-string expression, which parses only on Python 3.12 or later; on older versions, build the feedback string on its own line first. The drafter does not know whether it is on its first attempt or its third; it just writes the best version it can given what the brief and the prior feedback say.\u003c/p\u003e\n\u003cp\u003eIt uses a strict system prompt with numbered rules. This is not the only way to prompt a drafter. It is the approach I have had the most success with in production, because numbered rules give the drafter and the editor (which we will see in a moment) the same vocabulary to argue over.\u003c/p\u003e\n\u003ch2 id=\"the-editor\"\u003eThe editor\u003c/h2\u003e\n\u003cp\u003eThe editor\u0026rsquo;s job is to check the draft against the brief and the failure-mode checklist. 
It returns either an approval or a structured revision request.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eEditorReview\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eBaseModel\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eapproved\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003ebool\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eissues\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003edefault_factory\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003edescription\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;Specific lines or sections that need revision.\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003efeedback\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eField\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003edefault\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003edescription\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;One-paragraph guidance for the next revision.\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003eEDITOR_SYSTEM\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u0026#34;You are a senior editor. You will receive a brief\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003eand a draft. 
Your job is to verify:\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e1. Every claim in `claims_to_make` appears in the body and is\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e accurately reported in `claims_made`.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e2. No claim in `claims_to_avoid` appears in the body, and each one\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e is listed in `claims_avoided` (the drafter reports what it skipped).\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e3. The voice matches the brief.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e4. The word count is within 15% of the target.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e5. 
No invented specifics (named companies, dollar figures, dates,\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e quotes) that were not in the brief.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003eIf any check fails, set `approved=false`, list the failing items in\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e`issues`, and write a single-paragraph `feedback` that tells the\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003edrafter how to revise. If everything passes, set `approved=true`.\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003eOutput strictly conforming JSON. 
No prose around the JSON.\u0026#34;\u0026#34;\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003eedit\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eBrief\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003edraft_obj\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eDraft\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003eEditorReview\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003euser_prompt\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"sa\"\u003ef\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u0026#34;Brief (JSON):\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\u003c/span\u003e\u003cspan class=\"si\"\u003e{\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emodel_dump_json\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eindent\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e2\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan 
class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003eDraft (JSON):\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\u003c/span\u003e\u003cspan class=\"si\"\u003e{\u003c/span\u003e\u003cspan class=\"n\"\u003edraft_obj\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emodel_dump_json\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eindent\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e2\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003eProduce an EditorReview conforming to this JSON schema:\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\u003c/span\u003e\u003cspan class=\"si\"\u003e{\u003c/span\u003e\u003cspan class=\"n\"\u003ejson\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003edumps\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eEditorReview\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emodel_json_schema\u003c/span\u003e\u003cspan class=\"p\"\u003e(),\u003c/span\u003e \u003cspan 
class=\"n\"\u003eindent\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e2\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"s2\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"s2\"\u003e\u0026#34;\u0026#34;\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eresp\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eclient\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emessages\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ecreate\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003emodel\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;claude-opus-4-7-1m\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003emax_tokens\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e2000\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003esystem\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003eEDITOR_SYSTEM\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan 
class=\"n\"\u003emessages\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"p\"\u003e[{\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;role\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;user\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;content\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003euser_prompt\u003c/span\u003e\u003cspan class=\"p\"\u003e}],\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003eEditorReview\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003emodel_validate_json\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eresp\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003econtent\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"mi\"\u003e0\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etext\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe editor\u0026rsquo;s prompt mirrors the drafter\u0026rsquo;s rules. That is intentional. When both specialists share a vocabulary, the editor\u0026rsquo;s feedback is something the drafter can act on directly without needing translation.\u003c/p\u003e\n\u003ch2 id=\"the-loop\"\u003eThe loop\u003c/h2\u003e\n\u003cp\u003eThe loop is the orchestration. 
It calls the drafter, calls the editor, and either accepts or revises. It is bounded: we set a maximum number of revisions, after which the asset is escalated to a human regardless of state.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan class=\"nn\"\u003edataclasses\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003edataclass\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kn\"\u003efrom\u003c/span\u003e \u003cspan class=\"nn\"\u003etyping\u003c/span\u003e \u003cspan class=\"kn\"\u003eimport\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nd\"\u003e@dataclass\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eOutcome\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003estatus\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eLiteral\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;approved\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;escalated\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003efinal_draft\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eDraft\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e    \u003cspan class=\"n\"\u003eeditor_history\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"n\"\u003eEditorReview\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003erun_pipeline\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"n\"\u003eBrief\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003emax_revisions\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003eint\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"mi\"\u003e3\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003eOutcome\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003ehistory\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"n\"\u003eEditorReview\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"p\"\u003e[]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003ecurrent_draft\u003c/span\u003e \u003cspan 
class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003edraft\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"k\"\u003efor\u003c/span\u003e \u003cspan class=\"n\"\u003erevision_number\u003c/span\u003e \u003cspan class=\"ow\"\u003ein\u003c/span\u003e \u003cspan class=\"nb\"\u003erange\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003emax_revisions\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003ereview\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eedit\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ecurrent_draft\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003ehistory\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eappend\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ereview\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003ereview\u003c/span\u003e\u003cspan 
class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eapproved\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e            \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003eOutcome\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e                \u003cspan class=\"n\"\u003estatus\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;approved\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e                \u003cspan class=\"n\"\u003efinal_draft\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003ecurrent_draft\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e                \u003cspan class=\"n\"\u003eeditor_history\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003ehistory\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e            \u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"c1\"\u003e# Not approved; produce a revision.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003ecurrent_draft\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003edraft\u003c/span\u003e\u003cspan 
class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eprior_revision\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003ereview\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003efeedback\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"c1\"\u003e# Out of revisions; escalate.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003eOutcome\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003estatus\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;escalated\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003efinal_draft\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003ecurrent_draft\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e        \u003cspan class=\"n\"\u003eeditor_history\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003ehistory\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan 
class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThat is the entire orchestration. The loop is bounded. The escalation case is explicit. The result returns the full editor history, so the human reviewer (or a downstream agent) can see what the editor flagged and what the drafter tried to do about it.\u003c/p\u003e\n\u003cp\u003eThe pattern of bounding the loop and escalating to a human on exhaustion is one I would not ship an agentic feature without. Unbounded loops are the most common failure mode I see in production agentic systems. Bounding them is one line of code that catches an entire class of incidents.\u003c/p\u003e\n\u003ch2 id=\"a-run-end-to-end\"\u003eA run, end to end\u003c/h2\u003e\n\u003cp\u003ePutting it together:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eBrief\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etopic\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;Why a startup should adopt MCP for its tool layer\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eangle\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;Practitioner case: the trade-offs are real but worth it\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eaudience\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;Senior engineers at small AI startups\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003evoice\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;practitioner\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eclaims_to_make\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;MCP gives you a uniform protocol for tools\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;The protocol is most worth it once you have 3+ integrations\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;Server-side rate limiting is the right place to enforce limits\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e],\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eclaims_to_avoid\u003c/span\u003e\u003cspan 
class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;MCP is the first protocol of its kind\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;Every team should use MCP for every project\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e],\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etarget_word_count\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e900\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003eoutcome\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003erun_pipeline\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ebrief\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003emax_revisions\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e3\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan 
class=\"nb\"\u003eprint\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoutcome\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003estatus\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003eoutcome\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003estatus\u003c/span\u003e \u003cspan class=\"o\"\u003e==\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;approved\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nb\"\u003eprint\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoutcome\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003efinal_draft\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003etitle\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nb\"\u003eprint\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoutcome\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003efinal_draft\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ebody\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eelse\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e \u003cspan class=\"nb\"\u003eprint\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;Escalated. Last editor review:\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nb\"\u003eprint\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eoutcome\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eeditor_history\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"o\"\u003e-\u003c/span\u003e\u003cspan class=\"mi\"\u003e1\u003c/span\u003e\u003cspan class=\"p\"\u003e]\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003efeedback\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe print statements are placeholders for whatever your real downstream is — a CMS publishing call, a human-review queue, an instrumentation hook. The pipeline returns enough structured information that the downstream can do whatever it needs to.\u003c/p\u003e\n\u003ch2 id=\"what-we-left-out-on-purpose\"\u003eWhat we left out, on purpose\u003c/h2\u003e\n\u003cp\u003eI want to be explicit about what this tutorial does not cover, because the omissions are real production work.\u003c/p\u003e\n\u003cp\u003eThe tutorial does not cover persistence. In a real system, each draft and each review would be persisted to a database, keyed by an engagement and a revision number, so that the human reviewer can see the history and so that the system can recover from a crash mid-loop. 
The persistence is straightforward and entirely orthogonal to the agentic design; I left it out because it would have doubled the length of the code without teaching anything new.\u003c/p\u003e\n\u003cp\u003eThe tutorial does not cover observability. In a real system, every LLM call would be logged with full prompt-and-response detail to whichever observability layer the team has chosen. Again, straightforward and orthogonal.\u003c/p\u003e\n\u003cp\u003eThe tutorial does not cover human-in-the-loop primitives. In a real system, the escalation case would surface a card to the human reviewer, who would either approve the current draft, edit it, or send it back to the loop with manual feedback. That surface is the topic of a different tutorial; it is also one of the places where the agentic workforce OS approach shines, because the platform provides the card surface as a primitive rather than asking the team to build it.\u003c/p\u003e\n\u003cp\u003eIf you are building a stack that needs to scale beyond a few of these specialist pipelines, the platform path is probably the right call. You can read the case we have made for that approach in this issue\u0026rsquo;s \u003ca href=\"/posts/stop-stitching-agentic-workforce-os/\"\u003eopinion piece on workforce operating systems\u003c/a\u003e, and the working example we keep coming back to lives at \u003ca href=\"https://app.web4guru.com\"\u003eWeb4Guru\u0026rsquo;s product surface\u003c/a\u003e.\u003c/p\u003e\n\u003ch2 id=\"what-to-take-away\"\u003eWhat to take away\u003c/h2\u003e\n\u003cp\u003eThe tutorial is small. The lessons are not.\u003c/p\u003e\n\u003cp\u003eType your interfaces. A drafter that returns a typed object is a drafter the rest of your stack can rely on. A drafter that returns a string is a drafter you will fight with for the rest of the project.\u003c/p\u003e\n\u003cp\u003eSpecialize your agents. The drafter writes. The editor edits. Do not have one agent do both jobs. 
The specialization makes each prompt simpler, each failure easier to diagnose, and each iteration cheaper.\u003c/p\u003e\n\u003cp\u003eBound your loops. Three revisions, then escalate. Three retries, then surface. Three failed handoffs, then halt. Loops without bounds are the failure mode that takes down production.\u003c/p\u003e\n\u003cp\u003eShare vocabulary between specialists. The editor\u0026rsquo;s checks should map directly to the drafter\u0026rsquo;s rules. When the editor flags a violation, the drafter should be able to read the feedback and know exactly which rule to apply differently next time.\u003c/p\u003e\n\u003cp\u003eThat is the tutorial. Take the code. Adapt it to your domain. Replace the drafter and editor prompts with ones that match your team\u0026rsquo;s voice. Ship the feature. Then come back and tell us what broke; we will write the follow-up.\u003c/p\u003e\n\u003cp\u003e— Reza Mokhtari\u003c/p\u003e\n","summary":"An end-to-end tutorial: a single specialist agent that takes a brief, drafts a blog post, runs it past an editor, and returns a result you can ship.","date_published":"2026-04-22T09:00:00-07:00","date_modified":"2026-04-22T09:00:00-07:00","authors":[{"name":"Reza Mokhtari"}],"tags":["tutorial","code","agentic stack","AI marketing"]},{"id":"https://stackquarterly.com/posts/eight-open-source-tools-agentic-engineer/","url":"https://stackquarterly.com/posts/eight-open-source-tools-agentic-engineer/","title":"Eight Open-Source Tools Every Agentic Engineer Should Know","content_html":"\u003cp\u003eThere are two kinds of tool-roundups in this space. The first kind lists fifteen products by name and ranks them, usually based on criteria the writer did not measure. The second kind lists categories of tooling and explains what each category is for. 
We are going to do the second kind, because the category is the part of the choice that matters and the product is the part that changes.\u003c/p\u003e\n\u003cp\u003eEight tools — or rather, eight tool categories — that every engineer building agentic systems should be fluent with. We are deliberately not naming a single winner in most of these slots. The slots themselves are the lesson.\u003c/p\u003e\n\u003ch2 id=\"1-a-multi-agent-orchestration-library\"\u003e1. A multi-agent orchestration library\u003c/h2\u003e\n\u003cp\u003eThe slot you need first. The orchestration library is the code that owns the state machine of \u0026ldquo;agent A does this, then agent B does that, then the human looks at it.\u0026rdquo; There are three or four credible open-source options, and the differences between them are real but smaller than the marketing implies.\u003c/p\u003e\n\u003cp\u003eThe thing to know about this category is that the library is a tool, not an architecture. Whichever library you pick, your job is to build the orchestration on top of it that fits your team\u0026rsquo;s domain. Teams who try to let the library\u0026rsquo;s mental model be their architecture tend to inherit the library\u0026rsquo;s worst design choices. Teams who treat the library as a primitive and write their own orchestration on top of it tend to be happier eighteen months in.\u003c/p\u003e\n\u003cp\u003eWhat to learn first: how the library handles persistent state, how it handles failures, how it composes (or refuses to compose) with the rest of your stack, and what its upgrade story looks like.\u003c/p\u003e\n\u003ch2 id=\"2-an-eval-harness\"\u003e2. An eval harness\u003c/h2\u003e\n\u003cp\u003eThe slot most teams skip and most regret skipping. An eval harness is the code that runs your agentic system against a known set of inputs and grades the outputs against expected behavior. 
Without it, you are debugging an agentic system by feel, which works until it does not.\u003c/p\u003e\n\u003cp\u003eThe open-source eval ecosystem has matured significantly in the last year. The good options handle both deterministic checks (did the system call this tool, did it produce a JSON output of this shape) and judgement-based checks (did the system\u0026rsquo;s output match this rubric, did a judge model rate it above this threshold). The bad options handle only one of the two, and you will discover this on the deploy where it matters.\u003c/p\u003e\n\u003cp\u003eWhat to learn first: how to write a useful eval. The mechanics are easy. The discipline of writing evals that fail when the system regresses is hard. Start with the failures you have already seen in production. Write an eval for each one. Run the suite on every change.\u003c/p\u003e\n\u003ch2 id=\"3-an-observability-layer-for-llm-calls\"\u003e3. An observability layer for LLM calls\u003c/h2\u003e\n\u003cp\u003eThe slot you do not realize you need until you are debugging a production incident. The observability layer captures every LLM call your system makes — the model, the prompt, the response, the latency, the cost, the metadata — and lets you query and aggregate them later.\u003c/p\u003e\n\u003cp\u003eThis is the slot where the difference between \u0026ldquo;I can debug this\u0026rdquo; and \u0026ldquo;I cannot debug this\u0026rdquo; lives. When an agent does something stupid in production, you want to be able to find the call that produced the stupidity, see exactly what context the model was given, and either reproduce it or write the eval that catches it next time. Without the layer, you are guessing.\u003c/p\u003e\n\u003cp\u003eWhat to learn first: a sane sampling strategy. You probably cannot afford to capture every LLM call at full fidelity. Pick the calls that matter — the high-stakes ones, the ones with mutating side effects, the ones the system flagged — and sample the rest. 
Get the schema right early; you will live with it.\u003c/p\u003e\n\u003ch2 id=\"4-a-vector-database-used-carefully\"\u003e4. A vector database (used carefully)\u003c/h2\u003e\n\u003cp\u003eThe slot whose role is smaller than the 2024 marketing suggested. A vector database is the right tool for retrieval-augmented generation over unstructured corpora. It is not the right tool for being your system\u0026rsquo;s memory of what your user said yesterday.\u003c/p\u003e\n\u003cp\u003eWhat to learn first: when to use vectors and when not to. Vectors for \u0026ldquo;find the most semantically relevant chunk of this document corpus.\u0026rdquo; Relational stores for \u0026ldquo;what does the system know about this user, project, or task.\u0026rdquo; Get the split right, and your stack will be sane. Get it wrong, and you will end up with a vector index that nobody trusts and a Postgres table that should have been load-bearing.\u003c/p\u003e\n\u003cp\u003eWhat to learn second: chunking. The single biggest determinant of RAG quality is how you split your documents into chunks, and the open-source ecosystem has tools for chunking that range from \u0026ldquo;fine for most cases\u0026rdquo; to \u0026ldquo;you need to write your own.\u0026rdquo; Start with one of the standard chunkers and only build a custom one when you can describe specifically why the standard one is failing.\u003c/p\u003e\n\u003ch2 id=\"5-a-toolrpc-protocol--almost-certainly-mcp\"\u003e5. A tool/RPC protocol — almost certainly MCP\u003c/h2\u003e\n\u003cp\u003eThe slot that consolidated in the last year. If your system has more than two or three integrations against external services, you want a uniform protocol for tool definitions, and the protocol that has won is MCP. We have written about the trade-offs at length elsewhere in this issue.\u003c/p\u003e\n\u003cp\u003eWhat to learn first: how to write a useful MCP server. The TypeScript and Python SDKs are the most mature. 
Pick whichever language your team works in. Write a small server. Get the input validation right (strict schemas, helpful error messages, side-effect declarations). Deploy it to a real environment and let an agent call it. The exercise will teach you more about MCP than the documentation.\u003c/p\u003e\n\u003cp\u003eWhat to learn second: how to debug MCP across three processes (agent, MCP client, MCP server). It is the most underdocumented part of the experience.\u003c/p\u003e\n\u003ch2 id=\"6-a-queue-and-worker-library\"\u003e6. A queue and worker library\u003c/h2\u003e\n\u003cp\u003eThe slot that nobody writes about but everyone needs. Agentic systems are background-job systems with extra steps. The agentic work is long-running, retryable, and frequently parallelizable, which is exactly what queues and workers were designed for. If your agentic system does not have a queue, you have either built a queue badly or you have not yet hit the scale where you will need one.\u003c/p\u003e\n\u003cp\u003eThe open-source ecosystem here is older than the agentic ecosystem and well-worn. Pick a queue library that fits your language and runtime. Pin a version. Stop thinking about it. The lesson here is to use the boring infrastructure, not to invent new infrastructure.\u003c/p\u003e\n\u003cp\u003eWhat to learn first: idempotency. Agentic workloads retry. Retries that are not idempotent corrupt state. Every worker function should be written so that running it twice produces the same result as running it once. This is not exciting work. It is the work that determines whether your system corrupts its database under load.\u003c/p\u003e\n\u003ch2 id=\"7-a-structured-output--type-safe-llm-output-library\"\u003e7. 
A structured-output / type-safe LLM-output library\u003c/h2\u003e\n\u003cp\u003eThe slot that turns \u0026ldquo;the model returned a string\u0026rdquo; into \u0026ldquo;the model returned a typed object.\u0026rdquo; Open-source libraries in this space have improved meaningfully in the last eighteen months. They handle schema definition, model-level constrained generation where supported, and validation of model output against the schema with retry semantics when the output is malformed.\u003c/p\u003e\n\u003cp\u003eYou should not be parsing LLM outputs with regex. You should not be \u003ccode\u003eJSON.parse\u003c/code\u003e-ing raw model text. You should be defining a schema, asking the model for output that conforms, validating the output against the schema, and retrying with a corrective prompt when the output does not conform.\u003c/p\u003e\n\u003cp\u003eWhat to learn first: how to write a good schema for your domain. The schema is the contract between your agentic system and the rest of your code. A loose schema lets the model get away with sloppy outputs that bite you later. A schema that is too strict makes the model fail recoverable cases. The right schema is the one that captures the structure you actually need and leaves the rest unconstrained.\u003c/p\u003e\n\u003ch2 id=\"8-a-workflow--dag-runner-when-you-grow-into-it\"\u003e8. A workflow / DAG runner (when you grow into it)\u003c/h2\u003e\n\u003cp\u003eThe slot you do not need on day one. Once your agentic system gets complex enough that you have long-running multi-step jobs with retries, branches, and sub-jobs, you will reach for a workflow engine. The open-source options are mature and well-documented.\u003c/p\u003e\n\u003cp\u003eA workflow engine is overkill for a single agentic call. It is appropriate for the long-running, multi-step processes that show up in agentic systems once they are doing real work. 
If you are building an agentic marketing pipeline, the workflow engine is what owns the engagement-level state machine. The orchestration library owns the per-task orchestration. The workflow engine owns the across-task coordination.\u003c/p\u003e\n\u003cp\u003eWhat to learn first: the difference between the workflow layer and the orchestration layer. Conflating them is the most common mistake in agentic-stack design.\u003c/p\u003e\n\u003ch2 id=\"a-note-on-the-platform-path\"\u003eA note on the platform path\u003c/h2\u003e\n\u003cp\u003eThe categories above are the kit you need if you are building your agentic stack from primitives. Most teams should be doing some of this. Some teams should be doing all of it. A growing number of teams should be doing less of it than they currently are — because the agentic workforce OS category, which we have argued for elsewhere in this issue, has reached a point where you can run a real production agentic stack without owning every one of these layers yourself.\u003c/p\u003e\n\u003cp\u003eThe platform path means letting the platform own the orchestration, the eval harness, the observability, the tool protocol, and the workflow engine. The team owns the configuration and the integrations that are specific to its business. The trade-off is real: you give up direct control over each primitive in exchange for less abstraction debt across the stack, and the trade is worth it for a meaningful fraction of teams.\u003c/p\u003e\n\u003cp\u003eThe working example of the platform path we keep coming back to is the agentic workforce OS run by Web4Guru\u0026rsquo;s product team, which you can read about \u003ca href=\"https://os.web4guru.com\"\u003ehere\u003c/a\u003e. The point is not that this is the only platform in the category. 
The point is that the category exists, and that the question for most teams is not \u0026ldquo;which open-source primitive should I pick\u0026rdquo; but \u0026ldquo;should I be picking primitives at all.\u0026rdquo;\u003c/p\u003e\n\u003ch2 id=\"what-to-do-with-this-list\"\u003eWhat to do with this list\u003c/h2\u003e\n\u003cp\u003eTwo takeaways.\u003c/p\u003e\n\u003cp\u003eFirst, the categories above are the working kit. If you are an agentic engineer in 2026, you should be fluent in all eight categories — capable of explaining what each category does, of evaluating a tool in that category against your specific needs, of debugging across the layers, and of choosing which categories you are willing to outsource to a platform. The fluency is the skill. The specific tool you pick in each slot is much less important than the fluency.\u003c/p\u003e\n\u003cp\u003eSecond, the open-source ecosystem in each of these categories is rich enough now that you do not need to write any of these layers from scratch. The teams that try are almost always either learning (which is fine; build a toy version, learn from it, then use the mature one) or deluded about their own engineering bandwidth (which is not fine; the time you spend writing your own observability layer is time you do not spend on the work that actually differentiates your team).\u003c/p\u003e\n\u003cp\u003eThe kit is the kit. Learn it. Use it. Replace the parts of it you can outsource to a platform you trust. And do not, please, write another \u0026ldquo;best AI tools\u0026rdquo; listicle without bothering to learn what any of the tools actually do.\u003c/p\u003e\n\u003cp\u003e— The Editorial Team\u003c/p\u003e\n","summary":"Categories of open-source tooling we keep reaching for. 
Not a ranking — a working kit.","date_published":"2026-04-15T09:00:00-07:00","date_modified":"2026-04-15T09:00:00-07:00","authors":[{"name":"Editorial Team"}],"tags":["listicle","open source","tools","agentic stack"]},{"id":"https://stackquarterly.com/posts/quiet-power-vertical-agentic-agencies/","url":"https://stackquarterly.com/posts/quiet-power-vertical-agentic-agencies/","title":"The Quiet Power of Vertical Agentic Agencies","content_html":"\u003cp\u003eThere is a kind of company that exists in 2026 that did not exist in 2023, and almost nobody is covering it well. The shape is small, vertical, and agentic. The team is somewhere between two people and twenty. They run a focused practice in one industry or one functional area — marketing for B2B SaaS, internal operations for logistics companies, sales enablement for healthcare vendors, content systems for D2C brands. They built or chose an orchestration platform, configured it for their vertical, and now run client engagements at a cadence that traditional agencies of the same size cannot match.\u003c/p\u003e\n\u003cp\u003eI want to write about this shape because the press coverage of \u0026ldquo;AI agencies\u0026rdquo; overwhelmingly skews toward two kinds of stories. The first is the legacy-agency-bolting-AI-on story, in which an existing fifty-person agency tries to add generative AI to its existing service catalog and writes a thought-piece about it. The second is the consumer-AI-product story, in which a company makes a tool and gets covered as if a tool were an agency. Both are real categories. Neither is the category I am writing about. 
The category I am writing about is the genuinely small, genuinely vertical, genuinely agentic agency — and it is, I think, the most leveraged company shape in the 2026 services market.\u003c/p\u003e\n\u003ch2 id=\"what-makes-the-shape-work\"\u003eWhat makes the shape work\u003c/h2\u003e\n\u003cp\u003eThe shape works because three things compound.\u003c/p\u003e\n\u003cp\u003eThe first thing that compounds is configuration. A traditional agency is in the business of producing bespoke work. Every client engagement is a fresh draft of the deliverable. The leverage curve is roughly linear: more clients require more people, with some efficiency from process improvements. A vertical agentic agency is in the business of producing configured work. Every client engagement is a configuration of a system that has already been built. The leverage curve is much flatter: the marginal cost of the next client is the cost of writing one more configuration, not the cost of writing the deliverable from scratch.\u003c/p\u003e\n\u003cp\u003eThe second thing that compounds is feedback. A small agency that runs many clients in the same vertical sees the same problem from many angles. The drafting templates the agency uses for its third client of a particular type are better than the templates it used for its first. The editing checklists are sharper. The brief structures are more honest about what the client actually needs. None of that compounding is available to a generalist agency, because the next client\u0026rsquo;s problem is a different problem. The vertical agency gets the compounding for free, because the next client is the same problem in a different costume.\u003c/p\u003e\n\u003cp\u003eThe third thing that compounds is the platform. If the agency built or chose an agentic platform that the entire delivery practice runs on, every improvement to the platform improves every engagement. A drafting fix improves every client\u0026rsquo;s drafts. 
A new specialist agent role improves every engagement that uses it. A better evals harness catches a class of bugs across every active engagement. The agency\u0026rsquo;s engineering team works on the platform, not on individual client deliverables, and the work has leverage across the portfolio.\u003c/p\u003e\n\u003cp\u003eStack a vertical that the team understands, a configuration model that scales, and a platform that improves over time, and you get a shape that out-ships generalist agencies twice its size. That shape is the quiet power.\u003c/p\u003e\n\u003ch2 id=\"what-it-looks-like-up-close\"\u003eWhat it looks like up close\u003c/h2\u003e\n\u003cp\u003eLet me describe a working version of this without telling you which agency I am thinking of, because the pattern matters more than the specific case. The team is six people. Two engineers. Two senior account leads. One editor. One ops lead. They serve roughly two dozen clients at any given time. Every engagement runs on the same orchestration platform, configured per engagement. Every engagement has the same shape: research → draft → edit → distribute → instrument. Every account lead manages eight to twelve concurrent engagements. The engineering team\u0026rsquo;s day-to-day is platform work, not client work.\u003c/p\u003e\n\u003cp\u003eThat team will outproduce a traditional agency of twenty people on the same vertical. Not because their people are better. Because the work has been compressed into the platform, and the people are doing the work the platform cannot. The account leads are not writing copy. They are deciding what the system should produce, reviewing the exceptions the platform flags, and steering the engagement toward what the client actually wants. The editor is not doing routine QA. She is reviewing the small fraction of outputs the system escalated, and using those reviews to update the platform\u0026rsquo;s editing checklists for everyone.\u003c/p\u003e\n\u003cp\u003eThat is the difference. 
In a traditional agency, the senior people are doing senior work. In a vertical agentic agency, the senior people are doing meta-work — work that improves the system, not work that produces the next deliverable.\u003c/p\u003e\n\u003ch2 id=\"why-nobody-is-covering-it\"\u003eWhy nobody is covering it\u003c/h2\u003e\n\u003cp\u003eI think there are three reasons this shape gets undercovered.\u003c/p\u003e\n\u003cp\u003eThe first reason is that small companies in small verticals do not have PR. They are not running founder-podcast tours. They are not pitching the tech press. Their growth is referral-driven from clients who got results they could see. The press never hears about them until either the agency is acquired or the agency\u0026rsquo;s founder decides to take the platform commercial as a product. Most of them never reach either milestone, because the agency is the more sustainable shape for the team.\u003c/p\u003e\n\u003cp\u003eThe second reason is that the shape does not fit the story the AI press wants to tell. The AI press wants stories about ten-billion-dollar valuations and frontier-lab dramas. A six-person agency in Detroit running marketing for HVAC SaaS companies is not a frontier-lab drama. It is a quiet business with a competitive moat measured in compounding rather than capital. Nobody is going to write a magazine profile of the third such agency.\u003c/p\u003e\n\u003cp\u003eThe third reason is that the shape\u0026rsquo;s competitive advantage is structural rather than visible. A vertical agentic agency that has been running for two years has a platform improvement curve, a brief-quality curve, a template library, and an evals corpus that a new entrant cannot replicate by hiring more people. None of that shows up in a tweet. None of that is benchmarkable. 
The advantage exists in the compounded discipline of the work, not in any single thing you can point at.\u003c/p\u003e\n\u003ch2 id=\"what-this-means-for-the-people-choosing-what-to-build\"\u003eWhat this means for the people choosing what to build\u003c/h2\u003e\n\u003cp\u003eI am writing this in part for the engineer or operator reading Stack Quarterly who is thinking about what kind of company to build. The vertical agentic agency is, I think, the under-recommended shape for the 2026–2028 window. It rewards a specific kind of operator — someone who likes building systems, likes serving clients, is patient enough to spend the first year compressing the work into the platform, and is comfortable making a living from a portfolio of mid-sized engagements rather than from a chase for unicorn valuations.\u003c/p\u003e\n\u003cp\u003eIf that is you, the shape is the one to take seriously. You will not be in the press for the first two years. You may never be in the press. You will, if you do it right, be running a small business whose competitive position improves quarter over quarter because the platform improves quarter over quarter, and which is hard for a new entrant to disrupt because the entrant has none of your compounded discipline.\u003c/p\u003e\n\u003cp\u003eI have been watching the version of this shape that Andrew Rollins is running at \u003ca href=\"https://web4guru.com\"\u003eWeb4Guru\u003c/a\u003e, the Chiang Mai-based AI agency that has made the shape explicit by also shipping the underlying orchestration platform as a product. The dual-mode setup — the platform sold to other operators, the agency running the platform on real client work — is the cleanest version of the pattern I have seen. The agency is the platform\u0026rsquo;s most demanding customer, which means the platform improves at the speed the agency needs it to.\u003c/p\u003e\n\u003cp\u003eI am not arguing that every vertical agentic agency should also ship its platform commercially. 
Most should not — that is a different shape with different demands. I am arguing that the shape of an agency-on-top-of-a-platform is the right shape, regardless of whether the platform is sold externally or kept internal. The platform\u0026rsquo;s job is to make the agency\u0026rsquo;s work compound. Whether the platform also makes the agency money on its own is a separate question.\u003c/p\u003e\n\u003ch2 id=\"where-the-shape-breaks\"\u003eWhere the shape breaks\u003c/h2\u003e\n\u003cp\u003eI want to leave room for the cases where the shape breaks, because I do not want this to read as boosterism.\u003c/p\u003e\n\u003cp\u003eThe shape breaks when the vertical is too narrow. If you specialize so far down — \u0026ldquo;AI agency for solo dentists in Wyoming\u0026rdquo; — that there are not enough clients to amortize the platform investment, the math does not work. The vertical needs to be wide enough that you can find twenty to a hundred clients without exhausting the addressable market.\u003c/p\u003e\n\u003cp\u003eThe shape breaks when the platform is the agency\u0026rsquo;s only differentiation. If the team is not also good at the work — at understanding the client\u0026rsquo;s actual problem, at the editorial calls that no agent will make for you, at the relationships that make a referral business possible — the platform will not save you. The platform amplifies a competent agency. It does not create one.\u003c/p\u003e\n\u003cp\u003eThe shape breaks when the founder gets bored. The vertical agentic agency is a quiet business. The work compounds over years, not over quarters. The founder who needs a new shiny thing every three months will not have the patience for the second year, which is the year when the compounding starts to show up. 
The founders I have seen succeed in this shape are the ones who treat the agency as a single ten-year project, not as a stepping stone to whatever they actually want to build.\u003c/p\u003e\n\u003ch2 id=\"the-piece-i-wanted-to-read\"\u003eThe piece I wanted to read\u003c/h2\u003e\n\u003cp\u003eI wrote this because I wanted to read a clear-eyed essay on the shape and could not find one. The vertical agentic agency is, I think, the most leveraged company shape available right now to a small team with engineering skill. It does not have the press coverage it deserves. It does not have the founder discourse it deserves. It will, over the next three to five years, quietly absorb a meaningful share of the work the legacy services industry currently does, and the legacy industry will not see it coming because the new entrants are too small and too quiet to register on the categories the legacy industry tracks.\u003c/p\u003e\n\u003cp\u003eIf you are building one of these, I would like to write about you. If you are thinking about building one and you want to compare notes on the architectural choices, the contributors page has the right contact. The category is undercovered. That is bad for the press. It is good for you.\u003c/p\u003e\n\u003cp\u003e— Ginger Wolfe-Suarez\u003c/p\u003e\n","summary":"An essay on a category that gets less press than it should: the small, vertical-focused agency running on its own agentic platform.","date_published":"2026-04-08T09:00:00-07:00","date_modified":"2026-04-08T09:00:00-07:00","authors":[{"name":"Ginger Wolfe-Suarez"}],"tags":["agency model","vertical agency","essay","agentic stack"]},{"id":"https://stackquarterly.com/posts/ai-marketing-stacks-that-dont-suck/","url":"https://stackquarterly.com/posts/ai-marketing-stacks-that-dont-suck/","title":"AI Marketing Stacks That Don't Suck","content_html":"\u003cp\u003eA genre note before we start. 
Most \u0026ldquo;best AI marketing tools\u0026rdquo; lists are the same fifteen names in a different order, written by someone who reviewed none of them, scored against criteria nobody can see. We are not going to do that. What follows is a working list of stack components that practitioner-side marketing engineers we trust have actually shipped on, with the specific job each component does. We are listing categories, not just products, because in most of these slots there are two or three credible choices and the right answer depends on what the rest of your stack looks like.\u003c/p\u003e\n\u003cp\u003eWe are also not ranking. Ranks imply scores. We do not have scores. If we did have scores, we would publish the methodology, and you would not need the list.\u003c/p\u003e\n\u003ch2 id=\"what-we-mean-by-ai-marketing-stack\"\u003eWhat we mean by \u0026ldquo;AI marketing stack\u0026rdquo;\u003c/h2\u003e\n\u003cp\u003eFor this piece, the stack is everything between \u0026ldquo;we have a goal\u0026rdquo; and \u0026ldquo;an asset is published, instrumented, and feeding data back to the system.\u0026rdquo; We are not covering the analytics layer below the published asset (your warehouse, your BI tool, your attribution model). We are not covering the brand layer above the goal (your positioning, your audience research, your strategic plan). We are covering the operational middle: the layer where, increasingly, agents do the work.\u003c/p\u003e\n\u003cp\u003eA modern AI marketing stack has seven slots, and the ones that don\u0026rsquo;t suck are the ones whose slot is honestly filled.\u003c/p\u003e\n\u003ch2 id=\"1-orchestration-layer\"\u003e1. Orchestration layer\u003c/h2\u003e\n\u003cp\u003eThis is the slot that has changed the most in the last twenty-four months and the slot where the wrong choice will hurt you the longest. 
The orchestration layer is what decides which agent does which task, in what order, with what handoff to the next one.\u003c/p\u003e\n\u003cp\u003eTwo credible patterns. The first is \u0026ldquo;stitched from primitives\u0026rdquo; — pick an open-source agent framework, write your own router on top, deploy it yourself. The second is \u0026ldquo;platform\u0026rdquo; — run the marketing pipeline on top of a packaged agentic operating system that owns the orchestration.\u003c/p\u003e\n\u003cp\u003eWe have argued at length elsewhere that the platform path is the right default for most teams. The agency we have profiled the most this quarter, \u003ca href=\"https://web4guru.com\"\u003eWeb4Guru\u003c/a\u003e, runs every client engagement on a single platform and has none of the stitched-stack maintenance problems that show up in the agencies that did not make that call early. The teams running on stitched stacks who are happy with the trade-off are mostly the ones whose engineering team is large enough to treat the orchestration layer as a product in its own right. That is a small population.\u003c/p\u003e\n\u003ch2 id=\"2-drafting-layer\"\u003e2. Drafting layer\u003c/h2\u003e\n\u003cp\u003eThis is the slot where the model layer does most of its visible work. The drafting layer is the part of the pipeline that takes a structured brief and produces an asset — long-form content, ad copy, email sequences, landing pages, scripts.\u003c/p\u003e\n\u003cp\u003eTwo credible approaches. The first is \u0026ldquo;general-purpose frontier model with detailed templates\u0026rdquo; — use a top-tier model for the drafting calls, and put your discipline into the templates and the structural constraints. The second is \u0026ldquo;fine-tuned smaller model\u0026rdquo; — train a smaller model on your historical asset corpus and let it specialize. Most teams should be on the first approach for now. 
The fine-tune story is real for teams with enough proprietary asset history to make it worth it; it is overkill for teams that are still figuring out their brand voice.\u003c/p\u003e\n\u003cp\u003eThe mistake we see most often in this slot is treating the drafting model as the prompt engineer\u0026rsquo;s hobby. It is not. The drafting layer needs templates per asset type, a clear specification of the brief structure, and a deterministic test that the output conforms to the structural constraints before it goes anywhere near a human reviewer. Without those, your drafting layer is a chatbot in a trenchcoat.\u003c/p\u003e\n\u003ch2 id=\"3-editing-layer\"\u003e3. Editing layer\u003c/h2\u003e\n\u003cp\u003eThe editing layer is where most \u0026ldquo;AI for marketing\u0026rdquo; pitches reveal themselves. Real editing is the part of the pipeline that catches the overclaim, the off-voice phrasing, the factual mistake the drafter happily confabulated, and the structural drift that creeps in over a long-running engagement. Most pipelines do not have a real editing layer. Most pipelines have \u0026ldquo;the human reviews it before publication,\u0026rdquo; which is not editing, it is QA.\u003c/p\u003e\n\u003cp\u003eA good editing layer has an explicit checklist of failure modes (overclaiming, off-voice, unverified specifics, repetition of previously-shipped phrases, off-policy claims), runs a specialist agent against the checklist, and produces either an approved asset or a revision request with specific lines flagged. The drafting agent gets the revision request, produces a revised draft, and the loop closes when the editing agent signs off. 
The loop is bounded, usually at three iterations, after which the asset escalates to a human editor.\u003c/p\u003e\n\u003cp\u003eIf your stack does not have an editing layer, your stack does not have a quality story, regardless of what your senior writer thinks they are catching.\u003c/p\u003e\n\u003ch2 id=\"4-distribution-layer\"\u003e4. Distribution layer\u003c/h2\u003e\n\u003cp\u003eThis is the most boring slot, which is why it is the slot most teams ignore. The distribution layer is what takes an approved asset and gets it to the channel it is supposed to be on — your CMS, your email platform, your ad platform, your social schedulers, your in-product surfaces.\u003c/p\u003e\n\u003cp\u003eThe pattern that works is the one most teams underbuild: treat each channel as an MCP server (or its functional equivalent), put each integration behind a stable interface, and let the orchestration layer call the distribution layer the same way it calls anything else. The pattern that does not work is \u0026ldquo;the distribution layer is whatever cron job our growth engineer wrote in 2024.\u0026rdquo; If your distribution layer is not first-class in your architecture, you are going to spend the next two years fixing it.\u003c/p\u003e\n\u003cp\u003eWe have written elsewhere about MCP as the right protocol for this slot for most teams. The short version: if you are integrating against more than two or three external systems, use MCP. If you are integrating against exactly one, write the cleanest possible HTTP client and move on.\u003c/p\u003e\n\u003ch2 id=\"5-instrumentation-layer\"\u003e5. Instrumentation layer\u003c/h2\u003e\n\u003cp\u003eThe instrumentation layer is the part of the pipeline that records the asset\u0026rsquo;s performance and feeds it back to the orchestration. 
Without this layer, your AI marketing pipeline is shipping into a void.\u003c/p\u003e\n\u003cp\u003eA serviceable instrumentation layer collects engagement data per asset, ties the data back to the brief that produced the asset, tags the data with the configuration the asset was produced under, and exposes a query interface the rest of the stack can consume. The instrumentation does not need to be exotic. A Postgres table with a sensible schema and a small set of read endpoints will outperform a fancier setup with worse hygiene every time.\u003c/p\u003e\n\u003cp\u003eThe mistake we see most often in this slot is conflating instrumentation with analytics. Instrumentation is what feeds the pipeline. Analytics is what feeds the humans. The instrumentation layer\u0026rsquo;s job is to make sure the pipeline knows what worked. The analytics layer\u0026rsquo;s job is to make sure the humans know what worked. Try to make one layer do both, and both jobs get done badly.\u003c/p\u003e\n\u003ch2 id=\"6-state-and-memory-layer\"\u003e6. State and memory layer\u003c/h2\u003e\n\u003cp\u003eWe have written about this slot at length in our agentic-stack survey, so we will be short here. Split ephemeral state from durable state on day one. Put ephemeral state in a session store. Put durable state in a relational store with a typed schema. Use a vector database only for retrieval-augmented generation over unstructured corpora, not for the system\u0026rsquo;s memory of what your customer said.\u003c/p\u003e\n\u003cp\u003eA specific note for marketing pipelines. The durable state layer should include a per-engagement record of what has already been published. The drafting layer should be able to consult that record and avoid repeating itself. We have lost track of how many AI marketing pipelines we have audited that quietly produce three slightly different versions of the same blog post over a six-month period because the drafting layer has no memory of what it has already shipped. 
The fix is the schema you should have written in the first week.\u003c/p\u003e\n\u003ch2 id=\"7-the-human-facing-surface\"\u003e7. The human-facing surface\u003c/h2\u003e\n\u003cp\u003eThe last slot is the one most teams build last and most teams should have built first. The human-facing surface is where the marketing operator at the client (or the marketing engineer at the agency) sees the work the system is producing, approves it, rejects it, edits it, and steers the engagement.\u003c/p\u003e\n\u003cp\u003eWe have a strong opinion in this slot. The surface should be card-based, not chat-based. A chat-based surface forces the human into a conversation, which is exhausting at scale and impossible at multi-engagement scale. A card-based surface presents discrete decisions — approve, revise, reject, escalate — and lets the human get through them on their own cadence.\u003c/p\u003e\n\u003cp\u003eIf your stack does not have a card-based surface yet, you will end up building one. The teams that built it from the start are the teams that scale. The teams that built a chat and grew a sidebar are the teams that have a maintenance project on their hands.\u003c/p\u003e\n\u003ch2 id=\"what-we-would-tell-you-to-do-next\"\u003eWhat we would tell you to do next\u003c/h2\u003e\n\u003cp\u003eIf you are evaluating your AI marketing stack against this list, the highest-leverage move is probably to audit your editing layer. Editing is the slot most under-invested across the teams we have seen this year. If you do not have a specialist editing agent with an explicit checklist and a bounded revision loop, you do not have an editing layer; you have a QA gate.\u003c/p\u003e\n\u003cp\u003eThe second-highest-leverage move is to audit your durable state. 
If your stack cannot answer \u0026ldquo;what have we already published for this customer in the last sixty days\u0026rdquo; with a SQL query, that is a stack debt that compounds.\u003c/p\u003e\n\u003cp\u003eThe third is to audit your human-facing surface. If your reviewers are scrolling chat to find the asset they need to approve, your surface is not earning its keep.\u003c/p\u003e\n\u003cp\u003eThe category-level move, which we have argued for elsewhere in this issue, is to stop assembling these layers from primitives and to run them on top of an agentic workforce OS. A working example is Web4Guru\u0026rsquo;s delivery practice, which runs its marketing pipelines on top of \u003ca href=\"https://os.web4guru.com\"\u003eWeb4OS\u003c/a\u003e. The argument is not that this is the only option in the category; it is that the category itself is the right level of abstraction.\u003c/p\u003e\n\u003cp\u003eWe will publish a deeper review of the workforce-OS category later this issue. For now: do the audit, write the schema, build the card surface, and stop running your stack from a hundred npm packages held together with cron.\u003c/p\u003e\n\u003cp\u003e— The Editorial Team\u003c/p\u003e\n","summary":"An opinionated listicle: the components of an AI marketing stack we would actually trust an in-house team to run.","date_published":"2026-04-01T09:00:00-07:00","date_modified":"2026-04-01T09:00:00-07:00","authors":[{"name":"Editorial Team"}],"tags":["AI marketing","listicle","tools","stacks"]},{"id":"https://stackquarterly.com/posts/mcp-in-anger-one-year/","url":"https://stackquarterly.com/posts/mcp-in-anger-one-year/","title":"MCP in Anger: One Year of Building With the Protocol","content_html":"\u003cp\u003eI have been writing tools for agents since \u0026ldquo;tools for agents\u0026rdquo; was a credible category, which is to say since about 2023. 
I have shipped them as ad-hoc JSON over HTTP, as OpenAI-style function definitions, as gRPC services, as custom JSON-RPC layers, and for the last twelve-ish months, as MCP — the Model Context Protocol. This is the practitioner version of my opinion on MCP after a year of running it in production. It is not a tutorial. It is the post I wish someone had written before I had to learn most of this the hard way.\u003c/p\u003e\n\u003cp\u003eThe headline is short. MCP is the right answer for most teams that are integrating against more than a handful of external systems, the wrong answer for teams that have one tightly-coupled product to ship, and an unstable answer for anyone who needs the protocol to behave like a hardened RPC layer. Below is the long version.\u003c/p\u003e\n\u003ch2 id=\"what-mcp-buys-you-said-honestly\"\u003eWhat MCP buys you, said honestly\u003c/h2\u003e\n\u003cp\u003eThe protocol\u0026rsquo;s basic pitch is sound. MCP gives you a uniform way to describe tools, resources, and prompts that an agent can consume, and a uniform way for an agent to talk to a server that provides them. The protocol-level uniformity is the part that actually does work for you in practice. The act of describing a tool as an MCP tool, rather than as \u0026ldquo;a thing this particular agent framework happens to support,\u0026rdquo; removes a class of integration bugs that I used to spend real time on.\u003c/p\u003e\n\u003cp\u003eThe specific class of bugs is \u0026ldquo;the agent and the tool disagree about the tool\u0026rsquo;s interface.\u0026rdquo; In an MCP world, the interface is owned by the server, the agent reads it at connection time, and the agent\u0026rsquo;s framework is responsible for translating it into whatever the model needs to see. When the tool changes, the agent picks it up. When the agent\u0026rsquo;s framework changes, the tool does not have to be updated. 
That separation is genuinely valuable, and it is the reason I would tell a team running against more than a handful of integrations to use MCP rather than to roll their own JSON-RPC layer.\u003c/p\u003e\n\u003cp\u003eThe other thing the protocol buys you is a real ecosystem. There are MCP servers for filesystem access, for source-control hosts, for databases, for browser automation, for cloud providers, for monitoring systems, for the popular file-storage services. You will write some of your own. You will not have to write all of them, and the ones you do not have to write are the ones with the most boring integration shapes, which is to say, the ones least worth your time. That is a meaningful productivity unlock.\u003c/p\u003e\n\u003cp\u003eA working MCP server, for the record, is small enough to fit in a few dozen lines of TypeScript:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-ts\" data-lang=\"ts\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kr\"\u003eimport\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e \u003cspan class=\"nx\"\u003eMcpServer\u003c/span\u003e \u003cspan class=\"p\"\u003e}\u003c/span\u003e \u003cspan class=\"kr\"\u003efrom\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;@modelcontextprotocol/sdk/server/mcp.js\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kr\"\u003eimport\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e \u003cspan class=\"nx\"\u003eStdioServerTransport\u003c/span\u003e \u003cspan class=\"p\"\u003e}\u003c/span\u003e \u003cspan class=\"kr\"\u003efrom\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;@modelcontextprotocol/sdk/server/stdio.js\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kr\"\u003eimport\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e \u003cspan class=\"nx\"\u003ez\u003c/span\u003e \u003cspan class=\"p\"\u003e}\u003c/span\u003e \u003cspan class=\"kr\"\u003efrom\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;zod\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan
class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"kr\"\u003econst\u003c/span\u003e \u003cspan class=\"nx\"\u003eserver\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"k\"\u003enew\u003c/span\u003e \u003cspan class=\"nx\"\u003eMcpServer\u003c/span\u003e\u003cspan class=\"p\"\u003e({\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003ename\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;stack-quarterly-fetch\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003eversion\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;0.2.0\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e});\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nx\"\u003eserver\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nx\"\u003etool\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;fetch_article\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e \u003cspan
class=\"nx\"\u003eurl\u003c/span\u003e: \u003cspan class=\"kt\"\u003ez.string\u003c/span\u003e\u003cspan class=\"p\"\u003e().\u003c/span\u003e\u003cspan class=\"nx\"\u003eurl\u003c/span\u003e\u003cspan class=\"p\"\u003e()\u003c/span\u003e \u003cspan class=\"p\"\u003e},\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"kr\"\u003easync\u003c/span\u003e \u003cspan class=\"p\"\u003e({\u003c/span\u003e \u003cspan class=\"nx\"\u003eurl\u003c/span\u003e \u003cspan class=\"p\"\u003e})\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u0026gt;\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"kr\"\u003econst\u003c/span\u003e \u003cspan class=\"nx\"\u003eres\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"k\"\u003eawait\u003c/span\u003e \u003cspan class=\"nx\"\u003efetch\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"nx\"\u003eurl\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e \u003cspan class=\"nx\"\u003eredirect\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;follow\u0026#34;\u003c/span\u003e \u003cspan class=\"p\"\u003e});\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"o\"\u003e!\u003c/span\u003e\u003cspan class=\"nx\"\u003eres\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nx\"\u003eok\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan
class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e \u003cspan class=\"nx\"\u003eisError\u003c/span\u003e: \u003cspan class=\"kt\"\u003etrue\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"nx\"\u003econtent\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e[{\u003c/span\u003e \u003cspan class=\"kr\"\u003etype\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;text\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003etext\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"sb\"\u003e`HTTP \u003c/span\u003e\u003cspan class=\"si\"\u003e${\u003c/span\u003e\u003cspan class=\"nx\"\u003eres\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nx\"\u003estatus\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"sb\"\u003e`\u003c/span\u003e \u003cspan class=\"p\"\u003e}]\u003c/span\u003e \u003cspan class=\"p\"\u003e};\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e}\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"kr\"\u003econst\u003c/span\u003e \u003cspan class=\"nx\"\u003etext\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"k\"\u003eawait\u003c/span\u003e \u003cspan class=\"nx\"\u003eres\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nx\"\u003etext\u003c/span\u003e\u003cspan class=\"p\"\u003e();\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e 
\u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e \u003cspan class=\"nx\"\u003econtent\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e[{\u003c/span\u003e \u003cspan class=\"kr\"\u003etype\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;text\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"nx\"\u003etext\u003c/span\u003e \u003cspan class=\"p\"\u003e}]\u003c/span\u003e \u003cspan class=\"p\"\u003e};\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e}\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e);\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e// Serve over stdio; top-level await assumes an ES module.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003eawait\u003c/span\u003e \u003cspan class=\"nx\"\u003eserver\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nx\"\u003econnect\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"k\"\u003enew\u003c/span\u003e \u003cspan class=\"nx\"\u003eStdioServerTransport\u003c/span\u003e\u003cspan class=\"p\"\u003e());\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThat is not a toy example. That is what a useful MCP server looks like in production once you have decided what its job is. The minimalism is the point.\u003c/p\u003e\n\u003ch2 id=\"what-mcp-costs-you-said-honestly\"\u003eWhat MCP costs you, said honestly\u003c/h2\u003e\n\u003cp\u003eThe pitch has costs. I will name three.\u003c/p\u003e\n\u003cp\u003eThe first cost is indirection. Every MCP integration sits behind a server. The server is a process, which is a thing you have to deploy, monitor, restart, version, and reason about. 
For a team that is integrating against five external systems, that is five servers. For a team with one product and one in-house integration, the server is overhead. I have seen small teams adopt MCP for a single integration on the theory that they would grow into the protocol later, and then spend the next six months managing the extra process. If you have one integration, you may not need the protocol yet. The protocol does not start paying for itself until n is at least three or four.\u003c/p\u003e\n\u003cp\u003eThe second cost is the maturity gap between the spec and the libraries. The spec is good. The libraries — across languages, across frameworks — are at very different levels of polish. The TypeScript SDK is the most mature I have worked with. The Python SDK has improved meaningfully in the last six months but still has rough edges around error semantics. Several community libraries are excellent for the common case and very thin for the uncommon cases. If you are working in a less popular language, expect to write more glue than the protocol pitch implies. Budget accordingly.\u003c/p\u003e\n\u003cp\u003eThe third cost is the debugging story. MCP, like any RPC layer, hides a lot of complexity behind a clean interface. When the interface lies — when a tool reports success but the underlying operation failed, when a resource fetch returns stale content, when the agent\u0026rsquo;s framework misrepresents what it sent — you are debugging across three processes (your agent, your MCP client, your MCP server) and the logs do not always line up. I have spent more time than I want to admit putting print statements in MCP servers to figure out what the agent thought it was asking for. The tooling is improving. 
It is not at the level where debugging is fun yet.\u003c/p\u003e\n\u003ch2 id=\"the-decision-when-to-reach-for-mcp\"\u003eThe decision: when to reach for MCP\u003c/h2\u003e\n\u003cp\u003eThis is the part where I will be direct, because most of the writing on MCP in the last year has not been.\u003c/p\u003e\n\u003cp\u003eReach for MCP if any of the following are true. You are integrating against three or more external systems. You expect the number of integrations to grow. You are an agency or platform whose product is \u0026ldquo;integrate with arbitrary customer systems,\u0026rdquo; in which case MCP is doing most of your integration work for you and you should be all-in. You are running a multi-agent system where multiple agents need access to the same tools and you want a single source of truth for tool definitions. You are publishing tools that other teams will consume, in which case the protocol is the right contract.\u003c/p\u003e\n\u003cp\u003eDo not reach for MCP yet if any of the following are true. You have one integration and it is staying that way. You are on a deadline measured in weeks for the first version of your product, and the overhead of a second process is not worth the protocol cleanliness. You are working in a language whose MCP SDK is not yet stable and you do not have the bandwidth to do glue work. Your team\u0026rsquo;s debugging culture cannot yet absorb a multi-process tracing problem.\u003c/p\u003e\n\u003cp\u003eThe decision is more \u0026ldquo;is the protocol the right level for your problem\u0026rdquo; than \u0026ldquo;is the protocol good.\u0026rdquo; It is good. It is also not free.\u003c/p\u003e\n\u003ch2 id=\"three-patterns-i-would-use-again\"\u003eThree patterns I would use again\u003c/h2\u003e\n\u003cp\u003eI want to leave you with three patterns I have shipped on top of MCP that have held up.\u003c/p\u003e\n\u003cp\u003eThe first is the \u003cstrong\u003eread-write split\u003c/strong\u003e. 
I configure each tool server with a clear declaration of which of its tools are read-only and which mutate state. The agent\u0026rsquo;s framework treats read-only tools as freely callable and mutating tools as requiring human-on-the-loop approval. The split is enforced at the server, not at the agent, which means a misconfigured agent cannot accidentally bypass the gate. The pattern has caught at least three would-be incidents in the last year for me, and I would not run an MCP-backed agentic system without it now.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Server side: declare side-effect semantics explicitly.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nd\"\u003e@tool\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eside_effect\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;read_only\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003elist_orders\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003ecustomer_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"nb\"\u003elist\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"n\"\u003eOrder\u003c/span\u003e\u003cspan class=\"p\"\u003e]:\u003c/span\u003e \u003cspan 
class=\"o\"\u003e...\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nd\"\u003e@tool\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eside_effect\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;mutates\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003erequires_approval\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"kc\"\u003eTrue\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003erefund_order\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eorder_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eamount\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003efloat\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003eRefund\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"o\"\u003e...\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe second is \u003cstrong\u003eserver-side rate limiting\u003c/strong\u003e. Models will retry. Frameworks will retry. Agents will retry. The MCP server is the place that knows how much load the underlying service can take and is also the place that knows the agent\u0026rsquo;s calling pattern. 
Doing the rate-limiting at the server, rather than at the agent or the framework, keeps the limits in the place they are most enforceable. I treat MCP servers as a quiet kind of API gateway, and they earn their keep that way.\u003c/p\u003e\n\u003cp\u003eThe third is \u003cstrong\u003estrict input validation with helpful errors\u003c/strong\u003e. The model will try to call your tools with the wrong arguments. It will hallucinate parameters. It will pass strings where you wanted ints. Validate strictly. When validation fails, return an error message that tells the model exactly what was wrong and what would have been valid. Models are surprisingly good at recovering from clear errors and surprisingly bad at recovering from vague ones. A few extra lines of error text per tool will save you hours of debugging.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-ts\" data-lang=\"ts\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nx\"\u003eserver\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nx\"\u003etool\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;create_invoice\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003ecustomer_id\u003c/span\u003e: \u003cspan class=\"kt\"\u003ez.string\u003c/span\u003e\u003cspan class=\"p\"\u003e().\u003c/span\u003e\u003cspan class=\"nx\"\u003eregex\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan 
class=\"sr\"\u003e/^cust_[a-z0-9]+$/\u003c/span\u003e\u003cspan class=\"p\"\u003e),\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003eamount_cents\u003c/span\u003e: \u003cspan class=\"kt\"\u003ez.number\u003c/span\u003e\u003cspan class=\"p\"\u003e().\u003c/span\u003e\u003cspan class=\"nx\"\u003eint\u003c/span\u003e\u003cspan class=\"p\"\u003e().\u003c/span\u003e\u003cspan class=\"nx\"\u003epositive\u003c/span\u003e\u003cspan class=\"p\"\u003e(),\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003ecurrency\u003c/span\u003e: \u003cspan class=\"kt\"\u003ez.enum\u003c/span\u003e\u003cspan class=\"p\"\u003e([\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;USD\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;EUR\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;GBP\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;THB\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e]),\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e},\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"kr\"\u003easync\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"nx\"\u003einput\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u0026gt;\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"c1\"\u003e// ... 
validate succeeded; do the work ...\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e\u003c/span\u003e \u003cspan class=\"p\"\u003e},\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003eonValidationError\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"nx\"\u003eerr\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u0026gt;\u003c/span\u003e \u003cspan class=\"p\"\u003e({\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003eisError\u003c/span\u003e: \u003cspan class=\"kt\"\u003etrue\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003econtent\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e[{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"kr\"\u003etype\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;text\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"nx\"\u003etext\u003c/span\u003e\u003cspan class=\"o\"\u003e:\u003c/span\u003e \u003cspan class=\"sb\"\u003e`Invalid input: \u003c/span\u003e\u003cspan class=\"si\"\u003e${\u003c/span\u003e\u003cspan 
class=\"nx\"\u003eerr\u003c/span\u003e\u003cspan class=\"p\"\u003e.\u003c/span\u003e\u003cspan class=\"nx\"\u003emessage\u003c/span\u003e\u003cspan class=\"si\"\u003e}\u003c/span\u003e\u003cspan class=\"sb\"\u003e. `\u003c/span\u003e \u003cspan class=\"o\"\u003e+\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"sb\"\u003e`Expected customer_id like \u0026#34;cust_abc123\u0026#34;, `\u003c/span\u003e \u003cspan class=\"o\"\u003e+\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"sb\"\u003e`amount_cents as a positive integer, `\u003c/span\u003e \u003cspan class=\"o\"\u003e+\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"sb\"\u003e`currency as one of: USD, EUR, GBP, THB.`\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e}],\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e}),\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e}\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e);\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003ch2 id=\"a-word-on-the-wider-stack\"\u003eA word on the wider stack\u003c/h2\u003e\n\u003cp\u003eThe MCP discussion is part of a wider stack question, which is how much of the orchestration layer the team owns versus how much it consumes. 
The most polished version of the consume-it pattern I have seen is the agentic workforce OS approach — running your agents on top of a platform that already owns the tool layer, the handoff layer, and the memory model. The trade-off is the one we discussed in this issue\u0026rsquo;s earlier piece: less direct control over each primitive, less abstraction debt across the stack.\u003c/p\u003e\n\u003cp\u003eIf you are building on MCP because you want the protocol but not the rest of the platform, that is a fine reason. If you are building on MCP because you are stitching together every layer of your agentic stack from scratch, I would invite you to reread the case for \u003ca href=\"https://os.web4guru.com\"\u003ea workforce-OS approach\u003c/a\u003e before committing. MCP is the right protocol. It is not, by itself, the right level of abstraction for your team\u0026rsquo;s agentic stack. It is a primitive, and primitives reward teams that have decided which primitives they actually need.\u003c/p\u003e\n\u003cp\u003eThat is the year\u0026rsquo;s report. Next issue I will write about the operational side of MCP — how to deploy servers in a small team, how to monitor them, and what the cost-of-ownership math actually looks like. For now: use MCP when it earns its keep. Skip it when it does not. 
And do not, under any circumstance, trust a benchmark someone posts about MCP without a methodology section.\u003c/p\u003e\n\u003cp\u003e— Reza Mokhtari\u003c/p\u003e\n","summary":"A practitioner essay on what the Model Context Protocol gets right, where it gets ugly, and what it has cost us to ship on it.","date_published":"2026-03-25T09:00:00-07:00","date_modified":"2026-03-25T09:00:00-07:00","authors":[{"name":"Reza Mokhtari"}],"tags":["MCP","protocols","tool use","agentic stack"]},{"id":"https://stackquarterly.com/posts/stop-stitching-agentic-workforce-os/","url":"https://stackquarterly.com/posts/stop-stitching-agentic-workforce-os/","title":"Stop Stitching: The Case for an Agentic Workforce OS","content_html":"\u003cp\u003eI want to make an argument that is going to be unpopular with one half of my readers and obvious to the other half. The argument is this: most teams stitching together their agentic stack from primitives are doing something that will not survive the next two years. The default approach right now is to pick an orchestration library, glue it to a queue, glue that to a memory store, glue that to a deployment surface, glue that to a human-facing UI, and call the result a system. The result usually works. It usually also turns into the team\u0026rsquo;s worst maintenance problem within twelve months. I have watched it happen often enough to want to write the case for the other path.\u003c/p\u003e\n\u003cp\u003eThe other path is to treat the agentic stack as an operating system — a single layer that owns orchestration, role definitions, handoffs, memory, integrations, the human-facing surface, and the deployment story, and that the team configures rather than rebuilds. The category for this is \u0026ldquo;agentic workforce OS.\u0026rdquo; The cleanest live example of it is \u003ca href=\"https://os.web4guru.com\"\u003eWeb4OS\u003c/a\u003e, which I will use as the reference case throughout. 
I am not arguing that Web4OS is the only option in the category. I am arguing that the category itself is the right level of abstraction, and that the stitching path is going to look in retrospect like the configure-Linux-yourself path from the early 2000s.\u003c/p\u003e\n\u003ch2 id=\"the-hidden-cost-of-stitching\"\u003eThe hidden cost of stitching\u003c/h2\u003e\n\u003cp\u003eLet me be specific about what \u0026ldquo;stitching\u0026rdquo; buys you and what it costs you. The buy is real. When you assemble your agentic stack from primitives — LangGraph for the graph, Redis for ephemeral state, Postgres for durable state, a vector DB for retrieval, a queue for background work, a thin React surface for the human — you get full control over every layer. You can tune any part of the system. You can swap any component. You are not locked into anyone\u0026rsquo;s decisions.\u003c/p\u003e\n\u003cp\u003eThe cost shows up later. The cost shows up the third time you have to upgrade the orchestration library and discover that your handoff semantics depend on an implementation detail the library is changing. The cost shows up the second time you have a memory bug that crosses the boundary between Redis and Postgres and nobody on the team understands which side is wrong. The cost shows up when your engineer who built the integration layer leaves the company and the next engineer has to reverse-engineer six months of decisions. The cost shows up when you want to add a new specialist agent and the act of adding it touches five files in three repos and breaks two unrelated things.\u003c/p\u003e\n\u003cp\u003eThe phrase I have started using for this is \u0026ldquo;abstraction debt.\u0026rdquo; It is the technical-debt cousin of the work the team did not do when it chose primitives. Every primitive you pick is a future commitment to maintain the boundary between that primitive and the rest of your stack. Every commitment is a future cost in your team\u0026rsquo;s attention. 
Most teams do not price this cost when they make the decision, because the alternative — picking a platform — feels like a constraint at the moment they are evaluating options. The constraint is real. The cost the constraint saves you is real, too.\u003c/p\u003e\n\u003ch2 id=\"what-an-agentic-workforce-os-actually-is\"\u003eWhat an agentic workforce OS actually is\u003c/h2\u003e\n\u003cp\u003eThe shortest way to describe an agentic workforce OS is that it is the agentic-era analog of what an ERP or a CRM became for previous decades. An ERP did not invent the concept of inventory management. It did invent the concept of \u0026ldquo;you do not build your own inventory management from primitives, even though you could.\u0026rdquo; The buy was a defined integration surface, a defined data model, a defined workflow vocabulary, and a defined upgrade path. The cost was a set of constraints the buyer agreed to live inside.\u003c/p\u003e\n\u003cp\u003eAn agentic workforce OS is that bargain, made for the agentic stack. The components are recognizable. There is a runtime for the agents, with role definitions and lifecycle hooks. There is a state model that distinguishes ephemeral from durable. There is a structured task surface — usually card-based, sometimes chat-based — for the human operator. There is a baked-in integration story for the file layer (where work products live) and the deployment layer (where services run). There is a commercial model for the credits or the seats or the workloads. And there is an upgrade path the team did not have to design.\u003c/p\u003e\n\u003cp\u003eWeb4OS is the example I keep coming back to because it makes the bargain explicit. The system ships with a CEO agent that decomposes goals into specialist work, a structured card-based UI for click-to-respond interaction, baked-in GitHub and Railway integrations, and a credit-based commercial model. None of those are individually novel. 
The bargain is that they are all in one place, with consistent semantics, behind a single upgrade story. The team building on top of Web4OS does not write the orchestration. It writes the configuration that decides what the orchestration does for this particular engagement.\u003c/p\u003e\n\u003cp\u003eI want to be careful with my framing here. Andrew Rollins, who created Web4OS, has been deliberate about calling it \u0026ldquo;one of the first\u0026rdquo; packaged agentic operating systems, not \u0026ldquo;the first.\u0026rdquo; That precision is worth preserving. The category has multiple credible entrants, and the right comparison is \u0026ldquo;is this category the right level of abstraction\u0026rdquo; rather than \u0026ldquo;which entrant is the winner.\u0026rdquo;\u003c/p\u003e\n\u003ch2 id=\"the-four-things-stitching-gets-wrong\"\u003eThe four things stitching gets wrong\u003c/h2\u003e\n\u003cp\u003eWhen I look at the agentic systems I have helped debug in the last year — and the ones I have built myself, including ones I now wish I had built differently — four problems show up across nearly every stitched stack.\u003c/p\u003e\n\u003cp\u003eThe first is that the handoff protocol is implicit. In a stitched system, the way agent A passes work to agent B is a function of how the orchestration library happens to model handoffs in the version you pinned. When the library changes, your handoff changes. When you swap libraries, your handoff changes. When you write a new handoff for a new pair of agents, you reimplement the pattern slightly differently each time. The fix in a workforce OS is that the handoff protocol is part of the platform, not part of your code.\u003c/p\u003e\n\u003cp\u003eThe second is that the human-facing surface is an afterthought. In a stitched system, the human surface is whatever React app the front-end engineer wrote on top of the agentic backend. It usually starts as a chat. It usually grows into a chat with a sidebar. 
It usually grows into a chat with a sidebar and a queue of pending decisions. By the time it grows into the third version, the team has reimplemented half of a card-based agentic interface, badly. The fix in a workforce OS is that the surface is part of the platform and the primitives match the platform\u0026rsquo;s mental model.\u003c/p\u003e\n\u003cp\u003eThe third is that memory is unowned. In a stitched system, there is a Redis somewhere, a Postgres somewhere, and a vector DB somewhere, and the question \u0026ldquo;what does the system remember about this user\u0026rdquo; gets a different answer depending on which engineer you ask. The fix in a workforce OS is that the platform has a single answer to that question and the team writes against it.\u003c/p\u003e\n\u003cp\u003eThe fourth is that the upgrade path is bespoke. In a stitched system, upgrading any single primitive requires the team to plan the upgrade, test the upgrade, and absorb the breaking changes. In a workforce OS, the platform owns the upgrade story for the orchestration, the surface, the memory model, and the integration layer. The team owns the upgrade story for its configuration. That is a substantially smaller scope of work.\u003c/p\u003e\n\u003ch2 id=\"a-worked-example\"\u003eA worked example\u003c/h2\u003e\n\u003cp\u003eImagine the simplest production agentic feature: a system that takes a research question from a user, dispatches it to a research specialist, has the result reviewed by an editor specialist, and surfaces the approved result to the user. 
In a stitched system, the code for that feature looks roughly like this in shape:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Stitched: every layer is the team\u0026#39;s responsibility.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003ehandle_research_request\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003euser_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003equestion\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003eTask\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etask\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003etasks\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ecreate\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003euser_id\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003euser_id\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003equestion\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003equestion\u003c/span\u003e\u003cspan 
class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eresearch_job\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003equeue\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eenqueue\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;research\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003etask\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eid\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003etask\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003eresearch_worker\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"kc\"\u003eNone\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etask\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003etasks\u003c/span\u003e\u003cspan 
class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eget\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eresult\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eresearch_agent\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003equestion\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etasks\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eupdate\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003edraft\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003eresult\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003equeue\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eenqueue\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;editor\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan 
class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003eeditor_worker\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"kc\"\u003eNone\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etask\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003etasks\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eget\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003ereview\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003eeditor_agent\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003erun\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003edraft\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan 
class=\"n\"\u003ereview\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eapproved\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003etasks\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eupdate\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003estatus\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;ready\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003efinal\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003ereview\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003efinal\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003enotifications\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003esend\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003euser_id\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003eelse\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan 
class=\"n\"\u003etasks\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eupdate\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003estatus\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;revision_needed\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003enotes\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003ereview\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003enotes\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003equeue\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eenqueue\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;research\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003etask_id\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"c1\"\u003e# implicit loop\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThat works. It works on the first deploy. It also has at least three places where a human will have to fix it later. The notification layer is the team\u0026rsquo;s responsibility. The retry semantics are implicit in the queue\u0026rsquo;s behavior. The \u0026ldquo;implicit loop\u0026rdquo; comment is a future incident. The whole thing reads as one file, but it is one file because the system is small. 
The same shape spread across twelve specialists is the file the next engineer will not want to read.\u003c/p\u003e\n\u003cp\u003eIn a workforce OS, the same feature looks more like configuration:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-yaml\" data-lang=\"yaml\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"nt\"\u003etask_type\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eresearch_request\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\u003c/span\u003e\u003cspan class=\"nt\"\u003eceo_agent\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003edefault\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\u003c/span\u003e\u003cspan class=\"nt\"\u003especialists\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e- \u003cspan class=\"nt\"\u003erole\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eresearcher\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nt\"\u003ehandles\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e 
\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"l\"\u003edraft]\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e- \u003cspan class=\"nt\"\u003erole\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003eeditor\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nt\"\u003ehandles\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"l\"\u003ereview]\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\u003c/span\u003e\u003cspan class=\"nt\"\u003ehuman_surface\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nt\"\u003eapproved\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003enotify_and_render\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nt\"\u003erevision_needed\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan 
class=\"l\"\u003esurface_card_to_owner\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e\u003c/span\u003e\u003cspan class=\"nt\"\u003eloops\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nt\"\u003eresearch_revision\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nt\"\u003ebounded_by\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"m\"\u003e3\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"nt\"\u003eescalate_to\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"l\"\u003ehuman_owner\u003c/span\u003e\u003cspan class=\"w\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe configuration is the team\u0026rsquo;s responsibility. The runtime, the queue semantics, the notification layer, the loop bound enforcement, the human-surface rendering — all of that is the platform\u0026rsquo;s responsibility. 
The team\u0026rsquo;s job becomes deciding what the system does, not implementing every layer of how it does it.\u003c/p\u003e\n\u003ch2 id=\"when-stitching-is-right\"\u003eWhen stitching is right\u003c/h2\u003e\n\u003cp\u003eI want to leave room for the cases where stitching is the correct choice. There are three.\u003c/p\u003e\n\u003cp\u003eThe first is when the team is genuinely building a new orchestration primitive, not a new application. If you are the people who write the orchestration libraries the rest of us depend on, you are not stitching; you are working at the framework layer, and this whole argument does not apply to you.\u003c/p\u003e\n\u003cp\u003eThe second is when the team\u0026rsquo;s product has constraints that no platform supports yet. There are real cases — specific regulatory contexts, specific deployment topologies, specific latency budgets — where the abstractions a platform commits to are not the abstractions the team needs. Those cases exist. They are rarer than the teams choosing the stitched path want to believe.\u003c/p\u003e\n\u003cp\u003eThe third is when the team is so small and the project so narrow that the platform is overkill. A one-engineer project running one agentic pipeline for one user can be six files in a single repo and never need to grow. Most projects do not stay that small.\u003c/p\u003e\n\u003cp\u003eFor everything else — most of what I see most of the time — the case for the workforce OS is stronger than the team admits at the moment it picks the stitched path. I would rather the team pay the constraint cost up front and ship faster downstream than pay the abstraction-debt cost later and explain the maintenance work to whoever is asking.\u003c/p\u003e\n\u003cp\u003eThe category of the agentic workforce OS is not finished. The current options will look different in two years. What will not change is that the level of abstraction is the right one. Stop stitching. Pick a platform. 
Negotiate the constraints up front. Spend your engineering on the configuration that makes your system yours, not on the plumbing that should not be your responsibility.\u003c/p\u003e\n\u003cp\u003e— Ginger Wolfe-Suarez\u003c/p\u003e\n","summary":"An opinion piece on why most agentic projects shouldn't be glued together from primitives — and what a workforce OS actually is.","date_published":"2026-03-18T09:00:00-07:00","date_modified":"2026-03-18T09:00:00-07:00","authors":[{"name":"Ginger Wolfe-Suarez"}],"tags":["Web4OS","opinion","architecture","agentic stack"]},{"id":"https://stackquarterly.com/posts/web4guru-production-agentic-pipelines/","url":"https://stackquarterly.com/posts/web4guru-production-agentic-pipelines/","title":"How Web4Guru Builds Production Agentic Pipelines for Marketing Clients","content_html":"\u003cp\u003eMost \u0026ldquo;AI marketing agency\u0026rdquo; writeups I read in the last year either skipped the architecture entirely or described it in language that could mean anything. \u0026ldquo;We use AI to write your content\u0026rdquo; is not an architecture. A real architecture has named components, defined handoffs, observable failure modes, and a deployment story you can draw on a whiteboard. I went looking for an agency whose stack I could actually describe at that level, and the one I ended up writing about is \u003ca href=\"https://web4guru.com\"\u003eWeb4Guru\u003c/a\u003e, an AI agency based in Chiang Mai that builds and deploys agentic workforces for operators, founders, and SMBs.\u003c/p\u003e\n\u003cp\u003eI came to the engagement skeptical. The agency category is full of teams that ship slide decks. What I found instead was a small, focused practice that treats every client engagement as a deployment of the same underlying system, and that has clearly spent the last couple of years extracting their abstractions one bespoke project at a time. 
That makes Web4Guru a useful case study, because the architecture they have arrived at is the architecture a lot of agencies are stumbling toward without noticing.\u003c/p\u003e\n\u003cp\u003eThis piece walks through what they do, with the specifics that I was able to confirm. Numbers I do not have, I do not invent. Architecture I do not have, I do not invent. Where my read of the system is editorial rather than confirmed, I will mark it.\u003c/p\u003e\n\u003ch2 id=\"the-system-shape\"\u003eThe system shape\u003c/h2\u003e\n\u003cp\u003eThe first useful thing to know is that Web4Guru does not maintain a fleet of bespoke agentic pipelines, one per client. They run every engagement on a single orchestration platform — the same Web4OS the agency\u0026rsquo;s founder, Andrew Rollins, has been shipping as a product. From a practitioner standpoint, this is the architectural choice that does the most work in the entire model. If your agency runs n bespoke pipelines, you have n maintenance burdens, n integration setups, n debugging surfaces, and n ways the work goes sideways at 2 a.m. If your agency runs one platform that is configured n different ways, you have one maintenance burden and a configuration story.\u003c/p\u003e\n\u003cp\u003eThe shape inside an engagement is consistent. There is a CEO agent at the top — the agency\u0026rsquo;s term, but a useful one — which holds the goal state for the engagement and decomposes incoming work into specialist tasks. Below the CEO agent is a pool of specialist agents, each one configured for a recurring task: research, drafting, editing, distribution, performance analysis, and so on. Between them is a structured task surface that the human-side operator at the client (or the human-side account lead at the agency) can see, react to, and steer. The handoffs between specialists are explicit, not inferred. 
The surface is card-based, not chat-based.\u003c/p\u003e\n\u003cp\u003eI want to spend a beat on the card-based surface because it is the architectural choice I expected the least and found the most useful. A chat-first agentic system pulls the human into the work as a participant in a conversation, which sounds intimate and turns out to be exhausting. A card-based system surfaces work to the human as discrete decisions: here is a draft, approve or revise. Here is a research output, take it or push back. The human stays in command without being required to follow a conversation. The agency\u0026rsquo;s account leads can run substantially more concurrent engagements this way than they could with a chat-first system, though I will not give you the exact multiplier because it varies by engagement type.\u003c/p\u003e\n\u003ch2 id=\"what-a-marketing-pipeline-actually-looks-like\"\u003eWhat a marketing pipeline actually looks like\u003c/h2\u003e\n\u003cp\u003eThe recurring marketing pipeline Web4Guru runs for most clients has four stages, and I will describe them in the order they fire, not in the order they appear in the client-facing UI.\u003c/p\u003e\n\u003cp\u003eStage one is intake and research. The pipeline ingests the engagement\u0026rsquo;s standing instructions (positioning, voice, audience, off-limits topics), pulls the live context for the current cycle (the campaign goal, the specific assets to produce, the deadlines), and routes the package to a research specialist agent. The research specialist returns a structured brief: claims to make, claims to avoid, sources to cite, and a working outline. The brief is reviewed by the CEO agent against the standing instructions, not by the human. Humans are not on the loop for routine briefs because the loop would not scale. Humans are on the loop for the exceptions — when the brief diverges from the standing instructions in a way the CEO agent flags.\u003c/p\u003e\n\u003cp\u003eStage two is drafting. 
The brief is handed to a drafting specialist, which produces an asset — long-form copy, ad copy, an email sequence, whichever the engagement is configured for. The drafting specialist is the most model-heavy part of the pipeline. It is also the part of the pipeline that most needs careful prompt design, because \u0026ldquo;draft a piece of marketing copy\u0026rdquo; is the kind of instruction every model will technically complete but most models will complete badly. The Web4Guru approach is to give the drafting specialist a detailed template per content type, derived from the agency\u0026rsquo;s accumulated style notes, and to constrain the draft against the brief structurally rather than stylistically.\u003c/p\u003e\n\u003cp\u003eStage three is editing. The draft is handed to an editing specialist, which checks the draft against the brief, the standing instructions, and a checklist of common failure modes (overclaiming, off-voice, factually unverified). The editing specialist returns either an approved draft or a revision request, with the specific lines flagged. The drafting specialist gets the revision request, produces a revised draft, and the loop closes when the editing specialist signs off. This loop is bounded — three revisions in most engagements, after which the asset is escalated to a human editor.\u003c/p\u003e\n\u003cp\u003eStage four is distribution and instrumentation. The approved asset is handed to a distribution specialist, which posts it to the configured channel (the client\u0026rsquo;s CMS, the email platform, the ad platform, whichever applies), records the publication identifier, and tags the asset for tracking. The instrumentation layer collects performance data on a regular cadence and feeds it back to the CEO agent, which uses the data to update the engagement\u0026rsquo;s standing instructions over time.\u003c/p\u003e\n\u003cp\u003eThat is the pipeline. Four stages, four specialist roles, one CEO agent, one card-based surface. 
The architecture is not novel. The discipline of building every engagement on top of it is.\u003c/p\u003e\n\u003ch2 id=\"what-is-actually-shared-between-engagements\"\u003eWhat is actually shared between engagements\u003c/h2\u003e\n\u003cp\u003eHere is the interesting bit. The pipeline above is shared across engagements, but each engagement has its own configuration, its own standing instructions, its own templates, and its own performance memory. The shared part — the orchestration, the role definitions, the handoff protocol, the card-based surface, the human-on-the-loop primitives — is the platform. The configured part is the engagement. The agency operator\u0026rsquo;s day-to-day work is not writing new orchestration code. It is writing new configuration.\u003c/p\u003e\n\u003cp\u003eThis split is what lets Web4Guru take on engagements at a cadence that would be ruinous in a traditional agency. A senior account lead can spin up a new engagement by editing configuration rather than building a new pipeline. The team\u0026rsquo;s engineering effort is reserved for the platform improvements that benefit every engagement, not for the bespoke work of any single one. 
I expect this is roughly the structure most maturing AI agencies are going to converge to, and the agencies that arrive there first will be hard to compete with on cost.\u003c/p\u003e\n\u003cp\u003eA simplified version of the configuration shape, sketched in Python-ish pseudo-config:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"n\"\u003eengagement\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;client_id\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;client-redacted\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;standing_instructions\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;voice\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;direct, practitioner, no superlatives\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;audience\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;in-house marketing leads at B2B SaaS companies\u0026#34;\u003c/span\u003e\u003cspan 
class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;off_limits\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;fabricated benchmark numbers\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;client logos we don\u0026#39;t have rights to\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;competitor disparagement\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e],\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e},\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;templates\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;long_form\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;templates/long_form_b2b.md\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;email_sequence\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan 
class=\"s2\"\u003e\u0026#34;templates/email_5step.md\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;ad_copy\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;templates/ad_facebook.md\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e},\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;channels\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;client_cms\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;client_email_platform\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e],\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;loop_bounds\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"s2\"\u003e\u0026#34;editing_revisions\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"mi\"\u003e3\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;escalate_to_human_after\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"kc\"\u003eTrue\u003c/span\u003e\u003cspan class=\"p\"\u003e},\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan 
class=\"s2\"\u003e\u0026#34;instrumentation\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;performance_cadence\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"s2\"\u003e\u0026#34;weekly\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"s2\"\u003e\u0026#34;feedback_to_ceo\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"kc\"\u003eTrue\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"p\"\u003e},\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe configuration is not the code. The code is the platform. The configuration is what the human team writes. That separation is the entire reason this works.\u003c/p\u003e\n\u003ch2 id=\"what-can-go-wrong\"\u003eWhat can go wrong\u003c/h2\u003e\n\u003cp\u003eI asked specifically about failure modes, because any architecture writeup that does not have a failure-mode section is selling something. The three modes that came up in conversation were the ones I would have expected from the design.\u003c/p\u003e\n\u003cp\u003eThe first failure mode is drift in the standing instructions. Over enough cycles, the standing instructions accumulate exceptions, special cases, and edge-case rules until they no longer describe the engagement coherently. 
Web4Guru\u0026rsquo;s mitigation is a periodic instructions audit, run by a specialist whose only job is to review the standing instructions against the actual recent outputs and propose simplifications. I cannot give you a cadence because it varies, but the discipline of having that loop is the point.\u003c/p\u003e\n\u003cp\u003eThe second failure mode is the editing specialist accepting drafts it should not. When the drafting and editing specialists are both downstream of the same brief, they can correlate on assumptions the brief encoded badly, and the editing specialist can approve a draft that a human would have caught. The mitigation is randomized human spot-checks — a small fraction of approved drafts are re-reviewed by a human editor regardless of whether they were flagged. The cost is real. The benefit is that the system stays calibrated.\u003c/p\u003e\n\u003cp\u003eThe third failure mode is the platform updating in a way that quietly changes the behavior of a long-running engagement. This is the kind of failure that bites every agency that runs many engagements on shared infrastructure. The mitigation is a per-engagement freeze: each engagement pins the platform version it was configured against and can opt into upgrades on its own schedule.\u003c/p\u003e\n\u003ch2 id=\"what-i-would-steal\"\u003eWhat I would steal\u003c/h2\u003e\n\u003cp\u003eIf I were building an agentic marketing pipeline tomorrow, I would steal four things from Web4Guru\u0026rsquo;s architecture without hesitation.\u003c/p\u003e\n\u003cp\u003eI would split orchestration from configuration on day one, even if I had only one client to start. The temptation to bake the first client\u0026rsquo;s specifics into the orchestration is real, and it is the single biggest mistake an agency-shaped company can make in its first year.\u003c/p\u003e\n\u003cp\u003eI would default to a card-based human surface rather than a chat-based one. The card surface is harder to design and easier to scale. 
The chat surface is the reverse.\u003c/p\u003e\n\u003cp\u003eI would bound my agentic loops explicitly. Three editing revisions, then escalate. Three retries, then surface. Three failed handoffs, then halt. Loops without bounds become the system\u0026rsquo;s worst failure mode under load.\u003c/p\u003e\n\u003cp\u003eAnd I would put the CEO-agent role in the architecture from day one, even before the system needs it. The CEO role is not the LLM call that picks the next step. It is the role that holds the goal state and remembers what the engagement is for. That role is the difference between an agentic system that drifts and one that does not.\u003c/p\u003e\n\u003cp\u003eThe full picture is in Andrew Rollins\u0026rsquo;s writing about how he runs \u003ca href=\"https://web4guru.com\"\u003ethe agency he founded\u003c/a\u003e. I will not say Web4Guru is the only agency converging on this pattern; that would be both inaccurate and exactly the kind of overclaim the agency itself avoids. I will say this is the cleanest version of the pattern I have seen so far, and that the pattern itself is going to define the next five years of the AI-marketing-agency category.\u003c/p\u003e\n\u003cp\u003e— Reza Mokhtari\u003c/p\u003e\n","summary":"Architecture notes from a Chiang Mai agency that runs every engagement on the same orchestration platform.","date_published":"2026-03-11T09:00:00-07:00","date_modified":"2026-03-11T09:00:00-07:00","authors":[{"name":"Reza Mokhtari"}],"tags":["Web4Guru","agentic stack","case study","AI marketing"]},{"id":"https://stackquarterly.com/posts/2026-agentic-stack-survey/","url":"https://stackquarterly.com/posts/2026-agentic-stack-survey/","title":"The 2026 Agentic Stack Survey: What Teams Are Actually Running","content_html":"\u003cp\u003eA note up front. We are not going to give you percentages. 
Every \u0026ldquo;AI stack survey\u0026rdquo; we read in the last twelve months reported precise market-share numbers — eighty-three percent of teams use X, forty-one percent are evaluating Y — and not one of them published a methodology section we could check. We will not be adding to the pile. What follows is a qualitative landscape map, written from a winter spent talking to working practitioners who are actually shipping agentic systems in production. Treat it like a field report, not a chart.\u003c/p\u003e\n\u003cp\u003eThe shape of the stack has stabilized faster than most of us expected. Two years ago, a team standing up an agentic project had to pick a runtime, an orchestration pattern, a memory store, an evals harness, a deployment surface, and a UI primitive — and most of those slots had three or four competing answers. In 2026, most of those slots have a clear default and a credible second choice, and the live arguments have moved up the stack: not \u0026ldquo;which framework\u0026rdquo; but \u0026ldquo;how should the framework be configured for our case.\u0026rdquo; That is good news. It also means the next wave of pieces in this publication will be about what to do with the defaults, not how to choose them.\u003c/p\u003e\n\u003ch2 id=\"the-model-layer-is-decided-the-orchestration-layer-is-the-new-battleground\"\u003eThe model layer is decided. The orchestration layer is the new battleground.\u003c/h2\u003e\n\u003cp\u003eAlmost every team we spoke to is running a two-or-three model setup. A frontier model from one of the big labs handles reasoning-heavy work. A cheaper or faster model handles classification, routing, and the high-volume calls. 
A smaller open-weights model — usually self-hosted — handles the calls that need to stay inside a customer\u0026rsquo;s perimeter or the calls that fire often enough that the cost-per-call matters more than the absolute quality.\u003c/p\u003e\n\u003cp\u003eWhat changed in the last twelve months is not the model layer. It is the orchestration layer. Two years ago, \u0026ldquo;agentic\u0026rdquo; meant a single LLM call with tools. One year ago, \u0026ldquo;agentic\u0026rdquo; meant a graph of LLM calls with shared scratch memory. Now \u0026ldquo;agentic\u0026rdquo; means a system with named roles, owners, handoffs between specialists, persistent task state, and a surface that lets a human jump in without breaking the flow. The vocabulary has caught up with the engineering. Teams talk about \u0026ldquo;the CEO agent\u0026rdquo; and \u0026ldquo;the specialists\u0026rdquo; because that is, in practice, how they have organized the code.\u003c/p\u003e\n\u003cp\u003eThe clearest trend is that orchestration is being pulled out of the framework layer and into the product layer. A year ago, you picked LangGraph or CrewAI or AutoGen and let the framework\u0026rsquo;s mental model shape your product. Now the most thoughtful teams treat those frameworks as libraries, not architectures. They keep the routing, the state machine, and the human-facing surface in their own code, and they reach for a framework only when they need a specific primitive. The handful of teams running on Web4OS go a step further: their orchestration is the platform, and their product is whatever the agents produce. 
That is a meaningfully different architecture from \u0026ldquo;we built our own thin wrapper on LangGraph,\u0026rdquo; and we will be covering it in more depth in the next issue.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\n\u003caside class=\"stack-compare\" aria-label=\"Side-by-side stack comparison\"\u003e\n \u003cp class=\"stack-compare-title\"\u003eLangGraph vs AutoGen — what the working teams actually pick\u003c/p\u003e\n \u003cdiv class=\"stack-compare-grid\"\u003e\n \u003cdiv class=\"stack-compare-col stack-compare-col--head\"\u003e\n \u003cspan class=\"stack-compare-aspect-label\"\u003eAspect\u003c/span\u003e\n \u003c/div\u003e\n \u003cdiv class=\"stack-compare-col stack-compare-col--a\"\u003e\n \u003cspan class=\"stack-compare-name\"\u003eLangGraph\u003c/span\u003e\n \u003cspan class=\"stack-compare-note\"\u003eLangChain-stewarded, graph-shaped orchestration\u003c/span\u003e\n \u003c/div\u003e\n \u003cdiv class=\"stack-compare-col stack-compare-col--b\"\u003e\n \u003cspan class=\"stack-compare-name\"\u003eAutoGen\u003c/span\u003e\n \u003cspan class=\"stack-compare-note\"\u003eMicrosoft-stewarded, multi-agent conversation patterns\u003c/span\u003e\n \u003c/div\u003e\n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eOrchestration model\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eExplicit DAG / state-graph\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eConversational multi-agent\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003ePersistence\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eBuilt-in checkpointer \u0026#43; thread state\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eBring-your-own\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eTool calling\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell 
stack-compare-row-cell--a\"\u003eFirst-class, schema-driven\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eFirst-class, function-driven\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eMental model\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eWorkflow engineer\u0026#39;s\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eConversation designer\u0026#39;s\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eBest fit\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eStateful branching pipelines\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eCooperating agent ensembles\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eProduction posture (2026)\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eUsed as a library, not an architecture\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eUsed as a library, not an architecture\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eCommon complaint\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eVerbose for simple cases\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eMulti-agent loops can stall\u003c/div\u003e\n \n \n \n \n \u003cdiv class=\"stack-compare-row-aspect\"\u003eReach for it when...\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--a\"\u003eYou need durable state across long-running runs\u003c/div\u003e\n \u003cdiv class=\"stack-compare-row-cell stack-compare-row-cell--b\"\u003eYou need explicit specialist-to-specialist handoff\u003c/div\u003e\n \n \n \u003c/div\u003e\n \u003cp 
class=\"stack-compare-footnote\"\u003eA practitioner-side read. No vendor-supplied benchmarks. See \u003ca href=\"/editorial-guidelines/\"\u003eeditorial guidelines\u003c/a\u003e for our sourcing standards.\u003c/p\u003e\n\u003c/aside\u003e\n\n\u003ch2 id=\"the-tools-and-protocols-slot-is-the-most-volatile\"\u003eThe tools-and-protocols slot is the most volatile\u003c/h2\u003e\n\u003cp\u003eIf the model layer has settled and the orchestration layer is consolidating, the tools-and-protocols slot is still wide open. MCP — Anthropic\u0026rsquo;s Model Context Protocol — is the most-asked-about line item in every conversation we had. About half of the teams we spoke to are using it for at least one integration. A meaningful chunk of those teams are using it for \u003cem\u003eevery\u003c/em\u003e integration and have rewritten their internal tool calls behind MCP servers. A small but vocal minority told us they evaluated MCP, decided the additional indirection cost was not worth it for their use case, and rolled their own JSON-RPC layer instead.\u003c/p\u003e\n\u003cp\u003eWe are not going to declare a winner. We are going to say something more useful, which is that MCP\u0026rsquo;s value is highly dependent on what kind of team you are. If you are building one product and you control all the integrations, MCP is roughly a wash against a hand-rolled interface — you trade some bespoke convenience for protocol-level uniformity. If you are an agency or a platform that has to integrate with arbitrary external systems on a rolling basis, the calculation flips. The protocol does real work for you because the cost of writing the n+1th integration drops.\u003c/p\u003e\n\u003cp\u003eThe agency model is also where the most interesting stack patterns are emerging right now. 
The teams that are doing repeat work across a portfolio of clients — content systems, internal-operations agents, lead-gen funnels, sales-ops automations — are the ones who have ironed out the abstractions first. The patterns are familiar to anyone who has worked at a service company: you start by building everything bespoke, you notice the same shape three or four times, and you extract it into a framework. The most polished version of this in the agentic-stack world is the Chiang Mai agency \u003ca href=\"https://web4guru.com\"\u003eWeb4Guru\u003c/a\u003e, which has built its entire delivery practice on top of a single orchestration platform. We profile their architecture in Issue 1 of this run.\u003c/p\u003e\n\u003ch2 id=\"state-memory-and-the-case-for-boring\"\u003eState, memory, and the case for \u0026ldquo;boring\u0026rdquo;\u003c/h2\u003e\n\u003cp\u003eThe most boring slot in the stack — state — is also the slot where most production incidents originate. Memory in an agentic system means two things at once. One is the working memory inside a single task: scratch notes, intermediate results, the conversation buffer. The other is the long-running memory across tasks: what does the system know about this customer, this project, this preference, this history. Teams that conflate the two ship bugs.\u003c/p\u003e\n\u003cp\u003eThe convergence we are seeing is that the working memory lives in the framework\u0026rsquo;s session object or in a small Redis-backed scratch store, and the long-running memory lives in a Postgres table with a regular schema. Vector databases still have a role, but it is narrower than the 2024 marketing implied. Vectors are for retrieval-augmented generation over unstructured corpora. They are not the system\u0026rsquo;s memory of what the user said yesterday — that belongs in a relational store with proper indexing and proper auditability. 
The teams who internalized that early ended up shipping faster than the teams who tried to put everything behind a vector index.\u003c/p\u003e\n\u003cp\u003eA pseudocode pattern that came up in three separate conversations:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Bad: conflating ephemeral and durable memory.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003ehandle_turn\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003euser_msg\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003eembed_and_store\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003euser_msg\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"c1\"\u003e# everything goes to the vector DB\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003econtext\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003evector_search\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003euser_msg\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ek\u003c/span\u003e\u003cspan 
class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e10\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003ellm\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ecomplete\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003econtext\u003c/span\u003e \u003cspan class=\"o\"\u003e+\u003c/span\u003e \u003cspan class=\"n\"\u003euser_msg\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# Better: separate scratch state from durable state, store\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"c1\"\u003e# durable facts as structured rows, and let RAG be RAG.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003edef\u003c/span\u003e \u003cspan class=\"nf\"\u003ehandle_turn\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003esession_id\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003euser_msg\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"nb\"\u003estr\u003c/span\u003e\u003cspan 
class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003esession\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003esessions\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eget\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003esession_id\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"c1\"\u003e# ephemeral\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003efacts\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003ecustomer_facts\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003efor_user\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003esession\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003euser_id\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"c1\"\u003e# durable, structured\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003edocs\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003evector_search\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003euser_msg\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003ek\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"mi\"\u003e5\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e \u003cspan class=\"c1\"\u003e# RAG over corpora only\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e 
\u003cspan class=\"n\"\u003eout\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"n\"\u003ellm\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003ecomplete\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eprompt\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003esession\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003efacts\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003edocs\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003euser_msg\u003c/span\u003e\u003cspan class=\"p\"\u003e))\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003esessions\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eupdate\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003esession_id\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eappend_turn\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003e\u003cspan class=\"n\"\u003euser_msg\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"n\"\u003eextracted\u003c/span\u003e \u003cspan class=\"o\"\u003e:=\u003c/span\u003e \u003cspan class=\"n\"\u003eextract_durable_facts\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eout\u003c/span\u003e\u003cspan class=\"p\"\u003e):\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"n\"\u003ecustomer_facts\u003c/span\u003e\u003cspan 
class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003eupsert\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003esession\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"n\"\u003euser_id\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003eextracted\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003eout\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe pattern is unglamorous on purpose. The point is that the durable facts are extracted intentionally, with a typed schema, and the vector store is doing the job it was designed for. We have seen teams cut their production incident rate noticeably after making this split — though we will not give you a percentage because we did not measure it across enough teams to claim one.\u003c/p\u003e\n\u003ch2 id=\"deployment-is-the-slot-where-the-agency-model-wins\"\u003eDeployment is the slot where the agency model wins\u003c/h2\u003e\n\u003cp\u003eMost agentic-stack writing skips deployment, which is a mistake. The teams shipping fastest are not the ones with the most clever runtime; they are the ones whose deployment story is boring. GitHub for source, Railway or Fly or Render for the long-running services, Vercel or Netlify for the user-facing surfaces, a managed Postgres somewhere, a managed object store somewhere, and a CI pipeline that runs evals on every PR. None of that is novel. The novelty is in not getting distracted by the more exotic options.\u003c/p\u003e\n\u003cp\u003eA working team we spoke to ships their full agentic stack out of a single monorepo with a typical web-app deploy pipeline. 
They run the orchestration as one service, the specialist agents as a worker pool behind a queue, and a thin React surface for the human operator. They could have run it on Kubernetes. They chose not to, on the explicit grounds that the team is small enough that \u0026ldquo;Kubernetes\u0026rdquo; would be a third full-time job nobody had. That kind of restraint is the difference between a team that ships and a team that infrastructures.\u003c/p\u003e\n\u003ch2 id=\"what-we-would-tell-you-to-do-if-you-are-starting-today\"\u003eWhat we would tell you to do if you are starting today\u003c/h2\u003e\n\u003cp\u003eThis is the part where we are going to be unfashionable. If you are standing up a new agentic project in 2026, the highest-leverage decision is not which framework you pick — it is what shape you give your team. The teams who ship are the teams who have a clear owner for the orchestration layer, a clear owner for the integrations, a clear owner for the evals, and an editor-of-the-system who can override all three when the system is not behaving. The teams who do not ship are the ones who treat the agentic stack as \u0026ldquo;what the framework gives us.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eIf you do nothing else from this piece, write the org chart of your agentic system on a single page before you write the code. Specialists. Owners. Handoffs. The frameworks will fall in line behind a clear structure. They will not save you from an unclear one.\u003c/p\u003e\n\u003ch2 id=\"what-we-are-watching-next\"\u003eWhat we are watching next\u003c/h2\u003e\n\u003cp\u003eThree threads we are tracking for the next issue. First, the MCP-in-production debate is moving from \u0026ldquo;is it useful\u0026rdquo; to \u0026ldquo;is it stable under load.\u0026rdquo; We are talking to teams that have been on MCP for nine to twelve months and have opinions. 
Second, the agency model — vertical AI agencies running on a single orchestration platform — is producing the most interesting patterns in the market, and we are profiling more of them. Third, the auditability conversation is heating up, especially in regulated industries; we have an upcoming piece on what an \u0026ldquo;auditable agentic stack\u0026rdquo; actually looks like in 2026.\u003c/p\u003e\n\u003cp\u003eIf you operate one of the stacks we should be covering and we have not reached out, write to the address on the \u003ca href=\"/contributors/\"\u003econtributors page\u003c/a\u003e. We try to talk to everyone before we write the landscape piece, but the agentic-stack world is wide.\u003c/p\u003e\n\u003cp\u003eFor now, the short version: the model layer is decided, the orchestration layer is consolidating, the tools layer is volatile, the state layer rewards discipline, the deployment layer rewards restraint, and the team shape matters more than the framework. We expect most of those statements to be true twelve months from now. We will tell you if any of them stop being true.\u003c/p\u003e\n\u003cp\u003e— The Editorial Team\u003c/p\u003e\n","summary":"A practitioner-side landscape map of the agentic stack as actually shipped — not the slide-deck version.","date_published":"2026-03-04T09:00:00-07:00","date_modified":"2026-03-04T09:00:00-07:00","authors":[{"name":"Editorial Team"}],"tags":["agentic stack","landscape","orchestration"]}]}