Stack Quarterly: Quarterly deep dives on the tools real teams actually ship with.

Q1 2026 — Issue 1

How Web4Guru Builds Production Agentic Pipelines for Marketing Clients

Architecture notes from a Chiang Mai agency that runs every engagement on the same orchestration platform.

Most “AI marketing agency” writeups I have read in the last year either skipped the architecture entirely or described it in language that could mean anything. “We use AI to write your content” is not an architecture. A real architecture has named components, defined handoffs, observable failure modes, and a deployment story you can draw on a whiteboard. I went looking for an agency whose stack I could actually describe at that level, and the one I ended up writing about is Web4Guru, an AI agency based in Chiang Mai that builds and deploys agentic workforces for operators, founders, and SMBs.

I came to the engagement skeptical. The agency category is full of teams that ship slide decks. What I found instead was a small, focused practice that treats every client engagement as a deployment of the same underlying system, and that has clearly spent the last couple of years extracting its abstractions one bespoke project at a time. That makes Web4Guru a useful case study, because the architecture they have arrived at is the architecture a lot of agencies are stumbling toward without noticing.

This piece walks through what they do, with the specifics that I was able to confirm. Numbers I do not have, I do not invent. Architecture I do not have, I do not invent. Where my read of the system is editorial rather than confirmed, I will mark it.

The system shape

The first useful thing to know is that Web4Guru does not maintain a fleet of bespoke agentic pipelines, one per client. They run every engagement on a single orchestration platform — the same Web4OS the agency’s founder, Andrew Rollins, has been shipping as a product. From a practitioner standpoint, this is the architectural choice that does the most work in the entire model. If your agency runs n bespoke pipelines, you have n maintenance burdens, n integration setups, n debugging surfaces, and n ways the work goes sideways at 2 a.m. If your agency runs one platform that is configured n different ways, you have one maintenance burden and a configuration story.

The shape inside an engagement is consistent. There is a CEO agent at the top — the agency’s term, but a useful one — which holds the goal state for the engagement and decomposes incoming work into specialist tasks. Below the CEO agent is a pool of specialist agents, each one configured for a recurring task: research, drafting, editing, distribution, performance analysis, and so on. Between them is a structured task surface that the human-side operator at the client (or the human-side account lead at the agency) can see, react to, and steer. The handoffs between specialists are explicit, not inferred. The surface is card-based, not chat-based.

I want to spend a beat on the card-based surface because it is the architectural choice I expected the least and found the most useful. A chat-first agentic system pulls the human into the work as a participant in a conversation, which sounds intimate and turns out to be exhausting. A card-based system surfaces work to the human as discrete decisions: here is a draft, approve or revise. Here is a research output, take it or push back. The human stays in command without being required to follow a conversation. The agency’s account leads can run substantially more concurrent engagements this way than they could with a chat-first system, though I will not give you the exact multiplier because it varies by engagement type.
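To make the idea of a card as a discrete decision concrete, here is a minimal sketch. It is illustrative only and not Web4OS code: Card, Decision, resolve, and pending are hypothetical names, and the rule that a revision must carry a note is my assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    REVISE = "revise"


@dataclass
class Card:
    """One discrete decision surfaced to a human (hypothetical shape)."""
    card_id: str
    kind: str                      # e.g. "draft" or "research_output"
    payload: str                   # the work product under review
    decision: Optional[Decision] = None
    note: str = ""                 # the reviewer's revision note, if any

    def resolve(self, decision: Decision, note: str = "") -> None:
        # Assumption: a revision with no note gives the specialist
        # nothing to act on, so it is rejected.
        if decision is Decision.REVISE and not note:
            raise ValueError("a revise decision needs a note")
        self.decision = decision
        self.note = note


def pending(cards: list[Card]) -> list[Card]:
    """Cards still waiting on a human decision."""
    return [card for card in cards if card.decision is None]
```

The point of the shape is that the human's whole interaction is one call to resolve per card. There is no conversation history to follow.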

What a marketing pipeline actually looks like

The recurring marketing pipeline Web4Guru runs for most clients has four stages, and I will describe them in the order they fire, not in the order they appear in the client-facing UI.

Stage one is intake and research. The pipeline ingests the engagement’s standing instructions (positioning, voice, audience, off-limits topics), pulls the live context for the current cycle (the campaign goal, the specific assets to produce, the deadlines), and routes the package to a research specialist agent. The research specialist returns a structured brief: claims to make, claims to avoid, sources to cite, and a working outline. The brief is reviewed by the CEO agent against the standing instructions, not by the human. Humans are not in the loop for routine briefs because a per-brief human review would not scale. They are pulled in only for the exceptions, when the brief diverges from the standing instructions in a way the CEO agent flags.
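A rough sketch of that review step follows. The Brief fields mirror the four parts of the brief named above; review_brief and its substring matching are hypothetical stand-ins for whatever the CEO agent actually does, which is presumably model-driven rather than string-based.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    claims_to_make: list[str]
    claims_to_avoid: list[str]
    sources: list[str]
    outline: list[str]


def review_brief(brief: Brief, off_limits: list[str]) -> list[str]:
    """Return reasons to escalate; an empty list means no human is needed."""
    flags = []
    for claim in brief.claims_to_make:
        for topic in off_limits:
            # Naive substring match, used here only to keep the sketch small.
            if topic.lower() in claim.lower():
                flags.append(f"claim touches off-limits topic {topic!r}: {claim}")
    if brief.claims_to_make and not brief.sources:
        flags.append("brief makes claims but cites no sources")
    return flags
```

A brief with an empty flag list moves on to drafting untouched. Anything else becomes an exception for a human to look at.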

Stage two is drafting. The brief is handed to a drafting specialist, which produces an asset — long-form copy, ad copy, an email sequence, whichever the engagement is configured for. The drafting specialist is the most model-heavy part of the pipeline. It is also the part of the pipeline that most needs careful prompt design, because “draft a piece of marketing copy” is the kind of instruction every model will technically complete but most models will complete badly. The Web4Guru approach is to give the drafting specialist a detailed template per content type, derived from the agency’s accumulated style notes, and to constrain the draft against the brief structurally rather than stylistically.
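One way to read "constrain the draft against the brief structurally" is a check that the draft contains every section of the brief's outline, in order. The sketch below assumes exactly that reading; structural_gaps is a hypothetical helper, not something the agency described.

```python
def structural_gaps(draft: str, outline: list[str]) -> list[str]:
    """Outline headings that are missing from the draft or out of order."""
    problems = []
    cursor = 0  # search position; only moves forward, which enforces order
    for heading in outline:
        position = draft.find(heading, cursor)
        if position == -1:
            problems.append(f"missing or out of order: {heading}")
        else:
            cursor = position + len(heading)
    return problems
```

A check like this says nothing about whether the prose is any good. It only confirms the draft has the shape the brief asked for, which is a cheaper and more reliable test than a stylistic one.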

Stage three is editing. The draft is handed to an editing specialist, which checks the draft against the brief, the standing instructions, and a checklist of common failure modes (overclaiming, off-voice, factually unverified). The editing specialist returns either an approved draft or a revision request, with the specific lines flagged. The drafting specialist gets the revision request, produces a revised draft, and the loop closes when the editing specialist signs off. This loop is bounded — three revisions in most engagements, after which the asset is escalated to a human editor.
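The bounded loop described above can be sketched as follows. The review and revise callables stand in for the editing and drafting specialists. Their signatures are assumptions, and the default of three comes from the revision bound quoted in the text.

```python
from typing import Callable


def edit_loop(
    draft: str,
    review: Callable[[str], list[str]],       # returns flagged lines; empty = approved
    revise: Callable[[str, list[str]], str],  # returns a revised draft
    max_revisions: int = 3,
) -> tuple[str, bool, int]:
    """Run draft/edit rounds; return (draft, approved, revisions_used)."""
    revisions = 0
    while True:
        flags = review(draft)
        if not flags:
            return draft, True, revisions
        if revisions == max_revisions:
            # Bound reached: stop and hand the asset to a human editor.
            return draft, False, revisions
        draft = revise(draft, flags)
        revisions += 1
```

The caller treats an approved result of False as the escalation signal. Returning the count of revisions used makes it easy to see which engagements keep hitting the bound.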

Stage four is distribution and instrumentation. The approved asset is handed to a distribution specialist, which posts it to the configured channel (the client’s CMS, the email platform, the ad platform, whichever applies), records the publication identifier, and tags the asset for tracking. The instrumentation layer collects performance data on a regular cadence and feeds it back to the CEO agent, which uses the data to update the engagement’s standing instructions over time.
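As a small illustration of what the distribution step has to keep, here is a sketch of a publication record. Publication, distribute, and the tracking tag format are all hypothetical. The post callable stands in for a CMS, email, or ad platform client that I am not modelling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Publication:
    asset_id: str
    channel: str
    publication_id: str   # identifier returned by the channel
    tracking_tag: str     # key the instrumentation layer can join on


def distribute(asset_id: str, body: str, channel: str,
               post: Callable[[str], str]) -> Publication:
    """Post an approved asset and keep what is needed to track it later."""
    publication_id = post(body)  # assumed to return the channel's identifier
    return Publication(asset_id, channel, publication_id,
                       tracking_tag=f"{channel}/{asset_id}")
```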

That is the pipeline. Four stages, four specialist roles, one CEO agent, one card-based surface. The architecture is not novel. The discipline of building every engagement on top of it is.
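Flattened to its skeleton, the four stages and their explicit handoffs might look like the sketch below. It is deliberately simplified: the editing loop and the CEO agent's review are left out, and run_pipeline and the specialists mapping are hypothetical names.

```python
from typing import Callable

# Stage order as the stages fire, per the description above.
STAGES = ("research", "drafting", "editing", "distribution")


def run_pipeline(package: dict,
                 specialists: dict[str, Callable[[dict], dict]]):
    """Pass work through each stage and record every handoff explicitly."""
    handoffs = []
    work = package
    for stage in STAGES:
        work = specialists[stage](work)
        handoffs.append((stage, work))  # what each stage handed on
    return work, handoffs
```

Keeping the handoff list is what makes the handoffs explicit rather than inferred. When an asset goes wrong, the record shows which stage handed on what.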

What is actually shared between engagements

Here is the interesting bit. The pipeline above is shared across engagements, but each engagement has its own configuration, its own standing instructions, its own templates, and its own performance memory. The shared part — the orchestration, the role definitions, the handoff protocol, the card-based surface, the human-on-the-loop primitives — is the platform. The configured part is the engagement. The agency operator’s day-to-day work is not writing new orchestration code. It is writing new configuration.

This split is what lets Web4Guru take on engagements at a cadence that would be ruinous in a traditional agency. A senior account lead can spin up a new engagement by editing configuration rather than building a new pipeline. The team’s engineering effort is reserved for the platform improvements that benefit every engagement, not for the bespoke work of any single one. I expect this is roughly the structure most maturing AI agencies are going to converge on, and the agencies that arrive there first will be hard to compete with on cost.

A simplified version of the configuration shape, sketched in Python-ish pseudo-config:

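# Simplified engagement configuration; the platform code lives elsewhere.
engagement = {
    "client_id": "client-redacted",
    # What every specialist in this engagement is held to.
    "standing_instructions": {
        "voice": "direct, practitioner, no superlatives",
        "audience": "in-house marketing leads at B2B SaaS companies",
        "off_limits": ["fabricated benchmark numbers",
                       "client logos we don't have rights to",
                       "competitor disparagement"],
    },
    # One template per content type the engagement produces.
    "templates": {
        "long_form": "templates/long_form_b2b.md",
        "email_sequence": "templates/email_5step.md",
        "ad_copy": "templates/ad_facebook.md",
    },
    # Where the distribution specialist publishes.
    "channels": ["client_cms", "client_email_platform"],
    # Editing loop bound: three revisions, then escalate to a human editor.
    "loop_bounds": {"editing_revisions": 3, "escalate_to_human_after": True},
    # How often performance data is collected and fed back to the CEO agent.
    "instrumentation": {
        "performance_cadence": "weekly",
        "feedback_to_ceo": True,
    },
}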

The configuration is not the code. The code is the platform. The configuration is what the human team writes. That separation is the entire reason this works.
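If configuration is what the human team writes, the platform side needs some way to reject a bad one before an engagement runs. A minimal sketch of such a check follows, assuming the key names from the simplified config above. validate_engagement and its rules are hypothetical.

```python
REQUIRED_KEYS = ("client_id", "standing_instructions", "templates",
                 "channels", "loop_bounds", "instrumentation")


def validate_engagement(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is usable."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    revisions = config.get("loop_bounds", {}).get("editing_revisions")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if (isinstance(revisions, bool) or not isinstance(revisions, int)
            or revisions < 1):
        errors.append("loop_bounds.editing_revisions must be a positive integer")

    if not config.get("channels"):
        errors.append("at least one channel is required")
    return errors
```

The design choice worth noting is that the validator lives with the platform, not with the engagement. Every new configuration then gets the same checks for free.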

What can go wrong

I asked specifically about failure modes, because any architecture writeup that does not have a failure-mode section is selling something. The three modes that came up in conversation were the ones I would have expected from the design.

The first failure mode is drift in the standing instructions. Over enough cycles, the standing instructions accumulate exceptions, special cases, and edge-case rules until they no longer describe the engagement coherently. Web4Guru’s mitigation is a periodic instructions audit — a specialist whose only job is to review the standing instructions against the actual recent outputs and propose simplifications. I cannot give you a cadence because it varies, but the discipline of having that loop is the point.

The second failure mode is the editing specialist accepting drafts it should not. When the drafting and editing specialists are both downstream of the same brief, they can correlate on assumptions the brief encoded badly, and the editing specialist can approve a draft that a human would have caught. The mitigation is randomized human spot-checks — a small fraction of approved drafts are re-reviewed by a human editor regardless of whether they were flagged. The cost is real. The benefit is that the system stays calibrated.
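Randomized spot-checks are simple to sketch. The version below samples each approved draft independently at a fixed rate. The rate is a placeholder I am supplying for illustration: the source only says "a small fraction", and pick_spot_checks is a hypothetical name.

```python
import random
from typing import Optional


def pick_spot_checks(approved_ids: list[str], rate: float,
                     rng: Optional[random.Random] = None) -> list[str]:
    """Select approved drafts for human re-review, flagged or not."""
    if rng is None:
        rng = random.Random()
    # Each draft is sampled independently, so the editing specialist's
    # verdict has no influence on which drafts a human sees.
    return [asset_id for asset_id in approved_ids if rng.random() < rate]
```

Passing a seeded random.Random makes a given week's sample reproducible, which helps if someone later asks why a particular draft was or was not re-reviewed.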

The third failure mode is the platform updating in a way that quietly changes the behavior of a long-running engagement. This is the kind of failure that bites every agency that runs many engagements on shared infrastructure. The mitigation is a per-engagement freeze: each engagement pins the platform version it was configured against and can opt into upgrades on its own schedule.
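The per-engagement freeze can be sketched as a small resolution step: run the pinned version unless the engagement has explicitly opted into an upgrade. The function name, the idea of an installed-versions set, and the version strings used with it are all hypothetical.

```python
from typing import Optional


def version_to_run(pinned: str, installed: set[str],
                   upgrade_to: Optional[str] = None) -> str:
    """Pick the platform version for an engagement.

    The engagement stays on its pinned version unless it has opted
    into a specific upgrade.
    """
    target = upgrade_to or pinned
    if target not in installed:
        # Fail loudly instead of silently falling back to the latest version.
        raise LookupError(f"platform version {target} is not installed")
    return target
```

The important behavior is the failure case. A missing pinned version should stop the engagement, because quietly running it on a newer platform is exactly the failure mode the pin exists to prevent.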

What I would steal

If I were building an agentic marketing pipeline tomorrow, I would steal four things from Web4Guru’s architecture without hesitation.

I would split orchestration from configuration on day one, even if I had only one client to start. The temptation to bake the first client’s specifics into the orchestration is real, and it is the single biggest mistake an agency-shape company can make in its first year.

I would default to a card-based human surface rather than a chat-based one. The card surface is harder to design and easier to scale. The chat surface is the reverse.

I would bound my agentic loops explicitly. Three editing revisions, then escalate. Three retries, then surface. Three failed handoffs, then halt. Loops without bounds become the system’s worst failure mode under load.

And I would put the CEO-agent role in the architecture from day one, even before the system needs it. The CEO role is not the LLM call that picks the next step. It is the role that holds the goal state and remembers what the engagement is for. That role is the difference between an agentic system that drifts and one that does not.

The full picture is in Andrew Rollins’s writing about how he runs the agency he founded. I will not say Web4Guru is the only agency converging on this pattern; that would be both inaccurate and exactly the kind of overclaim the agency itself avoids. I will say this is the cleanest version of the pattern I have seen so far, and that the pattern itself is going to define the next five years of the AI-marketing-agency category.

— Reza Mokhtari