Stripe ships a compliance agent on Amazon Bedrock

AIntelligenceHub

Stripe and AWS detailed a production compliance agent system that reduced review handling time by 26 percent and now runs more than 100 agents, with humans in the loop.

Stripe processes $1.4 trillion in payment volume a year across 50 countries, and its compliance teams were spending up to 80 percent of their time navigating fragmented systems. A new post on the AWS Machine Learning blog walks through the production agent system Stripe built on Amazon Bedrock to fix that. The reported results are a 26 percent drop in median review handling time and 96 percent helpfulness ratings, with human reviewers kept in the loop throughout.

The architecture of Stripe compliance agents

The post is among the most detailed production writeups of an enterprise compliance agent that a major payment processor has published, and the architecture decisions are stated plainly. Stripe and the AWS team broke every compliance review into a directed acyclic graph (DAG) of sub-tasks, each scoped small enough for an agent to finish in a bounded number of turns. A human reviewer drives the flow and answers each sub-task in turn, and the agent's responses are passed forward as context for downstream questions. The reviewer remains the decision-maker, and the agent is treated as a research assistant whose work is fully auditable. That is the load-bearing decision in the system: agent autonomy is bounded by the DAG's rails, human review is preserved on every sub-task, and the unit of trust is the agent's full thought log rather than the model output alone.
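
A minimal sketch of that flow, assuming a review can be written as a mapping from each sub-task to the sub-tasks it depends on. The task names and the `agent_research` and `reviewer_decide` functions are hypothetical stand-ins and are not taken from Stripe's system; only the ordering and context-passing idea comes from the post.

```python
from graphlib import TopologicalSorter

# Hypothetical review: each sub-task maps to the sub-tasks it depends on.
REVIEW_DAG = {
    "identify_business": set(),
    "check_ownership": {"identify_business"},
    "screen_risk_signals": {"identify_business"},
    "final_assessment": {"check_ownership", "screen_risk_signals"},
}


def agent_research(task, upstream_answers):
    """Stand-in for the agent: drafts a response from upstream context."""
    return f"draft for {task} (context: {sorted(upstream_answers)})"


def reviewer_decide(task, draft):
    """Stand-in for the human reviewer, who owns the final answer."""
    return f"reviewed: {draft}"


def run_review(dag):
    """Visit sub-tasks in dependency order, feeding answers downstream."""
    answers = {}
    for task in TopologicalSorter(dag).static_order():
        # Only answers from direct dependencies are passed as context.
        upstream = {dep: answers[dep] for dep in dag[task]}
        draft = agent_research(task, upstream)
        answers[task] = reviewer_decide(task, draft)
    return answers


answers = run_review(REVIEW_DAG)
```

Because every node sees only the reviewed answers of its direct dependencies, the context handed to each agent invocation stays bounded by the shape of the graph rather than by the length of the whole review.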

The agent itself follows ReAct, the reasoning-and-acting pattern that interleaves model thoughts with tool calls. For each sub-task, the agent decides which internal signals are relevant, calls the right tool, takes the observation as a forced input, and reasons again. The observation step acts as a feedback control that keeps the agent from hallucinating tool results or drifting off topic. Stripe wrapped the loop in a dedicated agent service rather than reusing the company's existing ML inference engine, because agents are network-bound and stateful, while ML inference is compute-bound and stateless. The agent service grew from a handful of agents at launch to well over 100 in less than a year, and the team is now expanding from synchronous stateless inference into stateful, multi-turn conversational agents.
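
The loop can be sketched in a few lines. This is an illustration under stated assumptions, not Stripe's implementation: `scripted_model` stands in for a real model call, and the tool name `lookup_account_flags` and the account ID are invented for the example. The point it shows is that the observation is always produced by running the tool, never by the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    thought: str
    tool: Optional[str] = None
    tool_input: Optional[str] = None
    observation: Optional[str] = None


# Hypothetical tool registry; a real system would wrap internal services.
TOOLS = {
    "lookup_account_flags": lambda account_id: f"no open flags on {account_id}",
}


def scripted_model(question, history):
    """Stand-in for an LLM call.

    Returns (thought, tool, tool_input, final_answer).
    """
    if not history:
        return ("I need the account flags first.",
                "lookup_account_flags", "acct_123", None)
    seen = history[-1].observation
    return (f"The tool reported: {seen}", None, None, "No flags found.")


def react(question, model, tools, max_turns=5):
    """Run a bounded thought -> action -> observation loop."""
    history = []
    for _ in range(max_turns):
        thought, tool, tool_input, final = model(question, history)
        step = Step(thought, tool, tool_input)
        history.append(step)
        if final is not None:
            return final, history
        # Forced observation: the value comes from executing the tool,
        # so the model cannot substitute an invented result.
        step.observation = tools[tool](tool_input)
    raise RuntimeError("turn budget exhausted without a final answer")


answer, thought_log = react("Any open flags?", scripted_model, TOOLS)
```

The `max_turns` bound mirrors the article's point that each sub-task should be finishable in a bounded number of turns, and the returned `thought_log` is the record a reviewer would inspect.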

What a DAG of compliance agents looks like

The infrastructure story is as instructive as the agent design. Stripe routes every model call through an internal LLM Proxy service that handles noisy-neighbor protection, model fallbacks, prompt caching, and centralized monitoring. The proxy is the single point where Stripe specifies which model, what caching strategy, and what fallback chain applies. The choice of Amazon Bedrock is a technical decision, not a marketing one: prompt caching is supported, model selection is one argument, and the privacy and security model fits the constraints a payment processor already operates under. Bedrock also gives Stripe a clean path to fine-tuning and custom models, which the team expects to focus on next. The Enterprise AI Use Cases for Finance and Operations page covers how that pattern maps to other regulated workflows.
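
The fallback part of such a proxy can be sketched as follows. Everything here is an assumption made for illustration: the model identifiers, the `ModelUnavailable` exception, and the stub backends are hypothetical, and the real Bedrock call is deliberately left out.

```python
class ModelUnavailable(Exception):
    """Raised by a backend that is throttled or down (hypothetical)."""


def call_with_fallback(prompt, chain, backends):
    """Try each model in the chain in order; return the first success."""
    failures = []
    for model_id in chain:
        try:
            return model_id, backends[model_id](prompt)
        except ModelUnavailable as exc:
            # Record the failure so the caller can see why each model
            # was skipped if the whole chain fails.
            failures.append(f"{model_id}: {exc}")
    raise ModelUnavailable("all models failed: " + "; ".join(failures))


def throttled(prompt):
    raise ModelUnavailable("throttled")


def healthy(prompt):
    return f"answer to: {prompt}"


# Hypothetical model identifiers mapped to stub backends.
BACKENDS = {"primary-model": throttled, "fallback-model": healthy}

served_by, text = call_with_fallback(
    "summarize the account flags",
    ["primary-model", "fallback-model"],
    BACKENDS,
)
```

Keeping the chain as plain data is what lets a single proxy own the policy: callers name a chain, and the proxy decides which model actually serves the request.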

The decomposition pattern matters because it is what makes the agent tractable. A single agent asked to handle a long, multi-stage compliance review in one shot will spend too much of its context on the wrong things and not enough on what the reviewer actually needs. Breaking the work into DAG nodes gives each agent invocation a tighter scope, keeps prompt length manageable across many turns, and lets the system cache a common prompt prefix that includes the agent's role, the review context, and the available tools. Stripe reports that prompt caching alone reduces input token cost by 60 percent. That saving is the difference between a research project and a system that more than 100 production agents can run on without breaking the budget, and it is what keeps the system viable as agent counts scale.
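
One way to picture the cacheable prefix, as a sketch only: prefix caches generally pay off when the leading part of every prompt is identical, so the stable material goes first and the per-node question goes last. The field labels, role text, and sub-task wording below are invented for the example and do not describe Stripe's actual prompt layout.

```python
import os.path


def build_prefix(role, review_context, tool_names):
    """Stable part of the prompt, shared by every sub-task in a review."""
    return "\n".join([
        f"ROLE: {role}",
        f"REVIEW CONTEXT: {review_context}",
        # Sorting keeps the text identical regardless of input order.
        "TOOLS: " + ", ".join(sorted(tool_names)),
    ])


def build_prompt(prefix, sub_task):
    """Append the only part that changes from node to node."""
    return f"{prefix}\n\nSUB-TASK: {sub_task}"


prefix = build_prefix(
    role="compliance research assistant",
    review_context="hypothetical review 42",
    tool_names=["search_filings", "lookup_account_flags"],
)
prompt_a = build_prompt(prefix, "Verify beneficial ownership")
prompt_b = build_prompt(prefix, "Screen for risk signals")

# os.path.commonprefix compares strings character by character.
shared = os.path.commonprefix([prompt_a, prompt_b])
```

Here `shared` covers the whole prefix, so only the short sub-task suffix differs between the two prompts.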

Why Bedrock sits under the agent stack

The auditability story is the part that closes the deal with regulators. Every agent action, decision, and rationale is logged in a way that the system can retrieve historically for any past run. Reviewers see the agent's work as a thought log of tool invocations, observations, and reasoning steps, and that log is the artifact a regulator gets if a compliance review is later challenged. Stripe's design keeps the human as the final answerer of every sub-task, so the system does not change who is accountable for a decision, but it does change how much evidence the system can produce about how that decision was reached. The same audit-first framing shows up across the wider agent governance stack, including the AWS Continuum and Context release that landed earlier this month and the wider push to standardize on agent identity, authorization, and execution provenance as a single stack.
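
A thought log of that kind can be modeled as an append-only list of immutable entries keyed by run. This is a hedged sketch: the `LogEntry` fields, the `ThoughtLog` class, and the in-memory storage are assumptions for illustration, and a production system would persist entries durably.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LogEntry:
    run_id: str
    seq: int       # position of the entry within its run
    kind: str      # "thought", "tool_call", or "observation"
    content: str


class ThoughtLog:
    """Append-only, in-memory log; entries are never edited or removed."""

    def __init__(self):
        self._entries = []

    def append(self, run_id, kind, content):
        seq = sum(1 for e in self._entries if e.run_id == run_id)
        entry = LogEntry(run_id, seq, kind, content)
        self._entries.append(entry)
        return entry

    def replay(self, run_id):
        """Return one run's entries in the order they were recorded."""
        return [e for e in self._entries if e.run_id == run_id]

    def export(self, run_id):
        """Serialize one run as JSON, e.g. for an audit request."""
        return json.dumps([asdict(e) for e in self.replay(run_id)])


log = ThoughtLog()
log.append("run-1", "thought", "Check the account flags first.")
log.append("run-1", "tool_call", "lookup_account_flags(acct_123)")
log.append("run-2", "thought", "Unrelated run.")
log.append("run-1", "observation", "no open flags on acct_123")
exported = json.loads(log.export("run-1"))
```

Freezing the dataclass makes accidental edits raise an error, which matches the intent that the log is evidence rather than working state.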

The lessons are concrete. Keep agent tasks small enough for working memory, and test quality incrementally rather than diving straight into full automation. Run an asynchronous workflow architecture with DAG support so complex agent interactions stay auditable. Treat agents as a different resource profile from traditional ML, and stand up a dedicated microservice with cost instrumentation that tracks token usage per invocation. Constrain agents with rails so context stays bounded, and use task decomposition to keep prompt length predictable across many turns. Most importantly, keep humans in control: the agent assists, the reviewer decides, and the system logs every step. The 26 percent reduction in handling time is an early progress number, and Stripe is now extending the same approach to questions that can only be answered with context known during the review rather than before it.
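
The cost-instrumentation lesson can be illustrated with a small per-agent meter. The `TokenMeter` class, the agent name, and the token counts are hypothetical; a real service would read the counts from the model provider's response metadata.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    invocations: int = 0


class TokenMeter:
    """Accumulates token usage per agent, one record per invocation."""

    def __init__(self):
        self.by_agent = defaultdict(Usage)

    def record(self, agent, input_tokens, output_tokens):
        usage = self.by_agent[agent]
        usage.input_tokens += input_tokens
        usage.output_tokens += output_tokens
        usage.invocations += 1

    def mean_input_tokens(self, agent):
        """Average input tokens per invocation for one agent."""
        usage = self.by_agent[agent]
        return usage.input_tokens / usage.invocations


meter = TokenMeter()
# Made-up counts for two invocations of a hypothetical agent.
meter.record("ownership-check", input_tokens=1200, output_tokens=150)
meter.record("ownership-check", input_tokens=800, output_tokens=50)
average = meter.mean_input_tokens("ownership-check")
```

Tracking the average per invocation, not just the total, is what shows whether task decomposition is actually keeping prompt length predictable.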

For organizations rolling out their own production agent systems, the Stripe and AWS writeup is the cleanest pattern that has been published in 2026 for a regulated workflow. It combines a DAG-orchestrated agent mesh, a ReAct loop with a forced observation step, an LLM proxy for cost control and noisy-neighbor isolation, and a dedicated agent service for the network-bound execution model, and that combination is what lets the architecture scale. The pattern translates directly to any compliance, audit, or regulated review workflow where the human must remain the final decision-maker and the system must produce an audit trail that survives regulator scrutiny. The interesting question for the next 12 months is how many of these patterns the agent framework vendors will absorb into their defaults, and how many will stay Stripe-specific.
