Shadow agents: enterprise IT can't see what runs at the API layer

Most enterprise monitoring was built to watch humans. Autonomous AI agents do not log in, do not generate session records, and do not wait for human approval. They run at the API layer, chain tools together, and complete multi-step workflows. CIO columnist Lucas Bonner says the governance gap is structural, and he outlines what closing it actually requires.

The framing matters because it changes what the question becomes. It is no longer whether an organization will run shadow agents. It already does. The question is whether anyone in IT can reconstruct what those agents are doing, where the data is moving, and which decisions were made on the organization's behalf.

The catalyst is economic. Enterprise teams that embedded frontier AI models into everyday workflows quickly discovered that per-token cloud inference costs compound fast once agents run autonomously, making hundreds of API calls per task rather than one. The push toward local AI processing is the response. Google's Gemma 4 12B, released in June 2026 and designed to run on consumer-grade hardware with 16GB of VRAM, brings multimodal AI fully local to enterprise laptops without any cloud API dependency. For finance teams, that is cost relief. For IT governance teams, it is a new category of exposure. When inference moves onto thousands of distributed laptops, centralized telemetry disappears. The natural network choke points that monitoring tools rely on vanish with it. Without visibility infrastructure built before rollout, IT has no reliable way to know what those agents are accessing or deciding in the organization's name.

Why shadow agents break the visibility stack

Every monitoring tool, security scanner, and compliance platform most enterprises rely on was designed to track human behavior: logins, session durations, and file accesses triggered by a person at a keyboard. The implicit assumption in all of it is that a human is somewhere in the loop, generating observable signals. Agentic AI generates none of those signals. It operates at the API layer, bypasses the user interface entirely, retrieves context from data stores, reasons over it, and takes action. It does not log in. It produces no session record.

Box's April 2026 launch of the Box Agent shows exactly how fast enterprise software is moving in this direction. The Box Agent works natively on the enterprise content layer, respecting existing permissions and compliance controls while it autonomously searches, summarizes, and routes documents. That is solid engineering for business teams. It also means that contract reviews, approval chains, and regulatory filings can now be executed by an agent that leaves no login trace in the monitoring systems IT manages. The compliance consequence is real. An agent can chain tools in ways that move sensitive data from a secured internal store to an external processing endpoint because the agent found the connection useful, all within valid permissions, with no single step appearing suspicious and no record in any system IT is watching. The violation happens in the reasoning layer.
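To make the chained-tool problem concrete, here is a minimal sketch of a check that evaluates a whole tool-call trace instead of individual steps. Everything in it is an assumption made for illustration: the `ToolCall` record, the data labels, and the tool names are hypothetical and do not come from Box or any monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str          # hypothetical tool name
    reads: frozenset   # data labels this call pulls into the agent's context
    sends_to: str      # "internal" or "external"


def find_sensitive_egress(trace, sensitive_labels):
    """Return indices of external calls made after sensitive data entered the context."""
    tainted = False
    flagged = []
    for i, call in enumerate(trace):
        # Once a sensitive label has been read, treat the context as tainted.
        if call.reads & sensitive_labels:
            tainted = True
        # Any later external call may carry that data out.
        if tainted and call.sends_to == "external":
            flagged.append(i)
    return flagged


# Illustrative trace: each step is permitted on its own.
trace = [
    ToolCall("search_contracts", frozenset({"contracts"}), "internal"),
    ToolCall("summarize", frozenset(), "internal"),
    ToolCall("translate_api", frozenset(), "external"),
]
flagged = find_sensitive_egress(trace, frozenset({"contracts"}))
```

In this trace only the third call is flagged, and only because of what preceded it. A per-step permission check would pass all three, which is the gap the paragraph above describes.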

The architecture required is a shift from perimeter defense to runtime isolation. Perimeter defense assumes you control what enters the environment. When agents run locally, call external APIs dynamically, and chain tools based on autonomous reasoning, the perimeter boundary is no longer a meaningful control surface. Microsoft's Agent Executor, part of the Microsoft Agent Framework, provides a practical model here. The Agent Executor wraps an agent in a sandboxed runtime that manages session state, conversation context, and tool permission boundaries within a controlled envelope. An agent inside a properly configured executor cannot reach unauthorized systems or take unapproved actions regardless of what the model decides to do. The security guarantee shifts from trusting the model's output to controlling what it is allowed to execute. For any organization under compliance mandates, that distinction between trust and control is not a nuance. It is the design requirement.
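The trust-versus-control distinction can be sketched in a few lines. The class below is a hypothetical illustration and not the Microsoft Agent Framework API: `SandboxedExecutor`, `ToolNotPermitted`, and the tool names are assumed names. The point it shows is that the allowlist is checked outside the model, so a request for an unapproved tool fails whatever the model decided.

```python
class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool outside its permission boundary."""


class SandboxedExecutor:
    """Hypothetical wrapper: every tool call passes through an allowlist check."""

    def __init__(self, tools, allowed):
        self._tools = dict(tools)
        self._allowed = frozenset(allowed)
        self.denied = []  # record of refused requests, for later audit

    def execute(self, tool_name, *args):
        # The check does not depend on why the model asked for the tool.
        if tool_name not in self._allowed or tool_name not in self._tools:
            self.denied.append(tool_name)
            raise ToolNotPermitted(tool_name)
        return self._tools[tool_name](*args)


# Illustrative tools; real ones would call actual systems.
tools = {
    "read_doc": lambda doc_id: f"contents of {doc_id}",
    "send_email": lambda to, body: f"sent to {to}",
}
executor = SandboxedExecutor(tools, allowed={"read_doc"})
result = executor.execute("read_doc", "q3-report")
```

A production runtime would also isolate process, network, and file-system access. This sketch covers only the tool permission boundary.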

Closing the governance gap also requires a technical role that most enterprise IT teams have not hired for: the forward-deployed AI engineer, a discipline distinct from DevOps. A DevOps engineer asks whether the system is up. A forward-deployed AI engineer asks whether the agent is doing what was intended, and only that. Their work covers three areas. Prompt governance treats the instructions that drive agent behavior as code, with version control, hardening against prompt injection attacks, and re-testing after every model version change. Guardrail design defines in technical terms what each agent is permitted to access, which external systems it may contact, and which categories of action (financial transactions, credential access, and outbound data transfers) require human authorization before the agent can proceed. RAG pipeline governance scopes retrieval correctly and audits it on a consistent schedule. It is one of the most underestimated security responsibilities in agentic deployment, since overly permissive retrieval creates data exposure paths that are hard to detect until something has already gone wrong.
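As a rough illustration of guardrail design, the sketch below turns the three action categories named above into a policy function with three outcomes. The category strings, host names, and the `evaluate` function are assumptions made for this example, not any vendor's schema.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human"
    DENY = "deny"


# Categories that require human authorization (labels are assumed names).
NEEDS_APPROVAL = {
    "financial_transaction",
    "credential_access",
    "outbound_data_transfer",
}


def evaluate(action_category, target_host, allowed_hosts):
    """Decide whether an agent action may proceed, must wait for a human, or is refused."""
    # Contacting a system outside the allowlist is refused outright.
    if target_host is not None and target_host not in allowed_hosts:
        return Decision.DENY
    # Sensitive categories pause until a human authorizes them.
    if action_category in NEEDS_APPROVAL:
        return Decision.NEEDS_HUMAN
    return Decision.ALLOW


allowed_hosts = {"erp.internal.example"}
decision = evaluate("financial_transaction", "erp.internal.example", allowed_hosts)
```

The host check runs before the approval check, so a human is never asked to approve a call to a system the agent should not reach at all.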

Platforms built for the multi-agent fleet

One sandboxed agent with clear guardrails is manageable. A fleet of coordinating agents with distinct permissions, running simultaneously across cloud, desktop, and on-premises environments, is a qualitatively different problem that requires dedicated infrastructure. Automation Anywhere's EnterpriseClaw, launched in May 2026 with Cisco, NVIDIA, Okta, and OpenAI as partners, is the most complete platform to address this so far. NVIDIA contributes OpenShell, an open-source runtime for deploying autonomous agents safely, plus NIM microservices with Nemotron models for on-premises customers. Okta handles cross-agent identity management and policy enforcement across the entire agent fleet. Cisco AI Defense provides an agent-specific threat detection layer that conventional network monitoring cannot replicate. OpenAI enables production workflows on its latest models. The platform gives IT a single governance surface: centralized policy, behavioral monitoring, and auditable observability across every agent regardless of where it runs. The core principle is that no agent, cloud-hosted or running locally on a laptop, operates outside a defined policy boundary. EnterpriseClaw is currently in preview, with general availability expected later in 2026.

The policy-boundary principle is what separates the platforms that will work from the ones that will not. The forward-deployed AI engineer is the person who defines the policy boundary. The platform is the surface that enforces it. Without both, the agent fleet is operating on borrowed time inside whatever defaults the model vendor shipped. With both, the question stops being whether agents are running and starts being whether they are running inside the policy that the organization actually approved.

The next quarter for IT leaders

Building governance into a personal AI agent took deliberate effort: a permission guard with path allowlists, blocked commands, manual approval triggers, and a chained audit log. That overhead, for a personal tool on a single laptop, previews what enterprises face at a scale orders of magnitude larger, across systems they did not build and agents they did not deploy themselves. The tools are available. The architectural patterns are documented. What is missing in most organizations is the deliberate decision to build governance in parallel with deployment, not as remediation after the first incident.
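The column does not show how the chained audit log was built. One common way to build such a log is a hash chain, sketched here under that assumption: each entry stores the SHA-256 hash of its predecessor, so editing any earlier entry invalidates the chain. The function names and event fields are illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event, prev_hash):
    # Serialize deterministically so the same entry always hashes the same way.
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify(log):
    """Return True only if every entry still matches its recorded hash and link."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"action": "read", "path": "/home/user/notes/todo.md"})
append_entry(log, {"action": "blocked", "command": "rm -rf"})
intact = verify(log)
```

A chain like this detects after-the-fact edits but not deletion of the newest entries, so real deployments typically also ship the latest hash to a system the agent cannot write to.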

Every shadow agent in the environment was approved somewhere, by someone, for a specific purpose. The question is whether anyone still has a current, verifiable line from that approval to what the agent is doing right now. If the answer is no, or uncertain, that is where the work needs to start. Shadow agents are not a future problem. They are in production today, summarizing documents, routing decisions, and interacting with systems the monitoring tools cannot observe. IT leaders who build real accountability infrastructure around them will be positioned to harness autonomous AI with confidence. The ones who wait will spend their time explaining, after the fact, how something happened that nobody could see.

For more on the structural access-control problem in agent-driven environments, see our guide to enterprise AI governance and the related Shadow AI access control story from June. The original CIO column by Lucas Bonner is the primary source for the framing above.
