Attackers Are Exploiting the Way AI Agents Choose Their Tools

Eighty-eight percent of organizations reported confirmed or suspected AI agent security incidents in the last twelve months. In healthcare, that figure climbs to 92.7%. These numbers come from a survey of more than 900 executives and practitioners by Gravitee's State of AI Agent Security 2026 report, and they're striking not just because the rate is high, but because of what companies believed before they got hit.

Eighty-two percent of executives said they're confident their existing policies protect against unauthorized agent actions.

They're wrong. And the gap isn't a misconfiguration problem. It's structural.

The Root Cause: AI Agents Trust Tool Descriptions Nobody Verified

When an enterprise AI agent needs to complete a task, it doesn't pull from a fixed list of pre-approved capabilities. It queries a tool registry (a shared catalog of available functions) and selects the best match by reading natural-language descriptions. There's no cryptographic verification of those descriptions. No behavioral audit. No human in that selection loop.

Nik Kale, a principal engineer specializing in enterprise AI platforms and security, describes the gap plainly: AI agents choose tools from shared registries by matching natural-language descriptions, but no human is verifying whether those descriptions are true.

That gap has opened an attack surface most security teams weren't built to monitor. And attackers found it before the security tooling did.

Tool poisoning covers several related attacks, each targeting a different phase of how an agent selects and uses tools. Tool impersonation is the most direct: an attacker publishes a tool to a shared registry with prompt-injection payloads embedded in its description. Because the agent's reasoning engine processes those descriptions through the same language model it uses to select tools, a well-crafted description can redirect the agent's behavior without any exploit code. Metadata manipulation is a close variant, where a malicious tool uses false behavioral descriptions to win selection over legitimate alternatives, then performs different operations once invoked.

Behavioral drift is harder to catch. A tool passes initial review when first published. Weeks or months later, its server-side behavior quietly changes, routing request data to an external endpoint, for example. The signature still matches. Provenance records still look valid. Nothing in standard security tooling flags it. Bait-and-switch attacks split the difference: a tool behaves correctly during the discovery phase, when audits might check it, then switches to malicious behavior during actual invocation.

The Invariant Labs team formally demonstrated the MCP Tool Poisoning Attack in April 2025, targeting the Model Context Protocol, the standard that governs how agents communicate with tools. CyberArk subsequently extended this to what it called Full-Schema Poisoning, where malicious instructions are embedded across the entire schema structure rather than just the description field. The attacks have grown more sophisticated since that initial disclosure. What started as an academic demonstration is now something 88% of enterprises say they've experienced in some form.

The security industry spent the last decade building defenses around artifact integrity. Code signing, SLSA provenance chains, software bills of materials: all of these answer the same question, which is whether an artifact really is what it claims to be. That's not the right question for tool registries. What agents need is behavioral integrity. Not just "is this tool what it says it is" but "does this tool actually do what it says it does, every time it runs, at the moment it runs." Artifact controls verify the artifact at rest. They say nothing about server-side runtime behavior. A tool can pass every supply-chain check in your security stack and still phone home to an external endpoint the moment it's invoked in production.

Even if you solved the tool registry integrity problem, AI agents carry a second structural weakness that makes containment difficult: most of them don't have real identities. The Gravitee survey found that only 21.9% of organizations treat AI agents as independent, identity-bearing entities with their own access controls and audit trails. Most agents inherit credentials from service accounts or rely on shared infrastructure. And 45.6% of teams use shared API keys for agent-to-agent authentication, which means one compromised agent can potentially impersonate another. This is how a tool poisoning attack escalates from a data exfiltration problem into a lateral movement problem. Among the organizations surveyed, 25.5% of deployed agents can create and task other agents. When child agents inherit compromised tool selections from a poisoned parent, the infection can propagate without any additional attacker interaction.

In the Gravitee survey, 82% of executives said their existing policies protect against unauthorized agent actions. Only 21% have runtime visibility into what their agents are doing at any given moment. And on average, only 47.1% of an organization's deployed agents are actively monitored or secured. More than half of all enterprise AI agents are running right now without security oversight or logging. Most AI security tooling was built for a different threat model: one centered on prompt injection at the user interface, not at the tool registry.

What a Real Defense Looks Like and How to Start Building One

Kale's proposed architecture introduces a runtime verification proxy that sits between an agent and the tools it invokes. Three validation mechanisms anchor the design.

Discovery binding locks the relationship between a tool as it appears during the discovery phase and as it appears at invocation time. If a tool's schema or endpoint has changed between those two moments, the proxy flags or blocks the call. Endpoint allowlisting is the simplest control to deploy immediately. Before an agent calls an external endpoint, that endpoint must appear on a pre-approved list. This doesn't prevent poisoning, but it contains the blast radius: a tool that tries to exfiltrate data to an unlisted host gets stopped at the network layer before any data leaves. Output schema validation checks what a tool actually returns against what it declared it would return. Tools that start producing unexpected fields, unusual data shapes, or suspiciously large payloads trigger a review queue rather than completing silently.

Kale's lightweight proxy implementation adds less than 10 milliseconds to each invocation, a rounding error in most enterprise workflows. The recommended rollout sequence is graduated: endpoint allowlisting first, then output schema validation, then discovery binding, then full behavioral monitoring as maturity increases. You don't have to build the full stack before getting meaningful protection.

Technical controls help, but organizational practices matter too. Agents need to be governed the same way privileged accounts are governed, with scoped access, audit logs, and rotation policies. ServiceNow and NVIDIA's Project Arc, announced last week, takes a complementary approach by making agent actions auditable by design, baking governance into the agent execution environment rather than treating it as an afterthought. Runtime tool verification and execution-layer auditability are both necessary components, not alternatives. For enterprises building or expanding AI agent programs, our Enterprise AI Governance Checklist covers the broader governance posture that tool-level security should sit inside.

The Gravitee survey paints a picture of organizations that moved faster than their security practices. 80.9% of technical teams are already past planning and into active testing or production. Only 14.4% can say all their deployed agents went live with full security and IT approval.

For most teams, waiting for industry-wide standards before acting isn't a realistic option. Tool poisoning attacks are already happening at scale. The practical starting point is endpoint allowlisting: enumerate every external endpoint your agents are permitted to reach, enforce that list at the network layer, and review the list regularly. It's not a complete solution, but it closes the most obvious exfiltration path immediately.

The second step is agent identity. Start treating agents as first-class identity subjects rather than extensions of service accounts. That means scoped credentials, separate audit logs, and rotation policies that don't rely on a shared key surviving indefinitely.

Third, build output logging into every agent pipeline before you need it. Debugging a poisoning incident without logs is nearly impossible. Most organizations don't realize they need them until after the incident.

Fourth, build a tool registry review process that evaluates behavioral claims, not just code provenance. Ask: what external endpoints does this tool call? What data does it return? Does either of those answers change over time? If you can't answer those questions, you're relying on the same artifact integrity framing that the attacks are designed to bypass.

The organizations that close these gaps first won't just have fewer incidents. They'll have better visibility into which incidents they do have, what actually happened, and how to respond faster.

Eighty-eight percent of enterprises already found out the hard way that "our policies protect us" and "we can see what our agents are doing" are not the same thing.

Attackers Are Exploiting the Way AI Agents Choose Their Tools

The Root Cause: AI Agents Trust Tool Descriptions Nobody Verified

What a Real Defense Looks Like and How to Start Building One

Get a weekly summary of our most popular articles

Comments

Related articles

ByteDance Boosts AI Budget 25% to $30 Billion, Pivoting to Chinese Chips

The EU Just Rewrote Key Parts of Its AI Law. Here's What Changed.

OpenAI Adds Real-Time Reasoning to Voice AI With GPT-Realtime-2