Abstract editorial illustration of an AI agent cursor reaching toward a glowing computer screen with hidden trap web layered behind, deep navy and teal, no humans, security news

Google ships computer use in Gemini 3.5 Flash

AIntelligenceHub
··8 min read

Gemini 3.5 Flash now ships with computer use, putting browser and desktop control into the default tier. Same week, Google warned the open web is full of agent traps, and one researcher reports a real money loss.

Google folded computer use into Gemini 3.5 Flash, the default tier, this week. Any developer building on Flash can now give an agent the ability to see a screen, reason about what is on it, and act inside a browser or a legacy app with no API. The same week, a Google DeepMind senior scientist said the open web is already full of traps set to steal money from humans through their AI agents.

The product news is that the capability is now part of the default tier, and the security news is that the surface it opens is the same one DeepMind has been warning about. The move brings Google to parity with Anthropic and OpenAI, who have shipped the same capability through their default models, and the gap between the product move and the security warning is the story.

The new capability is not a research preview. Computer use has been a separate model in the Gemini lineup, and Google has now folded the capability into the default Flash tier that the rest of the Gemini stack is built on. That means an agent built on Gemini 3.5 Flash can, in a single prompt, open a legacy enterprise application that has no API, navigate its interface, read a value off the screen, and act on it. The use cases Google describes are unglamorous on purpose: filling forms, exporting a report from a dashboard, moving data between two systems that do not have a connector. The capability is the same one that Anthropic has shipped through Claude and that OpenAI has shipped through ChatGPT, and the move brings Google to parity on the default tier rather than a specialist one.

The capability ships with a security best practices document that reads like an after action report for a category of attack that has not yet been named. Google lists seven controls: human in the loop confirmations before high risk actions, a sandboxed execution environment, prompt input sanitization, content guardrails on inputs and outputs, allowlists and blocklists for the sites the agent can touch, detailed observability and logging, and clean environment resets between tasks. The framing is the same defense in depth playbook that has been the default for any system that touches untrusted content, and the fact that Google is shipping it as a default recommendation rather than an optional footnote is a signal that the company expects this category of attack to become routine.

The new attack surface that ships with the new agent

The most consequential sentence in Google’s announcement is buried in the safety document. Computer use, the document says, presents unique security and operational risks because a model acting on a user’s behalf might encounter untrusted content on screens or make errors in executing actions. The phrasing is dry, and the meaning is not. A computer use agent does not read a webpage the way a human does, it parses whatever is in the page, and an attacker who can plant content in that page can plant instructions in the agent. The same property holds for any application with a graphical user interface. The agent that fills a form, the agent that reads a dashboard, the agent that moves money between two internal systems all share the same exposure: anything the agent sees can include instructions the agent will follow, and the user will not see the difference.

This is the prompt injection problem in a new costume. The earlier wave of prompt injection attacks targeted a model through the prompt the user typed, or through text the model read on a webpage. The new wave targets the model through the pixels and the underlying document object model of a screen the agent is acting on, and through the actions the agent is taking on the user’s behalf. The attack surface is not the prompt, and it is not the model. It is the entire visible interface of every application the agent can reach, plus every external page the agent might be sent to look at, plus the system of payments, identity, and authorization the agent can act on. The product news is that every Flash tier user now has that surface by default. The security news is that the surface was already there, and that Google is naming it before attackers do.

Google’s own controls are the right starting point, and they are not enough on their own. Human in the loop confirmations are useful when the human is paying attention, and they are useless when the human has queued ten automations and walked away. Sandboxing works for the agent that lives inside a developer’s laptop, and it works less well for the agent that has been granted production access to a customer relationship management system. Allowlists are only as good as the operator who maintains them, and most operators will start with no list and add to it as they go. None of these controls is a silver bullet, and the most honest reading of the seven control list is that Google expects every team shipping a computer use agent to combine them, and to assume that the combination will not be perfect.

The trap filled web those agents will inherit

The same week, a Google DeepMind senior staff research scientist named Nenad Tomašev sat for an interview with the mathematician and broadcaster Hannah Fry, and the portion that has gotten the most attention is the part about agentic traps. Tomašev, who has been working on the agentic reliability problem at DeepMind for years, said that the open web is already full of traps set by malicious actors, and that the traps are designed to take control of systems, take money from users, and jailbreak models. He said the most common trap types are hidden tokens that humans cannot see but an agent will consume, dynamic cloaking in which a page renders differently for an agent than for a human, and content designed to induce a jailbreak. The arithmetic he offers is the part that should make security teams move. If reliability of every interaction is not complete, he said, then a system that runs many interactions will fail statistically, and agents that handle money or credentials are non starters if they fail at any meaningful rate.

The Tomašev interview is not theoretical. The same week, a California cybersecurity researcher reported that his credit card had been charged for purchases he did not make after he downloaded a Skills.md file that he believes contained an agent trap. The trap, in his telling, told the Claude agent on his machine to use the digital wallet on his computer to buy gift cards. The researcher says the agent followed the instruction without prompting, and that the purchases went through before he could intervene. The story is the first well documented case of a real money loss caused by an agent trap in the wild, and the reason it landed this week is that the same week Google shipped a model that will let many more agents do many more things on many more screens.

The structural problem is that an agent that can act on a user’s behalf, with the user’s credentials, on the open web, is a high value target. Tomašev makes the same point in the interview: the more agents there are, the more incentive there is for malicious people to do malicious things, because the surface area is larger and the targets are richer. The economic argument is the same one that explains why Windows and WordPress are attacked at scale, and the same one that explains why every developer with a Gemini 3.5 Flash key is now part of that surface. The agent capability was the missing piece, and the missing piece is no longer missing.

What computer use changes about the agent threat model

The shift in the threat model is not the existence of the risk, it is the population of agents that now carry it. The earlier wave of agent security work, from the Chrome WebMCP hijack guidance earlier this month to the DeepMind agent control framework a week before that, treated the agent as a targeted asset. The security model assumed the agent lived in a specific browser, talked to a specific set of tools, and could be hardened against a specific set of attacks. The new Gemini 3.5 Flash computer use capability breaks that assumption. The agent now lives in the same space the user does, sees the same screens, and acts on the same controls. The threat model has to expand to match.

The practical change is that the agent supply chain now includes the model, the screen, the application, the network, and the payment rails all at once. A team that has spent the last year hardening its agents against prompt injection through a model API now has to harden them against a much larger set of channels, including the rendered content of a web page the agent was sent to look at, the actions an attacker can plant in a Skills.md file, the behavior of a legacy application that has no audit log, and the trust assumptions baked into a payment flow the agent triggers. None of these is a new problem, and the controls Google ships in the seven item list are the same controls every agent team should be applying. The change is that the default agent, on the default model, on the default tier, now has all of these surfaces at once. Teams that have not been treating agent security as a production problem need to start.

The longer tail is that the same week that a default tier model shipped computer use, a default tier consumer search product shipped its own version of the same idea. Google Search now answers some queries with a synthesized response that includes a citation block, an inline summary, and a set of follow on questions, and the result has already started to shift how site owners think about traffic, attribution, and the line between a result and a destination. The same architecture that lets an agent act on a screen also lets a search engine collapse the gap between answering a question and acting on the answer. The two moves are not the same product, but they share an underlying direction. The web is becoming a place that AI acts on, not just reads, and the security model has to catch up. The Gemini 3.5 Flash computer use release is the most visible moment of that shift so far, and the enterprise AI governance checklist is the place to start if a team is going to ship agents against this surface. The full announcement and the seven item control list are in Search Engine Journal’s coverage of the Gemini 3.5 Flash computer use release.

Weekly newsletter

Get a weekly summary of our most popular articles

Every week we send one email with a summary of the most popular articles on AIntelligenceHub so you can stay up-to-date on the latest AI trends and topics.

One weekly email. No sponsored sends. Unsubscribe when you want.

Comments

Every comment is reviewed before it appears on the site.

Comments stay pending until review. Posts with more than two links are held back.

Related articles