
Microsoft Presidio is back on GitHub trending, and the timing matters

AIntelligenceHub
6 min read

Presidio, Microsoft's open-source PII detection framework, is back on GitHub trending, with ONNX Runtime support, new country recognizers, and fixes aimed at agent pipelines.

Microsoft Presidio, Microsoft's open-source PII detection framework, has become one of the most active projects tied to enterprise AI agent rollouts, after a burst of late-June commits added an ONNX Runtime backend, new country-specific recognizers, and fixes to detection gaps that have frustrated teams in production. The activity is also pushing Presidio back to the top of the GitHub trending daily list, where it has crossed 9,390 stars and 1,150 forks this week.

What changed inside Presidio in the last ten days

For most of its first six years Presidio was a niche PII tool used by European banks and a handful of healthcare teams. That has changed since the start of 2026, as AI agent rollouts have made the question "what is in this prompt" a real-time concern for every enterprise software team. A few commits show the gap Presidio is racing to close.

Commit 2085, merged on June 21, adds Azure credentials support to the DocumentIntelligenceOCR recognizer, which is the path Presidio uses to pull text out of scanned PDFs, contracts, invoices, and medical records before running detection. Until last week, anyone running that recognizer in production had to use shared keys or stand up a separate Azure authentication layer. The new code path is small, but it removes a meaningful operational hurdle for teams that want Presidio to process documents stored in Azure Blob or pulled from a SharePoint index.

The bigger story is PR 2086, opened June 20, which adds an optional ONNX Runtime backend to the HuggingFaceNerRecognizer, the path Presidio uses for high-accuracy transformer-based PII detection. The change lets teams load ONNX-quantized models from the Hugging Face Hub, including FP16, INT8, and 4-bit variants, and pick a hardware execution provider (CUDA, TensorRT, OpenVINO, CoreML, or ROCm) without code changes. For teams that have been running Presidio against a GPU on PyTorch, that is a real production win. ONNX Runtime is typically 30 to 50 percent faster than PyTorch inference on the same hardware, and quantized models can cut VRAM usage by up to 4x, which in principle lets a single H100 serve a Deidentifier or GLiNER model to roughly four times as many concurrent agents. The new backend is selected by a single `backend: ort` flag in the recognizer YAML, and the existing `backend: torch` path is unchanged.
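The 4x figure is easiest to see as plain arithmetic on weight storage. The sketch below is a back-of-the-envelope calculation, not a benchmark: the parameter count is a hypothetical placeholder, and it counts weights only, ignoring activations and runtime overhead, so real concurrency gains will usually be smaller than the raw ratio.

```python
# Back-of-the-envelope weight memory for one model at different precisions.
# PARAMS is a hypothetical model size, not a figure from Presidio or PR 2086.
PARAMS = 300_000_000

BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weight_bytes(params: int, bits: int) -> int:
    """Bytes needed to store the weights alone (no activations, no overhead)."""
    return params * bits // 8


def reduction(from_dtype: str, to_dtype: str) -> float:
    """How many times smaller the weights get when moving between precisions."""
    return BITS_PER_WEIGHT[from_dtype] / BITS_PER_WEIGHT[to_dtype]


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_bytes(PARAMS, bits) / 1e9:.2f} GB")
    # A 4x drop corresponds to fp32 -> int8, or fp16 -> 4-bit.
    print("fp32 -> int8:", reduction("fp32", "int8"))
    print("fp16 -> int4:", reduction("fp16", "int4"))
```

Going from FP16 to INT8 only halves the weights, so the headline 4x depends on which precision a team starts from.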

Two more commits fill long-standing detection gaps. PR 2016, merged June 18, adds a Philippines Tax Identification Number recognizer. PR 2064, merged the same day, adds a South African ID number recognizer. Both plug holes that were flagged repeatedly by Presidio users in the Philippines and South Africa who had been writing custom recognizers on top of the library. Together with the image-redactor fixes merged over the last week (DICOM bbox double-formatting, duplicate entity sorting), the commits paint a picture of a project that is closing the small but embarrassing gaps that had let production PII slip through the cracks.
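To give a sense of what a country-specific recognizer has to check, here is a minimal standalone sketch for South African ID numbers. It is not the code from PR 2064. It assumes the commonly documented layout of 13 digits that start with a YYMMDD birth date and end with a Luhn check digit, and the function names are illustrative.

```python
import re
from datetime import datetime

# Candidate pattern: exactly 13 digits standing on their own.
SA_ID_PATTERN = re.compile(r"\b\d{13}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def has_valid_birth_date(digits: str) -> bool:
    """The first six digits must parse as a YYMMDD calendar date."""
    try:
        datetime.strptime(digits[:6], "%y%m%d")
    except ValueError:
        return False
    return True


def find_sa_ids(text: str) -> list:
    """Return 13-digit candidates that pass both the date and Luhn checks."""
    return [
        match
        for match in SA_ID_PATTERN.findall(text)
        if has_valid_birth_date(match) and luhn_valid(match)
    ]


if __name__ == "__main__":
    # 8001015009087 is a widely used sample value, not a real person's ID.
    print(find_sa_ids("ID 8001015009087 on file, ref 8001015009088"))
```

The checksum and date checks are what keep a recognizer like this from flagging every 13-digit order number that passes through an agent.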

How the AI agent stack is reshaping Presidio

The reason these small commits matter at all is that AI agents have changed what "good" looks like for PII detection. In the pre-agent world, Presidio was usually called as a batch job: a regulator sent a request, an analyst ran a script, and the output was a redacted file. False negatives were a paperwork problem. False positives were an annoyance. In an agent world, the same recognizers are called inside a per-request loop, often every time a tool returns a string that includes a customer name, an account number, a phone number, or a free-text field. A recognizer that misses a South African ID number now leaks a record into an LLM context window. A recognizer that over-matches a card number now silently breaks a payment workflow. The cost of being wrong has gone up by several orders of magnitude.
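The per-request pattern can be sketched in a few lines. This is an illustration only: the two regexes stand in for a real PII engine such as Presidio, and `lookup_customer` and `guarded_tool_call` are hypothetical names, not part of any library.

```python
import re
from typing import Callable

# Stand-in detectors; a real deployment would call a PII engine here instead.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace every detector match with a placeholder such as <EMAIL>."""
    for label, pattern in DETECTORS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def guarded_tool_call(tool: Callable[[str], str], query: str) -> str:
    """Run a tool and redact its output before it can reach the model context."""
    return redact(tool(query))


def lookup_customer(query: str) -> str:
    # Hypothetical tool that returns a free-text record with made-up contact data.
    return "Contact Jane at jane.doe@example.com or +27 82 555 0199 about order 42."


if __name__ == "__main__":
    print(guarded_tool_call(lookup_customer, "jane"))
    # prints: Contact Jane at <EMAIL> or <PHONE> about order 42.
```

The name "Jane" survives because regexes alone cannot find it, which is the kind of miss the transformer-based recognizers are meant to catch.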

That shift is showing up in the issue tracker. The most recent open issues are not feature requests from hobbyists. They are bug reports from teams that have integrated Presidio into agent pipelines and hit edge cases at scale. Issue 2075 is a request to detect Mastercard 2-series cards and 18- and 19-digit credit card numbers, which the current recognizer misses. Issue 2077 asks for detection of punycode and internationalized domain names in email addresses, which the existing regex silently lets through. Issue 2074 asks the UsSsnRecognizer to stop over-blocking the test range 987-65-432X, which is being flagged in synthetic test data and breaking CI pipelines. Issue 2078 wants the IBAN recognizer to support Egypt, Iraq, Libya, Saint Lucia, the Seychelles, and Ukraine, which the current registry silently rejects as invalid. None of these are glamorous changes, but each one is a small leak that the next agent rollout is going to hit.
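The IBAN case shows how a missing table entry turns into a silent rejection. The sketch below is a generic ISO 13616 style check, not Presidio's recognizer. The country lengths are assumptions that should be verified against the current SWIFT IBAN registry, and `with_check_digits` exists only to build a synthetic test value.

```python
# IBAN length per country code. These values are assumptions to verify
# against the current SWIFT IBAN registry before relying on them.
IBAN_LENGTHS = {"GB": 22, "EG": 29, "IQ": 23, "LY": 25, "LC": 32, "SC": 31, "UA": 29}


def _to_number(chars: str) -> int:
    """Map digits to themselves and letters to 10..35, then read as one integer."""
    return int("".join(str(int(ch, 36)) for ch in chars))


def iban_valid(iban: str, lengths=IBAN_LENGTHS) -> bool:
    """Check country length, then the mod-97 checksum on the rearranged string."""
    compact = iban.replace(" ", "").upper()
    if not (compact.isascii() and compact.isalnum()):
        return False
    if lengths.get(compact[:2]) != len(compact):
        return False  # unknown country or wrong length is rejected without a reason
    return _to_number(compact[4:] + compact[:4]) % 97 == 1


def with_check_digits(country: str, bban: str) -> str:
    """Build a checksum-valid IBAN from a country code and a synthetic BBAN."""
    remainder = _to_number(bban + country + "00") % 97
    return f"{country}{98 - remainder:02d}{bban}"


if __name__ == "__main__":
    synthetic_ua = with_check_digits("UA", "3223130000026007233566001")
    print(iban_valid(synthetic_ua))                # accepted with UA in the table
    print(iban_valid(synthetic_ua, {"GB": 22}))    # rejected when UA is missing
```

A checksum-valid Ukrainian IBAN fails the second call only because its country code is absent from the table, which matches the behavior Issue 2078 describes.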

The Presidio maintainers are responding the way open-source maintainers usually do under that kind of pressure, with a steady cadence of small, surgical PRs and a willingness to take breaking changes. The ONNX backend PR is technically a breaking change: unknown constructor kwargs were previously dropped, and they are now forwarded to the model loader and raise `TypeError` if the loader rejects them. That is the right call for a project that is now running in production pipelines, but it does mean anyone with a typo in their YAML recognizer config is going to find out the hard way. The fix is to pin Presidio versions per project, and to add a YAML lint step to the CI pipeline, which the Presidio docs are already starting to recommend.
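The behavioral difference is easy to reproduce in isolation. The sketch below does not use Presidio's classes. `load_model`, `build_lenient`, and `build_strict` are hypothetical stand-ins that contrast dropping unknown keyword arguments with forwarding them to a loader.

```python
def load_model(model_name: str, device: str = "cpu") -> dict:
    """Hypothetical loader that only understands `model_name` and `device`."""
    return {"model_name": model_name, "device": device}


KNOWN_KWARGS = {"device"}


def build_lenient(model_name: str, **kwargs) -> dict:
    # Old-style behavior: silently drop anything the loader does not know.
    accepted = {key: value for key, value in kwargs.items() if key in KNOWN_KWARGS}
    return load_model(model_name, **accepted)


def build_strict(model_name: str, **kwargs) -> dict:
    # New-style behavior: forward everything and let the loader raise TypeError.
    return load_model(model_name, **kwargs)


if __name__ == "__main__":
    # The misspelled key "devcie" simulates a typo in a YAML recognizer config.
    print(build_lenient("demo-model", devcie="cuda"))  # runs, but stays on cpu
    try:
        build_strict("demo-model", devcie="cuda")
    except TypeError as error:
        print("strict path rejected the config:", error)
```

In the lenient path the typo quietly leaves the model on the default device, which is arguably the worse failure because nothing signals that the setting was ignored.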

It is also worth noting that Presidio is not the only project chasing this gap. Open-source agent guardrails projects have been filling adjacent pieces of the same problem, and the shadow AI access control problem is now widely cited as the enterprise version of the same issue: once a model is wired up to real customer data, the question is no longer whether the model is accurate, it is whether the data behind the model is being filtered at all.

Who is going to consolidate PII detection

The bigger question is whether Presidio stays a community project or gets absorbed into a commercial agent platform. The market is starting to look crowded. Microsoft itself ships PII detection in Azure AI Language, in the Microsoft Purview compliance suite, and in the new agent runtime that came out of Build in May. Google ships PII detection in Cloud DLP. AWS added a PII detection step to Bedrock Agents in April. The open-source question is whether those commercial offerings will simply keep using Presidio under the hood, or whether someone will build a more agent-native alternative from scratch.

A few community projects are trying. PiiScan is a small library that wraps Microsoft Presidio behind a simpler API and adds streaming support. Presidio Local Anonymizer is a wrapper that runs Presidio locally on top of ChatGPT, Claude, and Perplexity, which is the use case most individual developers actually want. Excel Anonymizer is a one-off script that anonymizes a spreadsheet and synthesizes a fake version with the same shape. None of these are competitive with Presidio on accuracy, but they are all signs that the developer ergonomics problem has not been solved.

For a broader look at how PII handling fits into a wider enterprise AI governance rollout, the enterprise AI governance checklist walks through the policies, controls, and audit steps most teams are now expected to ship alongside a production AI agent. Presidio is increasingly the default detection layer under that checklist, which is part of the reason the small commits are getting more attention than they would have a year ago.

The most likely outcome is that Presidio stays the open-source baseline, the cloud providers keep building their own PII services on top of it, and the gap between "good enough for a demo" and "good enough for an audit" continues to narrow. The next twelve months will tell. The June commit cadence is a useful early signal that the project is not standing still.
