OpenAI's first custom chip, Jalapeño, lands for LLM inference
OpenAI and Broadcom on Wednesday unveiled Jalapeño, the first custom AI inference chip in a long-term partnership. The ASIC ships to data centers by end of 2026 and targets large language model inference at scale.
OpenAI and Broadcom on Wednesday unveiled a custom AI chip called Jalapeño, the first silicon the two companies have built together. The chip is an application-specific integrated circuit (ASIC) designed from the ground up for large language model inference in data centers, and both companies say deployment to OpenAI's data centers is targeted for the end of 2026.
The two companies have been working on the project for about nine months. Broadcom, the established silicon supplier whose name is on a wide range of networking and storage chips, built the ASIC based on what it describes as detailed insight from conversations with OpenAI researchers. OpenAI shared its own roadmap for future models and products, which Broadcom says shaped the chip's design. The chip is being manufactured with Celestica, which has emerged as the manufacturing partner for the high-end custom AI silicon going into hyperscaler fleets.
Jalapeño lands at a moment when the AI infrastructure buildout is shifting from a fight for general-purpose GPU capacity to a fight for inference-specific silicon. Frontier model providers, including OpenAI's competitors, are looking at custom chips as a way to control cost, lock in supply, and tune performance for the specific workloads that dominate production traffic. What stands out about Jalapeño is that OpenAI is both the buyer and a co-architect of the chip, not a customer buying off-the-shelf parts from a third party.
The chip is designed for inference, not training. That distinction matters. Training is the compute-heavy process of teaching a model on a corpus of data, and it has historically run on Nvidia's most powerful GPUs because flexibility matters when the workload keeps changing. Inference is the production process of running an already-trained model on a user request, and it is the part of the stack that frontier labs spend the most on, day after day, once a model is in production. A custom chip can be tuned for the specific shape of inference traffic, which is a narrower workload than training, and that is where the efficiency story comes in.
OpenAI says early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art options, though the company is not yet ready to publish detailed benchmarks. A detailed technical report is expected in the coming months. Until then, the headline claim is that the chip is more specialized for today's LLM inference workloads than the general-purpose hardware now running in production data centers.
The broad strategic argument from OpenAI is that owning the full stack behind its models and products reduces dependence on outside suppliers, including Nvidia, and gives the company a way to differentiate on cost or efficiency through vertical integration. That is the same playbook Google has run with its Tensor Processing Units for roughly a decade, and one that Amazon and Microsoft have followed in their own ways with Trainium, Inferentia, and Maia. OpenAI is now joining that group of frontier model operators that design their own inference silicon. The full announcement is covered in detail by Ars Technica's report on the Jalapeño chip.
Inside the Jalapeño chip and its inference role
The decision to go with a Broadcom-built ASIC rather than an in-house design points to two things at once. First, OpenAI wants to move fast: about nine months from concept to announced silicon is a tight timeline, one that a company with no chipmaking experience of its own can hit only by leaning on Broadcom's design and packaging capability. Second, OpenAI is signaling that the chip is a long-term platform rather than a one-off. Both companies describe Jalapeño as the first generation in a long-term project, with refinements to come.
The performance claim is also worth reading carefully. ASICs are not flexible the way GPUs are. A GPU can run any model, any shape of workload, and any research code that fits in its memory. An ASIC, by design, runs well only the workloads it was built for. The trade-off is that for those workloads, an ASIC can deliver substantially better performance per watt than a general-purpose chip. That makes ASICs a good fit for production inference at scale, where the workload is well understood and the volume is high. They are a worse fit for research workloads that change shape every month.
For a company running inference at the scale OpenAI does, the math on that trade-off is straightforward. A chip that is, say, 30% more efficient at LLM inference than an equivalent GPU, deployed across a fleet of millions of units, can pay for its design and manufacturing cost many times over. The open question is not whether the chip exists, but what fraction of OpenAI's total inference fleet it will eventually represent, and what fraction will still run on Nvidia hardware.
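To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. It rests on a simplifying assumption that neither OpenAI nor Broadcom has stated: that an efficiency gain translates directly into a lower cost for the same amount of inference work. The function names and every dollar figure in the example are hypothetical placeholders, not reported numbers.

```python
def savings_fraction(efficiency_gain: float) -> float:
    """Fraction of baseline cost saved when the same work is done by a chip
    that delivers (1 + efficiency_gain) times the work per unit of cost.

    Assumption: efficiency maps one-to-one onto serving cost.
    """
    if efficiency_gain <= -1.0:
        raise ValueError("efficiency_gain must be greater than -1")
    return 1.0 - 1.0 / (1.0 + efficiency_gain)


def payback_multiple(baseline_annual_cost: float,
                     efficiency_gain: float,
                     program_cost: float,
                     years: float) -> float:
    """Cumulative savings over `years`, divided by the one-time program cost
    (design plus manufacturing). A value above 1 means the program has paid
    for itself within the period."""
    if program_cost <= 0:
        raise ValueError("program_cost must be positive")
    annual_savings = baseline_annual_cost * savings_fraction(efficiency_gain)
    return annual_savings * years / program_cost


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formula.
    frac = savings_fraction(0.30)
    multiple = payback_multiple(
        baseline_annual_cost=10e9,  # hypothetical yearly spend on GPU inference
        efficiency_gain=0.30,       # the 30% figure used as an illustration
        program_cost=2e9,           # hypothetical design + manufacturing cost
        years=3,
    )
    print(f"cost saved per unit of work: {frac:.1%}")
    print(f"payback multiple over 3 years: {multiple:.1f}x")
```

One detail the sketch surfaces: a chip that does 30% more work per unit of cost saves about 23% on the same workload, not 30%, because the saving is 1 - 1/1.3. Whether the program pays for itself "many times over" depends entirely on the baseline spend, the program cost, and the time horizon plugged in.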
The chip's deployment target of end of 2026 also tells us something about OpenAI's capacity planning. Inference demand has been growing faster than the supply of inference hardware, and frontier model providers have been trying to lock in as much compute as they can. Building a custom chip is one way to add to that supply, but it does not displace existing capacity overnight. Through 2026, the bulk of OpenAI's inference is still going to run on Nvidia and AMD hardware. Jalapeño is a 2027-and-beyond capacity story more than a 2026 capacity story.
What the OpenAI chip tells us about the AI infrastructure market
OpenAI's move into custom silicon is the strongest signal yet that the AI infrastructure market is splitting into a general-purpose tier and a purpose-built tier. The general-purpose tier, dominated by Nvidia, is where new models get trained and where research workloads run. The purpose-built tier is where production inference at scale lives, and it is becoming a market for custom ASICs from a small group of suppliers that includes Broadcom, Marvell, and a handful of in-house designs from Google, Amazon, and Microsoft. OpenAI is now officially a buyer in that purpose-built tier.
This split has real consequences for the rest of the AI industry. For a deeper look at how chips, cloud, and capacity choices fit together for both frontier labs and enterprises, the AI Infrastructure in 2026 resource page walks through the practical decisions buyers are facing right now.
The other thing to watch is the partnership structure. Broadcom has become the dominant merchant supplier of custom AI silicon to the hyperscalers and frontier model operators that want to design their own chips without becoming chip companies. Google was Broadcom's anchor customer for the TPU program, and now OpenAI is joining Broadcom's roster of custom-silicon customers. As the number of buyers grows, the cost of designing a custom ASIC gets amortized across more volume, which lowers the barrier for the next frontier lab that wants its own chip. That is how the cost curve on custom silicon tends to bend.
There is also a competitive risk for OpenAI in committing to a custom architecture. Designing silicon is a multi-year, multi-billion-dollar bet, and the workload the chip is optimized for has to stay stable enough that the chip's design remains relevant. If the shape of LLM inference traffic changes substantially in the next two or three years, whether because of new model architectures, new serving techniques, or new use cases like agentic workflows that put more state on the chip, the chip may need to be redesigned. That is the standard risk every custom-silicon program runs, and it is one of the reasons OpenAI described Jalapeño as the first generation in a long-term project rather than a finished product.
Where the Jalapeño roadmap goes from here
Both OpenAI and Broadcom have framed Jalapeño as the first chip in a multi-generation program. The next milestones to watch are the detailed technical report OpenAI has promised in the coming months, the first deployment in production data centers, and the early performance numbers once the chip is running real inference traffic rather than synthetic benchmarks. The interesting question for the rest of the industry is whether the performance per watt claim holds up under production conditions, and how quickly OpenAI is willing to shift inference volume from Nvidia and AMD hardware to Jalapeño once it is in production.
For a related look at how another major infrastructure provider is shaping agent traffic, "Google's Interactions API is now the default for Gemini agents" covers a similar move on the developer-interface side. It is the same vertical integration story, applied to the layer between the model and the application.