
Transformers.js v4 Brings WebGPU AI to Browsers, Node, Bun, and Deno

AIntelligenceHub Editorial

Hugging Face released Transformers.js v4 with a new WebGPU runtime, broader JavaScript runtime support, and major performance gains that make local AI deployment more practical.

Transformers.js v4 just landed, and one number jumps out right away: Hugging Face says it saw about a 4x speedup for BERT-style embedding models after adopting ONNX Runtime's MultiHeadAttention operator.

The biggest shift is the new WebGPU runtime stack. According to the official release notes, the same Transformers.js code can now run with WebGPU acceleration not only in browsers, but also in Node, Bun, and Deno.
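As a rough sketch of what that looks like in practice: `pipeline` and the `device` option are documented Transformers.js APIs, while the model id below is an illustrative choice, not one taken from the release notes.

```javascript
// Sketch: requesting WebGPU acceleration through the standard pipeline API.
// Per the release notes, the same code is intended to run in browsers,
// Node, Bun, and Deno.
const loadOptions = { device: "webgpu" }; // ask for the WebGPU backend

async function loadEmbedder() {
  // Dynamic import keeps this sketch self-contained; in an app you would
  // `import { pipeline } from "@huggingface/transformers"` at the top.
  const { pipeline } = await import("@huggingface/transformers");
  // Example embedding model id, chosen for illustration only.
  return pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", loadOptions);
}
```

Whether WebGPU is actually available still depends on the runtime and GPU, so production code should be prepared to fall back to another backend.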

The release notes also claim GPT-OSS 20B (q4f16) ran at about 60 tokens per second on an Apple M4 Max during internal testing.
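For context, requesting that quantization level from application code is a one-line option. The sketch below uses the real `pipeline`, `dtype`, and `TextStreamer` APIs from Transformers.js, but the model id is a hypothetical placeholder, not one confirmed by the release notes.

```javascript
// Sketch: streaming generation with the q4f16 quantization the article cites.
// The `dtype` option selects the quantized ONNX weights to download.
const genOptions = { device: "webgpu", dtype: "q4f16" };

async function generate(prompt) {
  const { pipeline, TextStreamer } = await import("@huggingface/transformers");
  // Hypothetical model id for illustration.
  const generator = await pipeline("text-generation", "onnx-community/gpt-oss-20b", genOptions);
  // Print tokens to stdout as they are produced, skipping the prompt echo.
  const streamer = new TextStreamer(generator.tokenizer, { skip_prompt: true });
  return generator(prompt, { max_new_tokens: 128, streamer });
}
```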

For developers shipping AI features, ModelRegistry is one of the more practical changes. It lets teams inspect required files, cache status, and available dtypes before loading models.
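A hypothetical sketch of how that preflight check might be used. The article only says ModelRegistry exposes required files, cache status, and available dtypes; the method and field names below are assumptions for illustration, not the documented v4 API.

```javascript
// HYPOTHETICAL sketch: inspect a model before committing to a download.
// `ModelRegistry.inspect` and the `cached` / `files` / `dtypes` fields are
// assumed names, not confirmed API.
async function preflight(modelId) {
  const { ModelRegistry } = await import("@huggingface/transformers");
  const info = await ModelRegistry.inspect(modelId); // assumed method name
  // Decide up front whether a download is needed and which dtype to request.
  return {
    needsDownload: !info.cached,                                        // assumed field
    files: info.files,                                                  // assumed field
    preferredDtype: info.dtypes?.includes("q4f16") ? "q4f16" : "fp32",  // assumed field
  };
}
```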

Read the official release notes on the GitHub release page, along with Hugging Face's accompanying deep-dive blog post.


Source: Hugging Face
