How to Build AI Apps in the Browser with TensorFlow.js and WebGPU

Ayantunji Timilehin

Most developers think of AI the same way: you send data to a server, the server thinks, and you get a response back. That mental model made sense for a long time, and it still makes sense for a lot of use cases. But there's a quiet shift happening inside the browser that a lot of engineers are missing. The modern browser isn't just an engine for rendering HTML and CSS anymore. It's turning into a full-blown runtime for local intelligence.

We've reached a point where you can ship machine learning models straight to a user's device and run inference completely client-side. No server round trips, no API keys to protect, and once the initial assets load, no dependency on an internet connection. This is Web AI. If you're building for the web today, understanding this shift is one of the most valuable skills you can add to your stack.

In this guide, we'll look at how Web AI works under the hood, break down the browser technology stack that makes it possible, and build a working image classifier using Teachable Machine and TensorFlow.js. Along the way, we'll also set up a live benchmark so you can compare WebGL and WebGPU execution speeds in real time.

Prerequisites

To follow along with this tutorial, you should have:

A working knowledge of JavaScript

Basic familiarity with HTML and how the browser works

Google Chrome installed (required for WebGPU support and Chrome's built-in AI APIs)

A code editor like VS Code with the Live Server extension installed (recommended for running the demo locally)

No prior machine learning experience is required.

Table of Contents

What is Web AI?

Browser AI vs Cloud AI

The Technology Stack

How to Build AI in the Browser

Chrome's Built-in AI APIs

Where Web AI Is Headed

What You Learned

Resources

What is Web AI?

Instead of sending data off to a distant cloud server, Web AI lets you run machine learning models directly on the user's device, inside their browser. It uses standard web technologies like JavaScript, WebAssembly, and WebGPU to do the heavy lifting locally. The simplest definition: intelligence that runs in the browser, without sending your data anywhere.

Most of us already interact with on-device AI every day without realizing it. Think about unlocking an iPhone. The second you lift it, Face ID maps out roughly 30,000 infrared points, feeds that data through a neural network running on Apple's local silicon, matches it against an encrypted embedding, and opens the phone. The whole process takes milliseconds and happens entirely offline.

Browser-based AI follows the same core architecture. The only real difference is that we're building on shared web standards rather than native hardware APIs. When you spin up a face-tracking model using TensorFlow.js or MediaPipe in Chrome, you're running the same pipeline:

Camera input → Local ML model → Local decision

No round trip. No server. The browser is your Neural Engine.

Browser AI vs Cloud AI

There's no right or wrong answer here. It depends on what you're trying to build. Both approaches have trade-offs, so pick the one that fits your specific use case.

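| | Browser AI (Client-Side) | Cloud AI (Server-Side) |
| --- | --- | --- |
| Internet required | No (after the initial load) | Yes |
| Latency | Near-zero (no network round trip) | Depends on network |
| Privacy | Data stays on device | Data leaves the device |
| Model size | Small to medium | As large as you need |
| Cost at inference time | Free (runs on the user's hardware) | Per token or per request |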

Use browser AI when:

You need split-second speed for things like tracking gestures or detecting objects live on a webcam

The app has to work offline (whether it's a PWA or just needs to survive spotty internet)

Privacy is a hard requirement, and sensitive data like medical inputs, biometrics, or financial information must stay strictly local

You want to reduce or eliminate API costs on high-frequency, lightweight predictions

Use cloud AI when:

You need large models like GPT-4, Gemini Pro, or Stable Diffusion

You need centralized model updates, A/B testing, or user analytics

You require serious GPU or TPU compute power
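The two checklists above can be condensed into a small routing function. This is only a sketch: the `chooseRuntime` name, the fields on `task`, and the 50 MB cutoff are assumptions made for illustration, not values from any library or from this tutorial's demo.

```javascript
// Hypothetical helper: decide where a prediction should run.
// The field names and the size threshold are illustrative assumptions.
const MAX_BROWSER_MODEL_MB = 50; // assumed cutoff for "small to medium" models

function chooseRuntime(task) {
  // Hard requirements that only on-device inference can satisfy.
  const mustStayLocal = task.mustWorkOffline || task.hasSensitiveData;
  const fitsInBrowser = task.modelSizeMB <= MAX_BROWSER_MODEL_MB;

  if (mustStayLocal) {
    // A model too large for the browser cannot meet a hard local requirement.
    return fitsInBrowser ? 'browser' : 'unsupported';
  }
  // No hard constraint: prefer the browser when the model is small enough.
  return fitsInBrowser ? 'browser' : 'cloud';
}
```

A real app would weigh more factors (analytics, update cadence, device capability), but writing the rule down makes the trade-off explicit and easy to test.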

Most production systems actually use a mix of both. Take Google Photos: it handles face detection right on your device so it's fast and private, but leaves the heavier categorization work for the cloud. Or think of a modern web app that uses TensorFlow.js locally to classify images instantly, but calls the Gemini API when it needs deeper language processing. This hybrid setup, keeping lightweight intelligence at the edge and heavy compute in the cloud, is usually the sweet spot for most apps.

The Technology Stack

Browser AI isn't a single tool. It's a stack of technologies layered on top of each other. Knowing how these layers fit together makes it a lot easier to choose your setup and navigate the trade-offs.

Tensors

Before jumping into any ML framework, you need to understand tensors. Not deeply, just well enough that you don't get blindsided by tensor shape errors, because they will happen and they can be tricky to debug.

Think of a tensor as a multi-dimensional grid of numbers. Whether your model is processing images, audio, or text, everything gets converted into this format first. Models only speak numbers, and tensors are the containers that hold them.

A single number → 0D tensor (scalar): 42

A list of numbers → 1D tensor (vector): [0.2, 0.8, 0.5]

A table of numbers → 2D tensor (matrix): [[1,2,3],[4,5,6]]

An image → 3D tensor: shape [224, 224, 3]

A batch of images → 4D tensor: shape [32, 224, 224, 3]
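To make the idea of a shape concrete, here is a small sketch in plain JavaScript (no TensorFlow.js needed) that reads the shape off a nested array. The `inferShape` name is made up for this example, and it assumes the array is rectangular, meaning every row at a given level has the same length.

```javascript
// Infer the shape of a nested array, the same way a tensor's shape
// describes how many values sit along each dimension.
function inferShape(value) {
  const shape = [];
  let current = value;
  // Walk down the first element at each level, recording its length.
  while (Array.isArray(current)) {
    shape.push(current.length);
    current = current[0];
  }
  return shape;
}

console.log(inferShape(42));                     // []
console.log(inferShape([0.2, 0.8, 0.5]));        // [ 3 ]
console.log(inferShape([[1, 2, 3], [4, 5, 6]])); // [ 2, 3 ]
```

The number of entries in the shape is the tensor's rank: zero for a scalar, one for a vector, two for a matrix, and so on.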

Models accept inputs in specific shapes. If your tensor's shape doesn't match the model's expected input, the model throws an error. That's why understanding dimensions is practical, not just theoretical.

TensorFlow is literally named after this concept: tensors flowing through neural networks.

Here's how you create tensors in TensorFlow.js:

    // Assumes TensorFlow.js is already loaded as `tf`,
    // e.g. import * as tf from '@tensorflow/tfjs';

    // 1D tensor: a list of values
    const scores = tf.tensor([0.1, 0.7, 0.2]);

    // 3D tensor: a tiny 2 x 2 image (height x width x RGB channels)
    const image = tf.tensor([
      [[255, 0, 0], [0, 255, 0]],
      [[0, 0, 255], [255, 255, 0]]
    ]);

    // 4D tensor: a batch of 32 images
    const batch = tf.zeros([32, 224, 224, 3]);
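Since shape mismatches are the error you're most likely to hit, it can help to check shapes yourself before calling a model. The helper below is a sketch in plain JavaScript: `shapeMismatch` is a hypothetical name, and using `null` for "any size" mirrors how a flexible batch dimension is commonly written.

```javascript
// Hypothetical helper: compare a tensor's shape with the shape a model expects.
// A null entry in `expected` means "any size" (typically the batch dimension).
function shapeMismatch(actual, expected) {
  if (actual.length !== expected.length) {
    return `rank ${actual.length} does not match expected rank ${expected.length}`;
  }
  for (let i = 0; i < expected.length; i++) {
    if (expected[i] !== null && expected[i] !== actual[i]) {
      return `dimension ${i} is ${actual[i]}, expected ${expected[i]}`;
    }
  }
  return null; // shapes are compatible
}

// A single image is missing the batch dimension the model wants.
console.log(shapeMismatch([224, 224, 3], [null, 224, 224, 3]));
// With a batch dimension of 1 in front, the shapes line up.
console.log(shapeMismatch([1, 224, 224, 3], [null, 224, 224, 3]));
```

In TensorFlow.js, a tensor's `shape` property gives you the `actual` array, and `tensor.expandDims(0)` is one way to add the missing batch dimension to a single image.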

TensorFlow.js

TensorFlow.js is Google's JavaScript version of TensorFlow. It lets you run pre-trained models right in the browser and, if you want to, train new ones completely client-side.

The most important concept in TensorFlow.js is the backend: the execution engine that determines which hardware your model runs on. You can switch between backends depending on what the user's device supports, and the choice makes a significant difference to performance. The WebGPU and WASM backends ship as separate packages (@tensorflow/tfjs-backend-webgpu and @tensorflow/tfjs-backend-wasm), so load them before selecting those backends.

    // Pick one. Each call switches the active backend.
    await tf.setBackend('webgpu'); // fastest: true GPU compute
    await tf.setBackend('webgl');  // very fast: GPU via graphics shaders
    await tf.setBackend('wasm');   // fast: near-native CPU speed
    await tf.setBackend('cpu');    // slowest: plain JavaScript on CPU

    await tf.ready();
    console.log('Running on:', tf.getBackend());

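In practice, you want to try the fastest available backend and fall back gracefully if a user's browser doesn't support it. Note that tf.setBackend() resolves to false when a backend fails to initialize, so check its return value as well as catching errors:

    const backends = ['webgpu', 'webgl', 'wasm', 'cpu'];

    for (const backend of backends) {
      try {
        // setBackend() resolves to false if the backend cannot initialize
        const ok = await tf.setBackend(backend);
        if (!ok) continue;
        await tf.ready();
        console.log('Using backend:', backend);
        break;
      } catch {
        // Backend not registered or failed to start: try the next one
        continue;
      }
    }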
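The fallback loop above can also be wrapped in a function, which makes it reusable and lets you test it without a GPU. This is a sketch, not an official TensorFlow.js utility: `pickBackend` and the `tfLike` parameter are names invented here. The function only relies on `setBackend()` and `ready()`, so you can pass the real `tf` namespace or a stub.

```javascript
// Sketch: try backends in order and return the first one that initializes.
// `tfLike` is any object exposing setBackend() and ready(), so the real
// `tf` namespace or a test stub can be passed in.
async function pickBackend(tfLike, candidates = ['webgpu', 'webgl', 'wasm', 'cpu']) {
  for (const name of candidates) {
    try {
      // setBackend() resolves to false when initialization fails,
      // and can throw when the backend was never registered.
      const ok = await tfLike.setBackend(name);
      if (!ok) continue;
      await tfLike.ready();
      return name;
    } catch {
      // Ignore the failure and move on to the next candidate.
    }
  }
  throw new Error('No TensorFlow.js backend could be initialized');
}
```

In an app you would call `const backend = await pickBackend(tf);` once at startup and log or display the result, so you always know which backend a given user ended up on.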

WebAssembly

WebAssembly (WASM) lets code written in languages like C++ or Rust run inside the browser at near-native speed. For AI, this is a big deal because heavy math operations like tensor calculations, data preprocessing, and running compressed models happen much faster in WASM than they could in standard JavaScript.

Under the hood, TensorFlow.js's WASM backend uses a compiled C++ runtime. If you're running compressed models on a device's CPU, switching to the WASM backend can make your app anywhere from 2 to 10 times faster than the plain JavaScript CPU backend.

    // Requires the @tensorflow/tfjs-backend-wasm package to be loaded first
    await tf.setBackend('wasm');
    await tf.ready();
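Speedups like this vary with the model and the device, so it's worth measuring on your own workload. Below is a minimal timing sketch. `averageMs` and its options are names chosen for this example, and `run` stands for whatever async operation you want to time.

```javascript
// Sketch: average the wall-clock time of an async function.
// `run` is whatever you want to measure, for example one model prediction.
async function averageMs(run, { warmup = 2, runs = 10 } = {}) {
  // Warm-up calls absorb one-time costs such as shader or kernel compilation.
  for (let i = 0; i < warmup; i++) await run();

  const start = performance.now();
  for (let i = 0; i < runs; i++) await run();
  return (performance.now() - start) / runs;
}
```

With TensorFlow.js, make sure `run` waits for the actual result, for instance by awaiting the output tensor's `data()` call, since GPU backends execute asynchronously and the timing would otherwise stop before the work has finished. Run the same function once per backend and compare the averages.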

WebGL and WebGPU

This is where browser AI performance gets interesting.

WebGL was originally built for 3D graphics. But developers discovered that the parallel computation GPUs use for rendering is exactly the kind of parallel computation neural networks need. TensorFlow.js's WebGL backend encodes tensor operations as graphics shader programs and runs them on the GPU. It works well, but it's a workaround, as WebGL was never designed for this kind of work.

WebGPU is what was actually designed for the job. It launched in Chrome back in April 2023 after six years of collaboration between Apple, Google, Mozilla, Intel, and Microsoft. Rather than handling graphics only, it's a modern API built from the ground up to support general-purpose GPU compute as well. When it comes to running AI models, it can be 2 to 3 times faster than WebGL, which means you can run significantly larger models right in the browser.

Here's how to check for WebGPU support and use it:

    if ('gpu' in navigator) {
      console.log('WebGPU is supported');
      await tf.setBackend('webgpu');
    } else {
      console.warn('WebGPU not available, falling back to WebGL');
      await tf.setBackend('webgl');
    }

    await tf.ready();
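One caveat with the check above: `navigator.gpu` existing only tells you the API is exposed, not that a usable GPU adapter is available. A slightly stricter check, sketched below, also asks for an adapter. The `hasUsableWebGPU` name and the injectable `nav` parameter are choices made for this example.

```javascript
// Sketch: confirm WebGPU is usable, not just present.
// `nav` defaults to the browser's navigator; a stub can be passed in tests.
async function hasUsableWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false;
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter != null;
  } catch {
    return false;
  }
}
```

You could then choose the backend with `(await hasUsableWebGPU()) ? 'webgpu' : 'webgl'` before calling `tf.setBackend()`.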

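Recent desktop versions of Chrome enable WebGPU by default on most platforms. If it isn't available in your build, you can turn it on for development: open chrome://flags/#enable-unsafe-webgpu → set it to Enabled → restart Chrome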

The performance progression across backends looks like this:

| Backend | What's happening under the hood | Relative speed |
| --- | --- | --- |
| cpu | Plain JavaScript on CPU | Slow |
| wasm | Compiled C++ on CPU | Fast |
| webgl | GPU via gr
