Models that fit
✦Field note · 04 June 2026
For years, “AI on your device” was a privacy footnote: trade capability for control. Last week, NVIDIA announced the chip that ends the trade.
✦What was announced
At Computex 2026, NVIDIA unveiled the RTX Spark Superchip: an Arm CPU paired with a Blackwell GPU, with up to 128 GB of unified memory and 1 petaflop of FP4 AI compute on a machine you can close and put in a backpack.
It runs models of up to 200 billion parameters, with context windows of up to 1 million tokens, entirely on the device. It ships this fall in Surface Ultra, Dell, HP, Lenovo, ASUS, and MSI hardware. The roadmap goes three generations deep: Rubin next, then Rosa Feynman.
This is not a refresh. It is a platform commitment.
          ┌──────────────────────────────────┐
128 GB →  │ ████████████████████████████████ │
          └──────────────────────────────────┘

model size                          fits in 128 GB?
───────────────────────────────────────────────────
   7 B  │ █                       │   ✦ yes
  70 B  │ ████████                │   ✦ yes
 200 B  │ ████████████████████    │   ✦ yes
   1 T  │ █████████████████████ × │   ─ no
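The fit column above follows from simple arithmetic: parameter count times bytes per parameter, compared against 128 GB. The sketch below is a rough, weights-only estimate. It assumes 4-bit (FP4) weights at 0.5 bytes per parameter, which is our reading of the announcement and not a published sizing rule, and it ignores KV cache, activations, and OS overhead. The function names are ours, for illustration only.

```python
# Weights-only memory estimate. Assumptions (not from the announcement):
# FP4 weights take 0.5 bytes per parameter, 1 GB = 1e9 bytes, and nothing
# else (KV cache, activations, OS) competes for the unified memory.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weights_gb(params: float, precision: str = "fp4") -> float:
    """Approximate size of the model weights in GB."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def fits(params: float, memory_gb: float = 128.0, precision: str = "fp4") -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weights_gb(params, precision) <= memory_gb


if __name__ == "__main__":
    for label, params in [("7 B", 7e9), ("70 B", 70e9), ("200 B", 200e9), ("1 T", 1e12)]:
        verdict = "yes" if fits(params) else "no"
        print(f"{label:>6}  {weights_gb(params):6.1f} GB  {verdict}")
```

Under these assumptions a 200 B model needs about 100 GB and fits, while a 1 T model needs about 500 GB and does not. The same 200 B model at FP16 would need about 400 GB, so the "yes" depends on quantization.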
yesterday ── prompt ──→ network ──→ cloud GPU ──→ tokens ──→ you
today     ── prompt ──────────────→ your machine ──→ tokens
✦The inflection
The 2025–26 wave — MiniMax M3, Llama 4, Qwen 3 — made sub-200B parameter models genuinely competitive. Capability stopped being a frontier-only property.
Apple Silicon proved the unified-memory thesis. Spark brings it to CUDA, where the weights, kernels, and tooling already live.
Microsoft is shipping Windows on Arm as an agentic OS — agents as first-class processes, not API calls behind a browser tab.
This is the first moment all three exist on a machine you can buy at Best Buy. The substrate is no longer the bottleneck.
✦For builders
No more rent-the-A100 dev loop
Iterate against the same hardware your user will run on.
Privacy stops being a feature
It is the default once nothing crosses a network.
Capex, not opex
Pay once for a device. Don’t meter every token your product generates.
Agents that read everything
Files, browser state, app context — addressable without a TOS gating access.
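The capex-versus-opex point above can be made concrete with a break-even calculation. This is a minimal sketch: the device price and per-token cloud price in the usage example are hypothetical placeholders, not figures from NVIDIA or any provider, and the model ignores electricity, depreciation, and differences in model quality.

```python
def break_even_tokens(device_cost: float, cloud_price_per_million: float) -> float:
    """Tokens at which a one-time device purchase equals metered cloud spend.

    Both inputs are in the same currency. The cloud price is per one
    million generated tokens. Running costs of the device are ignored.
    """
    if cloud_price_per_million <= 0:
        raise ValueError("cloud_price_per_million must be positive")
    return device_cost / cloud_price_per_million * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    device_cost = 4000.0  # assumed device price
    cloud_price = 2.0     # assumed price per million tokens
    tokens = break_even_tokens(device_cost, cloud_price)
    print(f"Break-even after {tokens:,.0f} tokens")
```

With these placeholder numbers the device pays for itself after two billion generated tokens. Substitute real prices before drawing any conclusion.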
✦For users
Because the memory never crosses a wire, it can be honest about what it knows. Tools work on a plane. “Your data never leaves the device” stops being marketing and becomes a verifiable property of the system. The conversation is yours again.
✦Why we’re building what we’re building
Masi redacts PDFs on WebGPU in your browser. Maya is a Hinglish companion that runs entirely on your device. Drik walks your localhost the way a user would. All three are bets on the same substrate.
Spark is the substrate growing up. Good.
✦What’s still hard
✦ ✦ ✦
The cloud trained the models.
The Spark runs them.
The user owns the conversation.