╭───────╮
│ ◆ ◆ ◆ │
│  ◆ ◆  │
│   ◆   │
╰───────╯
Text-to-image diffusion, 4 B parameters, compressed to 0.93 GB (1-bit) or 1.21 GB (ternary). First image model in its class to run directly on an iPhone. Ternary variant retains 95% of FLUX.2 Klein 4B accuracy. Apache 2.0.
- ✦9.4 s · 512×512 on iPhone 17 Pro Max
- ✦6.0 s · 512×512 on Mac M4 Pro (5.6× faster)
- ✦Apple Silicon + CUDA
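
As a rough sanity check on the quoted sizes, the average storage cost per parameter can be worked out from the figures above. This is a minimal sketch, assuming "4 B" means exactly 4 × 10^9 parameters and 1 GB = 10^9 bytes; the helper name `effective_bits_per_param` is hypothetical, not part of any released tooling.

```python
import math

def effective_bits_per_param(size_gb: float, params_billion: float) -> float:
    """Average stored bits per parameter (assumes 1 GB = 10**9 bytes)."""
    size_bits = size_gb * 1e9 * 8
    return size_bits / (params_billion * 1e9)

# Figures taken from the card above; the parameter count is assumed exact.
one_bit = effective_bits_per_param(0.93, 4.0)
ternary = effective_bits_per_param(1.21, 4.0)

# Information-theoretic minimum for a three-valued weight: log2(3) bits.
ternary_ideal = math.log2(3)

print(f"1-bit file:   {one_bit:.2f} bits/param (ideal 1.00)")
print(f"ternary file: {ternary:.2f} bits/param (ideal {ternary_ideal:.2f})")
```

Both files come out above the ideal 1 and log2(3) ≈ 1.58 bits per weight. A plausible reading, not confirmed by the card, is that the gap covers scale factors, packing overhead, and components kept at higher precision.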
┌───────┐
│ ▓ ▓ ▓ │
│ ▓ ◉ ▓ │
│ ▓ ▓ ▓ │
└───────┘
Unified multimodal model. Text, vision, and audio flow directly into the same transformer, with no separate encoders. Runs on a 16 GB laptop. Approaches the 26 B MoE variant on benchmarks. Apache 2.0.
- ✦Encoder-free architecture
- ✦Multi-Token Prediction drafters for lower latency
- ✦Available via Hugging Face, Kaggle, Ollama, LM Studio