
Ollama v0.19

Massive local model speedup on Apple Silicon with MLX

Ollama v0.19 rebuilds Apple Silicon inference on top of MLX, bringing much faster local performance for coding and agent workflows. It also adds NVFP4 support, along with KV cache reuse, snapshots, and smarter eviction for more responsive sessions.
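To make the NVFP4 mention concrete: NVFP4 is a 4-bit floating-point format that stores E2M1 values (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) together with a per-block scale factor. A minimal sketch of that block-quantization idea, assuming full-precision scales for simplicity (real NVFP4 stores the scale in FP8) and not reflecting Ollama's actual implementation:

```python
# Illustrative sketch of NVFP4-style block quantization (not Ollama's code).
# Each block of weights shares one scale; values are snapped to the E2M1 grid.
# Simplification: the per-block scale is kept in full precision here.

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable FP4 magnitudes

def quantize_block(block):
    """Quantize one block of floats to (scale, signed E2M1 codes)."""
    max_abs = max(abs(x) for x in block) or 1.0
    scale = max_abs / 6.0  # map the largest magnitude onto E2M1's max (6.0)
    codes = []
    for x in block:
        mag = abs(x) / scale
        # round to the nearest representable E2M1 magnitude
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(q if x >= 0 else -q)
    return scale, codes

def dequantize_block(scale, codes):
    return [scale * c for c in codes]

weights = [0.3, -1.2, 0.05, 2.7]
scale, codes = quantize_block(weights)
approx = dequantize_block(scale, codes)
```

Storing 4 bits per weight plus one scale per 16 elements is what lets a 35B-class model fit in far less memory than FP16, at the cost of the rounding error visible in `approx`.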

Top comment

Hi everyone!

The engineering in Ollama v0.19 is a massive leap for anyone running local models on macOS. Moving to Apple's native MLX framework changes the game for performance, leveraging the unified memory architecture and the new GPU Neural Accelerators on the M5 chips.

v0.19 now also supports NVFP4, which brings local inference closer to production parity, and the KV cache has been reworked with cache reuse across conversations, intelligent checkpoints, and smarter eviction. For branching agent workflows like @Claude Code or @OpenClaw, that should mean lower memory use and faster responses.
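The cache-reuse idea above can be sketched in a few lines: when a new request shares a token prefix with a cached conversation, the model only has to prefill the remaining suffix, and least-recently-used entries are evicted when the cache is full. This is a hypothetical illustration of the concept, not Ollama's actual data structure:

```python
# Hypothetical sketch of prefix-based KV-cache reuse with LRU eviction.
from collections import OrderedDict

class PrefixKVCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # token-prefix tuple -> cached "KV state"

    def longest_prefix(self, tokens):
        """Return the longest cached prefix of `tokens`; a hit means the
        model only needs to prefill the tokens after the prefix."""
        best = ()
        for prefix in self.entries:
            if len(prefix) > len(best) and tokens[:len(prefix)] == list(prefix):
                best = prefix
        if best:
            self.entries.move_to_end(best)  # mark as recently used
        return best

    def store(self, tokens, kv_state):
        key = tuple(tokens)
        self.entries[key] = kv_state
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = PrefixKVCache()
cache.store([1, 2, 3], "kv@3")
hit = cache.longest_prefix([1, 2, 3, 4, 5])  # the first 3 tokens are reusable
```

Branching agents benefit because sibling branches share long prompt prefixes, so each branch pays only for its own suffix rather than re-prefilling the whole conversation.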

If you have a Mac with 32GB+ of unified memory, you can pull the new Qwen3.5-35B-A3B NVFP4 model and test this right now. Running heavy agentic workflows locally just became a lot more viable!