This product was not featured by Product Hunt yet. It will not be visible on their landing page and won't be ranked (cannot win product of the day regardless of upvotes).
Product upvotes vs the next 3
Waiting for data. Loading
Product comments vs the next 3
Waiting for data. Loading
Product upvote speed vs the next 3
Waiting for data. Loading
Product upvotes and comments
Waiting for data. Loading
Product vs the next 3
Loading
Shimmy v2.0
The first pure-Rust GGUF inference engine. No C. No Python.
Two 5,200-token runs. Same model. SHA-identical byte output. That's a proof, not a benchmark. Shimmy v2.0 ships Airframe: pure-Rust GPU inference with hand-written WGSL compute shaders. No llama.cpp. No C. No Python. No CUDA. First production GGUF engine Rust all the way down — including the GPU shaders. Run TinyLlama, Llama 3.2, Phi, DeepSeek from GGUF. Drop-in for AnythingLLM, Open WebUI, Cursor, Zed via OpenAI or Ollama API. Windows, macOS, Linux. cargo install shimmy
The novel part — Helical Shift:
When the KV cache fills, a GPU compute shader slides the cached keys and values backward in the sequence dimension. Because keys and values are stored in raw pre-RoPE form (no position encoding baked in), the slide is a pure data copy — no trigonometric recomputation needed. Two independent 5,200-token runs crossing multiple compaction boundaries produce SHA-identical output. That's not an optimization; it's a provable mathematical invariant.
Why this matters:
Every other local inference tool — llama.cpp, candle, whisper.cpp — has a C or C++ core that Rust wrappers call through FFI. Airframe is the first production-ready GGUF inference engine that is Rust all the way down, including the GPU shaders.
Tech stack:
-13,586 lines Rust + 855 lines WGSL
-wgpu (WebGPU), bytemuck, tokio, axum
-Targets: Windows (D3D12), macOS (Metal), Linux (Vulkan)
What you can do right now:
-Run TinyLlama, Phi, Llama 3.2, DeepSeek Coder, and others from GGUF files
-Connect AnythingLLM, SillyTavern, Zed, Cursor, Open WebUI via Ollama or OpenAI API
-Generate beyond your context limit without crashes or garbage output
Privacy first! Own your process from implementation to production! Down with our evil corporate AI overlords.
https://github.com/Michael-A-Kuy...
About Shimmy v2.0 on Product Hunt
“The first pure-Rust GGUF inference engine. No C. No Python.”
Shimmy v2.0 was submitted on Product Hunt and earned 0 upvotes and 1 comments, placing #33 on the daily leaderboard. Two 5,200-token runs. Same model. SHA-identical byte output. That's a proof, not a benchmark. Shimmy v2.0 ships Airframe: pure-Rust GPU inference with hand-written WGSL compute shaders. No llama.cpp. No C. No Python. No CUDA. First production GGUF engine Rust all the way down — including the GPU shaders. Run TinyLlama, Llama 3.2, Phi, DeepSeek from GGUF. Drop-in for AnythingLLM, Open WebUI, Cursor, Zed via OpenAI or Ollama API. Windows, macOS, Linux. cargo install shimmy
On the analytics side, Shimmy v2.0 competes within Open Source, Developer Tools and Artificial Intelligence — topics that collectively have 1.1M followers on Product Hunt. The dashboard above tracks how Shimmy v2.0 performed against the three products that launched closest to it on the same day.
Who hunted Shimmy v2.0?
Shimmy v2.0 was hunted by Mike Kuykendall. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
For a complete overview of Shimmy v2.0 including community comment highlights and product details, visit the product overview.