Shimmy v2.0
The first pure-Rust GGUF inference engine. No C. No Python.
Two 5,200-token runs of the same model produce SHA-identical byte output. That's a proof, not a benchmark. Shimmy v2.0 ships Airframe: pure-Rust GPU inference with hand-written WGSL compute shaders. No llama.cpp. No C. No Python. No CUDA. It is the first production GGUF engine that is Rust all the way down, including the GPU shaders. Run TinyLlama, Llama 3.2, Phi, and DeepSeek models from GGUF files. It is a drop-in backend for AnythingLLM, Open WebUI, Cursor, and Zed via the OpenAI or Ollama API. Runs on Windows, macOS, and Linux. Install with: cargo install shimmy
The novel part — Helical Shift:
When the KV cache fills, a GPU compute shader slides the cached keys and values backward in the sequence dimension. Because keys and values are stored in raw pre-RoPE form (no position encoding baked in), the slide is a pure data copy — no trigonometric recomputation needed. Two independent 5,200-token runs crossing multiple compaction boundaries produce SHA-identical output. That's not an optimization; it's a provable mathematical invariant.
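As a rough illustration of why the slide needs no trigonometry, here is a minimal CPU sketch in Rust. It is not Airframe's code: the real shift runs as a WGSL compute shader, and the names `KvCache`, `slide`, `rotated_key`, and `rope` are hypothetical. It also assumes that RoPE is applied at read time using the slot index as the position, which the description above implies but does not spell out.

```rust
/// Toy KV cache storing raw (pre-RoPE) vectors, one row of `dim` floats per slot.
/// Hypothetical layout for illustration; Airframe's real buffers live on the GPU.
struct KvCache {
    dim: usize,
    capacity: usize,
    shift: usize, // how many of the oldest slots to drop when the cache is full
    len: usize,   // number of occupied slots
    keys: Vec<f32>,
    values: Vec<f32>,
}

impl KvCache {
    fn new(capacity: usize, dim: usize, shift: usize) -> Self {
        assert!(dim % 2 == 0, "RoPE rotates pairs, so dim must be even");
        assert!(shift > 0 && shift <= capacity);
        Self {
            dim,
            capacity,
            shift,
            len: 0,
            keys: vec![0.0; capacity * dim],
            values: vec![0.0; capacity * dim],
        }
    }

    /// Append one token's raw key and value, sliding first if the cache is full.
    fn push(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.dim);
        assert_eq!(v.len(), self.dim);
        if self.len == self.capacity {
            self.slide();
        }
        let at = self.len * self.dim;
        self.keys[at..at + self.dim].copy_from_slice(k);
        self.values[at..at + self.dim].copy_from_slice(v);
        self.len += 1;
    }

    /// Drop the oldest `shift` slots by moving the survivors toward slot 0.
    /// Because the rows carry no position encoding, this is a plain copy.
    fn slide(&mut self) {
        let (from, to) = (self.shift * self.dim, self.len * self.dim);
        self.keys.copy_within(from..to, 0);
        self.values.copy_within(from..to, 0);
        self.len -= self.shift;
    }

    /// Read a key with RoPE applied on the fly, using the slot index as position
    /// (an assumption of this sketch).
    fn rotated_key(&self, slot: usize) -> Vec<f32> {
        assert!(slot < self.len);
        rope(&self.keys[slot * self.dim..(slot + 1) * self.dim], slot)
    }
}

/// Rotary position embedding over adjacent pairs, with the common base of 10000.
fn rope(x: &[f32], pos: usize) -> Vec<f32> {
    let dim = x.len();
    let mut out = vec![0.0; dim];
    for i in 0..dim / 2 {
        let theta = pos as f32 / 10000.0_f32.powf(2.0 * i as f32 / dim as f32);
        let (s, c) = theta.sin_cos();
        out[2 * i] = x[2 * i] * c - x[2 * i + 1] * s;
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c;
    }
    out
}
```

If the cache held post-RoPE keys instead, every surviving row would have to be re-rotated by the position delta after each slide. Storing raw rows avoids that work and keeps the cached bytes identical from run to run.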
Why this matters:
Most other local inference tools are built on a C or C++ core, such as llama.cpp or whisper.cpp, that Rust wrappers call through FFI. Even Rust-native frameworks such as candle rely on CUDA or Metal kernels for GPU acceleration. Airframe is the first production-ready GGUF inference engine that is Rust all the way down, including the GPU shaders.
Tech stack:
- 13,586 lines of Rust + 855 lines of WGSL
- wgpu (WebGPU), bytemuck, tokio, axum
- Targets: Windows (D3D12), macOS (Metal), Linux (Vulkan)
What you can do right now:
- Run TinyLlama, Phi, Llama 3.2, DeepSeek Coder, and others from GGUF files
- Connect AnythingLLM, SillyTavern, Zed, Cursor, Open WebUI via Ollama or OpenAI API
- Generate beyond your context limit without crashes or garbage output
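For anyone wiring up their own client rather than one of the tools listed above, the following is a minimal sketch of an OpenAI-style chat request built with the Rust standard library only. The `/v1/chat/completions` path is the standard OpenAI chat endpoint. The helper names (`chat_body`, `chat_request`, `send`) are hypothetical, and the address and model name in the usage note are assumptions, so check the Shimmy documentation for the actual values.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// JSON body for a single-turn, OpenAI-style chat completion request.
fn chat_body(model: &str, prompt: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}]}}",
        json_escape(model),
        json_escape(prompt)
    )
}

/// Raw HTTP/1.1 request for the chat completions endpoint.
fn chat_request(host: &str, model: &str, prompt: &str) -> String {
    let body = chat_body(model, prompt);
    format!(
        "POST /v1/chat/completions HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host,
        body.len(),
        body
    )
}

/// Send the request and return the raw response (status line, headers, body).
/// `Connection: close` lets `read_to_string` finish when the server hangs up.
fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With a server running, a call might look like `send("127.0.0.1:11435", &chat_request("127.0.0.1:11435", "tinyllama", "Hello"))`, where both the port and the model name are placeholders. In practice an HTTP client crate and a JSON library would replace the hand-rolled request and escaping.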
Privacy first! Own your process from implementation to production! Down with our evil corporate AI overlords.
https://github.com/Michael-A-Kuy...
Shimmy v2.0 was submitted on Product Hunt and earned 0 upvotes and 1 comment, placing #33 on the daily leaderboard.
Shimmy v2.0 was featured in Open Source (68.5k followers), Developer Tools (514.1k followers) and Artificial Intelligence (471.1k followers) on Product Hunt. Together, these topics include over 185.1k products, making this a competitive space to launch in.
Who hunted Shimmy v2.0?
Shimmy v2.0 was hunted by Mike Kuykendall.