This product has not been featured by Product Hunt yet.
It will not be visible on the Product Hunt landing page and will not be ranked, so it cannot win Product of the Day regardless of upvotes.

DiffusionGemma

Open LLM that generates 256 tokens per forward pass

DiffusionGemma is a 26B MoE open model that generates text in parallel blocks using a diffusion approach, delivering up to 4x faster local inference for researchers and developers building speed-critical or non-linear text applications.

Top comment

The autoregressive assumption has been baked into LLM inference for years. DiffusionGemma is an open-weight experiment in questioning it.

Token-by-token generation is efficient on cloud servers that batch thousands of requests. On a single local GPU, it wastes most of your compute. DiffusionGemma generates 256 tokens in parallel per forward pass, refining the full block iteratively until the output converges. That shifts the hardware bottleneck from memory bandwidth to compute, where dedicated GPUs have the most headroom.
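To make the idea of refining a whole block concrete, here is a minimal toy sketch of confidence-based parallel decoding. It is not DiffusionGemma's actual algorithm, which this post does not describe. The `toy_denoiser` function is a hypothetical stand-in for a model (it simply knows the answer and assigns random confidences), and the unmasking schedule is an assumption borrowed from generic masked-diffusion decoding. The point is only the pass count: a 256-token block is filled in a handful of passes instead of 256 sequential steps.

```python
import random

MASK = "<mask>"  # placeholder for a position that has not been committed yet


def toy_denoiser(block, target, rng):
    """Hypothetical stand-in for a model forward pass.

    Returns a (token, confidence) proposal for every position at once.
    A real model would predict tokens from the whole block; this toy
    version just reads them from `target` and invents a confidence.
    """
    proposals = []
    for i, tok in enumerate(block):
        if tok != MASK:
            proposals.append((tok, 1.0))  # already committed, keep as is
        else:
            proposals.append((target[i], rng.random()))
    return proposals


def generate_block(target, steps=8, seed=0):
    """Fill a fully masked block over at most `steps` parallel passes."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    passes = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break  # nothing left to refine
        proposals = toy_denoiser(block, target, rng)
        passes += 1
        # Commit an even share of the remaining masked positions,
        # choosing the ones the "model" is most confident about.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        most_confident = sorted(
            masked, key=lambda i: proposals[i][1], reverse=True
        )[:k]
        for i in most_confident:
            block[i] = proposals[i][0]
    return block, passes


if __name__ == "__main__":
    target = [f"tok{i}" for i in range(256)]
    block, passes = generate_block(target, steps=8)
    print(f"filled {len(block)} tokens in {passes} passes")
```

Each pass here touches every position in the block, which is why this style of decoding does more arithmetic per pass than autoregressive generation while needing far fewer passes.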

  • 4x faster inference on dedicated GPUs: 1000+ tokens per second on H100, 700+ on RTX 5090

  • Bi-directional attention across the generation block, suited for code infilling, inline editing, and non-linear text tasks

  • 26B MoE with 3.8B active parameters; 18 GB of VRAM when quantized, within reach of consumer GPUs

  • Apache 2.0, available now on Hugging Face with ecosystem support from vLLM, MLX, Unsloth, HF Transformers, and NVIDIA NeMo and NIM
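A quick back-of-envelope check of the figures in the list above, as a rough sketch rather than a benchmark. The assumptions are mine: "18GB" is read as 18 × 10^9 bytes, all of that memory is treated as weights (so the bits-per-parameter figure is only an upper bound), and the 4x speedup is assumed to refer to the same setup as the 1000 tokens per second H100 figure, which the post does not state.

```python
# Figures quoted in the post.
TOTAL_PARAMS = 26e9          # 26B total parameters (MoE)
ACTIVE_PARAMS = 3.8e9        # 3.8B parameters active per token
QUANTIZED_VRAM_BYTES = 18e9  # assumption: "18GB" means 18 * 10**9 bytes
BLOCK_TOKENS = 256           # tokens generated per block
H100_TOKENS_PER_S = 1000     # "1000+" tokens per second on H100
SPEEDUP = 4                  # "4x faster" headline figure


def active_fraction():
    """Share of the parameters used for any single token."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def max_bits_per_param():
    """Upper bound on bits per weight if all 18 GB held only weights."""
    return QUANTIZED_VRAM_BYTES * 8 / TOTAL_PARAMS


def blocks_per_second(tokens_per_s):
    """Completed 256-token blocks per second at a given throughput."""
    return tokens_per_s / BLOCK_TOKENS


def implied_baseline(tokens_per_s):
    """Baseline throughput implied by the headline speedup (assumption)."""
    return tokens_per_s / SPEEDUP


if __name__ == "__main__":
    print(f"active fraction:       {active_fraction():.1%}")
    print(f"max bits per param:    {max_bits_per_param():.2f}")
    print(f"blocks/s on H100:      {blocks_per_second(H100_TOKENS_PER_S):.2f}")
    print(f"implied baseline tok/s: {implied_baseline(H100_TOKENS_PER_S):.0f}")
```

Under these assumptions, roughly 15% of the weights are active per token and the quantized model averages at most about 5.5 bits per parameter, which is consistent with the consumer GPU claim.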

The tradeoff is real: quality is lower than Gemma 4, and Google recommends Gemma 4 for production outputs. The speedup also applies only to dedicated GPUs.

This is for researchers and developers who want to run fast, non-linear generation experiments locally without enterprise hardware.

Grab the weights on Hugging Face and see what the parallel decoding architecture opens up for your use case.

I hunt the latest and greatest launches in tech, SaaS, and AI. Follow to be notified.

About DiffusionGemma on Product Hunt

DiffusionGemma was submitted on Product Hunt and earned 0 upvotes and 1 comment, placing #124 on the daily leaderboard.

On the analytics side, DiffusionGemma competes within Open Source, Developer Tools, and Artificial Intelligence, topics that collectively have 1.1M followers on Product Hunt. The analytics dashboard tracks how DiffusionGemma performed against the three products that launched closest to it on the same day.

Who hunted DiffusionGemma?

DiffusionGemma was hunted by Raghav Mehra. A “hunter” on Product Hunt is the community member who submits a product to the platform: uploading the images, adding the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.