Run state-of-the-art open-source models (GLM 5.1, Kimi K2.7 Code, MiniMax M2.7, and more) in Claude Code at up to 4× the speed (up to 200 tok/s) for a flat $29/month. Set up in minutes, no code changes.
Story time. A few weeks ago I was using Claude Code with Opus on a refactor. The model knew exactly what to do, but I sat there watching a 500-line file crawl out one token at a time. Two minutes for one file. Multiply that by every step of the agent loop and you realize: speed is the silent tax on every coding session.
Around the same time I started testing open-source models like GLM and Kimi K2.7. The quality on coding tasks was honestly impressive.
But on standard endpoints they were even slower than the closed models. And the setup was painful: API keys, code changes, a CLAUDE.md to rewrite, MCP servers to reconfigure.
That's the problem we built Edgee Turbo Models to solve.
What it does:
→ Run frontier open-source models (GLM 5.1, Kimi K2.7 Code, Kimi K2.6, MiniMax M2.7) directly in Claude Code.
→ At up to 4x the speed of standard endpoints (~200 tok/s vs ~50).
→ Flat $29/month. No metered token bill that climbs as your agents work harder.
→ Setup in 2 minutes. Your CLAUDE.md, MCP servers, and entire setup stay exactly where they are.
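To put rough numbers on the speed and pricing points above, here is a back-of-the-envelope sketch in Python. Only the ~50 and ~200 tok/s rates and the $29 price come from the post. The 6,000-token file size (about 12 tokens per line for a 500-line file), the 20-step agent loop, and the $2 per million tokens metered rate are illustrative assumptions, not Edgee figures.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` output tokens at a steady decode rate."""
    return tokens / tokens_per_second


def breakeven_tokens(flat_price_usd: float, usd_per_million_tokens: float) -> float:
    """Monthly token volume at which a metered bill equals the flat price."""
    return flat_price_usd / usd_per_million_tokens * 1_000_000


# Assumption: the 500-line file is roughly 6,000 output tokens, which is
# consistent with "two minutes for one file" at ~50 tok/s.
FILE_TOKENS = 6_000
STANDARD_TPS = 50.0   # "~50" from the post
TURBO_TPS = 200.0     # "up to 200 tok/s" from the post, i.e. a best case
LOOP_STEPS = 20       # assumption: file-sized generations in one agent session

standard_s = generation_seconds(FILE_TOKENS, STANDARD_TPS)
turbo_s = generation_seconds(FILE_TOKENS, TURBO_TPS)

print(f"one file:   {standard_s:.0f}s vs {turbo_s:.0f}s")
print(f"{LOOP_STEPS} steps:   {standard_s * LOOP_STEPS / 60:.0f} min vs "
      f"{turbo_s * LOOP_STEPS / 60:.0f} min")

# Assumption: a hypothetical metered rate of $2 per million tokens.
print(f"break-even: {breakeven_tokens(29, 2.0):,.0f} tokens/month")
```

Under these assumptions one file drops from 120 seconds to 30, a 20-step session from 40 minutes to 10, and the flat plan breaks even at 14.5 million tokens a month. Since 200 tok/s is an "up to" figure, treat the faster numbers as a ceiling rather than a guarantee.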
Important point I want to get out front because it'll come up:
Turbo is NOT a smaller or quantized version of these models. They are the full open-weight checkpoints. Turbo only changes how they are served, on dedicated high-throughput inference infrastructure built for raw speed, not a shared best-effort endpoint. Same outputs, just faster.
How this fits with our previous launches:
- Compression: use fewer tokens per request
- Teams: see who uses what, per repo, per PR
- Fallback Models: keep working when Claude or Copilot hit limits
- Turbo Models: run open-source models at premium speed, for flat pricing
Together that is the Route + Compress + Observe stack of our Agent Gateway. Today we're shipping the speed layer.
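For readers wondering what the routing side of a gateway like this looks like, here is a minimal Python sketch of ordered fallback. It is a generic illustration, not Edgee's implementation: `RateLimited`, `route_with_fallback`, and the two stand-in backends are hypothetical names invented for this example.

```python
from typing import Callable, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error a backend raises once it has hit its usage limit."""


Backend = Tuple[str, Callable[[str], str]]


def route_with_fallback(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order and return (name, reply) from the first that succeeds."""
    for name, call in backends:
        try:
            return name, call(prompt)
        except RateLimited:
            continue  # this backend is out of quota, so try the next one
    raise RuntimeError("all backends are rate limited")


# Stand-in backends for the sketch; a real gateway would make HTTP calls here.
def primary(prompt: str) -> str:
    raise RateLimited("limit reached")


def turbo(prompt: str) -> str:
    return f"reply to: {prompt}"


name, reply = route_with_fallback(
    "rename foo to bar",
    [("primary", primary), ("turbo", turbo)],
)
print(name, "->", reply)
```

The point of doing this in a gateway rather than in the client is that the calling tool never sees the failure: it sends one request and gets one reply, whichever backend produced it.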
Why now: The Economist published a piece this week confirming that "token-maxxing is over" and that companies are routing to cheaper models. Open-source models are clearly part of the answer. Turbo makes them actually usable.
A few questions I'd love your feedback on:
→ Which open-source coding model are you most curious to try?
→ Is flat $29/month the right price point, or would you prefer usage-based?
→ What other models should we add to the Turbo lineup?
Will be in comments all day. Thanks for checking it out 🙏
Being able to run different models through Claude Code is really cool. Can you switch between models mid-session, or is it set per project?
flat $29/month instead of usage-based is the right call for anyone running agents that loop unpredictably. the worst part of token-based pricing is never knowing what the bill will be until it's too late. also being able to swap in open-source models without changing any code or rewriting configs removes the biggest barrier to actually trying them. most people stick with what they know because switching is painful, not because alternatives aren't good enough
Proxying Claude Code's API calls through a gateway to route to Kimi K2.7 or MiniMax without code changes is clean architecture. We've hit throughput ceilings in agentic workflows where task latency compounds fast, so the 4x speed claim is interesting. Does Edgee handle automatic fallback if a model hits rate limits mid-session?
Love the flat rate approach for unpredictable agent loops, excited to test Kimi K2.7 Code with this kind of speed. Huge congrats on shipping this, @sachamorard
About Edgee Turbo Models on Product Hunt
“Use Claude Code with Kimi K2.7 Code, MiniMax M2.7, and more”
Edgee Turbo Models launched on Product Hunt on June 16th, 2026 and earned 126 upvotes and 13 comments, placing #5 on the daily leaderboard.
Edgee Turbo Models was featured in Software Engineering (42.6k followers), Developer Tools (514.1k followers), Artificial Intelligence (471.1k followers) and Vercel Day (19 followers) on Product Hunt. Together, these topics include over 180.2k products, making this a competitive space to launch in.
Who hunted Edgee Turbo Models?
Edgee Turbo Models was hunted by fmerian.
Hey Product Hunt 👋
Sacha here, co-founder of Edgee.