Tokenwise
A smart LLM proxy that shows where you're overpaying
Analytics
Developer Tools
Artificial Intelligence
Visit Website See on Product Hunt

Upvotes129

▲ 129View on ProductHunt ⧉

Comments18

18 commentsSee comments on PH ⧉

Featured onJune 1st, 2026

Hunted by

Théophile Louvart

Tokenwise is a one-line LLM proxy (OpenAI-compatible baseURL) for makers and small teams. It learns from your real requests, shows exactly where you're overpaying, proven with quality checks on your own traffic, not public benchmark, and lets you apply the fix in one click while it verifies the savings in real dollars.

Top comment

Upvotes129

▲ 129View on ProductHunt ⧉

Comments18

18 commentsSee comments on PH ⧉

Product of the Day12nd

Hey everyone, Theo here. I build a few small SaaS on the side of a full-time data engineering job, and at some point every one of them started leaning on LLMs. My API bills crept up every month and honestly I could never tell you why. Which feature, which prompt I'd changed last week, which model I picked without really thinking about it. I'd just top up credits and move on. The part that really got to me was the spend I couldn't even see. Claude Code running all day while I work, plus Cursor and Codex. None of that shows up anywhere until the invoice lands, and it turned out to be the money I understood the least. I tried the tools that already existed. One felt like it was in maintenance mode, one needed a whole observability setup just to get started, and one only worked if your stack was built around a specific framework. None of them were made for someone like me who just wanted to know where the money went and what to do about it. So I built Tokenwise. You add one line of code, or point your coding agents at it with no production changes, and you see every call: cost, latency, tokens, and what's being wasted. Then it tells you what to cut. A cheaper model here, a cache there, a bloated prompt to trim. Every fix gets checked against your own quality bar first, so you're never trading cost for worse output. The idea shifted a lot while I was building it. I started out thinking it was a dashboard. Then I realised nobody wants another dashboard, they want the answer: here's the $842 a month you're burning, and here's the one click to fix it. The real value was proving the savings on your own traffic, live. It's early and I'd genuinely love your honest feedback. Tell me what's missing, what's confusing, what you'd never use. That's more useful to me right now than anything. Thanks for taking a look.

Comment highlights

The shift from dashboard to apply is the right call. Every observability tool I have run hits the same wall: you have the numbers but no path from chart to fix without writing a migration script yourself. The A/B split on the apply step is the part that actually earns trust. Twenty four hours of quality scores on real traffic before a ramp is the difference between this and a vendor that just suggests a cheaper model and walks away. Question on the semantic clustering of prompt templates: how do you handle the case where two templates look semantically identical but have different system prompts that meaningfully change output? Do you cluster on the user message only or the full request shape?

Are you talking about query optimization/compression? Something like what Google recently did with an algorithm that compresses prompts by 7x without losing quality?

We've had seven AI agents running in production since last year, and token costs were a complete black box until the invoice arrived. We built a basic per-model logger ourselves -- took more engineering time than it should have. The edge case I'd push on: can you attribute spend to a specific workflow or user journey, not just a model? When you're debugging why a particular sequence of agent calls got expensive, model-level rollups aren't granular enough. That's where the real cost surprises live.

This looks awesome. I use a load balancer but I can probably use that OpenAI key and output to tokenwise but would you ever be interested in developing a built in load balancer for multi-account setups? I imagine there's far more savings to be had if your also load balancing time-based limits but I can also see where it might be out-of-scope for this project and better chained.

Great idea, I've had the same problem and spent many hours debugging transcripts to look for token savings.

The "quality check on your own traffic" angle is genuinely the right frame. I've tried a few LLM cost tools and most just show you aggregate spend with generic benchmark comparisons — which is basically useless when your prompts are domain-specific.
One thing I'd love to see: for coding agents specifically, the spend is often bursty and session-based. A 2-hour Claude Code session can easily hit $5-10, but you don't know *which* part of the session burned the tokens. Was it the initial codebase ingestion? A long loop of test-fix retries? Breaking that down by sub-task within a session would be way more actionable than just "session-xyz cost $8.40."

The "quality check on your own traffic, not public benchmarks" is the right frame, that's exactly the gap most LLM-cost tools wave at. Question for Théophile: when you replay a request on the cheaper model to verify quality, how do you score "same answer" without a human in the loop? Embedding similarity tends to be permissive and exact-match too strict.

Observe-only is probably where I’d start, especially for Claude Code spend. The scary part is the “apply” step.
Before swapping a model, does Tokenwise show exactly which traffic it will touch, and is there an easy rollback?

About Tokenwise on Product Hunt

“A smart LLM proxy that shows where you're overpaying”

Tokenwise launched on Product Hunt on June 1st, 2026 and earned 129 upvotes and 18 comments, placing #12 on the daily leaderboard. Tokenwise is a one-line LLM proxy (OpenAI-compatible baseURL) for makers and small teams. It learns from your real requests, shows exactly where you're overpaying, proven with quality checks on your own traffic, not public benchmark, and lets you apply the fix in one click while it verifies the savings in real dollars.

Tokenwise was featured in Analytics (172.7k followers), Developer Tools (515.7k followers) and Artificial Intelligence (473.5k followers) on Product Hunt. Together, these topics include over 197.8k products, making this a competitive space to launch in.

Who hunted Tokenwise?

Tokenwise was hunted by Théophile Louvart. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.

Want to see how Tokenwise stacked up against nearby launches in real time? Check out the live launch dashboard for upvote speed charts, proximity comparisons, and more analytics.