Edgee

The AI Gateway that TL;DR tokens

Software Engineering
Developer Tools
Artificial Intelligence

Edgee compresses prompts before they reach LLM providers and reduces token costs by up to 50%. Same code, fewer tokens, lower bills.
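Edgee's compression algorithm isn't public, but the basic idea of shrinking a prompt before it reaches the provider can be sketched with a toy compressor. Everything below, including the `compress_prompt` helper, is illustrative only, not Edgee's implementation:

```python
import re

def compress_prompt(prompt: str) -> str:
    """Toy prompt compressor: collapse runs of spaces/tabs and drop
    consecutive duplicate lines. A real gateway uses far more
    sophisticated, model-aware techniques; this only shows the shape
    of the idea: same meaning in, fewer tokens out."""
    lines = []
    for line in prompt.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Skip blank lines and exact repeats of the previous line.
        if line and (not lines or line != lines[-1]):
            lines.append(line)
    return "\n".join(lines)

original = (
    "You are   a helpful assistant.\n"
    "You are a helpful assistant.\n\n"
    "Summarize:   the   text below."
)
compressed = compress_prompt(original)
assert len(compressed) < len(original)  # fewer characters -> fewer tokens
```

Because this sits at the gateway, application code keeps sending the same prompts; only what crosses the wire to the provider shrinks.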

Top comment

This looks amazing, @gilles_raymond! Reducing token costs by 50% is a game changer for anyone building agents for a big audience 🤯 Question: how does the compression impact latency for real-time applications? Congrats on the launch!

Comment highlights

The idea is very interesting. But how does it work?

For example, I have a travel AI — essentially a wrapper around ChatGPT and Gemini. Some of the prompts are huge. How would you reduce the number of tokens? Would you compress my prompts? But that could affect quality.

Could you suggest where something can be replaced with free or cheaper tools? But then you would need to know our product as well as we do… How do you do that?

Token compression at the gateway level is a smart approach. I've been watching my AI API costs climb across multiple projects, and this is exactly the kind of infra that makes shipping AI features viable without stressing about the bill.

Impressed by the edge-native architecture with 100+ PoPs and the token compression approach.

I noticed Edgee is built with Claude Code. For developers using AI coding agents (Claude Code, Cursor, etc.) that make heavy API calls during development, does Edgee support integration at the agent workflow level? Specifically, can we route AI agent requests through Edgee to compress tool call contexts and reduce token consumption during iterative coding sessions?

Would like to see benchmarks across different model providers and prompt types. If the compression holds under real production loads, this could become default infra in most LLM stacks.

Congrats on the launch! Will definitely be following this project closely. I've always thought there should be a way to provide prompts to LLMs more efficiently, especially when the latest models consume a lot of them for complex work. Hopefully this will eventually result in lower usage rates and higher limits.

I've been waiting to see companies start tackling this issue. Cost and efficiency are going to become increasingly important as AI platforms come under more pressure for revenue.

Love this! Congrats @sachamorard! Great onboarding experience; managed to get going in under 5 minutes, we'll definitely be using it ❤️. Curious whether and how we can control the compression level and adjust it per endpoint or use case, as I imagine there's a quality trade-off?

Hey, this is interesting! I was wondering if the prompt optimisations you're doing are deterministic. The first layer of cost improvement is caching: when you have a long conversation with an LLM, you need to cache the prompt prefix, so the prompt compaction needs to be deterministic and stable no matter what happens.

Second point: how do you handle different model providers' API interfaces? Do you support SSE? Did you reimplement your own layer between the Edgee SDK and LLM providers? There are so many edge cases with each provider when it comes to streaming + tools + reasoning tokens, etc.
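The determinism point above matters because provider-side prompt caching keys on an exact prefix match: if a gateway compacted the same prefix differently on each request, every cache lookup would miss. A minimal sketch of that constraint, where the `compact` function and cache key are hypothetical and not Edgee's API:

```python
import hashlib

def compact(prompt: str) -> str:
    """Hypothetical compaction pass. It must be a pure function of its
    input: the same prompt in yields byte-identical output, otherwise
    downstream prefix caches can never hit."""
    # Deterministic normalization only: stable whitespace collapsing,
    # no randomness, no time- or request-dependent behavior.
    return " ".join(prompt.split())

def cache_key(compacted_prefix: str) -> str:
    """Cache key over the compacted prefix, as a prompt cache might use."""
    return hashlib.sha256(compacted_prefix.encode()).hexdigest()

history = "system: you are a travel agent\nuser: plan a trip to Kyoto"
k1 = cache_key(compact(history))
k2 = cache_key(compact(history))  # next turn re-sends the same prefix
assert k1 == k2  # stable compaction -> the cached prefix is reusable
```

Any non-deterministic step (e.g. an LLM-based summarizer with sampling) would break this property unless its output were itself cached and replayed.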

Congrats on the launch!

LLM costs are going crazy here; I'll definitely give it a try.

Cool idea! Do you get transparency into how the prompt was trimmed/manipulated so you can ensure nothing was missed?

Gateways can become a new reliability and latency bottleneck: what’s Edgee’s architecture for keeping p95/p99 overhead low (especially for streaming and agent tool-call loops), and how do you handle failure modes like retries causing traffic spikes or provider brownouts?

Token costs are the new database query problem. This feels like the right abstraction layer.

How's the latency impact in practice?

Sounds amazing, can’t wait to plug it into Tellers!! Congratulations on the launch @sachamorard 🚀 Congratulations @picsoung on yet another successful hunt 😃

This would be game-changing for our margins. Does the compression work for both prompts and completions?

@sachamorard Token costs are definitely becoming a real problem once prompts get large (RAG, tools, agents…).

Curious how you handle compression without breaking output quality, especially for structured outputs?

Go Edgee! Would love to know if you handle MCP and tool-usage optimisations? It's a real pain for long-running agents.

Love the focus on production problems vs demo features. Does the cost tracking integrate with existing observability tools (DataDog, etc.)?