Edgee

The AI Gateway that TL;DR tokens

Software Engineering
Developer Tools
Artificial Intelligence

Edgee compresses prompts before they reach LLM providers and reduces token costs by up to 50%. Same code, fewer tokens, lower bills.
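Edgee's compression algorithm isn't public, but the basic idea of shrinking a prompt before it reaches the provider can be sketched with a toy compressor. Everything below, including the `compress_prompt` helper, is illustrative only, not Edgee's implementation:

```python
import re

def compress_prompt(prompt: str) -> str:
    """Toy prompt compressor: collapse runs of spaces/tabs and drop
    consecutive duplicate lines. A real gateway uses far more
    sophisticated, model-aware techniques; this only shows the shape
    of the idea: same meaning in, fewer tokens out."""
    lines = []
    for line in prompt.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Skip blank lines and exact repeats of the previous line.
        if line and (not lines or line != lines[-1]):
            lines.append(line)
    return "\n".join(lines)

original = (
    "You are   a helpful assistant.\n"
    "You are a helpful assistant.\n\n"
    "Summarize:   the   text below."
)
compressed = compress_prompt(original)
assert len(compressed) < len(original)  # fewer characters -> fewer tokens
```

Because this sits at the gateway, application code keeps sending the same prompts; only what crosses the wire to the provider shrinks.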

Top comment

This looks amazing, @gilles_raymond! Reducing token costs by 50% is a game changer for anyone building agents for a big audience 🤯 Question: how does the compression impact latency for real-time applications? Congrats on the launch!

Comment highlights

The idea is very interesting. But how does it work?

For example, I have a travel AI — essentially a wrapper around ChatGPT and Gemini. Some of the prompts are huge. How would you reduce the number of tokens? Would you compress my prompts? But that could affect quality.

Could you suggest where something can be replaced with free or cheaper tools? But then you would need to know our product as well as we do… How do you do that?

Token compression at the gateway level is a smart approach. I've been watching my AI API costs climb across multiple projects, and this is exactly the kind of infra that makes shipping AI features viable without stressing about the bill.

Impressed by the edge-native architecture with 100+ PoPs and the token compression approach.

I noticed Edgee is built with Claude Code. For developers using AI coding agents (Claude Code, Cursor, etc.) that make heavy API calls during development, does Edgee support integration at the agent workflow level? Specifically, can we route AI agent requests through Edgee to compress tool call contexts and reduce token consumption during iterative coding sessions?

Would like to see benchmarks across different model providers and prompt types. If the compression holds under real production loads, this could become default infra in most LLM stacks.

Congrats on the launch! Will definitely be following this project closely. I've always thought there should be a way to provide prompts to LLMs more efficiently, especially when the latest models consume a lot of them for complex work. Hopefully this will eventually result in lower usage rates and higher limits.

I've been waiting to see companies start tackling this issue. Cost and efficiency are going to become increasingly important as AI platforms come under more pressure for revenue.

Love this! Congrats @sachamorard! Great onboarding experience; managed to get going in under 5 minutes, we'll definitely be using it ❤️. Curious whether and how we can control the compression level and adjust it per endpoint or use case, as I imagine there's a quality trade-off?

Hey, this is interesting! I was wondering if the prompt optimisations you're doing are deterministic. The first layer of cost improvement is caching: when you have a long conversation with an LLM, you need to cache the prompt prefix, so the prompt compaction needs to be deterministic and stable no matter what happens.

Second point: how do you handle different model providers' API interfaces? Do you support SSE? Did you reimplement your own layer between the Edgee SDK and LLM providers? There are so many edge cases with each provider when it comes to streaming + tools + reasoning tokens, etc.
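The determinism point above matters because provider-side prompt caching keys on an exact prefix match: if a gateway compacted the same prefix differently on each request, every cache lookup would miss. A minimal sketch of that constraint, where the `compact` function and cache key are hypothetical and not Edgee's API:

```python
import hashlib

def compact(prompt: str) -> str:
    """Hypothetical compaction pass. It must be a pure function of its
    input: the same prompt in yields byte-identical output, otherwise
    downstream prefix caches can never hit."""
    # Deterministic normalization only: stable whitespace collapsing,
    # no randomness, no time- or request-dependent behavior.
    return " ".join(prompt.split())

def cache_key(compacted_prefix: str) -> str:
    """Cache key over the compacted prefix, as a prompt cache might use."""
    return hashlib.sha256(compacted_prefix.encode()).hexdigest()

history = "system: you are a travel agent\nuser: plan a trip to Kyoto"
k1 = cache_key(compact(history))
k2 = cache_key(compact(history))  # next turn re-sends the same prefix
assert k1 == k2  # stable compaction -> the cached prefix is reusable
```

Any non-deterministic step (e.g. an LLM-based summarizer with sampling) would break this property unless its output were itself cached and replayed.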

Congrats on the launch!

LLM costs are going crazy here; I'll definitely give it a try.

Cool idea! Do you get transparency into how the prompt was trimmed/manipulated so you can ensure nothing was missed?

Gateways can become a new reliability and latency bottleneck: what’s Edgee’s architecture for keeping p95/p99 overhead low (especially for streaming and agent tool-call loops), and how do you handle failure modes like retries causing traffic spikes or provider brownouts?

Token costs are the new database query problem. This feels like the right abstraction layer.

How's the latency impact in practice?

Sounds amazing, can’t wait to plug it into Tellers!! Congratulations on the launch @sachamorard 🚀 Congratulations @picsoung on yet another successful hunt 😃

This would be game-changing for our margins. Does the compression work for both prompts and completions?

@sachamorard Token costs are definitely becoming a real problem once prompts get large (RAG, tools, agents…).

Curious how you handle compression without breaking output quality, especially for structured outputs?

Go Edgee! Would love to know if you handle MCP and tool-usage optimisations? It's a real pain for long-running agents.

Love the focus on production problems vs demo features. Does the cost tracking integrate with existing observability tools (DataDog, etc.)?