Context Gateway

Make Claude Code faster and cheaper without losing context

Developer Tools
Artificial Intelligence
GitHub

Context Gateway cuts latency and token spend for Claude Code / Codex / OpenClaw by compressing tool output while preserving important context. Setup takes less than a minute. Quality-of-life features include instant context compaction and spend limits in Claude Code.

Top comment

Hey all! 👋 We're releasing Context Gateway, our context compression proxy that cuts token spend and improves accuracy and latency for Claude Code, OpenClaw, Codex, and other agents. We built it because agents struggle to manage lengthy context efficiently: each tool an agent calls can return thousands of redundant tokens, leading to unnecessary spend, higher latency, and lower generation quality. The proxy fixes that invisibly by compressing whatever context the agent has to deal with. We've also added a number of quality-of-life features that Claude Code is missing: instant context compaction (same as /compact, but without the three-minute wait), spend caps, Slack notifications, and more. We're open-sourcing everything except the models we use for context compression, which are free to use during the launch. Excited to hear your feedback and the features you'd want next! 🚀
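The post says setup takes under a minute but doesn't show it. For context, Claude Code can route its API traffic through a gateway via the `ANTHROPIC_BASE_URL` environment variable; a sketch of what pointing it at a compression proxy could look like (the localhost URL is purely illustrative, not Context Gateway's actual endpoint):

```shell
# Hypothetical setup: route Claude Code through a local compression proxy.
# Replace the URL with whatever endpoint Context Gateway actually provides.
export ANTHROPIC_BASE_URL="http://localhost:8080"

# Launch Claude Code as usual; requests now flow through the proxy,
# which can compress bulky tool output before it re-enters the context window.
claude
```

Because the agent only sees the compressed context coming back, no changes to prompts or tool definitions are needed on the client side.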

Comment highlights

Reducing latency and token spend while preserving context is a smart infrastructure move. How much impact are teams seeing once this is plugged into production?

Great team, great product - tons of potential for agentic workflows that deal with heavy context. 🚀

Oh man, the instant compaction alone is worth it. I've been hitting /compact in Claude Code and just staring at the screen for like 3 minutes every time my context gets bloated. The spend cap + Slack notifications combo is also super practical; I've definitely had sessions where I looked away for a bit and came back to a surprisingly large bill lol

Really smart approach to a problem I hit constantly - agent tool calls returning massive outputs that bloat context and burn tokens. The instant compaction feature is clutch too, waiting 3 min for /compact in Claude Code always kills my flow. Curious how the compression models handle code-heavy outputs vs prose - do you see different compression ratios?

Congrats on the launch! Curious how the compression handles tool outputs that contain mixed content, structured data alongside verbose logs, for example. Does it preserve the structured parts reliably while trimming the noise, or is it more of a blunt summarization?