Tessl helps developers evaluate and optimize agent skills, so you can focus on building with smarter AI agents instead of fixing bugs and hallucinations. No signup required ➡️ tessl.io/registry/skills/submit
Guypo here, founder of Tessl (previously founded Snyk).
Today, I’m excited to announce that you can evaluate your skills and optimize them on Tessl. This means you can stop debugging agent output and start shipping quality code, faster: https://tessl.io/registry/skills/submit
Agent skills help agents use your products, build in your codebase, and enforce your policies.
They're the new unit of software for devs - but most are still treated like simple Markdown files copied between repos: no versioning, no quality signal, no updates.
Without AI evaluations, you can't tell whether a skill helps, provides minimal uplift, or even degrades functionality. You spend your time course-correcting agents instead of shipping.
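To make that concrete: at its simplest, a skill eval is a with/without comparison - run the same set of tasks with the skill attached and without it, then compare pass rates. A minimal sketch of that shape (run_agent and check are hypothetical placeholders for your agent harness and per-task grader, not our actual implementation):

```python
def pass_rate(tasks, run_agent, check, skills):
    # Fraction of tasks the agent completes successfully
    # when given the listed skills. run_agent and check are
    # hypothetical placeholders, not a real Tessl API.
    passed = sum(bool(check(t, run_agent(t, skills=skills))) for t in tasks)
    return passed / len(tasks)

def skill_uplift(tasks, run_agent, check, skill):
    # Positive delta: the skill helps. Near zero: minimal uplift.
    # Negative: the skill is actively degrading the agent.
    baseline = pass_rate(tasks, run_agent, check, skills=[])
    with_skill = pass_rate(tasks, run_agent, check, skills=[skill])
    return with_skill - baseline
```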
Tessl is a development platform and package manager for agent skills. With Tessl, we were able to evaluate and optimize ElevenLabs' skills, doubling their agents' success rate in using their APIs.
Whether you're building a personal project, maintaining an OSS library, or developing with AI at work, you can now evaluate and optimize your skills to help agents use them properly.
What skills are you working on, and what's your use case for them?
This feels like the missing layer in the agent stack.
Everyone’s shipping “skills” but very few are measuring whether they actually improve outcomes. The versioning + evaluation angle makes a lot of sense.
Curious how you think about benchmarking across models? A skill might behave very differently across Claude, GPT, and open models.
Congrats on the launch — this could quietly become core infra for serious agent teams.
I've been building with Claude Code and the difference between a well-written skill/instruction set and a mediocre one is night and day. The ElevenLabs case study is a compelling proof point. Most people are still treating agent instructions as an afterthought, just a markdown file in the repo. The idea that you can actually evaluate and iterate on them like any other piece of software makes a lot of sense.
Congrats on the launch! Excited to see where this goes.
The eval-driven approach makes sense. Most teams copy skill files across projects and hope they still work after a model update - there's no feedback loop telling you the context degraded. Having structured evals that catch regression before it hits production is the missing piece.
Curious about the version compatibility matrix. When a new model version drops (say, a move from Claude Opus to Sonnet), how granular is the eval detection? Does it flag per-skill degradation, or just overall task-completion changes? The 1.8-2x performance numbers are compelling, but I'd want to know which skills contributed most and which were noise.
Really strong launch. The "package manager for agent skills" framing is exactly where teams are heading as multi-agent workflows get real.
What stood out to me is the eval + optimization loop: most teams can feel output drift but can’t isolate whether the issue is model choice, prompt context, or skill quality. If Tessl can make that diagnosis explicit (before/after score deltas per skill revision), that’s high leverage for shipping faster with fewer hallucination regressions.
Curious if you’re planning CI hooks so teams can gate skill changes on eval thresholds the same way we gate tests/lint in code pipelines.
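Even something as simple as a gate step that reads the eval report and fails the build below a threshold would do it. A rough sketch, assuming a JSON report from whatever eval harness you run (none of these names are Tessl's actual interface):

```python
#!/usr/bin/env python3
# CI gate sketch: fail the pipeline if a skill's eval pass rate
# drops below a threshold. The report format here is hypothetical.
import json
import sys

THRESHOLD = 0.80  # minimum acceptable pass rate; tune per skill

def main(report_path: str) -> int:
    with open(report_path) as f:
        report = json.load(f)  # e.g. {"skill": "...", "pass_rate": 0.87}
    rate = report["pass_rate"]
    if rate < THRESHOLD:
        print(f"FAIL: {report['skill']} pass rate {rate:.2f} < {THRESHOLD}")
        return 1  # nonzero exit blocks the merge, like a failing test
    print(f"OK: {report['skill']} pass rate {rate:.2f}")
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```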
The "package manager for agent skills" framing clicks immediately, especially coming from the Snyk founder. The dependency management and security signal problem in traditional code is exactly what's now happening with agent skills, and most teams don't have the tooling to even see it yet.
The ElevenLabs 2x result is a concrete proof point that avoids the usual vague benchmark claims. That kind of before/after is what actually convinces teams to adopt a new tool in their workflow.
I use Claude Code daily for building my own AI platform and the skill quality problem is very real. You genuinely can't tell if a skill is helping or quietly degrading outputs without proper evals. This fills a gap that's been easy to ignore until it hurts. Congrats on the launch!
Agent evaluation is the part of the AI workflow that still feels unsolved; deterministic tests don't translate well when your output is non-deterministic by design. Curious how Tessl approaches measuring skill quality for an agent: is it task completion rate, output quality scoring, or something closer to behavioral alignment? The 3x better code claim is a big statement, but if the eval layer is solid, the compounding effect on code quality could absolutely get there.
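For what it's worth, the simplest way I've seen to cope with the non-determinism is repeated sampling: run each task several times and track a pass rate instead of a single pass/fail. A generic sketch (run_agent and check are placeholders for whatever harness you use, not anything Tessl-specific):

```python
from statistics import mean

def sampled_pass_rate(tasks, run_agent, check, samples=5):
    # Non-deterministic agents need repeated runs per task:
    # score the fraction of runs that pass, then average over tasks.
    per_task = []
    for task in tasks:
        passes = sum(bool(check(task, run_agent(task))) for _ in range(samples))
        per_task.append(passes / samples)
    return mean(per_task)  # overall completion rate in [0, 1]
```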
Amazing product and outstanding team behind it. Love what you're all building!
This seems so useful as skills become core to building with AI. Very needed service.
This is the right tool at the right time! The eval and optimize functions are exactly what skill creators need right now to test and validate their skills - great job, Tessl!
Tools like Tessl help bring the engineering mindset to context engineering. It's like Grammarly for skills: something actionable. Finally, we can go beyond the simple "vibe check".
Very relevant and important in the new agentic SDLC; skills and context are key.
Good tool, team! I'm currently working mainly on documentation. Just tested out Tessl - very easy to use, great user experience!
They're pioneers in the AI industry and active contributors, maintaining AINativeDev and organizing AI Native DevCon. So when the team reached out for this launch, I was super pumped.
@Tessl is a package manager for agent skills. It helps you find, install, and evaluate capabilities for your coding agents. It's the right direction. In a recent thread [1], we discussed best practices for getting the most out of @Claude Code. Above all? Run more agents in parallel. @Tessl teaches them coding best practices, raising the quality of their output.