PromptPerf lets you test a prompt across GPT-4o, GPT-4, and GPT-3.5 and compare the results to your expected output using similarity scoring. Models change fast. Prompts break. This helps you stay ahead. Unlimited free runs. More models coming soon.
As an AI developer, I spend a lot of time running prompts across different models and configs, tweaking temperature, comparing outputs, and manually checking which one gets it right.
It’s repetitive. Time-consuming. And easy to mess up.
So I built PromptPerf: a tool that tests a single prompt across GPT-4o, GPT-4, and GPT-3.5, runs it multiple times, and compares the results to your expected output using similarity scoring.
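To make the idea concrete, here's a minimal sketch of that workflow. It's an assumption-laden illustration, not PromptPerf's actual implementation: the model list, the `run_prompt` helper, and the use of Python's built-in difflib ratio as the similarity metric are all placeholders.

```python
# Rough sketch: run a prompt against several models, several times each,
# and score every output against an expected answer.
# NOTE: `run_prompt(model, prompt)` is a hypothetical helper standing in
# for whatever client you use; difflib's ratio is just one simple way to
# compute a 0..1 similarity and is not necessarily what PromptPerf uses.
from difflib import SequenceMatcher

MODELS = ["gpt-4o", "gpt-4", "gpt-3.5-turbo"]  # illustrative model names
RUNS_PER_MODEL = 3                             # multiple runs smooth out sampling variance

def similarity(a: str, b: str) -> float:
    """Rough 0..1 similarity between two strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def score_models(prompt: str, expected: str, run_prompt) -> dict[str, float]:
    """Average similarity to `expected` over several runs per model."""
    scores = {}
    for model in MODELS:
        runs = [similarity(run_prompt(model, prompt), expected)
                for _ in range(RUNS_PER_MODEL)]
        scores[model] = sum(runs) / len(runs)
    return scores
```

Averaging over several runs matters once temperature is above zero, since a single sample can easily over- or under-state how reliably a model hits the expected answer.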
⚡ No more guessing which prompt or model is better
⚡ No more switching between tabs
⚡ Just clean, fast feedback and a CSV if you want it
This started as a scratch-my-own-itch tool, but now I’m opening it up to anyone building with LLMs.
Unlimited free runs. More models coming soon. Feedback shapes the roadmap.
Would love to hear what you think! I'm keen on feedback and ideas to help me build a product that solves your problems 👉 promptperf.dev
Really liked PromptPerf on Product Hunt — solid idea and super useful for anyone working with LLMs.
I’m Mohamed, a visual identity designer. Just wanted to say: the tool feels super smart and clean, and I think a bit of polish in the branding (logo, landing page visuals, etc.) could push it even further — especially for first-time users.
If you’re ever exploring tweaks in that area, happy to share ideas or help.
Here’s a quick look at my work: Behance Portfolio
Keep building — excited to see where PromptPerf goes!