DeepTagger is a no-code platform that makes your judgment scalable. It uses your annotations as an example to extract information from new documents. Highlight what matters to you once, and let DeepTagger handle the rest with precision. API access included.
While analyzing the Enron Email dataset for a PhD project, we needed to extract data from hundreds of thousands of emails in various formats, and then trace chains that included incidents of “knowledge hiding.” But we got stuck on the very first task: splitting long email chains into individual emails.
Custom Python parsers failed.
RegEx broke.
Traditional ML tools, such as spaCy Prodigy, or Label Studio, couldn’t handle the complexity 🤯 Doing it manually would have meant admitting defeat.
So we built our own annotation tool that could handle nested data structures 🛠️. However, even with perfect annotations, traditional models couldn’t generalize — the data was too diverse, and the examples were too few.
Then OpenAI posted "Introducing Structured Outputs in the API," and everything clicked ⚡ Our annotations became few-shot examples instead of training data. ✅ No model training needed — just smart prompting.
That’s when we realized this could compete with traditional OCR tools by offering a completely different experience.
A few months of polish later… Deeptagger was born 🚀 Hope you love it! ❤️
PhD projects always seem to turn into building the tools you wish existed.
I totally feel the pain of custom parsers failing on messy data. We process millions of social media posts for influencer analysis and the format inconsistencies are a nightmare. RegEx works until it spectacularly doesn't.
Quick question though - how does it handle really domain-specific annotation tasks? Like if I need to extract sentiment and engagement metrics from Instagram comments in different languages, can it adapt to those custom categories pretty easily?
Really impressed with how easy it is to turn unstructured documents into structured data. The interactive labeling feels very intuitive!
DeepTagger makes your judgment scalable, just highlight what matters once, and it learns to extract that info from new documents with precision. No code, no repetition. Your annotations become automation. API access included.
@talshyn very well done. This platform sounds soo appealing. Best of luck!
This is a truly valuable product 👏🏼 I’m certain it will be of great use. Wishing you every success!
Wow congratulations! I’m gonna try myself and suggest to my accountant friend who are working with different forms of invoices. Wish you a great success 🙏🏻
@talshyn Very exciting product, does the ocr engine parse any annual reports/ and investor decks available online?
Wow! I like it! Congrats on the launch! What are the pricing options?(no info on pricing page)
Besides CVs. What are the the top 3 other document types people use this for?
Good job! Splitting messy email chains is such an underrated nightmare. Amazing to see you tackle it head-on with structured outputs.
Congrats on the launch @talshyn Really impressive work!
How do you see it competing with traditional OCR and annotation tools?
I like how DeepTagger makes document tagging easier for non-technical users. Do you plan to add integrations like Google Drive?
this is so dope. always thought doc annotation was stuck in the stone age, but y’all just fast-forwarded it. congrats on the launch, hope the PH crowd shows you some love!
Congratulations to the DeepTagger team on the launch! The no-code approach to turning documents into structured data with interactive labeling is really exciting. I’m particularly interested in trying it out for scaling annotation workflows, and the API access makes it even more powerful. Looking forward to exploring its potential, wishing you great success!
This product was born out of real-life problems 📨
While analyzing the Enron Email dataset for a PhD project, we needed to extract data from hundreds of thousands of emails in various formats, and then trace chains that included incidents of “knowledge hiding.”
But we got stuck on the very first task: splitting long email chains into individual emails.
Custom Python parsers failed.
RegEx broke.
Traditional ML tools, such as spaCy Prodigy, or Label Studio, couldn’t handle the complexity 🤯
Doing it manually would have meant admitting defeat.
So we built our own annotation tool that could handle nested data structures 🛠️. However, even with perfect annotations, traditional models couldn’t generalize — the data was too diverse, and the examples were too few.
Then OpenAI posted "Introducing Structured Outputs in the API," and everything clicked ⚡
Our annotations became few-shot examples instead of training data.
✅ No model training needed — just smart prompting.
That’s when we realized this could compete with traditional OCR tools by offering a completely different experience.
A few months of polish later… Deeptagger was born 🚀
Hope you love it! ❤️