About SmolDocling on Product Hunt
“256M VLM for end-to-end document AI”
SmolDocling launched on Product Hunt on March 25th, 2025 and earned 158 upvotes and 4 comments, placing #12 on the daily leaderboard. SmolDocling, from Hugging Face and IBM Research, is an ultra-compact (256M-parameter) open VLM for end-to-end document conversion that extracts text, layout, tables, code, and more from images.
SmolDocling was featured in Open Source (68.3k followers), Artificial Intelligence (466.2k followers) and Development (5.8k followers) on Product Hunt. Together, these topics include over 100.6k products, making this a competitive space to launch in.
Who hunted SmolDocling?
SmolDocling was hunted by Zac Zuo. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images and the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
Want to see how SmolDocling stacked up against nearby launches in real time? Check out the live launch dashboard for upvote speed charts, proximity comparisons, and more analytics.
Hi everyone!
Check out SmolDocling, a new open-source vision-language model from Hugging Face and IBM Research! True to its name, it's incredibly small – only 256M parameters! – yet it's designed for full, end-to-end document conversion.
You feed it an image of a document page (a scanned PDF, a photo, etc.), and it outputs a structured representation (called "DocTags") that includes everything:
📝 Text (OCR): It extracts the text, of course.
📑 Layout: It understands the page layout (paragraphs, headings, lists, etc.).
📊 Tables: It extracts table structure and content.
💻 Code: It recognizes and formats code blocks (with indentation!).
➕ Equations: It handles mathematical formulas.
🖼️ Figures: It identifies figures and links captions.
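The idea behind DocTags is that every page element comes back typed and ordered in one flat sequence. As a rough illustration — the tag names and snippet below are hypothetical, not the model's actual vocabulary (the real format, including location tokens, is defined by the Docling project) — such a representation can be parsed back into structured elements with a few lines of Python:

```python
import re

# A hypothetical, simplified DocTags-style page representation.
# This is only an illustration of the idea that every page element
# arrives typed and ordered in a single tag sequence.
doctags = (
    "<section_header>Results</section_header>"
    "<text>We evaluate on three benchmarks.</text>"
    "<code>def f(x):\n    return x * 2</code>"
)

def parse_doctags(s):
    """Split a flat tag sequence into (element_type, content) pairs."""
    return re.findall(r"<(\w+)>(.*?)</\1>", s, flags=re.DOTALL)

elements = parse_doctags(doctags)
for kind, content in elements:
    print(kind, "->", content.splitlines()[0])
```

Because the output is one sequence rather than separate OCR, layout, and table files, downstream code only needs a single pass like this to reconstruct the document.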
The key is that it does all of this in a single model, end-to-end, unlike traditional approaches that use separate OCR, layout analysis, and table extraction tools. And it does it with a model that's tiny compared to most VLMs.
It's built on SmolVLM (also open-source) and achieves results competitive with models many times its size.
You can try SmolDocling yourself here.