SmolDocling, from Hugging Face and IBM Research, is the ultra-compact (256M) open VLM for end-to-end document conversion. Extracts text, layout, tables, code, and more from images.
SmolDocling launched on Product Hunt on March 25th, 2025, earning 158 upvotes and 4 comments and placing #12 on the daily leaderboard.
On the analytics side, SmolDocling competes within the Open Source, Artificial Intelligence, and Development topics, which collectively have 540.3k followers on Product Hunt. Product Hunt's dashboard tracks how SmolDocling performed against the three products that launched closest to it on the same day.
Who hunted SmolDocling?
SmolDocling was hunted by Zac Zuo. A “hunter” on Product Hunt is the community member who submits a product to the platform, uploading the images and link and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
Hi everyone!
Check out SmolDocling, a new open-source vision-language model from Hugging Face and IBM Research! True to its name, it's incredibly small – only 256M parameters! – yet it's designed for full, end-to-end document conversion.
You feed it an image of a document page (a scanned PDF, a photo, etc.), and it outputs a structured representation (called "DocTags") that includes everything:
📝 Text (OCR): It extracts the text, of course.
📑 Layout: It understands the page layout (paragraphs, headings, lists, etc.).
📊 Tables: It extracts table structure and content.
💻 Code: It recognizes and formats code blocks (with indentation!).
➕ Equations: It handles mathematical formulas.
🖼️ Figures: It identifies figures and links captions.
The key is that it does all of this in a single model, end-to-end, unlike traditional approaches that use separate OCR, layout analysis, and table extraction tools. And it does it with a model that's tiny compared to most VLMs.
It's built on SmolVLM (also open-source) and achieves competitive results with models many times its size.
You can try SmolDocling yourself here.
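For those who'd rather run it locally, here is a minimal sketch of the image-in, DocTags-out flow described above, using Hugging Face transformers. The checkpoint name and the "Convert this page to docling." instruction are assumptions based on SmolDocling's Hugging Face release, not details from this post; treat them as placeholders to verify against the model card.

```python
def build_messages(instruction: str = "Convert this page to docling."):
    """SmolVLM-style chat message: one image slot plus a text instruction.
    The instruction string is an assumed default, per the lead-in above."""
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": instruction},
        ],
    }]


def convert_page(image_path: str,
                 model_id: str = "ds4sd/SmolDocling-256M-preview") -> str:
    """Feed one document-page image to SmolDocling and return the raw
    DocTags string. model_id is an assumed checkpoint name."""
    # Imports are local so the sketch can be read without pulling in
    # the (large) transformers and Pillow dependencies.
    from transformers import AutoProcessor, AutoModelForVision2Seq
    from PIL import Image

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForVision2Seq.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    prompt = processor.apply_chat_template(build_messages(),
                                           add_generation_prompt=True)
    inputs = processor(text=prompt, images=[image], return_tensors="pt")

    generated = model.generate(**inputs, max_new_tokens=1024)
    # Keep only the newly generated tokens (the DocTags), not the prompt.
    new_tokens = generated[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=False)[0]
```

Because the model is only 256M parameters, this should fit comfortably on CPU or a modest GPU; the DocTags string it returns can then be post-processed into Markdown, HTML, or JSON by downstream tooling.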