DataSieve helps you turn unstructured text into clean, usable data in seconds. Drop in text, files, folders, or even archives, and extract what you need in one pass. Emails, phone numbers, URLs, dates, financial data, and more. Everything runs locally on your device, with no cloud and no tracking.

What you can do
- Extract multiple data types at once
- Process text, PDFs, EPUBs, CSV, JSON, Word files, and more
- Export results to JSON, XLSX, DOCX, and more
- Define your own custom extractors
Hey everyone,
I’m the developer behind DataSieve (previously TextMine). This update has been a big step forward compared to the first version.
The main focus for 2.x was flexibility and scale. Being able to scan folders and archives, and define custom extractors, makes it much more useful for real workflows instead of just one-off text inputs.
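For anyone wondering what a custom extractor amounts to conceptually, here is a minimal Python sketch of the general idea: a registry of named patterns, all applied to the same text in one pass. This is not DataSieve's actual API or implementation. The names `EXTRACTORS`, `register_extractor`, and `extract_all`, the `order_id` format, and the deliberately simple regexes are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical registry mapping an extractor name to a compiled pattern.
# These patterns are intentionally simple and are NOT DataSieve's real rules.
EXTRACTORS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s<>\"']+"),
}


def register_extractor(name: str, pattern: str) -> None:
    """Add (or replace) a user-defined extractor under the given name."""
    EXTRACTORS[name] = re.compile(pattern)


def extract_all(text: str) -> Dict[str, List[str]]:
    """Run every registered extractor over the text and group matches by name."""
    return {
        name: [match.group(0) for match in pattern.finditer(text)]
        for name, pattern in EXTRACTORS.items()
    }


if __name__ == "__main__":
    # A made-up custom data type: order IDs such as "ORD-123456".
    register_extractor("order_id", r"ORD-\d{6}")
    sample = "Contact ops@example.com about ORD-123456, see https://example.com/orders for details"
    print(extract_all(sample))
```

Plain regexes like these only match on surface form. They cannot tell a phone number from a fax number, for example, without also looking at nearby labels or context.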
I also spent time improving extraction accuracy for more complex data types like financial info and international formats.
Happy to answer any questions, and I’d really appreciate any feedback, especially around usability and edge cases.
Nice — structured data extraction is one of those problems that sounds simple until you actually try it. How does it handle ambiguous fields? For example, does it distinguish between a phone number and a fax number in unstructured text? Asking because I work on a similar challenge with voice-to-form mapping.
Hi Alberto, I like your idea of running everything locally. Is the list of attributes to extract static, or can I define custom ones?