Chunking heavily impacts retrieval performance when working with LLMs. Preprocess splits documents into optimal chunks of text. We split PDF and Office files based on the original document structure and content semantics.
👋 Hello, Product Hunt community,
I hope you are all doing well 😀
I am Nicola, co-founder at Preprocess. In 2018 I founded Pigro (https://pigro.ai/) with Nicolò. Through our work at Pigro, we gained experience in document chunking and decided to create Preprocess.
Preprocess is our solution for document preprocessing tailored for Large Language Models (LLMs).
Recognizing the challenges in document preprocessing for LLMs, we developed Preprocess to automate and optimize this critical step. Our goal is to provide a reliable, efficient, and easy-to-integrate solution that meets the diverse needs of our users.
Preprocess is ideal for data scientists, AI developers, and organizations implementing Retrieval-Augmented Generation (RAG) systems. It simplifies the ingestion pipeline, allowing you to focus on building intelligent applications without the hassle of manual preprocessing.
Key Features 🛠️
- Intelligent Parsing and Chunking: Automatically processes various document types, preserving the original structure and semantics.
- High-Quality Table and Image Extraction: Accurately extracts and formats tables and images for seamless integration.
- Support for Multiple Formats: Handles PDFs, Word documents, Excel sheets, presentations, HTML, and plain text files.
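To make the idea of structure-aware chunking concrete, here is a minimal sketch in Python. It is not Preprocess's algorithm or API: the `Section` type, the `chunk_sections` function, and the `max_chars` budget are all hypothetical names chosen for illustration, and it assumes a document has already been parsed into headings and paragraphs.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical parsed section: a heading plus its paragraphs."""
    heading: str
    paragraphs: list[str]


def chunk_sections(sections: list[Section], max_chars: int = 500) -> list[str]:
    """Pack paragraphs into chunks that never cross a section boundary.

    Each chunk is prefixed with its section heading so it keeps some context.
    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    for section in sections:
        current: list[str] = []
        size = 0
        for para in section.paragraphs:
            # Flush the running chunk when this paragraph would exceed the budget.
            if current and size + len(para) > max_chars:
                chunks.append(section.heading + "\n" + "\n".join(current))
                current, size = [], 0
            current.append(para)
            size += len(para)
        # Flush whatever is left before moving to the next section.
        if current:
            chunks.append(section.heading + "\n" + "\n".join(current))
    return chunks
```

The point of the sketch is the contrast with fixed-size splitting: chunk boundaries follow the document's sections and paragraphs rather than an arbitrary character offset, so a chunk never mixes text from two unrelated sections.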
We offer a Free Tier that allows you to preprocess up to 10 documents per day, each up to 10 pages/credits, with no time limit. Our flexible credit-based model ensures you only pay for what you need.
We're committed to continuous improvement and would love your thoughts on Preprocess. Please share your experiences and suggestions to help us serve you better.
About Preprocess on Product Hunt
“Preprocess maximises RAG performances”
Preprocess launched on Product Hunt on March 3rd, 2025 and earned 105 upvotes and 7 comments, placing #13 on the daily leaderboard.
On the analytics side, Preprocess competes within API, Artificial Intelligence and Data Science, topics that collectively have 568k followers on Product Hunt.
Who hunted Preprocess?
Preprocess was hunted by Nicola Abbasciano. A “hunter” on Product Hunt is the community member who submits a product to the platform by uploading the images, adding the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.