Crawlee for Python
Build reliable scrapers in Python
Open Source
Growth Hacking
Developer Tools
GitHub

Featured on July 9th, 2024

We are launching Crawlee for Python, an open-source library for web scraping and browser automation. Quickly scrape data, store it, and avoid getting blocked, with support for headless browsers and smart proxy rotation.

Top comment

Upvotes: 178

Comments: 30

Product of the Day: 12th

Hello Hunters and Makers,

I am Saurav, Developer Community Manager at Apify, the company building Crawlee. I am happy to hunt Crawlee for Python today.

We launched Crawlee in August 2022 and received an amazing response from the community, as well as continuous demand for a Python version. Finally, after a lot of hard work from our team, we are launching Crawlee for Python today.

It has all of these features:
- Unified interface for HTTP & headless browser crawling.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints, which improves DX (IDE autocompletion) and reduces bugs (static type checking).
- Automatic retries on errors or when you're getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing: direct URLs to the appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Robust error handling.

Why use Crawlee rather than Scrapy?
- Crawlee has out-of-the-box support for headless browser crawling (Playwright).
- Crawlee has a minimalistic & elegant interface: set up your scraper with fewer than 10 lines of code.
- Complete type hint coverage.
- Based on standard asyncio.

Please share your feedback and thoughts in the comments below!
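To make two of the features listed above more concrete (request routing to handlers and automatic retries), here is a minimal sketch built only on the standard library's asyncio. It is not Crawlee's API: `TinyRouter`, `dispatch`, the labels, and the example.com URLs are all hypothetical names invented for illustration, and no real network requests are made.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# A handler takes a URL and returns one scraped record.
Handler = Callable[[str], Awaitable[dict]]


class TinyRouter:
    """Hypothetical label-based router with retries (NOT Crawlee's API)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a coroutine as the handler for a label."""
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func
        return register

    async def dispatch(self, label: str, url: str, max_retries: int = 3) -> dict:
        """Run the handler for `label`, retrying when it raises."""
        func = self._handlers[label]
        last_error: Optional[Exception] = None
        for _ in range(max_retries):
            try:
                return await func(url)
            except Exception as error:  # a real crawler would be more selective
                last_error = error
        raise RuntimeError(f"{url} failed after {max_retries} attempts") from last_error


router = TinyRouter()
attempts = {"detail": 0}  # counts calls so the simulated block can be observed


@router.handler("listing")
async def handle_listing(url: str) -> dict:
    return {"url": url, "kind": "listing"}


@router.handler("detail")
async def handle_detail(url: str) -> dict:
    # Simulate being blocked on the first attempt only.
    attempts["detail"] += 1
    if attempts["detail"] < 2:
        raise ConnectionError("simulated block")
    return {"url": url, "kind": "detail"}


async def main() -> List[dict]:
    # Both URLs are processed concurrently; gather keeps the input order.
    return list(await asyncio.gather(
        router.dispatch("listing", "https://example.com/products"),
        router.dispatch("detail", "https://example.com/products/1"),
    ))


results = asyncio.run(main())
print(results)
```

The decorator-based registration is just one possible design. The real library's handlers, retry policy, and queueing are richer than this, so its own documentation is the place to look for the actual interface.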
