Crawlee for Python

Build reliable scrapers in Python

Open Source
Growth Hacking
Developer Tools
GitHub

We are launching Crawlee for Python, an open-source library for web scraping and browser automation. Quickly scrape data, store it, and avoid getting blocked with the help of headless browsers and smart proxy rotation.

Top comment

Hello Hunters and Makers, I am Saurav, Developer Community Manager at Apify, the company building Crawlee. I am happy to hunt Crawlee for Python today.

We launched Crawlee in August 2022 and received an amazing response from the community, as well as continuous demand for a Python version. Finally, after a lot of hard work from our team, we are launching Crawlee for Python today.

It has all of these features:
- Unified interface for HTTP & headless browser crawling.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints, which enhances DX (IDE autocompletion) and reduces bugs (static type checking).
- Automatic retries on errors or when you're getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing that directs URLs to the appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Robust error handling.

Why use Crawlee rather than Scrapy?
- Crawlee has out-of-the-box support for headless browser crawling (Playwright).
- Crawlee has a minimalistic & elegant interface: set up your scraper with fewer than 10 lines of code.
- Complete type hint coverage.
- Based on standard asyncio.

Please pass on your feedback and thoughts in the comments below!
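For readers unfamiliar with two of the ideas in the feature list, request routing and automatic retries, here is a minimal sketch using only the Python standard library. It is not Crawlee's API: the `Router` class, the `handler` decorator, the `dispatch` method, the `"detail"` label, and the example URL are all hypothetical names chosen for illustration, and the handler simulates a block instead of making a real request.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# A handler takes a URL and returns some scraped value (here, a string).
Handler = Callable[[str], Awaitable[str]]


class Router:
    """Hypothetical label-based router; NOT Crawlee's actual API."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        """Register the decorated coroutine as the handler for `label`."""
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func
        return register

    async def dispatch(self, label: str, url: str, max_retries: int = 3) -> str:
        """Run the handler for `label`, retrying up to `max_retries` times."""
        handler = self._handlers[label]
        last_error: Optional[Exception] = None
        for _ in range(max_retries):
            try:
                return await handler(url)
            except Exception as error:  # a real crawler would be more selective
                last_error = error
        raise RuntimeError(f"{url} failed after {max_retries} attempts") from last_error


router = Router()
attempts = {"count": 0}


@router.handler("detail")
async def handle_detail(url: str) -> str:
    # Fail on the first call to simulate a blocked request, then succeed.
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise ConnectionError("simulated block")
    return f"scraped {url}"


result = asyncio.run(router.dispatch("detail", "https://example.com/item/1"))
print(result, "after", attempts["count"], "attempts")
```

The point of the sketch is the division of labour: handlers only describe what to do with a page, while the dispatcher decides which handler runs and how failures are retried.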

Comment highlights

Scrape like a pro! This Python library tackles web scraping with ease. Headless browsers, proxy rotation, and built-in error handling make it a powerful tool for data collection projects.

Python's new secret weapon? Crawlee! Say goodbye to complex scraping scripts. Crawlee streamlines the process with a user-friendly interface and handles the heavy lifting for reliable data extraction.

Crawlee for Python: fast, reliable, modern. Build scalable web crawlers in Python with Crawlee. Enjoy type hints for fewer errors, leverage Playwright for advanced browser control, and conquer crawling challenges efficiently.

I'm interested in its compatibility with popular tools used in web development and data analysis.

I'd like to know more about its security features and how it protects against vulnerabilities and data breaches.

I'm keen to learn about its performance benchmarks and its speed compared to other scraping solutions.

I'm curious about its roadmap for future development and community contributions.

I'm eager to explore its documentation to understand how well it supports developers at different skill levels.

I'm interested in hearing from early adopters about their experiences using it and any tips they have for maximizing its effectiveness.

One of Crawlee's standout features is its ability to navigate around anti-scraping measures employed by websites. This includes techniques to avoid detection by using randomized user agents, delays between requests, and intelligent handling of cookies.
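Two of the techniques named in this comment, randomized user agents and delays between requests, can be sketched generically with the standard library. This is an illustration of the general idea only, not a description of how Crawlee implements it: the `USER_AGENTS` strings, `build_headers`, and `jittered_delay` are hypothetical, and no network request is made.

```python
import random
from typing import Dict

# Illustrative user-agent strings; a real pool would be larger and kept current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) ExampleBrowser/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/3.0",
]


def build_headers(rng: random.Random) -> Dict[str, str]:
    """Pick a user agent at random so consecutive requests can differ."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


def jittered_delay(rng: random.Random, base_seconds: float = 1.0,
                   jitter_seconds: float = 0.5) -> float:
    """Return a wait time of base_seconds plus a random extra up to jitter_seconds."""
    return base_seconds + rng.uniform(0.0, jitter_seconds)


rng = random.Random(42)  # seeded only so the demo is repeatable
headers = build_headers(rng)
delay = jittered_delay(rng)
print(headers["User-Agent"])
print(f"would wait {delay:.2f}s before the next request")
```

In a real scraper the delay would be passed to `time.sleep` or `asyncio.sleep` between requests, and the headers attached to each outgoing request.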

Hey @sauain, excited to see Crawlee for Python announced! This open-source library simplifies web scraping, browser automation, and data storage. Scrape efficiently, avoid blocks, leverage headless browsers, and enjoy smart proxy rotation.

Congratulations on the launch! I love the seamless integration of headless browser crawling with Playwright. This is fantastic for anyone looking to scrape dynamic content without the hassle of constantly adjusting for JavaScript rendering.

Crawlee for Python is a must-have for web scraping! The open-source library is a great addition. What challenges did you face during development?

Congrats on the launch, team! I love to see core technical products making their way in the era of AI wrappers.

This looks like a powerful tool for web scraping and browser automation. How does Crawlee's proxy rotation and session management compare to other tools on the market? Any plans to add more integrations? Congrats on the launch, Saurav!

Congratulations on the launch 🎉 Amazing work 👏 Scraping in a headless browser had so many gaps!