Scrapy and friends

Open source at our heart

Where it all started

Make building spiders a breeze

Scrapy is an open source Python framework for web scraping, created by MDMS co-founders Pablo Hoffman and Shane Evans. Out of the box, Scrapy spiders download web page data (HTML, JSON, XML…), parse and process it, and save the results in structured formats such as CSV, JSON or XML.
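
The three stages (download, parse, save) can be illustrated with a minimal sketch that uses only the standard library. This is not Scrapy code. The HTML is an inline string standing in for a downloaded page, and `LinkCollector`, `parse`, `to_csv` and `to_json` are hypothetical names chosen for illustration.

```python
import csv
import io
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser: collects a {"text", "href"} record for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({"text": "".join(self._text).strip(), "href": self._href})
            self._href = None


def parse(html):
    # Parse step: turn raw HTML into a list of structured records.
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def to_csv(records):
    # Save step (CSV): write the records to an in-memory CSV string.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["text", "href"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    # Save step (JSON): serialize the same records as JSON.
    return json.dumps(records)


# Download step stubbed out: in a real spider this HTML would come from an HTTP response.
page = '<ul><li><a href="/docs">Docs</a></li><li><a href="/blog">Blog</a></li></ul>'
records = parse(page)
csv_text = to_csv(records)
json_text = to_json(records)
```

In Scrapy itself, the engine handles downloading and the spider's callbacks handle parsing. Feed exports cover the save step, for example `scrapy crawl <spider> -O items.csv`.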

Scrapy boasts a wide range of built-in extensions and middlewares for handling cookies and sessions, as well as HTTP features such as compression, authentication, caching, user agents, robots.txt and crawl depth restriction. It is also easy to extend: you can write custom middlewares or item pipelines that add the specific functionality your project requires.
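
A custom item pipeline is a plain Python class with a `process_item(item, spider)` method, which Scrapy calls for every scraped item. The sketch below assumes dict items. The `StripWhitespacePipeline` name and the `myproject.pipelines` module path are hypothetical.

```python
class StripWhitespacePipeline:
    """Hypothetical pipeline: trims surrounding whitespace from string fields."""

    def process_item(self, item, spider):
        # Scrapy passes each scraped item here; this sketch assumes dict items.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        # Returning the item hands it on to the next pipeline (or to the feed export).
        return item


# In settings.py: the integer sets the run order (lower values run first).
ITEM_PIPELINES = {
    "myproject.pipelines.StripWhitespacePipeline": 300,
}
```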

Giving you the power of Data Extraction

Scrapy

Scrapy is our open source web crawling framework written in Python. It is one of the most widely used and highly regarded frameworks of its kind: very powerful, yet easy to use.

Spidermon

Spidermon is our battle-tested open source spider monitoring library for Scrapy.

DateParser

DateParser is our library for parsing human-readable dates and times. It supports 18 languages.

Eli5

Eli5 is a library for debugging machine learning classifiers and explaining their predictions.

Formasaurus

Formasaurus uses machine learning to figure out the type of an HTML form: login, search, sign-up, password recovery, contact form and so on.

W3lib

W3lib provides a collection of web-related helper functions (for URLs, HTML, HTTP headers and encodings) for your web scraping projects.

ScrapyRT

ScrapyRT lets you reuse your spider’s logic to extract data from web pages through a single HTTP request.
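
Here is a minimal client sketch that uses only the standard library. It assumes ScrapyRT's default setup as we understand it: a server on port 9080 with a `crawl.json` endpoint that takes `spider_name` and `url` query parameters and returns scraped items under an `items` key. Check the ScrapyRT documentation for your version. The `quotes` spider name and both helper functions are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def crawl_url(spider_name, target_url, host="http://localhost:9080"):
    # Build a GET request URL for ScrapyRT's crawl.json endpoint (assumed default port 9080).
    query = urlencode({"spider_name": spider_name, "url": target_url})
    return f"{host}/crawl.json?{query}"


def fetch_items(spider_name, target_url):
    # Requires a ScrapyRT server running inside a Scrapy project; not called below.
    with urlopen(crawl_url(spider_name, target_url)) as response:
        payload = json.load(response)
    # Scraped items are assumed to be returned under the "items" key.
    return payload.get("items", [])


request_url = crawl_url("quotes", "https://quotes.toscrape.com/")
```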

Queuelib

Queuelib lets you create disk-based queues in Python.
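
Queuelib's own classes, such as `FifoDiskQueue`, use their own on-disk format. The sketch below is a hedged illustration of the idea, not Queuelib's implementation. It builds a persistent FIFO queue on the standard library's `sqlite3`, so queued entries survive a restart. The `SqliteFifoQueue` class is hypothetical.

```python
import os
import sqlite3
import tempfile


class SqliteFifoQueue:
    """Hypothetical disk-backed FIFO queue of bytes, persisted in an SQLite file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue (id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB)"
        )
        self.conn.commit()

    def push(self, data: bytes) -> None:
        self.conn.execute("INSERT INTO queue (data) VALUES (?)", (data,))
        self.conn.commit()

    def pop(self):
        # Return the oldest entry first, or None when the queue is empty.
        row = self.conn.execute("SELECT id, data FROM queue ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        self.conn.commit()
        return row[1]

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]

    def close(self):
        self.conn.close()


# Usage: entries survive closing and reopening the queue file.
path = os.path.join(tempfile.mkdtemp(), "requests.db")
queue = SqliteFifoQueue(path)
queue.push(b"https://example.com/1")
queue.push(b"https://example.com/2")
queue.close()

reopened = SqliteFifoQueue(path)
first = reopened.pop()
```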

Parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors.
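
Parsel's entry point is its `Selector` class. For example, `Selector(text=xml, type="xml").xpath("//book[@lang='en']/title/text()").getall()` returns the matching titles. To show the same idea of path-based extraction without any third-party packages, the sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in. ElementTree supports only a limited XPath subset and no CSS selectors. The catalog document is made up.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <book lang="en"><title>Web Scraping Basics</title><price>19.99</price></book>
  <book lang="fr"><title>Le Web</title><price>15.00</price></book>
  <book lang="en"><title>Crawling at Scale</title><price>29.50</price></book>
</catalog>"""

root = ET.fromstring(CATALOG)

# XPath-style path with an attribute predicate: titles of all English-language books.
english_titles = [title.text for title in root.findall(".//book[@lang='en']/title")]

# Pair each title with its price, converted to a float.
prices = {book.findtext("title"): float(book.findtext("price")) for book in root.iter("book")}
```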

Cssselect

Cssselect parses CSS selectors and translates them into XPath expressions for use in Python.

Itemloaders

Itemloaders is a library for populating items using XPath and CSS selectors, with a convenient API.

Itemadapter

Itemadapter provides a common interface for data container classes such as dicts, Scrapy items, dataclasses and attrs objects.
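
The idea behind a common interface can be sketched with the standard library alone: a single function returns a plain dict for either a dict or a dataclass instance, so downstream code does not care which container it received. `as_dict` and `Product` are hypothetical. Itemadapter's actual entry point is its `ItemAdapter` class, which covers more container types.

```python
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class Product:
    # Hypothetical item type used for illustration.
    name: str
    price: float


def as_dict(item):
    """Return a plain dict for a dict or dataclass-instance item (hypothetical helper)."""
    if isinstance(item, dict):
        return dict(item)
    if is_dataclass(item) and not isinstance(item, type):
        return asdict(item)
    raise TypeError(f"Unsupported item type: {type(item).__name__}")


# Downstream code can treat both item types the same way.
items = [{"name": "Lamp", "price": 12.5}, Product(name="Desk", price=80.0)]
names = [as_dict(item)["name"] for item in items]
```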

Protego

Protego is a pure-Python robots.txt parser with support for modern conventions.
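
For comparison, Python's standard library ships its own robots.txt parser, `urllib.robotparser`. The sketch below uses that parser, not Protego, to show the questions a robots.txt parser answers. The stdlib parser only does simple path-prefix matching, so modern conventions such as wildcard patterns are where Protego goes further. The robots.txt content and the `mybot` user agent are made up.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

allowed = parser.can_fetch("mybot", "https://example.com/products")
blocked = parser.can_fetch("mybot", "https://example.com/private/data")
delay = parser.crawl_delay("mybot")  # seconds to wait between requests
```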

Price-parser

Price-parser extracts the price amount and currency symbol from a raw text string.
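
Here is a deliberately simplified regex sketch of the task, not price-parser's algorithm. It recognizes a few currency symbols and takes the first number in the string. A trailing comma or dot followed by one or two digits is read as the decimal separator. Three digits after a separator are read as a thousands group. `extract_price` is a hypothetical helper. Real-world price strings need the much broader handling that a dedicated library provides.

```python
import re
from decimal import Decimal

CURRENCY_SYMBOLS = "$€£¥"
NUMBER_RE = re.compile(r"\d[\d.,\s]*")


def extract_price(text):
    """Return (amount, currency) from strings like '€22,90' or '1,299.00 $' (hypothetical)."""
    currency = next((ch for ch in text if ch in CURRENCY_SYMBOLS), None)
    match = NUMBER_RE.search(text)
    if match is None:
        return None, currency
    # Drop spaces used as thousands separators and any trailing punctuation.
    digits = re.sub(r"\s", "", match.group()).rstrip(".,")
    # A separator followed by one or two final digits is taken as the decimal point.
    decimal_match = re.search(r"[.,](\d{1,2})$", digits)
    if decimal_match:
        integer_part = re.sub(r"[.,]", "", digits[: decimal_match.start()])
        amount = Decimal(f"{integer_part}.{decimal_match.group(1)}")
    else:
        # Otherwise every separator is treated as a thousands separator.
        amount = Decimal(re.sub(r"[.,]", "", digits))
    return amount, currency
```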

Number-parser

Number-parser parses numbers written in natural language.
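
Parsing numbers written in natural language can be sketched for English only. The approach maps number words to values, adds them up within a group, multiplies by "hundred", and closes the group at scale words such as "thousand". `words_to_number` is a hypothetical helper covering a small vocabulary, not the library's API. Real natural-language input (other languages, ordinals, mixed digits and words) needs far more than this.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(text):
    """Convert e.g. 'two hundred and forty-five thousand' to 245000 (hypothetical helper)."""
    total = 0    # value of groups already closed by a scale word
    current = 0  # group currently being built, below the next scale word
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"Unknown number word: {word!r}")
    return total + current
```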

Used by companies powered by data

Dev tools that make scraping easy

Scrapy Cloud

Scrapy Cloud is our battle-tested platform for running and managing web crawlers.

Easily build crawlers and deploy them instantly. Your spiders run in the cloud, scaling on demand from thousands to billions of pages.

MDMS API Enterprise

Supercharge your data scraping team. When data collection is too important to outsource, but laws, bans and proxies still keep you up at night, we have the perfect solution for you.

Software + Strategy = Zyte API Enterprise.

Any search engine data you need. Seriously.