Web scraping with Python | Bytes for good

Here is a set of resources for scraping the web with the help of Python. The best solution seems to be Mechanize plus Beautiful Soup.
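In that pairing, Mechanize handles fetching and navigation and Beautiful Soup handles parsing. To give a rough idea of the parsing half, here is a minimal sketch that pulls links out of a page. It uses only the standard library's `html.parser` as a stand-in, so it does not show the Mechanize or Beautiful Soup APIs. The names `LinkCollector`, `extract_links` and `SAMPLE_PAGE`, as well as the example.com URLs, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href=...> seen in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; anchors without href are skipped
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they were found on
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all links found in `html`, resolved against `base_url`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content; a real scraper would download this first
SAMPLE_PAGE = """
<html><body>
<a href="/part-2">Part II</a>
<a href="http://example.org/other">Elsewhere</a>
<a name="top">no href here</a>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_PAGE, "http://example.com/blog/"):
        print(link)
```

A crawler would feed each extracted link back into its download queue. A dedicated parsing library mainly adds more forgiving parsing and a richer way to search the document tree.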

See also:

Off-topic: Proxomitron looks like a nice (Python-friendly?) filtering proxy.

5 thoughts on "Web scraping with Python"

Pingback: AkaSig » Web scraping with python (part 1: crawling)

Sig (post author), 14/03/05

See also Web scraping with Python (part II), where I present a Python app that helps you scrape the web.

JohnMc, 07/09/09

Beautiful Soup is OK, but it does have lexical issues on certain constructs. Nor is it consistent.

Something I find much better is lxml and PyQuery. lxml is extremely fast. PyQuery’s advantage is that if you know jQuery already you bypass the learning curve in using the tool.

Sig (post author), 07/10/09

JohnMc, thanks for pointing to PyQuery and lxml. I was a bit concerned that lxml would not handle malformed HTML files, but it can now interface with BeautifulSoup and use its parsing abilities for malformed HTML. Here is the lxml module for this:

http://codespeak.net/lxml/elementsoup.html
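As a rough sketch of what that interface looks like in practice, the snippet below parses a deliberately broken fragment through `lxml.html.soupparser` and reads the links from the resulting lxml tree. It assumes both lxml and BeautifulSoup are installed. The `BROKEN_HTML` sample and the `extract_links` helper are made up for illustration.

```python
# Requires the third-party packages lxml and beautifulsoup4
from lxml.html import soupparser

# Hypothetical malformed input: unclosed <p> and <a>, no closing </html>
BROKEN_HTML = "<html><body><p>First<p>Second <a href='/next'>next page</body>"


def extract_links(markup):
    """Parse `markup` with BeautifulSoup, then walk it as an lxml tree."""
    # soupparser.fromstring returns the root element of an lxml tree
    root = soupparser.fromstring(markup)
    return [a.get("href") for a in root.iter("a")]


if __name__ == "__main__":
    print(extract_links(BROKEN_HTML))
```

BeautifulSoup does the error-tolerant parsing here. The code that follows works on ordinary lxml elements, so XPath and the rest of the lxml API remain available.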

wordpress web scraper, 30/07/11

wpgrab.com is the easiest web scraper

Comments are closed.