Thursday 16 May 2013

FREE! PHP IMDb Scraper/API for new IMDb Template

IMDb is undoubtedly the leading information source for media information and is the top target of web scraping for movie lovers around the world. Unfortunately IMDb does not provide an API to access its database so web scraping is the only resort for us. PHP being one of the most commonly used and powerful web development language enables easy web scraping with the power of PCRE (Perl Compatible Regular Expressions).

For my recent project on a Movie Catalog (http://movies.abhinayrathore.com), I needed a  IMDb scraper and found one built by Tyler Hall. His version was not robust enough to scrap all kind of movie pages so I extended it and made it more robust to support different type of titles, BUT recently IMDb changed its page template and most of the old scrapers stopped working including mine. So, I modified my scraper to accommodate the new template changes and considered it as my moral responsibility to contribute back to the developer community.

This new scraper is very robust and capable enough to handle a wide variety of new template modifications. Apart from the regular information it even goes deep to scan extra media images and release dates.


Source: http://web3o.blogspot.in/2010/10/php-imdb-scraper-for-new-imdb-template.html

No comments:

Post a Comment