
AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Tuesday, December 18, 2007

The Python Papers - Screen Scraping Article

The new issue of The Python Papers is out. It includes a short article I wrote called "Screen Scraping Web Pages".

The issue can be downloaded here:  The Python Papers, Volume 2, Issue 4 (pdf)

This tutorial shows how to programmatically retrieve a stock quote from Google Finance. It uses Python's high-level web API to download the page, then screen scrapes the quote out of the HTML with a regular expression.
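Here is a minimal sketch of that approach, not the code from the article itself. It assumes the price sits in an element marked `class="pr"` (the pattern quoted in the comments on this post). The function names `fetch_page` and `extract_quote`, the UTF-8 decoding, and the sample markup are all made up for illustration, and the real page layout may differ.

```python
import re
from urllib.request import urlopen


def fetch_page(url):
    # Download the page and decode it to text (UTF-8 is an assumption).
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_quote(content):
    # Find the element marked class="pr" and capture the text
    # between the end of its opening tag and the next "<".
    match = re.search(r'class="pr".*?>(.*?)<', content)
    return match.group(1) if match else None


# Hypothetical markup standing in for a downloaded page.
sample = '<span class="pr" id="ref_1">123.45</span><span class="ch">+0.50</span>'
print(extract_quote(sample))
```

To run it against a live page you would pass the result of `fetch_page(url)` to `extract_quote`. Returning `None` when nothing matches keeps a layout change from raising an exception deep inside the scraper.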
Comments [2]
Tuesday, January 15, 2008 5:20:04 AM (Eastern Standard Time, UTC-05:00)
Hi Corey,

I just followed your tutorial, thanks for submitting it! Just one question, though, about the regex: why did you use the question marks in it?
To me it looks like

re.search('class="pr".*?>(.*?)<', content)

gives the same result for group(1) as:

re.search('class="pr".*>(.*)<', content)

Or am I overlooking something?
Regards,

Roger
Tuesday, January 29, 2008 9:07:06 AM (Eastern Standard Time, UTC-05:00)
Hi, sorry for the late reply.

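The question marks in the regex mean "non-greedy" matching. A non-greedy quantifier like .*? matches as few characters as possible, so it stops at the first place where the rest of the pattern can match instead of running on to the last one.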

-Corey
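
The difference between the two patterns from this thread can be seen with a small test. The `content` string is a made-up single line of markup, not the real Google Finance page. Since `.` does not match a newline by default, the two patterns can return the same result when the price sits on a line by itself, which may be why they looked equivalent.

```python
import re

# Hypothetical single-line markup with a second element after the price.
content = '<span class="pr" id="ref_1">123.45</span><span class="ch">+0.50</span>'

# Non-greedy: each .*? stops at the first ">" or "<" that lets the match succeed.
lazy = re.search(r'class="pr".*?>(.*?)<', content).group(1)

# Greedy: each .* runs as far right as it can, then backtracks only as needed.
greedy = re.search(r'class="pr".*>(.*)<', content).group(1)

print(lazy)    # text of the class="pr" element
print(greedy)  # text of the last element on the line
```

On this sample the non-greedy pattern captures `123.45`, while the greedy one skips ahead to the last tag on the line and captures `+0.50`.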