In this lab, you will perform a simple version of the data scraping (retrieval from third-party sources without a standard interface) and data display that the larger group project requires. Your work will make use of concepts covered in class related to design, interfaces (software and human), debugging, and testing.
Part A:
Design a piece of software that retrieves all of the street addresses of movie theaters operated by Kerasotes ShowPlace Theatres, LLC, as advertised on their website at http://www.kerasotes.com. Assume that the source HTML code of the site as of 2/21/06 will be the version you need to parse - you don't need to account for major changes in the layout or design of those pages.
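As a starting point for the parsing step, here is a minimal sketch using Python's standard html.parser module. The sample markup, the LinkCollector class, and the link path are all hypothetical; they are not taken from the real site, whose HTML you will need to inspect yourself before deciding which tags to look for.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs from anchor tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical markup for illustration only, NOT copied from the real site.
SAMPLE = '<ul><li><a href="/theatre?id=1">Example Cinema 8</a></li></ul>'

collector = LinkCollector("http://www.kerasotes.com/")
collector.feed(SAMPLE)
```

After feed() returns, collector.links holds one tuple per link found. A real scraper would fetch each page first and would likely need to filter the links down to the theatre detail pages.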
Specifically, the software should retrieve (at minimum) into a well-designed runtime data structure of your choice:
The name of the theatre
The mailing address of the theatre
The name of the general manager of the theatre
Whether or not the theatre is "handicap accessible"
The URL of the theatre's details/showtimes page
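One possible shape for the runtime data structure is sketched below as a Python dataclass. The class name, field names, and sample values are illustrative assumptions, not requirements; any well-designed structure that holds the five items above is acceptable.

```python
from dataclasses import dataclass, asdict


@dataclass
class Theatre:
    """One scraped theatre record; the field names are illustrative."""
    name: str
    mailing_address: str
    general_manager: str
    handicap_accessible: bool
    showtimes_url: str


# Placeholder values only, not real data from the site.
example = Theatre(
    name="Example Cinema 8",
    mailing_address="123 Main St, Anytown, IN 00000",
    general_manager="Pat Example",
    handicap_accessible=True,
    showtimes_url="http://www.example.com/showtimes",
)

# A plain dict is convenient when writing the record to a file.
record = asdict(example)
```

Storing the accessibility flag as a boolean rather than as the site's display text keeps later code (such as the map in Part B) from having to re-parse strings.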
The software should store the data in a file or some other (simple!) form of data storage.
Don't worry about checking for incremental changes or other inconsistencies; do a "wholesale" replacement of any previous data set.
Your software should be smart enough not to replace any existing data with an empty data set (e.g. if the retrieval fails in some way).
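The storage rules above (wholesale replacement, but never with an empty data set) might be sketched as follows. This assumes a JSON file as the storage mechanism and records that are plain dicts; the function name replace_data_set is hypothetical, and other simple storage choices are equally valid.

```python
import json
import os
import tempfile


def replace_data_set(records, path):
    """Replace the stored data set wholesale, unless `records` is empty.

    Returns True if the file was replaced, False if the old data was kept.
    """
    if not records:
        # Retrieval probably failed; keep whatever is already stored.
        return False

    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory and then swap it in,
    # so a crash mid-write cannot leave a half-written data file behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(records, tmp, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Checking for an empty list is only the simplest possible safeguard. You may decide that other results (for example, far fewer records than the previous run) should also be treated as a failed retrieval.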
Upon request (through a command line flag or otherwise) the software should provide feedback for reading by humans: the number of records retrieved, how long it took, and other data you think is relevant.
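A command line flag for human-readable feedback could look something like the sketch below. The flag name, the scrape() stand-in, and the report wording are all assumptions made for illustration; the real retrieval code would take the place of scrape().

```python
import argparse
import time


def scrape():
    """Stand-in for the real retrieval step; returns a list of records."""
    return []


def build_report(record_count, elapsed_seconds):
    """Format the summary shown to a human reader."""
    return (f"Retrieved {record_count} record(s) "
            f"in {elapsed_seconds:.2f} seconds")


def main(argv=None):
    parser = argparse.ArgumentParser(description="Scrape theatre listings")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print a summary for human readers")
    args = parser.parse_args(argv)

    start = time.monotonic()
    records = scrape()
    elapsed = time.monotonic() - start

    # Stay silent by default so scheduled runs do not produce noise.
    if args.verbose:
        print(build_report(len(records), elapsed))
    return 0
```

Calling main(["--verbose"]) prints the summary, while main([]) runs quietly. In a script you would call main() with no arguments so that argparse reads the real command line.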
Set your software up so that it could be run on a regular basis (via cron or some other scheduling tool). If you actually put it in a cron schedule, make sure the interval is reasonable enough to be respectful of your data source while keeping the data current.
Part B:
Create a piece of software that reads the data set from the storage mechanism created in Part A, and puts it in a runtime data structure of your choice.
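Reading the data back might be sketched like this, again assuming the data set was stored as a JSON list of dicts, each with a "name" key. The function names and the file format are hypothetical and depend on the choices you made in Part A.

```python
import json
import os


def load_theatres(path):
    """Read the stored data set; return an empty list if none exists yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def index_by_name(records):
    """Build a name -> record lookup for convenient access."""
    return {record["name"]: record for record in records}
```

Returning an empty list when the file is missing lets the map page still load before the first scrape has run, though you may prefer to report that case as an error.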
Using the Google Maps API, create a map that shows the location of the theatres operated by Kerasotes in the region.
The map should start out centered on Richmond, Indiana, zoomed so that the city limits are visible.
The map should display standard "pushpins" indicating the existence of a theatre location.
When the user "rolls over" a location, the details (as retrieved in Part A above) of that location should be displayed, including a link to that location's showtimes URL.
The Google Maps API key for cs.earlham.edu is
.
Create a user interface (i.e. a web page) that has the appropriate titling and documentation to make this map useful to someone looking for area theatres owned by Kerasotes. Assume that your user has only basic knowledge of the Google Maps system and how to navigate it.
Turn in:
The URL of your finished map interface
A copy of your "scraping" software and any accompanying documentation
A pointer to the electronic version of your source code, and to the data storage location you used
A one-page write-up of the process you went through to design and implement the scraping software and the mapping interface.
Make notes about how you refined the interface, what sorts of debugging processes you went through along the way, and what third-party/peer resources you made use of to complete the lab.
For extra credit:
Refine the map interface to show an appropriate icon for theatres that have Digital Sound.
Refine the "roll over" display to show the next three upcoming shows at that theatre.