CS345 - Software Engineering - Spring 2005-2006
Charlie Peck and Chris Hardie
Department of Computer Science - Earlham College


[ main | syllabus | schedule | journals | resources | mailing list ]

Lab Assignment #2 - Data Scraping and Mapping

In this lab, you will perform a simple version of the data scraping (retrieval of data from third-party sources that lack a standard interface) and display that the larger group project requires. Your work will make use of concepts covered in class related to design, interfaces (software and human), debugging, and testing.

Part A:

  1. Design a piece of software that retrieves the street addresses of all movie theatres operated by Kerasotes ShowPlace Theatres, LLC, as advertised on their website at http://www.kerasotes.com. Assume that the site's HTML source as of 2/21/06 is the version you need to parse; you don't need to account for major changes in the layout or design of those pages.
     
  2. Specifically, the software should retrieve (at minimum) the following into a well-designed runtime data structure of your choice:
  3. The software should store the data in a file or some other (simple!) form of data storage.
  4. On request (through a command-line flag or otherwise), the software should provide human-readable feedback: the number of records retrieved, how long the run took, and any other data you think is relevant.
     
  5. Set up your software so that it can be run on a regular basis (via cron or some other scheduling tool). If you actually put it in a cron schedule, make sure the interval is long enough to be respectful of your data source while still keeping the data current.
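
A minimal Python sketch of Part A follows, for illustration only. The real Kerasotes markup has to be inspected by hand, so THEATRE_PATTERN, the Theatre fields, the default output file theatres.json, and the -o/--output and -v/--verbose flags are all hypothetical choices, not requirements of the lab. It keeps fetching, parsing, storage (a JSON file), and the human-readable summary separate so that each piece can be tested on its own, for example against a saved copy of the 2/21/06 page.

```python
import argparse
import json
import re
import time
import urllib.request
from dataclasses import asdict, dataclass
from typing import List, Optional

# Hypothetical: the real markup must be inspected by hand. This pattern assumes
# each theatre appears as
#   <a href="SHOWTIMES_URL">NAME</a><br>STREET<br>CITY, ST ZIP
THEATRE_PATTERN = re.compile(
    r'<a href="(?P<url>[^"]+)">(?P<name>[^<]+)</a><br>'
    r"(?P<street>[^<]+)<br>(?P<city>[^,<]+), (?P<state>[A-Z]{2}) (?P<zip>\d{5})"
)


@dataclass
class Theatre:
    # Hypothetical field set; extend it to match the data the lab asks for.
    name: str
    street: str
    city: str
    state: str
    zip_code: str
    showtimes_url: str


def parse_theatres(html: str) -> List[Theatre]:
    """Build one Theatre record per match of the (hypothetical) pattern."""
    return [
        Theatre(
            name=m.group("name").strip(),
            street=m.group("street").strip(),
            city=m.group("city").strip(),
            state=m.group("state"),
            zip_code=m.group("zip"),
            showtimes_url=m.group("url"),
        )
        for m in THEATRE_PATTERN.finditer(html)
    ]


def fetch(url: str) -> str:
    """Download one page and decode it as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def save(theatres: List[Theatre], path: str) -> None:
    """Store the records as a JSON list: simple, readable, easy to reload."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(t) for t in theatres], f, indent=2)


def main(argv: Optional[List[str]] = None) -> int:
    # argv=None makes argparse read the real command line (sys.argv[1:]).
    parser = argparse.ArgumentParser(description="Scrape theatre addresses.")
    parser.add_argument("url", help="page that lists the theatres")
    parser.add_argument("-o", "--output", default="theatres.json")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print a human-readable summary of the run")
    args = parser.parse_args(argv)

    start = time.monotonic()
    theatres = parse_theatres(fetch(args.url))
    save(theatres, args.output)
    if args.verbose:
        elapsed = time.monotonic() - start
        print(f"{len(theatres)} records retrieved in {elapsed:.2f}s, "
              f"saved to {args.output}")
    return 0
```

To run it from cron, save it as a script (the name scrape_theatres.py is hypothetical), append the usual `if __name__ == "__main__": raise SystemExit(main())` guard, and schedule it with a crontab line such as `0 4 * * * python3 /path/to/scrape_theatres.py -o /path/to/theatres.json URL`, which runs it once a day at 04:00.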

Part B:

  1. Create a piece of software that reads the data set from the storage mechanism created in Part A, and puts it in a runtime data structure of your choice.
     
  2. Using the Google Maps API, create a map that shows the location of the theatres operated by Kerasotes in the region.
    1. The map should start out centered on Richmond, Indiana, zoomed so that the city limits are visible
    2. The map should display standard "pushpins" indicating the existence of a theatre location
    3. When the user "rolls over" a location, the details (as retrieved in Part A above) of that location should be displayed, including a link to that location's showtimes URL
    4. The Google Maps API key for cs.earlham.edu is .
       
  3. Create a user interface (i.e. a web page) that has the appropriate titling and documentation to make this map useful to someone looking for area theatres owned by Kerasotes. Assume that your user has only basic knowledge of the Google Maps system and how to navigate it.
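
For Part B, one way to read the stored data back into a runtime structure and prepare the roll-over text is sketched below in Python. It assumes the scraper stored each theatre as a JSON object with the hypothetical fields name, street, city, state, zip_code, and showtimes_url. The map page itself would be written against the Google Maps JavaScript API and is not shown; the addresses would also need to be geocoded to latitude/longitude before they can be placed as pushpins, a step this sketch leaves out.

```python
import html
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Theatre:
    # Hypothetical fields; they must match whatever the scraper stored.
    name: str
    street: str
    city: str
    state: str
    zip_code: str
    showtimes_url: str


def load_theatres(path: str) -> List[Theatre]:
    """Read the stored JSON list back into Theatre objects."""
    with open(path, encoding="utf-8") as f:
        return [Theatre(**record) for record in json.load(f)]


def info_window_html(t: Theatre) -> str:
    """Build the HTML shown when the user rolls over a theatre's pushpin.

    Every value is escaped because it was scraped from a third-party page.
    """
    url = html.escape(t.showtimes_url, quote=True)
    return (
        f"<b>{html.escape(t.name)}</b><br>"
        f"{html.escape(t.street)}<br>"
        f"{html.escape(t.city)}, {html.escape(t.state)} "
        f"{html.escape(t.zip_code)}<br>"
        f'<a href="{url}">Showtimes</a>'
    )
```

Escaping here matters because a stray `&` or `<` in a scraped name or URL would otherwise break, or inject markup into, the page that displays it.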

Turn in:

  1. The URL of your finished map interface
  2. A copy of your "scraping" software and any accompanying documentation
  3. A pointer to the electronic version of your source code, and to the data storage location you used
  4. A one-page write-up of the process you went through to design and implement the scraping software and the mapping interface.
    Make notes about how you refined the interface, what sorts of debugging processes you went through along the way, and what third-party/peer resources you made use of to complete the lab.

For extra credit:

  1. Refine the map interface to show an appropriate icon for theatres that have Digital Sound
  2. Refine the "roll over" display to show the next three upcoming shows at that theatre
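
For the second extra-credit item, choosing the next three shows is a small selection problem. The Python sketch below assumes showtimes have already been scraped into (start time, title) pairs, a hypothetical shape; it drops showings whose start time is before `now` and uses heapq.nsmallest to take the earliest remaining ones without sorting the whole list.

```python
import heapq
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical shape for one scraped showing: (start time, film title).
Showing = Tuple[datetime, str]


def next_shows(showings: Iterable[Showing], now: datetime,
               count: int = 3) -> List[Showing]:
    """Return up to `count` showings starting at or after `now`, earliest first."""
    upcoming = (s for s in showings if s[0] >= now)
    return heapq.nsmallest(count, upcoming, key=lambda s: s[0])
```

In the roll-over display this would be called as `next_shows(showings, datetime.now())`; for a single theatre's daily schedule a plain `sorted(...)[:3]` would work just as well.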

     

 
