CS345 - Software Engineering - Spring 2005-2006
Charlie Peck and Chris Hardie
Department of Computer Science - Earlham College
Lab #4 - Data Stores
Last updated: Friday, 10-Mar-2006 10:21:50 EST
Due in class on Tuesday March 14th, 2006.
This lab is designed to introduce you to using data stores. You can use either Postgres or MySQL as your RDBMS; details about how to connect to each of these on the CS server farm will be forthcoming.
The 30K' view of this lab goes like so:
- Design a data model to accommodate the core information found here: banana production and here: banana production per capita. You will need to include both the core data (e.g. country, amount) and the meta-data (e.g. source, last update date) in your data model.
- Write a SQL script to create the schema for your data model in either a Postgres or MySQL RDBMS. Your script should handle both initial creation and re-creation, i.e. dropping and re-creating the objects. The username and password used to connect to the database should be command line options.
- Write a script (Perl, PHP, C, assembly, whatever) that scrapes those two URLs (scrapes, not downloads or extracts from other URLs at that site) and populates your schema. Your script should handle both the initial load and subsequent updates transparently, i.e. delete what you have and reload it from the URLs. If you are using Perl then DBI/DBD is your friend; if PHP, the built-in database API; if C, libpq (or the MySQL C API if you chose MySQL); if assembly, libpq again. Think about having your script populate the source element of your schema. The username and password used to connect to the database should be command line options.
- Write a script (Perl, PHP, C, assembly, whatever) that reports all the data elements of your schema, that is, all the gross production and per capita figures. Include the last update date for each set of data. It's your job to envision the best way to organize the information at hand and write the code to display it in that form. This should be a command line utility that generates text to stdout: no web hoo-ha or XML or other eye-candy, just the data displayed in an accessible form(s).
- Write a short overview of your solution, including how to use it and any design or tool choices that you made. Using the descriptions of validation and verification found in the group project document, describe how you would perform those two tasks on this particular data set and the software you wrote for this lab.
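The drop-and-re-create, delete-and-reload, and report steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a reference solution: it is written in Python and uses the standard-library sqlite3 module with an in-memory database as a stand-in for Postgres or MySQL. The table names (data_set, figure), the column names, the example.invalid URL, and the sample rows are all hypothetical placeholders, not real figures, and the DDL would need adjusting for your chosen RDBMS.

```python
import sqlite3

# Hypothetical schema: one row per data set (with its meta-data),
# one row per country figure. Dropping first makes the script re-runnable.
SCHEMA = """
DROP TABLE IF EXISTS figure;
DROP TABLE IF EXISTS data_set;
CREATE TABLE data_set (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,  -- e.g. 'production'
    unit         TEXT NOT NULL,
    source_url   TEXT NOT NULL,         -- meta-data: where the rows came from
    last_updated TEXT NOT NULL          -- meta-data: when they were loaded
);
CREATE TABLE figure (
    data_set_id INTEGER NOT NULL REFERENCES data_set(id),
    country     TEXT NOT NULL,
    amount      REAL NOT NULL,
    PRIMARY KEY (data_set_id, country)
);
"""


def create_schema(conn):
    """Create the schema, dropping any previous copy first."""
    conn.executescript(SCHEMA)


def reload(conn, name, unit, source_url, rows):
    """Replace one data set: delete what is there, then load the fresh rows."""
    with conn:  # one transaction: commit on success, roll back on error
        old = conn.execute(
            "SELECT id FROM data_set WHERE name = ?", (name,)
        ).fetchone()
        if old:
            conn.execute("DELETE FROM figure WHERE data_set_id = ?", (old[0],))
            conn.execute("DELETE FROM data_set WHERE id = ?", (old[0],))
        # datetime('now') is SQLite syntax; other databases spell this differently.
        cur = conn.execute(
            "INSERT INTO data_set (name, unit, source_url, last_updated) "
            "VALUES (?, ?, ?, datetime('now'))",
            (name, unit, source_url),
        )
        conn.executemany(
            "INSERT INTO figure (data_set_id, country, amount) VALUES (?, ?, ?)",
            [(cur.lastrowid, country, amount) for country, amount in rows],
        )


def report(conn):
    """Return the report as text lines, one block per data set."""
    lines = []
    data_sets = conn.execute(
        "SELECT id, name, unit, last_updated FROM data_set ORDER BY name"
    ).fetchall()
    for ds_id, name, unit, updated in data_sets:
        lines.append(f"{name} ({unit}), last updated {updated}")
        figures = conn.execute(
            "SELECT country, amount FROM figure WHERE data_set_id = ? "
            "ORDER BY amount DESC",
            (ds_id,),
        )
        for country, amount in figures:
            lines.append(f"  {country:<20}{amount:>12.2f}")
    return lines


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Placeholder rows only; real values would come from the scraper.
    reload(conn, "production", "tonnes", "http://example.invalid/production",
           [("Country A", 100.0), ("Country B", 250.0)])
    print("\n".join(report(conn)))
```

Wrapping the delete and the reload in a single transaction is one way to keep the update transparent: a reader of the database sees either the old data or the new data, never an empty table.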
You will need to turn in the following in class on Tuesday March 14th:
- A printout of your code and a printout of your write-up. Make sure your write-up includes usage information, command line options, etc. If you don't know how to do 2-up, 2-sided printing, now would be a good time to learn.
- A tarball of your code and write-up and any other files a person will need to use it. Put this in ~charliep/homework/cs345/[username]-data-store.tar.gz
Extra Credit:
NationMaster also offers many data sets in downloadable or extractable forms. Explore other forms the banana data is available in and add support for populating the RDBMS schema from those with your script. Imagine a command line option retrieval_method=[scrape | extract]. Obviously the URL would be different depending on which choice the user made. It's up to you to ensure that there isn't an impedance mismatch.
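One way to avoid an impedance mismatch is to have every retrieval method return rows in the same shape, so the loading code never needs to know which one ran. The sketch below illustrates that idea in Python under loud assumptions: the scraped page is assumed to contain a two-column HTML table of country and amount, and the extractable form is assumed to be CSV with a header row. Neither format has been checked against the actual site. The names TableRows, rows_from_html, rows_from_csv, and build_cli are hypothetical, and the option is spelled --retrieval-method here simply to follow common command line conventions.

```python
import argparse
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def to_amount(text):
    """Parse '1,250.5' style numbers; raises ValueError on non-numeric text."""
    return float(text.replace(",", ""))


def rows_from_html(page):
    """scrape: assumes a two-column table of country and amount."""
    parser = TableRows()
    parser.feed(page)
    out = []
    for row in parser.rows:
        if len(row) == 2:
            try:
                out.append((row[0], to_amount(row[1])))
            except ValueError:
                pass  # header row or other non-numeric cell
    return out


def rows_from_csv(text):
    """extract: assumes 'country,amount' lines preceded by a header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header
    return [(r[0].strip(), to_amount(r[1])) for r in reader if len(r) == 2]


# Both parsers return a list of (country, amount) tuples.
PARSERS = {"scrape": rows_from_html, "extract": rows_from_csv}


def build_cli():
    p = argparse.ArgumentParser(description="Load banana data into the RDBMS")
    p.add_argument("--retrieval-method", choices=sorted(PARSERS), default="scrape")
    p.add_argument("--user", required=True, help="database username")
    p.add_argument("--password", required=True, help="database password")
    return p
```

With this shape, the main program would parse its arguments with build_cli().parse_args(), fetch the URL that matches the chosen method, and pass the response text to PARSERS[args.retrieval_method] before loading the resulting rows.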