
Aaron Cayard-Roberts
Summer Journal 2001


5/21/01 Monday

Today I finished working on the perl script that will let iostone run in parallel with other iostone processes and write to the same output file. This includes executing the program in temp directories so that the temp files iostone makes do not overwrite each other, and cleaning up those directories. I also modified iostone so that it has a file lock on the output file using fcntl so that it will wait for the file to become free and then write to it. I also started working on the perl script for the dd command. So far it will execute the dd command on a file called large_file.dat, which is created with a small C program I whipped up. This will let us easily run dd on the pvfs without having to move some large file around all the time to do the dd tests on.
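A rough sketch of the two ideas above (a private scratch directory per run, plus an exclusive lock on the shared output file), written in Python rather than the perl and C actually used. The function names and the one-line-per-result format are made up for illustration; only the use of an fcntl lock comes from the entry.

```python
import fcntl
import os
import tempfile


def append_result(output_path, line):
    """Append one result line while holding an exclusive lock on the file."""
    with open(output_path, "a") as out:
        # Blocks here until no other process holds the lock.
        fcntl.lockf(out, fcntl.LOCK_EX)
        try:
            out.write(line + "\n")
            out.flush()
            os.fsync(out.fileno())
        finally:
            fcntl.lockf(out, fcntl.LOCK_UN)


def run_in_scratch_dir(job, output_path):
    """Run job() inside a private temp directory, then record its result.

    job is any callable returning a string (a stand-in for one benchmark run).
    """
    output_path = os.path.abspath(output_path)
    start_dir = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            result = job()
        finally:
            # Leave the scratch directory before it is deleted.
            os.chdir(start_dir)
    append_result(output_path, result)
```

Each parallel process would call run_in_scratch_dir with its own job, so temp files never collide and result lines never interleave.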

Rest of week

I worked on the perl scripts some more so that they could be run by either data limiting or time limiting. This allows the tests to show how much data could be read/rewritten in a given time and how long it took to read/rewrite a given amount of data.
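The data-limited versus time-limited idea can be sketched as a single loop with two optional stop conditions. This is a Python illustration only, not the perl scripts themselves; rewrite_test and its parameters are hypothetical names.

```python
import time


def rewrite_test(path, block_size=4096, max_bytes=None, max_seconds=None):
    """Write blocks to path until a byte limit or a time limit is reached.

    Returns (bytes_written, elapsed_seconds).
    """
    if max_bytes is None and max_seconds is None:
        raise ValueError("set max_bytes, max_seconds, or both")
    block = b"\0" * block_size
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while True:
            # Data-limited mode: stop once enough bytes are written.
            if max_bytes is not None and written >= max_bytes:
                break
            # Time-limited mode: stop once the time budget is used up.
            if max_seconds is not None and time.monotonic() - start >= max_seconds:
                break
            f.write(block)
            written += block_size
    return written, time.monotonic() - start
```

With only max_bytes set, the elapsed time is the result of interest; with only max_seconds set, the byte count is.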

After that I started running some tests on Redhat (acl3) and FreeBSD (quark) and got results that made it look like FreeBSD ran things in parallel much better than Redhat did. We then did some tests on the sandbox cluster, which is running Debian. The tests were only performed on the local file systems to give us a general idea of how the operating system handles lots of processes in parallel. This will give us some reference on how to interpret the data we get from running the benchmark programs on the pvfs. We are waiting right now for access to a Sun station to see how it will do with the benchmarking program on it.

For the remainder of the week I mostly tried to learn some of the stuff Ned was doing with system imager and helped Hassan out some with his graphing project...he didn't know perl very well and it was in cgi. I also took part in the work we did trying to figure out which variables would be good for the graph and how to present them.

5/28/01 Monday

I started out the day verifying the remaining acl hard drives while I was the only one here. Ned's got errors.

Me and Hassan installed FreeBSD on an acl computer to test the benchmarking

5/29/01 Tuesday

Hassan and I got pvfs up and running again on the pentagon cluster (sandbox). We really renamed the sandbox to pentagon0-4. We also did some MIBs discussion...Hassan and I also changed cricket so that the sandbox is in pentagon form now. Modified my benchmarking programs to work on the pvfs....started 10 iostones on the pvfs to run until 8:00 tomorrow....I'm not sure if it will work though because we started getting some weird read/write errors...we'll see how it goes.

5/30/01 Wednesday

Today we got some data back from the iostones running and decided that the pvfs was choking some. I played around with it some more and I think that I've found out that the random generation of the filenames was slowing it down too much, so I converted it back to the way that it used to be (since I solved the multiple programs running thing by having them execute in different directories, so they wouldn't delete each other's files). I started a smaller one which will finish in a little while. After it's done, if I don't get any errors, I'm going to start another long one up for the night.

We've had a little trouble getting iostone to work on the FreeBSD install we did on one of the acl machines. It seems to not have a header that it needs (for ftime)...we will be looking into this now that we figured out how to update the locate database.

I put Athenaa back together and it's ready to have a fresh install done on it (since it did a little kernel panic when I tried to start it up).

Hassan and I were also playing around with some of the cricket code and I found out what the link is to any single graph and also how to change the time scale of that graph, which will make putting multiple graphs from different computers on the same page much easier.

5/31/01 Thursday

We did some more runs with iostone and got back some more strange results...There were still some missing data outputs although there were no more errors (which are now going to a file so the program can be run remotely and we can still know if there were errors). Something else that seemed kind of odd was how slow the program seemed to be running. We were getting results that were about 1000X slower than what we would get on a straight drive. If this is really the case then we may have to rethink using the pvfs. More tests to follow...

We have also fixed the problem with compiling iostone under bsd...we have to include a -lcompat so that it will use libs that bsd thinks should be obsolete now. Running some tests on it and comparing them to what we got on a linux box proved to be a little surprising...the bsd box didn't run the program much faster and in some cases did it slower....more tests to follow

1st - 8th

vacation....

6/11/01 Monday

After looking at the work that Hassan has been doing with the cricket graphs and talking about where that was heading for a while, we got a call from Charlie in NY. We talked for a while...After that we decided to change the root password and the insecure password. Ned got most of the places up to date and will get the rest soon. Then Ned and I went over to the ranch and picked up the ladder and the wireless hardware. We then set up the wireless stuff inside using the small antenna. We got that working then tried to get some of the wireless cards to work on some laptops... this is still being worked on.

6/12/01 Tuesday

We did some more testing with the wireless stuff. We tried using both of the aironet stations as a base to see if the original base was causing the slow connection to the laptops but that didn't make any difference. We then tried to get the other PCMCIA card to work thinking that it might be a compatibility problem, but that didn't work either since none of the laptops would use the aironet card...we think it might be too old of a card for the hardware to support it.

We also got more stuff running on the redhat 7.1 box. It now has dbd, dbi, and postgres running on it. We weren't sure what version of Oracle to run so we e-mailed Dusko to find out. He said he had it on disk and he was interested in coming over and installing it with us. He also told us that it needed X to run so we had to get X working on it.

We also got the hip data to start working again...postgres must have messed up when quark went down and it was no longer taking data from proto.

6/13/01 Wednesday

Dusko came over and we tried to get the Oracle cd working on the 7.1 box but for some reason it wasn't working. We tried lots of different stuff like changing the install directory and stuff like that. Hassan tried it on one of the 6.2 boxes and got the screen to come up...later that day he reinstalled the 7.1 using more packages and some patches which got the installer screen up, but it was late so he stopped there. I made a perl script that ran ifconfig trying to make it bring up the hypercube cards and give them the ip numbers that we wanted to assign them. The problem was that after I finished it and started some testing I found that ifconfig didn't change the /etc/network/interfaces file, and for some reason when this happened NFS stopped working correctly. This took a long time to figure out since I started out using ssh so the problem wasn't all that apparent....I would just get kicked off when NFS stopped finding quark.

6/14/01 Thursday

Today I talked to Ned about what I found with the NFS and we talked about it for a while and then found that the best solution was to make the program just rewrite the /etc/network/interfaces file instead of running ifconfig. This seems to have fixed the problem. After rewriting a little of my code I moved on to routing. After working on it a while Ned and I decided the best thing to do is to draw a bunch of pictures of it :) So there are two different drawings of the athena hypercube now...in color. Using this and my routing program from networks I came up with a routing program....though there were lots of setbacks. First we thought that my program was in error because we expected it to give the next computer in the route....what I later found out after lots of painstaking research was that that was the wrong expectation. What it does is give the path name to the next computer out of the 4 total paths....which means it gives which Ethernet card it needs to depart on. This turned out to be very good since the system call route needs the departing dev. The next setback was that I thought I would need to know the ip of the card that the info will be arriving on in the computer it's trying to get to. This was going to be a big pain and I worked on it for a long time before Ned pointed out that we could assign the computer name to all of the ip's in that computer so that I would just have to specify which computer the data needed to get to and just let named figure out the ip it needed. So I'm now ready for some testing on the cluster to see if this stuff really works. Ned has gotten three of the athenas up and running using the new image so I'll start doing that tomorrow.
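One common way to get "which card to depart on" in a hypercube is dimension-order routing: XOR the source and destination node numbers and leave on the card matching the lowest differing bit. The sketch below is in Python and is not necessarily how the routing program from networks works. It assumes nodes are numbered 0 to 15 and that card N links nodes whose numbers differ in bit N; both are assumptions made for the example.

```python
DIMENSIONS = 4  # assumed: 4 cards per node, so 2**4 = 16 nodes


def next_interface(src, dst):
    """Return the index of the card a packet should leave on.

    Picks the lowest bit in which src and dst differ; None if src == dst.
    """
    diff = src ^ dst
    if diff == 0:
        return None
    return (diff & -diff).bit_length() - 1


def full_route(src, dst):
    """Return the list of nodes visited from src to dst, inclusive."""
    path = [src]
    node = src
    while node != dst:
        # Crossing card N flips bit N of the node number.
        node ^= 1 << next_interface(node, dst)
        path.append(node)
    return path


def route_table(src):
    """Map every other node to the card index used for its first hop."""
    return {
        dst: next_interface(src, dst)
        for dst in range(2 ** DIMENSIONS)
        if dst != src
    }
```

Under these assumptions each node ends up with 15 table entries, one per other node, and every entry names only a departing card, never a next-hop address.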

6/18/01 Monday

Not a whole lot happened today...we got lists straightened out and everyone up to date on what's been happening and what needs to happen. Ned and I played around with C3 and got it working on the athena cluster...which is very nice I might add. Hassan and Abby tried to blame my program for their ftp transfer problems...for some reason the ftp transfer is making that file larger than it started....and that's my fault. I also helped Abby with the twisted wire thing for the direct ftping. Hassan, Ned, and I came up with some really neat ideas for the C3 and snmp. Ned would like to use the C3 on the acl's but it does bad things if some of them are out...so we think we could write a perl script that uses snmp to find out which acl's are up then write that to a file which C3 can use as the list of computers to perform its operations on. I also realized that my program could use a little updating so that it figures out which athena it is running on so that it can be executed using C3 instead of having to be executed on a single computer at a time.

6/19/01 Tuesday

Spent a lot of time trying to get my program to figure out which host it was running on today. Tried several different ways and finally got one that works. I also put in a host name search to make sure that the word athena shows up in it so it will not try to change stuff on a non-athena computer. Pointed Hassan in the right direction with some of his perl stuff. Moved some things around so Abby could have a computer and desk in the front room. Worked with Ned getting C3 to work on the acl's....used my computer as the one to test the commands on...found what appears to be a small bug in C3 when using the cshutdown command to shut a computer down. Because we have remote X forwarding on for the acl using ssh when we ran the C3 shutdown

6/20/01 Wednesday

I started the day out by trying to get my code for the hypercube routing to make the destination of the packet the ip number of the card it will be arriving on and not just the athena name (which didn't work). To do this I had to carry out the entire route in code for each route command, of which there are 15 per computer. I finally got that working. Then Ned and I decided to work on some psql stuff on the athena cluster. We kept running into problems which seemed to be somewhat unrelated to the ones that Hassan was getting. We decided that it may have been a problem caused by the routing tables I had made and decided to try and get that fixed before continuing with the psql configuration. After getting all of the computers configured and doing some tests that showed the packets were not getting to where they should have been, we started using tcpdump to try to track the path. Then Ned came up with the idea that I think would explain it. Because the destination is not the computer it's going through, we think that they may be rejecting the packet on the first leg of the route. I'll talk to you about this on Thursday.

6/21/01 Thursday

Ned and I found a small bug in my code where a var wasn't getting reinitialized...I got that fixed then we tried getting the routing stuff to work again...without any luck. We next tried changing files (/etc/network/options) to match the ones in noether, but that didn't help either. After that we took a break on that and tried to find out the block size for ext2..this took a little longer than it should have, mostly because we kept finding references to the max block size, not the default. We finally found it by running some kind of dump on the ext2 filesystem that told us lots of info about it, which said it was 4096 bytes, or 4 Kbytes, so we changed the pvfs to use that. This doesn't seem to have helped the performance out any though...at least based on Hassan's first data from the database entry times. We then went back to working on the routing...doing the kernel recompiling, which didn't seem to help either. Ned just installed routed but we haven't configured it yet.
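For what it's worth, the block size of a mounted filesystem can also be asked of the OS directly instead of being looked up in docs. A small Python sketch follows; it reports what the operating system returns for whatever filesystem holds the given path, which on a Linux ext2 mount is normally the filesystem's block size. The function name is made up.

```python
import os


def block_size(path="/"):
    """Return the block size in bytes reported for the filesystem holding path."""
    return os.statvfs(path).f_bsize
```

Calling block_size on a directory inside the mount in question gives the number to compare against the pvfs setting.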

6/22/01 Friday

Today we sorted through the hardware list and distributed out some jobs for us to work on on Fridays. Hassan and I decided to work on the UV sensor that I had worked on for my final project in EI. It is at the point where most of the hardware should be done and we need to get the program working that we will use to find out what's coming in from the hardware via serial port. I finally found where the program was stashed and copied it to my and Hassan's home directories. I then showed Hassan what we had done, which was very little and didn't work correctly at that. We also spent a good amount of time searching for the block size that the ext2 file system uses and then changing pvfs to match so that they are not fragmenting.

6/25/01 Monday

Today we did a list sort and report. I handed the iostone stuff over to Abby and told her what I had changed on it and we talked about what still needed to be done...testing with a stand alone program on a local hard drive. Ned and I also did some more routing stuff with the athena cluster. We decided that we should try and specify the exact ip of the next computer in the path of the route as a gateway in the routing table, instead of just leaving the default, which we thought was anything. This seemed to work a little better. We started getting limited packet forwarding....what it seemed to do was send the packet through the first "gateway" but when it reached the next computer, whether it was a gateway or the destination host, it wouldn't continue on. We played around with this for quite a long time before we were convinced that we were completely baffled.

After that I finished up the day by doing a little work on the perl script that would do the snmpwalk to find out which computers in a cluster were up so that C3 could be run on them.

6/26/01 Tuesday

The first thing I did today was start working on some little scripts that use snmp to probe a cluster and find out which of the computers are up. This creates a list of the computers which are running which can then be used by C3 as the computers to execute the commands on, instead of letting it try and execute on all of them, which takes a really long time.
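The shape of such a script, sketched in Python: filter a host list through a probe and write the survivors to a file. The probe here is a plain callable standing in for the snmp query, and the one-host-per-line output is an assumption for the example, not the format C3 actually reads. All names are hypothetical.

```python
def write_up_list(hosts, is_up, list_path):
    """Write the hosts that answer the probe to list_path, one per line.

    hosts     -- iterable of host names to check
    is_up     -- callable taking a host name and returning True or False
                 (a stand-in for the real snmp query)
    list_path -- file to write the names of responding hosts to
    """
    up = [host for host in hosts if is_up(host)]
    with open(list_path, "w") as f:
        for host in up:
            f.write(host + "\n")
    return up
```

Keeping the probe as a parameter means the same function could serve both the acl and the athena clusters with different host lists.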

When I finished one of these scripts for the acl and the athena clusters I started working on a bash script which starts up all of the pvfs components (pvfsd, iod, and mount.pvfs) at boot time on the athena cluster. While doing this I also tried doing some other various things on the routing for the athena cluster with Ned, with no luck. While working on the bash script I found out that the pvfsd will not start on the kernel that Ned made trying to get the ip forwarding to work....and since it didn't work we are going to revert back to the old kernel.

6/27/01 Wednesday

I finished up the bash script that does the start up of all of the pvfs components. I reconfigured the bash script on noether so that it doesn't start up iod and does start mgr...didn't find this till noether was rebooted and the pvfs didn't come back up. I also updated the admin page to include all of the tools that I've written so far for the acl and athena clusters. I put a copy of them in ~admin/sysadmins/utils and made a link to utils in the www directory so the source could be displayed on the page. Ned and Hassan also came up with the idea that it would be cool if I changed my perl acl_up.pl to be cgi so we could see the status of the acl faster than dingman's pgp pinging program. The problem with this is it took me a long time to figure out what I needed to do with cgi....and my program writes some files while it's running...which doesn't mesh well with cgi...so I gave up for now.

6/27/01 Wednesday

I finalized the pvfsd bash script and put it in the image. Ned had reimaged them before I'd gotten it in so I had to put it back on all of them. After that I pretty much just did reading on the folding@home page and some of the recommended reading that they had at the bottom of their page. Learned some interesting stuff about what it is we are doing...makes me feel a little better :)

6/28-29/01 Thursday & Friday

I did some more reading...I think I've gotten through all of the recommended reading sites that were located at the bottom of the folding at home web page. We also did some list management as a group. On Friday I tried to figure out what to do about the serial port code for the UV sensor that I've been working on, but I still haven't made any breakthroughs.

7/2/01 Monday

Did even more reading about protein folding and genomes. Checked out the pvfs page for info about their scripts which are supposed to start the pvfsd stuff at boot time. They turned out to be less functional than what I've done. They just make links from the rc folder to the iod and pvfsd executables. Mine will start, restart, and stop them and mount and unmount the /athena/pvfs1 "drive".

7/3,5/01 Tuesday & Thursday

Did a little more reading....did some meeting time reworking so now we all have a group time to meet and a group list of things to work on. We then redid the lists so we all had stuff to do again. Also did some work on the motion sensor with Abby. Did a little looking around for their protein folding programs/projects with little success.

7/6/01 Friday

Did some reading of the TPC-C stuff. Had hardware meeting. Hassan really wanted to do some snmp stuff and Ned and Abby were having some trouble so I worked with them on the motion sensor.

7/9/01 Monday

Did research on trying to find other folding programs besides fah. Found lots of 3d rendering programs and physical research groups, but not a whole lot in the area of working programs. Took a break from that and we all took apart the pentagon and enlarged the sandbox to be 10 computers...in so doing we blew the breaker in the next trailer....we didn't notice that the power strip was plugged into the Athena's power and not this trailer's power.

7/10/01 Tuesday

Started the day out trying to figure out why the cs subnet was completely flooded this morning...turned out the wireless bridges being plugged into the same subnet had caused it. Did more research for folding programs...I think I'm nearing the end of finding useful things in my search. Got the power back up to the Athena cluster.

7/11-16/01 Thursday-Monday

Did all of the reading of the TPC benchmarks book that Charlie wanted us to read. I also did some reading for the TPC-C stuff that was published by different companies....lots of this was info about their -great- computers ;) I've also been working on lots of other things on the side. Helping Chip out with different wireless stuff, playing with the motion sensor, and figuring out why quark went down (taking the cs subnet with it).

7/18/01 Wednesday

Finished skimming the TPC-C docs. Started looking at the C code of Charlie's implementation trying to get an idea of what I'm going to need to do. Hassan and I tried several times to install FreeBSD 4.3 on one of the acl 22. We weren't able to get it to work using X so we just did an install without it and that seemed to work.

7/19/01 Thursday

Today we installed FreeBSD 4.3 on quark. We got it partly up and running by the end of the day.

7/20/01 Friday

Ned and I looked over the tpc-c specs and code with Charlie and did a little more quark work so that we could do some work of our own.

7/23/01 Monday

I started out the day searching for some perl implementations of the tpc-c spec so we would not be working completely from scratch. Didn't find a whole lot...so we started working on our own...we converted the sql statements so that they were good for dbi. We then also used some of Hassan's code so that we could execute these commands...dropping the table then recreating it.

7/24/01 Tuesday

Ned and I went through the code some more finding things that we didn't understand and just didn't get why they were implemented that way. We started coding our own version in perl starting from places where we knew what was going on. We decided we needed a common directory (group) that we could all write to and get to easily, which is /clients/tpc-c/. We started taking notes in a file called unknown-stuff-to-ask-charlie, which is self explanatory.

7/25/01 Wednesday

We converted some more of the tpc-c benchmarking code...added more things to the ask-charlie file.

7/26/01 Thursday

We did the picture thing for the cs brochure. During this we didn't get a whole lot done...Afterwards Ned and I met with Charlie and got to ask some of our questions. He was able to answer them..although it took him a little while on some of them. After that we worked on the code a little bit...figured out how to pass multiple return values (via hash) which solves the problem with globals. At the end of the day we decided to go take a look at dennis and see how it's going along.
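The return-a-hash trick has a direct Python counterpart: return a dict instead of setting globals. The sketch below is only an illustration of the pattern; the function name, the key names, and the value ranges are made up and are not taken from the tpc-c code.

```python
import random


def pick_order_keys(rng, warehouses=1, districts=10):
    """Return several values at once as a dict instead of setting globals."""
    return {
        "w_id": rng.randint(1, warehouses),
        "d_id": rng.randint(1, districts),
    }
```

The caller pulls out what it needs by name, for example keys["w_id"], so nothing has to live in a global variable.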


7/27/01 Friday

Ned and I did some more code conversion.... We think we got all of the support functions working and now are moving on to the load.pl script. We also made some "policy" decisions about the code design.

7/30/01 Monday

Ned and I did more conversions...we've gotten to the point where we are both familiar enough with the code to break up and work on different portions of the code at the same time. Ran into some more small inconsistencies that lead us to believe that this was not the final code line, such as lack of variable and function declaration.

7/31/01 Tuesday

I did some work on some of our perl modules that are like SUPPORT.pm but for the transactions. I took part of the day off to go swimming...I did some more work afterwards.

8/01/01 Wednesday

Following Charlie's recommendation Ned and I started focusing on load.pl so that we would have the database up with variables in it, so that we would have something to test the rest of the program (the transaction part) against once we had it coded and ready to run. We think we got load.pl up and running at the end of the day. We split off some of the transaction starting parts of the program to work on next (DRIVER.pm and tpc-main.pl).

8/02/01 Thursday

Ned and I got tpc-main.pl and DRIVER.pm finished (still not debugged) and then decided to get one or two of the transaction functions that DRIVER.pm calls done before we tried to run them (they're not too big). I got NEWORD.pm finished so then we started trying to run it. Got some small syntax errors out of the way and then we came to a really weird error that neither one of us could make heads or tails of...I have a feeling it's being caused by something completely unrelated but it's hard to say. The error has to do with all of the variables on two different lines...it's something like "Global symbol "$w_id" requires explicit package name at DRIVER.pm line 114"

8/03/01 Friday

We got all of the known bugs out of tpc-main.pl and DRIVER.pm and NEWORD.pm. Still have to get the other transaction modules finished and debugged and then I think this thing will be about done. The error we ran into turned out to be another for loop with {} to give it a focus. Damn perl! I'm working on ORDSTAT.pm now...

8/06/01 Monday

We finished up coding the other transaction modules and fixed some other small bugs in DRIVER.pm that we ran into. Ran some test runs...seems to be working.

8/07/01 Tuesday

We found out that there was another transaction function called delivery that was not fully implemented in Charlie's code. We coded it up and then added it to the driver to run. We had a little problem with it because there were lots of cases where some of the select or statements that had several WHERE conditions were failing. I think we may have finally gotten this fixed by having it just return a status of 0 whenever this occurs (like other functions). We realized there may be an error in ordstat in how it goes about fetching its rows...still have to go over its output to know for sure. Near the end of the day I also got color to work on quark (with ls) again and have the console term work for vi.

8/08/01 Wednesday

We spent the day modifying all of our transaction functions so that they would display everything they are entering and reading into the database when debug mode is on. We then went through and turned the debugging on one at a time and started finding lots of little errors that resulted in database inconsistency. We did this for the rest of the day and still have a little left to go.

8/09/01 Thursday

Some of our debugging led us to an error in load.pl where we found some of the sql wasn't even being run so some data was never even being loaded into the database. We finally fixed this and got back to debugging the transaction modules.

8/10/01 Friday

We found out from Charlie that we could execute each new piece of sql and then commit it once at the end of the function/program. We did some more runs which demonstrated that it was much faster with the commits happening only once in a while instead of all the time.
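The difference between committing after every statement and committing once at the end looks roughly like this. The sketch uses Python's built-in sqlite3 module as a stand-in for perl dbi and postgres, and the item table and its columns are hypothetical.

```python
import sqlite3

CREATE_ITEM = "CREATE TABLE item (i_id INTEGER, i_name TEXT)"


def load_items(conn, rows, commit_every_row=False):
    """Insert rows, committing either per row or once at the end.

    Returns the number of rows inserted.
    """
    cur = conn.cursor()
    count = 0
    for item_id, name in rows:
        cur.execute(
            "INSERT INTO item (i_id, i_name) VALUES (?, ?)", (item_id, name)
        )
        count += 1
        if commit_every_row:
            conn.commit()  # the slow way: one commit per statement
    conn.commit()  # the fast way relies on this single commit
    return count
```

Timing load_items with commit_every_row set to True and then False on a file-backed database is one way to see the gap for yourself.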

8/13/01 Monday

Charlie pointed out some new changes that we could make on the code which would speed it up considerably. This came about after it was discovered that the code was taking much more cpu power than the database was. So we started moving all of the major random function calls outside of any loops, making the number of times a rand is called much much less. This causes the database to become less random but Charlie said that this was worth it. The other improvement that we made was to create the sql statements before any loops that they are called in and then to just use variables to plug in the values needed at the last minute. This saves us from having to recreate the whole sql statement on every run of the loop. A small test run indicated that our changes would result in a code line that was at least 3X faster than before. If all seems to work out the plan is to move the changes over to the transaction functions as well. Near the end of the day we ran into another small problem. There were two sql statements where we were giving the database a NULL value. Before, the whole sql statement was just getting put into a whole variable and then executed so perl didn't even see the NULL. Now we are passing the values as arguments to a perl function and because perl doesn't know the var NULL we have to come up with a way around this.
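Both speedups, and the NULL wrinkle, can be sketched together in Python with the built-in sqlite3 module standing in for perl dbi and postgres. The statement text is built once with ? placeholders, the random draw happens once outside the loop, and a NULL is passed as None (sqlite3's equivalent of a NULL bind value). The orders table and its columns are hypothetical.

```python
import random
import sqlite3

CREATE_ORDERS = (
    "CREATE TABLE orders (o_id INTEGER, o_carrier_id INTEGER, o_zip TEXT)"
)
# Built once, outside any loop; values are plugged in at execute time.
INSERT_ORDER = (
    "INSERT INTO orders (o_id, o_carrier_id, o_zip) VALUES (?, ?, ?)"
)


def load_orders(conn, count, rng):
    """Insert count rows using one statement string and one random draw."""
    # One random call for the whole batch: less random data, far fewer calls.
    zip_code = "%05d" % rng.randint(0, 99999)
    cur = conn.cursor()
    for o_id in range(1, count + 1):
        # None is stored as NULL; here the second half of the rows get it.
        carrier_id = 1 if o_id <= count // 2 else None
        cur.execute(INSERT_ORDER, (o_id, carrier_id, zip_code))
    conn.commit()
```

Because the NULL travels as a bound value rather than as text inside the statement, the same statement string works for rows with and without a carrier.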

8/14/01 Tuesday

Charlie solved our problem with the null by just making a var my NULL; without assigning it a value....doesn't that make it 0 not ""?...hmm

