Aaron Cayard-Roberts
Summer Journal 2001
5/21/01 Monday
Today I finished working on the perl script that
lets iostone run in parallel with other iostone processes and
write to the same output file. This includes executing the program
in temp directories so that the temp files iostone makes do not
overwrite each other, and cleaning up those directories afterwards.
I also modified iostone so that it takes a file lock on the output
file using fcntl, so it will wait for the file to become free before
writing to it. I also started working on the perl script for the dd
command. So far it will execute dd on a file called large_file.dat,
which is created with a small C program I whipped up. This will let
us easily run dd on the pvfs without having to move some large file
around to do the dd tests on all the time.
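The wrapper boils down to the pattern below. This is a minimal sketch
of the idea, assuming iostone can be invoked with no arguments and
that one shared results file collects every run; the paths, the
invocation, and the file names are placeholders, not the real ones.

    #!/usr/bin/perl -w
    # Sketch: run one iostone instance in its own scratch directory so
    # its temp files can't clobber those of other instances, then append
    # the results to a shared output file under an exclusive lock.
    use strict;
    use Cwd;
    use Fcntl qw(:flock);
    use File::Path qw(rmtree);

    my $top    = getcwd();
    my $prog   = "$top/iostone";             # placeholder path to the benchmark
    my $shared = "$top/iostone_results.txt"; # shared results file

    my $dir = "iostone_tmp.$$";              # one scratch dir per process
    mkdir $dir or die "mkdir $dir: $!";
    chdir $dir or die "chdir $dir: $!";

    my $out = `$prog 2>&1`;                  # run the benchmark, capture output

    chdir $top or die "chdir $top: $!";
    rmtree($dir);                            # clean up the temp files it left

    # Serialize writes to the shared results file. (The real lock lives
    # inside iostone itself via fcntl; flock here is the same idea in perl.)
    open my $fh, '>>', $shared or die "open $shared: $!";
    flock($fh, LOCK_EX) or die "flock: $!";
    print $fh $out;
    close $fh;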
Rest of week
I worked on the perl scripts some more so that they can be run either data limited or time limited. This allows the tests to show how much data could be read/rewritten in a given time, and how long it took to read/rewrite a given amount of data.
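A toy sketch of the two modes, with a dummy unit_of_work() standing in
for one pass of the real benchmark (iostone or dd); everything here is
made up for illustration.

    #!/usr/bin/perl -w
    # Sketch of "data limited" vs "time limited" benchmark runs.
    use strict;
    use Time::HiRes qw(time);

    sub unit_of_work { my $n = 0; $n += $_ for 1 .. 10_000; return 1 }

    # Data limited: do a fixed number of units, report how long it took.
    sub run_data_limited {
        my ($units) = @_;
        my $t0 = time();
        unit_of_work() for 1 .. $units;
        return time() - $t0;
    }

    # Time limited: run for a fixed number of seconds, report how much got done.
    sub run_time_limited {
        my ($seconds) = @_;
        my $done = 0;
        my $t0   = time();
        while (time() - $t0 < $seconds) {
            unit_of_work();
            $done++;
        }
        return $done;
    }

    printf "data limited: 100 units took %.2f seconds\n", run_data_limited(100);
    printf "time limited: %d units finished in 2 seconds\n", run_time_limited(2);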
After that I started running some tests on Redhat (acl3) and FreeBSD (quark) and got results that made it look like FreeBSD ran things in parallel much better than Redhat did. We then did some tests on the sandbox cluster, which is running Debian. The tests were only performed on the local file systems, to give us a general idea of how the operating system handles lots of processes in parallel. This will give us some reference for interpreting the data we get from running the benchmark programs on the pvfs. We are waiting right now for access to a Sun machine to see how the benchmarking program does on it.
For the remainder of the week I mostly tried to learn some of the
stuff Ned was doing with SystemImager and helped Hassan out some with
his graphing project... he didn't know perl very well and it was in
CGI. I also took part in the work we did trying to figure out which
variables would be good for the graph and how to present them.
5/28/01 Monday
I started out the day verifying the remaining acl hard drives while I was the only one here. Ned's had errors.
Hassan and I installed FreeBSD on an acl computer to test the
benchmarking programs on.
5/29/01 Tuesday
Hassan and I got pvfs up and running again on
the pentagon cluster (sandbox). We renamed the sandbox machines to
pentagon0-4. We also had some MIBs discussion... Hassan and I also
changed cricket so that the sandbox shows up in its pentagon form now.
Modified my benchmarking programs to work on the pvfs... started 10
iostones on the pvfs to run until 8:00 tomorrow... I'm not sure if
it will work though, because we started getting some weird read/write
errors... we'll see how it goes.
5/30/01 Wednesday
Today we got some data back from the iostones
running and decided that the pvfs was choking some. I played around
with it some more and I think I've found that the random generation
of the filenames was slowing it down too much, so I converted it back
to the way it used to be (since I solved the multiple-programs-running
thing by having them execute in different directories, so they
wouldn't delete each other's files). I started a smaller run which
will finish in a little while. After it's done, if I don't get any
errors I'm going to start another long one up for the night.
We've had a little trouble getting iostone to
work on the FreeBSD install we did on one of the acl machines. It
seems to be missing a header that it needs (for ftime)... we will be
looking into this now that we've figured out how to update the locate
database.
I put Athena back together and it's ready to
have a fresh install done on it (since it had a little kernel panic
when I tried to start it up).
Hassan and I were also playing around with
some of the cricket code, and I found out what the link is to any
single graph and also how to change the time scale of that graph,
which will make putting multiple graphs from different computers on
the same page much easier.
5/31/01 Thursday
We did some more runs with iostone and got back
some more strange results... There were still some missing data
outputs, although there were no more errors (which are now going to a
file so the program can be run remotely and we can still know if there
were errors). Something else that seemed kind of odd was how slow the
program seemed to be running. We were getting results that were about
1000X slower than what we would get on a straight drive. If this is
really the case then we may have to rethink using the pvfs. More tests
to follow...
We have also fixed the problem with compiling
iostone under BSD... we have to include -lcompat so that it links
against libs that BSD now considers obsolete. Running some tests on it
and comparing them to what we got on a Linux box proved to be a
little surprising... the BSD box didn't run the program much faster,
and in some cases ran it slower... more tests to follow.
1st - 8th
vacation....
6/11/01 Monday
After looking at the work that Hassan has been
doing with the cricket graphs and talking for a while about where
that was heading, we got a call from Charlie in NY. We talked for a
while... After that we decided to change the root password and the
insecure password. Ned got most of the places up to date and will get
the rest soon. Then Ned and I went over to the ranch and picked up
the ladder and the wireless hardware. We then set up the wireless
stuff inside using the small antenna. We got that working, then tried
to get some of the wireless cards to work on some laptops... this is
still being worked on.
6/12/01 Tuesday
We did some more testing with the wireless
stuff. We tried using both of the Aironet stations as a base to see
if the original base was causing the slow connection to the laptops,
but that didn't make any difference. We then tried to get the other
PCMCIA card to work, thinking that it might be a compatibility problem,
but that didn't work either since none of the laptops would use the
Aironet card... we think it might be too old a card for the hardware
to support it.
We also got more stuff running on the Redhat
7.1 box. It now has DBD, DBI, and Postgres running on it. We
weren't sure what version of Oracle to run so we e-mailed Dusko to
find out. He said he had it on disk and he was interested in coming
over and installing it with us. He also told us that it needed X to
run, so we had to get X working on it.
We also got the hip data to start working
again... postgres must have gotten messed up when quark went down and
it was no longer taking data from proto.
6/13/01 Wednesday
Dusko came over and we tried to get the Oracle
CD working on the 7.1 box, but for some reason it wasn't working. We
tried lots of different stuff like changing the install directory and
things like that. Hassan tried it on one of the 6.2 boxes and got the
screen to come up... later that day he reinstalled the 7.1 using more
packages and some patches, which got the installer screen up, but it
was late so he stopped there.
I made a perl script that ran ifconfig, trying to make it bring up the
hypercube cards and give them the IP numbers that we wanted to assign
them. The problem was that after I finished it and started some
testing, I found that ifconfig didn't change the
/etc/network/interfaces file, and for some reason when this happened
NFS stopped working correctly. This took a long time to figure out
since I started out using ssh, so the problem wasn't all that
apparent... I would just get kicked off when NFS stopped finding quark.
6/14/01 Thursday
Today I talked to Ned about what I found with
the NFS, and after talking about it for a while we decided that the
best solution was to make the program just rewrite the
/etc/network/interfaces file instead of running ifconfig. This seems
to have fixed the problem. After rewriting a little of my code I
moved on to routing. After working on it a while, Ned and I decided
the best thing to do was to draw a bunch of pictures of it :) So there
are two different drawings of the athena hypercube now... in color.
Using this and my routing program from networks I came up with a
routing program... though there were lots of setbacks. First we
thought that my program was in error because we expected it to give
the next computer in the route... what I later found out, after lots
of painstaking research, was that this was the wrong expectation. What
it does is give the path name to the next computer out of the 4 total
paths... which means it gives which Ethernet card the packet needs to
depart on. This turned out to be very good since the route system call
needs the departing dev. The next setback was that I thought I would
need to know the IP of the card that the info will be arriving on in
the computer it's trying to get to. This was going to be a big pain
and I worked on it for a long time before Ned pointed out that we
could assign the computer name to all of the IPs in that computer, so
that I would just have to specify which computer the data needed to
get to and let named figure out the IP it needed. So I'm now ready for
some testing on the cluster to see if this stuff really works. Ned has
gotten three of the athenas up and running using the new image, so
I'll start doing that tomorrow.
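As an illustration of the approach (not the actual script), here is a
sketch that writes out a Debian-style /etc/network/interfaces stanza
for each hypercube card instead of calling ifconfig; the device names,
addresses, and numbering scheme are all invented.

    #!/usr/bin/perl -w
    # Sketch: emit an /etc/network/interfaces file describing a node's
    # hypercube links, rather than configuring them with ifconfig.
    use strict;

    my $node  = 3;                        # which athena node this is (made up)
    my %links = (                         # dev => address for that link
        eth1 => "10.0.1.$node",
        eth2 => "10.0.2.$node",
        eth3 => "10.0.3.$node",
        eth4 => "10.0.4.$node",
    );

    open my $fh, '>', '/etc/network/interfaces.new' or die "open: $!";
    print $fh "# generated - hypercube links for athena$node\n";
    print $fh "auto lo\niface lo inet loopback\n\n";
    for my $dev (sort keys %links) {
        print $fh "auto $dev\n";
        print $fh "iface $dev inet static\n";
        print $fh "    address $links{$dev}\n";
        print $fh "    netmask 255.255.255.0\n\n";
    }
    close $fh;
    # A real script would move the .new file into place and re-run
    # the networking init script (or ifup) afterwards.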
6/18/01 Monday
Not a whole lot happened today... we got the lists
straightened out and everyone up to date on what's been happening and
what needs to happen. Ned and I played around with C3 and got it
working on the athena cluster... which is very nice, I might add.
Hassan and Abby tried to blame my program for their FTP transfer
problems... for some reason the FTP transfer is making that file
larger than it started... and that's my fault. I also helped Abby with
the twisted wire thing for the direct FTPing. Hassan, Ned, and I came
up with some really neat ideas for C3 and snmp. Ned would like
to use C3 on the acls, but it does bad things if some of them are
down... so we think we could write a perl script that uses snmp to
find out which acls are up and then write that to a file which C3 can
use as the list of computers to perform its operations on. I also
realized that my program could use a little updating so that it
figures out which athena it is running on, so that it can be executed
using C3 instead of having to be executed on a single computer at a
time.
6/19/01 Tuesday
Spent a lot of time today trying to get my program to
figure out which host it was running on. Tried several
different ways and finally got one that works. I also put in a
hostname check to make sure that the word athena shows up in it, so it
will not try to change stuff on a non-athena computer. Pointed Hassan
in the right direction with some of his perl stuff. Moved some things
around so Abby could have a computer and desk in the front room.
Worked with Ned getting C3 to work on the acls... used my computer as
the one to test the commands on... found what appears to be a small
bug in C3 when using the cshutdown command to shut a computer down,
because we have remote X forwarding on for the acls using ssh when we
ran the c3 shutdown...
6/20/01 Wednesday
I started the day out by trying to get my code
for the hypercube routing to make the destination of the packet the
IP number of the card it will be arriving on, and not just the athena
name (which didn't work). To do this I had to carry out the entire
route in code for each route command, of which there are 15 per
computer. I finally got that working. Then Ned and I decided to work
on some psql stuff on the athena cluster. We kept running into
problems which seemed to be somewhat unrelated to the ones that
Hassan was getting. We decided that it may have been a problem
incurred by the routing tables I had made, and decided to try to get
that fixed before continuing with the psql configuration. After
getting all of the computers configured and doing some tests that
showed the packets were not getting to where they should have been, we
started using tcpdump to try to track the path. Then Ned came up with
the idea that I think would explain it. Because the destination is
not the computer it's going through, we think that they may be
rejecting the packet on the first leg of the route. I'll talk to you
about this on Thursday.
6/21/01 Thursday
Ned and I found a small bug in my code where a
var wasn't getting reinitialized... I got that fixed, then we tried
getting the routing stuff to work again... without any luck. We next
tried changing files (/etc/network/options) to match the ones on
noether, but that didn't help either. After that we took a break from
that and tried to find out the block size for ext2... this took a
little longer than it should have, mostly because we kept finding
references to the max block size, not the default. We finally found it
by running some kind of dump on the ext2 filesystem that told us lots
of info about it, which said it was 4096 bytes (4 KB), so we changed
the pvfs to use that. This doesn't seem to have helped the performance
out any though... at least based on Hassan's first data from the
database entry times. We then went back to working on the
routing... doing the kernel recompiling, which didn't seem to help
either. Ned just installed routed but we haven't configured it yet.
6/22/01 Friday
Today we sorted through the hardware list
and distributed some jobs for us to work on on Fridays.
Hassan and I decided to work on the UV sensor that I had worked
on for my final project in EI. It is at the point where most of
the hardware should be done, and we need to get the program
working that we will use to find out what's coming in from the
hardware via the serial port. I finally found where the program was
stashed and copied it to my and Hassan's home directories.
I then showed Hassan what we had done, which was very little and
didn't work correctly at that. We also spent a good amount of time
searching for the block size that the ext2 file system uses and then
changed pvfs to match so that they are not fragmenting.
6/25/01 Monday
Today we did a list sort and report. I
handed the iostone stuff over to Abby and told her what I had
changed on it, and we talked about what still needed to be
done... testing with a standalone program on a local hard drive.
Ned and I also did some more routing stuff with the athena
cluster. We decided that we should try to specify the exact IP
of the next computer in the path of the route as a gateway in
the routing table, instead of just leaving the default, which we
thought was anything. This seemed to work a little better. We
started getting limited packet forwarding... what it seemed to
do was send the packet through the first "gateway", but when it
reached the next computer, whether it was a gateway or the
destination host, it wouldn't continue on. We played around
with this for quite a long time before we were convinced that
we were completely baffled.
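For reference, the shape of the routing entries we were trying looks
roughly like the sketch below, expressed as a perl loop over route
commands; every hostname, address, and device name here is made up.

    #!/usr/bin/perl -w
    # Sketch: install one host route per destination, naming the next
    # hop explicitly as the gateway and pinning the outgoing device.
    use strict;

    # destination host => [ next-hop IP on the first leg, departing device ]
    my %routes = (
        'athena5' => [ '10.0.1.1', 'eth1' ],
        'athena6' => [ '10.0.2.2', 'eth2' ],
    );

    while (my ($dest, $hop) = each %routes) {
        my ($gw, $dev) = @$hop;
        my @cmd = ('route', 'add', '-host', $dest, 'gw', $gw, 'dev', $dev);
        print "@cmd\n";
        system(@cmd) == 0 or warn "route add for $dest failed: $?\n";
    }
    # Forwarding also has to be on for the intermediate hops:
    #   echo 1 > /proc/sys/net/ipv4/ip_forward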
After that I finished up the day by doing
a little work on the perl script that would do the snmpwalk to
find out which computers in a cluster were up so that C3 could
be run on them.
6/26/01 Tuesday
The first thing I did today was start working
on some little scripts that use snmp to probe a cluster and find out
which of the computers are up. This creates a list of the computers
which are running, which can then be used by C3 as the computers to
execute the commands on, instead of letting it try to execute on all
of them, which takes a really long time.
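The probe scripts boil down to something like this sketch, which asks
each node's SNMP agent for sysUpTime and writes the responsive ones to
a file C3 can use as its machine list. The hostnames and community
string are placeholders, and I'm using the Net::SNMP module here just
for illustration; the real scripts shell out to snmpwalk.

    #!/usr/bin/perl -w
    # Sketch: build a "which nodes are up" list for C3 via SNMP.
    use strict;
    use Net::SNMP;

    my @nodes = map { "acl$_" } 0 .. 31;      # invented node names
    my $list  = '/tmp/acl_up.list';

    open my $out, '>', $list or die "open $list: $!";
    for my $host (@nodes) {
        my ($snmp, $err) = Net::SNMP->session(
            -hostname  => $host,
            -community => 'public',
            -timeout   => 2,
            -retries   => 1,
        );
        next unless $snmp;
        my $reply = $snmp->get_request(
            -varbindlist => ['1.3.6.1.2.1.1.3.0']   # sysUpTime.0
        );
        print $out "$host\n" if $reply;             # it answered, so it's up
        $snmp->close;
    }
    close $out;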
When I finished one of these scripts for
the acl and the athena clusters, I started working on a bash
script which starts up all of the pvfs components (pvfsd, iod,
and mount.pvfs) at boot time on the athena cluster. While doing
this I also tried various other things on the routing for the
athena cluster with Ned, with no luck. While working on the bash
script I found out that pvfsd will not start on the kernel that
Ned made trying to get the IP forwarding to work... and since it
didn't work we are going to revert to the old kernel.
6/27/01 Wednesday
I finished up the bash script that does
the startup of all of the pvfs components. I reconfigured the
bash script on noether so that it doesn't start up iod and does
start mgr... didn't find this until noether was rebooted and the
pvfs didn't come back up. I also updated the admin page to include
all of the tools that I've written so far for the acl and athena
clusters. I put a copy of them in ~admin/sysadmins/utils and
made a link to utils in the www directory so the source could be
displayed on the page. Ned and Hassan also came up with the idea
that it would be cool if I changed my acl_up.pl perl script to be CGI
so we could see the status of the acl faster than dingman's pgp
pinging program. The problem with this is it took me a long time to
figure out what I needed to do with CGI... and my program writes some
files while it's running... which doesn't mesh well with CGI... so I
gave up for now.
6/27/01 Wednesday
I finalized the pvfsd bash script and put
it in the image. Ned had reimaged them before I'd gotten it in,
so I had to put it back on all of them. After that I pretty much
just did reading on the folding@home page and some of the
recommended reading that they had at the bottom of their page.
Learned some interesting stuff about what it is we are
doing... makes me feel a little better :)
6/28-29/01 Thursday & Friday
I did some more reading... I think I've
gotten through all of the recommended reading sites that were
listed at the bottom of the folding@home web page. We also
did some list management as a group. On Friday I tried to
figure out what to do about the serial port code for the UV
sensor that I've been working on, but I still haven't made any
breakthroughs.
7/2/01 Monday
Did even more reading about protein
folding and genomes. Checked out the pvfs page for info about
their scripts which are supposed to start the pvfsd stuff at
boot time. They turned out to be less functional than what
I've done. They just make links from the rc folder to the iod
and pvfsd executables. Mine will start, restart, and stop them,
and mount and unmount the /athena/pvfs1 "drive".
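For comparison, the shape of my script, sketched here in perl even
though the real one is bash; every path, daemon location, and mount
argument below is a guess for illustration, not what the script
actually uses.

    #!/usr/bin/perl -w
    # Sketch of an init-style dispatcher for the pvfs pieces on an
    # athena node: start/stop iod and pvfsd, and (un)mount the volume.
    use strict;

    my %daemon = (
        iod   => '/usr/local/sbin/iod',      # assumed install locations
        pvfsd => '/usr/local/sbin/pvfsd',
    );
    my $mount_cmd  = '/sbin/mount.pvfs noether:/pvfs-meta /athena/pvfs1';
    my $umount_cmd = 'umount /athena/pvfs1';

    sub start {
        for my $d (qw(iod pvfsd)) {
            system($daemon{$d}) == 0 or warn "$d failed to start\n";
        }
        system($mount_cmd);
    }

    sub stop {
        system($umount_cmd);
        system("killall $_") for qw(pvfsd iod);
    }

    my $action = shift || '';
    if    ($action eq 'start')   { start() }
    elsif ($action eq 'stop')    { stop() }
    elsif ($action eq 'restart') { stop(); start() }
    else  { die "usage: $0 {start|stop|restart}\n" }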
7/3,5/01 Tuesday & Thursday
Did a little more reading... did some
meeting time reworking, so now we all have a group time to meet
and a group list of things to work on. We then redid the lists
so we all had stuff to do again. Also did some work on the
motion sensor with Abby. Did a little looking around for other
protein folding programs/projects, with little success.
7/6/01 Friday
Did some reading of the TPC-C stuff. Had
the hardware meeting. Hassan really wanted to do some snmp stuff,
and Ned and Abby were having some trouble, so I worked with them
on the motion sensor.
7/9/01 Monday
Did research trying to find other
folding programs besides fah. Found lots of 3D rendering
programs and physics research groups, but not a whole lot in
the area of working programs. Took a break from that and we all
took apart the pentagon and enlarged the sandbox to 10
computers... in so doing we blew the breaker in the next
trailer... we didn't notice that the power strip was plugged
into the Athenas' power and not this trailer's power.
7/10/01 Tuesday
Started the day out trying to figure out
why the cs subnet was completely flooded this morning... turned
out that the wireless bridges being plugged into the same
subnet had caused it. Did more research for folding
programs... I think I'm nearing the end of finding useful things
in my search. Got the power back up to the Athena cluster.
7/11-16/01 Thursday-Monday
Did all of the reading of the TPC
benchmarks book that Charlie wanted us to read. I also did some
reading of the TPC-C stuff that was published by different
companies... lots of this was info about their -great- computers
;) I've also been working on lots of other things on the side:
helping Chip out with different wireless stuff, playing with the
motion sensor, and figuring out why quark went down (taking the
cs subnet with it).
7/18/01 Wednesday
Finished skimming the TPC-C docs. Started
looking at the C code of Charlie's implementation, trying to get
an idea of what I'm going to need to do. Hassan and I tried
many times to install FreeBSD 4.3 on one of the acls (acl22). Tried
several times... we weren't able to get it to work using X, so
we just did an install without it and that seemed to work.
7/19/01 Thursday
Today we installed FreeBSD 4.3 on quark.
We got it partly up and running by the end of the day.
7/20/01 Friday
Ned and I looked over the tpc-c specs and
code with Charlie and did a little more quark work so that we
could do some work of our own.
7/23/01 Monday
I started out the day searching for some
perl implementations of the TPC-C spec so we would not be
working completely from scratch. Didn't find a whole lot... so
we started working on our own... we converted the SQL statements
so that they were good for DBI. We then also used some of
Hassan's code so that we could execute these commands... dropping
the table then recreating it.
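The drop-and-recreate step looks roughly like this DBI sketch; the
connection string, credentials, and table definition are placeholders
rather than the real TPC-C schema.

    #!/usr/bin/perl -w
    # Sketch: drop a table if it exists and recreate it through DBI,
    # the way the converted SQL gets executed.
    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=tpcc', 'tpcc', '',
                           { RaiseError => 1, PrintError => 0 });

    # DROP may fail if the table isn't there yet, so swallow that error.
    eval { $dbh->do('DROP TABLE warehouse') };

    $dbh->do(q{
        CREATE TABLE warehouse (
            w_id   INTEGER PRIMARY KEY,
            w_name VARCHAR(10),
            w_ytd  NUMERIC(12,2)
        )
    });

    $dbh->disconnect;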
7/24/01 Tuesday
Ned and I went through the code some more,
finding things that we didn't understand and just didn't get
why they were implemented that way. We started coding our own
version in perl, starting from places where we knew what was going
on. We decided we needed a common directory (group) that we could
all write to and get to easily, which is /clients/tpc-c/. We started
taking notes in a file called unknown-stuff-to-ask-charlie, which
is self explanatory.
7/25/01 Wednesday
We converted some more of the TPC-C benchmarking
code... added more things to the ask-charlie file.
7/26/01 Thursday
We did the picture thing for the cs brochure. During this we didn't get a whole lot done... Afterwards Ned and I met with Charlie and got to ask some of our questions. He was able to answer them... although it took him a little while on some of them. After that we worked on the code a little bit... figured out how to pass multiple return values (via a hash), which solves the problem with globals. At the end of the day we decided to go take a look at dennis and see how it's coming along.
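The multiple-return-values trick amounts to returning a hash from a
sub instead of leaning on globals. A tiny sketch, with made-up field
names:

    #!/usr/bin/perl -w
    # Sketch: return several named values from a sub as a hash.
    use strict;

    sub lookup_customer {
        my ($c_id) = @_;
        # ... the real version would do database work here ...
        return (
            last_name => 'SMITH',
            balance   => 42.50,
            credit    => 'GC',
        );
    }

    my %cust = lookup_customer(17);
    print "customer 17: $cust{last_name}, balance $cust{balance}\n";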
7/27/01 Friday
Ned and I did some more code conversion...
We think we got all of the support functions working and are now
moving on to the load.pl script. We also made some "policy" decisions
about the code design.
7/30/01 Monday
Ned and I did more conversions... we've gotten
to the point where we are both familiar enough with the code to split
up and work on different portions of the code at the same time. Ran
into some more small inconsistencies that lead us to believe that this
was not the final code line, such as a lack of variable and function
declarations.
7/31/01 Tuesday
I did some work on some of our perl modules
that are like SUPPORT.pm but for the transactions. I took part of the
day off to go swimming...I did some more work afterwards.
8/01/01 Wednesday
Following Charlie's recommendation, Ned and
I started focusing on load.pl so that we would have the database up
with data loaded into it, so that we would have something to test the
rest of the program (the transaction part) against once we had it
coded and ready to run. We think we got load.pl up and running at the
end of the day. We split off some of the transaction-starting parts of
the program to work on next (DRIVER.pm and tpc-main.pl).
8/02/01 Thursday
Ned and I got tpc-main.pl and DRIVER.pm
finished (still not debugged) and then decided to get one or two of
the transaction functions that DRIVER.pm calls done before we tried to
run them (they're not too big). I got NEWORD.pm finished, so then we
started trying to run it. Got some small syntax errors out of the way,
and then we came to a really weird error that neither one of us could
make heads or tails of... I have a feeling it's being caused by
something completely unrelated, but it's hard to say. The error has to
do with all of the variables on two different lines... it's something
like "Global symbol "$w_id" requires explicit package name at
DRIVER.pm line 114".
8/03/01 Friday
We got all of the known bugs out of
tpc-main.pl, DRIVER.pm, and NEWORD.pm. Still have to get the other
transaction modules finished and debugged, and then I think this thing
will be about done. The error we ran into turned out to be another for
loop whose {} gave the variables their scope. Damn perl! I'm working
on ORDSTAT.pm now...
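For the record, that message is perl's "use strict" complaining: a
variable declared with "my" inside a block (a for loop's braces, for
example) doesn't exist outside it, so later uses hit the "requires
explicit package name" error. A tiny reproduction (the variable name
just mirrors the one from the error):

    #!/usr/bin/perl -w
    # Under "use strict", a "my" variable declared inside a block
    # disappears once the block ends.
    use strict;

    for my $w_id (1 .. 3) {
        print "inside: $w_id\n";    # $w_id exists only inside the braces
    }

    # print "outside: $w_id\n";     # uncommenting this fails to compile:
    #   Global symbol "$w_id" requires explicit package name

    # Fix: declare the variable in the scope where it is actually used,
    # e.g.  my $w_id;  before the loop.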
8/06/01 Monday
We finished up coding the other transaction
modules and fixed some other small bugs in DRIVER.pm that we ran into.
Ran some test runs... seems to be working.
8/07/01 Tuesday
We found out that there was another transaction
function called delivery that was not fully implemented in Charlie's
code. We coded it up and then added it to the driver to run. We had a
little problem with it because there were lots of cases where some of
the selects or statements that had several WHERE conditions were
failing. I think we may have finally gotten this fixed by having it
just return a status of 0 whenever this occurs (like the other
functions). We realized there may be an error in ordstat in how it
goes about fetching its rows... still have to go over its output to
know for sure. Near the end of the day I also got color to work on
quark (with ls) again and got the console term to work for vi.
8/08/01 Wednesday
We spent the day modifying all of our transaction
functions so that they display everything they are entering into and
reading from the database when debug mode is on. We then went through
and turned the debugging on one at a time and started finding lots of
little errors that resulted in database inconsistency. We did this for
the rest of the day and still have a little left to go.
8/09/01 Thursday
Some of our debugging led us to an error in
load.pl where we found some of the SQL wasn't even being run, so some
data was never being loaded into the database. We finally fixed this
and got back to debugging the transaction modules.
8/10/01 Friday
We found out from Charlie that we could execute each
new piece of SQL and then commit it once at the end of the
function/program. We did some more runs which demonstrated that it was
much faster with the commits happening only once in a while instead of
all the time.
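One way to express that with DBI is to turn AutoCommit off and call
commit once at the end; whether Charlie's code does it exactly this
way I'm not sure, so treat the DSN, table, and SQL as placeholders.

    #!/usr/bin/perl -w
    # Sketch: batch many statements into one transaction instead of
    # letting the database commit after every statement.
    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=tpcc', 'tpcc', '',
                           { AutoCommit => 0, RaiseError => 1 });

    for my $i (1 .. 1000) {
        $dbh->do('INSERT INTO history (h_amount) VALUES (?)', undef, $i);
    }

    $dbh->commit;        # one commit for the whole batch
    $dbh->disconnect;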
8/13/01 Monday
Charlie pointed out some new changes that we could make
to the code which would speed it up considerably. This came about after
it was discovered that the code was taking much more CPU power than the
database was. So we started moving all of the major random function calls
outside of any loops, making the number of times rand is called much,
much smaller. This causes the database to become less random, but Charlie
said that this was worth it. The other improvement that we made was to
create the SQL statements before any loops that they are called in, and
then just use variables to plug in the values needed at the last minute.
This saves us from having to recreate the whole SQL statement on every run
of the loop. A small test run indicated that our changes would result in a
code line that was at least 3X faster than before. If all seems to work
out, the plan is to move the changes over to the transaction functions
as well.
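The prepared-statement change amounts to this pattern: prepare the
statement with "?" placeholders once, outside the loop, then call
execute() with each row's values. A sketch with a placeholder DSN,
table, and loader logic (not the real load.pl):

    #!/usr/bin/perl -w
    # Sketch: prepare the INSERT once, execute it per row, and keep the
    # expensive calls out of the loop where possible.
    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=tpcc', 'tpcc', '',
                           { AutoCommit => 0, RaiseError => 1 });

    my $sth = $dbh->prepare(
        'INSERT INTO item (i_id, i_name, i_price) VALUES (?, ?, ?)'
    );

    # Pull repeated random-ish work out of the loop.
    my @names = map { "item$_" } 1 .. 100;

    for my $i_id (1 .. 100_000) {
        my $name  = $names[$i_id % @names];
        my $price = 1 + ($i_id % 100) / 100;
        $sth->execute($i_id, $name, $price);
    }

    $dbh->commit;
    $dbh->disconnect;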
Near the end of the day we ran into another small problem. There were
two SQL statements where we were giving the database a NULL value.
Before, the whole SQL statement was just getting put into one variable
and then executed, so perl didn't even see the NULL. Now we are passing
the values as arguments to a perl function, and because perl doesn't
know the var NULL we have to come up with a way around this.
8/14/01 Tuesday
Charlie solved our problem with the null by just making a var "my $NULL;" without assigning it... doesn't that make it 0, not ""?... hmm
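Answering my own question: an unassigned "my" variable is undef, not 0
and not "", and DBI treats an undef bind value as a SQL NULL, which is
presumably why the trick works. A sketch (the table and columns are
placeholders):

    #!/usr/bin/perl -w
    # Sketch: an unassigned "my" variable holds undef, and DBI maps an
    # undef bind value to SQL NULL.
    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=tpcc', 'tpcc', '',
                           { AutoCommit => 1, RaiseError => 1 });

    my $NULL;    # never assigned, so it is undef (not 0 and not "")

    $dbh->do('INSERT INTO orders (o_id, o_carrier_id) VALUES (?, ?)',
             undef, 42, $NULL);      # o_carrier_id is stored as NULL

    $dbh->disconnect;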