Marwan's Computer Science Senior Seminar Site
First weeks of school
I'm still not sure what I'm going to do for my computer science project. I'm interested in
combining the fields of computer science and business so that the project covers
both of my majors.
I'm going to do my project in the field of Management Information Systems.
I'm not sure yet which part of that field I'll go into in more detail, which is why I'm
doing more reading at this point from the following books:
Contemporary Financial Management, R. Charles Moyer.
Econometric Analysis. Prentice Hall.
Time Series Analysis. Princeton.
The Social Impact of Computers.
There are more sources from the internet; I'll add the URLs soon.
Week of 9/16
I started doing more research in the area of data mining, knowledge data, exploration, and
tools.
I think that I might be interested in building software that gathers information,
analyzes it, and makes some managerial decisions, with a database system, of course, doing
all the storage.
I started writing my survey paper, and so far so good; most of the material is coming together
smoothly. My presentation is going well: I have my outline ready, and most of the things that
I need to say are typed up.
Week of 9/23
More preparation on the paper and the presentation.
I have written about 5 single-spaced pages of my paper.
I found some more articles about data mining and how it works. I'm still waiting for some
books I ordered through interlibrary loan (ILL).
Week of 10/1
More articles, viewpoints, and books to look at and look for.
I need to find more books on the implementation of data mining, because I have basically
figured out most of what I need for the survey and the presentation.
Week of 10/8
I went to the Institutional Research office, located on the 3rd floor of the LBC, to ask for
information, since they have a lot of data about different aspects of the
school. I was hoping to find some data that I can use for data mining in my final
project. I visited twice, and both times they were extremely busy
and no one could help me, so I need to go there one more time.
I went to see the economics professor Naser Abumustafa to find out whether he has any
economic information, such as numbers and statistics about different emerging markets in the
world. He gave me a CD to browse, and I'm still in that process.
I'm also searching for more information about the implementation of the data mining
software, and whether I want to use statistical tools or machine learning. I think I might be
leaning towards statistical tools, and my further research should give me the
definitive answer to that question.
I found a few papers on the CiteSeer site, but I'm having problems opening them.
Week of 10/15
I found a few ideas for the project on different sites. One of them was how to
organize the college's schedule, that is, where and when classes meet, in order to prevent conflicts.
After putting all the data about the professors, departments, and
other information into a database, the database will analyze the data
and tell me where and when each class meets. It will also generate a four-year plan for the
student. The user will enter their major and any minors they wish to have,
and the software will organize their schedule: which class to take
when, how any interesting gen eds or other classes they wish to take will
fit, how an off-campus program would affect the plan, etc.
The following URL has some ideas and projects that I am reading about:
http://www.dba-oracle.com/dw_proj.htm
I found another article called "An Overview of Data Warehousing
and OLAP Technology", which I'm reading at the moment,
plus the article "Research Problems in Data Warehousing".
I'm also contacting the management department to ask for ideas and whether they
could help in any way, since I might be combining both of my theses.
Week of 10/21
I am narrowing down the topic to data analysis and clustering. I found several articles
that discuss clustering, with definitions, different samples, and techniques.
The bibliography of these articles will be added soon.
I skimmed through the first few years of ACM Computing Surveys. After
spending hours reading and looking through different ACM volumes, I found
a very useful article in V.31-32 (1999-2000) that talks about clustering.
The article is not short, so it's going to take me time to read it. I
found it late on Tuesday night, and I can't take the volume out of the library,
so I have to come back some other time to finish reading and take what I
need from the article.
I also ordered a couple of books through ILL, which should be arriving sometime soon (I hope).
I also wrote down the definition of clustering, got more information about the topic
itself, and collected a few examples of how clustering has been used and could be used, in order to
extract some ideas and reflect them on webdb so I can do my project.
Week of 10/28
I need to spend more time on the presentation.
I'm going to write my thoughts and ideas down as a rough draft for the
paper, which should give me a head start on writing the paper next week.
The ILL books that I ordered have arrived, and two of them helped me a lot with my
problem: "Finding Groups in Data" and "Tools and Techniques for Data Mining and
Clusters". They have descriptions of many algorithms for clustering and data mining.
Together they show examples of around six algorithms for partitioning and hierarchical methods.
Week of 11/4
My major objective for this week is to finish the paper. After finishing the
paper, I will spend more time building test data for my project, getting the data
from Dusko, and thinking about the best way to approach the problem for my project.
I started with the test data. I computed the distances for a small set of made-up data and
did some graphing in Excel. Then I created another, larger data set and
started on its distances. Next I'm going to continue with the algorithm and then start
coding it.
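A minimal sketch of the distance step in C++. The journal does not say which distance measure was used, so Euclidean distance is an assumption here, and the names Point, euclidean, and distanceMatrix are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical representation: each test object is a vector of numbers.
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension (assumed measure).
double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Symmetric matrix of pairwise distances; the diagonal stays zero.
std::vector<std::vector<double>> distanceMatrix(const std::vector<Point>& pts) {
    const std::size_t n = pts.size();
    std::vector<std::vector<double>> m(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            m[i][j] = m[j][i] = euclidean(pts[i], pts[j]);
    return m;
}
```

For the made-up points (0,0), (3,4), and (6,8), the entry for the first two points is 5, which is easy to check by hand before trying a larger data set.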
I talked to Dusko, and he's going to send me 2 test files containing some kind of
identification for the students, the courses they have taken, and their addresses. The problem
that I'm facing right now concerns the international students: some of them have
addresses in the United States, so they won't appear with an international address.
Jim gave me an idea to ask Dusko about, and Charlie gave me another one that might help
Dusko get something out of Banner to give me more accurate addresses. If not, I might talk to
the registrar and Bonita. There has to be a way to find the real addresses of the
international students, since the college has to inform the IRS of the international students
that enroll in the college.
Week of 11/11
I started writing code to transform the files that Dusko gave me in order to create one big
database for all the information that I have. The first code that I need to write has to
separate the data and put "'" around each entry and "," between entries, since this is the structure
the SQL database expects.
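A rough sketch of that quoting step in C++. It assumes the fields in each input line are separated by whitespace, which the journal does not state; the function names quoteField and toSqlValues are invented for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Wrap one field in single quotes. An embedded single quote is doubled,
// which is the standard SQL way to escape it inside a string literal.
std::string quoteField(const std::string& field) {
    std::string out = "'";
    for (char c : field) {
        if (c == '\'') out += "''";
        else out += c;
    }
    out += "'";
    return out;
}

// Turn one whitespace-separated line (assumed input format) into a
// comma-separated list of quoted values.
std::string toSqlValues(const std::string& line) {
    assert(line.find('\n') == std::string::npos);  // expects a single line
    std::istringstream in(line);
    std::string field, out;
    while (in >> field) {
        if (!out.empty()) out += ',';
        out += quoteField(field);
    }
    return out;
}
```

With this sketch, the line 123 CS 128 becomes '123','CS','128', which can then be placed inside the VALUES list of an INSERT statement.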
For my first piece of code, I need to open a file and write to a file, so I need a
refresher from CS35(128), because it's been a while since I used files.
I asked John Howell for a very simple example of reading from a file and writing to a file,
so I can use that for my code.
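For reference, a very simple example of this kind might look like the sketch below. It is not the example John Howell provided; copyFile and its return convention are assumptions made for illustration. The loop is controlled by the read itself, so it stops as soon as the input runs out.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Copy inPath to outPath one character at a time.
// Returns the number of characters copied, or -1 if a file cannot be opened.
long copyFile(const std::string& inPath, const std::string& outPath) {
    assert(inPath != outPath);  // copying a file onto itself would truncate it
    std::ifstream in(inPath, std::ios::binary);
    if (!in) return -1;
    std::ofstream out(outPath, std::ios::binary);
    if (!out) return -1;

    long count = 0;
    char ch;
    // in.get(ch) fails at end of file, so the loop ends on its own.
    while (in.get(ch)) {
        out.put(ch);
        ++count;
    }
    return count;
}
```

Checking the returned count against the size of the input file is a quick way to confirm that the output is not growing beyond the input.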
I finished writing the first piece of code, but I still need to test it on a small piece of data.
I'm facing a problem: my code works on small data, but it does not work on
large data. I'm using "char ch" as a variable to read the input and write it as output,
but when I run the program on large data, I get "disk quota exceeded", even though
it works perfectly fine on small data.
Jim suggested that I use strings instead of regular variables. So far, the same thing is happening.
I'm going to start writing the code for inserting the data into the database, so I
won't spend too much time on one thing and neglect everything else until I get it right.
After making good progress, I'm going to start writing the code for the agglomerative algorithm.
Week of 11/18
My code for transforming the data into something SQL can read is taking more time than expected.
It works once and doesn't the next time. I have made another copy of the original code, so I won't
fall into a deep hole that is going to take a long time to get out of.
I have been sick since the beginning of the week. I haven't made as much progress as I usually do
in a week.
Week of 11/25
I started coding the agglomerative algorithm.
I finished solving the data transformation problem, so I can create a database
and create tables to apply the algorithm to. From those tables I can build the
dissimilarity matrix, and after that, I'll start clustering the results and analyzing
them.
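A compact sketch of agglomerative clustering over a dissimilarity matrix, in C++. The journal does not say which linkage rule or stopping rule the project uses, so single linkage and "stop at k clusters" are assumptions, and the names Matrix, Cluster, linkage, and agglomerate are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // d[i][j] = dissimilarity
using Cluster = std::vector<std::size_t>;         // indices of the members

// Single-linkage distance (assumed rule): the smallest dissimilarity
// between any member of a and any member of b.
double linkage(const Matrix& d, const Cluster& a, const Cluster& b) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i : a)
        for (std::size_t j : b)
            if (d[i][j] < best) best = d[i][j];
    return best;
}

// Start with one cluster per object and repeatedly merge the closest
// pair of clusters until only k clusters remain.
std::vector<Cluster> agglomerate(const Matrix& d, std::size_t k) {
    assert(k >= 1);
    std::vector<Cluster> clusters;
    for (std::size_t i = 0; i < d.size(); ++i) clusters.push_back({i});

    while (clusters.size() > k) {
        std::size_t bestA = 0, bestB = 1;
        double bestDist = std::numeric_limits<double>::infinity();
        for (std::size_t a = 0; a < clusters.size(); ++a)
            for (std::size_t b = a + 1; b < clusters.size(); ++b) {
                const double dist = linkage(d, clusters[a], clusters[b]);
                if (dist < bestDist) { bestDist = dist; bestA = a; bestB = b; }
            }
        // Merge cluster bestB into bestA, then remove bestB.
        clusters[bestA].insert(clusters[bestA].end(),
                               clusters[bestB].begin(), clusters[bestB].end());
        clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bestB));
    }
    return clusters;
}
```

Recomputing every linkage on each pass keeps the sketch short but is slow for large inputs; a real implementation would usually update the matrix after each merge instead.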
I started running into minor problems as I went on with the coding. My biggest problem came
when I created a table in the database from the data that I have. I want to separate
each department on its own and show each pidm and the number of courses they have
taken in that department. First of all, I created a table that has all the pidms
I have, showing each one with the courses they have taken and the number of courses.
Then I separated every department on its own and showed the pidms that have taken courses in that
specific department.
The next step was joining the tables together, so I can have a bigger table with all the
pidms, and for each pidm a column for each department, right next to the pidm, so
I can show the count for each department.
This turned out to be harder than it looked. It didn't work out as I thought it would.
Things just became more and more complicated as I tried to solve this problem.
Week of 12/2
After spending a fair amount of time trying to solve this problem, I started over with
a different approach.
The second approach is to code in C++ and read the data from the first table that
I originally created, the one that has all the pidms, all the courses, and all the counts.
I finished the code that reads the data from the files, but I need to spend more
time on reading the departments, because they are giving me a hard time.
I also wrote a few hundred lines of code to create a two-dimensional array that
holds all the data I need for calculating the dissimilarity
matrix.
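One possible shape for that two-dimensional array, sketched in C++ with standard containers instead of raw arrays. This is not the project's actual code: the Record and CountTable layouts and the function names are assumptions, with one row per pidm and one column per department.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One row of the original table (assumed layout): a student id (pidm),
// a department, and the number of courses taken in that department.
struct Record {
    std::string pidm;
    std::string dept;
    int count;
};

struct CountTable {
    std::vector<std::string> pidms;        // row labels
    std::vector<std::string> depts;        // column labels
    std::vector<std::vector<int>> counts;  // counts[row][column]
};

// Return the index of key in labels, appending it first if it is new.
std::size_t indexOf(std::map<std::string, std::size_t>& seen,
                    std::vector<std::string>& labels, const std::string& key) {
    auto it = seen.find(key);
    if (it != seen.end()) return it->second;
    labels.push_back(key);
    seen[key] = labels.size() - 1;
    return labels.size() - 1;
}

CountTable buildTable(const std::vector<Record>& records) {
    CountTable t;
    std::map<std::string, std::size_t> rowOf, colOf;
    // First pass: collect row and column labels in order of appearance.
    for (const Record& r : records) {
        indexOf(rowOf, t.pidms, r.pidm);
        indexOf(colOf, t.depts, r.dept);
    }
    // Second pass: fill the grid; missing combinations stay at zero.
    t.counts.assign(t.pidms.size(), std::vector<int>(t.depts.size(), 0));
    for (const Record& r : records) {
        assert(r.count >= 0);  // a course count cannot be negative
        t.counts[rowOf[r.pidm]][colOf[r.dept]] += r.count;
    }
    return t;
}
```

Each row of counts is then the vector of per-department course counts for one pidm, which is the form a pairwise dissimilarity calculation needs.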