Marwan's Computer Science Senior Seminar Site

First weeks of school
I'm still not sure what I'm going to do for my computer science project. I'm interested in combining the fields of computer science and business so that the project covers both of my majors.

I'm going to do my project in the field of Management Information Systems. I'm not sure yet which part of that field I will go into in more detail, so at this point I'm doing more reading from the following books:
Contemporary Financial Management, R. Charles Moyer.
Econometric Analysis. Prentice Hall.
Time Series Analysis. Princeton.
The Social Impact of Computers.
There are more sources from the internet; I'll add the URLs soon.

Week of 9/16
I started doing more research in the area of data mining, knowledge data, exploration, and tools. I think I might be interested in building software that gathers information, analyzes it, and makes some managerial decisions, with a database system, of course, handling all the storage.


I started writing my survey paper, and so far so good; most of the material is coming together smoothly. My presentation is going well: I have my outline ready, and most of the things that I need to say are typed up.

Week of 9/23
More preparation on the paper and the presentation. I have written about 5 single-spaced pages of the paper so far. I found some more articles about data mining and how it works. I'm still waiting for some books I ordered through interlibrary loan (ILL).

Week of 10/1
More articles, viewpoints, and books to look at and to look for.

I need to find more books on the implementation of data mining, because I have basically figured out most of what I need for the survey and the presentation.

Week of 10/8
I went to the Institutional Research office, located on the 3rd floor of the LBC, to ask for information. They hold a lot of data about different aspects of the school, and I was hoping to find some that I can use for data mining in my final project. I visited them twice, and both times they were extremely busy and no one could help me, so I need to go there one more time.
I went to see the Economics professor Naser Abumustafa to find out whether he has economic data, such as numbers and statistics about different emerging markets in the world. He gave me a CD to browse, and I'm still going through it.
I'm also searching for more information about the implementation of the data mining software, and whether I want to use statistical tools or machine learning. I think I might be leaning towards statistical tools, and my further research should give me a definitive answer.
I found a few papers on the CiteSeer site, but I'm having problems opening them.

Week of 10/15

I found a few ideas for the project on different sites. One of them was how to organize where and when the college's classes meet in order to prevent conflicts. All the data about the professors, departments, and other information would go into a database, and the software would then analyze the data and tell me where and when each class meets. It would also generate a four-year plan for the student. The user enters a major and any minors they wish to have, and the software organizes the schedule: which class to take when, how any interesting gen eds or other classes would fit in, what the effects would be if the student wants to go on an off-campus program, and so on.

The following URL has some ideas and projects that I am reading through: http://www.dba-oracle.com/dw_proj.htm

I found another article, called "An Overview of Data Warehousing and OLAP Technology", which I'm reading at the moment.

Plus the article "Research Problems in Data Warehousing".

I'm also contacting the management department to ask for ideas and whether they could help in any way, since I might be combining both of my theses.

Week of 10/21

Narrowing down the topic to data analysis and clustering. I found several articles that talk about clustering, with definitions, different samples, and techniques. The bibliography of these articles will be added soon. I skimmed through the first few years of ACM Computing Surveys, and after spending hours reading and looking through different ACM volumes, I found a very beneficial article in V.31-32 (1999-2000) that talks about clustering. The article is not short, so it is going to take me time to read. I found it late on Tuesday night, and I can't take the volume outside the library, so I have to come back some other time to finish reading and take what I need from the article.
I also ordered a couple of books from ILL, which should be arriving sometime soon (I hope).
I also wrote down the definition of clustering and got more information about the topic itself, plus a few examples of how clustering has been used and could be used, so that I can extract some ideas and apply them to webdb for my project.

Week of 10/28

I need to spend more time on the presentation. I'm going to write my thoughts and ideas down as if it's going to be a rough draft for the paper, which should give me a better lead on writing the paper for the following week.
I got back the ILL books that I ordered. Two of them helped me a lot with my problem: "Finding Groups in Data" and "Tools and techniques for data mining and clusters". They have descriptions of many algorithms for clustering and data mining, and between them they show examples of around six algorithms covering partitioning and hierarchical methods.

Week of 11/4

My major objective for this week is to finish the paper. After finishing the paper, I will spend more time building test data for my project, getting the data from Dusko, and thinking about the best way to approach my problem.
I started with the test data. I computed the distances for a small set of made-up data and did some graphing in Excel. Then I created another, larger data set and started on the distances for it. Next I'm going to continue with the algorithm and then start coding it.
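The distance step can be sketched in C++ as follows. This is only an illustration: the entry does not say which distance measure was used, so Euclidean distance is an assumption here, and the names `euclidean` and `dissimilarityMatrix` are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two records with the same number of attributes.
// (Assumption: the journal does not name the distance measure it used.)
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Pairwise distances between all rows. The result is symmetric and its
// diagonal is zero, so only the upper triangle is computed.
std::vector<std::vector<double>> dissimilarityMatrix(
        const std::vector<std::vector<double>>& rows) {
    const std::size_t n = rows.size();
    std::vector<std::vector<double>> m(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            m[i][j] = m[j][i] = euclidean(rows[i], rows[j]);
        }
    }
    return m;
}
```

For a handful of made-up points, such as (0,0), (3,4), and (6,8), the matrix is small enough to check by hand before moving on to a larger data set.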
I talked to Dusko, and he's going to send me 2 test files containing some kind of identification for the students, the courses they have taken, and their addresses. The problem that I'm facing right now is with the international students: some of them have addresses in the United States, so they won't appear with an international address.
Jim gave me an idea to ask Dusko about, and Charlie gave me another one that might help Dusko get something out of Banner to give me more accurate addresses. If not, I might talk to the registrar and Bonita, since there has to be a way to find the real addresses of the international students, given that the college has to inform the IRS of the international students who enroll.

Week of 11/11

I started writing code to transform the files that Dusko gave me so that I can create one big database for all the information that I have. The first code that I need to write separates the data, putting single quotes (') around each entry and commas (,) between entries, since this is the structure the SQL database expects.
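A minimal C++ sketch of that conversion is below. The layout of the original files is not described in this journal, so the sketch assumes whitespace-separated fields, one record per line. The function names are hypothetical, and doubling embedded single quotes is the standard SQL escape rather than something taken from the original code.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Doubles every single quote in a field, the standard SQL way to escape it.
std::string escapeQuotes(const std::string& field) {
    std::string out;
    for (char c : field) {
        out += c;
        if (c == '\'') out += '\'';
    }
    return out;
}

// Turns one whitespace-separated line, e.g. "123 CS 4",
// into a quoted, comma-separated list: "'123','CS','4'".
std::string toSqlValues(const std::string& line) {
    std::istringstream in(line);
    std::string field;
    std::string out;
    while (in >> field) {
        if (!out.empty()) out += ',';
        out += '\'' + escapeQuotes(field) + '\'';
    }
    return out;
}

// Converts a whole stream line by line. Pass std::ifstream and
// std::ofstream objects to convert one file into another.
void convertStream(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        out << toSqlValues(line) << '\n';
    }
}
```

Reading a line at a time with `std::getline` keeps only one record in memory, and taking streams as parameters means the same function can be tried on a small `std::istringstream` before running it on the real files.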
For my first piece of code, I need to read from one file and write to another, so I need a refresher from CS35(128), because it's been a while since I last used files.
I asked John Howell for a very simple example of reading from a file and writing to a file, so I can use that in my code.
I finished writing the first file, but I still need to test it on a small piece of data.
I'm facing a problem: my code works on small data, but not on large data. I'm using "char ch" as a variable to read the input and write it as output, but when I run the program on a large file, I get a "disk quota exceeded" error, even though it works perfectly fine on small data.
Jim suggested that I use strings instead of regular variables. So far, the same thing is happening.
I'm going to start writing the code that inserts the data into the database, so that I don't spend too much time on one thing and neglect everything else until I get it right.
After making good progress, I'm going to start writing the code for the agglomerative algorithm.

Week of 11/18

My code for transforming the data into something SQL can read is taking more time than expected. It works once and doesn't work the next time. I have made another copy of the original code, so I won't fall into a deep hole that would take a long time to get out of.
I have been sick since the beginning of the week, so I haven't made as much progress as I usually do in a week.

Week of 11/25

I started coding the agglomerative algorithm.
I finished solving the data transformation problem, so I can now create a database and the tables I need to apply the algorithm. From those tables I can build the dissimilarity matrix, and after that I'll start clustering the results and analyzing them.
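For reference, agglomerative clustering on a dissimilarity matrix can be sketched in C++ roughly as below. This is an illustrative sketch, not the project's code: it assumes single linkage (the distance between two clusters is the smallest distance between their members) and stops once `k` clusters remain. The journal specifies neither choice, and the function name `agglomerate` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Returns a cluster label for each item. Every item starts in its own
// cluster, and the two closest clusters are merged until k clusters remain.
// dist must be a square, symmetric dissimilarity matrix.
std::vector<int> agglomerate(const std::vector<std::vector<double>>& dist,
                             std::size_t k) {
    const std::size_t n = dist.size();
    assert(k >= 1 && k <= n);

    std::vector<std::vector<std::size_t>> clusters(n);
    for (std::size_t i = 0; i < n; ++i) clusters[i].push_back(i);

    while (clusters.size() > k) {
        double best = std::numeric_limits<double>::infinity();
        std::size_t bestA = 0, bestB = 1;
        for (std::size_t a = 0; a < clusters.size(); ++a) {
            for (std::size_t b = a + 1; b < clusters.size(); ++b) {
                // Single linkage (assumed): closest pair of members.
                double d = std::numeric_limits<double>::infinity();
                for (std::size_t i : clusters[a])
                    for (std::size_t j : clusters[b])
                        d = std::min(d, dist[i][j]);
                if (d < best) { best = d; bestA = a; bestB = b; }
            }
        }
        // Merge cluster bestB into bestA, then drop bestB.
        clusters[bestA].insert(clusters[bestA].end(),
                               clusters[bestB].begin(), clusters[bestB].end());
        clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bestB));
    }

    std::vector<int> label(n, 0);
    for (std::size_t c = 0; c < clusters.size(); ++c)
        for (std::size_t i : clusters[c]) label[i] = static_cast<int>(c);
    return label;
}
```

This brute-force version rescans every pair of clusters on each merge, which is fine for test data but slow for thousands of records. Complete or average linkage would only change the inner loop that computes `d`.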
I started running into minor problems as I went along with the coding. The biggest one came up when I created a table in the database from the data that I have. I want to separate each department on its own, showing each pidm and the number of courses they have taken in that department. First, I created a table that has all the pidms I have, showing each one with the courses they have taken and the number of courses. Then I separated every department on its own and showed the pidms that have taken courses in that specific department.
The next step was joining the tables together into a bigger table with all the pidms, where each pidm has a column for each department right next to it, so I can show the count for each department.
This turned out to be harder than it looked. It didn't work out as I thought it would, and things became more and more complicated as I tried to solve the problem.

Week of 12/2

After spending a fair amount of time trying to solve this problem, I switched to a different approach: coding in C++ and reading the data from the first table that I originally created, the one that has all the pidms, all the courses, and all the counts.
I finished the code that reads the data from the files, but I need to spend more time on reading the departments, because they are giving me a hard time.
I also wrote a few hundred lines of code to create a two-dimensional array that will hold all the data I need for calculating the dissimilarity matrix.
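The idea of that two-dimensional array can be sketched compactly as follows: one row per pidm, one column per department, each cell holding the number of courses. The struct names, field names, and `buildTable` function are hypothetical, and the sketch assumes the records have already been read from the files as (pidm, department, count) triples.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One input record: a student (pidm), a department, and a course count.
struct CourseCount {
    int pidm;
    std::string dept;
    int count;
};

// The wide table: counts[row][col] is the number of courses that
// pidms[row] has taken in depts[col].
struct CountTable {
    std::vector<int> pidms;
    std::vector<std::string> depts;
    std::vector<std::vector<int>> counts;
};

CountTable buildTable(const std::vector<CourseCount>& records) {
    std::map<int, std::size_t> rowOf;          // pidm -> row index
    std::map<std::string, std::size_t> colOf;  // department -> column index
    CountTable t;

    // First pass: assign a row to each new pidm and a column to each new
    // department, in order of first appearance.
    for (const CourseCount& r : records) {
        if (rowOf.emplace(r.pidm, t.pidms.size()).second) t.pidms.push_back(r.pidm);
        if (colOf.emplace(r.dept, t.depts.size()).second) t.depts.push_back(r.dept);
    }

    // Second pass: fill in the counts; missing combinations stay at zero.
    t.counts.assign(t.pidms.size(), std::vector<int>(t.depts.size(), 0));
    for (const CourseCount& r : records) {
        t.counts[rowOf[r.pidm]][colOf[r.dept]] += r.count;
    }
    return t;
}
```

Each row of `counts` is then one student's vector of per-department counts, which is the form a dissimilarity calculation needs. A student with no courses in a department simply gets a zero in that column.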