09/24/03 This is the first of hopefully many journal entries for my senior project. So far all of the work has been preliminary research. I have narrowed the focus of my research to the problems of computational DNA sequence analysis. A first glance at papers in this field made me realize that I needed more background knowledge before I could approach the actual problems of the field. For more basic DNA knowledge I went to my biology textbook, read less technical papers on DNA analysis, and read papers that included a lengthy introduction to DNA for computer scientists. Now that I have reached a level of knowledge where I'm comfortable reading "technical" papers, I'm continuing down that road, sorting through papers and reading the ones that present problems, or algorithms for solving problems, in the field of DNA sequence analysis.

10/20/03 Quite a bit later now, and I have all the background information on DNA that I need to proceed with my project. I've read lots of papers and articles on sequence analysis and have finally fleshed out a project. I plan to create a distributed system for DNA sequence analysis with loadable modules for different types of analysis. Along with the system I will create one module (hopefully more) for a specific type of analysis. For the module I implement, my hope is to adapt an algorithm based on "Identification of the binding sites of regulatory proteins in bacterial genomes" by Hao Li, Virgil Rhodius, Carol Gross, and Eric D. Siggia. I hope to extend this algorithm to eukaryotic DNA sequences. Li et al.'s algorithm relies on the fact that most bacterial transcription factors bind as dimers, and many eukaryotic transcription factors bind the same way.

11/03 I have been doing background research on Li's algorithm and have decomposed it into its three steps. Currently I'm working on implementing the data types and functions needed to complete the first step. I hope to have it coded by Wednesday or Thursday so that I can begin testing it and seeing what changes need to be made to my implementation of the algorithm. Once I finish that I can move on to step two.

11/17 I have almost finished implementing the first step. I'm using a hash table to store the dimers, implemented as an array of std::list. The lists are lists of dimers (a class I created). Right now I can tabulate all the dimers in my test file (of 34k bases), with only a few bugs that should be worked out by the end of today. Now that I have all the dimers tabulated, I will have to compute their significance scores (using Li's algorithm).
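
A minimal sketch of the kind of table described in the 11/17 entry: a hash table built as an array of std::list buckets, each holding dimer records. The journal does not show the actual class, so everything here is an assumption made for illustration: a dimer is taken to be two short words separated by a fixed gap, and the names Dimer, DimerTable, tabulate, and kBuckets, as well as the bucket count, are hypothetical. Significance scoring is not covered.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <list>
#include <string>

// Assumed bucket count; the journal does not give the real table size.
constexpr std::size_t kBuckets = 1024;

// Hypothetical dimer record: two short words separated by `gap` bases,
// plus the number of times the pair was seen.
struct Dimer {
    std::string left;
    std::string right;
    std::size_t gap;
    std::size_t count;
};

// Chained hash table: a fixed array of buckets, each bucket a std::list<Dimer>.
class DimerTable {
public:
    // Increment the count for (left, gap, right), inserting it if new.
    void add(const std::string& left, const std::string& right, std::size_t gap) {
        std::list<Dimer>& bucket = buckets_[index(left, right, gap)];
        for (Dimer& d : bucket) {
            if (d.left == left && d.right == right && d.gap == gap) {
                ++d.count;
                return;
            }
        }
        bucket.push_back(Dimer{left, right, gap, 1});
    }

    // Return how often the dimer was seen, or 0 if it was never added.
    std::size_t count(const std::string& left, const std::string& right,
                      std::size_t gap) const {
        const std::list<Dimer>& bucket = buckets_[index(left, right, gap)];
        for (const Dimer& d : bucket) {
            if (d.left == left && d.right == right && d.gap == gap) {
                return d.count;
            }
        }
        return 0;
    }

private:
    // Map a dimer to a bucket by hashing a combined string key.
    static std::size_t index(const std::string& left, const std::string& right,
                             std::size_t gap) {
        const std::string key = left + '|' + right + '|' + std::to_string(gap);
        return std::hash<std::string>{}(key) % kBuckets;
    }

    std::array<std::list<Dimer>, kBuckets> buckets_;
};

// Slide over the sequence and record every word/gap/word pair.
void tabulate(const std::string& seq, std::size_t word, std::size_t gap,
              DimerTable& table) {
    const std::size_t span = 2 * word + gap;
    for (std::size_t i = 0; i + span <= seq.size(); ++i) {
        table.add(seq.substr(i, word), seq.substr(i + word + gap, word), gap);
    }
}
```

As a usage example, calling tabulate on the sequence "ACGTTACGTTACG" with a word length of 3 and a gap of 2 records the pair ACG..ACG twice and every other pair once. Chaining with std::list keeps insertion simple; the trade-off is that lookups within a crowded bucket are linear, so the bucket count would need to grow with the number of distinct dimers.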