Bioinformatics Algorithms and Data Structures

 

The purpose of this course will be to familiarize students with algorithms and appropriate data structures for computational molecular biology. Potential topics include algorithms for computing phylogenies. Candidate phylogenetic algorithms include methods for calculating a parsimony criterion (Swofford et al. 1996), Hadamard Conjugation (Hendy and Penny 1993), and Felsensteins' likelihood method (1981) as well as distance based algorithms such as the least squares methods of Bryant and Waddell (1998).

 

Another area of interest is the area of algorithms for analyzing microarray data. Potential algorithms for this topic include k-means clustering, unsupervised hierarchical clustering (Eisen et al. 1998), multidimensional scaling (Khan et al. 1998), graphical modeling including the use of APCR methods (Waddell and Kishino 2000), correspondence analysis (Kishino and Waddell 2000, Fellenberg et al. 2001) fuzzy self organizing feature maps (Huntsberger and Ajjimarangsee 1990) as well as other neural (Gruvberger et al. 2001) and Bayesian networks (Friedman et al. 2000).

 

The target audience for this course consists of students that either have an understanding of molecular biology and are interested in the algorithms and techniques used in bioinformatics data analysis or students with a background in computational science that are interested in investigating another area of computational science.

 

Each student will be expected to make a class presentation in which they describe a particular algorithm. Students will also use the algorithms covered in class to perform data analysis.

 

Swofford, D.L., G.J. Olsen, P.J. Waddell, and D.M. Hillis (1996). Phylogenetic Inference. In: "Molecular Systematics, second edition" (ed. D. M. Hillis and C. Moritz), pp 450-572. Sinauer Assoc, Sunderland, Mass.

Hendy, M.D., and D. Penny, Spectral analysis of phylogenetic data, Journal of Classification, v.10, 1993, 5-24.

Felsentein, J., Evolutionary trees from DNA sequences: a maximum likelihood approach, J. Mol. Evol., v17, 1981, 368-76.

Bryant, D., and P.J. Waddell. (1998). Rapid evaluation of least squares and minimum evolution criteria on phylogenetic trees. Molecular Biology and Evolution 15: 1346-1359. (see also Dept. of Mathematics and Statistics Research Report Series, No. 167, University of Canterbury, for an alternative development of the main results).

Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14863-8.

Khan, J., Simon, R., Bittner, M., Chen, Y., Leighton, S.B., Pohida, T., Smith, P.D., Jiang, Y., Gooden, G.C., Trent, J.M., and P.S. Meltzer. Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res., 58: 5009-5013, 1998.

Waddell , P.J., and H. Kishino. (2000). Cluster Inference Methods and Graphical Models evaluated on NCI60 Microarray Gene Expression Data. Genome Informatics Series 11: 129-141.

Kishino, H., and P.J. Waddell. (2000). Correspondence Analysis of Genes and Tissue Types and Finding Genetic Links from Microarray Data. Genome Informatics Series 11: 83-96.

Fellenberg K, Hauser NC, Brors B, Neutzner A, Hoheisel JD, Vingron M. Correspondence analysis applied to microarray data. Proc Natl Acad Sci U S A. 2001 Sep 11;98(19):10781-6.

Huntsberger, T.L., and P. Ajjimarangsee. Parallel self-organizing feature maps for unsupervised pattern recognition. International Journal of General Systems 16 (1990), pp 357-372.

Gruvberger, S., Ringner, M., Chen, Y., Panavally, S., Saal, L.H., Borg, A., and M. Ferno, "Estrogen Receptor Status in Breast Cancer Is Associated with Remarkably Distinct Gene Expression Patterns", Cancer Research 61, pp 5979-5984, 2001.

Friedman N, Linial M, Nachman I, Pe'er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7(3-4):601-20.