COLLOQUIUM Department of Computer Science and Engineering University of South Carolina Large-scale Phylogeny Reconstruction from Arbitrary Gene-Order Data. Jijun Tang Department of Computer Science University of New Mexico Date: February 23, 2004 (Monday) Time: 3:30-4:30PM Place: Swearingen 1A03 (Faculty Lounge) Abstract Phylogenetic reconstruction from gene-order data has attracted increasing attention from both biologists and computer scientists over the last few years. Our software suite GRAPPA currently provides the most accurate approach to phylogenetic reconstruction. However, the approach used in GRAPPA is fundamentally slow, because it requires evaluating every candidate phylogenetic tree. In 2001, we had to use a 512-processor cluster to analyze the 13-genome Campanulaceae dataset; our current version is one billion times faster than the original and can solve the same dataset on a laptop in less than an hour, but remains limited to at most 16 genomes. In this talk, I will introduce various techniques used to speed up GRAPPA, including a tightened lower bound, a layered search, and a branch-and-bound method. Then I will present our successful efforts to scale up GRAPPA to hundreds of genomes by integrating GRAPPA with DCM, the disk-covering method pioneered by Tandy Warnow. DCM-GRAPPA can handle datasets with more than 1000 genomes and still retain high accuracy. Finally, GRAPPA, like all existing reconstruction methods, requires that all genomes have identical gene content, with each gene appearing exactly once in each genome, which is highly unrealistic. I will present a collection of techniques we have developed to handle unequal gene contents, along with early experimental results showing that the ability to handle unequal contents makes a very significant difference in the accuracy of reconstructions. Jijun Tang is a doctoral candidate in the Department of Computer Science, University of New Mexico. He received his M.S. in Offshore Engineering from Tianjin University in 1996 and the M.S. degree in Computer Science from the University of New Mexico in 2002. His primary research interests are modeling, algorithmic design, and high-performance implementations for complex computational applications. His current research is on phylogeny reconstruction from gene-order data.