COLLOQUIUM
Department of Computer Science and Engineering
University of South Carolina
 
Large-scale Phylogeny Reconstruction from 
Arbitrary Gene-Order Data.

Jijun Tang

Department of Computer Science
University of New Mexico

Date: February 23, 2004 (Monday)
Time: 3:30-4:30PM
Place: Swearingen 1A03 (Faculty Lounge)

Abstract

Phylogenetic reconstruction from gene-order data has attracted
increasing attention from both biologists and computer scientists over
the last few years. Our software suite GRAPPA currently provides the
most accurate approach to phylogenetic reconstruction. However, the
approach used in GRAPPA is fundamentally slow, because it requires
evaluating every candidate phylogenetic tree. In 2001, we had to use a
512-processor cluster to analyze the 13-genome Campanulaceae dataset;
our current version is one billion times faster than the original and
can solve the same dataset on a laptop in less than an hour, but
remains limited to at most 16 genomes. In this talk, I will introduce
various techniques used to speed up GRAPPA, including a tightened
lower bound, a layered search, and a branch-and-bound method. Then I
will present our successful efforts to scale up GRAPPA to hundreds of
genomes by integrating GRAPPA with DCM, the disk-covering method
pioneered by Tandy Warnow. DCM-GRAPPA can handle datasets with more
than 1000 genomes and still retain high accuracy. Finally, GRAPPA,
like all existing reconstruction methods, requires that all genomes
have identical gene content, with each gene appearing exactly once in
each genome, which is highly unrealistic. I will present a collection
of techniques we have developed to handle unequal gene contents, along
with early experimental results showing that the ability to handle
unequal contents makes a very significant difference in the accuracy
of reconstructions.

Jijun Tang is a doctoral candidate in the Department of Computer
Science, University of New Mexico. He received his M.S. in Offshore
Engineering from Tianjin University in 1996 and the M.S. degree in
Computer Science from the University of New Mexico in 2002. His
primary research interests are modeling, algorithmic design, and
high-performance implementations for complex computational
applications. His current research is on phylogeny reconstruction from
gene-order data.