Thursday, June 23, 2016 - 03:00 pm
Swearingen 3A75
DISSERTATION DEFENSE
Department of Computer Science and Engineering
University of South Carolina

Author: Lingxi Zhou
Advisor: Jijun Tang
Date: Thursday, June 23rd
Time: 3:00pm
Place: Swearingen 3A75

Abstract

Events such as rearrangements, duplications, and losses can change both the gene order and the gene content of a genome. These genetic changes account for all of genome evolution. The recent accumulation of genomic sequences gives researchers the chance to address long-standing problems about the phylogenies, or evolutionary histories, of sets of species, and about ancestral genomic content and order. Over the past few years a large number of algorithms have been proposed to resolve these problems, each following a different standard. The work presented in this dissertation focuses on algorithms and models for whole-genome evolution and their applications in phylogeny and ancestral inference from gene order.

We developed a pipeline that combines maximum likelihood, weighted maximum matching, and variable length binary encoding for the estimation of ancestral gene content. The pipeline reconstructs ancestral genomes under various evolutionary models, including genome rearrangements, additions, losses, and duplications, with high accuracy and low running time.

Phylogenetic analyses of whole-genome data have been limited to small collections of genomes and low-resolution data, or to data without massive duplications. We designed a probabilistic approach to phylogenetic analysis (VLWD), based on variable length binary encoding, that reconstructs phylogenies from whole-genome data with improved accuracy and can handle genomes such as triploids and tetraploids.

Maximum likelihood based approaches have been applied to ancestral reconstruction but remain primitive for whole-genome data.
We developed a hierarchical framework for ancestral reconstruction that uses variable length binary encoding for content estimation, then fixes existing adjacencies and predicts missing ones for adjacency collection, and finally applies weighted maximum matching for gene order assembly. This framework substantially improves the performance of ancestral gene order reconstruction. We designed a series of experiments to validate these methods and compared the results with those of the most recent comparable methods. The results show that the proposed methods are fast and accurate.

Thanks,
Sri.