Protein Sorting Motif Analysis and Protein Subcellular Localization
This ongoing project aims to develop algorithms for identifying protein sorting motifs and predicting protein subcellular localization.
metaP, a heuristic ensemble algorithm for protein subcellular localization prediction
LRensemble, a machine learning based ensemble algorithm for protein subcellular localization prediction
BayesMotif, a de novo algorithm for identifying anchor based sorting signal motifs
SortMotifDB, a database of protein sorting motifs, supporting motif search, retrieval, and motif model comparison
Computational prediction of protein binding residues
We have developed two web servers for computational prediction of protein binding residues: 1) HemeBind, which predicts heme-binding residues by integrating structural and sequence information; 2) HemeNet, which predicts heme-binding residues by exploiting their topological properties in the residue interaction networks derived from three-dimensional structures.
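As a rough illustration of what topological properties in a residue interaction network can look like, the following Python sketch builds a contact network from residue coordinates and computes each residue's degree and clustering coefficient. The function name, the one-point-per-residue representation, and the 7.0 default cutoff are assumptions made for this example; the features HemeNet actually uses are not specified here.

```python
import math
from itertools import combinations


def residue_network_features(coords, cutoff=7.0):
    """Return (degree, clustering) lists for a distance-cutoff contact network.

    coords: one (x, y, z) point per residue (for example, C-alpha atoms).
    cutoff: contact distance; the default is an illustrative assumption.
    """
    n = len(coords)
    neighbors = [set() for _ in range(n)]
    # Two residues are linked when their points lie within the cutoff.
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            neighbors[i].add(j)
            neighbors[j].add(i)

    degree = [len(nb) for nb in neighbors]
    clustering = []
    for i in range(n):
        k = degree[i]
        if k < 2:
            # Clustering is undefined for fewer than two neighbors; use 0.
            clustering.append(0.0)
            continue
        # Count links among the neighbors of residue i.
        links = sum(
            1 for a, b in combinations(neighbors[i], 2) if b in neighbors[a]
        )
        clustering.append(2.0 * links / (k * (k - 1)))
    return degree, clustering
```

Per-residue values such as these could then serve as input features to a classifier, alongside sequence-derived information.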
iMISS: Integrative Missing Value Estimation for Microarray Data
iMISS (integrative Missing Value Estimation) is an integrative algorithmic framework that improves microarray missing value estimation by incorporating information from multiple reference microarray datasets. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art Local Least Squares (LLS) imputation algorithm, by up to 15% in our benchmark tests.
J. Hu, Haifeng Li, Michael S. Waterman, Xianghong Jasmine Zhou. Integrative Missing Value Estimation for Microarray Data. BMC Bioinformatics, 2006
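The idea of choosing neighbor genes with the help of reference datasets can be sketched in Python as follows. This is a simplified illustration, not the published iMISS procedure: neighbors are ranked by the sum of absolute Pearson correlations across the target and reference datasets, and each neighbor contributes a one-variable least-squares fit through the origin, whereas LLS fits all neighbors jointly. The function names, the use of `None` for missing entries, and the assumption that only the gene being imputed has missing values are all choices made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if degenerate)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def impute_with_references(target, references, gene, k=1):
    """Fill the None entries of target[gene] using k neighbor genes.

    target: list of gene rows; only target[gene] may contain None.
    references: list of complete datasets with the same gene order.
    """
    row = target[gene]
    observed = [j for j, v in enumerate(row) if v is not None]
    missing = [j for j, v in enumerate(row) if v is None]
    y = [row[j] for j in observed]

    def score(other):
        # Similarity in the target (observed columns only) plus each reference.
        s = abs(pearson(y, [target[other][j] for j in observed]))
        for ref in references:
            s += abs(pearson(ref[gene], ref[other]))
        return s

    candidates = [g for g in range(len(target)) if g != gene]
    neighbors = sorted(candidates, key=score, reverse=True)[:k]

    filled = list(row)
    for j in missing:
        estimates = []
        for nb in neighbors:
            x = [target[nb][c] for c in observed]
            # Least-squares slope through the origin (assumes x is not all zero).
            slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
            estimates.append(slope * target[nb][j])
        filled[j] = sum(estimates) / len(estimates)
    return filled, neighbors
```

Summing similarity over several datasets favors neighbors that co-vary with the gene consistently, rather than by chance in a single experiment.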
EMD: Ensemble Algorithms for Motif Discovery
We proposed a novel clustering-based ensemble algorithm named EMD for de novo motif discovery, which combines predictions from multiple runs of one or more base component algorithms. This was the first application of the ensemble approach to the motif discovery problem. The algorithm was tested on a benchmark dataset generated from E. coli RegulonDB, where EMD achieved a 22.4% improvement in nucleotide-level prediction accuracy over the best stand-alone component algorithm. The advantage of EMD is more significant for shorter input sequences, but most importantly, it always outperforms or at least matches the stand-alone component algorithms, even for longer sequences.
J. Hu, Yifeng David Yang and Daisuke Kihara, "EMD: an Ensemble Algorithm for discovering regulatory motifs in DNA sequences", BMC Bioinformatics, 7:342, 2006.
J. Hu, Bin Li, and Daisuke Kihara, "Limitations and Potentials of Current Motif Discovery Algorithms", Nucleic Acids Research, 33: 4899-4913, 2005.
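To give a feel for how predictions from multiple runs can be combined, here is a minimal Python sketch based on position-level voting. It is a simplification and not the published EMD clustering procedure: each run is assumed to report one predicted site start per sequence, every predicted site votes for the positions it covers, and the window with the most votes becomes the consensus site. The function name and the input layout are hypothetical.

```python
from collections import Counter


def ensemble_sites(predictions, motif_width):
    """Combine per-run site predictions into one consensus start per sequence.

    predictions: list of runs, each a dict {sequence_id: predicted_start}.
    motif_width: assumed common width of all predicted sites.
    """
    consensus = {}
    seq_ids = {seq_id for run in predictions for seq_id in run}
    for seq_id in sorted(seq_ids):
        # Each predicted site votes for every position it covers.
        votes = Counter()
        for run in predictions:
            if seq_id in run:
                start = run[seq_id]
                for pos in range(start, start + motif_width):
                    votes[pos] += 1

        def support(start):
            total = sum(votes[p] for p in range(start, start + motif_width))
            # Ties are broken in favor of the leftmost window.
            return (total, -start)

        first = min(votes)
        last = max(votes) - motif_width + 1
        consensus[seq_id] = max(range(first, last + 1), key=support)
    return consensus
```

Sites that many runs agree on accumulate votes, while isolated predictions from a single run contribute little to any window.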
Evograph: Evolving Graphs using Genetic Programming
Evograph uses Genetic Programming to evolve arbitrary types of graphs. Users only need to define the fitness/evaluation function of the graphs and the evolutionary search will try to find an optimum graph. It has been applied to the wireless access point configuration problem. Source code is available for download.
J. Hu, E. Goodman, “Wireless Access Point Configuration by Genetic Programming”, Proc. IEEE Congress on Evolutionary Computation CEC2004
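The workflow of supplying only a fitness function and letting an evolutionary search find a graph can be illustrated with the Python sketch below. It is not Evograph's interface and it does not use Genetic Programming: it is a much simpler mutation-only hill climber over edge sets, and both function names and the example fitness are hypothetical.

```python
import random


def evolve_graph(n_nodes, fitness, generations=200, seed=0):
    """Search for a high-fitness undirected graph by toggling single edges.

    fitness: user-defined function mapping a frozenset of (i, j) edges
             to a number; larger is better.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    best = frozenset()
    best_score = fitness(best)
    for _ in range(generations):
        # Mutation: add the chosen edge if absent, remove it if present.
        candidate = best ^ {rng.choice(pairs)}
        score = fitness(candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score


def connectivity_fitness(edges, n_nodes=5):
    """Example fitness: reward nodes reachable from node 0, penalize edges."""
    adjacent = {i: set() for i in range(n_nodes)}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for nxt in adjacent[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return 10 * len(seen) - len(edges)
```

Calling `evolve_graph(5, connectivity_fitness)` searches for a graph that connects all five nodes with as few edges as possible; swapping in a different fitness function changes the design goal without touching the search loop.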
GPBG: Evolving Bond Graphs Using Genetic Programming
Machines or dynamic systems such as electronic circuits, mechanical vibration absorbers, or MEMS devices can be represented by bond graph models. Designing such a machine is equivalent to designing a bond graph model. GPBG is a complete framework for automated evolutionary synthesis of bond graph models using Genetic Programming. It has been used to evolve analog filter circuits, vibration absorbers, MEMS filters, and robust circuits, and in a printer redesign.
K. Seo, J. Hu, Z. Fan, E. D. Goodman, and R. C. Rosenberg, "Automated Design Approaches for Multi-Domain Dynamic Systems Using Bond Graphs and Genetic Programming," The International Journal of Computers, Systems and Signals, vol.3, no.1, pp.55-70, 2002.
Bond Graph C++ Library
We have open-sourced our bond graph C++ package for dynamic system simulation. It can generate the A/B/C/D matrices of a state-space model.
Download the C++ library here
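For readers unfamiliar with the A/B/C/D form, the Python sketch below writes out the state-space model dx/dt = A x + B u, y = C x + D u for a mass-spring-damper system and simulates it with forward Euler. It does not use the C++ library or reproduce its API; the function names, the single-input single-output layout, and the choice of momentum and displacement as state variables (as is common in bond graph modeling) are assumptions for this example.

```python
def mass_spring_damper_matrices(m, b, k):
    """State x = [momentum p, displacement q], input u = force, output y = q.

    dp/dt = u - (b/m) p - k q
    dq/dt = p / m
    """
    A = [[-b / m, -k], [1.0 / m, 0.0]]
    B = [1.0, 0.0]
    C = [0.0, 1.0]
    D = 0.0
    return A, B, C, D


def simulate_state_space(A, B, C, D, u, x0, dt, steps):
    """Integrate dx/dt = A x + B u with forward Euler for a constant input u."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    y = sum(C[j] * x[j] for j in range(n)) + D * u
    return x, y
```

With a constant force the displacement settles at u / k, which gives a quick sanity check on the matrices.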
HFC: Hierarchical Fair Competition EC Framework
Many current Evolutionary Algorithms (EAs) tend to converge prematurely or stagnate without progress on complex problems. The Hierarchical Fair Competition (HFC) model is a generic framework for sustainable evolutionary search that transforms the convergent nature of the current EA framework into a non-convergent search process. HFC yields significant gains in robustness, scalability, and efficiency with little additional computing effort, and it tolerates small population sizes. These results demonstrate its effectiveness on such problems and suggest its potential for improving other existing EAs on difficult problems. HFC proposes a paradigm shift from most EAs: rather than trying to escape from local optima or delay convergence at a local optimum, it allows new optima to emerge continually in a bottom-up manner, maintaining low local selection pressure at all fitness levels while fostering exploitation of high-fitness individuals through promotion to higher levels.
J. Hu, E. Goodman, K. Seo, Z. Fan, R. Rosenberg, "The Hierarchical Fair Competition (HFC) Framework for Sustainable Evolutionary Algorithms", Evolutionary Computation, 13 (2), MIT Press, 2005.
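The level structure described above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the published HFC algorithm: levels are defined by fixed fitness admission thresholds, breeding is mutation-only, survivors within a level are chosen by simple truncation, and half of the bottom level is replaced by random individuals every generation. The function name and all parameters are hypothetical.

```python
import random


def hfc_search(fitness, new_individual, mutate, thresholds,
               level_size=10, generations=50, seed=0):
    """Evolve a stack of levels; thresholds[i] is the fitness needed to enter level i."""
    rng = random.Random(seed)
    levels = [[] for _ in thresholds]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Continual inflow: keep the better half of the bottom level
        # (sorted at the end of the previous generation) and refill at random.
        levels[0] = levels[0][: level_size // 2]
        while len(levels[0]) < level_size:
            levels[0].append(new_individual(rng))

        for i in range(len(levels)):
            pop = levels[i]
            if not pop:
                continue
            # Individuals only compete with others in the same level.
            children = [mutate(rng.choice(pop), rng) for _ in range(level_size)]
            stay = []
            for ind in pop + children:
                f = fitness(ind)
                if f > best_fit:
                    best, best_fit = ind, f
                if i + 1 < len(levels) and f >= thresholds[i + 1]:
                    levels[i + 1].append(ind)  # promote to the next level
                else:
                    stay.append(ind)
            stay.sort(key=fitness, reverse=True)
            levels[i] = stay[:level_size]
    return best, best_fit, levels
```

Because new random individuals enter only at the bottom and compete only against peers of similar fitness, they are not immediately eliminated by the high-fitness individuals in upper levels.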