Motivation

Today's scientific research shows a distinct trend: the challenge of converting big data and raw computing power into knowledge discovery and innovation. The focus of my research is to develop novel machine learning, data mining, and evolutionary algorithms for solving problems in bioinformatics and computational design synthesis. The systematic accumulation of large datasets in science and engineering has made it imperative to develop algorithms that analyze these data and extract useful information from them, so as to accelerate scientific discovery. At the same time, the availability of enormous computational resources calls for algorithms that transform raw computation cycles into knowledge and innovation. Specifically, my research focuses on two areas: 1) functional genomics, which builds predictive models from heterogeneous data sources such as DNA sequences, protein structures, and microarray datasets; and 2) evolutionary algorithms and their application to the automated design synthesis of engineering systems and materials.


Three open problems in biology, engineering, and materials science motivate this work:
  1. Functional genomics: given heterogeneous data from genomics, transcriptomics, proteomics, protein structures, and metabolomics, how can we decode all the functions of genomes and develop methods for predicting protein functions, identifying disease genes, and diagnosing disease?
  2. Materials discovery: how can we find the right combination of elements and compositions to achieve desired material functions and properties?
  3. Invention and engineering design: given a set of components, how can we assemble them into a system with the desired functions?

Research Projects

Protein Sorting Signal Bioinformatics and Protein Subcellular Localization Prediction

A typical cell contains about a billion proteins. How do all these proteins reach their correct target locations after synthesis? This process is only beginning to be decoded. This project aims to develop computational algorithms that identify sorting signals de novo and predict protein subcellular localization.
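The project description does not specify a method, so the following is only an illustration of what a sorting signal can look like computationally, not the project's algorithm. Secretory signal peptides typically carry a hydrophobic core near the N-terminus, and one classic heuristic exposes it with a sliding-window scan over the Kyte-Doolittle hydropathy scale. The 30-residue region, the 7-residue window, and the 1.5 threshold are illustrative assumptions, not calibrated values, and `looks_like_signal_peptide` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(seq: str, region: int = 30, window: int = 7) -> float:
    """Highest mean hydropathy over any `window`-residue stretch within the
    first `region` residues of `seq`. Non-standard letters raise KeyError."""
    head = seq[:region].upper()
    if len(head) < window:
        raise ValueError("sequence is shorter than the scanning window")
    scores = [
        sum(KD[aa] for aa in head[i:i + window]) / window
        for i in range(len(head) - window + 1)
    ]
    return max(scores)


def looks_like_signal_peptide(seq: str, threshold: float = 1.5) -> bool:
    # Hypothetical, uncalibrated threshold chosen for illustration only.
    return max_window_hydropathy(seq) >= threshold
```

A single hydrophobicity cutoff is far too crude for real localization prediction (transmembrane helices also score high); it only shows the kind of sequence feature that de novo signal discovery and learned predictors aim to capture.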

Structural bioinformatics

This project aims to explore novel algorithms for the effective prediction of protein binding residues.

Computational knowledge discovery and modeling in bioinformatics

Due to the complexity of biological systems, many bioinformatics applications and algorithms depend on heuristic knowledge derived empirically by human experts. One example is the scoring functions widely used in sequence alignment and protein docking. This project aims to explore a systematic approach for computationally extracting objective heuristic knowledge from known facts. We will also explore unbiased, open-ended evolutionary modeling with genetic programming to interpret complex biological processes.
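Alignment substitution matrices such as BLOSUM are a familiar case of scoring knowledge extracted from known facts. Each score is a log-odds ratio, s(a, b) = log2(p_ab / e_ab), comparing how often residues a and b are observed aligned in trusted alignments (p_ab) with how often they would pair by chance given their background frequencies (e_ab = q_a^2 if a = b, else 2 q_a q_b). The sketch below is this textbook construction, not the approach the project proposes; the function name, the pseudocount of 1, and the toy input are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def log_odds_scores(aligned_pairs, pseudocount=1.0):
    """Derive symmetric log-odds scores s(a, b) = log2(p_ab / e_ab) from
    observed aligned residue pairs; (a, b) and (b, a) count as one pair."""
    pair_counts = Counter(tuple(sorted(p)) for p in aligned_pairs)
    alphabet = sorted({r for pair in aligned_pairs for r in pair})
    # Every unordered pair gets a pseudocount so unseen pairs stay finite.
    all_pairs = list(combinations_with_replacement(alphabet, 2))
    for pair in all_pairs:
        pair_counts[pair] += pseudocount
    total = sum(pair_counts[pair] for pair in all_pairs)
    p = {pair: pair_counts[pair] / total for pair in all_pairs}

    # Background frequency q_a: full weight from (a, a), half from (a, b).
    q = {a: 0.0 for a in alphabet}
    for (a, b), freq in p.items():
        if a == b:
            q[a] += freq
        else:
            q[a] += freq / 2
            q[b] += freq / 2

    scores = {}
    for (a, b), freq in p.items():
        expected = q[a] * q[a] if a == b else 2 * q[a] * q[b]
        scores[(a, b)] = scores[(b, a)] = math.log2(freq / expected)
    return scores
```

For example, `log_odds_scores([("A", "A")] * 8 + [("B", "B")] * 8 + [("A", "B")] * 2)` gives a positive score for the frequently conserved pair (A, A) and a negative score for the rare substitution (A, B). Positive scores mark pairs seen more often than chance, which is exactly the "objective heuristic" the scoring function encodes.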

Machine learning, Evolutionary Computation and Data Mining

Genetic algorithms, genetic programming, multi-objective optimization
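A genetic algorithm is the simplest member of the evolutionary family listed above. The following minimal sketch assumes a bit-string encoding and uses the toy OneMax fitness (the number of 1 bits) as a stand-in for a real design objective. It shows the core cycle of tournament selection, one-point crossover, bit-flip mutation, and elitism. The population size, generation count, and mutation rate are illustrative defaults, not tuned values.

```python
import random


def one_max(bits):
    """Toy fitness: number of 1 bits (the classic OneMax benchmark)."""
    return sum(bits)


def tournament(pop, fitness, rng, k=3):
    # Return the fittest of k individuals drawn at random from the population.
    return max(rng.sample(pop, k), key=fitness)


def genetic_algorithm(n_bits=20, pop_size=30, generations=60,
                      p_mut=None, seed=0, fitness=one_max):
    rng = random.Random(seed)
    p_mut = p_mut if p_mut is not None else 1.0 / n_bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        next_pop = [best[:]]  # elitism: carry the current best forward unchanged
        while len(next_pop) < pop_size:
            a = tournament(pop, fitness, rng)
            b = tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover position
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips independently with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)
```

Because of elitism, the best fitness never decreases from one generation to the next. To specialize the loop to a real problem, you would typically replace `one_max` with a domain fitness function, such as a simulator that scores a candidate circuit. Genetic programming keeps the same select, vary, and replace cycle but evolves tree-structured programs instead of bit strings.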

Human Competitive Computational Discovery and Invention

According to IEEE Intelligent Systems magazine and Scientific American, one of the major advances in artificial intelligence in the past decade is the automated invention (synthesis) of human-competitive patented controllers and circuits using genetic programming. Building on my dissertation work on sustainable evolutionary computation models and genetic-programming-based computational synthesis of mechatronic systems, I will further explore the critical issue of scalability in evolutionary automated design. I will investigate new representations and search algorithms for scalable evolutionary synthesis and apply them to important bioinformatics and engineering problems such as signal processing circuit design and mechanism design. The ultimate goal is to develop a systematic approach for evolving innovative, patentable designs and novel open-ended solutions to hard problems.

Strategies

We use computational, statistical, and mathematical approaches, together with data mining algorithms, to interpret large-scale biological data from high-throughput experiments. We explore open-ended computational intelligence techniques that use high-performance computing to achieve human-competitive computational discovery and invention.