DISSERTATION DEFENSE
Author : Yong Zhao
Advisor : Dr. Jianjun Hu
Date : Feb 21, 2022
Time : 12:00 pm
Place : Virtual (Zoom link below)
https://us02web.zoom.us/j/9031735360
Abstract
The discovery of novel functional materials plays an increasingly important role in many key industries, such as lithium batteries for electric vehicles and cell phones. However, experimental tinkering with existing materials and Density Functional Theory (DFT) based screening of known crystal structures, two of the major current materials design approaches, are both severely constrained by the limited scale (around 250,000 entries in the ICSD database) and diversity of existing materials, and by the lack of a sufficient number of materials with annotated properties. Generating a large number of physically feasible, stable, and synthesizable crystal materials and building accurate property prediction models for screening are the two major unsolved challenges in modern materials science.
This dissertation addresses these two fundamental tasks in materials science using deep learning and machine learning models. Deep learning and machine learning have made tremendous progress in computer vision and natural language processing, as shown by self-driving cars and Google Translate, and they have the potential to greatly transform materials science research. Data-driven approaches are increasingly used in materials informatics because they screen new materials much faster than both conventional tinkering-based discovery methods and DFT-based calculations. In this dissertation, we design and develop novel deep learning-based algorithms that generate new crystal structures by learning, from known crystals, the hidden and intricate chemical rules that assemble atoms into stable crystal structures. We also explore and develop novel representation learning methods on materials compositions and structures for high-performance prediction of materials' structural characteristics and elastic properties.
In the first topic, we propose CubicGAN, a generative adversarial network (GAN) based deep neural network model for large-scale generative design of novel cubic materials. When trained on 375,749 ternary materials from the OQMD database, the model can not only rediscover most of the currently known cubic materials but also generate hypothetical materials with new structure prototypes. A total of 506 such materials have been verified by DFT-based phonon dispersion calculations. Our technique makes it possible to generate tens of thousands of new materials given sufficient computing resources.
In the second topic, we propose a Physics Guided Crystal Generative Model (PGCGM) for new materials generation, which significantly expands the structural scope of CubicGAN by adding the capability to generate crystals in 20 space groups. This is achieved by capturing and exploiting the pairwise atomic distance constraints among neighboring atoms and symmetric geometric constraints, together with a novel data augmentation strategy that uses the base atom sites of materials. With atom clustering and merging applied to the generated crystal structures, our method increases the generator's validity by 8 times compared to one of the baselines and by 143% compared to the previous CubicGAN, and it also shows better property distribution and diversity. We further validated the generated candidates with DFT calculations, which successfully optimized/relaxed 1869 of the 2000 generated materials; 39.6% of these have negative formation energy, indicating their stability.
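To illustrate what a pairwise atomic distance constraint can look like in practice, the following is a minimal sketch, not the PGCGM implementation. It assumes a cubic cell, fractional coordinates, and a single hypothetical threshold `min_dist`; the function names are illustrative only.

```python
import itertools
import math


def min_image_distance(frac_a, frac_b, lattice_a):
    """Distance (in the units of lattice_a) between two sites of a cubic cell.

    Uses the minimum-image convention, which is exact for cubic cells:
    each fractional difference is wrapped into [-0.5, 0.5] before scaling.
    """
    total = 0.0
    for xa, xb in zip(frac_a, frac_b):
        d = xa - xb
        d -= round(d)  # wrap to the nearest periodic image
        total += (d * lattice_a) ** 2
    return math.sqrt(total)


def violates_distance_constraint(frac_coords, lattice_a, min_dist):
    """Return True if any pair of sites is closer than min_dist.

    min_dist is a hypothetical threshold chosen for illustration; the
    dissertation's actual constraint may be element-dependent.
    """
    return any(
        min_image_distance(p, q, lattice_a) < min_dist
        for p, q in itertools.combinations(frac_coords, 2)
    )


# Two well-separated sites in a 4.0 angstrom cubic cell
well_separated = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
# Two sites that nearly overlap across the periodic boundary
overlapping = [(0.0, 0.0, 0.0), (0.95, 0.0, 0.0)]

print(violates_distance_constraint(well_separated, 4.0, 1.0))
print(violates_distance_constraint(overlapping, 4.0, 1.0))
```

A check of this kind could serve either as a post-generation validity filter or, in differentiable form, as a penalty term during training; which role it plays in PGCGM is not stated in this abstract.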
In the third topic, we propose and evaluate machine learning algorithms for determining the structure type of materials given only their compositions. We couple random forest (RF) and multilayer perceptron (MLP) neural network models with three types of features (Magpie, atom vector, and one-hot encoding, i.e. atom frequency) to predict the crystal system and space group of materials. Four types of models for predicting crystal systems and space groups are proposed, trained, and evaluated: one-versus-all binary classifiers, multiclass classifiers, polymorphism predictors, and multilabel classifiers. The synthetic minority over-sampling technique (SMOTE) is applied to mitigate the effects of imbalanced data sets. Our results demonstrate that RF with Magpie features generally outperforms the other algorithms for binary and multiclass prediction of crystal systems and space groups, while MLP with atom frequency features performs best for structural polymorphism prediction.
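As an illustration of the atom frequency feature, the sketch below turns a composition string into a fixed-length vector of element counts. It is an assumption about how such an encoding might look, not the dissertation's code: the element vocabulary `ELEMENTS` is a hypothetical six-element list, and only simple formulas without parentheses are handled.

```python
import re

# Hypothetical, deliberately tiny element vocabulary for illustration.
ELEMENTS = ["O", "Ti", "Sr", "Li", "Fe", "P"]


def atom_frequency_vector(formula, elements=ELEMENTS):
    """Encode a simple formula such as 'SrTiO3' as per-element atom counts.

    Each position of the returned list corresponds to one element of the
    vocabulary; elements absent from the formula get a count of 0.
    """
    counts = {}
    # An element symbol is one capital letter plus an optional lowercase
    # letter, followed by an optional integer count (default 1).
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    unknown = set(counts) - set(elements)
    if unknown:
        raise ValueError(f"elements not in vocabulary: {sorted(unknown)}")
    return [counts.get(element, 0) for element in elements]


print(atom_frequency_vector("SrTiO3"))
print(atom_frequency_vector("LiFePO4"))
```

A vector like this can be fed directly to an RF or MLP classifier. Whether counts are normalized to fractions before training is a design choice that this abstract does not specify.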
Finally, we propose using electronic charge density (ECD) as a generic, unified 3D descriptor for materials property prediction, with the advantage that it is closely related to the physical and chemical properties of materials. We develop an ECD-based 3D convolutional neural network (CNN) for predicting the elastic properties of materials, in which the CNN learns effective hierarchical features through multiple convolution and pooling operations. Our experiments show that our method achieves good performance for elasticity prediction on 2170 Fm-3m materials.
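The convolution and pooling operations mentioned above can be sketched on a small 3D grid as follows. This is a naive pure-Python illustration under simplifying assumptions (a cubic grid, a single cubic kernel, no padding, stride 1), not the network used in the dissertation; the grid stands in for a voxelized charge density and its values are made up.

```python
def conv3d_valid(grid, kernel):
    """Naive 'valid' 3D cross-correlation on cubic nested lists.

    This is the operation deep learning libraries call convolution:
    the kernel slides over the grid without padding, so an n^3 grid
    and a k^3 kernel give an (n - k + 1)^3 output.
    """
    n, k = len(grid), len(kernel)
    m = n - k + 1
    return [
        [
            [
                sum(
                    grid[x + i][y + j][z + l] * kernel[i][j][l]
                    for i in range(k)
                    for j in range(k)
                    for l in range(k)
                )
                for z in range(m)
            ]
            for y in range(m)
        ]
        for x in range(m)
    ]


def max_pool3d(grid, size=2):
    """Non-overlapping 3D max pooling; trailing voxels that do not fill
    a complete size^3 window are dropped."""
    m = len(grid) // size
    return [
        [
            [
                max(
                    grid[x * size + i][y * size + j][z * size + l]
                    for i in range(size)
                    for j in range(size)
                    for l in range(size)
                )
                for z in range(m)
            ]
            for y in range(m)
        ]
        for x in range(m)
    ]


# Made-up 5x5x5 "density" of ones and a 2x2x2 averaging-like kernel of ones
density = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]

features = conv3d_valid(density, kernel)  # 4x4x4 feature map
pooled = max_pool3d(features, size=2)     # 2x2x2 after pooling
print(len(features), len(pooled))
```

Stacking several such convolution and pooling stages is what lets a CNN build up features at progressively coarser spatial scales; in a real model the kernel weights are learned rather than fixed.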