**Data-driven techniques are rapidly transforming materials science, particularly computational materials design. Here we catch up with Chuanxun Su to hear about his recent work in Journal of Physics: Condensed Matter, which introduces a new database of crystal structures. Read on to find out more from Chuanxun himself.**

Modern experiments provide us with large amounts of information about materials. Access to crystal structure data helps in solving scientific and industrial problems involving materials, and many generally accessible databases offer unprecedented opportunities for data-driven techniques that can accelerate materials discovery and design. However, these databases store large amounts of duplicate information, which hampers materials analysis and discovery. To solve this problem, in our recent paper we develop a robust and efficient structure descriptor that assesses the similarity of atomic structures so that redundant structure entries can be eliminated.

Based on our structure descriptor, we propose a simple and unambiguous definition of a crystal structure prototype. The definition considers composition, symmetry, and configuration. We suggest an explicit threshold, within the framework of our structure descriptor, for deciding whether two structures are similar or dissimilar. This allows our program to automatically filter every crystal structure in the database and construct the Crystal Structure Prototype Database (CSPD).
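The filtering step can be pictured with a minimal sketch. It is not the CSPD program: the descriptor vectors, the Euclidean distance, the `threshold` value, and the names `distance` and `filter_prototypes` are all hypothetical stand-ins, chosen only to show how a similarity threshold turns a list of entries into a list of prototypes. Two entries are treated as duplicates when they share a composition type and a space group and their descriptors lie closer than the threshold.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors.

    Assumption: a real structure descriptor would define its own metric.
    """
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_prototypes(entries, threshold):
    """Greedy filter: keep an entry only if no already-kept prototype has
    the same composition type, the same space group, and a descriptor
    closer than `threshold`."""
    prototypes = []
    for entry in entries:
        is_duplicate = any(
            entry["composition"] == p["composition"]
            and entry["space_group"] == p["space_group"]
            and distance(entry["descriptor"], p["descriptor"]) < threshold
            for p in prototypes
        )
        if not is_duplicate:
            prototypes.append(entry)
    return prototypes


# Made-up entries; the descriptor values carry no physical meaning.
entries = [
    {"id": "a", "composition": "AB", "space_group": 225, "descriptor": [1.0, 2.0, 3.0]},
    {"id": "b", "composition": "AB", "space_group": 225, "descriptor": [1.0, 2.0, 3.05]},
    {"id": "c", "composition": "AB", "space_group": 221, "descriptor": [1.0, 2.0, 3.0]},
    {"id": "d", "composition": "AB", "space_group": 225, "descriptor": [2.0, 2.0, 3.0]},
]

kept = filter_prototypes(entries, threshold=0.1)
kept_ids = [entry["id"] for entry in kept]
print(kept_ids)
```

In this toy input, entry `b` is dropped as a near-copy of `a`, while `c` (different symmetry) and `d` (different configuration) survive as separate prototypes.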

After the CSPD was constructed, we developed a series of statistics describing the distribution of crystal structure prototypes. By analyzing these statistics, we suggest that materials discovery (i.e. experimental synthesis, theoretical structure predictions, and high-throughput calculations) may focus primarily on certain favorable composition types (i.e. AB₂, AB, AB₃, A₂B₃, ABC₂, ABC, ABC₃, ABC₂D₆, ABCD₄, and ABC₂D₄) or numbers of formula units (two, four, and eight).
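For readers unfamiliar with composition types, the sketch below shows one way to reduce a chemical formula to a label of this kind, written in plain text (for example `AB2`). The function name `composition_type` and the labelling convention (divide the atom counts by their greatest common divisor, sort them in ascending order, then assign letters A, B, C, ...) are assumptions inferred from the list above, not taken from the paper. The parser handles only simple formulas without parentheses.

```python
import re
from functools import reduce
from math import gcd
from string import ascii_uppercase


def composition_type(formula):
    """Map a simple formula such as 'Al2O3' to a type label such as 'A2B3'.

    Assumed convention: counts are reduced by their greatest common
    divisor and sorted in ascending order before letters are assigned.
    """
    counts = {}
    # Each match is (element symbol, optional integer count).
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    divisor = reduce(gcd, counts.values())
    reduced = sorted(n // divisor for n in counts.values())
    return "".join(
        letter + (str(n) if n > 1 else "")
        for letter, n in zip(ascii_uppercase, reduced)
    )


for formula in ["NaCl", "SiO2", "Si2O4", "Al2O3", "CaTiO3"]:
    print(formula, composition_type(formula))
```

Under this convention `SiO2` and `Si2O4` receive the same label, which is the behavior one would want when grouping entries that differ only in the number of formula units.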

We demonstrate two typical applications of the CSPD in our paper: generating initial structures for structure prediction and determining the prototype of a given structure. By substituting elements into the structure prototypes in the CSPD, we can generate initial structures for structure prediction. Test results show that our method outperforms the random sampling method. The CSPD can also generate high-quality structures for empirical potential fitting and high-throughput calculations. With the help of our structure descriptor, we can determine whether a newly proposed structure is similar to any known structure in the database.
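Element substitution can be illustrated with a short sketch. The dictionary layout, the function name `substitute_elements`, and the example prototype (a cubic cell with one atom at the corner and one at the body center, with an illustrative lattice constant) are hypothetical and are not the CSPD data format. The sketch only swaps species labels; a real workflow would presumably also rescale the cell before relaxing the candidate.

```python
def substitute_elements(prototype, mapping):
    """Return a new structure in which every species of `prototype` is
    replaced according to `mapping`; lattice and positions are unchanged."""
    missing = {site["element"] for site in prototype["sites"]} - set(mapping)
    if missing:
        raise ValueError(f"no substitute given for: {sorted(missing)}")
    return {
        "lattice": [row[:] for row in prototype["lattice"]],
        "sites": [
            {"element": mapping[site["element"]], "position": site["position"][:]}
            for site in prototype["sites"]
        ],
    }


# Hypothetical prototype: two-atom cubic cell, fractional coordinates.
prototype = {
    "lattice": [[4.1, 0.0, 0.0], [0.0, 4.1, 0.0], [0.0, 0.0, 4.1]],
    "sites": [
        {"element": "Cs", "position": [0.0, 0.0, 0.0]},
        {"element": "Cl", "position": [0.5, 0.5, 0.5]},
    ],
}

candidate = substitute_elements(prototype, {"Cs": "Rb", "Cl": "Br"})
print([site["element"] for site in candidate["sites"]])
```

Because the function copies the lattice and positions, the prototype itself is left untouched and can be reused with other element mappings.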

**About the Authors**

**Chuanxun Su** is a PhD candidate in the State Key Laboratory of Superhard Materials at Jilin University, majoring in condensed matter physics. His main research interests are the development of ATLAS (a total energy calculation software package based on orbital-free density functional theory) and CALYPSO (a widely used software package for atomic structure prediction), the exploration of crystal structures and physical phenomena under high pressure on the basis of density functional theory, and the construction of the CSPD.

**Jian Lv** received his PhD from Jilin University in 2013. From 2013 to 2015 he was a postdoc at the Beijing Computational Science Research Center, and he then joined the College of Materials Science and Engineering of Jilin University. His research focuses on the development of global optimization and machine learning methods for structure prediction.

**Quan Li** received his PhD from Jilin University in 2011. From 2011 to 2013 he was a postdoc at UNLV. Since 2015 he has been a full professor at Jilin University. His current research focuses on the structural design and mechanical properties of superhard materials using the CALYPSO method and *ab initio* calculations.

**Hui Wang** is an associate professor at Jilin University. His research focuses on the new physics of crystals and minerals at high pressures and temperatures.

**Lijun Zhang** is a professor at Jilin University, where he started his independent research career in 2014 after several years as a postdoctoral researcher and research professor at Oak Ridge National Laboratory, National Renewable Energy Laboratory, and University of Colorado Boulder. His research focuses on the development of hierarchical methods for computational materials design and the rational design of functional semiconductor materials.

**Yanchao Wang** received his PhD from Jilin University in 2013. Since 2013 he has worked in the State Key Laboratory of Superhard Materials of Jilin University, where he became an associate professor in 2015. His research focuses on the development of global optimization methods for structure prediction and the real-space finite-difference implementation of orbital-free density functional theory.

**Yanming Ma** is a full professor at Jilin University, where he leads a research group working on the development of simulation methods for structure prediction and the exploration of new physics of condensed matter under high pressure. His team has developed the efficient CALYPSO method, and the code of the same name, for structure prediction based on a swarm intelligence algorithm.

This work is licensed under a Creative Commons Attribution 3.0 Unported License

Categories: Journal of Physics: Condensed Matter, JPhys+