Crystal databases

Databases in Today’s Materials Science

Databases play a critical role in today’s materials science and engineering. They serve as the backbone of research, enabling scientists and engineers to store, access, and analyze large amounts of data from experiments, simulations, and the literature. For instance, databases such as ICSD or the Materials Project provide diverse information on a vast number of materials, including their structures, properties, and reaction mechanisms. Access to such comprehensive data can considerably speed up materials discovery and development: researchers can quickly find and analyze relevant information, identify promising candidates for further study, and make informed decisions about how to develop new, optimised materials.

In addition, databases can facilitate interdisciplinary collaboration between experimental researchers, theoretical scientists, chemists, programmers, and other experts. This can lead to more efficient problem-solving and to a better understanding of the science underlying materials behavior. The effect is amplified by the rise of artificial intelligence (AI), which can explore vast and complex datasets to identify patterns, trends, and relationships between materials, structures, and properties that may not be immediately obvious to human scientists. Below, we take a closer look at some of the most important databases in materials science:

  • The Inorganic Crystal Structure Database (ICSD) is primarily a proprietary database of experimental crystal structures, although it also includes a number of theoretical structures. It provides information on crystal symmetry, composition, and physical properties. It currently contains over 300,000 structures and is accessible through the FIZ Karlsruhe website at https://icsd.fiz-karlsruhe.de/.
  • The Crystallography Open Database (COD) is an open-access collection of experimental data on crystal structures and their properties, covering both inorganic and organic compounds. It currently contains over half a million structures and is accessible at https://www.crystallography.net/.
  • The Materials Project (MP) maintains a materials database that provides free access to calculated data on over 140,000 materials, including structural, electronic, and thermodynamic information. MP is a collaborative effort between researchers at several institutions, including the Lawrence Berkeley National Laboratory. The MP database is searchable by a variety of properties and provides tools for data analysis and visualization. It offers a powerful platform for AI-based materials discovery and design, helping researchers reduce the time needed to develop new materials by focusing on the compounds that calculations identify as most promising. The database can be accessed at https://materialsproject.org/.
  • The Open Quantum Materials Database (OQMD) is similar to MP in that it also focuses on data calculated with density functional theory (DFT). The main advantage of the OQMD is its larger number of open-access calculations (over one million). However, it places less emphasis on harmonisation and user-friendly access. OQMD was created in Chris Wolverton’s group at Northwestern University, and it can be accessed at https://oqmd.org/.
  • The Automatic Flow for Materials Discovery (AFLOW) began as a software framework for high-throughput materials discovery, but it has become one of the largest computational databases in the field, with more than three million calculations. Its collaborative, multi-institutional character and its associated libraries for calculation automation and data analysis make it comparable to MP. All its information is freely accessible at https://aflow.org/.
  • The Joint Automated Repository for Various Integrated Simulations (JARVIS) is a database associated with the United States National Institute of Standards and Technology (NIST). It hosts thousands of DFT and machine learning calculations, as well as experimental data. Web access requires a free user account at https://jarvis.nist.gov/.
  • MatNavi is a set of databases maintained by the National Institute for Materials Science (NIMS) in Japan. It houses heterogeneous information, including data on polymers (Polymer DB), metals (Metallic Material DB), and inorganic crystals (Inorganic Material DB). It also offers applications such as the Composite Design or Property Prediction System. The database can be consulted at https://mits.nims.go.jp/.
  • Citrination is a cloud-based database and platform for materials data. It is the only database on this list maintained by a private company. It provides access to a large, user-contributed collection of hundreds of datasets from very diverse sources. Citrination can be accessed at https://citrination.com/ after registration.

Figure: Timeline and geographic distribution of materials databases and some related companies. Source: Data-Driven Materials Science: Status, Challenges, and Perspectives (Himanen, 2019)

Although these are some of the most important databases in materials science, there are many more that would deserve a description in this post. Advances in technology and computational capabilities, together with the growing demand for new and better materials, mean that the number of databases and data entries grows year by year. This is generally very positive, because the more data that can be accessed, the more accurate AI inferences can potentially become. However, this rapid growth in data volume also places a great responsibility on programmers and engineers: the quality and harmonisation of the data must keep pace with its quantity if AI insights are to improve in practice. After all, this is one of the main goals of today’s databases.
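To make the idea of harmonisation a little more concrete, here is a minimal Python sketch of one small step: bringing chemical formulas written in different ways to a single reduced form, so that entries from different databases can be grouped by composition. Everything in it is an assumption made for illustration. The function names, the source labels (database_A and so on), and the sample records are hypothetical and do not come from any of the databases above, and the parser only handles simple formulas without parentheses or fractional occupancies.

```python
import re
from collections import defaultdict
from functools import reduce
from math import gcd

# A simple formula is a run of element symbols, each with an optional integer count.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def canonical_formula(formula: str) -> str:
    """Return a reduced formula with the elements sorted alphabetically."""
    text = formula.replace(" ", "")
    if not _FORMULA.fullmatch(text):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = defaultdict(int)
    for element, number in _TOKEN.findall(text):
        counts[element] += int(number) if number else 1
    # Divide by the greatest common divisor so that Fe4O6 and Fe2O3 coincide.
    divisor = reduce(gcd, counts.values())
    return "".join(
        f"{element}{count // divisor if count // divisor > 1 else ''}"
        for element, count in sorted(counts.items())
    )


def group_by_composition(records):
    """Map each canonical formula to the list of sources that report it."""
    groups = defaultdict(list)
    for record in records:
        groups[canonical_formula(record["formula"])].append(record["source"])
    return dict(groups)


# Hypothetical records, invented for this example only.
records = [
    {"source": "database_A", "formula": "Fe2O3"},
    {"source": "database_B", "formula": "O3Fe2"},
    {"source": "database_C", "formula": "Fe4O6"},
    {"source": "database_A", "formula": "TiO2"},
]
grouped = group_by_composition(records)
```

With these sample records, the three iron oxide entries end up under the same key, while the titanium oxide entry stays separate. Note that a reduced formula cannot distinguish polymorphs, so a real harmonisation pipeline would also need to compare the crystal structures themselves.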