
Main Challenges in Materials Data Infrastructure

Data-driven materials science is an emerging field that has made significant progress in recent years. As highlighted in a previous post, advances in technology and computational capabilities, together with the growing demand for new and better materials, mean that the number of materials databases and data entries grows year by year. However, the field’s data infrastructure is still far from reaching its full potential, and several challenges must be addressed before artificial intelligence (AI) can fully accelerate the discovery of new materials. According to Himanen et al. (2019), these challenges fall into five main categories:


Challenges that materials data infrastructures face on the way to adoption by academia, industry, governments and the general public (Himanen et al., 2019)

Relevance and Adoption

Databases must provide relevant and adoptable data to various stakeholders, including scientific communities, industries and governments. Relevance is generally determined by data volume, data quality, data completeness and data homogeneity. Developing interdisciplinary infrastructures that the different communities and stakeholders can all adopt is not a simple task. The relevance of a database is also closely tied to how accessible its data is to data analysis tools. As Machine Learning (ML) algorithms have gained prominence in this regard, the features and properties of the data need to be adequate for, and compatible with, ML algorithms in order to be truly relevant.
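As a rough illustration of what "adequate and compatible for ML" can mean in practice, the sketch below checks two of the relevance criteria named above, completeness and homogeneity, on a handful of records. The record layout, the field names (band_gap, band_gap_unit) and the values are hypothetical placeholders chosen for this example; they do not come from any real database.

```python
from collections import Counter

# Hypothetical records: field names and values are illustrative placeholders.
records = [
    {"material_id": "m1", "band_gap": 1.0, "band_gap_unit": "eV"},
    {"material_id": "m2", "band_gap": 2.0, "band_gap_unit": "eV"},
    {"material_id": "m3", "band_gap": None, "band_gap_unit": None},
    {"material_id": "m4", "band_gap": 0.1, "band_gap_unit": "Ry"},
]


def field_completeness(rows, field):
    """Return the fraction of rows in which `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def unit_counts(rows, unit_field):
    """Count how often each unit appears, ignoring rows without a unit."""
    return Counter(
        row[unit_field] for row in rows if row.get(unit_field) is not None
    )


completeness = field_completeness(records, "band_gap")
units = unit_counts(records, "band_gap_unit")

print(f"band_gap completeness: {completeness:.2f}")
print(f"units in use: {dict(units)}")
```

A field with a low fill rate or with mixed units would usually need cleaning or conversion before it can serve as an ML feature or target.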

Completeness

Completeness is the quality of being whole, with nothing missing. By this definition, data infrastructures today suffer from a severe completeness problem: computational data dominates the existing databases, while experimental data is scarce. Facilitating a seamless comparison between computational and experimental data is important for validating theoretical predictions and enhancing materials discovery efforts. Building synergies between computational and experimental data remains an important challenge for the future of data-driven materials science.

Standardization

Some form of standardization is essential for the widespread adoption of a new paradigm or technology. Developing standardized metadata for materials science that is informative, exhaustive, and adaptable is an outstanding challenge. Over the years, there have been various efforts to develop general material ontologies, such as PIF (Michel & Meredig, 2016), MatSeek (Cheung et al., 2009), or MatOWL (Zhang et al., 2009). Also noteworthy is the recent OPTiMaDe consortium (Andersen et al., 2021), which is building a common interface for accessing data from multiple materials platforms. However, all these initiatives are far from becoming a mature standard and being broadly adopted. For industrial purposes in particular, these efforts are typically insufficient, forcing companies to create their own internal, domain-specific ontologies, which further increases heterogeneity in the field.
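To give a feel for what a common interface looks like from the user's side, here is a minimal sketch that builds an OPTiMaDe-style query URL for structures containing a given set of elements. The /v1/structures endpoint, the filter and page_limit query parameters, and the HAS ALL operator follow the published OPTiMaDe API specification; the base URL and the helper name build_structures_query are assumptions made for this example, and no request is actually sent.

```python
from urllib.parse import quote, urlencode


def build_structures_query(base_url, elements, page_limit=10):
    """Build an OPTiMaDe-style URL that filters structures by their elements.

    `base_url` is a placeholder for a provider's OPTiMaDe base URL;
    this helper is hypothetical and not part of any official client.
    """
    # Filter grammar: elements HAS ALL "Si", "O"
    quoted = ", ".join(f'"{symbol}"' for symbol in elements)
    params = {
        "filter": f"elements HAS ALL {quoted}",
        "page_limit": page_limit,
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Hypothetical provider URL, used only to show the shape of the result.
url = build_structures_query("https://example.org/optimade", ["Si", "O"])
print(url)
```

Because every compliant provider exposes the same endpoint and filter grammar, the same query can in principle be pointed at a different database by changing only the base URL.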

Acceptance and Ecosystems

Materials data infrastructures will only be useful if all important stakeholders accept them as a working tool. User friendliness, easy upload and download of data, and trust in the stored data are essential for widespread acceptance. While current infrastructures are predominantly built and used by scientists in academia, the majority of corporate R&D remains disconnected from the academic data ecosystem. Coordinated projects that build interdisciplinary ecosystems between academic, corporate, governmental, and public stakeholders are an essential need.

Longevity and Diffusion

With increasing awareness of open and data-driven science, national and international funding for the development of open data science is rising. However, funding agencies rarely consider the longevity and diffusion of innovations and new technologies, and long-term financial support for sustained operation is not guaranteed. As a result, these digital infrastructures are in danger of becoming digital ruins left behind by the expansion of open science.


In conclusion, although data-driven materials science has made great progress in recent years, several challenges need to be addressed before it can reach its full potential. Since our main goal of accelerating the design and development of new materials is closely linked to strengthening the materials data infrastructure, our experts are working to address each of the challenges mentioned in this post. By developing a curated, standardized, user-friendly database that is maintained over the long term, and by committing to make it relevant to both industry and academia and easily accessible to our AI algorithms, we are very proud to contribute to the ongoing progress of today’s computational materials science.