
Data-Driven Research for Amorphous Materials: Towards Seamless Utilization of Publication Data in Chemical Sciences






In this work, long-standing challenges in research on amorphous materials are identified. In particular, the lack of reliable data repositories for the properties and structures of amorphous materials significantly limits the possibilities for research. As a consequence, state-of-the-art data-driven methods, which have been widely used for crystalline materials for decades, can so far be used only in a very limited capacity in the domain of amorphous materials. A pathway towards resolving these problems is proposed in this work. The overall methodology relies on the extraction of information from primary-literature sources, i.e., scientific articles. In this way, the entirety of previously published knowledge in the domain can, in principle, be utilized for new scientific discovery. The goal of the work presented in this thesis is to enable state-of-the-art data-driven research in the domain of amorphous materials science. To achieve this goal, novel contributions, in the form of new methodologies and their validation, have been made in three distinct fields. First, in the domain of information science, the table understanding problem has been approached. Building on previous research in the field, a complete methodology for the standardization of complex table structures is delivered in the form of the stand-alone software library TableDataExtractor. Secondly, in the domain of data-driven research in the chemical sciences, and again building on previous research in the field, new methodologies were developed for the extraction of physical and chemical properties of chemical compounds. For the first time, hierarchies of nested physical properties are extracted from primary-literature sources, and without the need for manually written grammatical rules for extraction. As many as 18 interrelated, nested properties of crystalline compounds are extracted to validate the methodology, with an achieved overall precision of 92 %.
Finally, the developed methodologies were applied in the domain of amorphous materials. An independent database of glass transition temperatures for arbitrary inorganic compounds has been generated from primary-literature sources. This database has subsequently been used to predict glass transition temperatures for arbitrary inorganic compounds with high accuracy. The presented results validate the developed methodology for overcoming limitations in amorphous materials research. At the same time, the developed methods lay the foundation for seamless utilization of primary-literature sources in data-driven research frameworks.





Jasak, Hrvoje


amorphous materials, glass, crystallographic data, computational materials science, natural language processing, glass transition, property prediction


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
EPSRC (1819345)