
Accelerating Materials Discovery with Machine Learning





Goodall, Rhys Edward Andrew 


As we enter the data age, ever-increasing amounts of human knowledge are being recorded in machine-readable formats. This has opened up new opportunities to leverage data to accelerate scientific discovery. This thesis focuses on how we can use historical and computational data to aid the discovery and development of new materials.

We begin by looking at a traditional materials informatics task: elucidating the structure-function relationships of high-temperature cuprate superconductors. One of the most significant challenges for materials informatics is the limited availability of relevant data. To address this challenge for cuprate superconductors, we propose a simple calibration-based approach that estimates the apical and in-plane copper-oxygen distances from more readily available lattice parameter data. Our investigation uncovers a large, unexplored region of materials space that may yield cuprates with higher critical temperatures. We propose two experimental avenues that may enable this region to be accessed.

Computational materials exploration is bottlenecked by our ability to provide input structures to feed our workflows. Whilst ab initio structure identification is possible, it is computationally burdensome, and we lack design rules for deciding where to target searches in high-throughput setups. To address this, there is a need to develop tools that suggest promising candidates, enabling automated deployment and increased efficiency. Machine learning models are well suited to this task; however, current approaches typically use hand-engineered inputs, which means that their performance is circumscribed by the intuitions reflected in the chosen inputs. We propose a novel way to formulate the machine learning task as a set regression problem over the elements in a material. We show that our approach leads to higher sample efficiency than other well-established composition-based approaches.
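
As a rough illustration of what treating a composition as a set over its elements can look like, the Python sketch below represents a formula as an unordered collection of (element, fraction) pairs and pools per-element feature vectors with a weighted mean, which does not depend on the order in which elements are listed. The feature table `TOY_FEATURES`, the helper names `to_weighted_set` and `pool`, and the choice of weighted-mean pooling are all assumptions made for illustration. This is not the model developed in the thesis, which learns its representation from data rather than using fixed features.

```python
from typing import Dict, List

# Hypothetical two-dimensional element features, invented for illustration only.
TOY_FEATURES: Dict[str, List[float]] = {
    "Ba": [1.0, 0.0],
    "Ti": [0.0, 1.0],
    "O": [0.5, 0.5],
}


def to_weighted_set(composition: Dict[str, float]) -> Dict[str, float]:
    """Normalise element amounts to fractions that sum to one."""
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive total amount")
    return {element: amount / total for element, amount in composition.items()}


def pool(composition: Dict[str, float]) -> List[float]:
    """Weighted-mean pooling over the element set (independent of input order)."""
    fractions = to_weighted_set(composition)
    dim = len(next(iter(TOY_FEATURES.values())))
    pooled = [0.0] * dim
    # Sorted iteration keeps the floating-point summation order reproducible.
    for element in sorted(fractions):
        weight = fractions[element]
        for i, value in enumerate(TOY_FEATURES[element]):
            pooled[i] += weight * value
    return pooled


batio3 = {"Ba": 1, "Ti": 1, "O": 3}
reordered = {"O": 3, "Ti": 1, "Ba": 1}   # same material, different listing order
same_ratio = {"Ba": 2, "Ti": 2, "O": 6}  # same stoichiometric ratio, doubled amounts
```

Here `pool(batio3)`, `pool(reordered)` and `pool(same_ratio)` give the same vector, which is the invariance a set-based formulation is meant to provide. A learned model would replace the fixed table and the plain weighted mean with trainable components.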

Having demonstrated the ability of machine learning to aid in the selection of promising compound compositions, we next explore how useful machine learning might be for identifying fabrication routes. Using a recently released data-mined data set of solid-state synthesis reactions, we design a two-stage model to predict the products of inorganic reactions. We critically explore the performance of this model, showing that whilst the predictions fall short of the accuracy required to be chemically discriminative, the model provides valuable insights into understanding inorganic reactions. Through careful investigation of the model's failure modes, we explore the challenges that remain in the construction of forward inorganic reaction prediction models and suggest some pathways to tackle the identified issues.

One of the principal ways that materials scientists understand and categorise materials is in terms of their symmetries. Crystal structure prototypes are assigned based on the presence of symmetrically equivalent sites known as Wyckoff positions. We show that a powerful coarse-grained representation of materials structures can be constructed from the Wyckoff positions by discarding information about their coordinates within crystal structures. One of the strengths of this representation is that it maintains the ability of structure-based methods to distinguish polymorphs whilst also allowing combinatorial enumeration akin to composition-based approaches. We construct an end-to-end differentiable model that takes our proposed Wyckoff representation as input. The performance of this approach is examined on a suite of materials discovery experiments, showing that it leads to strong levels of enrichment in materials discovery tasks.
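
The idea of coarse-graining by discarding coordinates can be sketched in a few lines of Python. The example below keeps only the space group number and the occupied Wyckoff positions of each element, and drops the fractional coordinates. The `Site` type, the `coarse_grain` helper and the label format are hypothetical and chosen for illustration; they are not the encoding used in the thesis. The TiO2 site data are approximate values included only to show that two polymorphs receive different labels.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    element: str
    multiplicity: int
    wyckoff_letter: str
    coords: Tuple[float, float, float]  # fractional coordinates, discarded below


def coarse_grain(spacegroup: int, sites: List[Site]) -> str:
    """Build a coordinate-free label from the space group and occupied Wyckoff positions."""
    counts = Counter(
        (site.element, f"{site.multiplicity}{site.wyckoff_letter}") for site in sites
    )
    # Sorting makes the label independent of the order in which sites are listed.
    parts = [f"{element}@{position}x{n}" for (element, position), n in sorted(counts.items())]
    return f"sg{spacegroup}:" + "|".join(parts)


# Illustrative, approximate site data for two TiO2 polymorphs (assumed values).
rutile = [Site("Ti", 2, "a", (0.0, 0.0, 0.0)), Site("O", 4, "f", (0.305, 0.305, 0.0))]
anatase = [Site("Ti", 4, "a", (0.0, 0.0, 0.0)), Site("O", 8, "e", (0.0, 0.0, 0.208))]
# Same Wyckoff occupation as rutile, with a slightly shifted free coordinate.
rutile_shifted = [Site("O", 4, "f", (0.31, 0.31, 0.0)), Site("Ti", 2, "a", (0.0, 0.0, 0.0))]

rutile_label = coarse_grain(136, rutile)
anatase_label = coarse_grain(141, anatase)
shifted_label = coarse_grain(136, rutile_shifted)
```

In this sketch `rutile_label` and `shifted_label` are identical because the free coordinate is ignored, while `anatase_label` differs even though the composition is the same. Because the label is built from a discrete set of choices (space group and Wyckoff occupations), candidate labels can in principle be enumerated in the same way as candidate compositions.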

The research presented in this thesis highlights the promise of applying data-driven workflows and machine learning in materials discovery and development. This thesis concludes by speculating about promising research directions for applying machine learning within materials discovery.





Lee, Alpha Albert


Machine Learning, Materials Informatics


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge