Literature mining on scientific publications of battery materials
Abstract
The scientific literature remains the single most important source of the latest research findings. However, the number of scientific publications is growing at an ever-increasing rate, which makes it progressively harder to keep up to date with the literature. Recent advances in natural-language-processing (NLP) techniques have enabled scientists to obtain useful information more efficiently from unstructured literature text, e.g., battery research papers. To this end, this thesis pursues three goals: i) creating large-scale battery-specific databases, ii) using advanced NLP models to extract and summarise useful information from the literature, and iii) developing user-friendly literature-mining toolkits.
Chapter 1 introduces the importance, opportunities, and challenges of literature-mining studies on battery materials. Chapter 2 reviews the roadmap to literature mining and the relevant methodology. Chapter 3 introduces the first auto-generated large-scale battery database, extracted from the literature using a rule-based approach built on ChemDataExtractor. Chapter 4 releases the first property-specific transformer-based BatteryBERT model, which enables data extraction in a completely unsupervised fashion. The model is further embedded in the battery-aware text-mining toolkit, BatteryDataExtractor (Chapter 5), to make automated data extraction more user-friendly. Chapter 6 extends these literature-mining studies to automatic book generation through the released toolkit, ChemDataWriter, which was used to auto-generate books that summarise research. Chapter 7 concludes the thesis and outlines future research topics that would complement this work and extend it to the broader context of literature mining on scientific publications of battery materials.
