BatteryBERT: A Pretrained Language Model for Battery Database Enhancement


A great number of scientific papers are published every year in the field of battery research, forming a huge textual data source. However, it is difficult to explore and retrieve useful information efficiently from these large, unstructured bodies of text. The Bidirectional Encoder Representations from Transformers (BERT) model, pretrained on a large data set in an unsupervised way, offers a route to processing scientific text automatically with minimal human effort. To this end, we developed six battery-related BERT models: cased and uncased versions of BatteryBERT, BatteryOnlyBERT, and BatterySciBERT, all trained specifically on a corpus of battery research papers. The pretrained BatteryBERT models were then fine-tuned on downstream tasks, including battery paper classification and extractive question answering for battery device component classification, which distinguishes anode, cathode, and electrolyte materials. Our BatteryBERT models outperformed the original BERT models on these battery-specific tasks. The fine-tuned BatteryBERT was then used to enhance a battery database. We also provide a web application for interactive use and visualization.
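In extractive question answering, a fine-tuned BERT model scores every context token as a possible answer start and answer end, and the answer is the highest-scoring span whose end does not precede its start. The sketch below shows only that span-selection step, under simplifying assumptions: whitespace tokens instead of WordPiece subwords, no masking of question tokens, and hypothetical hand-written logits for a question such as "What is the cathode?". The function name `best_answer_span` and the `max_answer_len` parameter are illustrative, not the authors' code or any library's API.

```python
from typing import List, Tuple


def best_answer_span(
    tokens: List[str],
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 10,
) -> Tuple[int, int, str]:
    """Return (start, end, text) of the highest-scoring span with start <= end.

    Hypothetical helper: the span score is start_logits[i] + end_logits[j],
    which ranks spans the same way as the product of independent softmax
    probabilities over start and end positions.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    start, end = best
    return start, end, " ".join(tokens[start : end + 1])


# Toy context and made-up logits (not model output) for "What is the cathode?"
tokens = ["The", "cathode", "was", "LiFePO4", "and", "the", "anode", "was", "graphite", "."]
start_logits = [-2.0, -1.0, -3.0, 4.5, -2.0, -3.0, -1.0, -3.0, 1.5, -4.0]
end_logits = [-2.0, -1.5, -3.0, 4.0, -2.0, -3.0, -1.0, -3.0, 2.0, -4.0]

print(best_answer_span(tokens, start_logits, end_logits))  # (3, 3, 'LiFePO4')
```

Asking separate anode, cathode, and electrolyte questions about the same sentence is one plausible way such span answers could label device components; the paper's actual question templates are not reproduced here.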

Journal Title
J Chem Inf Model
Publisher
American Chemical Society (ACS)
Sponsorship
Royal Academy of Engineering (RAEng) (RCSRF1819\7\10)
STFC (Unknown)
BASF/Royal Academy of Engineering, Christ's College, Cambridge, DOE (contract No. DEAC02-06CH11357).