BatteryBERT: A Pretrained Language Model for Battery Database Enhancement.

Accepted version
Peer-reviewed


Abstract

A great number of scientific papers are published every year in the field of battery research, forming a huge textual data source. However, it is difficult to explore and retrieve useful information efficiently from these large, unstructured collections of text. The Bidirectional Encoder Representations from Transformers (BERT) model, trained on a large data set in an unsupervised way, provides a route to processing scientific text automatically with minimal human effort. To this end, we produced six battery-related BERT models: cased and uncased variants of BatteryBERT, BatteryOnlyBERT, and BatterySciBERT, each trained specifically on a corpus of battery research papers. The pretrained BatteryBERT models were then fine-tuned on downstream tasks, including battery paper classification and extractive question answering for battery device component classification, which distinguishes anode, cathode, and electrolyte materials. Our BatteryBERT models were found to outperform the original BERT models on these battery-specific tasks. The fine-tuned BatteryBERT was then used to perform battery database enhancement. We also provide a web application for its interactive use and visualization.
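
As a minimal sketch of the extractive question-answering use described above, the snippet below loads a pretrained checkpoint with the Hugging Face transformers library and predicts an answer span for a battery-related question. The checkpoint name "batterydata/batterybert-cased" is an assumption made for illustration; the identifiers of the released models may differ, and meaningful answers require a checkpoint that has already been fine-tuned for question answering as described in the abstract.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Assumed checkpoint identifier; the actual released model names may differ.
checkpoint = "batterydata/batterybert-cased"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

question = "What is the cathode material?"
context = ("The cell used LiFePO4 as the cathode, graphite as the anode, "
           "and 1 M LiPF6 in EC/DMC as the electrolyte.")

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The start/end logits mark the predicted answer span within the context.
# With a base (not QA-fine-tuned) checkpoint this span head is freshly
# initialised, so useful predictions only appear after fine-tuning.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)
```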

Journal Title

J Chem Inf Model

Journal ISSN

1549-9596 (print)
1549-960X (electronic)

Publisher

American Chemical Society (ACS)
Sponsorship

Royal Academy of Engineering (RAEng) (RCSRF1819\7\10)
STFC (Unknown)
BASF/Royal Academy of Engineering, Christ's College, Cambridge, DOE (contract no. DE-AC02-06CH11357)