A thermoelectric materials database auto-generated from the scientific literature using ChemDataExtractor.

Published version
Peer-reviewed

Type

Article

Authors

Sierepeklis, Odysseas  https://orcid.org/0000-0002-5102-1018

Abstract

An auto-generated thermoelectric-materials database is presented, containing 22,805 data records, automatically generated from the scientific literature, spanning 10,641 unique extracted chemical names. Each record contains a chemical entity and one of the seminal thermoelectric properties: thermoelectric figure of merit, ZT; thermal conductivity, κ; Seebeck coefficient, S; electrical conductivity, σ; power factor, PF; each linked to their corresponding recorded temperature, T. The database was auto-generated using the automatic sentence-parsing capabilities of the chemistry-aware, natural language processing toolkit, ChemDataExtractor 2.0, adapted for application in the thermoelectric-materials domain, following a rule-based sentence-simplification step. Data were mined from the text of 60,843 scientific papers that were sourced from three scientific publishers: Elsevier, the Royal Society of Chemistry, and Springer. To the best of our knowledge, this is the first automatically-generated database of thermoelectric materials and their properties from existing literature. The database was evaluated to have a precision of 82.25% and has been made publicly available to facilitate the application of data science in the thermoelectric-materials domain, for analysis, design, and prediction.

Keywords

4605 Data Management and Data Science, 46 Information and Computing Sciences, 34 Chemical Sciences

Journal Title

Scientific Data

Journal ISSN

2052-4463

Volume Title

9

Publisher

Nature Research

Sponsorship

EPSRC (EP/T517847/1)