
A thermoelectric materials database auto-generated from the scientific literature using ChemDataExtractor.

Published version
Peer-reviewed


Authors

Sierepeklis, Odysseas  https://orcid.org/0000-0002-5102-1018

Abstract

A thermoelectric-materials database is presented, auto-generated from the scientific literature and containing 22,805 data records that span 10,641 unique extracted chemical names. Each record contains a chemical entity and one of the seminal thermoelectric properties: thermoelectric figure of merit, ZT; thermal conductivity, κ; Seebeck coefficient, S; electrical conductivity, σ; power factor, PF; each linked to its corresponding recorded temperature, T. The database was generated using the automatic sentence-parsing capabilities of the chemistry-aware natural language processing toolkit ChemDataExtractor 2.0, adapted for application in the thermoelectric-materials domain, following a rule-based sentence-simplification step. Data were mined from the text of 60,843 scientific papers sourced from three scientific publishers: Elsevier, the Royal Society of Chemistry, and Springer. To the best of our knowledge, this is the first automatically generated database of thermoelectric materials and their properties from existing literature. The database was evaluated to have a precision of 82.25% and has been made publicly available to facilitate the application of data science in the thermoelectric-materials domain, for analysis, design, and prediction.
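
The five properties named in the abstract are related by the standard definitions PF = S²σ and ZT = S²σT/κ. The sketch below shows one way a record of this kind might be represented and how those relations can be used as a consistency check. It is only an illustration: the `Record` class and its field names are hypothetical and are not the schema of the published database, and the numeric values in the usage note are made up rather than taken from it.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names; the published database schema may differ.
    compound: str         # extracted chemical name
    prop: str             # one of "ZT", "kappa", "S", "sigma", "PF"
    value: float          # property value, assumed here to be in SI units
    temperature_k: float  # recorded temperature in kelvin


def power_factor(seebeck: float, sigma: float) -> float:
    """PF = S^2 * sigma, with S in V/K and sigma in S/m."""
    return seebeck ** 2 * sigma


def figure_of_merit(seebeck: float, sigma: float,
                    kappa: float, temperature_k: float) -> float:
    """ZT = S^2 * sigma * T / kappa, with kappa in W/(m K)."""
    return power_factor(seebeck, sigma) * temperature_k / kappa


def select(records: list[Record], prop: str) -> list[Record]:
    """Return only the records that report the requested property."""
    return [r for r in records if r.prop == prop]
```

As a usage example with illustrative values, `figure_of_merit(200e-6, 1e5, 1.5, 300.0)` combines a Seebeck coefficient of 200 µV/K, an electrical conductivity of 10⁵ S/m, and a thermal conductivity of 1.5 W/(m K) at 300 K, giving a ZT of about 0.8.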

Description

Journal Title

Scientific Data

Journal ISSN

2052-4463

Volume Title

9

Publisher

Nature Research

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

EPSRC (EP/T517847/1)
EPSRC (EP/R513180/1)