
Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications

Published version
Peer-reviewed


Type

Article


Authors

Abstract

We present an automatically generated dataset of 15,755 records extracted from 47,357 papers. Each record reports water-splitting activity in the presence of a given photocatalyst, together with the chemical reaction conditions under which that activity was measured. These conditions include any co-catalysts and additives present during water splitting, the duration of the photocatalytic experiment, and the type of light source used, including its wavelength. Despite the wide range of chemical reaction attributes extracted from text, the dataset afforded good precision (71.2%) and recall (36.3%). These figures of merit were calculated from a random sample of open-access papers in the corpus. Mining such a complex set of attributes required the development of novel techniques in knowledge extraction and interdependency resolution, leveraging inter- and intra-sentence relations; these techniques are also described in this paper. We present a new version (version 2.2) of the chemistry-aware text-mining toolkit ChemDataExtractor, which incorporates these new techniques.
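To make the record contents and figures of merit more concrete, here is a minimal sketch that models one record as a Python dataclass and shows how precision and recall follow from true-positive, false-positive, and false-negative counts. The class `WaterSplittingRecord`, its field names and units, and the helpers `precision_recall` and `f1_score` are hypothetical illustrations. They are not ChemDataExtractor's actual data model or evaluation code, and the paper's exact scoring rules may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WaterSplittingRecord:
    """Hypothetical flat view of one extracted record (not ChemDataExtractor's own model)."""
    photocatalyst: str
    activity: Optional[float] = None          # reported water-splitting activity value
    activity_units: Optional[str] = None      # units as written in the paper (assumed free text)
    co_catalysts: list[str] = field(default_factory=list)
    additives: list[str] = field(default_factory=list)
    duration_h: Optional[float] = None        # length of the photocatalytic experiment, hours (assumed unit)
    light_source: Optional[str] = None        # type of light source, e.g. a lamp name
    wavelength_nm: Optional[float] = None     # light-source wavelength, nm (assumed unit)


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Empty denominators give 0.0."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative, made-up evaluation counts for a handful of sampled records.
    p, r = precision_recall(true_pos=8, false_pos=2, false_neg=8)
    print(f"precision={p:.3f} recall={r:.3f}")
    # Combine the precision and recall reported in the abstract.
    print(f"F1 from reported figures = {f1_score(0.712, 0.363):.3f}")
```

For example, `precision_recall(8, 2, 8)` returns `(0.8, 0.5)` for these made-up counts, and `f1_score(0.712, 0.363)` combines the reported precision and recall into a harmonic mean of roughly 0.48. That F1 value is plain arithmetic on the reported figures, not a number stated in the abstract.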

Description

Acknowledgements: J.M.C. is grateful for the BASF/Royal Academy of Engineering Research Chair in Data-Driven Molecular Engineering of Functional Materials, which includes PhD studentship support (for T.I.). The authors would like to thank Sebastian Martschat for helpful discussions. They are also indebted to the Argonne Leadership Computing Facility, which is a DOE Office of Science Facility, for use of its research resources, under contract No. DE-AC02-06CH11357.


Funder: BASF; doi: https://doi.org/10.13039/100004349

Keywords

4605 Data Management and Data Science, 46 Information and Computing Sciences, 34 Chemical Sciences

Journal Title

Sci Data

Journal ISSN

2052-4463

Volume

10

Publisher

Springer Science and Business Media LLC

Sponsorship
Royal Academy of Engineering (RCSRF1819\7\10)
RCUK | Science and Technology Facilities Council (STFC) (Fellowship support from the ISIS Neutron and Muon Source)