Repository logo

Perovskite- and Dye-Sensitized Solar-Cell Device Databases Auto-generated Using ChemDataExtractor.

Published version

Change log


Beard, Edward J 


The number of scientific publications reporting cutting-edge third-generation photovoltaic devices is increasing rapidly, owing to the pressing need to develop renewable-energy technologies that address the climate-change crisis. Consequently, the field could benefit from a central repository where photovoltaic-performance metrics, such as the power-conversion efficiency (η) are recorded. We present two automatically generated databases that contain photovoltaic properties and device material data for dye-sensitized solar cells (DSCs) and perovskite solar cells (PSCs), totalling 660,881 data entries representing 57,678 photovoltaic devices. The databases were generated by applying the text-mining toolkit ChemDataExtractor on a corpus of 25,720 articles. A multi-faceted evaluation, incorporating manual and automatic methods, was applied to ensure that the data contained therein were of the highest quality, with precision metrics ranging from 73.1% to 95.8%. The DSC database contains 475,045 entries representing 41,680 devices, and the PSC database contains 185,836 entries representing 15,818 devices. The databases are available in MongoDB and JSON formats, which can be queried in Python, R, Java and MATLAB for data-driven photovoltaic materials discovery.



Data Descriptor, /639/4077/4072/4062, /639/638/675, /639/301/299/946, /639/766/259, /639/638/630, data-descriptor

Journal Title

Sci Data

Conference Name

Journal ISSN


Volume Title



Springer Science and Business Media LLC
RCUK | Science and Technology Facilities Council (STFC) (Fellowship and student support from ISIS Neutron and Muon Source)