Auto-generated database of semiconductor band gaps using ChemDataExtractor.
Publication Date
2022-05-03
Journal Title
Sci Data
ISSN
2052-4463
Publisher
Springer Science and Business Media LLC
Volume
9
Issue
1
Language
eng
Type
Article
This Version
VoR
Citation
Dong, Q., & Cole, J. M. (2022). Auto-generated database of semiconductor band gaps using ChemDataExtractor. Sci Data, 9 (1). https://doi.org/10.1038/s41597-022-01294-6
Abstract
Large-scale databases of band gap information about semiconductors that are curated from the scientific literature have significant usefulness for computational databases and general semiconductor materials research. This work presents an auto-generated database of 100,236 semiconductor band gap records, extracted from 128,776 journal articles with their associated temperature information. The database was produced using ChemDataExtractor version 2.0, a 'chemistry-aware' software toolkit that uses Natural Language Processing (NLP) and machine-learning methods to extract chemical data from scientific documents. The modified Snowball algorithm of ChemDataExtractor has been extended to incorporate nested models, optimized by hyperparameter analysis, and used together with the default NLP parsers to achieve optimal quality of the database. Evaluation of the database shows a weighted precision of 84% and a weighted recall of 65%. To the best of our knowledge, this is the largest open-source non-computational band gap database to date. Database records are available in CSV, JSON, and MongoDB formats, which are machine readable and can assist data mining and semiconductor materials discovery.
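Because the records are distributed in machine-readable formats, a few lines of code are enough to start mining them. The following is a minimal sketch, not the authors' tooling: the field names (`compound`, `band_gap_eV`, `temperature_K`) and the three sample entries are assumptions made for illustration, and the real database schema may differ. It shows loading JSON records, selecting those within a band gap window, and re-exporting them as CSV.

```python
import csv
import io
import json

# Hypothetical sample records for illustration only. The field names are
# assumptions, not the published schema, and these entries are not taken
# from the database itself.
SAMPLE_JSON = """[
  {"compound": "Si", "band_gap_eV": 1.12, "temperature_K": 300},
  {"compound": "GaAs", "band_gap_eV": 1.42, "temperature_K": 300},
  {"compound": "Ge", "band_gap_eV": 0.74, "temperature_K": null}
]"""

FIELDS = ["compound", "band_gap_eV", "temperature_K"]


def load_records(text):
    """Parse a JSON array of band gap records into a list of dicts."""
    return json.loads(text)


def filter_by_gap(records, low, high):
    """Keep records whose band gap lies in the closed interval [low, high] eV."""
    return [r for r in records if low <= r["band_gap_eV"] <= high]


def to_csv(records):
    """Serialise records to CSV text; a missing temperature becomes an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    records = load_records(SAMPLE_JSON)
    selected = filter_by_gap(records, 1.0, 1.5)
    print(to_csv(selected))
```

The temperature field is allowed to be empty in this sketch, on the assumption that not every extracted band gap value has a temperature reported alongside it.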
Keywords
Networking and Information Technology R&D (NITRD)
Identifiers
35504897, PMC9065101
External DOI: https://doi.org/10.1038/s41597-022-01294-6
This record's URL: https://www.repository.cam.ac.uk/handle/1810/337825