dc.contributor.author: Dong, Qingyang
dc.contributor.author: Cole, Jacqui
dc.date.accessioned: 2022-06-07T08:16:16Z
dc.date.available: 2022-06-07T08:16:16Z
dc.date.issued: 2022-05-03
dc.identifier.issn: 2052-4463
dc.identifier.other: 35504897
dc.identifier.other: PMC9065101
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/337825
dc.description.abstract: Large-scale databases of band gap information about semiconductors that are curated from the scientific literature have significant usefulness for computational databases and general semiconductor materials research. This work presents an auto-generated database of 100,236 semiconductor band gap records, extracted from 128,776 journal articles with their associated temperature information. The database was produced using ChemDataExtractor version 2.0, a 'chemistry-aware' software toolkit that uses Natural Language Processing (NLP) and machine-learning methods to extract chemical data from scientific documents. The modified Snowball algorithm of ChemDataExtractor has been extended to incorporate nested models, optimized by hyperparameter analysis, and used together with the default NLP parsers to achieve optimal quality of the database. Evaluation of the database shows a weighted precision of 84% and a weighted recall of 65%. To the best of our knowledge, this is the largest open-source non-computational band gap database to date. Database records are available in CSV, JSON, and MongoDB formats, which are machine readable and can assist data mining and semiconductor materials discovery.
dc.language: eng
dc.publisher: Springer Science and Business Media LLC
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source: nlmid: 101640192
dc.source: essn: 2052-4463
dc.title: Auto-generated database of semiconductor band gaps using ChemDataExtractor.
dc.type: Article
dc.date.updated: 2022-06-07T08:16:16Z
prism.issueIdentifier: 1
prism.publicationName: Sci Data
prism.volume: 9
dc.identifier.doi: 10.17863/CAM.85234
dcterms.dateAccepted: 2022-02-08
rioxxterms.versionofrecord: 10.1038/s41597-022-01294-6
rioxxterms.version: VoR
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by/4.0/
dc.contributor.orcid: Dong, Qingyang [0000-0002-8782-7638]
dc.contributor.orcid: Cole, Jacqui [0000-0002-1552-8743]
dc.identifier.eissn: 2052-4463
cam.issuedOnline: 2022-05-03

Except where otherwise noted, this item's licence is described as Attribution 4.0 International