ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes
Accepted version
Abstract
Knowledge in the chemical domain is often disseminated graphically by means of chemical reaction schemes. Reaction schemes, which comprise chemical diagrams and symbols, greatly simplify the task of describing chemical transformations. Although any chemist understands them intuitively, such drawings, like most graphical representations, are not easily understood by machines, and this poses a challenge for data extraction. Currently available tools are limited in their scope of extraction and require manual pre-processing, which slows down data extraction. We present a new tool, ReactionDataExtractor v2.0, which combines neural networks and symbolic artificial intelligence to effectively remove this barrier. We evaluated our tool on a test set of reaction schemes taken from open-source journal articles and achieved F1 scores between 75% and 96%. These metrics can be improved further by tuning our object-detection models to a specific chemical subdomain, thanks to the data-driven approach, based on synthetically generated data, that we have adopted. The modular system architecture of our tool allows it to balance speed and accuracy, affording an autonomous, high-throughput solution for image-based chemical data extraction.
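The F1 score quoted above is the harmonic mean of precision and recall. As a minimal sketch, assuming each extracted item has already been matched against a ground-truth annotation to give true-positive (TP), false-positive (FP) and false-negative (FN) counts (the matching criterion is not specified in this abstract), the standard definition can be computed as follows. The function name `f1_score` is illustrative only and is not part of ReactionDataExtractor.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 score: harmonic mean of precision and recall.

    Assumes TP/FP/FN counts come from a prior matching step against
    ground-truth annotations (hypothetical; not defined in the abstract).
    """
    predicted = tp + fp  # everything the extractor reported
    actual = tp + fn     # everything present in the ground truth
    if predicted == 0 or actual == 0 or tp == 0:
        return 0.0  # no correct detections, so precision or recall is zero
    precision = tp / predicted
    recall = tp / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only to illustrate the two ends of the reported range
print(round(f1_score(tp=75, fp=25, fn=25), 2))
print(round(f1_score(tp=48, fp=2, fn=2), 2))
```

With these hypothetical counts the two calls give 0.75 and 0.96. The same quantity can be written directly as 2TP / (2TP + FP + FN), which avoids computing precision and recall separately.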