
DC Field | Value | Language
dc.contributor.author | Baker, Simon | en
dc.contributor.author | Korhonen, Anna-Leena | en
dc.contributor.author | Pyysalo, S | en
dc.date.accessioned | 2017-12-07T15:29:24Z
dc.date.available | 2017-12-07T15:29:24Z
dc.identifier.uri | https://www.repository.cam.ac.uk/handle/1810/270037
dc.description.abstract | Methods based on deep learning approaches have recently achieved state-of-the-art performance in a range of machine learning tasks and are increasingly applied to natural language processing (NLP). Despite strong results in various established NLP tasks involving general-domain texts, there is only limited work applying these models to biomedical NLP. In this paper, we consider a Convolutional Neural Network (CNN) approach to biomedical text classification. Evaluation using a recently introduced cancer domain dataset involving the categorization of documents according to the well-established hallmarks of cancer shows that a basic CNN model can achieve a level of performance competitive with a Support Vector Machine (SVM) trained using complex manually engineered features optimized for the task. We further show that simple modifications to the CNN hyperparameters, initialization, and training process allow the model to notably outperform the SVM, establishing a new state-of-the-art result on this task. We make all of the resources and tools introduced in this study available under open licenses from https://cambridgeltl.github.io/cancer-hallmark-cnn/ .
dc.description.sponsorship | The first author is funded by the Commonwealth Scholarship and the Cambridge Trust. This work is supported by Medical Research Council grant MR/M013049/1 and the Google Faculty Award.
dc.rights | Attribution 4.0 International | *
dc.rights | Attribution 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | *
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.title | Cancer Hallmark Text Classification Using Convolutional Neural Networks | en
dc.type | Conference Object
dc.identifier.doi | 10.17863/CAM.12420
rioxxterms.version | VoR | *
rioxxterms.licenseref.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.contributor.orcid | Baker, Simon [0000-0002-0998-438X]
rioxxterms.type | Conference Paper/Proceeding/Abstract | en
pubs.funder-project-id | Medical Research Council (MR/M013049/1)
pubs.conference-name | Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2016) | en
pubs.conference-start-date | 2016-12-16 | en
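
The abstract describes classifying documents with a CNN but gives no architecture details. As a rough illustration only, the following is a minimal forward-pass sketch of a generic single-layer CNN text classifier, assuming a common layout: word embeddings, convolution filters of several widths, ReLU, max-over-time pooling, and a sigmoid output for one binary label (for example, whether a document mentions a single hallmark). The layer layout, filter widths, dimensions and every function name here (`conv_max_pool`, `predict_proba`, `random_model`) are hypothetical choices for illustration; this is not the authors' model or the released code, and the weights are random and untrained.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conv_max_pool(embeddings, filters, biases):
    """Apply each filter to every window of consecutive token vectors,
    pass the result through ReLU, and keep the maximum over positions."""
    features = []
    for weights, bias in zip(filters, biases):
        width = len(weights)  # number of consecutive tokens the filter spans
        best = 0.0  # ReLU outputs are >= 0; texts shorter than `width` yield 0.0
        for start in range(len(embeddings) - width + 1):
            activation = bias
            for offset, row in enumerate(weights):
                token_vec = embeddings[start + offset]
                activation += sum(w * x for w, x in zip(row, token_vec))
            best = max(best, max(0.0, activation))  # ReLU, then max over time
        features.append(best)
    return features


def predict_proba(token_ids, model):
    """Forward pass: look up embeddings, convolve and pool, then a sigmoid
    output unit giving P(label | text) for one binary label."""
    embeddings = [model["embeddings"][t] for t in token_ids]
    features = conv_max_pool(embeddings, model["filters"], model["biases"])
    logit = model["out_bias"] + sum(
        w * f for w, f in zip(model["out_weights"], features)
    )
    return sigmoid(logit)


def random_model(rng, vocab_size, dim, widths):
    """Hypothetical, untrained parameters drawn uniformly at random."""
    def vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    return {
        "embeddings": [vec(dim) for _ in range(vocab_size)],
        # Simplification: one filter per width (real models use many).
        "filters": [[vec(dim) for _ in range(w)] for w in widths],
        "biases": [0.0 for _ in widths],
        "out_weights": vec(len(widths)),
        "out_bias": 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    model = random_model(rng, vocab_size=50, dim=8, widths=[3, 4, 5])
    doc = [rng.randrange(50) for _ in range(20)]  # a toy "document" of token ids
    print(f"P(label | doc) = {predict_proba(doc, model):.4f}")
```

The widths 3, 4 and 5 and the 8-dimensional embeddings are arbitrary. Learning the weights from labelled documents, pretrained word vectors and the hyperparameter, initialization and training changes the abstract mentions are all outside this sketch.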



Except where otherwise noted, this item's licence is described as Attribution 4.0 International