dc.contributor.author: Carter, Brandon (en)
dc.contributor.author: Bileschi, Maxwell (en)
dc.contributor.author: Smith, Jamie (en)
dc.contributor.author: Sanderson, Theo (en)
dc.contributor.author: Bryant, Drew (en)
dc.contributor.author: Belanger, David (en)
dc.contributor.author: Colwell, Lucy (en)
dc.date.accessioned: 2020-10-20T23:30:11Z
dc.date.available: 2020-10-20T23:30:11Z
dc.date.issued: 2020-08 (en)
dc.identifier.issn: 1066-5277
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/311722
dc.description.abstract: In many application domains, neural networks are highly accurate and have been deployed at large scale. However, users often do not have good tools for understanding how these models arrive at their predictions. This has hindered adoption in fields such as the life and medical sciences, where researchers require that models base their decisions on underlying biological phenomena rather than peculiarities of the dataset. We propose a set of methods for critiquing deep learning models and demonstrate their application for protein family classification, a task for which high-accuracy models have considerable potential impact. Our methods extend the Sufficient Input Subsets (SIS) technique, which we use to identify subsets of features in each protein sequence that are alone sufficient for classification. Our suite of tools analyzes these subsets to shed light on the decision-making criteria employed by models trained on this task. These tools show that while deep models may perform classification for biologically relevant reasons, their behavior varies considerably across the choice of network architecture and parameter initialization. While the techniques that we develop are specific to the protein sequence classification task, the approach taken generalizes to a broad set of scientific contexts in which model interpretability is essential.
dc.format.medium: Print-Electronic (en)
dc.language: eng (en)
dc.rights: All rights reserved
dc.rights.uri:
dc.title: Critiquing Protein Family Classification Models Using Sufficient Input Subsets. (en)
dc.type: Article
prism.endingPage: 1231
prism.issueIdentifier: 8 (en)
prism.publicationDate: 2020 (en)
prism.publicationName: Journal of computational biology : a journal of computational molecular cell biology (en)
prism.startingPage: 1219
prism.volume: 27 (en)
dc.identifier.doi: 10.17863/CAM.58812
dcterms.dateAccepted: 2019-11-01 (en)
rioxxterms.versionofrecord: 10.1089/cmb.2019.0339 (en)
rioxxterms.version: AM
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved (en)
rioxxterms.licenseref.startdate: 2020-08 (en)
dc.contributor.orcid: Colwell, Lucy [0000-0003-3148-0337]
dc.identifier.eissn: 1557-8666
rioxxterms.type: Journal Article/Review (en)
pubs.funder-project-id: Simons Foundation (598399)
cam.orpheus.success: Mon Oct 26 07:30:25 GMT 2020 - Embargo updated*
rioxxterms.freetoread.startdate: 2021-08-31