dc.contributor.author | Hasler, Eva | en
dc.contributor.author | de Gispert, Adrià | en
dc.contributor.author | Stahlberg, Felix | en
dc.contributor.author | Waite, Aurelien | en
dc.contributor.author | Byrne, William | en
dc.date.accessioned | 2016-10-11T16:04:35Z
dc.date.available | 2016-10-11T16:04:35Z
dc.identifier.uri | https://www.repository.cam.ac.uk/handle/1810/260714
dc.description | This data set contains subsets of English-German test sets from the Workshop for Machine Translation (WMT) which have been annotated with manual text simplification information on the source side in the form of gap begin and gap end symbols (<gb>, <ge>). The data was tokenized and truecased using the processing scripts distributed with the Moses SMT system. The source simplifications were produced by workers recruited on the crowdsourcing platform Crowdflower (https://www.crowdflower.com). We asked workers to simplify a sentence by deleting words and punctuation, while trying to retain the most important information in the shortened sentence. Their performance was controlled using test questions and a second Crowdflower task which asked workers to identify bad simplifications from the first task. The outcomes of the second task were aggregated by combining an agreement score and the average worker trust score for each simplification. We selected randomly from the remaining simplifications with a combined score of at least 0.5. | en
dc.description.sponsorship | EPSRC [EP/L027623/1] | en
dc.format | text editor | en
dc.rights | Attribution-ShareAlike 4.0 International | *
dc.rights | Attribution-ShareAlike 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | *
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | en
dc.subject | text simplification | en
dc.subject | machine translation | en
dc.title | Research data supporting “Source Sentence Simplification for Statistical Machine Translation” | en
dc.type | Dataset
dc.identifier.doi | 10.17863/CAM.5868
datacite.iscitedby.url | https://www.repository.cam.ac.uk/handle/1810/261713
rioxxterms.licenseref.uri | http://creativecommons.org/licenses/by-sa/4.0/ | en
dcterms.format | txt, tar | en
rioxxterms.type | Other | en
pubs.funder-project-id | EPSRC (EP/L027623/1)
datacite.issupplementto.doi | 10.1016/j.csl.2016.12.001 | en
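
The description above says the source sentences carry gap begin and gap end symbols (<gb>, <ge>). A minimal Python sketch of how such a line might be read follows. It assumes, without confirmation from this record, that the tokens between <gb> and <ge> are the words the workers deleted and that the markers are whitespace-separated tokens. The function name and the example sentence are invented for illustration and are not taken from the data set.

```python
GAP_BEGIN = "<gb>"  # marker names as given in the record's description
GAP_END = "<ge>"


def split_annotated(line):
    """Return (full_tokens, simplified_tokens) for one annotated source line.

    Assumption: tokens enclosed by <gb> ... <ge> were deleted by the
    annotator, so they appear in the full sentence but not in the
    simplified one. Check this against the files before relying on it.
    """
    full, simplified = [], []
    in_gap = False
    for tok in line.split():
        if tok == GAP_BEGIN:
            in_gap = True
        elif tok == GAP_END:
            in_gap = False
        else:
            full.append(tok)
            if not in_gap:
                simplified.append(tok)
    return full, simplified


# Invented example sentence, tokenized in the style the description mentions.
example = "the committee <gb> , which met on Tuesday , <ge> approved the plan ."
full, simplified = split_annotated(example)
print(" ".join(full))
print(" ".join(simplified))
```

If the files instead mark gaps without keeping the deleted words, the two returned token lists will simply be identical.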


Except where otherwise noted, this item's licence is described as Attribution-ShareAlike 4.0 International