Source sentence simplification for statistical machine translation
Journal Title
Computer Speech & Language
ISSN
0885-2308
Publisher
Elsevier
Volume
45
Pages
221-235
Language
English
Type
Article
This Version
VoR
Citation
Hasler, E., de Gispert, A., Stahlberg, F., Waite, A., & Byrne, W. (2016). Source sentence simplification for statistical machine translation. Computer Speech & Language, 45, 221-235. https://doi.org/10.1016/j.csl.2016.12.001
Abstract
Long sentences with complex syntax and long-distance dependencies pose difficulties for machine translation systems. Short sentences, on the other hand, are usually easier to translate. We study the potential of addressing this mismatch using text simplification: given a simplified version of the full input sentence, can we use it in addition to the full input to improve translation? We show that the spaces of original and simplified translations can be effectively combined using translation lattices and compare two decoding approaches to process both inputs at different levels of integration. We demonstrate on source-annotated portions of WMT test sets and on top of strong baseline systems combining hierarchical and neural translation for two language pairs that source simplification can help to improve translation quality.
Keywords
hierarchical machine translation, text simplification, neural machine translation
Relationships
Is supplemented by: https://doi.org/10.17863/CAM.5868
Sponsorship
This work was supported by the EPSRC grant Improving Target Language Fluency in Statistical Machine Translation, grant number EP/L027623/1.
Funder references
EPSRC (EP/L027623/1)
Embargo Lift Date
2100-01-01
Identifiers
External DOI: https://doi.org/10.1016/j.csl.2016.12.001
This record's URL: https://www.repository.cam.ac.uk/handle/1810/261713
Rights
Licence:
http://creativecommons.org/licenses/by/4.0/