Decoding sentiment from distributed representations of sentences

Ponti, EM; Vulić, I; Korhonen, A

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/269691

Repository DOI

https://doi.org/10.17863/CAM.10796

Files

Accepted version (312.24 KB)

Type

Conference Object

Authors

Ponti, EM

Vulić, I

Korhonen, A

Abstract

Distributed representations of sentences have recently been developed to represent sentence meaning as real-valued vectors. However, it is not clear how much information such representations retain about the polarity of sentences. To study this question, we decode sentiment from unsupervised sentence representations learned with different architectures (sensitive to the order of words, to the order of sentences, or to neither) in 9 typologically diverse languages. Sentiment results from the (recursive) composition of lexical items and grammatical strategies such as negation and concession. The results are manifold: we show that there is no 'one-size-fits-all' representation architecture that outperforms the others across the board. Rather, the top-ranking architectures depend on the language and data at hand. Moreover, we find that in several cases the additive composition model based on skip-gram word vectors may surpass supervised state-of-the-art architectures such as bidirectional LSTMs. Finally, we provide a possible explanation of the observed variation based on the type of negative constructions in each language.
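
To make the additive composition model named in the abstract concrete, the following minimal sketch sums word vectors into a sentence vector. It is an illustration under stated assumptions, not the authors' implementation: the vocabulary, the 3-dimensional vectors in `TOY_VECTORS`, and the function name `additive_sentence_vector` are all hypothetical, and real skip-gram vectors would be learned from a corpus rather than written by hand.

```python
from typing import Dict, List

# Hypothetical toy vectors standing in for learned skip-gram embeddings.
TOY_VECTORS: Dict[str, List[float]] = {
    "the": [0.1, 0.0, 0.2],
    "film": [0.3, 0.5, 0.1],
    "is": [0.0, 0.1, 0.0],
    "not": [-0.4, 0.2, 0.6],
    "good": [0.7, -0.1, 0.3],
}


def additive_sentence_vector(
    tokens: List[str], vectors: Dict[str, List[float]], dim: int = 3
) -> List[float]:
    """Sum the vectors of all known tokens; unknown tokens are skipped."""
    total = [0.0] * dim
    for token in tokens:
        vec = vectors.get(token)
        if vec is None:
            continue  # out-of-vocabulary word contributes nothing
        for i, value in enumerate(vec):
            total[i] += value
    return total


if __name__ == "__main__":
    original = additive_sentence_vector("the film is not good".split(), TOY_VECTORS)
    shuffled = additive_sentence_vector("good not is film the".split(), TOY_VECTORS)
    # Addition is commutative, so word order does not change the result
    # (up to floating-point rounding).
    print(original)
    print(shuffled)
```

Because the sum ignores word order, this kind of representation falls under the architectures the abstract describes as sensitive to neither the order of words nor the order of sentences. Decoding sentiment would then amount to training a classifier on such vectors, a step this sketch leaves out.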

Keywords

Sentence representations, Distributional semantics, Linguistic typology, Sentence polarity detection, Multilinguality

Journal Title

*SEM 2017 - 6th Joint Conference on Lexical and Computational Semantics, Proceedings

Conference Name

Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Publisher

Association for Computational Linguistics

Publisher DOI

https://doi.org/10.18653/v1/s17-1003

Rights

Attribution 4.0 International

Sponsorship

European Research Council (648909)

Collections

Scholarly Works - Theoretical and Applied Linguistics