
dc.contributor.author: Jin, Peng
dc.contributor.author: Carroll, John
dc.contributor.author: Wu, Yunfang
dc.contributor.author: McCarthy, Diana
dc.date.accessioned: 2017-10-03T07:43:42Z
dc.date.available: 2017-10-03T07:43:42Z
dc.date.issued: 2012-8-15
dc.identifier.citation: Peng Jin, John Carroll, Yunfang Wu, and Diana McCarthy, “Distributional Similarity for Chinese: Exploiting Characters and Radicals,” Mathematical Problems in Engineering, vol. 2012, Article ID 347257, 11 pages, 2012. doi:10.1155/2012/347257
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/267601
dc.description.abstract: Distributional Similarity has attracted considerable attention in the field of natural language processing as an automatic means of countering the ubiquitous problem of sparse data. As a logographic language, Chinese words consist of characters and each of them is composed of one or more radicals. The meanings of characters are usually highly related to the words which contain them. Likewise, radicals often make a predictable contribution to the meaning of a character: characters that have the same components tend to have similar or related meanings. In this paper, we utilize these properties of the Chinese language to improve Chinese word similarity computation. Given a content word, we first extract similar words based on a large corpus and a similarity score for ranking. This rank is then adjusted according to the characters and components shared between the similar word and the target word. Experiments on two gold standard datasets show that the adjusted rank is superior and closer to human judgments than the original rank. In addition to quantitative evaluation, we examine the reasons behind errors drawing on linguistic phenomena for our explanations.
dc.rights: All Rights Reserved (en)
dc.rights.uri: https://www.rioxx.net/licenses/all-rights-reserved/en
dc.title: Distributional Similarity for Chinese: Exploiting Characters and Radicals
dc.type: Article
dc.date.updated: 2017-07-13T08:35:51Z
dc.description.version: Peer Reviewed
dc.language.rfc3066: en
dc.rights.holder: Copyright © 2012 Peng Jin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.identifier.doi: 10.17863/CAM.13540
rioxxterms.versionofrecord: 10.1155/2012/347257

