  • Item · Open Access
    The Value of Languages
    Bennett, Wendy [0000-0002-2146-3165]
    Policy document from Cambridge Public Policy SRI: a report of a workshop held in Cambridge in October 2015 to discuss current deficiencies in UK language policy, put forward proposals to address them, and illustrate the strategic value of languages. Participating government departments and bodies included the Ministry of Defence, UK Trade and Investment, the Department for Business, Innovation and Skills, the Foreign and Commonwealth Office, Ofsted and the devolved administrations.
  • Item · Open Access · Published version · Peer-reviewed
    Emo, love and god: making sense of Urban Dictionary, a crowd-sourced online dictionary
    (The Royal Society, 2018-05) Nguyen, Dong; McGillivray, Barbara [0000-0003-3426-8200]; Yasseri, Taha
    The Internet facilitates large-scale collaborative projects, and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the ‘wisdom of the crowd’ has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to low-quality content. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary’s voting system. The low threshold for including new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation.
  • Item · Open Access · Accepted version · Peer-reviewed
    Language interference and inhibition in early and late successive bilingualism
    (Cambridge University Press (CUP), 2018) Peristeri, E; Tsimpli, IM [0000-0001-6015-7526]; Sorace, A; Tsapkini, K
    The present study explores whether age of onset of exposure to the second language affects interference resolution at the level of grammatical gender, and whether cognitive functions contribute to interference resolution. Early and late successive Serbian–Greek bilinguals living in the second-language context, along with monolinguals, performed a picture-word interference naming task in a single-language context and a non-verbal inhibition task. We found that gender interference from the first language was present only in late successive bilinguals. Early bilinguals exhibited no interference from the grammatical gender of their mother tongue and showed stronger inhibitory abilities than the other groups in the non-verbal task. The different sizes of interference from first-language grammatical gender in the two bilingual groups are explained by early successive bilinguals’ stronger domain-general inhibitory processes in resolving between-language conflict at the grammatical gender level, relative to late successive bilinguals.
  • Item · Open Access · Accepted version · Peer-reviewed
    Object Clitic production in monolingual and bilingual children with Specific Language Impairment: A comparison between elicited production and narratives
    (John Benjamins Publishing Company, 2017) Tsimpli, IM [0000-0001-6015-7526]; Peristeri, E; Andreou, M
    Pronominal clitics are sensitive to both morphosyntax and discourse. Problems in clitic use in children with SLI could therefore stem from either morphosyntactic or discourse-management problems. Previous studies focused on 3rd person clitic use and identified morphosyntactic problems. We compare the elicitation of 1st and 3rd person clitics by monolingual and bilingual children with SLI to examine whether perspective-switching within the same task affects performance. Elicited 3rd person clitics were further compared with clitic use in narratives to investigate the role of a richer discourse context in clitic production. Perspective-taking was independently examined with first- and second-order Theory of Mind (ToM) tasks. Bilingual children with SLI were more accurate than monolingual children with SLI in 1st person clitics, in the use of unambiguous clitics in narratives and in second-order ToM reasoning. We conclude that bilingualism seems to enhance SLI children’s discourse use and perspective-taking strategies, which in turn improve their use of clitics in context-sensitive conditions.
  • Item · Open Access · Accepted version · Peer-reviewed
    Language external and language internal factors in the acquisition of gender: the case of Albanian-Greek and English-Greek bilingual children
    (Informa UK Limited, 2020) Kaltsa, M [0000-0002-2422-7889]; Prentza, A; Papadopoulou, D; Tsimpli, IM
    © 2017 Informa UK Limited, trading as Taylor & Francis Group. The aim of this experimental study is to examine the development of gender assignment and gender agreement in bilingual Albanian-Greek and English-Greek children, as well as their exploitation of gender cues on the noun ending in real nouns and pseudo-nouns. Four gender tasks were designed: two targeting gender assignment (determiner + noun production) and two targeting gender agreement (predicate adjective production). Performance is investigated in relation to the role of (positive) L1 transfer (Albanian vs. English), the bilinguals’ vocabulary knowledge in Greek, and input factors including monolingual/bilingual school contexts and parental education as a proxy for socioeconomic status (SES). The results show a strong interaction between the bilinguals’ performance and their Greek vocabulary development, and a negative link between gender accuracy and use of the other language.
  • Item · Open Access · Published version · Peer-reviewed
    Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches
    (BioMed Central, 2018-05-21) Crichton, Gamal [0000-0002-3036-0811]; Guo, Yufan; Pyysalo, Sampo; Korhonen, Anna-Leena
    Background: Link prediction in biomedical graphs has several important applications, including predicting Drug-Target Interactions (DTI), Protein-Protein Interaction (PPI) prediction and Literature-Based Discovery (LBD). It can be done using a classifier to output the probability of link formation between nodes. Recently, several works have used neural networks to create node representations that allow rich inputs to neural classifiers. Preliminary work on this reported promising results, but it did not use realistic settings such as time-slicing, evaluate performance with comprehensive metrics, or explain when or why neural network methods outperform baselines. We investigated how inputs from four node representation algorithms affect the performance of a neural link predictor on random- and time-sliced biomedical graphs of real-world sizes (∼6 million edges) containing information relevant to DTI, PPI and LBD. We compared the performance of the neural link predictor to that of established baselines and report performance across five metrics. Results: In random- and time-sliced experiments, when the neural network methods were able to learn good node representations and there was a negligible number of disconnected nodes, those approaches outperformed the baselines. In the smallest graph (∼15,000 edges), and in larger graphs with approximately 14% disconnected nodes, baselines such as Common Neighbours proved a justifiable choice for link prediction. At low recall levels (∼0.3) the approaches were mostly equal, but at higher recall levels across all nodes, and in average performance at individual nodes, neural network approaches were superior. Analysis showed that neural network methods performed well on links between nodes with no previous common neighbours, which are potentially the most interesting links. Additionally, while neural network methods benefit from large amounts of data, they require considerable computational resources to exploit it.
    Conclusions: Our results indicate that when there is enough data for the neural network methods to use and a negligible number of disconnected nodes, those approaches outperform the baselines. At low recall levels the approaches are mostly equal, but at higher recall levels and in average performance at individual nodes, neural network approaches are superior. This is explained by their performance on nodes without common neighbours, which indicate more unexpected and perhaps more useful links.
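    The Common Neighbours baseline and the time-sliced setting mentioned above can be illustrated with a short sketch. It is not the authors' code: it assumes an undirected graph given as hypothetical (node, node, year) tuples, trains on links up to a cutoff year, treats pairs first linked after the cutoff as the test set, and scores each candidate pair by how many neighbours its two nodes share. All function names and the toy graph are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbours) from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbours(adj, u, v):
    """Common Neighbours score: how many nodes are linked to both u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def time_slice(timed_edges, cutoff):
    """Train on edges up to `cutoff`; test on pairs that are first linked after it."""
    train = [(u, v) for u, v, t in timed_edges if t <= cutoff]
    seen = {frozenset(e) for e in train}
    test = [(u, v) for u, v, t in timed_edges
            if t > cutoff and frozenset((u, v)) not in seen]
    return train, test


def rank_candidates(adj, nodes):
    """Score every currently unlinked node pair by Common Neighbours, best first."""
    scored = []
    for u, v in combinations(sorted(nodes), 2):
        if v not in adj.get(u, set()):
            scored.append(((u, v), common_neighbours(adj, u, v)))
    # sorted() is stable (also with reverse=True), so ties keep alphabetical order
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy graph: (node, node, year the link first appeared)
timed_edges = [("A", "B", 2000), ("A", "C", 2000), ("B", "D", 2001),
               ("C", "D", 2001), ("A", "D", 2005), ("E", "F", 2005)]
train, test = time_slice(timed_edges, cutoff=2001)
adj = build_adjacency(train)
for u, v in test:
    print(u, v, common_neighbours(adj, u, v))
```

    Here the later link A–D scores 2, while E–F, whose nodes have no links at all in the training graph, scores 0.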
  • Item · Open Access · Accepted version · Peer-reviewed
    The role of segments and prosody in the identification of a speaker's dialect
    (Elsevier, 2018-05-01) Leemann, A; Kolly, MJ; Nolan, F [0000-0002-8302-5726]; Li, Y
    The objective of this study is to investigate the role of segments, rhythm, and rhythm combined with intonation in the identification of a speaker's dialect. In a between-subjects design with three conditions, we tested 62 listeners (speakers of Zurich Swiss German) in a two-alternative forced-choice dialect identification experiment. In condition one, 21 listeners were asked to identify two dialects (Valais and Bern Swiss German) in unmorphed form. In condition two, 20 different listeners had to identify the same two dialects with swapped speech rhythm, and in condition three, 21 different listeners had to identify the same dialects with swapped speech rhythm and intonation. The experiment showed that exchanging speech rhythm alone, or speech rhythm combined with intonation, had very little effect on the listeners’ dialect identification performance: listeners appear to rely primarily on segmental information in the identification process. Further results revealed that (a) superimposing the prosodic structure of one dialect (Bern Swiss German) onto another (Valais Swiss German) caused greater variability across some listeners than the other way around, and (b) identification performance varies as a function of the sentence material used, i.e. how the sentences differ in segmental and prosodic make-up. We discuss implications for forensic phonetics, language and cognition, and automatic speech recognition.
  • Item · Open Access · Accepted version · Peer-reviewed
    Caused motion across child languages: a comparison of English, German, and French
    (Cambridge University Press (CUP), 2018-11) Hickmann, Maya; Hendriks, Henriëtte [0000-0001-9420-6816]; Harr, Anne-Katharina; Bonnet, Philippe
    Previous research on motion expression indicates that typological properties influence how speakers select and express information in discourse (Slobin, 2004; Talmy, 2000). The present study further addresses this question by examining the expression of caused motion by adults and children (aged three to ten) in French (Verb-framed) vs. English and German (Satellite-framed). Participants narrated short animated cartoons that showed an agent displacing objects and that varied along several dimensions (Path, Manner). A significant increase with age was found in the number of expressed motion components in all languages, as well as an influence of Path (vertical > boundary crossing). However, at all ages, participants encoded more information in English and German than in French, where more variation and structural changes occurred with increasing age. These findings highlight both cognitive and typological factors impacting the expression of caused motion in development. Implications of our findings are sketched in the Discussion.
  • Item · Open Access · Published version · Peer-reviewed
    Severing telicity from result
    (Springer Science and Business Media LLC, 2018) Song, Chenchen [0000-0002-3543-8489]
    This paper investigates the peculiar behaviors of resultative compound verbs in Dongying Mandarin, a previously unstudied Mandarin Chinese variety. Data from multiple syntactic contexts (e.g. completive, negation, future/irrealis, potential) show that resultative complements in this variety fall into two contrasting categories: atelic and telic. Atelic resultatives have full lexical tones and require a grammaticalized telic marker (liu) in various [+TELIC] contexts, whereas telic resultatives assume the neutral tone and prohibit liu in the same contexts. The theoretical discussion begins with an evaluation of two neo-constructionist approaches, featuring event decomposition and Inner Aspect, and ends with a middle-way model combining and adapting the two. The main proposal is that in Dongying Mandarin, telicity is not encoded in the resultative complement itself, but in a Low Inner Aspect position between the action and the result verbs, which turns the state denoted by the resultative complement into a telos of the complex event. I derive the surface compound verb via the Defective Goal theory (Roberts 2010) and analyze the tonal variation as Root allomorphy.
  • Item · Open Access · Accepted version · Peer-reviewed
    Control into infinitival relatives
    (Cambridge University Press (CUP), 2019-06) Douglas, Jamie
    This article focuses on a novel English construction involving control and infinitival relatives. Examples such as "this is John's book to read" have a head noun (book) modified by an infinitival relative clause (to read) and a prenominal possessor (John's). I argue that there is a control relation between the prenominal possessor and the PRO subject of the infinitival relative. I show that this control relation bears the structural hallmarks of obligatory control whilst at the same time permitting PRO to be interpreted as arbitrary. I discuss these empirical facts in the context of a syntactic, Agree-based theory of control.
  • Item · Open Access · Published version · Peer-reviewed
    Bio-SimVerb and Bio-SimLex: wide-coverage evaluation sets of word similarity in biomedicine
    (BioMed Central, 2018-02-05) Chiu, Billy [0000-0001-6683-3249]; Pyysalo, Sampo; Vulić, Ivan; Korhonen, Anna-Leena
    Background: Word representations support a variety of Natural Language Processing (NLP) tasks. The quality of these representations is typically assessed by comparing the distances in the induced vector spaces against human similarity judgements. Whereas comprehensive evaluation resources have recently been developed for the general domain, similar resources for biomedicine currently suffer from a lack of coverage, both in the word types included and in the semantic distinctions made. Notably, verbs have been excluded, although they are essential for the interpretation of biomedical language. Further, current resources do not distinguish between semantic similarity and semantic relatedness, although this distinction has been shown to be an important predictor of the usefulness of word representations and their performance in downstream applications. Results: We present two novel comprehensive resources targeting the evaluation of word representations in biomedicine. These resources, Bio-SimVerb and Bio-SimLex, address the problems mentioned above and can be used to evaluate verb and noun representations, respectively. In our experiments, we computed Pearson’s correlation between performance on intrinsic and extrinsic tasks using twelve popular state-of-the-art representation models (e.g. word2vec models). The intrinsic–extrinsic correlations obtained with our datasets are notably higher than those obtained with previous intrinsic evaluation benchmarks such as UMNSRS and MayoSRS. In addition, when evaluating representation models on their ability to capture verb and noun semantics individually, we show considerable variation in performance across all models. Conclusion: Bio-SimVerb and Bio-SimLex enable intrinsic evaluation of word representations. This evaluation can serve as a predictor of performance on various downstream tasks in the biomedical domain.
    The results on Bio-SimVerb and Bio-SimLex using standard word representation models highlight the importance of developing dedicated evaluation resources for NLP in biomedicine for particular word classes (e.g. verbs). These are needed to identify the most accurate methods for learning class-specific representations. Bio-SimVerb and Bio-SimLex are publicly available.
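    The intrinsic evaluation described in the Background (comparing similarities in the induced vector space against human similarity judgements) can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes embeddings stored as a dict from word to vector, gold data as (word, word, rating) triples, cosine similarity as the model score and Spearman's rank correlation as the agreement measure. The vectors and ratings are invented. Pairs containing out-of-vocabulary words are skipped and counted, since coverage is one of the issues the abstract raises.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's correlation computed on ranks."""
    return pearson(ranks(xs), ranks(ys))


def evaluate(embeddings, gold_pairs):
    """Correlate model cosine similarities with human ratings.

    Pairs containing a word missing from `embeddings` are skipped; the number of
    pairs actually scored is returned alongside the correlation (coverage).
    """
    model, human = [], []
    for w1, w2, rating in gold_pairs:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            human.append(rating)
    return spearman(model, human), len(model)


# Invented 2-d vectors and ratings, for illustration only
embeddings = {"inhibit": [1.0, 0.0], "suppress": [0.9, 0.1],
              "bind": [0.0, 1.0], "activate": [-1.0, 0.2]}
gold_pairs = [("inhibit", "suppress", 9.0), ("inhibit", "bind", 3.0),
              ("inhibit", "activate", 1.0), ("bind", "phosphorylate", 5.0)]
rho, covered = evaluate(embeddings, gold_pairs)
print(f"Spearman rho = {rho:.2f} over {covered} of {len(gold_pairs)} pairs")
```

    The same `pearson` helper could be applied to a set of models' intrinsic scores and their downstream (extrinsic) scores, which is the kind of intrinsic–extrinsic correlation the abstract reports.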
  • Item · Open Access · Published version · Peer-reviewed
    Investigating the cross-lingual translatability of VerbNet-style classification
    (Springer Science and Business Media LLC, 2018) Majewska, Olga [0000-0003-4509-8817]; Vulić, Ivan; McCarthy, Diana; Huang, Yan [0000-0002-6879-0446]; Murakami, Akira; Laippala, Veronika; Korhonen, Anna
    VerbNet, the most extensive online verb lexicon currently available for English, has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for only a few languages. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different, typologically diverse languages. Our study aims to fill this gap. We develop a systematic method for translating VerbNet classes from English to other languages, which we first apply to Polish and subsequently to Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on Polish demonstrate high translatability across all the classes (96% of English member verbs successfully translated into Polish) and strong inter-annotator agreement, revealing a promising degree of overlap in the resultant classifications. The results on the other languages are equally promising. This demonstrates that VerbNet classes have strong cross-lingual potential and that the proposed method could be applied to obtain gold standards for automatic verb classification in different languages. We make our annotation guidelines and the six language-specific verb classifications available with this paper.
  • Item · Open Access · Accepted version · Peer-reviewed
    L’expression des procès spatiaux causatifs chez les apprenants francophones du chinois : pousser ou entrer ? [The expression of causative spatial events by French-speaking learners of Chinese: push or enter?]
    (John Benjamins Publishing Company) Arslangul, A; Hendriks, HPJ [0000-0001-9420-6816]; Hickmann, M; Demagny, AC
    The present study examines French adult learners’ expressions of caused motion events in Chinese as a second language, using the frameworks proposed by Talmy (1985, 1991, 2000). Productions are elicited by means of animated cartoons from 36 French learners of Chinese (24 intermediate, 12 advanced) and compared with those of 12 Chinese native speakers and 24 French native speakers. The participants’ productions are analyzed using three related measures: information focus (choice of information expressed), semantic density (amount of information expressed) and information locus (linguistic means used to express the information). Our results show (1) that intermediate learners produce responses that seem close to those of French native speakers and very different from those of Chinese native speakers: they have difficulty expressing multiple pieces of information within one grammatical sentence; and (2) that advanced learners move away from source-language patterns and show a clear progression towards the patterns of the target language, especially regarding information focus and semantic density; however, the linguistic means used by these advanced learners still differ from those of native Chinese speakers.
  • Item · Open Access · Accepted version · Peer-reviewed
    Dependency parsing of learner English
    (John Benjamins Publishing Company, 2018) Huang, Yan [0000-0002-6879-0446]; Murakami, Akira; Alexopoulou, Theodora; Korhonen, Anna
    Current syntactic annotation of large-scale learner corpora mainly relies on “standard parsers” trained on native-language data. Understanding how these parsers perform on learner data is important for downstream research and applications related to learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are threefold. First, we demonstrate that the common practice of constructing a gold standard by manually correcting the pre-annotation of a single parser can introduce bias into parser evaluation, and we propose an alternative annotation method that controls for this bias. Second, we quantify the influence of learner errors on parsing errors and identify the learner errors that affect parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications for how to select a standard parser for learner English.
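    Parser evaluation of this kind is usually reported as unlabelled and labelled attachment scores (UAS/LAS): the share of tokens assigned the correct head, and the share assigned both the correct head and the correct dependency label. The sketch below assumes those metrics, token-aligned gold and predicted analyses, and hypothetical Universal Dependencies-style labels. Grouping sentences by whether they contain a learner error is one simple, illustrative way to relate learner errors to parsing errors, not the method used in the study.

```python
from collections import defaultdict

# One analysis per sentence: a list of (head index, dependency label) per token,
# with head 0 marking the root. Gold and predicted lists must be token-aligned.


def attachment_scores(sentences):
    """Return (UAS, LAS) over all tokens in a list of (gold, predicted) analyses."""
    total = head_ok = head_and_label_ok = 0
    for gold, pred in sentences:
        if len(gold) != len(pred):
            raise ValueError("gold and predicted analyses must be token-aligned")
        for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
            total += 1
            if g_head == p_head:
                head_ok += 1
                if g_label == p_label:
                    head_and_label_ok += 1
    return head_ok / total, head_and_label_ok / total


def scores_by_group(sentences, groups):
    """Compute (UAS, LAS) separately for each group label (one label per sentence)."""
    buckets = defaultdict(list)
    for analysis, group in zip(sentences, groups):
        buckets[group].append(analysis)
    return {group: attachment_scores(items) for group, items in buckets.items()}


# Hypothetical learner sentence "She go to school" (agreement error) ...
gold_1 = [(2, "nsubj"), (0, "root"), (4, "case"), (2, "obl")]
pred_1 = [(2, "nsubj"), (0, "root"), (2, "mark"), (2, "obj")]
# ... and an error-free sentence "She likes tea", parsed correctly
gold_2 = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred_2 = list(gold_2)

corpus = [(gold_1, pred_1), (gold_2, pred_2)]
uas, las = attachment_scores(corpus)
print(f"UAS = {uas:.3f}, LAS = {las:.3f}")
print(scores_by_group(corpus, ["learner_error", "no_error"]))
```

    In the first sentence, "to" gets the wrong head and "school" the right head but the wrong label, so that sentence's LAS falls below its UAS.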
  • Item · Open Access · Published version · Peer-reviewed
    Distributed Representations of Lexical Sets and Prototypes in Causal Alternation Verbs
    Ponti, EM [0000-0002-6308-1050]; Ježek, Elisabetta; Magnini, Bernardo
    Lexical sets contain the words filling an argument slot of a verb, and are in part determined by selectional preferences. The purpose of this paper is to unravel the properties of lexical sets through distributional semantics. We investigate 1) whether lexical sets behave as prototypical categories with a centre and a periphery; 2) whether they are polymorphic, i.e. composed of subcategories; 3) whether the distance between the lexical sets of different arguments explains verb properties. In particular, our case study is the lexical sets of causative-inchoative verbs in Italian. Having studied several vector models, we find that 1) based on spatial distance from the centroid, object fillers are scattered uniformly across the category, whereas intransitive subject fillers lie on its edge; 2) a correlation exists between the number of verb senses and the number of clusters discovered automatically, especially for intransitive subjects; 3) the distance between the centroids of the object and the intransitive subject is correlated with other properties of verbs, such as their cross-lingual tendency to appear in the intransitive rather than the transitive pattern. This paper is noncommittal with respect to the hypothesis that this connection is underpinned by a semantic reason, namely the spontaneity of the event denoted by the verb.
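    The centroid-based measures in points 1) and 3) can be sketched as follows. This assumes each lexical set is a dict from filler word to vector and uses cosine distance; the Italian fillers and two-dimensional vectors are invented for illustration, whereas the paper itself compares several vector models. `distances_from_centroid` separates central from peripheral fillers, and `centroid_gap` gives the distance between the object and intransitive-subject sets of a verb.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for parallel vectors, larger for more divergent ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def distances_from_centroid(lexical_set):
    """Map each filler word to its cosine distance from its lexical set's centroid."""
    c = centroid(list(lexical_set.values()))
    return {word: cosine_distance(vec, c) for word, vec in lexical_set.items()}


def centroid_gap(set_a, set_b):
    """Cosine distance between the centroids of two lexical sets."""
    return cosine_distance(centroid(list(set_a.values())), centroid(list(set_b.values())))


# Invented 2-d vectors for fillers of a hypothetical verb such as "rompere" (break)
objects = {"vetro": [1.0, 0.0], "bicchiere": [0.8, 0.2], "silenzio": [0.2, 0.8]}
intransitive_subjects = {"vetro": [1.0, 0.0], "ghiaccio": [0.7, 0.3]}

dist = distances_from_centroid(objects)
print({w: round(d, 3) for w, d in dist.items()})
print("most peripheral object:", max(dist, key=dist.get))
print("object/subject centroid gap:", round(centroid_gap(objects, intransitive_subjects), 3))
```

    In this toy set the figurative filler "silenzio" ends up farthest from the object centroid, which is the kind of centre/periphery contrast point 1) describes.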
  • Item · Open Access · Accepted version · Peer-reviewed
    Pragmatic differentiation of negative markers in the early stages of Jespersen’s cycle in North Germanic
    (John Benjamins Publishing Company, 2018-12) Blaxter, Tam [0000-0002-1466-8306]; Willis, David [0000-0003-0755-9248]
    This article investigates the pragmatic function of new negative markers during incipient renewal of negation (Jespersen’s cycle). It outlines a typology of such markers, suggesting a pathway by which they begin as specialized for use with discourse-old propositions and later expand to inferred propositions before finally becoming possible with discourse-new propositions. This framework is applied to an overlooked case of Jespersen’s cycle in North Germanic: replacement of early Norwegian ei(gi) ‘not’ by ekki (originally “nothing”) from 1250 to 1550. We document a sharp rise in frequency of ekki around 1425, suggesting that, until then, ekki had been restricted to negating discourse-old propositions. Once this constraint was lifted, ei(gi) and ekki competed directly, resulting in rapid replacement of ei(gi) by ekki. This typologically unusual direct replacement of a negator with no intervening doubling stage can be attributed to the new negator’s origin as a negative indefinite and the lack of negative concord in early Norwegian.
  • Item · Open Access · Published version · Peer-reviewed
    Syntactic and Story Structure Complexity in the Narratives of High- and Low-Language Ability Children with Autism Spectrum Disorder
    (Frontiers Media SA, 2017) Peristeri, Eleni; Andreou, Maria; Tsimpli, Ianthi M [0000-0001-6015-7526]
    Although language impairment is commonly associated with autism spectrum disorder (ASD), the Diagnostic and Statistical Manual of Mental Disorders no longer includes language impairment as a necessary component of an ASD diagnosis (American Psychiatric Association, 2013). However, children with ASD and no comorbid intellectual disability struggle with some aspects of language whose precise nature is still unclear. Narratives have been used extensively as a tool to examine lexical and syntactic abilities, as well as pragmatic skills, in children with ASD. This study contributes to this literature by investigating the narrative skills of 30 Greek-speaking children with ASD and normal non-verbal IQ: 16 with language skills in the upper end of the normal range (ASD-HL) and 14 in the lower end of the normal range (ASD-LL). The control group consisted of 15 age-matched typically developing (TD) children. Narrative performance was measured in terms of both microstructural and macrostructural properties. Microstructural properties included lexical and syntactic measures of complexity, such as subordinate vs. coordinate clauses and types of subordinate clauses. Macrostructure was measured in terms of the diversity in the use of internal state terms (ISTs) and story structure complexity, i.e., children's ability to produce important units of information that involve the setting, characters, events, and outcomes of the story, as well as the characters' thoughts and feelings. The findings demonstrate that high language ability and syntactic complexity pattern together in ASD children's narrative performance, and that language ability compensates for autistic children's pragmatic deficit in the production of Theory of Mind-related ISTs. Nevertheless, both groups of children with ASD (high and low language ability) scored lower than the TD controls in the production of Theory of Mind-unrelated ISTs, modifier clauses and story structure complexity.
  • Item · Open Access · Accepted version · Peer-reviewed
    Ein Mischmasch aus Deutsch und Französisch [A mishmash of German and French]: Ideological tensions in young people’s discursive constructions of Luxembourgish
    (Equinox, 2019-05-02) Bellamy, JP; Horner, Kristine
    Luxembourg has often been classified as a ‘triglossic’ country in the sociolinguistic literature, because Luxembourgish is used predominantly for spoken functions and French and German for written functions. However, language use in late modern Luxembourg is characterized by increased levels of spoken French coupled with the growing presence of written Luxembourgish in the public sphere, altering certain long-standing patterns of language use. In this context, Luxembourgish is often framed as an important marker of authenticity and national identity in language ideological debates. At the same time, the ideological positioning of Luxembourgish as the national language stands in tension with varying levels of uncertainty regarding writing conventions in Luxembourgish, particularly in more formal contexts. Based on an analysis of metalinguistic comments from the focus group data, this article examines how the participants discursively construct Luxembourgish, negotiating between positioning it as the national language and describing it as not being a fully-fledged standardized language. On a broader scale, this paper contributes to language ideological research that explores the construction of national languages, as well as the relationship between the standard language ideology and the one-nation one-language ideology.
  • Item · Open Access · Published version · Peer-reviewed
    Slurs, truth-value judgements, and context sensitivity
    (Slovak Academic Press Ltd, 2018-02-08) Sileo, RB
    Cappelen and Lepore (2005) claim that the English language contains a basic and limited set of context-sensitive expressions, as only expressions within this set pass the truth-related tests that they propose for distinguishing context-sensitive from context-insensitive words. In this paper I argue that racial and ethnic slurs also pass Cappelen and Lepore’s context-sensitivity tests and that, as a result, slurs should also be seen as context-sensitive expressions in a truth-related sense.
  • Item · Open Access · Accepted version · Peer-reviewed
    Pragmatics and philosophy: In search of a paradigm
    (Walter de Gruyter GmbH, 2018) Jaszczolt, KM [0000-0001-7911-2985]
    There is no doubt that pragmatic theory and philosophy of language are mutually relevant and intrinsically connected. The main question I address in this paper is how exactly they are interconnected in terms of (i) their respective objectives, (ii) the explanans–explanandum relation, (iii) methods of enquiry, and (iv) drawing on associated disciplines. In the introductory part I attempt to bring some order into the diverse uses of such labels as philosophical logic, philosophical semantics, philosophical pragmatics, linguistic philosophy, and philosophy of linguistics, among others. In the following sections I focus on philosophical pragmatics as a branch of philosophy of language (pragmatics_PPL) and the trends and theories it gave rise to, discussing them against the background of the methodology of science, in particular paradigms and paradigm shifts as identified in natural science. In the main part of the paper I address the following questions: How is pragmatics_PPL to be delimited? How do pragmatic solutions to questions about meaning fare vis-à-vis syntactic solutions? Is there a pattern emerging? And, relatedly: What are the future prospects for pragmatics_PPL in theories of natural language meaning? I conclude with a discussion of the relation between pragmatics_PPL and functionalism, observing that contextualism has to play a central role in functionalist pragmatics at the expense of minimalism and sententialism.