Theses - Theoretical and Applied Linguistics
Recent Submissions
Open Access
The relationship between multilingualism and autism: a bidirectional approach to a complex issue
Crockford, Sarah

The relationship between multilingualism and autism remains a vastly under-explored topic. The research published so far shows a positive relationship between multilingualism and cognitive abilities in autism (Gonzalez-Barrero & Nadig, 2018; Uljarević, Katsos, Hudry, & Gibson, 2016), replicating previous research on non-autistic children and adults (Bialystok, 2011). Furthermore, when surveyed, autistic adults have responded positively about learning multiple languages (Digard, Sorace, Stanfield, & Fletcher-Watson, 2020). However, no research currently captures the choices autistic adults make when using and switching between their languages. Moreover, even though positive relationships between multilingualism and cognition have been established, how multilingual language usage relates to autism is still poorly understood. Therefore, this thesis seeks to answer the following questions:
• What multilingual language choices do autistic adults make, compared to their non-autistic counterparts?
• Does multilingual language use predict differences in autistic traits, both in autistic children and in autistic and non-autistic adults?
The first question is answered in Chapter 6 by investigating the self-reported multilingual language usage of autistic adults compared to non-autistic adults. Differences between autistic and non-autistic participants in this chapter included, for example, how often they chose to switch between languages with different interlocutors and the self-reported effort of code-switching, which autistic adults reported as more strenuous than non-autistic adults did.
Chapters 2 to 5 and 7 answer the second research question by examining the association between language usage and autistic traits in autistic children and in autistic and non-autistic adults across five separately collected datasets. These chapters conclude that a higher degree of multilingual usage is associated with lower autistic traits in autistic children and non-autistic adults, but not in autistic adults. By using multinational data from multiple, independently collected sources, this thesis provides substantive evidence of a relationship between multilingual language usage and autistic traits. It therefore makes a significant contribution to current understandings of multilingualism and autism. Finally, as outlined in Chapter 8, the findings in this thesis lay the foundation for future investigations into the relationship between multilingual language usage and autism.

Open Access
Out of Action? – Re-evaluating Methodological Challenges in Embodied Semantics Research
Heine, Julia

In contrast to disembodied theories of the mind, the Embodied Semantics Approach proposes that meaning must inevitably be grounded in action and perception. Over the last twenty years, an impressive body of evidence has been put forward in its favour. In particular, the seemingly highly reliable Action-Sentence Compatibility Effect (ACE), which suggests a congruency effect between an overt action and the action implied by a sentence, now appears to be an agreed-upon fact of psycholinguistic research. Similarly, response-time effects associated with important semantic psycholinguistic variables such as concreteness and valence have frequently been explained through the embodied grounding of semantic representations.
This dissertation presents a series of challenges to apparent certainties. The theoretical discussion emphasises the need to overcome the sharp dichotomies between embodied and disembodied perspectives on cognition, between Embodied and Associationist Accounts of semantics, and between modal and amodal representations, while the empirical section outlines shortcomings of different behavioural and neuroimaging approaches to investigating Embodied Semantics Theories, particularly in relation to the causal involvement of action, perception, and emotion systems. Even more crucially, the ACE, a key finding of the Embodied Semantics Approach, as well as valence and concreteness effects have encountered serious replication issues in recent years (e.g. Papesh, 2015; Morey et al., 2022; Kousta et al., 2009; Brysbaert et al., 2016). The experimental part of this dissertation therefore considers possible design-related explanations for the challenges observed with these behavioural paradigms. Experiment 1 investigates whether overlapping, and therefore potentially conflicting, temporal and action information embodied on the same back–front spatial axis in the ACE paradigm could contribute to previous replication problems. Experiment 2 addresses how widely the congruent condition has varied across previous studies with respect to the differential contribution of effector and spatial information to the ACE, whereas Experiments 3 and 4 examine the potential influence of individual differences in Sensory Processing Sensitivity (SPS) on the successful replication of these well-known effects. However, the evidence for the ACE, as well as for valence and concreteness effects, remains weak. Are embodiment effects such as the Action-Sentence Compatibility Effect indeed out of action?
Implications for Embodied Semantics Theories are discussed.

Open Access
Sensitivity to (sub)lexical cues as a function of cognitive profile and language abilities
Schwarz, Julia

To achieve efficient language comprehension, readers and listeners must rapidly integrate multiple linguistic cues when they encounter morphologically simple and complex words. However, there is no compelling evidence that this process is identical for everyone. The aim of this thesis is to explore systematic inter-individual differences in the integration of form-based cues (i.e. phonological and orthographic) with morphological, semantic, and lexico-semantic cues during written and spoken word processing, as evidenced by behavioural and MEG data. The first part of the thesis shows that the processing of visual words and pseudowords in English varies considerably between individuals. Study 1 reveals that individual differences in vocabulary and spelling modulate not only overall processing speed and precision but also readers’ processing strategy, as evidenced by variable sensitivity to orthographic, morphological, and lexico-semantic information during lexical decision. Building on this, Study 2 shows that processing speed and accuracy (as indicators of cognitive differences) also affect readers’ sensitivity to morphological information in visual priming. The second part of the thesis extends these findings to the auditory modality. Study 3 demonstrates for the first time that individual differences in processing speed also affect cue sensitivity in spoken word processing, as seen in listeners’ strategic use of morpho-phonological and semantic information during an auditory judgment task. Study 4 extends the findings from language-unimpaired English speakers to Spanish speakers with a common language difficulty (dyslexia).
Combined behavioural and MEG data show that phonological deficits impact the neural encoding not only of phonological information but also of lexico-semantic information. The findings from this thesis provide comprehensive evidence that the extent to which individuals rely on form-based information, compared to other linguistic cues, varies systematically as a function of cognitive profile and language abilities, with important implications for both theoretical accounts and language remediation practices.

Open Access
The Syntactic Structures of Relativisation
Douglas, James Alexander

This thesis examines the syntactic structures of restrictive relativisation in English. English exhibits a variety of relative constructions with different syntactic properties. We pursue the hypothesis that these properties are accounted for by systematic variation in the structural size of English relatives. We review the major competing analyses of relative clauses in the literature, with a particular focus on reconstruction effects, ultimately arguing in favour of the Matching Analysis (Chapter 2). The rest of the thesis is dedicated to the ‘size hypothesis’ and its application to finite, infinitival and reduced relatives in English, with cross-linguistic comparisons to Italian, Welsh, Malagasy and French where appropriate. We show that there is systematic variation in the structural size of finite and infinitival clausal relatives, i.e. variation in the degree of articulation of their C-domains, and uncover a categorial distinctness effect in the English C-domain (Chapter 3). Differences in structural size combined with anti-locality are argued to provide a novel perspective on subject-object asymmetries in relative clauses and related phenomena in English, with a close formal similarity between relativisation and topicalisation emerging as an important result (Chapter 4).
We describe and analyse a novel construction involving control into infinitival relatives which offers independent yet complementary insights into the structure of the English C-domain (Chapter 5). We argue that systematic size variation also plays a key role in accounting for the properties of reduced relatives, including their restrictions on auxiliaries and participles, the interpretation of the present/progressive participle and the subject restriction, whilst evidence from high adverbs indicates close similarities between the clausal and clause-medial left peripheries (Chapter 6). This thesis thus contributes a range of novel observations, generalisations and analyses with both empirical and theoretical implications for the nature of variation both within and across languages.

Open Access
The MOAN-MOWN and MOAN-GOOSE mergers in Lowestoft English: Perception and production
Butcher, Kerri-Ann

This dissertation presents a sociophonetic analysis of vowel phonemes descended from Middle English /ɔ:/ and /ɔu/ (MOAN-MOWN) in Lowestoft English, the East Anglian variety spoken in the UK’s easternmost town. These phonemes started to merge under a single GOAT vowel in the vast majority of English varieties during the 16th or 17th century, but this collapse is reported as progressing in some East Anglian English varieties only since the 1970s. Whether the merger is complete in Lowestoft, which straddles – both linguistically and administratively – the regional divide between varieties with a merger versus a distinction, remains uncertain. I interrogate this issue and empirically evaluate an early claim by Trudgill (1988b) that the merging of MOAN and MOWN may have been delayed in some varieties of East Anglian English by a pre-existing merger between the MOAN and GOOSE vowels. This dissertation utilises and statistically models data from both production and perception for a holistic view of merger.
Production data come from 30 interviews in which production of minimal pairs, word lists and spontaneous speech was elicited for both MOAN-MOWN and MOAN-GOOSE. Dynamic formant trajectories are analysed statistically using generalised additive mixed models to identify merger in production. Three perception tasks – AX discrimination, minimal pair judgements and vowel-continua categorisation – are used to identify perceptual merger and to detect any asymmetries characteristic of near merger. Results suggest that long-term resistance to the MOAN-MOWN merger is derived from structural incompatibility between the incoming GOAT form and the East Anglian system. A MOAN-MOWN merger is rapidly progressing amongst younger speakers but remains incomplete. The MOAN-GOOSE merger is found never to have reached completion, with full distinction in the youngest generation in contrast to full or near merger in older speakers, who variably distinguish these vowels via lowered F2 in MOAN. These findings show that near/variable merger plays a powerful role in mitigating sound change.

Embargo
Investigating the role of an indigenised variety of English in the acquisitional and sociolinguistic contexts of the Malaysian ecology
Sie, Samantha [0000-0002-9795-5043]

The realm of New Englishes offers rich avenues for exploring the interplay between language acquisition and sociolinguistic influences in linguistically diverse ecologies. Yet research in this interdisciplinary area remains lacking. This thesis addresses this gap by focusing on the Malaysian ecology. The first of the three empirical studies conducted as part of this project, i) a CASE STUDY, examines the morphosyntactic properties of an indigenised variety of English, viz. Colloquial Malaysian English (CME). The data came from naturalistic conversations produced by two pairs of adult Malaysians with different L1 backgrounds (i.e. Malay and Chinese).
While many of the non-standard features supplied could be explained by substrate influence, there were also features resembling general second language (L2) behaviours and creative innovation. The MAIN STUDY adopts a concurrent embedded design comprising ii) an ACQUISITIONAL STUDY and iii) a SOCIOLINGUISTIC STUDY. The ACQUISITIONAL STUDY investigates the roles of the first language (L1) and CME in the ultimate acquisition of finiteness in Standard English (StE). The adult participants recruited for this study were 145 Malaysians and 30 British controls. Malaysians who acquired English as (one of) their L1(s) (L1-MalE(+)) were predicted to have less difficulty than their L1-Malay and L1-Chinese peers and to perform more similarly to the British English (BritE) monolinguals. This is because, despite the prevalence of CME in the local environment, L1-MalE(+) learners would merely have to reset the optional features of finiteness in CME to obligatory, as required in StE. L1-Malay and L1-Chinese learners, by contrast, would face the additional learnability burden of acquiring finiteness as a new functional feature, given its absence in their L1s. Findings from a grammaticality judgement task and a narrative task revealed that although the Malaysian cohort behaved statistically differently from the L1-BritE controls, the L1-MalE(+) groups outperformed the L1-Chinese and L1-Malay groups across the board. That said, the L1-Malay group fared considerably better than its L1-Chinese counterpart and was roughly on par with its L1-MalE(+) peers. These findings indicate clear L1 effects modulated by typological proximity. Meanwhile, the SOCIOLINGUISTIC STUDY explores Malaysians’ attitudes towards CME and StE. The same participants from the acquisitional study completed a sociolinguistic survey administered for this study.
Findings revealed that the participants were non-discriminatory towards CME and StE, and that they were aware of when to use these varieties across different social settings. Altogether, this thesis demonstrates the facilitative role of CME in the acquisition of StE, and concurrently vindicates the functional importance of CME and StE as legitimate varieties in the Malaysian milieu.

Embargo
Translation Policies of Minoritised Languages through Organised Activism: a Comparative Study of Catalan and Welsh
Moreno-Rivero, Javier

This thesis investigates the role of states and activist organisations in the development of translation policies aimed at promoting minoritised languages. Activists have been crucial in securing language rights in Catalonia and Wales over the last six decades, and their campaigns and initiatives have resulted in supportive legislation. While recent scholarship has recognised translation as a tool for social cohesion, little attention has been paid to the extent to which translation rights can contribute to language revitalisation and normalisation processes. This research identifies intersectional areas where translation is embedded in these processes by conducting a contrastive examination of the scope of existing policies and the priorities of language activists. Using a comparative approach, it investigates the work of five organisations that advocate for the rights of Catalan and Welsh speakers. This thesis includes a cross-sectorial analysis of translation management, practices, and ideologies in both Wales and Catalonia, combining policy analysis of international, supranational, and national legislation with an ethnographic study involving observations and individual interviews. This study explores the macro- and meso-level support for minoritised languages in each jurisdiction, as well as the micro-level policies developed by activists, revealing a misalignment between existing legislation and practices.
While translation has traditionally been approached from a communicative perspective, I argue that, in the context of minoritised languages, translation policies must encompass sociolinguistic principles of language equality, rights, and revitalisation. Therefore, as well as arguing for the importance of studying the societal implications of translation in order to formulate sociolinguistically informed policies, my work makes two significant contributions. First, it contains the first cross-sectional analysis of minoritised language translation policies, arguing for an inclusive principle of translation equality. Second, it asserts that translation policy, frequently overlooked in language policy studies, is a dynamic concept with which to investigate tensions in diglossic societies and should be considered when researching the democratic participation of linguistic minorities. In this way, the thesis argues for the relevance of considering activists’ translation ideologies in shaping inclusive policies that address the needs of a specific linguistic community. Overall, this research argues that involving language advocates is critical in the formulation of translation policies supporting linguistic minorities.

Open Access
Putting PPs into order - Understanding prepositionality in the predominantly postpositional language Lule Sami
Ajer, Hanna Danbolt

This thesis investigates PP-internal word-order variation in an endangered Uralic language, Lule Sami, which is indigenous to parts of Northern Norway and Sweden. The study focuses on the factors underlying prepositional occurrences in this predominantly postpositional language. It is mainly based on spoken data containing more than 4,000 occurrences of adpositions, elicited from eleven native speakers of Lule Sami in Divtasvuodna/Tysfjord, the Lule Sami heartland in Norway. The study aims to contribute to our knowledge of Lule Sami, which is severely understudied.
I argue that prepositionality in this language can mark that (part of) the PP is either contrastive or belongs to a system of conventionalised alternatives, which I term a PREDEFINED ALTERNATIVE SET. The findings from Lule Sami may also be relevant to the study of related languages, as there appear to be many similarities with other Sami languages and with Finnic languages. Furthermore, by identifying semantic and information-structural factors that may influence word order in the PP, and by seeking to account for the variation formally, the study may add more generally to our understanding of AMBIPOSITIONS – adpositions with variable word order (Libert 2006). Lastly, the study highlights the importance of studying variation in endangered languages on its own terms: the systematic nature of the PP-internal word-order variation in Lule Sami and the similarities with related languages suggest that the prepositional usages cannot simply be attributed to influence from prepositional contact languages. Chapter 1 provides background information about the Lule Sami language and its history, after which Chapter 2 outlines the theoretical background of this study. This covers Lule Sami grammar, adpositions in related languages, potentially relevant syntactic, semantic, and information-structural factors, and language contact and change. My methodology is laid out in Chapter 3. The data and findings from the study are presented in Chapter 4, drawing on the distinctions introduced in Chapter 2. Chapter 5 puts the findings about PP-internal word-order variation in Lule Sami into a wider context, comparing them to what we know about prepositional usages in other Sami languages and in Finnic languages, and examining how they may be formally accounted for within Minimalist syntax.
I propose that prepositional order in Lule Sami arises through fronting to a Grounding projection in the style of Wiltschko (2021), and that this movement is triggered by definiteness and/or salience features. I argue that the fronting’s original function was to reinforce the relation expressed by the PP, yielding contrastive readings, but that it has also acquired an extended function of marking membership in a predefined alternative set. Lastly, I discuss how my findings about prepositionality in Lule Sami relate to theories of language contact and change, drawing on the parallels with related languages and the predictions made by the chosen formal approach. Chapter 6 concludes the thesis.

Controlled Access
On the Evaluation and Modelling of Context-sensitive Lexical Semantics
Liu, Qianchu

A word can change its meaning in different contexts. Evaluating and modelling such contextual effects on lexical meaning is pivotal to natural language understanding. This thesis sets out to answer a two-fold research question: (1) how can we design a reliable evaluation framework that accurately reflects the challenges in contextual lexical semantics? And (2) how can we improve contextual word representations in a data-efficient manner? To address the first question, the thesis begins with a systematic analysis of existing benchmark datasets and stresses the importance of assessing the complex interaction between words and their contexts. I propose that evaluating crosslingual correspondence can effectively assess this interaction. Specifically, I introduce two datasets, BTSR (Bilingual Token-level Sense Retrieval) and AM2ICO (Adversarial and Multilingual Meaning in Context), to evaluate crosslingual word-in-context correspondences that require accurate crosslingual modelling of both the target words and their contexts. BTSR and AM2ICO complement each other in task formulation and language/word coverage.
Finally, I show that evaluating and modelling crosslingual word-in-context representations has direct benefits for downstream applications. In a case study, I apply the BTSR task formulation to improve machine translation for rare senses. Improving current state-of-the-art contextual models typically involves labelled data, which is hard to obtain, especially for low-resource languages. The second research question of this thesis therefore aims to elicit better word-in-context representations from state-of-the-art contextual models without resorting to labelled data. To this end, I designed two novel unsupervised methods (MIRRORWIC and STATICTRANSFORM) that improve either within-word or inter-word contextualisation of pretrained contextual models, both monolingually and crosslingually. In sum, the thesis contributes to the field of computational lexical semantics by providing challenging and accurate evaluation frameworks and efficient modelling techniques that avoid the need for labelled data.

Controlled Access
Cross Target Generalization for Stance Detection
Conforti, Costanza

Stance detection is a popular NLP task that consists of automatically inferring the opinion expressed in a text with respect to a given target. Cross-target generalization is a known problem in stance detection: systems tend to perform poorly when exposed to targets unseen during training. Given that data annotation is expensive and time-consuming, finding ways to leverage other sources of knowledge to improve cross-target stance detection can offer great benefits. In this thesis, I propose ways to improve the robustness of cross-target stance detection in three settings. First, I explore weak supervision through synthetically annotated samples as a means of providing knowledge about unseen targets to a stance detection system. To this end, I design a simple and inexpensive framework and show experimentally that integrating synthetic data helps cross-target generalization.
Second, I investigate cross-genre stance detection, where knowledge from annotated tweets is leveraged to improve news stance detection on targets unseen during training. Because tweets and news articles have distinctive stylistic characteristics, transferring knowledge between samples from different genres is non-trivial. To allow the model to capture the useful stance-specific features, I propose an adversarial treatment of the task. Third, I study multi-modality as a means of enhancing cross-target generalization. Specifically, I design a robust multi-task BERT-based architecture that combines textual input with high-frequency intra-day time series of stock market prices. I show experimentally, and through detailed result analysis, that the proposed system benefits from financial information and achieves state-of-the-art results. This demonstrates that combining multiple input signals is effective for cross-target stance detection, and it opens interesting directions for future research. In addition, I created the first multi-task, multi-genre and multi-modal resource for stance detection. It provides two aligned textual signals, composed of carefully selected, expert-annotated tweets and news articles, as well as an aligned financial signal in the form of fine-grained intra-day stock price variations. This large, integrated resource provides a comprehensive framework for robust training and fair evaluation of the above-mentioned algorithms. I released the entire resource for future research.

Open Access
Non-native perception, production, and lexical processing of tone
Lameris, Tim [0000-0002-1365-3022]

In this dissertation, I investigate how and why adults differ in the ease with which they learn tone in a non-native language.
I examine the extent to which individual variability in tone learning facility depends on factors attributable to a learner’s first language, namely the function of pitch for lexical distinctions (‘L1 tonal status’) and the shapes of native tonal and intonational contrasts (‘tone type’), as well as on extralinguistic factors, namely musical experience, working memory, and pitch perception aptitude. In doing so, I aim to provide a novel and integrated account of the multiplicity and diversity of factors that influence non-native tone learning facility. The core of this dissertation consists of four empirical data chapters in the form of journal manuscripts, each of which examines non-native tone learning through a different lens. Chapter 1 provides a general introduction. Chapter 2 reports a lab-based study in which 41 Mandarin and English speakers took part in a tone categorization and word identification task to investigate individual variability in pre-lexical and lexical tone perception. Chapter 3 reports two further lab-based studies of pre-lexical and lexical tone processing in the spoken modality, focusing on individual variability in production. Chapter 4 provides a comparative analysis of the perception and production tasks, discussing differences and similarities between performance in the listening and speaking modalities. Chapter 5 reports a web-based study involving 114 speakers of typologically different languages (Dutch, Swedish, Japanese, and Thai), which reassesses the degree to which L1-specific and extralinguistic factors determine tone perception and lexical processing. Chapter 6 provides a general discussion and conclusions. The findings from these empirical studies show that individuals differ greatly in the ease with which they learn non-native tones, particularly at the lexical level of tone processing.
Both L1-specific and extralinguistic factors explain why some individuals learn tones more easily than others, but these factors interact dynamically to determine tone learning facility. An ‘L1-Modulated Domain-General Account’ is proposed to formally describe the empirical findings: individual variability in tone learning facility is best captured by extralinguistic factors, but the relative effect of these factors may be modulated by a learner’s language background.

Open Access
The Case for Contextual Linguistic Diversity: Language Profiling, Multilingual Identity, and High-Level Listening Comprehension Ability in South African University Students
Wigdorowitz, Mandy [0000-0003-1023-9368]

Global multilingualism is undoubtedly increasing, yet some contexts are linguistically more diverse than others, purely as a result of the nature of linguistically diverse communities and, by proxy, the passive linguistic exposure that individuals may experience through immersion in the contextual milieu. How the sociolinguistic context of language use contributes to an individual's linguistic repertoire has yet to be fully conceptualised or quantitatively investigated within the language sciences. To address this gap, I explore this overlooked contextual linguistic feature through the development, validation, and application of a holistic language profiling measure, the Contextual Linguistic Profile Questionnaire (CLiP-Q). Three research studies are presented. First, I develop and validate a psychometric tool, the Contextual and Individual Linguistic Diversity Questionnaire (CILD-Q, part of the larger CLiP-Q), which measures multilingual exposure and endorsement in particular linguistic contexts.
An exploratory factor analysis of data from 353 participants (62.9% South African, 37.1% UK) showed that a three-factor solution best describes the structure of the CILD-Q: Multilingualism in Context (contextual use and societal practice of multiple languages within a community), Multilingualism in Practice (direct and indirect linguistic exchanges and conversational interaction), and Linguistic Diversity Promotion (societal and governmental endorsement of linguistic variation). The CILD-Q correlates positively with a metric of the social diversity of language use (language entropy), further evidence of its convergent validity, and item scores for the three factors show sufficient reliability (all α > .80). Second, I apply the CILD-Q to evaluate whether people who live in a multilingual context (South Africa) report greater contextual linguistic diversity than those from a predominantly unilingual context (England), and to evaluate the roles of language entropy, lingualism status (monolingualism, bilingualism, multilingualism), socio-economic status (SES), and code-switching practice in this effect. Results demonstrate that contextual linguistic diversity differs between the two nations, with South Africans scoring higher. The promotion of multilingualism depends on SES only in the England group, where participants with higher SES score higher on Linguistic Diversity Promotion. Lingualism status is not comparable across contexts when measured categorically, and code-switching accounts for linguistic features of the South African participants. Finally, a positive relationship emerged between language entropy and contextual linguistic diversity, suggesting that these measures complement each other in capturing the social influence of language experience. The third study applies the CLiP-Q to contextualise and appropriately categorise the language experience of South Africans in tertiary education, in order to investigate high-level text comprehension ability.
The ability to draw inferences from auditory and written input is crucial for comprehension and successful educational outcomes, and is especially relevant in linguistically diverse contexts where learners have heterogeneous language backgrounds but are educated in the predominant language of the country. South Africa is such a case: tertiary education is almost exclusively delivered in English, although English is not the first language (L1) of the majority of citizens. Accordingly, the third study assesses the role of language experience (L1, multilingualism, and contextual linguistic diversity) and inhibitory control in high-level listening comprehension among multilingual South African undergraduates with advanced English proficiency. Results indicate that L1-English participants were more efficient and accurate at monitoring and revising their listening comprehension, whereas participants with higher contextual linguistic diversity were less efficient at monitoring and less accurate at revising the comprehension content. Furthermore, individual differences in inhibitory control were associated with differences in revision: participants with lower inhibitory control took longer to update the content and replace their initial interpretation with a new one. Participants’ L1 appears to outweigh their advanced English proficiency in highly complex listening comprehension involving revision.
In this dissertation, I demonstrate that the CLiP-Q is a holistic instrument for measuring and quantifying contextual linguistic diversity, which in turn is relevant to a range of higher-order linguistic skills essential for academic development.
Item Open Access Phonological acquisition in a multidialectal and multicultural context: The case of bilingual preschoolers in Singapore
Sim, Jasper [0000-0002-0245-4087]
This thesis seeks to better understand early phonological acquisition in a context in which linguistic input can be especially varied and variable. It focuses on preschoolers’ acquisition of Singapore English, a variety that emerged from long-term language contact, within a multilingual, multicultural setting that is linguistically and sociolinguistically complex. The four studies herein explore the variation in the English child-directed speech (CDS) of Singaporean caregivers and its possible connections with, or effects on, the outcomes of phonological acquisition in their preschool children. The introductory chapter (Chapter 1) describes the sociolinguistic setting and reviews key factors that contribute to variable development and outcomes in early bilingual phonological acquisition, with a focus on input quality (i.e. specific phonetic or phonological properties of the input). Chapter 2 details the caregiver-child speech corpus that was developed for the production studies in this thesis. The four studies centre on two phonological features of Singapore English, namely the (non)release of coda oral stops and L-allophony. The first study (Chapter 3) reveals inter-adult variation among ethnically Chinese caregivers in the release of English coda stops, which is shown to be reflected in their children’s production. The other three studies focus on the realisations of coda /l/ in Singapore English, namely vocalised-l, dark-l and clear-l.
Through a matched-guise test, Chapter 4 demonstrates that these three variants carry diverse socio-indexical meanings, and that their interpretation and evaluation depend on, and are shaped by, the hearer’s individual experiences of the social world. Chapter 5 explores whether, how and why Malay caregivers vary their English coda /l/ in their CDS, revealing socially conditioned variation between maternal and paternal CDS, and within maternal CDS. Finally, Chapter 6 examines the bilingual development of English and Malay laterals in Malay children, in order to understand how they negotiate the multiple allophones of /l/ in their caregivers’ input, and between the competing input models of their caregivers and significant others. Chapter 7 synthesises the findings and discusses six key implications of the four studies: (1) inter-speaker variation can be difficult to predict or model, (2) children acquire the differential speech properties in the input, (3) variation and/or inconsistencies in the input can affect the building of contrastive categories, (4) variation in the input can be complex, (5) there are multiple moderators of language outcomes, and (6) multiculturalism as a social force can be a moderator. The chapter then shows how usage-based accounts of language acquisition, specifically the exemplar model, may help account for the variable outcomes observed in this thesis and in bilingual acquisition more generally. It concludes with some limitations and avenues for future work.
Item Open Access The Potential Influence of Crosslinguistic Similarity on Lexical Transfer: Examining Vocabulary Use in L2 English
Shatz, Itamar [0000-0001-8916-9010]
Learners’ native language (L1) influences their knowledge and use of second language (L2) vocabulary, a phenomenon known as lexical transfer.
Past research shows that learners’ L1 influences their L2 word choices, and that lexical similarity (which relates to cognacy) between L1 words and their L2 counterparts facilitates the processing of the L2 words, particularly during the early stages of L2 acquisition, and makes speakers more likely to use those L2 words in spontaneous production. Extending this research, the present study investigates whether crosslinguistic similarity influences L2 vocabulary use in a task-based English-as-a-foreign-language educational setting. Specifically, it investigates whether greater similarity between languages as a whole increases L2 lexical diversity, and whether greater similarity between L1 words and their L2 counterparts increases the use of those L2 words. It does so using two matching learner samples containing 8,500 and 6,390 English texts, respectively, written in response to 95 and 71 tasks by speakers of 9 typologically diverse L1s, in the A1–B2 CEFR range of L2 proficiency. Surprisingly, lexical similarity between the L1 and the L2 as a whole did not influence L2 lexical diversity, regardless of learners’ L2 proficiency. Likewise, lexical similarity between corresponding L1-L2 words did not influence the use of the L2 words, again regardless of L2 proficiency. Conversely, there were strong task effects on both L2 lexical diversity and L2 word choice. These findings show that the facilitative effect of crosslinguistic lexical similarity (especially the cognate facilitation effect) is constrained, and suggest that communicative needs and other task effects can override positive lexical transfer. This highlights the role of situational factors in crosslinguistic influence, and raises questions about when and how these and similar factors can override language transfer, for example for different types of transfer (e.g., positive vs. negative, or lexical vs. syntactic).
In addition, this research offers insights into related topics, such as the developmental patterns of L2 lexical diversity, accounting for task effects in language assessment, measuring crosslinguistic distance, and using online platforms to develop language corpora.
Item Open Access Neutral Tone in Mandarin: Representation and Interaction with Utterance-level Prosody
Zhang, Yixin
In Standard Mandarin, some syllables do not carry any of the four citation tones (CTs; T1: High-level tone, T2: Mid-rising tone, T3: Low-convex tone and T4: High-falling tone); they are said to have a neutral tone (NT). These syllables are usually shorter and lighter, and are prosodically grouped with the preceding CT-bearing syllable. These characteristics have led to a prevailing view that NT has no underlying phonological specification. However, research has focused more on how the surface pitch variations of NT are realized than on its underlying representation. In contrast, morphological, sociolinguistic and diachronic work has suggested that NT may not be a homogeneous entity. In this thesis, I provide acoustic and psycholinguistic evidence that there are two types of NT, Intrinsic NT and Derived NT. Intrinsic NT refers to morphemes that were lexicalized as tone-deleted, unstressed syllables even before the formation of the four CTs of modern Mandarin. Derived NT refers to morphemes derived from the CTs via stress-related tone deletion. In Part A, the phonological representation of Intrinsic and Derived NT is explored through two production and two processing experiments. The results show that Intrinsic NT is likely to have an underspecified tonal target, while Derived NTs are underlyingly CTs. In addition, both subtypes of NT are metrically light, unlike heavy CTs. Part B explores the interaction between NTs and utterance-level prosody in production and perception experiments.
NT-bearing syllables show lengthening patterns under focus similar to those of CT-bearing syllables, in contrast to the realization of unstressed syllables in English. In perception, the identification of intonation (Statement vs. Question) on Intrinsic NT was similar to that on Derived NT. Compared with CTs, NTs elicit less bias towards Question than T4 and higher accuracy than T2, which may result from their simpler surface representations.
Item Open Access Injecting Inductive Biases into Distributed Representations of Text (2021-11-12)
Prokhorov, Victor [0000-0002-5843-8756]
Distributed real-valued vector representations of text (a.k.a. embeddings), learned by neural networks, encode various kinds of (linguistic) knowledge. To encode this knowledge into embeddings, the common approach is to train a large neural network on large corpora. There is, however, growing concern about the sustainability and rationality of pursuing this approach further. We depart from this mainstream trend and instead use inductive biases to incorporate the desired properties into embeddings. First, we use Knowledge Graphs (KGs) as a data-based inductive bias to derive the semantic representation of words and sentences. The explicit semantics encoded in the structure of a KG allows us to acquire semantic representations without employing a large amount of text. We use graph embedding techniques to learn the semantic representation of words and a sequence-to-sequence model to learn the semantic representation of sentences. We demonstrate the efficacy of this inductive bias for learning embeddings of rare words, and the ability of sentence embeddings to encode topological dependencies that exist between entities of a KG. Then, we explore the amount of information and sparsity as two key (data-agnostic) inductive biases to regulate the utilisation of the representation space. We impose these properties with Variational Autoencoders (VAEs).
First, we regulate the amount of information encoded in a sentence embedding via constrained optimisation of a VAE objective function. We show that increasing the amount of information allows sentences to be better discriminated. Next, to impose distributed sparsity, we design a state-of-the-art Hierarchical Sparse VAE with a flexible posterior that captures the statistical characteristics of text effectively. While sparsity generally has desirable computational and statistical representational properties, it is known to compromise task performance. We show that with distributed sparsity, task performance can be maintained or even improved. The findings of the thesis advocate further development of inductive biases that could mitigate the dependence of representation learning quality on large data and model sizes.
Item Open Access Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Majewska, Olga [0000-0003-4509-8817]
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs. To this end, I propose a novel paradigm for the efficient acquisition of lexical knowledge that leverages native speakers’ intuitions about verb meaning to support the development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering.
This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.
Item Open Access The acquisition of finiteness in English by child second language learners in instructed contexts: age of onset and L1 effects (2021-07-24)
Ntalli, Athina
This thesis examines the acquisition of finiteness in English by child L2 learners, investigating the impact of age of onset and the role of the learners’ L1 on their L2 acquisition.
Following Meisel’s hypothesis that children older than 4 resemble adult L2 learners in the domain of inflectional morphology, I investigated how two groups of children with different L1s, all older than 4, learn the features of tense and agreement, and whether accuracy declines with a later age of onset. Participants were 73 Chinese and 74 Russian learners who were aged either 9 or 12 at the time of testing and had begun learning English at age 4 or 7, respectively. All were EFL learners recruited from EF (English First) private afternoon English language schools in Shanghai and Moscow, where they attended classes for a few hours a week. To assess their performance, I employed two types of tasks: two elicited production tasks whose prompts involved 3SG-agreement and past tense contexts (TEGI), and a freer elicitation task prompting stories based on a sequence of pictures (MAIN). Data analysis revealed low accuracy, high numbers of omissions, asymmetries in the acquisition of morphemes, overgeneralisation of the progressive in 3SG-habitual contexts, and use of the periphrastic structure ‘is + verb(x)’. These results show that child L2 acquisition resembles adult L2 (aL2) acquisition, supporting Meisel’s hypothesis. The empirical findings are interpreted in light of two opposing views that account for optionality in verb inflection in L2 acquisition, the Full Access to UG and the Representational Deficit approaches; it is argued that the data are more consistent with a representational deficit account. Older children consistently outperformed younger ones: as features are inaccessible, older learners compensate by relying on their greater cognitive abilities, learning strategies and metalinguistic skills, while younger children are mostly implicit learners who, like immersed children, make greater use of the periphrastic structure.
The periphrastic structure appears to be a stage in the L2 development of English verb morphology that signals the emergence of finiteness as a category, triggered semantically through the interpretable features of be. This is a first step toward the activation of uninterpretable features. Finally, L1 influence was more pronounced in the older children, a finding that is again more consistent with a representational deficit account.
Item Open Access The production and perception of domain-initial strengthening in Seoul, Busan, and Ulsan Korean
Yoo, Kayeon [0000-0002-7642-987X]
Korean exhibits one of the most consistent examples of the cross-linguistic phenomenon of domain-initial strengthening (hereafter DIS; T. Cho & Keating, 2001; Keating, Cho, Fougeron, & Hsu, 2004). DIS is defined as the temporal and/or spatial enhancement of segmental articulation in the initial position of prosodic domains. Broadly, this dissertation serves as a detailed case study of the production patterns and the perceptual benefits of this phenomenon. Recent findings of denasalisation and devoicing of initial nasals in Korean (Young Shin Kim, 2011; Yoo, 2015a) suggest a striking parallelism between the lenis stops /p, t, k/ and the nasal consonants /m, n/ in their patterns of DIS. Nevertheless, we currently lack an account that captures this parallelism. In addition, there is disagreement over the categorical nature of lenis stop voicing (S.-A. Jun, 1993; Docherty, 1995) and denasalisation (Yoshida, 2008; Young Shin Kim, 2011). Despite the obvious similarities between the arguably discrete processes of lenis stop voicing and denasalisation and the kind of gradient effects widely reported for DIS, there has been no explicit investigation of the links among them.
I therefore examined the hypothesis that DIS, operating in the phonetic component, has given rise to the categorical rules of lenis stop voicing and denasalisation in the phrase-level phonology through rule scattering, as predicted by the theory of the life cycle of phonological processes (Bermúdez-Otero & Trousdale, 2012; Turton, 2014). Recordings were collected in Seoul, Busan, and Ulsan, and various auditory and acoustic analyses were conducted to examine the phonetic variation of the relevant stops. The study adopted a three-city design because these varieties were expected to be at different stages in the life cycle, particularly with regard to the stabilisation of denasalisation. In the second part of this dissertation, I conducted a perception experiment to investigate whether listeners are able to use DIS patterns as a cue to a prosodic boundary. The results showed that Seoul had the most advanced patterns in the stabilisation of DIS. As predicted by rule scattering, speakers who showed evidence of categorical lenis stop voicing and/or denasalisation also showed an overlaid effect of a gradient phonetic process. The perception study strongly supported the hypothesis that listeners exploit DIS cues to detect the beginning of a prosodic domain. Based on these findings, this dissertation offers a unified account of lenis stop voicing, denasalisation, and DIS within a single framework, providing insights into the nature of DIS as well as its functional role in prosodic parsing.
Item Open Access Inductive Bias and Modular Design for Sample-Efficient Neural Language Learning
Ponti, Edoardo [0000-0002-6308-1050]
Most of the world's languages suffer from a paucity of annotated data. This curbs the effectiveness of supervised learning, the most widespread approach to modelling language.
An alternative paradigm could instead take inspiration from children's propensity to acquire language from limited stimuli, in order to enable machines to learn any new language from a few examples. The abstract mechanisms underpinning this ability include 1) a set of innate inductive biases and 2) the deep entrenchment of language in other perceptual and cognitive faculties, combined with the ability to transfer and recombine knowledge across these domains. The main contribution of my thesis is to give concrete form to both of these intuitions. Firstly, I argue that endowing a neural network with the correct inductive biases is equivalent to constructing a prior distribution over its weights and its architecture (including connectivity patterns and non-linear activations). This prior is inferred by "reverse-engineering" a representative set of observed languages and harnessing typological features documented by linguists. Thus, I provide a unified framework for cross-lingual transfer and architecture search by recasting them as hierarchical Bayesian neural models. Secondly, the skills relevant to different language varieties and different tasks in natural language processing are deeply intertwined. Hence, the neural weights modelling the data for each of their combinations can be imagined as lying in a structured space. I introduce a Bayesian generative model of this space, which is factorised into latent variables representing each language and each task. By virtue of this modular design, predictions can generalise to unseen combinations by extrapolating from the data of observed combinations. The proposed models are empirically validated on a spectrum of language-related tasks (character-level language modelling, part-of-speech tagging, named entity recognition, and common-sense reasoning) and a typologically diverse sample of about a hundred languages.
Compared with a series of competitive baselines, they achieve better performance on new languages in zero-shot and few-shot learning settings. In general, they hold promise for extending state-of-the-art language technology to under-resourced languages through sample efficiency and robustness to cross-lingual variation.