Understanding language and attention: brain-based model and neurophysiological experiments Massimiliano Garagnani Wolfson College, Cambridge, UK This dissertation is submitted for the degree of Doctor of Philosophy, University of Cambridge Submitted: December 2008 i Preface The work described within this thesis was conducted at the Medical Research Council, Cognition and Brain Sciences Unit (MRC-CBSU), Cambridge, UK during the period 2005–2008 under the supervision of Prof. Friedemann Pulvermüller (MRC- CBSU) and Dr. Thomas Wennekers, Centre for Theoretical and Computational Neuroscience, University of Plymouth, UK. This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration, except where specifically indicated in the text and Acknowledgements. Excerpts from Chapters 2, 3, 4 and 5 have been published or submitted in the following papers: • Garagnani, M., Shtyrov, Y., & Pulvermüller, F. (2009) Effects of attention on what is known and what is not: MEG evidence for discrete memory circuits. Frontiers in Human Neuroscience 3:10. doi: 10.3389/neuro.09.010.2009 • Garagnani, M., Wennekers, T. & Pulvermüller, F. (2008) A neuroanatomically- grounded Hebbian learning model of attention-language interactions in the human brain. European J. of Neuroscience 27(2):492-513. • Garagnani, M., Wennekers, T. & Pulvermüller, F. (2007) A neuronal model of the language cortex. Neurocomputing, 70(10-12):1914-19. • Wennekers, T., Garagnani, M. & Pulvermüller, F. (2006) Language models based on Hebbian cell assemblies. Journal of Physiology – Paris, 100(1-3):16-30. The material contained in this manuscript has not previously been submitted, in whole or in part, for any other degree, diploma or qualification at another institution. This dissertation does not exceed the limit of length prescribed by the Biology Degree Committee. The copyright of this thesis rests with the author. No quotation from it should be published without his prior written consent and publication of information derived from it should acknowledge the original source. ii Acknowledgements First and foremost I would like to express my deepest gratitude to my supervisors, Prof. Friedemann Pulvermüller and Dr. Thomas Wennekers, for their invaluable support, help and guidance throughout this project; without them, this work simply would have not been accomplished. They have been an incredible source of insight, experience and knowledge, always willing to indefatigably discuss ideas, provide constructive criticisms and feedback (on time!), and show appreciation and respect for my work: this aspect, on its own, made the entire (as many people said, “crazy”) enterprise of doing a second PhD worthwhile. At the beginning of this project, Thomas installed on the local Linux cluster a copy of the Felix simulation tool (http://www.pion.ac.uk/~thomas/felix.htm, of which he is the author and inventor). This software was vital to the achievement of this work, as the computational model that I developed builds upon, extends and makes heavy use of the routines available from the Felix library. Secondly, I wish to thank my family, friends and colleagues who helped me, in different ways, during these three years; in particular, my parents, Luciana & Luigi, and my grandmother Milena (or Melina...), for their constant encouragement and support. Finally, I would like to express my deepest gratitude to my partner Sandra for having walked (or, rather, danced) into my life and having filled it with joy. This research was made possible by a full-time PhD studentship generously offered by the UK Medical Research Council, Cognition and Brain Sciences Unit, and supported by the Cambridge European Trust. Understanding language and attention: brain-based model and neurophysiological experiments Massimiliano Garagnani Summary This work concerns the investigation of the neuronal mechanisms at the basis of language acquisition and processing, and the complex interactions of language and attention processes in the human brain. In particular, this research was motivated by two sets of existing neurophysiological data which cannot be reconciled on the basis of current psycholinguistic accounts: on the one hand, the N400, a robust index of lexico- semantic processing which emerges at around 400ms after stimulus onset in attention demanding tasks and is larger for senseless materials (meaningless pseudowords) than for matched meaningful stimuli (words); on the other, the more recent results on the Mismatch Negativity (MMN, latency 100-250ms), an early automatic brain response elicited under distraction which is larger to words than to pseudowords. We asked what the mechanisms underlying these differential neurophysiological responses may be, and whether attention and language processes could interact so as to produce the observed brain responses, having opposite magnitude and different latencies. We also asked questions about the functional nature and anatomical characteristics of the cortical representation of linguistic elements. These questions were addressed by combining neurocomputational techniques and neuroimaging (magneto-encephalography, MEG) experimental methods. Firstly, a neurobiologically realistic neural-network model composed of neuron-like elements (graded response units) was implemented, which closely replicates the neuroanatomical and connectivity features of the main areas of the left perisylvian cortex involved in spoken language processing (i.e., the areas controlling speech output – left inferior- prefrontal cortex, including Broca’s area – and the main sensory input – auditory – areas, located in the left superior-temporal lobe, including Wernicke’s area). Secondly, the model was used to simulate early word acquisition processes by means of a Hebbian correlation learning rule (which reflects known synaptic plasticity mechanisms of the neocortex). iv The network was “taught” to associate pairs of auditory and articulatory activation patterns, simulating activity due to perception and production of the same speech sound: as a result, neuronal word representations distributed over the different cortical areas of the model emerged. Thirdly, the network was stimulated, in its “auditory cortex”, with either one of the words it had learned, or new, unfamiliar pseudoword patterns, while the availability of attentional resources was modulated by changing the level of non-specific, global cortical inhibition. In this way, the model was able to replicate both the MMN and N400 brain responses by means of a single set of neuroscientifically grounded principles, providing the first mechanistic account, at the cortical-circuit level, for these data. Finally, in order to verify the neurophysiological validity of the model, its crucial predictions were tested in a novel MEG experiment investigating how attention processes modulate event-related brain responses to speech stimuli. Neurophysiological responses to the same words and pseudowords were recorded while the same subjects were asked to attend to the spoken input or ignore it. The experimental results confirmed the model’s predictions; in particular, profound variability of magnetic brain responses to pseudowords but relative stability of activation to words as a function of attention emerged. While the results of the simulations demonstrated that distributed cortical representations for words can spontaneously emerge in the cortex as a result of neuroanatomical structure and synaptic plasticity, the experimental results confirm the validity of the model and provide evidence in support of the existence of such memory circuits in the brain. This work is a first step towards a mechanistic account of cognition in which the basic atoms of cognitive processing (e.g., words, objects, faces) are represented in the brain as discrete and distributed action-perception networks that behave as closed, independent systems. v Table of Contents CHAPTER 1 – INTRODUCTION .........................................................................................1 1.1 BACKGROUND..................................................................................................................1 1.2 LANGUAGE, LEARNING, AND WORD-RELATED NEURONAL CIRCUITS ..............................5 1.3 THE LANGUAGE CORTEX..................................................................................................7 1.4 MODELLING LANGUAGE PROCESSING..............................................................................9 1.5 ATTENTION ....................................................................................................................13 1.6 SUMMARY ......................................................................................................................17 CHAPTER 2 – A NEURONAL MODEL OF THE LANGUAGE CORTEX ..................18 2.1 RELATED WORK .............................................................................................................18 2.2 NETWORK STRUCTURE AND FUNCTION .........................................................................23 2.2.1 Model of cortical neurons ......................................................................................24 2.2.2 Modelling Hebbian Synaptic Plasticity ..................................................................27 2.2.3 System-level Architecture .......................................................................................29 2.3 DISCUSSION....................................................................................................................33 2.4 SUMMARY AND MAIN CONTRIBUTIONS..........................................................................36 CHAPTER 3 – SIMULATING THE EMERGENCE OF DISCRETE AND DISTRIBUTED CELL ASSEMBLIES FOR WORDS ......................................................37 3.1 EXPERIMENT SET 1 – INTRODUCTION............................................................................37 3.1.1 Experiment Set 1 – Methods...................................................................................38 3.1.2 Experiment Set 1 – Results .....................................................................................40 3.1.3 Experiment Set 1 – Interim Discussion ..................................................................46 3.2 EXPERIMENT SET 2 – EMERGENCE OF CAS IN THE REVISED MODEL.............................49 3.2.1 Experiment Set 2 – Methods...................................................................................50 3.2.2 Experiment Set 2 – Results .....................................................................................50 3.2.3 Experiment Sets 1 & 2 - Discussion .......................................................................57 3.3 SUMMARY AND MAIN CONTRIBUTIONS..........................................................................60 CHAPTER 4 – SIMULATING LEXICALITY AND ATTENTION EFFECTS .............61 4.1 EXPERIMENT SET 3 – REPLICATING LEXICALITY EFFECTS ............................................61 4.1.1 Experiment Set 3 – Methods...................................................................................61 4.1.2 Experiment Set 3 – Results .....................................................................................62 4.1.3 Experiment Set 3 – Interim Discussion ..................................................................65 4.2 EXPERIMENT SET 4 – MODELLING EFFECTS OF LEXICALITY AND ATTENTION .............67 vi 4.2.1 Experiment Set 4 – Methods...................................................................................67 4.2.2 Experiment Set 4 – Results .....................................................................................68 4.3 EXPERIMENT SETS 3 & 4 – DISCUSSION.........................................................................70 4.3.1 Explaining the Influences of Lexicality and Attention............................................72 4.3.2 Fit of model predictions and neurophysiological data...........................................73 4.3 SUMMARY AND MAIN CONTRIBUTIONS..........................................................................74 CHAPTER 5 – NEUROPHYSIOLOGY OF ATTENTION AND LANGUAGE INTERACTIONS: AN MEG STUDY..................................................................................76 5.1 INTRODUCTION ..............................................................................................................76 5.2 MATERIALS AND METHODS...........................................................................................77 5.2.1 Subjects...................................................................................................................77 5.2.2 Design.....................................................................................................................77 5.2.3 Instructions.............................................................................................................78 5.2.4 Tests........................................................................................................................78 5.2.5 Stimuli preparation and delivery............................................................................78 5.2.6 Procedures .............................................................................................................80 5.2.7 MEG Recording......................................................................................................81 5.2.8 MEG Data Processing ...........................................................................................81 5.2.9 Statistical Analysis .................................................................................................83 5.3 RESULTS.........................................................................................................................84 5.3.1 Behavioral ..............................................................................................................84 5.3.2 MEG results............................................................................................................85 5.3 DISCUSSION....................................................................................................................91 5.4 SUMMARY AND MAIN CONTRIBUTIONS..........................................................................93 CHAPTER 6 – SUMMARY AND CONCLUSIONS..........................................................95 APPENDIX A .........................................................................................................................99 APPENDIX B .......................................................................................................................102 ABBREVIATIONS ..............................................................................................................107 REFERENCES.....................................................................................................................108 vii 1 Chapter 1 – Introduction This Chapter provides the necessary background, reviews some of the relevant literature, and introduces the specific research questions that we addressed and that motivated this work. 1.1 Background Our brains can effortlessly store knowledge about objects, faces, words and facts. The nature of the cortical representation of the basic components of knowledge, however, is still a major issue in cognitive neuroscience (see Patterson, Nestor & Rogers (2007) for a recent review). In psycholinguistics, most existing theoretical and computational approaches explain language processes either as the activation and long-term storage of localist elements (e.g., Dell (1986), Dell, Chang & Griffin (1999), Levelt, Roelofs & Meyer (1999), McClelland & Elman (1986), Norris (1994), Page (2000)) or on the basis of fully distributed activity patterns (Gaskell, Hare, & Marslen-Wilson, 1995; Joanisse & Seidenberg, 1999; McClelland & Rumelhart, 1985; Plaut, McClelland, Seidenberg, & Patterson, 1996; Rogers et al., 2004; Rogers & McClelland, 1994; Seidenberg & McClelland, 1989). Localist approaches typically assume, a priori, the existence of separate nodes for separate items (words), and of pre-established, “hard- wired” connections between them. Nodes are usually considered active (“on”) only if their activation overcomes a pre-specified threshold; the feature of anatomically distinct nodes allows different item representations to be active at the same time while avoiding cross-talk. Distributed accounts, on the other hand, do not make such a- priori assumptions: in them, the representations of the relevant items emerge as distributed patterns of strengthened connections in a set of nodes (hidden layer). In this approach, the same set of nodes is used to encode different items as different patterns of graded activation; this, however, makes it impossible to maintain different item representations separate when these are simultaneously active. In general, 2 cognitive arguments (e.g., our proven ability to maintain multiple item representations distinct) favour localist, discrete-activation representations, whereas neuroscience arguments weight in favour of distributedness (Elman et al., 1996; Page, 2000; Rolls & Tovee, 1995). These two accounts make different predictions about the functional nature (discrete vs. graded activation, respectively) and cortical characteristics (local vs. distributed ne test the the co to- en cal res and me un sig tw Fo pro (E N4 tex tworks, respectively) of the knowledge representations in the brain. One way to se predictions and investigate the presence and functional characteristics of rtical representations of linguistic items is to apply electro- and magne cephalography (EEG/MEG) techniques, and measure how neurophysiologi ponses differ when the stimuli presented in input consist of either (i) familiar aningful elements (e.g., words, coherent text) or (ii) equivalently complex but familiar, meaningless items (e.g., pseudowords, incongruent sentences). A nificant body of evidence indicates different patterns of brain activation for these o cases. Figure 1.1 Typical N400 response (elicited in presence of attention) to spoken words (dashed curve) and pseudowords (solid). The dotted oval indicates the interval where the differences between the curves are statistically significant. The vertical axis indicates stimulus onset time. Note the large N400 amplitude to pseudowords [adapted from (Friedrich, Eulitz, & Lahiri, 2006), their Fig. 3.(C)] r example, a well-known and robust neurophysiological index of lexical-semantic cessing is the “N400” (see Figure 1.1), a negative-going event-related potential RP) peaking around 400ms after stimulus onset (Kutas & Hillyard, 1980). The 00 is larger for senseless materials (e.g., pseudowords, semantically incoherent t) than for matched meaningful language (common words or coherent text), and is 3 elicited under conditions where subjects are attending to the input (Barber & Kutas, 2007; Kutas & Hillyard, 1980). Differences in neurophysiological brain responses to words and pseudowords have been recorded also at short latencies (e.g., Hauk, Davis, Ford, Pulvermüller & Marslen-Wilson (2006), Segalowitz & Zheng (2008), Sereno, Rayner & Posner (1998)), especially in the mismatch negativity (MMN) brain response (Korpilahti, K 01; P ler, 2 ted re ent a re h F F s w s rause, Holopainen, & Lang, 2001; Pettigrew et al., 2004; Pulvermüller, 20 ulvermüller et al., 2001; Pulvermüller & Shtyrov, 2006; Shtyrov & Pulvermül 002). The MMN (Näätänen, Gaillard, & Mäntysalo, 1978) is an early event-rela sponse (latency 100-250ms) elicited in oddball experiments by the infrequcoustic events (so-called “deviant stimuli”) presented occasionally among frequently peated sounds (“standard stimuli”). The MMN is elicited even when subjects are eavily distracted, and, unlike the N400, is larger for words than for pseudowords. Figure 1.2. Typical Mismatch Negativity (MMN) response to words and pseudowords. Note that the MMN in word context (red curves) is enhanced compared with the MMN in pseudoword context (blue curves). The acoustic waveforms of the stimuli which elicited the MMNs are shown at the top [after (Pulvermüller et al., 2001, their Fig. 2)]. igure 1.2 shows two examples of MMN, obtained from ERPs of native speakers of innish to word and pseudoword stimuli. The MMNs were elicited here by the critical yllables /ki/ (left) and /ko/ (right) when placed in a word context and in a pseudo- ord context. More precisely, the two syllables were presented after the context yllable /va/ (resulting in “vakki” and “vakko”, two pseudo-words in Finnish) and 4 after the context syllable /la/, thereby completing meaningful Finnish words, “lakki” (CAP) and “lakko” (STRIKE). Although, in principle, they could be used to judge cognitive brain theories of distributed vs. localist representations, neurophysiological results are rarely brought to fruit in this context. The question of why these brain indicators of lexico-semantic processes arise at different latencies and present reversed relative magnitude (N400 larger for pseudowords, MMN larger for words) is left unexplained by current psycholinguistic theories. One possible argument may be that these two divergent patterns of responses are the result of the different processing conditions under which they are elicited. In particular, while the N400 is generally recorded during tasks that require subjects to pay attention to the stimuli (e.g., lexical decision tasks), the MMN is typically elicited in the passive oddball task, in which subjects are instructed to focus their attention on a silent video and ignore the speech stimuli. Thus, the reversal of the response pattern might be caused by the different amounts of attentional resources available to process the linguistic stimuli. A number of studies have confirmed that ERPs and MMN amplitudes are modulated by the attentional load that is required by the task under which they are elicited (Alho, Woods, Algazi, & Näätänen, 1992; Bentin, Kutas, & Hillyard, 1995; Otten, Rugg, & Doyle, 1993; Pulvermüller, 2007; Woldorff, Hillyard, Gallen, Hampson, & Bloom, 1998; Woods, Alho, & Algazi, 1992). Indeed, Szymanski and colleagues (1999), in a study which used spoken phonemes, reported that “top-down controls not only affect the amplitude of the MMN, but can reverse the pattern of MMN amplitudes among different stimuli” (Szymanski, Yund, & Woods, 1999). However, to date, no study has thoroughly investigated the effects of attention on the processing of words and pseudowords while strictly controlling for physical/acoustic stimulus properties. In addition, although existing data suggest that the opposite responses might be caused by the different attentional load, the previous studies have failed to provide an account of the mechanisms that may underlie the differential neurophysiological responses to words and pseudowords: How do the different neural processes interact so as to produce brain responses having opposite magnitude and different latencies? 5 One way to address this question is to implement a neurocomputational model that can reproduce spatial and temporal aspects of brain activity in the relevant cortical areas and provide a mechanistic explanation, at the cortical-circuit level, of the existing neurophysiological findings. The present manuscript describes such a model, how it was applied to explain the observed effects, and the testing of its novel predictions with experimental (MEG) methods. As this work aimed at explaining the mechanisms underlying neurophysiological data at the level of nerve-cell circuits, implementing a biologically realistic model was a crucial aspect of the project; we take the view that structural and functional network properties are critical for the nature of the language representations that the model – and the brain – give rise to. The following sections provide the theoretical background, neuroscientific principles and basic modelling assumptions underlying this work; we also introduce the cognitive constructs of interest, identify the relevant neuroanatomical structures and neural mechanisms, and characterize the high-level mapping between such mechanisms and corresponding entities in the model. Chapter 2 describes in detail the computational model. Chapters 3 and 4 illustrate how we used the model to replicate and explain, at the cortical-circuit level, brain processes of early word learning and the effects of lexicality1 and attentional load on the processing of speech and language. Chapter 5 describes the testing of the model’s crucial predictions by means of a novel critical MEG experiment. 1.2 Language, learning, and word-related neuronal circuits In cognitive terms, the main objects of interest of this study are the building blocks of language, namely, words. We start from the hypothesis that the neural correlate of a word is a memory circuit (“trace”) that develops during early language acquisition (Pulvermüller, 1999). It is well-known that even during the earliest stage of speech- like behaviour, babbling (Fry, 1966; Pulvermüller & Preissl, 1991), near-simultaneous correlated activity is present in different brain parts, especially those areas controlling speech output (left inferior-prefrontal cortex, IF) and those where neurons respond to auditory features of speech (left superior-temporal lobe, ST). The same applies to adults: whenever we utter a word, there is activity in IF cortex controlling the 1 The lexical status of a linguistic item (words are lexical items, pseudowords are not). 6 articulatory gestures along with ST activity, the neural response to the incoming sound. In the adult brain these areas are reciprocally connected (see Section 1.3). We conjecture that through associative Hebbian learning mechanisms (Hebb, 1949) such connections allow the acquisition of sensory-motor associations between co-occurring cortical patterns of activity, in such a way that, for example, listening to speech sounds involving specific articulators leads to the “lighting up” of the corresponding motor representations. A significant body of experimental evidence confirms the presence of speech-motor associations as networks of strongly interconnected neurons distributed between left superior-temporal and inferior-frontal cortex (Fadiga, Craighero, Buccino, & Rizzolatti, 2002; Pulvermüller, 1999; Pulvermüller et al., 2001; Pulvermüller & Shtyrov, 2006; Watkins & Paus, 2004; Watkins, Strafella, & Paus, 2003; S. M. Wilson, Saygin, Sereno, & Iacoboni, 2004; Zatorre, Meyer, Gjedde, & Evans, 1996) and their role in language processing. We will refer to such distributed networks of strongly and reciprocally connected neurons as to cell assemblies (CAs) (Braitenberg, 1978; Hebb, 1949; Palm, 1982; Wennekers, Sommer, & Aertsen, 2003). A CA can be thought of as a highly specialized functional unit that “responds” by becoming fully active only when a specific brain activation pattern – brought about by the sensory (or internal) stimulation – conveys at least a critical amount of activation in its neuronal circuits. Sensory-motor CA could receive their input (e.g., lexical items, words) through the auditory or the motor modalities. We simulated the setting up of such sensory-motor links for lexical items at early stages of language acquisition in a brain-inspired neural network that models neuroanatomical, connectivity, and neurophysiological properties of the language areas in the left hemisphere in close proximity of the sylvian fissure (perisylvian cortex, here referred to as “language cortex” – see Sec. 1.3). To induce CA formation in the model, we repeatedly exposed the network to predetermined pairs of (random and sparse) activation configurations, each activation-pattern pair representing the model equivalent of an auditory-articulatory word form, and allowed the network’s synaptic weights to adapt through Hebbian learning. Crucially, in the attempt to replicate and explain the effects of lexicality and attention on the processing of speech, we used the resulting network to simulate the response of the language cortex to words and pseudowords under variable attentional load. The details of the methods adopted for this part of the study and corresponding results are presented in Chapter 3. 7 1.3 The language cortex This section specifies the core areas of the cortex involved in language processing that were reproduced in the model, and their connectivity features. Some of the structural features are evident from neuroanatomical investigations of the human brain; however, others, especially the fine grained wiring between and within cortical areas, have been inferred from monkey studies (Pandya & Yeterian, 1985; Rauschecker & Tian, 2000; Romanski et al., 1999) and tractography (Catani, Jones, & Ffytche, 2005). The primary cortices involved in spoken language processing (see Fig. 1.4.(a)) include (i) the primary auditory area (Brodmann’s Area 41), located in the caudal part of the planum supratemporale (the part of the upper convolution of the temporal lobe which lies in the sylvian fissure), and (ii) the ventral part of the primary motor cortex (Brodmann’s Area 4), situated near the sylvian fissure (Pulvermüller, 1992). These two areas are active during perception of speech sounds and execution of articulatory movements, respectively. A third primary cortex involved in spoken language processing is the somatosensory cortex, located posterior to the central sulcus; in particular, this includes the inferior parts of BA (Brodmann’s Areas) 1, 2 and 3, which are necessary for sensations within the mouth region. In both the primary auditory and somatosensory areas, afferent fibres carrying sensory input enter the cortex; the primary motor cortex, on the other hand, contains large pyramidal cells that project to motor neurons controlling articulatory muscles. According to neuroanatomical studies in the rhesus monkey (Macaca mulatta) the primary perisylvian motor or articulatory cortex is tightly connected to the premotor (secondary) regions anterior to it. These, in turn, are connected to regions around the inferior branch of the arcuate sulcus (Pandya & Yeterian, 1985), in the inferior prefrontal cortex. Experimental evidence (Fuster, 1997; Rizzolatti, Fogassi, & Gallese, 2001) suggests that similar connection patterns are likely to be present in the homologous structures in man, located in the ventral motor (BA 4) and premotor (BA 6) cortices, and within BA 44 and BA 45 (Broca’s area). As discussed in detail by Pulvermüller (1992), a similar picture can be drawn for the somatosensory and auditory cortex (see also (Kaas & Hackett, 2000; Rauschecker & Tian, 2000; Scott, Blank, Rosen, & Wise, 2000)). That is, each of the primary cortices 8 t , relevant for spoken language is strongly and reciprocally connected to its adjacen secondary region, which, in turn, is connected to its neighbouring association area. In the macaca, the relevant auditory areas are sometimes defined as “auditory core” (labelled “A1” in Fig. 1.4.(b)), “belt” and “parabelt” (AL, ML and CL in Fig. 1.3.(b)) respectively (Petkov, Kayser, Augath, & Logothetis, 2006). These structures may be related – although an exact homology is not likely – to BA 41, 42 and 22 in thehuman brain. (b) (a) Figure 1.4 The relevant areas of the perisylvian cortex in man, and homologous structures in monkey. (a): The different regions of the language cortex in the human brain, indicated by differently shaded areas. Note the long-distance cortico-cortical connections between the auditory and motor association areas, indicated here by black arrows [after (Pulvermüller, 1992)]. (b): Neuroanatomical structure and projections of superior-temporal and perisylvian cortical areas in the monkey brain [after (Romanski et al., 1999)]. See text for details. 9 Studies in non-human (Pandya & Yeterian, 1985; Petrides & Pandya, 2002; Romanski et al., 1999) and human (Catani, Jones, & Ffytche, 2005; Makris et al., 1999; Parker et al., 2005) primates (see Rilling et al., (2008) for a cross-species comparison) suggest that the respective association cortices of each of these primary areas are strongly and reciprocally interconnected with each other via the arcuate and uncinate fascicles, and the extreme capsule. The presence of such long-range cortico-cortical connections between the auditory association (Wernicke’s) and motor association (Broca’s) areas is indicated schematically in Figure 1.4.(a) by (“dorsal” and “ventral”) black arrow- pointed arcs. The fact that these long-distance connections – especially through the fasciculus arcuatus – are more developed in the humans than in apes or monkeys, and are stronger in the left than in the right hemisphere, accounts, in part, for the specificity of language to humans, but also for the left-laterality of language in most human brains (Catani, Jones, & Ffytche, 2005; Makris et al., 1999; Parker et al., 2005; Rilling et al., 2008). 1.4 Modelling language processing A plethora of connectionist models of word learning and language processing exists in the literature (e.g., (Dell, 1986; Elman, 1991; Gaskell, Hare, & Marslen-Wilson, 1995; Joanisse & Seidenberg, 1999; McClelland & Elman, 1986; Norris, 1994; Plaut & Gonnerman, 2000; Plaut, McClelland, Seidenberg, & Patterson, 1996; Plunkett & Marchman, 1993; Seidenberg & McClelland, 1989; Sejnowski & Rosenberg, 1987; Shastri & Ajjanagadde, 1993), to name a few representative examples; see (Christiansen & Chater, 1999; Dell, Chang, & Griffin, 1999) for useful accounts). These models have provided an important contribution to the understanding of how, at the system level, different parts of the human brain may play an active role in language processing; they can explain existing experimental data, and allow new predictions to be made and theories to be tested. Apart from a few recent notable exceptions (e.g., (Guenther, Ghosh, & Tourville, 2006; Husain, Tagamets, Fromm, Braun, & Horwitz, 2004; Westermann & Miranda, 2004)), however, most approaches tend to “abstract away” from the neurophysiological mechanisms and neuroanatomical structures that underlie spoken language processing in the brain. In general, they are usually prone to one or more of the following criticisms: they (i) are based on “hard-wired” networks, in which (ii) the weights of the links between the 10 nodes are set up ad hoc, or (iii) make assumptions which are of questionable biological plausibility (e.g., use backpropagation (Rumelhart, Hinton & Williams, 1986) as learning rule, or adopt all-to-all connectivity), or (iv) do not incorporate knowledge about neuroanatomical structure of the perisylvian cortices and their connections, which constrain and form the basis for the emergence of brain circuits underlying linguistic functions. Because of this, they fall short of providing a mechanistic explanation – at the level of nerve cells – of the neurobiological mechanisms at the basis of language acquisition and processing. We addressed the above shortcomings by implementing a connectionist network specifically designed to mimic neuroanatomical, connectivity, and neurophysiological properties of the left perisylvian language cortex, as summarised below (a detailed description is provided in Chapter 2): (i) Six interconnected cortical areas are modelled, identified on the basis of neuroanatomical studies (see Sec. 1.3): (1) primary auditory cortex, (2) auditory belt and (3) parabelt areas (Wernicke’s area), (4) inferior prefrontal and (5) premotor cortex (Broca’s area), and (6) primary motor cortex; (ii) Neurons are modelled as graded-response cells with adaptation, whose output represents average firing rate within a local pool of pyramidal cells; (iii) Within- (recurrent) and between-area connectivity is implemented via sparse, random, “patchy” next-neighbour synaptic links between cells, as typically found in the mammalian cortex (Braitenberg & Schüz, 1998; Gilbert & Wiesel, 1983); (iv) Both local and global (non-specific) cortical inhibition mechanisms are realised: a. inhibitory cells reciprocally connected with neighbouring excitatory cells simulate the action of a pool of inter-neurons surrounding a cortical pyramidal cell in serving as lateral inhibition and local activity control; b. area-specific inhibitory loops implement a mechanism of self- regulation (see Figure 1.3), preventing the overall network activity 11 from falling into non-physiological states (total saturation or inactivity); (v) Synaptic plasticity is implemented purely through associative (Hebbian) learning mechanisms. Although the specific details of the implementation are presented in Chapter 2, it is appropriate to briefly discuss here some of the above points and related assumptions. As we are mainly interested in modelling and explaining the setting up of acoustic- articulatory associations between the auditory and motor modality (see Sec. 1.2), areas belonging to the somatosensory speech region (see Fig. 1.4) were not included in the model. The network already contains a “module” for sensory input (modelling the three areas in superior-temporal cortex). Adding a second module entirely analogous in structure and connectivity to the auditory one (see Sec. 1.3) would allow the simulation of additional experimental data, but does not represent a conceptually important extension (but see discussion in Sec. 3.1.3). Another point to note is the use of graded response units instead of spiking neurons. We do not aim at simulating individual cortical neurons but rather employ a lumped or mean-field type model in the simulations, where each node (cell) of the network represents the average activity of a local pool of neurons, or “column” (Eggert & van Hemmen, 2000; H. R. Wilson & Cowan, 1973). This modelling choice is justified by two reasons. First, the level of abstraction required to model and replicate neurophysiological (MEG, EEG) data does not require the modelling of ion channels or single action potentials: analogous approaches based on the neuronal mass model (Freeman, 1978; Nunez, 1974) have been used in the past as generative models of EEG/MEG and fMRI (functional magnetic resonance imaging) signals (David & Friston, 2003; Husain, Tagamets, Fromm, Braun, & Horwitz, 2004). Second, the use of spiking neurons would have a huge impact on the computational load, and would not buy anything in terms of explanatory power of the model. Thus, this level of detail should be introduced only if necessary for the phenomena of interest – as just said, modelling the cortical interactions at the level of cortical columns is sufficient for the present purposes. Approximately 20% of all synapses in the neocortex are estimated to be GABA- ergic (Douglas & Martin, 2004; Gabbott, Somogyi, Stewart, & Hamori, 1986); thus, 12 the presence of inhibitory mechanisms in the model (see point (iv)) is well motivated. However, while local (lateral) inhibition is generally believed to be an underlying architectural feature of the cortex (Braitenberg & Schüz, 1998; Douglas & Martin, 2004), the evidence in support of the existence of non-specific (global) cortical inhibition is somewhat less direct. It has been argued that the cortex must have developed a self-regulatory mechanism designed to keep activation between certain t a e e c l t e d ; e e e n bounds (Braitenberg, 1978; Braitenberg & Schüz, 1998). Although there is agreemen that the regulation of cortical activity is necessary, the exact characteristics of such mechanism and the brain systems that realise it are still a matter of debate (se (Fuster, 1995; Pulvermüller, 2003, pp. 78-81; Wickens, 1993)). In our model, w implemented cortical self-regulation through the introduction of area-specifi inhibitory loops, which dampen activation in one area in proportion to the tota activity within that area (see Fig. 1.3 and Sec. 2.2.3 for details). The net result is tha the activity within each area is maintained stable and within limits; these bounds ar determined by the strength of the inhibitory feedback. In the brain, these circuits coul be implemented by cortico-striato-thalamic loops (R. Miller & Wickens, 1991 Wickens, 1993). Cortical Area Input Output A Θ Figure 1.3 The mechanism of cortical self-regulation implemented in the model. Activity within an area is modulated by the non-specific inhibition (filled arrow) as a linear function Θ of the current total activation “A” in that area [after (Braitenberg, 1978)]. Finally, in relation to point (v), we postulate that the brain mechanisms mediating th development of specialized cell assemblies (driven by the repeated presentation of th same sensory-motor input patterns) are generic Hebbian mechanisms of associativ learning, and take the phenomena of long-term potentiation (LTP) and depressio (LTD) to be the neural correlates of learning. LTP and LTD consist of long-term 13 increase or decrease in synaptic strength resulting from pairing presynaptic activity with specific levels of postsynaptic membrane potentials (Buonomano & Merzenich, 1998; Malenka & Nicoll, 1999). These phenomena are believed to play a key role in experience-dependent plasticity, memory, and learning (Malenka & Bear, 2004; Rioult-Pedotti, Friedman, & Donoghue, 2000). In the model, we implemented synaptic plasticity by allowing the strength (weight) of the connections between different cells to adapt only according to an LTP/LTD-based rule (see Section 2.2.2 for details). 1.5 Attention Attention is a central theme in cognitive neurosciences (e.g., see (Raz & Buhle, 2006) for a recent review). A complete report on the state of the art of this field falls outside the scope of this work; we briefly describe here only some of the key ideas that have played an important role in the development of this area and that are relevant to this research. No single, unifying definition of attention currently exists in the literature. William James (1890, pp. 403-404) originally wrote: “Everyone knows what attention is. It is the taking possession of the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought”. James distinguished between “active” and “passive” modes of attention, the former being used when attention processes are controlled in a top-down way by the individual’s current goals, thoughts, behaviour, the latter when attention is controlled in a bottom-up way by external stimuli (e.g., a loud noise, movement). This distinction still appears to be important in recent research (e.g., (Corbetta & Shulman, 2002)). The first modern theory of attention was the selective-filter theory proposed by Donald Broadbent (1958): Broadbent postulated a low level filter (nowadays called “early selection”) that allows only a limited number of percepts to reach the cognitive processes at any time, and proposed that much conscious, attention demanding information processing is dependent on a single, common “limited-capacity system”. This theory accounted for a wide range of existing experimental results and 14 phenomena, such as divided attention (difficulties in listening to two simultaneous speech signals), sustained attention (performance decrement over time) and focused attention (increased distractability due to stresses such as noise or sleep loss), minimizing the importance of top-down, consciously directed attention. The idea of a single limited-capacity system, however, turned out later on to be an over- simplification. For example, in dual-task interference, if task x poses more demands than task y on the system, it should always produce more interference with concurrent activities (Kahneman, 1973; Navon & Gopher, 1979). Instead, it seems that the similarity of the tasks is a key factor: interference, or competition, is stronger for tasks which have obvious properties in common (Allport, 1980; Baddeley, 1986) — for example, two verbal tasks, or two tasks which make shared demands on similar (input or output) processing systems (Duncan, 2006). In this work, we take selective attention to be associated with the cognitive ability to internally focus on, or be aware of, only a small subset of the sensory information in input, relevant to current thought or behaviour, at the expense of the rest. The “biased competition” model of attention (Desimone & Duncan, 1995; Duncan, 1980, , 1996; Duncan & Humphreys, 1989) provides a useful perspective on the possible brain mechanisms underlying such cognitive processes. The model is based on distributed, integrated competition across the sensorimotor network (see also (Walley & Weiden, 1973)), and is supported by a significant body of behavioural and neurophysiological evidence (Bundesen, 1990; Chelazzi, Duncan, Miller, & Desimone, 1998; Chelazzi, Miller, Duncan, & Desimone, 1993; Moran & Desimone, 1985; Sato, 1989; Sperling, 1960). As elucidated by Duncan (2006), the model has three basic ideas. First, processing is competitive in many, perhaps most, of the brain systems responding to sensory input. This is shown, for example, by the relative suppression of the normal response to a visual stimulus when a second (possibly irrelevant) stimulus is also present in the receptive field (E. K. Miller, Gochin, & Gross, 1993). Thus, different stimuli compete for shared (attentional) resources. Second, sustained signals from task context act to bias competition, so that the stimulus relevant to the current task or behaviour “wins”. Third, and crucial for object-based attention, competition is integrated between one brain system and another: the tendency is for the same object to assume dominance throughout the network, processing in different regions representing its different 15 properties and implications for action (Duncan, 2006, pp. 5-6). In the remainder of this section we briefly discuss how these three ideas can be mapped to corresponding neurally-plausible mechanisms implemented in the present and other connectionist models. (a) The first type of competition may be mediated, at different cortical levels, by (local) lateral inhibition: mutually inhibitory synapses between neighbouring excitatory cells (a widespread characteristic of the cortex (Braitenberg & Schüz, 1998; Douglas & Martin, 2004)) might act as local winner-take-all (WTA) mechanisms (Yuille & Geiger, 2003), producing the observed competition phenomena. Following a number of other works (e.g., (Fukai & Tanaka, 1997; Knoblauch & Palm, 2002; Mao & Massaquoi, 2007; Rabinovich et al., 2000; Riesenhuber & Poggio, 1999; Rolls & Deco, 2002)), our network implements local competition and WTA dynamics using an underlying area of inhibitory cells with next-neighbour connectivity (see Sec. 2.2.3). (b) The “top-down” signal responsible for biasing the competition amongst the co- active representations may be realised via excitatory links projecting to cells (or cell populations) that represent specific sensory features (spatial location, color, pitch, etc.) or specific items (see, for example, the architecture proposed by Rolls and Deco (2002, p. 328), or (Deco & Rolls, 2005b; Deco, Rolls, & Horwitz, 2004)). The model presented here does not explicitly attempt to implement such top-down attentional signal. Following Occam’s razor, we decided not to make assumptions on its characteristics or origins; as it turns out, this feature was indeed unnecessary for the model to be able to simulate and explain the phenomema of interest here. (c) A number of computational models of visual attention have suggested how the second type of (integrated, or “object level”) competition might occur (e.g., (Bundesen, Habekost, & Kyllingsbaek, 2005; Dehaene, Sergent, & Changeux, 2003; Phaf, Vanderheijden, & Hudson, 1990; Schneider, 1995)). Although the details differ, the general approach, based on the network principle of attractor states (Hopfield, 1982), is to set up mutual support (i.e., reciprocal excitatory links) between units that respond to the same object, and competition (i.e., mutual inhibitory links) between units that do not. In this way, the system spontaneously seeks a state in which different units responding to the same object are active together. However, while synaptic plasticity (LTP/LTD) can explain the emergence of strong reciprocal 16 excitatory links between co-activated (and weakening of links between non- coactivated) sets of units, it is less clear which neurophysiological mechanisms might lead to the strengthening of inhibitory synapses between (not neighbouring) populations of neurons that are not active together. The model presented here implements the integrated, “item level” competition purely as a result of the cortical activity-regulation mechanism (see Fig. 1.3), which is independently motivated by the need for functional stability (Braitenberg, 1978; Fuster, 1995; Wickens, 1993). Past theoretical works (Braitenberg, 1978; Hopfield, 1982; Palm, 1982, , 1987; Willshaw, Buneman, & Longuet-Higgins, 1969) have shown that non-specific inhibition not only enhances the network stability but can also solve the superposition problem (Knoblauch & Palm, 2002), which requires the simultaneous full activation of two different cell assemblies to be prevented. Accordingly, our network does not need to assume (or develop) reciprocal inhibitory links between populations of strongly interconnected cells (CAs) representing different items, as the mutual inhibition between cell assemblies “falls out” of the global inhibition mechanism. In fact, the response of the non-specific inhibition loop to a stimulus in input depends on the strength of the feedback link forming such loop (depicted as a filled arrow in Fig. 1.3, and henceforth called “FI”, feedback inhibition). Therefore, in the model, attention at the object level (or item-level competition) is realised as follows: strong FI (leading to strong mutual inhibition between co-active CAs) simulates – at the cognitive level – a situation in which small amounts of attentional resources are available for processing new sensory input (as low attention implies a tougher competition between co-stimulated representations to enter the focus of attention). Analogously, reduced FI (i.e., less competition between co-active CAs) models greater availability of attentional resources: in the latter situation, several representations can be active at the same time (allowing phenomena like that of “divided attention”, or attention to a large perceptual space). The use of non-specific inhibition to successfully model aspects of attention is not new (Deco, Rolls, & Horwitz, 2004; Rolls & Deco, 2002; Szabo, Almeida, Deco, & Stetter, 2004). However, past approaches used non-specific inhibition as a tool to implement the first type of competition, i.e., lateral inhibition between cells of a specific cortical area, whereas global inhibition is used here to model item-based attention, implementing the second type of competition between cell assemblies that 17 are distributed across different brain areas. 1.6 Summary This Chapter provided the necessary background and introduced the research questions that motivated this work, reviewing some of the relevant literature, describing the neuroscientific principles and the methodological approach adopted, and discussing the basic underlying assumptions (Sections 1.1, 1.2 and 1.4). Section 1.3 identified the neuroanatomical structures and the neurophysiological principles and observations motivating the functional features implemented in the neurocomputational model of the language cortex (see next Chapter), while Sec. 1.5 introduced the basic ideas underlying the biased competition model of attention, mapping them to corresponding entities in the neural network. 18 Chapter 2 – A Neuronal Model of the Language Cortex This chapter provides a detailed description of the computational model of the left perisylvian cortex that we implemented. The approach follows similar attempts to build models linking neuronal circuits to functional brain systems, especially in the domain of visual (Corchs & Deco, 2002; Deco & Rolls, 2005a; Tagamets & Horwitz, 1998), language (Guenther, Ghosh, & Tourville, 2006; Pulvermüller & Preissl, 1991; Westermann & Miranda, 2004) and auditory processing (Husain, Tagamets, Fromm, Braun, & Horwitz, 2004). The characteristics of the more closely related models and the general features that set apart both these and previous works from the present architecture are discussed in the next section. 2.1 Related work Several examples of distributed connectionist models exist in the literature which, like the present one, demonstrate how cognitive behaviour can emerge from neurobiological structure and function (e.g., (Corchs & Deco, 2002; Deco & Rolls, 2005a; Husain, Tagamets, Fromm, Braun, & Horwitz, 2004; Tagamets & Horwitz, 1998)). These models have been used to explain (and simulate PET/fMRI data resulting from) visual and auditory attention phenomena at the mechanistic level of cortical circuits. However, none of these attempts to address language function. Most relevant here is the ground-breaking work by Husain and colleagues (Husain, Tagamets, Fromm, Braun, & Horwitz, 2004), who built a neuroanatomically-based connectionist model of the left perisylvian areas to simulate electrophysiological and fMRI activities in multiple brain regions during an auditory delayed-match-to-sample task for tonal patterns. Their architecture consists of four major brain regions: (1) primary/core auditory cortex; (2) secondary sensory cortex (belt and parabelt areas); (3) superior temporal gyrus/sulcus (ST); and (4) prefrontal cortex (PFC). Each region 19 is composed of 81 excitatory-inhibitory units (modified Wilson–Cowan units), each of which represents a cortical column; both feedforward and feedback connections link the different regions. A first shortcoming of Husain and colleagues’ model is that, in spite of the large body of experimental evidence showing that the mammalian brain exhibits experience-dependent plasticity (see Sec. 1.4), it is not endowed with any learning mechanism. Secondly, the model assumes the existence of different types of cells exhibiting pre-specified behaviours, and the connections between areas are “hard wired” in an ad hoc manner. For example, the PFC area (Husain, Tagamets, Fromm, Braun, & Horwitz, 2004, their Fig. 1) is assumed to contain four different types of built-in neuronal units: “cue-sensitive” units (assumed to respond when an external stimulus is present), two types of “delay” units (one assumed to be active during stimulus presentation and subsequent delay before presentation of the following stimulus, the other assumed to be only active during the delay between presentations of stimuli), and “response” units, whose activities are assumed to increase when the second stimulus matches the first; these sets of units are assumed to form separate modules, connected by arbitrary links having fixed and predetermined synaptic weight (ibid., their Table A2). (Note that these built-in properties, especially the active- memory function, have been argued to be the net-effect of neuronal assemblies, not a feature intrinsic to single cells (Fuster, 2003; Zipser, Kehoe, Littlewort, & Fuster, 1993)). The secondary area is assumed to contain “contour-selective” units for which there is no direct experimental evidence, and there are no excitatory-excitatory (recurrent) within-area connections in the primary, secondary and ST areas. Finally, the architecture includes an “attention” module (which the authors explicitly declare to be “not modelled in a biologically realistic fashion” (Husain, Tagamets, Fromm, Braun, & Horwitz, 2004, p. 1710) that projects to only one of the two delay-modules and directly defines the strength of the representation maintained by such delay units. In summary, Husain and colleagues’ model (i) does not include any learning mechanism, (ii) assumes the existence of different types of cells with conveniently pre-defined, built-in behaviours, and of modules for which there is no neurobiological evidence, (iii) assumes ad hoc connections between such elements, and (iv) does not deal with language, but with simple tonal patterns. In spite of these aspects, this architecture still constitutes the distributed connectionist model of the left perisylvian 20 areas that come closest, in terms of neuroanatomical and neurophysiological detail, to the model that we present here. A connectionist model of speech acquisition and production that does incorporate learning and addresses language function was proposed recently by Guenther, Ghosh, & Tourville (2006). This architecture (composed of several components, including premotor, motor, auditory and somatosensory cortical areas, in addition to a cerebellum module) is used to simulate a range of acoustic and kinematic data (including compensation to lip and jaw perturbations during speech) and fMRI activity during syllable production. The model provides a very effective and insightful account of language processing based on mechanisms that are assumed to simulate neuronal and synaptic level phenomena. To achieve high effectiveness at the functional level whilst maintaining a sufficiently fine-grained level of modelling, however, engineering considerations were prioritised in the implementation at the expenses of neurobiological faithfulness. For example, all projections between the different cortical areas are assumed to be unidirectional (e.g., premotor cortex projects to superior temporal cortex, but no projections exist in the opposite direction) and do not exhibit next-neighbour, random and sparse topology as typically found in the mammalian cortex (Amir, Harel, & Malach, 1993; Douglas & Martin, 2004) but all- to-all connectivity, which is not neurobiologically realistic (Braitenberg, 2001; Braitenberg & Schüz, 1998, p. 63). The model also makes use of some simplifying localist assumptions: for example, each single cell in the “Speech Sound Map” module (modelling the left ventral premotor cortex (Guenther, Ghosh, & Tourville, 2006, their Fig. 1)) is assumed to represent one specific speech sound, defined as “a phoneme, syllable, word, or short phrase that is frequently encountered in the native language and therefore has associated with it a stored motor program for its production” (ibid., 2006, p. 283). In the language acquisition simulation described, one cell in premotor cortex was used to represent the entire phrase “good doggie”. Finally, the tuning of the synaptic weights during the simulation of language acquisition (including the preliminary babbling and subsequent “practice phase”, involving the learning of more complex speech sounds) is not realised, like in the present model, via a uniform, constantly-acting mechanism that closely replicates neurophysiological features of synaptic plasticity and is applied equally to all areas during the training, but through a set of different, ad hoc procedures of little 21 biological plausibility that are carried out at different times on different sets of synaptic projections.2 While the models mentioned above were used to simulate PET and fMRI data, the modelling of EEG/MEG signals has been also object of research for several years: e.g., epileptic-like (Jansen & Rit, 1995; Wendling, Bellanger, Bartolomei, & Chauvel, 2000), gamma-(Jefferys, Traub, & Whittington, 1996) and alpha-rhythm dynamics (Suffczynski, Kalitzin, Pfurtscheller, & Lopes da Silva, 2001) and spectral activity in different frequencies (David & Friston, 2003) have been successfully simulated in the past. However, we are not aware, at present, of any biologically realistic model able to simulate and explain the MEG/EEG dynamics observed during higher-level cognitive and language tasks, which was one of the main goals of this work. As mentioned in Sec. 1.4, one of the features shared by most existing connectionist models of language processing which incorporate learning is the use of the back- propagation mechanism (Rumelhart, Hinton, & Williams, 1986). This learning algorithm, although very effective, makes use of information that is not local to the synapse undergoing the efficacy change (i.e., information related to the activity of the two pre- and post-synaptic cells), but which is obtained from the network’s “output” layer by means of comparing current and desired activity there. It is not entirely clear whether (and, if so, how) the brain can actually implement such non-local back- 2 For example, the synaptic weights of the projections from ventral premotor cortex to superior temporal cortex (“Auditory Error Map”), encoding the auditory targets for each speech-sound cell, are conveniently ordered in “spatio-temporal” matrices, in which each column represents the target at one point in time, and there is a different column for every 1ms of the duration of the speech sound. Using an audio file containing the appropriate speech sound, a specified procedure sets up the synaptic weights in such a way that the values are (exactly) the upper and lower bounds of each of the first three formant frequencies, at 1ms intervals for the duration of the utterance. This “learning” procedure is run once, during the practice phase only (and not during the babbling). On the other hand, the weight matrix encoding the projections from premotor to somatosensory cortex is updated only during correct self-reproductions of the corresponding speech sound (i.e., strictly after the learning of the auditory target for the sound). Moreover, in order to account for temporal delays, this process involves artificially aligning the somatosensory error “data slice” with the appropriate time slices of the weight matrices (see (Guenther, Ghosh, & Tourville, 2006), their Appendix B). 22 propagation of errors. In this work we relaxed this assumption, and made things more difficult (but more realistic) by limiting ourselves to modelling synaptic plasticity mechanisms that are well established and widely accepted (namely, LTP/LTD); as it will be seen, it is solely by means of these mechanisms that the model correlates of the cortical representations of linguistic items (word cell assemblies) can emerge in the network. A second important aspect setting apart several of the existing models in psycholinguistics (e.g., (Dell, 1986; Dell, Chang, & Griffin, 1999; McClelland & Elman, 1986; Norris, 1994), to name but a few) from the present one concerns the adoption of a localist representation, whereby one node of the network does not represent a pool of cortical neurons, but a phonological feature, a phoneme, or even a whole word. While a localist approach offers several advantages (including reduced computational load and easier implementation), it requires deciding a priori the behaviour of the (simulated) brain representations of the entities of interest (e.g., words, phonemes). To clarify: building a (localist or distributed) connectionist model requires specifying the computational properties of the nodes of the network; if one assumes, for example, that nodes represent words, then specifying their computational properties de facto means establishing, in advance, the behaviour of the (brain representations of) words. We deliberately chose not to follow this approach: our aim was to demonstrate that (and explain how) such linguistic representations (and their macroscopic behaviour) can spontaneously emerge from an initially homogenous, sparsely and randomly connected brain-like network of identical nodes by means of neurobiologically plausible (microscopic) mechanisms. In the visual domain, this approach has led, for example, to the successful modelling of the emergence of ocular dominance and orientation columns in a network with similar connectivity features (Mikkulainen, Bednar, Choe, & Sirosh, 2005). The adoption of this method offers two main advantages: (I) it allows one to look at the properties exhibited by the representations that emerge (as opposed to assuming them as built-in features, as, e.g., Husain and colleagues (2004) did in their work) and use them to make predictions about the properties of their neural correlates; (II) the model can be used to understand the cortical mechanisms that underlie the actual setting up of such representations — in this case, the neural processes underlying early word learning. 23 2.2 Network structure and function A complete characterization of the model requires describing both the fine-grained (or neuronal) and high (or systems) level. For each of these levels, the structure (the sub- components and how they are integrated) and function (the result of the dynamic interactions of the component parts) will be explained. In the three following subsections, we start from the basic computational unit of our model (the “cell”, representing a local pool of neurons) and move on to the higher levels of area and network (a “system” of cortical areas), alternating structural and functional descriptions as appropriate. The main quality criterion for the model was biological faithfulness. This led to implementing an architecture which was realistic both at systems level (especially the anatomical and connectivity features of the model, linking it to a specific brain part – the perisylvian cortex) and micro-physiological level. Bearing this criterion in mind, it was necessary to find a good compromise between the two conflicting additional goals of developing a model that was sufficiently detailed so as to allow the emergence of the relevant complex processes observed in the human brain, and sufficiently simple so as to be computationally tractable. We achieved the latter by implementing a relatively simple (computationally speaking) “activity regulation” mechanism mimicking a coarse-grained attentional threshold control system (see Section 2.2.3), and by keeping the total number of cells in the network within a manageable range. The overall architecture of the neural network (see Figure 2.1.(b)) replicates the neuroanatomical features and interconnections of the (spoken) language cortex summarized in Section 1.4. In particular, the model reproduces the main sensory input areas (the primary auditory cortex A1 and its surrounding belt and parabelt areas, AB and PB) and the motor output areas (the perisylvian motor cortex, M1, and areas PM and PF). Each of these cortical areas is modelled as a 25-by-25 area of artificial (excitatory and inhibitory) cells (see Section 2.2.3 for details). In addition to the six areas of excitatory-inhibitory cells, the network is endowed with a self-regulation mechanism (not shown in Figure 2.1), necessary to maintain the total activity of the network within certain limits (see also Sec. 1.3). 24 (a) (b x 2.2 Th po (a) PFPM AB PB A1 M1 ) Area 1: Auditory corte Area 5: Inferior Premotor cortex Area 4: Inferior Prefrontal cortex Area 3: Parabelt Area 2: Auditory Belt Area 6: Inferior Motor cortexFigure 2.1 The relevant areas of the perisylvian cortex, the overall network architecture, and the mapping between the two, indicated by the colour code. (a) The six different areas of the perisylvian language cortex modelled, labelled as M1, PM, PF, A1, AB, PB. Black arrows indicate long-distance cortico-cortical connections between the auditory and motor association areas (see Section 1.3). (b) The six-areas network model and an illustration of the type of distributed functional circuit that developed during learning of perception-action patterns. Each small filled oval represents an excitatory neuronal pool (E-cell); solid and dashed lines indicate, respectively, strong reciprocal and weak (and/or non-reciprocal) connections. Co- activated cells are depicted as black (or grey, indicating smaller activation) ovals. Only forward and backward links between co-activated cells are shown. Inhibitory inter- neurons are not depicted [after (Garagnani, Wennekers, & Pulvermüller, 2008) ]. .1 Model of cortical neurons e basic computational unit of our model is the “cell”, an element representing a ol of cortical neurons (either pyramidal cells or inhibitory inter-neurons). Each cell 25 or “node” of the network may be considered to represent a cortical column of approximately 0.25mm2 size (Hubel, 1995; Mountcastle, 1997), containing ~2.5⋅104 neurons (Braitenberg & Schüz, 1998, p. 25; Rockel, Hiorns, & Powell, 1980)3. Each cell has a membrane potential V(x,t) (reflecting temporal low-pass properties of local neuron pools, see Equation (2.1) below)) and transforms its potential into firing rate by means of a sigmoid output function (Eq. (2.2)) reflecting local firing activity. The membrane potential V(x,t) at time t of a model cell x with membrane time constant τ is governed by the equation: ),(),(),( txVtxV dt txdV In+−=⋅τ (2.1) where VIn(x,t) is the total input to cell x, representing the sum of all excitatory and inhibitory postsynaptic potentials – EPSPs, IPSPs – acting upon neuron pool x at time t (inhibitory inputs are given a negative sign); these subsynaptic EPSPs and IPSPs drive inward currents in neurons of pool x, producing the charging of their somata. The value O(x,t) produced as output by a cell x is the only signal propagated by x to other cells. The output value O(x,t) of a cell x at time t is a piecewise linear sigmoid function of the cell’s membrane potential V(x,t): (V(x,t)− φ) if 0 < (V(x,t)− φ) ≤ 1 0 if V(x,t)≤ φ 1 otherwise (2.2) O(x,t) = In other words, the output is clipped into the range [0, 1] and has slope 1 between the lower and upper thresholds φ and φ+1. The value of φ is initialized to 0 but varies in time (see below). The output value of a cell x at time-step t represents the cumulative (graded) output (number of action potentials per time unit) of cluster x at time t; this value predicts action potential frequency in a certain time-window (centred on t), and, thus, changes in the post-synaptic potentials induced by the neuron pool x in all the 3 These figures are meant to provide only an estimate of the grain of the model; as noted in (Hubel, 1995), the size of a macrocolumn (or “module”) varies substantially between cortical layers (going from 0.1mm2 in layer 4C to 4mm2 in layer 3) and cortical areas (ibid., p.130). 26 synapses downstream from it. We integrate the low-pass dynamics of the network cells (Eq. 2.1) using the Euler scheme with step size ∆t (Press, Teukolski, Vetterling, & Flannery, 1992). The value for ∆t chosen in the simulations was 0.5 (in arbitrary units of time). A relatively wide integration step size was chosen to speed up simulations of the full model, as for the time-continuous (non-spiking) neuron model considered here, smaller step-sizes lead to largely the same network properties. An estimate of the “real” duration of one simulation step (∆t) can be obtained by matching the simulated neurophysiological responses with the corresponding experimental data. According to such approximate mapping (see Sec. 4.3.2 for details), one ∆t is equivalent to about 20ms. Cells come in two different types: excitatory cells (called “E-cells”) and inhibitory cells (or “I-cells”); they model populations of cortical pyramidal neurons and pools of inhibitory interneurons, respectively. The behaviour of an E-cell is specified entirely by Equations (2.1-2.2). I-cells behave identically, except that their output O(x,t) does not saturate at high values (i.e., it is simply V(x,t) for V(x,t)≥0, and 0 elsewhere). In addition, the value used for the time constant τ in Eq. (2.2) is 2.5 for E-cells and 5 for I-cells (in simulation time-steps, or ∆t’s). The use of these two different values is motivated by the higher time constants of IPSPs as compared with EPSPs (Kandel, Schwartz, & Jessell, 2000, p. 923). Assuming that ∆t ≈20ms, E- and I-cells have time constants of about 50ms and 100ms, respectively. Notice, however, that these values should not be interpreted as model correlates of IPSPs and EPSPs time constants, as each cell here represents a population of neurons. Cells can be connected by links (“synapses”). Each synapse is associated to a numeric value (weight) representing the efficacy of that connection. If cell x is linked to cell y with weight wx,y, it contributes a potential O(x,t) · wx,y to the total input VIn(y,t) of the target cell y, where O(x,t) is defined by Eq. (2.2). Without loss of generality, we limit the numeric values of the weights to the range [0, 1]. Finally, E-cells are also endowed with a simple mechanism of adaptation. When a real neuron receives above-threshold stimulation and starts firing, it produces a few spikes at high frequency; if the stimulus is maintained, the rate gradually gets lower and then levels off: this phenomenon is normally referred to as neural (or “spike- rate”) adaptation (Dayan & Abbott, 2001, p. 165; Kandel, Schwartz, & Jessell, 2000, 27 p. 424). In the model, adaptation is realised (in E-cells only) by allowing the value of parameter φ in Eq. (2.2) to vary in time. In particular, φ is tied to the time-average of the cell’s recent output,4 so that higher- (lower)-than-average values of O(x,t) lead to a gradual increase (decrease) in φ. This has the effect of adapting the cell’s response to the input level. 2.2.2 Modelling Hebbian Synaptic Plasticity The weights of the links between E-cells are not fixed but are allowed to change in time, modelling the neurobiological phenomena of long-term potentiation (LTP) and depression (LTD) (Buonomano & Merzenich, 1998; Malenka & Nicoll, 1999). We tried two different computational abstractions of LTP and LTD: one based on Sejnowski’s covariance rule (Sejnowski, 1977), a well-known Hebbian learning rule, the other one based on the ABS model of LTP and LTD (Artola & Singer, 1993). The adoption of Sejnowski’s co-variance rule (Sejnowski, 1977) was motivated by the following considerations: (i) as a Hebbian rule, it is neurobiologically based (e.g., see (Crepel & Jaillard, 1991; Stanton & Sejnowski, 1989; Tsumoto, 1992); but cf. Miller (1996) for a discussion); (ii) it is one of the most simple and computationally tractable correlation-based rules, and (iii) it has been successfully used by a number of connectionist models (e.g. (Peter Dayan & Sejnowski, 1993; Linsker, 1988; Penke & Westermann, 2006; Westermann & Miranda, 2004)). In this rule, the change of synaptic weight ωij of the excitatory link from pre-synaptic cell i to post-synaptic cell j per unit time is defined as: ))(( 〉〈−〉〈−=∆ jjiiij xxxxαω (2.3) where α∈]0,1] is a constant <<1 specifying the learning rate, xi is the current output of cell i, and is the time-average output of cell i. In our simulations, we used α = 0.004. 〉〈 ix 4 For computational efficiency, the time-average of the output O(x,t) of each E-cell is estimated numerically by low-pass filtering O(x,t) with time constant τa=15. The final φ is then obtained by scaling down the estimated time-average by a small factor (0.026 in our simulations; see Appendix A). 28 While this rule captures well the essence of Hebbian learning (neurons that ‘‘fire- together, wire-together’’), it was not originally built to accurately mimic known mechanisms of synaptic plasticity. Miller & Mackay (1994) have shown, for example, that the co-variance rule cannot implement competitive learning (see also Sec. 3.1.3), a behaviour which is often considered a hallmark of many forms of developmental plasticity (Buonomano & Merzenich, 1998; Katz & Shatz, 1996). Indeed, subsequent and more realistic computational models of LTP/LTD exist which address this shortcoming (e.g. (Bienenstock, Cooper, & Munro, 1982; Shastri, 2001; Song, Miller, & Abbott, 2000); see Bi & Poo (2001) for a review). In view of this, and to attain higher biological realism, we chose the ABS model of LTP and LTD (Artola, Bröcher, & Singer, 1990; Artola & Singer, 1993) as a basis for implementing the second learning rule. This rule: (1) is based on experimental evidence and closely mirrors well-known neurophysiological phenomena (see below); (2) is computationally tractable; (3) addresses some of the limitations of the covariance rule (see Sec. 3.1.3); and (4) is an extended and more neurobiologically accurate version of the well-known ‘‘Bienenstock-Cooper-Munro’’ (BCM) rule (Bienenstock, Cooper, & Munro, 1982), which exhibits competitive learning.5 While the BCM rule had been originally developed to account for cortical organization and receptive field properties during development, the ABS model is derived from neurophysiological data obtained in the mature cortex. Such experimental data (Artola, Bröcher, & Singer, 1990) suggest that similar presynaptic activity (namely, brief activation of an excitatory pathway) can lead to synaptic LTD or LTP, depending on the level of postsynaptic depolarization co-occurring with the presynaptic activity. In particular, data from structures susceptible to both LTP and LTD indicate that a stronger depolarization is required to induce LTP than to initiate LTD.6 Accordingly, the ABS rule postulates the existence of two voltage dependent thresholds in the postsynaptic cell, called θ− and θ+ (with θ− < θ+). The direction of change in synaptic efficacy depends on the membrane potential of the postsynaptic 5 A direct comparison of ABS and BCM rules is included in the discussion section of this Chapter, Sec. 2.3. 6 The level of postsynaptic depolarization determines the amount of Ca2+ entering the dendritic spine: a moderate rise in Ca2+ leads to a predominant activation of phosphatases and LTD, while a stronger increase favours activation of kinases and LTP. 29 cell: if the potential reaches the first threshold (θ−), all active synapses depress; if the second threshold (θ+) is reached, all active synapses potentiate. We implemented a tractable version of the full ABS model (Artola & Singer, 1993), as described below. The simplifications involve discretizing the continuous range of possible synaptic efficacy changes to only two levels, +∆w and −∆w (∆w∈]0,1] is fixed a priori and represents the learning rate), and defining as “active” at time t any input link from a cell x such that O(x,t) > θpre, where θpre∈]0,1] is an arbitrary threshold representing the minimum level of presynaptic activity required for LTP to occur. More precisely, given any two E-cells x and y currently linked with weight wt(x,y), the new weight wt+1(x,y) is calculated as follows: wt(x,y)+∆w if O(x,t)≥ θpre and V(y,t) ≥ θ+ wt(x,y)−∆w if O(x,t)≥ θpre and θ− ≤V(y,t) < θ+ wt(x,y)−∆w if O(x,t)< θpre and V(y,t) ≥ θ+ wt(x,y) otherwise (2.4) wt+1(x,y) = where V(y,t) is the membrane potential of the postsynaptic cell y at time t (Eq. (2.1)). In our simulations, we used θ−=0.15, θ+=0.25, θpre=0.05 and ∆w = 0.0005. The three cases of Eq. (2.4) model, respectively, (i) homosynaptic and associative LTP, (ii) homosynaptic LTD, and (iii) heterosynaptic LTD. The latter type of LTD involves synaptic change at inputs that are themselves inactive but that undergo depression due to depolarization spreading from adjacent active synapses. It should be noted that in order to avoid runaway synaptic strengths and unphysiologically high cell activities, in both implementations of the synaptic plasticity mechanisms, synaptic weight values were limited to the range [0, 0.2]. (This means that five fully potentiated synapses could drive a cell to saturation). 2.2.3 System-level Architecture The neural-network model (see Fig. 2.1.(b)) reproduces the auditory input areas (A1, AB and PB) and motor output areas (M1, PM and PF) of the language cortex (Fig. 2.1.(a)). Each of these (primary, secondary and association) areas is modelled as a lattice (grid) of interconnected cells; more precisely, each model area consists of an area of 25x25 graded-response excitatory cells (E-cells) sitting on an underlying area 30 of 25x25 graded-response inhibitory cells (I-cells, not shown in Figure 2.1.(b)). We assume that each E-cell (together with its underlying I-cell) represents a cortical column of size 0.25mm2; thus, each model area simulates the activity of a cortical area of about 625 times 0.25mm2 ≈1.6cm2. Both between- (cortico-cortical) and within-area (lateral and recurrent) excitatory connections are realised, so that one E- cell can project to neighbouring E-cells within the same area and to E-cells of adjacent areas. Links between non-adjacent areas are not implemented (however, note that the two adjacent Areas 3 and 4 correspond to cortical areas that are not anatomically adjacent). This results in a hierarchical architecture that closely reflects the neuroanatomical data discussed in Sec. 1.3; in fact, the primary cortical areas (M1 and A1) are reciprocally connected to their neighbouring secondary areas (PM and AB); these, in turn, are reciprocally linked to their respective association areas (PF and PB), which are also interconnected (via long cortico-cortical links). The same type of hierarchical (or multi-layer) architecture is also found in other sensory modalities, a notable example being the visual system (Lamme & Roelfsema, 2000; Maunsell & Newsome, 1987; Young, 2000). Finally, the two areas of E- and I-cells that constitute a single area are closely and reciprocally connected, forming negative- feedback circuits that model local activity control and lateral inhibition (i.e., winner- take-all) mechanisms. The presence of lateral inhibition and next-neighbour connectivity, based on known characteristics of the cortex (Braitenberg & Schüz, 1998; Douglas & Martin, 2004), is shared by many neurobiologically based connectionist models of the cortex (e.g., (Riesenhuber & Poggio, 1999; Rolls & Deco, 2002)). The precise characteristics of the connections realised are now described in more detail (refer to Figure 2.2). The recurrent excitatory links projecting from an E-cell to E-cells of the same area are realised as follows: a link from a cell A to a cell B is created with probability plink(A,B), where plink(A,B) decreases as the cortical distance between A and B (in lattice units, i.e., cells) increases, according to a Gaussian curve. More formally: 2)/),(( σBAdek −⋅ 0 if sq(A,B) > ρ otherwise plink (A,B) = (2.5) where ρ∈ℵ+, σ∈ℜ+, k∈[0,1], and, if cells A and B have lattice (or area) co-ordinates 31 (xA,yA), (xB,yB), respectively, then sq(A,B) and d(A,B) are defined as sq(A,B) = max (|xA−xB|, |yA−yB|) (2.6) (2.7) d(A,B)=((xA−xB)2 +(yA−yB)2 )1/2 In short: if B is located outside a square of (2ρ+1)2 cells centred on A, the probability of a “synapse” being created between A and B is null; otherwise, the probability is a Gaussian function (with variance σ2 and amplitude k) of the (“cortical”) Euclidean distance between cells A and B (we used k=0.15, ρ=7 and σ=4.5).7 Thus, E-cells that are more than ρ lattice units (cells) apart cannot be (directly) connected. If one cell is assumed to represent a cortical column of size ~ 0.5x0.5 mm2, the radius of within- area lateral projections is 0.5⋅ρ ≈ 3.5mm. Finally, if an excitatory link between two E- cells is created, its weight is initialised to a real number chosen randomly between 0 and wup, with wup = 0.1. Equation (2.5) implies that the probability of having any E-cell linked to itself is exactly7 k. Figure 2.2 Connectivity and structure of a single “cortical” area. Each model area comprises two overlaying bi-dimensional layers of 25-by-25 excitatory (E) and inhibitory (I) cells each. Each E-cell (depicted as a filled black circle) projects (in a sparse, “patchy” manner) to neighbouring E-cells in the same area (REC, cell 1) but also to E-cells in the previous (FB) and next (FF) areas via feedback (cell 2) and forward (cell 3) connections, respectively. The small brighter squares on the black background represent an example of where such patchy links might be established, brighter levels of gray indicating stronger link weights. Inhibitory cells (e.g., I-cell 4, depicted as a dashed circle) receive input from (all) E-cells located within an overlaying 5×5 neighbourhood (INH) and inhibit the E-cell located at the centre of it (i.e., I-cell 4 inhibits E-cell 5). Area-specific inhibition feedback loops are not depicted. 32 Excitatory “forward” and “backward” links, connecting any E-cell A with co- ordinates (xA,yA) in area a1 to other E-cells of an adjacent area, a2, are realised in the same way: randomly weighted links may only be established between A and a square of (2ρ+1)2 cells centred on cell (xA,yA) in area a2, where the probability of creating a link between any two cells is defined by Equation (2.5). For forward and backward connections, the parameters that we used are k=0.28, ρ=9 and σ=6.5. Hence, within- area projections are smaller and less dense, on average, than between-area ones (see Fig. 2.2). Whilst the exact values of these parameters were calibrated through simulation studies, the type of excitatory connections realised in the network is biologically motivated and aims at reproducing the next-neighbour, patchy and sparse connectivity typically found in the mammalian cortex (Amir, Harel, & Malach, 1993; Braitenberg & Schüz, 1998; Douglas & Martin, 2004; Gilbert & Wiesel, 1983).8 The reciprocal connections between a layer of E-cells and its underlying lattice of inhibitory I-cells are similar but somewhat simpler than those described above. First of all, each I-cell (pool of inter-neurons) receives excitatory inputs from all E-cells situated within an overlying 5x5 neighbourhood (i.e., within a radius ρ=2, equivalent to ~1mm) and projects back (with weight =1) to the single E-cell located directly above it. The smaller radius ρ reflects the fact that inhibitory inter-neurons (basket or chandelier cells) present smaller and more verticalised dendritic arborizations than pyramidal cells do (Jin, Mathers, Szabo, Katarova, & Agmon, 2001; Somogyi, Cowey, Halasz, & Freund, 1981). Moreover, the weight of the lateral connections from E-cells to I-cells is not assigned randomly, but decreases with the distance according to the Gaussian function defined in Eq. (2.5) (with σ=2.0, k=0.295). This negative feedback circuits function both as local activation control and lateral inhibition mechanism, simulating the action of a pool of inhibitory interneurons surrounding a pyramidal cell in the cortex (Braitenberg & Schüz, 1998). As discussed in Sec. 1.3, in order to prevent overactivation or bursting, the network 8 In addition to using a sparsely connected network, the stimuli representing acoustic or motor cortical activity were also activating the network in a random and sparse way (see Sec. 3.2.1 for details). Some experimental evidence suggests that the neural code adopted by the brain to represent complex stimuli may indeed be distributed and sparse (e.g., (Olshausen & Field, 1996; Rolls & Tovee, 1995); cf. (Reddy & Kanwisher, 2006) for a discussion). 33 had to implement a self-regulatory mechanism. This mechanism was realised by introducing area-specific feedback-inhibition (FI) loops that control the total activity within each area (see Fig. 2.3). More precisely, all E-cells of each area project (with weight =1) to a single, area-specific I-cell (not part of the underlying layer of local I- cells), henceforth called FI-cell. Each FI-cell, in turn, projects back to all the E-cells of that area, providing an amount of inhibition proportional to the total activity within that area. This guarantees that the total network activation is maintained within “physiological” bounds. Note that, as explained in Sec. 1.5, the strength of these FI loops (depicted as striped arrows in Fig. 2.3) was manipulated during the experiments des of atte A c com 2.3 Th hum pro cribed in Chapter 4 in order to simulate the presence of different amounts ntional resources during language processing. A1 omplete formulation of the computational features of the model, summarizing and plementing the description given in this chapter, is reported in Appendix A. Discussion e neural network model implemented aims at mimicking the basic properties of the an perisylvian language cortex during word learning. The basic anatomical perties that were translated into network structure were the following: Figure 2.3. Implementation of the self-regulatory cortical mechanism in the network architecture. Each feedback-inhibition (FI) cell, depicted in grey, receives input from and projects to all E-cells of one area. See text for details. AB PF PM M1 PB 34 I. The parcellation of perisylvian cortex into M1, PM, PF, and A1, AB and PB, which is known from work in animals and humans; II. The next neighbour and long-distance connections linking these areas directly, which is based on work in animals and humans; III. General principles of cortical connectivity, especially sparseness and patchiness, topography of projections of long-distance connections and next-neighbour preference of local links; IV. Embedding of excitatory cortical neurons into a network of local inhibitory cells; V. Embedding of excitatory cortical neurons into area-specific inhibitory feedback loops designed to regulate local activation levels. Although the connections that were realised are well motivated by neuroanatomical studies in both humans and monkeys (see Pulvermüller (1992) for a discussion), we only reproduced the predominating next-neighbour connections and long-distance, cortico-cortical links that are known to exist in this part of the brain, and did not include fine-grained details such as connections between non-adjacent cortical areas (for example, linking A1 to the auditory parabelt). There are several reasons for these choices. First, neuroanatomical data indicate that each cortical neuron may receive links from fewer than 3% of neurons underlying the surrounding square millimetre of cortex (Stevens, 1989), and that the probability for a connection between two cortical neurons decreases with their distance (Braitenberg & Schüz, 1998). Second, there is little evidence for some of these “jumping” connections: for example, pronounced direct connections between primary auditory and motor cortex do not seem to exist. Third, adding connections that link non adjacent areas (e.g., from area A1 to PB, or from AB to PM, as some evidence would suggest (Catani, Jones, & Ffytche, 2005)) would reduce the minimum number of areas that separate area A1 from M1, making the binding of sensory-motor pattern pairs even easier and effectively resulting in a simplified version of the same model (cf. also Sec. 3.1.3). As noted in Sec. 1.3, the network is primarily designed as a model of the left language dominant perisylvian cortex, as the direct links between superior temporal and inferior frontal cortex appear much more developed there than in the non- dominant right hemisphere (Parker et al., 2005; Rilling et al., 2008). The present 35 number of six areas seems to constitute a minimum for approximating the relevant cortical structures, and, at the same time, a sufficient level of complexity for replicating and explaining, at cortical-circuit level, the rich dynamics and temporal aspects of the neurophysiological brain responses of interest (recall that the neural- network model of the left perisylvian areas proposed by Husain and colleagues (2004) contained four areas, while the connectionist model of early language acquisition described by Westermann & Miranda (2004) simulated only two cortical areas, and assumed all-to-all connectivity between their constituent cells). We conclude this discussion by clarifying the main aspects which distinguish the ABS learning rule (implemented here) from the well-known, classical BCM rule. First of all, in the BCM rule the LTP/LTD threshold – corresponding to parameter θ+ in Eq. (2.4) – is not, like here, a predefined, fixed value, but a sliding threshold that changes according to the running average of the postsynaptic cell’s activity.9 As pointed out by Miller (1996), although evidence suggesting that the LTP/LTD threshold may be affected by the activity of the cell does exist (Bear, 1995; Kirkwood, Rioult, & Bear, 1996), it has been established that this effect is input (i.e. synapse) specific, and that it depends on the pattern of pre-synaptic rather than postsynaptic activity (Abraham & Bear, 1996). Thus, the assumption of a single, postsynaptic-driven LTP/LTD threshold that applies to all the synapses of a cell is not entirely justified.10 Second, in the BCM rule LTD occurs even with very small postsynaptic potentials, whereas experimental evidence suggests that if postsynaptic depolarization remains below a 9 More precisely, for the BCM rule to exhibit stable learning behaviour, the threshold must be a more-than-linear function of the cell’s average output rate (a power of 2 is usually adopted). 10 Although evidence in support of the existence of homeostatic plasticity mechanisms exists (see (Turrigiano & Nelson, 2004) for a review), phenomena such as that of synaptic scaling — showing that prolonged changes in the cell’s activity lead to the multiplicative scaling of all the amplitudes of the miniature excitatory postsynaptic currents (Turrigiano, Leslie, Desai, Rutherford, & Nelson, 1998) — do not constitute direct evidence for the presence of a single sliding LTP/LTD cell-threshold. Equally, synaptic scaling does not justify assuming that the norm of the vector of the synaptic strengths is conserved and equal for all cells, as often presupposed by neurobiologically inspired implementations of Hebbian learning (e.g. (Krichmar, Seth, Nitz, Fleischer, & Edelman, 2005)). 36 certain threshold, the synaptic efficacy should remain unchanged, regardless of any presynaptic activity (Artola, Bröcher, & Singer, 1990). This aspect was implemented in the ABS rule using the second (fixed) threshold, parameter θ− in Eq. (2.4). Finally, unlike the ABS rule, the BCM rule is unable to model heterosynaptic LTD (the weakening of synaptic inputs that are themselves inactive), as it requires at least some presynaptic activity to be present at a synapse for LTD to take place. This form of LTD has been observed in the hippocampus and neocortex (Hirsch, Barrionuevo, & Crepel, 1992); the induction protocols require strong postsynaptic activation (e.g., high frequency stimulation of the cell through excitatory inputs), which is accurately reflected in the third line of Eq. (2.4) by the condition requiring V(y,t) ≥ θ+. 2.4 Summary and main contributions This chapter described the neurocomputational model of the human perisylvian language that was implemented. The original contribution lies in the level of accuracy that the network model incorporates in terms of neuroanatomical structure, connectivity and neurobiological features: (1) six interconnected cortical areas were modelled, identified on the basis of neuroanatomical studies; (2) cell activity was modelled at the level of single cortical columns; (3) within- and between-area synaptic connections were not “all-to-all” but sparse, random, patchy and next- neighbour, as typically found in the mammalian cortex; (4) both local (lateral) and global (area-specific) cortical inhibition mechanisms were implemented; (5) learning was modelled solely as synaptic plasticity (LTD/LTP) mechanisms that are known to take place in the neocortex. As discussed in Sec. 2.1 and 2.3 (and earlier, in Sec. 1.4), there is, at present, no other computational model of language processing specified at the level of cortical columns that implements all of the above features. Neurobiological faithfulness was necessary for (i) modelling and explaining existing neurophysiological data on lexical processes at the cortical-circuit level, and (ii) making precise predictions on the spatio-temporal patterns of brain activation during language processing, which could be tested experimentally using MEG techniques. The accomplishment of these goals is described in Chapters 4 and 5, respectively. 37 Chapter 3 – Simulating the emergence of discrete and distributed cell assemblies for words In this Chapter we describe two sets of experiments carried out using the neural- network model of the left-perisylvian language cortex described in Chapter 2. The main focus here was on simulating and explaining, at the cortical-circuit level, the processes that may take place in the cortex during early word acquisition. 3.1 Experiment Set 1 – Introduction Even during the earliest stage of speech-like behaviour, near-simultaneous correlated activity is present in different brain parts (see Sect. 1.2 and Sec. 1.4). Word production (controlled in inferior-frontal and prefrontal areas) leads to acoustic signals that cause stimulation of superior-temporal auditory areas. Since inferior- frontal (IF) and superior-temporal (ST) areas are connected reciprocally, and neurons that “fire together wire together” (Hebb, 1949), speech-related co-activation of neurons in these areas should lead to the formation of word cell assemblies (CAs) distributed over IF and ST cortex (Braitenberg, 1978; Pulvermüller, 1999). In order to test the mechanistic validity of this theory, we used the model described in Chapter 2 to carry out proof-of-concept simulations aimed at demonstrating the spontaneous emergence of such perception-action circuits in a neurobiologically realistic model of the language cortex. Word learning was simulated in the model as repeated simultaneous activation of predetermined sets of cells in Area 1 (the primary auditory cortex – see Fig. 3.1) and Area 6 (primary motor cortex, M1). The presence of an activity pattern in Area 6 can be thought to represent the spontaneous motor-cortical activity that one might observe in M1 during the babbling phase (Fry, 1966). The pattern presented as input to Area 1 38 simulated the cortical activation that would result in A1 from the near-simultaneous perception of the speech sounds generated by the articulatory movements driven by the activity in M1. The main prediction here was that well-defined, strongly connected CAs would develop for the sensory-motor pairs, associating auditory and articulatory activation patterns and representing the network equivalents of brain circuits for words (Pulvermüller, 2003). The theory predicted that these CAs should be (a) distributed across cortical areas; (b) word-specific, and (c) activated even by partial (e.g., only auditory) stimulation. Due to their strong internal and reciprocal connections, CAs were also expected to exhibit “memory” and “pattern completion” features (see also (Wennekers & Palm, 2007)), i.e., reverberation of excitation within the circuit in absence of any input following stimulation, and full activation after only partial stimulation. 3.1.1 Experiment Set 1 – Methods The network was confronted with four stimulation patterns, each pattern representing auditory and articulatory components of a word form: two predetermined, randomly generated sets of cells were activated at the same time in the primary auditory (A1) and motor (M1) areas (see Fig. 3.1), simulating speech production and correlated perception of the same speech element. A1 AB PB PF PM M1 Area 1 Area 2 Area 3 Area 4 Area 5 Area 6 Figure 3.1. Schematic illustration of network simulation of early word acquisition processes: predefined stimulus patterns were presented simultaneously to areas A1 and M1, resulting in a temporary wave of activation that spread across the network. Black (gray) cells indicate strongly (weakly) activated cells. Synaptic links between cells are not depicted to avoid clutter. See text for details. The number of cells (seventeen) activated in each primary area equalled 2.72% of the 39 total number of cells of one area. The training consisted of the cyclic presentation of the four different pairs of patterns; during each cycle, one stimulus pair was presented continuously to the network for 2 simulation time-steps, followed by a period of 50 steps during which no input was given and activity was driven by white noise. A different stimulus pair, chosen randomly among the other three, would then follow, until each of the four stimuli had been presented to the network for thirty five hundred times (adding up to 14·103 stimulus presentations in total). Throughout the training (including the period in which no input patterns were present) the weights of all the links between E-cells were left free to adapt according to Sejnowski’s covariance rule (see Sec. 2.2.2), which leads to the strengthening of the links between co-activated cells and the weakening of links between cells that present uncorrelated activation. After the training, the network was tested with a view to reveal the presence and properties of cell assemblies, which were expected to emerge for the given auditory- motor pattern pairs. More precisely, for each of the four patterns presented in input, the time-average of the response (output value, or “firing rate”) of each E-cell in the network was computed and stored.11 These averages were used to identify the CAs that developed in the network in response to the four input pairs, as follows: a CA was defined simply as the subset of E-cells exhibiting average output above a given threshold γ∈[0,1] during stimulus presentation.12 Using the above functional definition, we then measured, for different values of γ, (i) CA size (averaged across the CAs that emerged in the network as a result of learning) and (ii) distinctiveness of a CA, quantified as the average overlap (number of cells that two CAs shared) between one randomly chosen CA and the other three (this is also a measure of the amount of cross-talk between pairs of CAs). We repeated the above process and collected these measures for ten different networks, each randomly initialized and trained with a different set of stimulus pairs. 11 The time-averages of the output values were computed during the training, recording the cell responses as the four patterns were presented to the network for learning. 12 E.g., if γ = 0.75, all cells presenting output above 75% of the output of the maximally active cell in their area during stimulus presentation were considered to belong to the active CA. 40 3.1.2 Experiment Set 1 – Results As the training progressed, we observed the emergence of distributed c different assemblies responding selectively to a different input phenomenon becomes apparent by examining the time-averaged resp input pattern induced in the network at the different stages of the learnin r W1 W2 W3 W4 W1 W2 W3 W4 W1 W2 W3 W4 W1 W2 W3 W4 W1 W2 W3 W4 W1 W2 W3 W4 50 100 M1 PM PF AB PB A1 10 ell assemblies, pattern. This onse that each g process. Figure 3.2.(a). Average esponse of one networkto the 4 input patterns (W1,..W4) at different stages of learning: after 10 (top), 50 (middle) and 100 (bottom) stimulus presentations (see also Fig. 3.2.(b) on next page). 41 Figure 3.2 (panels (a) and (b) are shown on two separate pages) contains the time- avera irs (one prese ged response of one (randomly chosen) network to the four input patterns pafor each row), at different points during the training (after 10, 50, 100 stimulus ntations in Fig. 3.4.(a), and after 1000, 2000, 3500 in Fig. 3.4.(b)). 3500 W4 W3 W2 W1 W2 W3 W4 W1 W2 W3 W4 W1 W1 W1 W2 W3 W4 W1 W2 W3 W4 1000 2000 A1 AB PB PF PM M1 W2 W3 W4 Figure 3.2.(b). Average response of one network to the 4 input patterns after 1000 (top), 2000 (middle) and 3500 (bottom) stimulus presentations. See text for details. 42 In the figure, the different output value (firing rate) of each cell within an area is coded using different brightness levels: very bright or white squares indicate cells with average output ∼1.0, dark or black areas indicate silent cells (output ∼0.0). Initially, the presentation of the input pattern pairs produces only weak activation in the two secondary areas AB and PM, and no activation in the central (or associative) areas PB and PF. As the learning progresses, however, the average response produced by the same stimulus reaches further towards the central areas (Fig. 3.2.(a)), where the binding of the sensory-motor patterns is expected to take place. Note that the average responses after 2000 and 3500 stimulus presentations (Fig. 3.2.(b)) are essentially identical, suggesting that the four CAs have reached a stable size and their boundaries have not changed during the past six thousand alternated stimuli presentations. The time-averaged response of the other trained networks was qualitatively equivalent. To see that the binding of the four auditory and articulatory pattern pairs induced by the learning process has taken place, consider Figures 3.3 and 3.4. In them, the rows represent “snapshots” of a network activity taken at successive time points following brief (2-step) stimulation pulse to Area 1 with the auditory part (left pattern only) of a stimulus pair. Time t is in simulation steps. The network of Fig. 3.3 was untrained (i.e., the stimulus presented to A1, shown in the leftmost column, had never been “heard” before by the network). Figure 3.4 shows the response of the network to a learnt auditory pattern after the training had been completed. In absence of training, activity propagates in a rather cloudy and unfocussed manner, reaching only the first and second areas, and is then dispersed (Fig. 3.3). One point to notice is that the wave of activation spreading appears to be “pushed” to the right. This is due to the presence of the FI mechanism (see Sec. 2.2.3): the area-specific inhibition loop takes effect as soon as the activation within one area increases, and remains active for a few steps; this prevents activation to immediately “re-enter” an area which has just been active. The response of a trained network to a known, familiar auditory pattern differs significantly (see Fig. 3.4). First of all, the activity is now much more focussed: only very specific sets of cells are strongly activated. At time t=2 the active cells in Area A1 already produce activation in a specific subset of cells in area AB. The activity of these cells is significantly higher than that of cells activated in the surround by the non-specific wave (compare their brightness with that of the active cells in AB in Fig. 3.3). This indicates that their input must come directly from the strongly active cells 43 in A1. Hence, these cells will respond strongly whenever this specific input pattern is present. Furthermore, the activation does not stop at the first few areas, but progresses through the entire network until it reaches area M1. This indicates the existence of strong (possibly reciprocal) links between cells distributed across the six areas, which developed as a result of learning. Crucially, the cells activated in M1 reconstruct part of the motor pattern (shown in the figure for illustrative purposes only) that had been presented to that area in association with this specific “word”. Thus, the observed behaviour suggests the presence of a distributed, stimulus-specific CA in the network associating the two sensory-motor patterns. t=8 t=10 t=1 t=4 t=0 t=2 t=6 Input to A1 A1 AB PB PF PM M1 Figure 3.3. Network response to stimulation of A1 (auditory cortex) with an activation pattern before training. Each row in the figure is a snapshot (taken at time-steps t=0, 1, 2, ..., 10) of the output activity of the six model areas (columns). The input pattern briefly presented to Area 1 is shown on the left. 44 t=0 t=1 t=2 t=4 t=6 t=8 Input to A1 A1 AB PB PF PM M1 Figu (ave thres 3.1.1 eme as a a “c relia and aver show CAs with Figure 3.4. Network response to A1 stimulation with an auditory activation pattern (on the left) after training. The motor pattern that had been paired with the auditory input is shown on the right for illustrative purposes. Refer also to Fig. 3.2 [after (Garagnani, Wennekers, & Pulvermüller, 2007)] res 3.5 and 3.6 below plot cell assembly size and specificity, respectively raged across ten trained networks) as a function of the minimal-activation hold γ, the parameter used for identifying the CAs and their boundaries (see Sec. ). Fig. 3.5 indicates that, on average, distributed, stimulus-specific CAs reliably rged in all the network simulations. However, their size decreased approximately linear function of the minimal-activation threshold γ. This suggests the absence of ritical” level of activation above which only a well-identifiable set of cells is bly activated. Instead, the boundaries of the CAs appear to be somewhat “fuzzy” not so well defined, and to overlap significantly with those of other CAs. The age overlap (or cross-talk) between pairs of CAs is reported in Figure 3.6, which s the % of shared cells between a randomly chosen CA and (i) the other three (we plot the mean of the three overlaps) and (ii) the CA maximally overlapping the chosen one, averaged across the ten networks. 45 Average cell-assembly size 0 20 40 60 80 100 120 0 0.2 0.4 0.6 0.8 1 minimal-activation threshold γ nu m be r o f c om po ne nt c el ls Figure 3.5. Average cell-assembly size. The average (SEM) number of cells within the entire network that were activated above threshold γ by a specific input stimulus (auditory and articulatory word forms) is plotted as a function of the threshold γ. Vertical bars give standard errors of the mean (SEM). Overlap between pairs of cell assemblies 35 average overlap maximum overlap-5 5 15 25 0 0.2 0.4 0.6 0.8 1 minimal-activation threshold γ % s ha re d ce lls Figure 3.6. Cell-assembly distinctiveness: average (SEM) overlap between pairs of CAs as a function of the minimal-activation threshold γ (see text for details). 46 3.1.3 Experiment Set 1 – Interim Discussion We applied the model of the left-perisylvian language cortex to simulate brain processes of early language learning. The sensory-motor patterns that were repeatedly presented to the network (producing simultaneous activation of Area 1 and Area 6) led to the formation, through Hebbian learning processes, of strongly interconnected sets of cells, associating the acoustic and articulatory components of the simulated word patterns. These cell assemblies were (a) distributed across cortical areas, (b) word-specific, and (c) activated even by partial (e.g., purely auditory) stimulation. The formation of distributed and distinct (although partly overlapping) circuits associating activity patterns as a result of biologically grounded correlation learning in a network structure involving several areas is remarkable and of theoretical significance, particularly in view of the random and sparse connectivity realised between and within the areas. Indeed, it is often argued (O'Reilly, 1998) that learning (hetero) associations between arbitrary pairs of patterns requires supervised mechanisms analogous to back-propagation (Rumelhart, Hinton, & Williams, 1986), whose biological plausibility remains questionable. As discussed in Sec. 2.3, the connections implemented in the model are well motivated by neuroanatomical studies in both humans and monkeys (cf. (Pulvermüller & Preissl, 1991)). Still, one may want to ask questions about the dependence of the results on the network structure. The model is robust to a reduction in the number of areas, and produces analogous results when 4 or 3 areas are used. This is because a smaller number of areas actually means a shorter path to be traversed by the auditory and motor activation patterns in order to “meet” the wave of activity coming from the opposite end. Indeed, if the number of areas is reduced to only two, the model becomes a simple two-layer interactive network with no “associative” layer (analogous to that used by Westermann & Miranda (2004)) and the binding between the two patterns takes place through the reinforcement of the synapses that exist between co-active cells. Hence, the introduction of such more direct links is not expected to produce any significant change in the qualitative behaviour of the network. On the other hand, an increase in the total number of areas separating the two “primary” areas A1 and M1 is expected to make CA formation slower and more 47 difficult. Subject to some parameter changes, however, and up to a certain number of additional areas, results should still hold, although a greater number of training steps will be required. It should be noted, however, that there is relatively strong evidence for the existence of a 6-area pathway connecting A1 to perisylvian primary motor cortex (Pulvermüller, 1992). Even if additional, “parallel” pathways, connecting A1 and M1 through a number of areas higher than 6, were introduced in the model (see, e.g., (Catani, Jones, & Ffytche, 2005)), it is unlikely that their presence would prevent the development of cell assemblies within the shorter, still viable, 6-area pathway, which would be automatically recruited. Although in many cases CAs were entirely word-specific (i.e., none of the cells active above threshold for a specific word was also active for a different word), sometimes they did overlap significantly (see Fig. 3.6). A high level of overlap (or cross-talk) is undesirable as it may cause a CA to activate in response to the wrong stimulus pattern, and activity in one CA to reliably induce ignition of another CA; in presence of Hebbian correlational learning, this eventually leads the two CAs to merge into a single one which responds to both input stimuli. Indeed, this phenomenon (and other inter-related problems) often hindered the formation of distinct CAs in the network during preliminary simulations (see Appendix B). The significant overlap between CAs is a symptom of the network’s inability to separate, or “pull apart” input representations that produce overlapping activations. We attribute this to the fact that Sejnowski’s covariance rule (Sec. 2.2.2) does not implement competitive learning (K. D. Miller & Mackay, 1994). Competitive learning (Grossberg, 1976a, , 1976b; Kohonen, 1984; Kohonen & Makisara, 1989) is a form of unsupervised learning in which the network learns how to categorize and gradually “separate” input patterns so that only one output unit responds to a given pattern. The covariance rule fails to achieve this, and encourages CA merging rather than CA separation. To see why this is so, consider the 2-area network of cells depicted in Figure 3.7 below. Let us assume that the network uses sparse coding, and that the cells in area 1 are repeatedly confronted with different patterns of activation. Assume that two input patterns (called A and B) strongly activate cells A1, A2, C1, C2 and B1, B2, C2, C3 respectively. 48 During learning, the weights are modified according to the co-variance rule, which can be summarized by the following table: Pre-synaptic cell Post-synaptic cell ∆w= pre* post active ↑ active ↑ ↑ silent ↓ active ↑ ↓ active ↑ silent ↓ ↓ silent ↓ silent ↓ ↑ The size of the arrows in the table indicates the magnitude of the difference between current and average activity of that cell; the orientation indicates the sign of such difference (up: positive; down: negative). The differences are larger when cells are fully active than when they are silent (in a sparsely active network, a cell’s average activity is much closer to zero than to its maximum level of activation). First, note that links between two cells that are simultaneously silent are strengthened (case (d)). In addition to not being neurobiologically plausible, this leads to an overall “gluing” effect. Addressing this issue by simply setting ∆w = 0 in case (d) would not be sufficient to solve the merging problem. In fact, because of the Figure 3.7 Schematic illustration depicting an example of overlapping cell assemblies. Nodes simultaneously active are depicted using the same fill pattern. The weights of the links between area 1 and area 2 are labelled w1,....,w6. The dashed and dotted ines identify the two CAs activated by two different input patterns (see text for details) area 1 area 2 B2 B1 A2 A1 w1 C1 w2 w3 C2 w4 w5 C3 w6 (a) (b) (c) (d) . l 49 differences in magnitude, the net effect produced by the alternated strengthening (a) and weakening (cases (b) or (c)) of a link is an increase in strength. In the example of Fig. 3.7, alternation of inputs A and B means alternated increase (a) and decrease (b) of w3 and w4: the net effect is a weight increase in both, which, in the long run, will cause the two cell assemblies to merge into a single one. This problem may be addressed in different ways, e.g., by imposing a fixed ∆w (so that weakening and strengthening would produce weight changes of equal magnitude), changing the density of the between-areas connectivity, or increasing the level of spontaneous activity in the network (so that the average activation of a cell is mid-way between silent and fully active). Some of these strategies were adopted in the revised version of the model, in which the covariance rule was replaced by the second, more biologically accurate Hebbian rule, based on the ABS model of LTP/LTD (see Sec. 2.2.2). Unlike the covariance rule, the ABS rule: • uses the same amount of weight change ∆w per unit time for both LTP and LTD; • does not strengthen links between cells that are simultaneously silent; • uses a single parameter’s value (the postsynaptic membrane potential) to determine whether LTP or LTD should occur – see Eq. (2.4). In view of the previous considerations, we expected the first two features to lead to a lower degree of merging and overlap between CAs. The last feature (also based on neurobiological evidence) allows one to precisely define the ranges of values of the postsynaptic membrane potential for which either LTP or LTD will occur. We conjectured that, by changing the ratio between the widths of these ranges, it should be possible to modulate the total amount of competitive learning that takes place in the network. Experiment 1 was then repeated using the revised model; the results of these simulations are reported below. 3.2 Experiment Set 2 – Emergence of CAs in the revised model This set of experiments was analogous to Experiment Set 1 (previous section), but was performed using a model that implemented the Artola-Bröcher-Singer rule instead of Sejnowski’s covariance rule to simulate synaptic plasticity. This aimed at 50 reducing the amount of overlap between the CAs and hence the likelihood of them merging. In addition to replicating and improving on the results of Experiment Set 1, we were interested in quantifying the functional characteristics and neurophysiological properties of the CAs in terms of their distributedness, memory features and pattern completion abilities. 3.2.1 Experiment Set 2 – Methods The methods for this set of experiments are analogous to those used in Experiment Set 1, Sec. 3.1.1: we generated and randomly initialised eight different networks, and trained each of them with four different pairs of random sensory-motor patterns; here, each stimulus pair was presented five thousand times. As in Experiment 1, subsequently to the successful emergence of distributed cell assemblies, we measured (i) average CA size and (ii) average overlap (number of cells that two CAs shared). In addition, by recording the networks’ responses to stimulation of Area 1 only, we also measured CA input specificity and recollection (or “pattern reconstruction”) ability of a CA, which quantified how easily (what portion of) a CA became fully active following activation of just a subset of its component cells. This was done by presenting, for four time steps, only the auditory component of the four learnt pairs and measuring, area by area, the average of (a) the induced CA activity (in %), and (b) the cumulative portion of CA cells that were reactivated by the stimulus. The averages were calculated across all the four patterns for each of the eight different networks, producing a total of 32 different (stimulus, network) pairs. 3.2.2 Experiment Set 2 – Results Like in Experiment Set 1, as the training progressed, we observed the emergence of distributed cell assemblies associating sensory-motor patterns. However, the CAs that emerged were qualitatively different from those observed in the previous set of experiments. Figure 3.8.(a) shows the time-averaged response of one (randomly chosen) of the networks to the presentation of one of the four input patterns (words) at different stages of learning. Compare the network responses shown in this figure with those shown in Figure 3.2 (a) and (b). 51 M1 PM PF PB AB A1 10 100 1000 5000 Figure 3.8.(a). Average response of one network to one of the 4 word pattern pairs that it had been trained with, at different stages of learning: after 10, 100, 1000 and 5000 stimulus presentations. Like before, activation is initially weak in the middle areas; however, as learning progresses, the CA quickly reaches and expands within areas PB and PF, where the binding between sensory and motor patterns takes place. The number of cells that are involved in the binding is significantly higher than that observed in the previous simulations. Crucially, at later learning stages, the size of the CA in the middle areas decreases (compare the responses after 100 and 5000 stimulus presentations: both the number of white squares and the intensity of their activation is reduced). This indicates that, after the initial period of expansion, the cells in areas PB and PF, most densely populated, undergo a process of competition, which allows only the most active ones (and the strongest links) to survive, leading to a “pruning” of the synaptic connections and CA size reduction. Figure 3.8.(b) illustrates an interesting example of one of the networks responding to the auditory pattern of a word stimulus after training. The behaviour of the network during the first 12-16 steps is analogous to that obtained with the previous version of the model (see Fig. 3.4). Notice, however, also the presence of a fast, unfocussed wave of activity, produced by non-specific activation in the auditory area, which quickly traverses the entire network and is over by time t=20. 52 Area 1 Area 2 Area 3 Area 4 Area 5 Area 6 Input to Area 1 t=0 t=44 t=1 t=2 t=4 t=8 t=12 Figure 3.8.(b). Network response to Area 1 stimulation with the auditory component of one of the learnt pairs after training. Each row is a snapshot of the network output taken at successive time points. The associated motor pattern that the network was trained with is shown, for comparison only, on the right hand side. See text for details. t=16 t=20 t=40 t=48 53 Consider Area 1-3: the specific cells activated there remain active well beyond the removal of the input stimulus. This suggests that these cells are part of a circuit of strongly connected cells, which emerged with learning and which create within- and between-area reverberant activity. As in Experiment 1, the somewhat slower propagation of activity within specific, isolated cells continues across the network, although the number of cells strongly active appears to decrease as the middle and rightmost areas of the network are reached (time t=12–24). When the reverberant activity reaches the final area (t=20), an interesting process takes place: from the activity of a few cells situated mostly in the top part of Areas 4-6, an entire new “pulse” of reverberant activation develops, not producing a dispersed cloud but strongly activating only a very specific set of cells in Areas 6, 5 and 4. Notice that when this second slow wave “peaks” (t~36), the articulatory activation pattern (shown in the rightmost column of Fig. 3.8.(b) for illustrative purposes only) that had been paired with this auditory input pattern is reproduced almost entirely in Area 6. Finally, the wave of reverberant activation stops when it fails to activate a specific set of cells in Area 3 strongly enough so as to allow self-sustained activation to continue. Average CA size 0 20 40 60 80 100 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Activation threshold γ N um be r o f c el ls . Size Figure 3.9. Average CA size. The average (SEM) number of cells within the entire network that responded above threshold to a specific input stimulus is plotted as a function of the threshold γ. Figure 3.9 plots CA size (averaged across 32 CAs, produced by the four input stimuli in each of the eight networks) as a function of the threshold γ, where a CA is defined as specified in Sec. 3.1.1. As one would expect from such a functional definition, 54 small values of γ still correspond to larger assembly sizes, and vice versa. However, the size of the CA does not change much when γ is in the range [0.05, 0.7], and, even for γ=0.95, the CA size is around 50 cells. Thus, CAs appear to be well identifiable entities formed by a “core” of about 50 cells that respond very strongly (at least 95% of the maximally active cell, on average) to the input stimulus, and by an additional “belt” of about 30 cells that respond more moderately but still well above average (at least 70% of the maximally active cell). Figure 3.10 plots the results concerning the CA distinctiveness. The maximum overlap (i.e., maximum % of cells in a CA that are shared with another CA) is above 5% only for values of γ < 0.1. The average overlap between two CAs, on the other hand, is always below 5% and less than 2% for γ > 0.2. This makes cross-talk very unlikely, as activation of 2%-5% of the cells is not sufficient to cause full CA activation (see also Fig. 3.13). Overlap between pairs of cell assemblies 0 2 4 6 8 10 0 0.2 0.4 0.6 0.8 1 minimal-activation threshold γ % o f s ha re d ce lls Maximum overlap Average overlap Figure 3.10. CA distinctiveness. The graph plots the mean overlap between one CA and the other three (solid line) and the overlap between one CA and the maximally overlapping CA (dashed line) as a function of the minimal-activation threshold γ. The data are the average (SEM) of the results of eight network simulations. Figure 3.11 and 3.12 show the area-specific spatio-temporal activation and pattern completion properties of the CAs, respectively. Figure 3.11 plots the percentage of 55 CA cells active above threshold γ13 in each area after stimulation of Area 1 only with a learnt auditory pattern. The figure delineates how the wave of CA activation spreads across the network, and the contributions of the different areas to the total activation, each area peaking at a different time and with different intensity. Figure 3.12 summarizes the average pattern-completion abilities of the network, plotting the cumulative portion (in %) of CA cells in the different areas that are re-activated following stimulation of Area 1. This graph is obtained by integrating over time the plots of Figure 3.11. As one might expect, pattern completion worsens as activity propagates further away from the input area and activation becomes weaker. The motor pattern that had been paired in Area 6 with the auditory pattern in Area 1 (now given as input to the network) is, on average, reconstructed only partially (approximately 30%), while the average pattern reconstruction across the six areas is above 75%. It should be noted that the network responses to learnt patterns never contained any “spurious” cells; in other words, the only “errors” are missing cells that fail to be fully reactivated. Thus, although the associated pattern is not entirely reconstructed, all the cells activated by the stimulus are correct and can be seen as a reliable set of “core” representation units. Average CA activation following Area-1 stimulation 0 20 40 60 80 100 1 11 21 31 41 51 time (simulation steps) % C A c el ls ac tiv e ab ov e . th re sh ol d . Area1 Area2 Area3 Area4 Area5 Area6 Figure 3.11. Spatio-temporal pattern of activation of a CA. The curves show the average area-specific CA activation following Area-1 stimulation with one of the learnt auditory patterns (words) as a function of time. 13 We used γ=0.45, but, as discussed above, any γ∈]0.2, 0.7] is expected to produce similar results. 56 Cumulative portion of CA cells re-activated by stimulation of Area 1 Area1 Area2 Area3 Area4 Area5 Area6 Average 0 20 40 60 80 100 % C A c el ls ac tiv at ed Figure 3.12. Average pattern-completion abilities of a CA. The bars show the cumulative portion of a CA (% of CA cells per area) that a learnt stimulus presented to Area 1 successfully reactivates over the 50 steps following stimulation, averaged across 32 different auditory patterns (four patterns per network). The rightmost bar indicates the average of the six area-specific values. Cell-assembly responses to Area-1 stimulation with a word 0 10 20 30 40 1 11 21 31 41 51 simulation time-step su m o f c el ls ' o ut pu t (w ith in C A ) .. Maximally active CA 2nd best CA 3rd best CA 4th best CA Figure 3.13. CA specificity. The graph shows the average (SEM) response of the four different CAs following auditory (Area 1) stimulation with one of the learnt patterns. The activation threshold used was γ=0.45. Finally, Figure 3.13 illustrates the results on CA input specificity. Each curve plots the sum, across the six areas, of the output of all the cells of each CA as a function of time. CAs appear to be highly specific: only one CA is strongly activated by the pattern in input, while the others show very little, if any, activity. These results 57 confirm the conclusions drawn from Figure 3.10, which suggested little probability of cross talk between CAs. 3.2.3 Experiment Sets 1 & 2 - Discussion The results of Experiment Sets 1 and 2 demonstrate the emergence of CAs in the network. As mentioned in Sec. 3.1.3, the successful setup of distributed Hebbian circuits spanning a realistic number of cortical areas forming the substrate of perception-action learning is remarkable, given that no computational “tricks” such as back-propagation of errors (Rumelhart, Hinton, & Williams, 1986) were used during the training. The emerging CAs are strongly interconnected sets of cells that exhibit: (a) Distributedness and sparseness (Fig. 3.9 and Fig., 3.11, respectively): one CA consists, on average, of less than 100 cells distributed across the six areas, equivalent to less than 2.67% of all cells within the network; (b) reverberation and persistence of activity (Fig. 3.13 shows strong CA activity until 35-40 steps after stimulus offset) in absence of input within well- identifiable sets of cells; (c) relatively stable size for different critical activation thresholds γ (Fig. 3.9); (d) small overlap and cross-talk between pairs of CAs (less than 5% on average), and high specificity of response (Figures 3.10 and 3.13); (e) pattern completion abilities (averaged across areas) above 75%, in spite of the sparse and random character of the network connectivity (Fig. 3.12). These results suggest that a CA behaves as a highly specialised, discrete activation (“on-off”) functional unit which, if sufficiently stimulated, becomes fully active through a positive-feedback process of reverberation (Braitenberg, 1978; Hebb, 1949; Pulvermüller, 1999). Indeed, the macroscopic behaviour of a CA appears to be non- linear and characterised by a specific activation threshold, very much like a single neuron. For the positive-feedback loops that form a CA to be able to “drive” the circuit towards full activation, it is necessary that sufficient activity is captured by them so that the amount of self-generated excitation overcomes the amount of “leakage” and dispersion. If the activity present in the positive-feedback loops 58 exceeds this threshold (the value of which depends on the specific characteristics – strength, reciprocity – of the internal connections of the CA), the total activity in the CA does not dissipate but starts to increase and propagate to the rest of the CA, in a wave-like fashion (see Figures 3.4 and 3.8), producing a momentary “pulse” or peak of activation in the entire CA (see Fig. 3.13). This surge of activity in the network (sometimes called “ignition” (Braitenberg, 1978)) causes the area-specific inhibition mechanism to take effect, which then subsequently inhibits the CA and the entire network (overshoot). In the revised version of the model, Hebbian learning was implemented according to the ABS model of LTP and LTD (Artola & Singer, 1993). Compared with the original concept of coincidence learning mentioned by Hebb (in which synaptic modification occurs only as strengthening of connections between two co-active neurons), both the covariance and ABS rules envisage, in certain cases, the weakening of links: more precisely, while co-occurrence of sufficient pre-synaptic activity (O(x,t)≥θpre) and strong post-synaptic depolarization (V(y,t)≥θ+) leads to a weight increase (LTP), presence of only one of these conditions leads to a decrease (LTD). Such weakening contrasts the ever increasing synaptic weights that are brought about by coincident activation. The effects of adopting the more neurobiologically realistic (ABS) rule, however, are evident. First of all, CA size is much more stable across different threshold values (compare Fig. 3.9 and Fig. 3.5); the results indicate that the distributed representations that emerged are clearly identifiable sets of strongly interconnected cells. Secondly, CAs are significantly “thicker” in the middle areas (compare Fig. 3.8.(a) and Fig. 3.2); this allows more cells to be involved in the binding between sensory and motor patterns, leading to stronger CAs and better pattern completion capabilities (compare the portion of the motor pattern reconstructed in Area 6 by the response shown in Fig. 3.8.(b) with that shown in Fig. 3.4). Most importantly, the adoption of the ABS rule introduced a competitive element in the learning process (see Fig. 3.8.(a)) which minimized the problems of merging and cross-talk (compare Fig. 3.10 and Fig. 3.6) and led CAs to become anatomically distinct (and functionally discrete) units. The adoption of a more realistic unsupervised learning rule made the formation of relatively stable cell assemblies more difficult than it would have been using supervised (e.g., backpropagation) learning methods, and made it subject to the 59 optimization of various parameters of the network. Appendix B describes these problems in detail and the way in which they were addressed. In the past, some of these issues have been used as arguments against the feasibility of correlation learning and of the Hebbian cell-assembly model. For example, in a useful compendium of such arguments, Milner (1996) wrote: “It is difficult […] to understand why the synaptic modification that links neurons to form an assembly fails to involve more and more neurons until the whole brain becomes one immense and useless cell assembly” (ibid., p.70) and, later: “Another serious problem is that an assembly of neurons linked by excitatory connections would be inherently unstable and liable to fire out of control at the slightest disturbance” (ibid., p.72). Our model provides evidence that these problems can be overcome, even if biologically plausible associative learning is used. First of all, the growth of a CA is limited by the slow but constant competition for shared cells that takes place between different CAs (see Fig. 3.8.(a)). To clarify: every time a CA is stimulated, the learning causes some synapses to strengthen and others to weaken. As a result, some cells become more strongly connected to a CA (i.e., more likely to be activated by it), and less to other, inactive, CAs. If the network were always confronted with only one stimulus, the corresponding CA would indeed keep growing and take over the entire network. However, during training, the input stimuli alternate continuously (see Sec. 3.1.1); each different stimulus excites a different CA, possibly overlapping with other CAs. The continuous alternation of different stimuli causes the cells that are shared by the different CAs to be alternatively bound more strongly into one or the other assembly. If the input stimuli alternate in a balanced way (as was ensured here), the cells in the overlap never become entirely an exclusive part of any of the competing CAs; rather, they are the site of a constant competition in which each of the assemblies is limiting the growth of the others, producing a state of dynamic equilibrium. Secondly, regarding the instability of a CA (and of the network), spontaneous activation of CAs during periods in which no input was presented did occur, as 60 predicted, due to the background noise present in the network.14 However, whenever this happened, the self-regulation mechanism (FI) started to operate, causing the CA to be “switched off” soon after its full activation and preparing the ground for the next CA activation. One last point concerns the number of (sensory-motor) pattern pairs used to train and test the network, which is very small (four) when compared to the number of words that our brain can store. Implementing a large-scale network capable of storing a realistic number of lexical items was not one of the objectives of this work: our main aim was to show proof-of-concept simulations that enable the explanation of previous experimental findings and prediction of future ones. As the next Chapter will demonstrate, for these purposes it is sufficient to model the acquisition and processing of a limited number of exemplar sensorimotor patterns, lexical items, or words. 3.3 Summary and main contributions We used the model described in Chapter 2 to test the mechanistic validity of the theory according to which speech-related co-activation of neurons in IF and ST cortex should lead, in presence of Hebbian learning, to the formation of word cell assemblies (CAs) distributed over these areas (Braitenberg, 1978; Pulvermüller, 1999). The simulations demonstrated the spontaneous, unsupervised emergence of such strongly connected perception-action circuits, providing proof-of-principle evidence in support of the theory, and demonstrating the viability of correlational learning for the formation of (sensory-motor) associations in a hierarchical, brain-like, multi-layered neural network architecture. A second contribution of the simulations is the prediction that the emerging lexical representations will exhibit the following characteristics: functional discreteness (“on- off” activation levels), cortical distributedness, sparseness, reverberation (short-term memory features), anatomical distinctiveness, and pattern completion abilities. Some of these characteristics, together with the simulations described in Chapter 4, will give rise to specific predictions about the neurophysiological effects of attention on lexical processes, which will be tested in Chapter 5. 14 This behaviour was not entirely undesired, as it can be interpreted as a model analogue of a “spontaneous thought”. 61 Chapter 4 – Simulating Lexicality and Attention effects This chapter describes two additional sets of experiments carried out using the neural- network model of the language cortex presented in Chapter 2. These tested whether a network that had developed a set of word representations as a result of learning (see Chapter 3) could replicate and explain existing neurophysiological data on the effects of lexicality and attention on the processing of speech. 4.1 Experiment Set 3 – Replicating lexicality effects Here we used the set of eight networks resulting from Experiment Set 2 (Sec. 3.2) to simulate the brain responses to meaningful words and meaningless pseudowords (i.e., non-English, phonotactically correct word-like material, such as “sklued”, or “drock”). We wanted to test whether the model could replicate recent evidence according to which early (< 200 ms. post stimulus onset) neurophysiological responses are larger to (spoken) words than to pseudowords – see Sec. 1.1., Figure 1.2 (Korpilahti, Krause, Holopainen, & Lang, 2001; Pettigrew et al., 2004; Pulvermüller et al., 2001; Pulvermüller & Shtyrov, 2006; Shtyrov & Pulvermüller, 2002). 4.1.1 Experiment Set 3 – Methods We recorded the network responses following brief stimulation (four time steps) of the auditory area only (Area 1, see Fig. 3.1) with either one of the learnt patterns (words) or a new, previously unseen, pattern (pseudoword). We generated pseudoword patterns by “gluing” together randomly scrambled sub-word patterns. More precisely, for each network, the four pseudoword patterns were generated by combining sub-parts of the four word patterns at random (recall that these are 25-by- 25 squares of binary configurations containing n=17 cells set to “1” and 608 cells set to “0”), using the following procedure: 62 • For all i∈{1,..4}, divide word pattern wi into 25 sub-patterns of size 5x5; • For all j∈{1,..4}, initialise the pseudoword pattern pwj as empty; • let j=1: (A) copy six randomly chosen sub-patterns from each of the four wi into pwj , so that the original position of each sub-pattern in wi is preserved in pwj; (B) if the number of active cells in pwj is > (<) n, set a randomly chosen cell in pwj to 0 (to 1) until pattern pwj contains exactly n set to “1”; • Repeat steps (A—B) for j=2,3,4. In sum, each pseudoword pattern pwj was made up of 24 quadrants (sub-patterns) of size 5x5 that had been “cut and pasted” from the word patterns, plus one empty 5x5 square. Each sub-pattern in pwj was located just where it was in the original word, and each word pattern wi contributed the same number of sub-patterns (six) to each pwj. Thus, this algorithm produces pseudoword patterns that preserve part of the original features of the words (the total number of active cells in each pattern is preserved, and subsets from each of the wi are reproduced in each pwj) while, at the same time, mixing the four words in a random and balanced way. 4.1.2 Experiment Set 3 – Results Presentation of patterns not previously stored in the network (pseudowords) produced, on average, a smaller initial response in the network than the one obtained with learnt patterns (words), and led to only partial activation of the cell assemblies. Figure 4.1 below summarises the results from eight different networks, obtained from Experiment Set 2 and each trained using a different set of four word patterns. In the graph, the average total network response to a word (learnt pattern) or pseudoword presentation is plotted against time (in simulation steps). The total network activity was calculated as the sum, across the six areas, of the output values (or “firing rates”) of all the E-cells. Bars indicate standard errors of the mean (SEM). 63 0 10 20 30 1 11 21 31 41 51 simulation time step to ta l s um o f c el ls ' o Figure 4.1. Simulated cortical response to spoken words and pseudowords. The graph plots the average total network activity (sum of all cells’ firing rates in the entire network, averaged across 32 different trials using eight different networks) following presentation of a word or pseudoword pattern to Area 1 (auditory cortex). Note the delayed and reduced peak of the pseudoword curve compared with the word response. CA responses to Area-1 stimulation with pseudowords 0 10 20 30 1 11 21 31 41 51 simulation time-step su m o f c el ls ' o ut pu t (w ith in C A ) Maximally active CA 2nd best CA 3rd best CA 4th best CA Figure 4.2. CA-specific response to pseudowords. The graph shows the average (SEM) response of the four different CAs following auditory (Area 1) stimulation with a pseudoword pattern, averaged across eight networks (cf. Fig. 3.12). Network response to words and pseudowords 40 50 ut pu t Words Pseudo-words 64 Figure 4.2 plots the average response of each of the four CAs (identified using threshold γ=0.45) to stimulation with a pseudoword pattern. Unlike the response produced by a word (Fig. 3.13), in which essentially only one CA was activated, here, all four CAs initially responded, although to different degrees. After about 10 steps, the maximally stimulated CA “prevails” over the other three and becomes strongly active, while activity in the other CAs quickly falls to zero (although some activation continues to reverberate in their circuits). Note that the peak of the activity of the CA responding most strongly is still (on average) significantly smaller than the peak of the CA’s activation following stimulation with a word (cf. Fig. 3.13). Network response to Area-1 stimulation with words 0 5 10 15 1 11 21 31 41 51 simulation time-step to ta l s um o f c el ls ' o ut pu t Area 1 (A1) Area 2 (AB) Area 3 (PB) Area 4 (PF) Area 5 (PM) Area 6 (M1) Figure 4.3. Area-specific network responses to stimulation of the auditory cortex (Area 1) with learnt word patterns, averaged across 32 pattern-network pairs. The sum of the six curves equals to the word response plotted in Fig. 4.1 (red curve). Figures 4.3 and 4.4 break down the total network responses to words and pseudowords plotted in Fig. 4.1 into area-specific contributions (as a function of time). The difference between the two responses appears to be caused mostly by reduced activation amplitudes in Areas 2, 3, 4 and 5 following pseudoword stimulation. The peaks of these curves also appear to be delayed in time. Apart from 65 this delay, the amount of activation produced in the motor area (Area 6) is relatively unaffected by the lexical status of the stimulus. Network response to Area-1 stimulation with pseudowords 0 5 10 15 1 11 21 31 41 51 simulation time-step to ta l s um o f c el ls ' o ut pu ts Area 1 Area 2 Area 3 Area 4 Area 5 Area 6 Figure 4.4. Area-specific network responses to stimulation of the auditory cortex (Area 1) with pseudoword patterns, averaged across 32 pattern-network pairs. The sum of the six curves equals the pseudoword response plotted (in blue) in Fig. 4.1. 4.1.3 Experiment Set 3 – Interim Discussion The implemented neural-network model of the language cortex can straightforwardly replicate a feature of language processing in the human brain – namely, that within certain experimental conditions, spoken words stimuli elicit stronger brain responses in the left perisylvian language cortex than meaningless pseudowords never heard before (see Sec. 1.1, Fig. 1.2). The critical feature, according to the present simulations, is that the distributed representations that had emerged for the learned patterns amplify cortical activation due to reverberant (feedforward and feedbackward) connections within the word cell assembly. Notice that the conjecture that information about pseudoword stimuli propagates through synapses that have a mean strength significantly lower, on average, than those mediating word information is not entirely correct. In fact, a pseudoword pattern is built by combining smaller sub-word patterns extracted from the four words; thus, when four words or the corresponding four pseudoword patterns are presented, overall 66 the same neuronal populations are being stimulated. However, the wave of activity generated by each pseudoword produces, overall, much less (and delayed) activation, particularly in the central areas. In what follows, we explain the neuronal mechanisms underlying these different responses. As pointed out in the previous Chapter, words CAs behave as discrete, non-linear, “all-or-nothing” functional units which, if stimulated above threshold, become fully active (see Sec. 3.2.3). What happens when a pseudoword is presented as input to the auditory area of the network? Recall that a pseudoword pattern consists of a combination of different subparts of the four word patterns. Hence, upon presentation of a pseudoword, the cells belonging to the four different CAs that happen to be present in the pseudoword are activated in Area 1. Thus, all four word CAs (see Fig. 4.2) are simultaneously (but partially) stimulated, and activity starts to reverberate in their circuits. However, due to the presence of non-specific (and local) inhibition mechanisms, the different CAs simultaneously activated start to inhibit each other, in a “winner-takes-all” manner (refer to Sec. 1.4 and 1.5). This transient period of competition surfaces in the graphs plotted in Fig. 4.1; in particular, the pseudoword curve (in blue) is “s” shaped, i.e., it exhibits a rapid change of convexity that starts to appear at around 5 simulation time-steps after stimulus onset. This effect is due to the fact that, during that period, several co-activated CAs are competing, “pushing” each other down and causing a temporary reduction in (or a reduced rate of increase of) the total network output. Subsequently to this transient competition, the most strongly active CA emerges as a “winner” and continues, for some time, to increase and feed on its internal activity (see Fig. 4.2). However, this process stops (on average) well before the CA has reached full activation (compare with Fig. 3.13). This is due to the initial period of competition, during which the CAs inhibit each other, with the result that the activity flow is delayed and global inhibition acts as a “break”. After peaking, activation plateaus and reverberates within the CA circuits for a few time steps, until the dispersion of activation eventually leads to the CA switching “off” (at ~ 40 steps after stimulus offset). The initial competition between the four CAs also explains the delay in the activation peak of the response to pseudowords: the words curve peaks earlier as a word activates just one CA, and the competing CAs simultaneously stimulated by a pseudoword (which would act as sources of inhibition and alter and delay the normal 67 course of CA activation) remain silent in the case of words (see Fig. 3.13). The above discussion highlights the crucial role that the area-specific (global) inhibition, implementing here the “item level” type of competition between lexical representations (see Sec. 1.5), plays in the network activation dynamics. The question of how, exactly, the strength of the non-specific inhibition (the model correlate of the amount of attentional resources) affects the observed simulation results was addressed in the last set of experiments, Experiment Set 4. 4.2 Experiment Set 4 – Modelling effects of Lexicality and Attention Having implemented a model of the left perisylvian cortex, trained it with a set of words, and shown that it could replicate the pattern of responses to words and pseudowords observed in MMN experiments, it was finally possible to simulate and predict the effects of attention on lexical processes, addressing one of the main research questions that motivated this work (see Sec. 1.1). In particular, this set of experiments aimed at using the model to replicate and explain the different patterns of neurophysiological responses observed in N400 and MMN experiments. The hypothesis was that the reverse patterns of neurophysiological data are the result of the different attentional conditions under which these responses are elicited. Consistent with the biased competition model of attention (Duncan, 2006), attention to speech was simulated by reducing the strength of the global (non-specific) feedback inhibition circuits (which corresponds to greater availability of processing resources)15, and attention away from speech by increasing it (and, thus, reducing processing capacity). 4.2.1 Experiment Set 4 – Methods As in the previous Experiments, word and pseudoword perception was simulated in the model by stimulating the auditory cortex (Area 1) of eight trained networks with well-learnt, familiar word and unknown pseudoword patterns (see Section 4.1.1). 15 As discussed in Sec. 1.5, the more available attentional resources, the more competing representations can be coactive, the less “object level” competition between lexical representations; this situation is induced in the model by reducing the strength of the area- specific inhibition circuits, or feedback inhibition (FI). 68 The network was tested under different conditions simulating different attentional loads, induced by systematically varying a single parameter in the model, namely, the strength of the FI loops (α5 in Appendix A, see also Sec. 2.2.3). Thus, we investigated the effects of attention modulation on the timing and magnitude of the responses to familiar vs. unknown speech stimuli by presenting, for four time-steps, words and pseudowords patterns to Area 1 at increasing levels of FI. We repeated the stimulation at four different levels of FI (0.90, 1.05, 1.20 and 1.25) and measured the total network activity during the following 50 time steps. 4.2.2 Experiment Set 4 – Results Figure 4.5 shows the results produced by the network when it was used to simulate brain responses to word and pseudoword stimuli under different amounts of attentional resources. The graphs plot the total network output as a function of (simulation) time. The main point to note is the difference between the top and bottom graphs. In the top graph, weak FI (high attention) produces larger responses to pseudowords than to words, with a “late” peak of the difference between the curves (at 20 simulation time-steps). In the bottom graph, strong FI (low attention) produces the opposite effect (larger responses to words than to pseudowords), with an “early” peaking difference (around 9-10 time-steps). Hence, the modulation of FI strength (or attention) in the network produces a pattern of results that reflects the experimental data discussed in the introduction (Sec. 1.1); in particular, the top graph reflects the characteristics (relative magnitude and latency) of a classical N400 response (Fig. 1.1), while the bottom graph more closely resembles the features of the MMN response (Fig. 1.2). A second important point to note is that the “swap” in the sign (and change in latency) of the word/pseudoword difference caused by the increase in FI is the result of a strong reduction in the amplitude (and change in shape) of the pseudoword (dotted) curves, and not of an increase in the amplitude of the word response (solid curves). Indeed, if anything, the maximum average amplitude of the word responses appears to be attenuated as well, going from about 45 for FI=0.90 to about 35 for FI=1.25. 69 0 20 40 60 453015 0 LOW ATTENTION ( = Strong feedback) To ta l n et w or k ac tiv at io n To ta l n et w or k ac tiv at io n 0 20 40 60 FI strength =1.05 0 20 40 60 To ta ln et w or k ac tiv at io n 0 20 40 60 To ta ln et w or k ac tiv at io n HIGH ATTENTION ( = Weak feedback) FI strength =0.90 FI strength =1.20 FI strength =1.25 Simulation time-step Figure 4.5. See caption on next page. 70 Figure 4.5 (previous page). Network simulations of brain response to word (solid lines) and pseudoword (dotted lines) stimuli under different amounts of attentional resources (FI strength). The total network activation (in abscissa) is computed as the sum of the output values of all the E-cells of the network at a specific time point. Responses are averaged across eight different networks (vertical bars are SEM). The “auditory” stimulation pattern was present only until t=4. Increasing levels of FI strength simulated decreasing amounts of attentional resources available. 4.3 Experiment Sets 3 & 4 – Discussion Experiment Set 3 was replicated in Experiment Set 4 (compare Fig. 4.1 with the graph obtained for FI = 1.20 in Fig. 4.5; Experiment Set 3 used FI = 1.23). These results demonstrate the ability of the network to replicate the lexicality effects on the neurophysiological responses to spoken items documented in a number of MMN studies in which subjects’ attention was directed away from speech (Korpilahti, Krause, Holopainen, & Lang, 2001; Pettigrew et al., 2004; Pulvermüller et al., 2001; Pulvermüller & Shtyrov, 2006; Shtyrov & Pulvermüller, 2002), and allowed us to identify and explain, at the level of cortical circuits, the brain mechanisms which may be responsible for the observed effects (see Sec. 4.1.3). One point that needs clarifying for both experiment sets concerns the fact that the differences observed in the network simulations are not obtained using an oddball stimulation paradigm, which is normally required to elicit the MMN response. How can one claim to be simulating the MMN response if the network is not being stimulated using the oddball paradigm? We take the view that MMN indexes not only automatic processes of change detection but, in addition, reflects the automatic activation of memory traces (Näätänen, 2001; Pulvermüller et al., 2001; Pulvermüller & Shtyrov, 2006). According to this view, the MMN paradigm represents just one way to visualize the physiological side of memory traces. The simulations are not, indeed, aimed at directly replicating the MMN response per se, but the neural processes that underlie and govern the activation of memory traces in the cortex, and which are reflected in the MMN. The simulations predict that these mechanisms are such that words>pseudowords difference should become significant early in the response, and that this should always happen, if subjects are distracted. 71 Experiment Set 4 shows that variation of the amount of area-specific feedback inhibition (FI) of the network modulates the relative magnitude and latency of the simulated brain responses to words and pseudowords. More precisely, weak FI (corresponding to high attention and excitability) produced – on average – late activation differences, with a stronger response to pseudowords than to words. In contrast, strong FI, simulating suppression and a lack of attentional resources, lead to early activation differences, with a stronger response to words than to pseudowords. Thus, the network behaviour replicates the divergent neurophysiological data presented in Section 1.1 (see Fig. 4.6 below), as the N400 response presents a late (around 400ms) difference, with relatively larger responses to pseudowords, while the MMN exhibits an early (100-250ms) difference, with larger responses to words. We shall now explain the underlying mechanisms that make the neural network respond in this particular way. Figure 4.6 Real and simulated N400 and MMN brain responses. (A): Typical N400 response to spoken words and pseudowords (from Fig. 1.1). Note the larger N400 amplitude to pseudowords. (B): Magnetic Mismatch Negativity (MMN) response to words and pseudowords (adapted from (Pulvermüller et al., 2001, their Fig. 4)). Note the larger MMN amplitude to words. (C-D): Simulated brain responses to word and pseudoword stimuli under different amounts of attention (from Fig. 4.5). Left: FI=0.90; right: FI=1.20. 72 4.3.1 Explaining the Influences of Lexicality and Attention The network behaviour during the first 8-10 time steps is analogous to that observed in Experiment Set 3 (see Fig. 4.1). As described, when a pseudoword is presented to Area 1, the four CAs are simultaneously (but partially) stimulated, and, as they gather activation, they begin to inhibit each other. What happens afterwards depends entirely on the strength of the FI loop. In case of weak FI, there is weak competition between the CAs; thus, the activity in the maximally active CA is not significantly affected by the activity of the other CAs (indeed, the “wobble” in the pseudoword curve is barely noticeable when FI=0.90). Hence, as exemplified by Fig. 4.2, after a brief period of competition, the “winning” CA will resume its progress16 towards full activation, reached at around 20 simulation time-steps. Unlike in Experiment Set 3, however, here the CA becomes fully active (“on”), as the weak global inhibition is not sufficient (on average) to prevent it from reaching activation threshold. Nevertheless, albeit brief, the transient period of competition still affects the spreading of activation within the CA, making it peak later than it would have if it had been stimulated in isolation. Simultaneously, activity in the other CAs is suppressed; due to the presence of strong self-excitatory loops within the CAs circuits and the weak FI, however, this activity does not immediately disappear, but continues to reverberate and is still present in one (or more) non- winning CAs when the winner CA reaches full activation. At that point, the total network output is the result of the activity of the maximally active CA (at its peak) plus the residual activation in the other CAs. This makes the peak of the total network response to a pseudoword larger than that to a word: all the rest being equal, the total activation due to one fully active CA is (on average) smaller than the total activation due to one fully active CA plus one or more partially active CAs. The possible psycholinguistic correlate of this computational process may be the activation of several neighbours of a stimulus pseudoword. Let us now consider the case of strong FI (this was the case in Experiment Set 3). If the level of FI is sufficiently high, the co-activated CAs inhibit each other so strongly that they will be prevented from entering the unstable positive-feedback state that 16 This now takes place in complete absence of the input stimulus, which lasts only 4 steps. 73 leads to their full activation. As a result, the total network response to a pseudoword, consisting of the sum of the activities produced by only partially active CAs, remains (on average) below the total response to a word (as exemplified by Fig. 4.2). While attention modulation induced a large variation in the amplitude of the pseudoword curves, word responses do not appear to be significantly affected by attention. At the basis of this phenomenon are the strong and reciprocal connections that form the word CAs. As mentioned in Sec. 3.2.3, such positive-feedback circuits produce a non-linear behaviour in the CAs, such that, when activation threshold is reached, the CA ignition is largely independent of the level of attention/inhibition. As Hebb wrote, when igniting the cell assembly is “acting briefly as a closed functional system” (Hebb, 1949, p. xix). This functional discreteness explains the relative stability of the responses to words under variable inhibition. In contrast, as pseudowords activate several CAs but only partially, the reduced (below threshold) activity is strongly dependent on inhibition level, extinguishing under low attention and resuming full activation if FI is low. The possible psycholinguistic correlates of these processes may be, in the case of a pseudoword stimulus, the lack of recognition of any lexical item under distraction, and, in the case of words, the ability to automatically recognize and respond to familiar items even when heavily distracted. An example of this phenomenon, known as attentional capture (or “cocktail party”) effect, is our ability to automatically detect the sound of our own name even under conditions of inattention (Moray, 1959; Wood & Cowan, 1995). 4.3.2 Fit of model predictions and neurophysiological data The model simulates the cortical sources that generate electric potentials and magnetic fields at the surface of the head. Therefore, strictly speaking, the predictions and explanations apply at the level of brain activation, not at that of event-related potentials and fields (ERP/Fs). However, the differential activation to words and pseudowords revealed by ERP/Fs is also manifest at the level of sources localised in the perisylvian region (e.g., Hauk, Davis, Ford, Pulvermüller & Marslen-Wilson (2006); Pulvermüller et al. (2001)). Thus, larger (smaller) words/pseudowords responses or ERP/Fs are assumed to be generated by correspondingly larger (smaller) 74 underlying sources. This assumption is supported by experimental evidence reported in Chapter 5. Furthermore, other works have adopted the same approach and successfully modelled EEG/MEG signals as the average depolarisation of pyramidal cells (e.g., David & Friston (2003)). Interestingly, the time course of the simulated peak differences between word and pseudoword responses roughly reflects the one exhibited by experimental data. In fact, in the model, early differences (see Figure 4.5, bottom graph) peak at around 7-8 time-steps after stimulus onset (which is at step 2 in all cases), while the late differences peak at 18 time-steps after stimulus onset (Fig. 4.5, top graph). If we assume that the MMN response peaks at about 120ms after stimulus onset, one ∆t in the simulation corresponds to 120/7≈17ms, and the simulations predict a late peak (in presence of attention) at around 18*17ms = 306ms. If, on the other hand, we work from the assumption that the N400 response peaks at 400ms, then one ∆t corresponds to 400/18ms ≈ 22ms, and the simulations predict an early peak (when attention is directed away) at around 7*22ms = 154ms. Although these calculations should be taken with caution as they are the result of simple extrapolations, they do provide some evidence for the ability of the model to make predictions of the correct order of magnitude on the spatio-temporal patterns of cortical activation. In view of the above, one simulation time-step ∆t can be considered to correspond approximately to 20ms. 4.3 Summary and main contributions Chapter 2 described the implementation of a neuroanatomically grounded neural- network model of the left-perisylvian language cortex, and its use to simulate brain processes of early language learning. Chapter 3 described the formation of sets of strongly interconnected circuits across cortical areas in the network, which we referred to as cell assemblies. Building on these results, this Chapter simulated activation of the language cortex when meaningful familiar words (learnt patterns) and senseless unknown pseudowords are presented as input under different amounts of attention. The model simulations replicate both MMN and N400 brain responses to words and pseudowords, typically observed under different experimental conditions, suggesting that these opposite results can be explained by the modulatory effects of attention on the cortical responses to pseudoword (and not to word) stimuli. The main original contributions of the work described in this chapter are the following: (1) the 75 model is the first one to reconcile and mechanistically explain, at the cortical-circuit level and by means of a single set of neurobiological principles, existing experimental results previously not well-understood; (2) the model points to the level of area- specific feedback inhibition as a basis for the brain mechanisms of attention, and makes strong predictions on how and why this cognitive process modulates the magnitude of event-related brain responses to speech stimuli. In particular, according to the simulation results, attention modulation should be able to bring out both types of responses (N400 and MMN, i.e., words up vs. pseudowords up) in the same experiment. In other words, attention modulation should make the MMN bigger to words when subjects’ attention is directed away from speech, but produce the reversed effect (MMN larger to pseudowords) when subjects are paying attention to the – same – speech stimuli. Crucially, the model also predicts that the amount of attentional resources available should significantly modulate the brain responses to pseudowords, but not to words, which should be relatively unaffected by changes in attention. The experimental testing of these critical predictions is the object of the next Chapter. 76 Chapter 5 – Neurophysiology of Attention and Language interactions: an MEG study This Chapter describes the use of magneto-encephalography (MEG) techniques to test the novel predictions of the model of the language cortex that were generated by the simulations described in Chapter 3 and Chapter 4. 5.1 Introduction The network simulations presented in Chapter 4 explain the opposite neurophysiological activation patterns to words and pseudowords seen in N400 and MMN experiments. The explanation rests on the fact that words activate discrete cell assemblies whose strong internal connections guarantee that activation is largely independent of external inhibition level (Hebb, 1949; Pulvermüller, 1999). Pseudoword stimuli, in contrast, activate several competing representations and global inhibition determines the degree to which their activations may co-exist: with attention to stimuli, the model response is therefore larger to pseudowords than to words, but under limited attentional resources (stronger inhibition) pseudoword responses are reduced below the level of word responses (see Fig. 4.6). Although the model provides a tentative explanation of N400 and MMN results, it attributes the difference to a single factor (attention), and it is this statement that needs testing in new critical neurophysiological experiments. Comparing typical tasks used to record the N400 and the passive oddball paradigm, where the lexical MMN enhancement is seen, there are differences in memory requirements, lexico-semantic processing, context processing, variability and repetition of stimuli and, of course, attentional demands which make it impossible to attribute with certainty neurophysiological differences to a single psychological variable. Here, we used MEG to test the predictions of the model, namely, that keeping all other features constant, focussed attention to speech is the critical variable leading to the reversal of the neurophysiological lexicality effect. A second prediction was that such inversion 77 is mainly produced by the (strong) modulation of the pseudoword response, whereas the word response stays relatively stable (see Fig. 4.5). In order to administer this critical experiment, we used variants of the oddball task. To precisely control for stimuli properties, we applied an orthogonal design where the same sounds were played in word and pseudoword contexts. In addition, attention was also varied orthogonally, so that, for each lexical context, the same sounds were processed while attention was either directed (1) to speech, or (2) away from speech. 5.2 Materials and Methods 5.2.1 Subjects Twenty four healthy right-handed (Oldfield, 1971) monolingual native speakers of English (9 women) aged 20-41 years participated in all parts of the experiment. They had no record of neurological diseases, vision or hearing problems, and reported no history of drug abuse. All subjects gave their written informed consent to participate in the experiment and were paid for their participation. The experiments were performed in accordance with the Helsinki Declaration. Ethics approval had been issued by the Cambridge Psychology Research Ethics Committee (CPREC). 5.2.2 Design The processing of spoken words and pseudowords was studied in two tasks carried out in separate sessions, referred to as “Attend” and “Ignore” sessions (or conditions). Attention was manipulated in the two sessions by instructing subjects to either focus completely on the auditory stimuli (Attend condition) or on a silent video (Ignore condition). The auditory stimuli were identical across the two sessions; each session consisted of two blocks; block and session order was counterbalanced across subjects. As clarified by Table 5.1, we adopted an orthogonal design: across the two blocks, lexicality and acoustic-phonetic features of the auditory stimuli were varied independently of each other (see details below). 78 5.2.3 Instructions Subjects were seated in front of a screen on which the silent film was being projected; during the recording, acoustic stimuli were delivered binaurally to the subjects. In the Ignore session, subjects were asked to ignore the auditory stimuli and concentrate on the video; they were made aware that at the end of the recording they would be given a test on the contents of the movie to assess whether they had paid attention to the video. In the Attend session, subjects had to focus their attention on the acoustic stimuli and react to some of them by pressing a button with their left index finger; they were asked to ignore the movie but not close their eyes. In order to become familiar with this task, subjects were given a 15-minute training prior to the beginning of the recording. 5.2.4 Tests Perceptual and cognitive properties of the stimuli which could, in principle, affect neurophysiological activity and confound the results were assessed through a questionnaire posed at the end of the second session. All subjects rated (1) whether they could easily understand the recording, (2) whether they would consider the stimuli to be frequently used in everyday language, (3) whether the stimuli made sense, (4) whether they reminded subjects of an action they could perform themselves, (5) whether the stimuli were imageable, and (6) whether they reminded them of bodily sensations. At the end of each session, subjects were asked to rate (on a scale from 1 to 7) the amount of attention that they had paid to the sounds and silent video during the session, and had to answer 10 multiple-choice questions on the contents of the film. 5.2.5 Stimuli preparation and delivery Digital recordings (sampling rate 44.1 kHz) of a large sample of the items [baj], [paj], [hajp], [hajt], [hajk] and *[hajg] spoken in random order by a female native English speaker were acquired in a soundproof room. From this set we chose a pair of CV syllables [baj] and [paj] and extracted the syllable-final phonemes [p], [t], [k] and [g]. The full set of stimuli used in the experiment (including the two critical words [bajt] (bite) and [pajp] (pipe) and pseudowords *[bajp] and *[pajt]) were obtained by cross- splicing the same recordings of the coda consonants [p], [t], [k], [g] onto both CV 79 syllables [baj], [paj] (see Table 5.1 and Fig. 5.1). This avoided differential coarticulation cues and minimized acoustic differences between the stimuli. The two chosen CV syllables had the same F0 frequency (272Hz), and were carefully adjusted to have equal duration (330ms) and average sound energy (root- mean-square (RMS) power; −9.4dB). The chosen samples of the critical phonemes [p], [t] had the same length (75ms) and similar envelopes; their amplitudes were also normalized to match for averaged RMS power (−36.6dB). The silent closure time between CV end and onset of the plosion of the final stop consonant was adjusted to a value typical for English unvoiced (80ms) and voiced (30ms) stops. The [k] and [g] plosions were also presented after an exceptionally long closure time (230ms and 180ms, respectively), a phenomenon occurring naturally in the geminate stops of some languages (e.g., Finnish, Italian). The pseudowords containing such “artificial” geminates were used as target stimuli in the Attend condition; this was intended to make the detection of targets more challenging for the monolingual native English speakers. Block A Block B Context For the analysis and generation of the acoustic stimuli, we used the CoolEdit 2000 program (Syntrillium Software Corp., AZ). The stimuli were delivered at a [bajp] pseudoword 0 622 [pajp] word 22 605 [bajt] word 18 2601 [pajt] pseudoword 0 2558 [t] Coda CV C [baj] [paj] [p] Table 5.1. Orthogonal variation of acoustic-phonetic features and lexicality across blocks for the four critical items. Numbers indicate word (left) and trigram (right) frequency (per million) for that item (CELEX Lexical Database (Baayen, Piepenbrock, & van Rijn, 1993)). 80 comfortable hearing level through plastic tubing attached to foam earplugs using the MEG Etymotic system, based on ER·3A insert earphones (Etymotic Research, Inc., IL). The delivery was controlled by a personal computer running E-prime software (Psychology Software Tools, Inc., Pittsburgh, PA). Bl oc k B DEV1 DEV2 Bl oc k A STD 80ms [baj] [bajt] *[bajp] 330ms 485ms [paj] 80ms 485ms [pajp] 1.0s STD DEV3 STD DEV1 STD DEV2 STD DEV5 STD DEV2 …… time *[pajt] Figure 5.1. Stimulation paradigm and stimuli of interest. Top: schematic illustration of the oddball design used for the presentation of the auditory stimuli (STD = standard, DEV = deviant stimuli; horizontal axis represents time). Bottom: waveforms of the standard and deviant stimuli of interest, with respective durations and phonetic representation. 5.2.6 Procedures The auditory stimuli were delivered using an oddball design. The stimulus onset asynchrony between two consecutive items was 1000ms. Conforming to Näätänen and colleagues’ optimal paradigm (Näätänen, Pakarinen, Rinne, & Takegata, 2004), the frequently-occurring standard stimulus (STD) constituted 55% of a block sequence; four different deviant stimuli (DEV1-4), each with 10% frequency, were randomly presented in alternation with the standard (see Figure 5.2, top). A fifth deviant stimulus (DEV5) filled the remaining 5% of the sequence: this was one of the two possible targets that the subjects had been instructed to respond to (each 2.5% frequency). Each block sequence contained 1920 stimuli in total, providing 32 minutes of auditory stimulation. 81 During each session recorded in the Attend condition, subjects were provided online feedback on their performance (hit rate and number of false alarms) at four different times (in the middle and at the end of each of the two blocks) to ensure their attention to the stimuli, at which point auditory and visual stimulation was temporarily suspended. In the Ignore condition sessions, auditory and visual stimulation was also suspended briefly at the same time points (during which the condition of the subjects was assessed). 5.2.7 MEG Recording Throughout the experiment, the brain’s magnetic activity was continuously recorded using a 306-channel Vectorview MEG system (Elekta Neuromag, Helsinki, FI) with passband 0.10–330 Hz and 1KHz sampling rate. To enable the removal of artifacts introduced by head movements, the position of the subject’s head with respect to the recording device was tracked throughout the session. In order to do so, magnetic coils were attached to the head and their position (with respect to a system of reference determined by three standard points: nasion, left and right pre-auricular) was digitized using the Polhemus Isotrak digital tracker system (Polhemus, Colchester, VT). To allow the off-line reconstruction of the head model, an additional set of points randomly distributed over the scalp was also digitized. During the recording, the position of the magnetic coils was continuously tracked (continuous HPI, 5Hz sampling rate), providing information on the exact position of the head in the dewar. 5.2.8 MEG Data Processing For each subject, MEG channel, block and condition, we applied the following preprocessing steps: (a) The continuous raw data from the 306 channels where pre-processed off-line using MaxFilterTM software (Elekta Neuromag, Helsinki), which minimises possible effects of magnetic sources outside the head as well as sensor artifacts using a Signal Space Separation method (Taulu & Kajola, 2005; Taulu, Kajola, & Simola, 2004). MaxFilter was applied with spatio-temporal filtering and head-movement compensation, which corrected for within-block motion artifacts. 82 (b) Using the MNE Suite (Martinos Center for Biomedical Imaging, Charlestown, MA), stimulus-triggered event-related fields (ERFs) starting at 100ms before stimulus onset and ending 500ms after offset were computed from the MaxFiltered data for each stimulus of interest ([baj], [paj], [bajt], *[bajp], *[pajt], [pajp]). Epochs containing gradiometer, magnetometer or EOG peak-to-peak amplitudes larger than 3000fT/cm, 6500fT or 150µV, respectively, were rejected. Only ERFs with a minimum of 100 accepted trials were used (this led to the exclusion of four subjects). The responses to the (deviant) stimuli ending in [k] or [g] were excluded from the analysis because of their acoustic similarity to the target stimuli. (c) In each block, the magnetic MMNs were obtained by subtracting the averaged response to the CV sound presented as standard stimulus from that to the CVC deviant stimuli: in block A, the ERF to the standard [baj] was subtracted from the ERFs to the deviants [bajt] and *[bajp]; similarly, in block B, [paj] was subtracted from *[pajt] and [pajp]. (d) The resulting magnetic MMN and standard curves were detrended, filtered on 2– 20 Hz and baseline-corrected. For the MMN responses, the baseline used was the 80ms silent closure period preceding the onset of the plosion of the syllable-final (coda) stop consonant (point at which standard and deviant stimuli differed for the first time – see Fig. 5.1); this time interval (330 to 410 ms after standard stimulus onset) will below be referred to as “pre-coda baseline”. For the responses to the standard CV stimuli, the 100 ms preceding stimulus onset were used as baseline (“pre-stimulus baseline”). (e) The amplitude of the local magnetic gradient response was calculated for each local pair of orthogonal gradiometers as the square-root of the summed squares (SRS) of their amplitudes. The resulting SRS data were used in the statistical analysis and for producing grand-average data. Matlab 6.5 programming environment (Matlab 6.5 – MathWorks, Boston, MA) was used for preprocessing steps (c)-(e). Finally, in order to estimate the cortical sources underlying the magnetic MMN, we applied a minimum-norm current estimation (MCE) technique (Hämäläinen, Hari, Ilmoniemi, Knuutila, & Lounasmaa, 1993; Ilmoniemi, 1993), L1 MCE (Uutela, Hämäläinen, & Somersalo, 1999), which minimizes the sum of the rectified current amplitudes over the whole brain, and previously has been shown to produce a realistic 83 and robust set of generators in experiments on spoken language processing (Pulvermüller, Shtyrov, & Ilmoniemi, 2003, , 2005). Using the MCE Matlab toolbox (Elekta Neuromag, Helsinki), MCEs were calculated for the across-subject averaged MMN responses for each Stimulus type (word or pseudoword), Condition and time point (in 20-millisecond time-steps), and projected on a triangularized gray matter surface of an averaged brain (Uutela, Hämäläinen, & Somersalo, 1999). 5.2.9 Statistical Analysis Statistical analyses were performed on local magnetic gradient responses. Using the maximal local SRS of the standard responses in the Ignore condition, we computed signal-to-noise ratios (SNR) as the ratio between the peak in the 0–150ms interval post stimulus onset and the peak in the pre-stimulus baseline. Only datasets with SNR larger than 5 were included in further analyses. Loci with the largest MMN gradient vector amplitudes were entered in the analyses. These were located above the left hemisphere’s temporal and fronto-central areas (see Sec. 5.3). For each locus, the averages of the local SRS of the magnetic MMN were computed for the 60-ms window around the peak of the maximal local SRS response. To ascertain the effects of attention on the brain responses to lexical items, we also computed the average local SRS of the ERFs to the standard stimuli in the two conditions during six different time windows: pre-stimulus baseline (-100–0ms), pre- coda baseline (330–410ms), the 80-ms window 500–580ms centred around the MMN main peak, and three additional windows centred at the times at which the standard responses displayed three prominent peaks (see Sec. 5.3, Results). Window widths were adjusted to the width of the half maximum of the respective peak (30, 40 and 60 ms). The time-averaged SRS values obtained from each of the critical recording locations, subjects, stimulus types and conditions were subjected to repeated- measures analyses of variance (ANOVAs). ANOVA tests with the factors Attention (Attend vs. Ignore), Lexicality (word vs. pseudoword), Stimulus (coda [p] vs. [t]) and Region-of-Interest (ROI, further split into “Anterior-Posterior” and “Lateral-Central” factors, with two and up to four levels, respectively) were computed on the data extracted from the MMN curves; additional ANOVAs with the factors Attention, Stimulus ([baj] vs. [paj]) and ROI were calculated on the local SRS extracted from the 84 responses to the standard stimuli, one for each time window of interest. Significant interactions were investigated further using additional t tests for planned comparisons. 5.3 Results 5.3.1 Behavioral ANOVA tests on the attention ratings data (Fig. 5.2) revealed a significant 2-way interaction of the factors Condition (Attend vs. Ignore) and Modality-Attended (Sound vs. Video) (F(1,15)=134.2, p<0.00001). There was also a main effect of Modality (F(1,15)=10.8, p<0.01). During the Attend condition, average hit rate was 70.2% (SE=4.3%). After the Ignore condition, on average subjects answered correctly 80.6% (SE=3.0%) of the questions about the video; percent correct answers dropped to 47.5% (SE=7.1%) after the Attend condition, confirming different levels (t(15)=5.15, p< 0.0001) of attention to the input stimuli, as expected. Figure 5.2. Average (SEM) attention ratings (1=“Absent”, 7=“Complete”) for 16 subjects. Note the significant difference in the amount of attention to Sound between the two conditions. Figure 5.3 plots the ratings of the critical stimuli that subjects provided at the end of the experiment. While the two deviant pseudowords *[bajp], *[pajt] never differed significantly between each other or from zero, the word [bajt] was judged to be more action- (t(15)=4.45, p<0.0005) and body-related (t(15)=7.69, p<0.000005) than [pajp]. Within each lexical pair, no significant differences emerged for frequency, meaningfulness, comprehensibility and imageability ratings. Although frequency might appear marginally higher for [bait] than for [pajp] (t(15)=1.706, p = 0.109, n.s.), frequencies of these words according to the CELEX psycholinguistic database 85 (Baayen, Piepenbrock, & van Rijn, 1993) show a trend in the opposite direction (18 bite- and 22 pipe-occurrences per million), not confirming the ratings. With the exception of action and bodily semantic relatedness ratings, the psycholinguistic features of the stimulus words were thus well matched. Figure 5.3. Average (SEM) ratings of critical stimuli across 16 subjects. Subjects indicated Frequency of use, Action-relatedness, Meaningfulness, Comprehensibility, Imageability, and relatedness to Body sensations. 5.3.2 MEG results Figure 5.4 plots the local magnetic gradient response as SRS of the magnetic MMN to pseudowords (blue) and words (red) in the “attend” condition for all loci (averaged across 16 subjects)17, highlighting the left perisylvian locations exhibiting largest amplitudes that were used in the statistical analysis. Figure 5.5 plots the local magnetic gradient response as SRS for standard stimuli and MMN data recorded from one of these loci. During the first 400ms responses to the two standards differed (see top graph); differences tended to disappear at times greater than 400ms. Due to the different acoustic-phonetic features of the stimuli, the MMNs to the coda [p] and [t] (see Fig. 5.5, Inset) peaked, at the locus with largest amplitudes, at 137 and 115 ms. post coda onset (on average), respectively. When grouped by condition (Fig. 5.5, bottom graph), the standard curves suggest a main effect of attention, which was investigated in the statistical analysis (see below). 86 Figure 5.4. Local magnetic gradient vector amplitude (SRS) of magnetic MMN to pseudowords (blue) and words (red) in the “attend” condition (averaged across 16 subjects; top: frontal; bottom: occipital). Each graph shows the amplitude of the local SRS in time (see text). The vertical axis indicates the coda onset time (410ms post stimulus-onset). Note the left- > right-hemisphere differences, clearest at left perisylvian loci. A three-way ANOVA with the factors Attention, Stimulus and ROI carried out on the SRS of the responses to the standard stimuli revealed a main effect of Attention already in the pre-stimulus baseline (-100–0ms), with the responses in the Attend condition larger than in the Ignore condition (Attention main effect; F(1,15)=5.91, p<0.03). An analogous effect (F(1,15)= 7.15, p<0.02) was also present in the pre-coda baseline of the MMN curves (330–410ms). As these effects emerged in the analysis of local magnetic gradient vector amplitudes after baseline correction had been performed on the data from each channel (SQUID) individually, they must be due to a stronger variability (fluctuation around the zero line) of the magnetic signals in the Attend condition. In order to test for effects of attention over and above the baseline 17 Four subjects did not fulfil the SNR criterion (see Methods) and were therefore 87 fluctuation, we subtracted the (time-averaged) local SRS value in the pre-stimulus baseline (-100–0) from the (time-averaged) local SRS of the responses to the standards at time windows 58–88, 93–133, 156–216, 330–410 (pre-coda baseline) and 500–580 (MMN main peak) ms after stimulus onset. Figure 5.5. Local magnetic gradient amplitude (SRS) of standard stimuli and magnetic MMN (averaged across 16 subjects) at a representative location. Top graph: responses to the standard stimuli [baj], [paj] (averaged across conditions); note the absence of differences during the MMN main-peak window (120–150ms post coda-onset). Inset (top-right): magnetic MMN of the four deviant stimuli, grouped by coda stimulus ([p] or [t]). Note the delay between the early peaks of the two curves, at approximately 60-90 and 120-150 ms post coda onset. Bottom graph: standard responses grouped by Condition (collapsing [baj] and [paj]); note the divergence of the two curves, particularly evident at time ~150-200ms (third peak). discarded. 88 Three-way ANOVAs (Attention x Stimulus x ROI) on the corrected standard magnetic field gradients revealed a significant interaction of these three factors (Table 5.2, top) in the 156–216 ms interval only (third peak of the standard responses in Fig. 5.5) with greater attention effects for [baj] than for [paj] (between conditions) at loci exhibiting larger signals. Time Effect F (degr. freedom) ε p remark Standard Peak III ([156, 216] ms post stim. onset) AP LC AP * LC AP * BP AP * LC * BP ATT * LC ATT * LC * BP ATT * AP * LC * BP F(1, 15)=37.8 F(3, 45)=32.7 F(3, 45)=15.0 F(1, 15)=10.5 F(3, 45)=5.62 F(3, 45)=3.41 F(3, 45)=4.15 F(3, 45)=3.02 1.00 .526 .762 1.00 .672 .648 .747 .781 p < .001 p < .001 p < .001 p < .01 p < .01 p < .05 p < .02 p < .04 see Fig. 5.5, Bottom plot MMN Main Peak (~[100,150] ms post coda-onset) AP LC LEX AP * LEX LC * LEX ATT * LEX AP * PT * ATT AP * PT * LEX AP * LC * PT * LEX AP * PT * ATT * LEX F(1, 15)=12.3 F(3, 45)=18.1 F(1, 15)=4.84 F(1, 15)=6.87 F(3, 45)=6.96 F(1, 15)=5.36 F(1, 15)=10.6 F(1, 15)=15.5 F(3, 45)=3.33 F(1, 15)=6.48 1.00 .577 1.00 1.00 .560 1.00 1.00 1.00 .715 1.00 p < .005 p < .001 p < .05 p < .02 p < .007 p < .04 p < .006 p < .002 p < .03 p < .03 see Fig. 5.6 Table 5.2. Statistical results: local magnetic gradient vector strengths at 8 high-amplitude loci (see Fig. 5). Legend: ATT=Attention; LEX = Lexicality; PT=coda Stimulus ([p], [t]); BP=CV Stimulus ([baj],[paj]); AP=anterior-posterior; LC=laterality; ε=Greenhouse- Geisser’s epsilon (p was corrected if Mauchly’s test indicated non-spherical data). 89 No significant effects of attention emerged in the other intervals considered. A similar correction was done on the MMN data by subtracting the pre-coda baseline from the MMN, which left all critical effects reported below unchanged. Figure 5.6. Local SRS of magnetic MMN to words ([bajt], [pajp]) and pseudowords (*[bajp], *[pajt]), averaged across 16 subjects. (A) Average of the eight loci exhibiting largest responses (refer to Fig. 5). (B) Average of the four superior (dorsal) high-amplitude locations. The bar plots on the right show the respective average (SEM) values during the 60ms interval around the peak. Note the larger peak of the MMN to pseudowords than to words in the Attend condition and the opposite pattern (words > pseudowords) emerging in the Ignore condition. (C) Responses predicted by the neural-network model simulations (Fig. 4.6.(C-D)). Solid lines: Attend; dotted lines: Ignore. Red: words; blue: pseudowords. Statistical analysis of the magnetic MMN revealed a significant interaction between Lexicality and Attention. In particular, a four-way ANOVA (Attention x Lexicality x 90 Stimulus x ROI) was performed on the data extracted from the MMN curves for the two quadruplets of high-amplitude loci (see Fig. 5.4) in the left hemisphere. The results are reported in Table 5.2 (lower half), and plotted in Figure 5.6. Figure 5.6.(A) plots the local SRS of the magnetic MMN at the eight high-amplitude locations, illustrating the Attention-by-Lexicality interaction. Additional tests confirmed that in the Attend condition, the peak of the magnetic MMN was larger to pseudowords than that to words (simple effect of Lexicality; t(15)=2.43, p<0.02). Interestingly, these dynamics were largely due to a modulation of the pseudoword response (Attention simple effect; t(15)=2.39, p<0.02), whereas the magnetic MMN to words did not differ significantly between Attend and Ignore (t(15)=1.02, p>0.1; n.s.). When analysing the superior and inferior quadruplets of the eight critical loci separately, the interaction of Attention and Lexicality was confirmed (superior quadruplet: F(1,15)=4.58, p<0.05; inferior quadruplet: F(1,15)=5.06, p<0.04) with stronger MMN gradient responses to pseudowords than words in the attend condition and, in the superior quadruplet only, stronger word than pseudoword responses in the Ignore condition (simple effect of Lexicality; t(15)=1.91, p<0.04) (Fig. 5.6.(B)). Responses were generally larger at anterior and lateral loci, and to pseudowords than to words (see Table 5.2). There was also an interaction of ROI (anterior-posterior), Stimulus, Attention, and Lexicality, due to the pseudoword-word differences in the Attend condition being most pronounced at anterior loci for the coda [t], and the differences for the [p] being equally large across anterior and posterior locations. Later time intervals revealed a significant Attention-by-Lexicality interaction at 250– 300ms post coda onset (F(1,15)=4.93, p<0.05), with larger magnetic gradient to pseudowords than to words in the Attend condition (as for the earlier time window). At times 300–400ms, a main effect of Attention (F(1,15)=10.1, p<0.01) was found. Source strengths calculated for a Region of Interest centred at the left posterior- superior sylvian fissure (radii: x=30mm, y=30mm, z=25mm) once again confirmed stronger pseudoword sources than those underlying words when attention was directed to speech, and the reverse pattern when ignoring speech (see Figure 5.7). 91 Figure 5.7. Cortical sources underlying magnetic MMN in the left hemisphere for words and pseudowords (averaged across 16 subjects). Left: sources distribution and average intensity during MMN peak (130-150ms post coda onset). Right: sum of all source strengths within the Region of Interest including posterior perisylvian cortical areas at t=140ms (red: words; blue: pseudowords). 5.3 Discussion Attention changed the neurophysiological response to spoken words and pseudowords in different ways. Whereas neuromagnetic responses were larger to attended pseudowords than to unattended pseudowords, brain processes induced by spoken words only showed minimal changes with attention. This result confirms the predictions of the model (see Fig. 4.5). Larger responses to words than to pseudowords in the Ignore condition replicates the previously documented dynamics of the MMN in the passive oddball paradigm (Endrass, Mohr, & Pulvermüller, 2004; Korpilahti, Krause, Holopainen, & Lang, 2001; Kujala et al., 2002; Näätänen, 2001; Pettigrew et al., 2004; Pulvermüller, 2001; Pulvermüller et al., 2001; Pulvermüller & Shtyrov, 2006; Pulvermüller, Shtyrov, Kujala, & Näätänen, 2004; Shtyrov, Pihko, & Pulvermüller, 2005; Shtyrov & Pulvermüller, 2002). The reverse effect in the Attend condition (larger responses to pseudowords than to words), a strong prediction of the 92 model that could not be foreseen on the basis of the above MMN studies, resembles the pattern seen in the N400 component and its magnetic correlate (Halgren et al., 2002; Holcomb & Neville, 1990; Maess, Herrmann, Hahne, Nakamura, & Friederici, 2006; Pulvermüller et al., 1996), which usually emerges when subjects attend to words. These previously unexplained reverse dynamics of N400 and MMN to familiar and unfamiliar stimuli can now be attributed to a single psychological variable, the locus of attention. The explanation of these results is based on the simulations obtained in Chapters 3 and 4: the responses to familiar words exhibit relative stability under different attentional load as the strong connections that form the cortical circuits (CAs) representing words ensure that the (non-linear) activation spreading within them is largely unaffected by the level of competition (attentional resources). On the other hand, responses to unfamiliar, unrepresented linguistic items (pseudowords) show strong attention dependence, explained by the different degrees of competition (induced by the different amounts of available attentional resources) between the multiple memory circuits activated by a non-matching stimulus. In sum, the discreteness of processing in learned neuronal circuits and the absence of corresponding discrete circuits for unfamiliar items together explain the differential effects of attention on word and pseudoword brain responses observed in the present study. We note that attention effects on standard stimuli were present only at times greater than 150ms after stimulus onset. This is in line with reports on visual object processing that attention effects in MEG responses to faces and houses emerged at post stimulus-onset latencies larger than 170ms (Furey et al., 2006). However, significant effects of attention on the magnetic correlate of the Mismatch Negativity, MMN, to pseudowords – but not words – were seen already at ~100-150 ms after the relevant acoustic change (onset of plosion of [p] or [t]) was present in the input. This contradicts earlier claims that the MMN is largely independent of attention for the specific case of pseudowords, but confirms this statement for words, for which a memory circuit has been set up in the brain (see (Näätänen, 2001)). The model predicts that a similar difference will emerge for spectrotemporally rich unfamiliar sounds and matched learned sounds for which a memory circuit has been set up. The explanation lies in the nature of the underlying neuronal memory trace activated, 93 which appears to be both distributed and discrete. Previous research documenting a reduced MMN to unfamiliar complex sounds compared with familiar ones so far partly support this suggestion (Näätänen et al., 1997; Schröger, Näätänen, & Paavilainen, 1992). The results exhibit larger MMN responses to pseudowords than to words in the Attend condition at around ~130ms and in the 250–300ms interval post coda onset. As the N400 is usually computed from word onset, which here started 410ms before the coda, our effects emerge between ~540-710ms after stimulus onset. This time range is later than that typically reported for the N400 component; however, such increased latency may be due to the absence of co-articulation effects in our stimuli: indeed, had information about subsequent phonemes been present in earlier parts of the word, the word-pseudoword difference might have become manifest earlier, possibly at and even before 400ms post spoken-word onset (Holcomb & Neville, 1990). A phonetic signal detection task was used here to direct attention towards speech processing, while a video watching task was administered to direct their attention away from speech. Behavioural results were used to confirm high attention levels and to ascertain specificity of attention to one modality. However, alternative paradigms to direct attention exist. Previous research has shown that depending on the task used to direct attention and kind of stimuli presented, attention effects may be different (Cristescu & Nobre, 2008; Hohlfeld, Mierke, & Sommer, 2004; Pulvermüller, Shtyrov, Hasting, & Carlyon, 2008; Sabri et al., 2008). The phonetic task that was used drew attention to fine acoustic detail of single spoken words, while the visual task did so to aspects of the visual environment. In future studies, it will be worthwhile to examine the role of different tasks directing attention to different linguistic aspects (phonological, lexical, semantic) of the speech stimuli and observe any related neurophysiological changes. 5.4 Summary and main contributions A novel MEG experiment was administered to test the crucial predictions of the model of the language cortex implemented in Chapter 2, namely, that focussed attention to speech is the critical variable leading to the reversal of the 94 neurophysiological lexicality effect, and that such inversion is mainly produced by the modulation of the pseudoword response, whereas the word response stays relatively stable. Both predictions were confirmed by the experimental results. The original contributions of this Chapter are: (i) experimental evidence confirming the validity of the model and supporting the correctness of the theoretical account upon which it was built, and (ii) a novel MEG study and original neurophysiological data on the effects of attention on spoken language processing. 95 Chapter 6 – Summary and Conclusions The overall aim of this research was to investigate the neuronal mechanisms at the basis of language acquisition and processing, and the interactions of language and attention processes in the human brain. One of the main objectives was to shed light on the nature of knowledge representation in the brain, focussing on language: we were interested in clarifying the functional nature (discrete vs. non-discrete activation) and anatomical characteristics (local vs. distributed networks) of the cortical traces underlying lexical representations. Research in neurophysiology reveals different brain responses if the stimuli presented in input consist of either (i) familiar and meaningful units (e.g., words, faces, objects) or (ii) equivalently complex but unfamiliar, meaningless items (pseudowords, scrambled faces, imaginary objects). In the area of language research, familiar words and senseless pseudowords lead to different patterns of responses: the N400, a negative-going ERP peaking around 400ms after stimulus onset, is larger for pseudowords than for matched words. The opposite result, however (larger early brain responses to words compared with pseudowords) has also been reported, in particular, in the Mismatch Negativity MMN, an early automatic brain response elicited under distraction using an oddball stimulation paradigm. These diverging patterns of results were, until now, left unexplained by psycholinguistic accounts. The above questions were addressed here by combining neurocomputational modelling and neuroimaging (MEG) experimental methods. The results of the simulations in Chapter 3 provide proof-of-principle evidence that, as previously conjectured only at theoretical level (Braitenberg, 1978; Hebb, 1949; Pulvermüller, 1999), speech-related co-activation of neurons in IF and ST cortex can lead, in presence of Hebbian learning, to the formation of strongly connected word cell assemblies that are distributed over these areas and exhibit discrete levels of activation (“on-off”). Subsequently to the spontaneous formation of such word representations (resulting from purely biologically realistic mechanisms of synaptic 96 plasticity), the model was capable of replicating the neurophysiological effects of lexicality normally observed in MMN experiments (larger responses to words than to pseudowords). In order to account for the opposite pattern (N400) of data, the network responses were investigated under different processing conditions, obtained by modulating the strength of the non-specific (global) cortical inhibition, the model correlate of attentional load. We found that variation of the inhibition differentially modulated the simulated brain response to words and pseudowords, producing either an N400- or an MMN-like response depending on the amount of available attentional resources. In addition to providing a unifying explanatory account (at cortical level) of divergent experimental observations, the model made precise, crucial predictions on the effects of attention on the magnitude of ERPs to lexical items, which were tested in a novel MEG experiment (Chapter 5). The experimental results confirmed the model’s predictions, providing evidence in support of the neurophysiological validity of the model. The original contributions of this work are: (i) a neurobiologically realistic model of language acquisition and processing, unique with respect to the level of neuroanatomical, connectivity and neurobiological detail (Chapter 2); (ii) proof-of-principle simulation results in support of the theory according to which speech-related co-activation of neurons in IF and ST cortex lead, in the presence of (neurobiologically plausible) Hebbian learning, to the formation of word cell assemblies distributed over these areas and associating sensory-motor activation patterns (Chapter 3); (iii) a working model that explains and unifies existing experimental results that were not accounted for by current psycholinguistic theories (Chapter 4); (iv) a mechanistic explanation, at the level of cortical circuits, of how and why attention modulates the magnitude and latency of event- related brain responses to speech stimuli (Chapter 4); (v) novel MEG data on the effects of attention on spoken language processing, and (vi) experimental evidence supporting the mechanistic correctness of the theoretical account upon which this work is built (Chapter 5). 97 In particular, the results presented here provide evidence in support of the hypothesis that words, similar to other units of cognitive processing (e.g., objects, faces), are represented in the human brain as distributed and discrete action-perception circuits. Existing theoretical and computational accounts of knowledge representation in the brain explain memory either as the discrete activation of localist elements, or on the basis of fully distributed, graded-activation patterns (see Sec. 1.1). These two accounts make different predictions about the functional nature (discrete vs. graded activation) and cortical characteristics (local vs. distributed networks) of the knowledge representations in the brain: localist accounts predict local activity differences between words and pseudowords, and relative stability of brain responses to words as compared to variability of pseudoword responses with attention level; distributed theories predict widespread activity differences, but, as their linguistic representations typically lack functional discreteness, they fall short of reproducing and explaining lexicality differences as a function of attention. In view of the simulation results presented in Chapters 3 and 4, and neurophysiological evidence (Chapter 5), neither of these two approaches appears to be entirely correct. We have shown in Chapter 3 that functionally discrete and distributed action-perception circuits can emerge spontaneously in the cortex as a result of synaptic plasticity, and do not need to be assumed a priori. Overcoming the limitations and combining the advantages of localist and distributed approaches, the model implemented here predicts and explains the formation of lexical representations consisting of strongly interconnected, distributed (but anatomically distinct) cortical circuits. These circuits behave as coherent, discrete-activation units, and allow two or more lexical circuits to remain active at the same time (as in the case of pseudoword processing). The simulations in Chapter 4 showed how the discreteness of the cell assemblies predicted and explained the relative stability of lexical representation activation under different amounts of processing resources (attention); the absence of discrete processing devices for unfamiliar (non-represented) items predicted substantial attention-dependence of pseudoword brain responses. The experimental results described in Chapter 5 confirmed these predictions and provided evidence in support of the existence of discrete and distributed networks representing lexical items in the brain. 98 The model presented here, of course, is not exempt from limitations. For example, it does not account for psycholinguistics phenomena related to word frequency, similarity, or meaning; it does not model cortical areas belonging to the somatosensory speech region (see Fig. 1.4); it exhibits the learning of only a small number of sensory-motor patterns; and it incorporates only up to a certain level of neurobiological detail (e.g., different types of ion channels, neurotransmitters or synaptic receptors were not included) and neuroanatomical connectivity (some of the “jumping” connections between cortical areas were not implemented). Nevertheless, in spite of such simplifying assumptions, it was sufficiently complex to account for and mechanistically explain, at the cortical-circuit level, the cognitive and neurophysiological processes of interest. To conclude, although many issues still remain to be addressed, this work represents a first step towards a better understanding, at the level of the neuronal circuits, of the complex neurophysiological mechanisms at work during word acquisition and spoken language processing under variable attentional demands. It is hoped that this research will open up new perspectives in the theoretical and empirical investigation of high- level cognitive brain processes. 99 Appendix A This appendix presents details of the network model. Figure 2.2 displays the generic cortical area model. Our simulations use six such areas in sequence with identical structure and dynamics, and mutual connections between adjacent areas (see Fig. 2.1(b)). Each area comprises two mutually-connected layers of excitatory neurons (E) and inhibitory cells (I). Their dynamics is given by the following equations: In Eq. (1) to (4), VE and VI are the membrane potentials of the excitatory (E-) and inhibitory (I-) cells on a grid, with x = (x1, x2), 0 ≤ x1, x2 < 25 representing one cell location. We use cyclic boundary conditions. The membrane dynamics is modelled by low-pass filters with time-constants τE and τI , respectively. The φ(x,t) and φS(t) variables represent cell-intrinsic adaptation and area-specific inhibition (see Sec. 2.2.1), respectively. Their dynamics is low-pass, too, with time constants τa and τS. Time-constants and time t are in arbitrary time units. Dynamic equations are integrated using a simple Euler scheme with step size ∆t (Press et al., 1992). Excitatory cells are graded response neurons with sigmoid output functions (reflecting firing rates) ƒE(x,t). We identify ƒE[VE(x,t) – φ(x,t)] with O(x,t) as defined 100 in Eq. (2.2), Sec. 2.2.1, where the parameter φ in Eq. (2.2) corresponds now with the space and time dependent adaptation variable φ(x,t) in (1-4). As (3) shows, φ(x,t) computes a gliding average of the output firing rates of the E-cells, such that φ(x,t) gets higher the more strongly a cell is activated. φ(x,t) in turn affects the rates of the cells suppressively, acting as a cell-intrinsic dynamic threshold (see also Eq. (2.2)). The inhibitory cells (“interneurons”) are also graded response neurons, but have semi- linear rate function ƒI such that ƒI(x)=x if x>0, and ƒI(x)=0 elsewhere. Note that I-cells were not endowed with an adaptation mechanism; consistently with biology, their main task is to control the activity in the E-cell subnetwork locally. The term φS(t) in (4) is an additional slow inhibitory process (time-constant τS >> τE) that provides area-specific activity control by inhibiting all E-cells within one area in equal amounts, proportional to the total within-area activity. This has the net effect of introducing competition between functional representations (cell assemblies) distributed across cortical areas, restricting activity to the most strongly excited ones (see Sec. 1.5). The η(x,t) in (1) are further identical and independent Gaussian white noise processes N(0,1) (Kloeden & Platen, 1995) with noise amplitude σ set to 1.04. Symbols ⊗ in (1) to (4) denote spatial convolution with cyclic boundary conditions in order to avoid boundary effects (simply put, each “convolution” calculates, for each neuron x, the scalar product between its input weights – projection kernel – and its presynaptic cells’ outputs). Ranges of the connectivity kernels kFF , kFB, kREC and kINH are indicated in Fig. 2.2, Sec. 2.2.3. The inhibitory kernel kINH is identical for all I- cells, i.e., a shift-invariant 2D-Gaussian with standard-deviation 2 (lattice units, i.e., cells) and amplitude 0.295. The precise nature of the initialisation of the excitatory kernels as well as the learning rule according to which they change over time is described in the main text, Section 2.2. Inputs IFF(x,t) and IFB(x,t) in (1) are from earlier and subsequent areas, respectively. For the second to fifth area they are the fields of firing rates O(x,t) of the E-cells in the previous and subsequent area, but for the first and last areas external inputs are provided as 0/1-bit patterns (clamped input currents). Finally, the factors αi, i=1,..,5 control the relative weight of feedforward, feedback, recurrent, and fast and slow inhibitory synaptic inputs into the excitatory cells. The 101 network function does not depend crucially on the time-constants and connection weights as long as stable operation can be guaranteed. Parameters used were τE =2.5, τI =5, τa=15, τS =37, ∆t=0.5, α1=α2=α3=α4=5, α5=0.9, αa=0.026, σ =1.04. During the testing, α5 (the area-specific inhibition feedback, or FI – see Sec. 2.2.3) was varied between 0.90 and 1.25 (see Fig. 4.5). 102 Appendix B The networks and parameters used for the experiments described in Chapters 3 and 4 are the result of a phase of preliminary simulations aimed at calibrating the model’s behaviour. In these studies, a set of inter-related problems often prevented CA formation in the network: (i) CA Merging. The different CAs that developed for the four pairs of input patterns often merged together during the training, becoming, in the worst case, a single CA that responded to any of the four stimuli (see (Milner, 1996)). This problem was a symptom of the network inability to learn to “discriminate” between input patterns that produced overlapping network activations. For this type of discrimination to take place, the sets of links connecting two overlapping CAs should be gradually weakened (or at least not strengthened). (ii) CA overgrowth. During the training, if the number of cells that were strongly activated by one of the stimuli in one area exceeded a certain threshold (around 10-15% of the total number of cells in one area), an unstable positive-feedback loop developed, whereby stronger and stronger responses would follow each new presentation of a given stimulus, leading to the “overgrowth” of one of the CAs. This CA would rapidly extend and cover most of the network, causing widespread unphysiological states of saturated activation (notice that overgrowth of one CA often caused merging, and vice versa). (iii) Contact. For the binding between two co-activated patterns in Area 1 and 6 to occur, it is necessary that the two “waves” of activity produced are strong enough to reach the middle areas (3 and 4); in addition, these two waves must either (1) jointly activate a common set of E-cells, or (2) co- activate two disjoint (but loosely connected) sets of E-cells that will, as a consequence, become strongly linked. Put it simply, for CAs to develop, the two opposite waves of activity have to make “contact” with each other in the middle of the network. This did not always happen, as the way in which activity from the input areas propagated is strongly influenced by the radius of the within- and between-area projections (parameter ρ in Eq. 103 (2.5)). In particular, smaller projection sizes cause more “focussed” and stronger propagation of activity towards the middle areas; however, if the projection sizes are too small, neither of the conditions (1) or (2) above is likely to be satisfied. (iv) CA Off-switching. During the training, in some cases some CAs became “over-stable”: i.e., once activated, they would remain active even after the removal of the input stimulus; activity would last for a period of time that depended on the strengths of the links and degree of “reciprocity” existing between the E-cells that formed the CA. Although reverberation (memory) was one of the desirable property of the CAs, having over-permanent CAs activation was not. When a CA did not switch off after stimulus removal and remained active even upon arrival of a new stimulus, merging typically occurred (due to co-activation of the two CAs). To prevent this phenomenon, the arrival of a new stimulus must automatically trigger the off-switching of any currently active CA. The key parameters that determined whether this would happen were the strengths of the global and local inhibitory circuits (i.e., of the links between I-cells and E-cells of one area). If global and/or local inhibition were sufficiently strong, the incoming waves of activity induced by a new stimulus and corresponding CA produced sufficient inhibition to “disrupt” the activation of any other CA. Of course, too much inhibition prevented CAs from developing at all (as the activity in input was “filtered” by the first two areas and would not reach the middle). (v) Unbalanced CAs (a.k.a. “pre-inhibition problem”). During training, when a new stimulus was presented to the network, the level of global inhibition in all the areas had to be sufficiently low so that the incoming waves of activity from areas 1 and 6 could reach the middle. However, since the very beginning of the training some of the stimulus patterns caused a slightly higher response in the network than others (this was due to the random nature of the input patterns and network connectivity). Higher activity allowed more learning, and some CAs quickly became “stronger” than others. A stronger network response, in turn, caused more global 104 inhibition in the network after stimulus presentation, which meant that the pattern presented next was less likely to induce the formation of a CA. This further enhanced the already present differences between CAs strengths, causing an unbalance in their size, with some CAs becoming much larger and stronger than others, and some being entirely prevented from developing. Lengthening the periods of time during which no input was presented in order to allow the global inhibition to decay before the arrival of the new stimulus did not solve this problem: some CAs produced more inhibition than others, and if too much time was allowed to pass, the level of inhibition would drop so much that (i) the sudden arrival of the next input pattern would “over-excite” the network, causing CA overgrowth, or (ii) the random noise present in the network produced “spontaneous” activation of one of the CAs, causing an overlap (and consequent merging) between the spontaneously activated CA and the incoming stimulus. Some of these problems often co-occurred and had to be addressed simultaneously and using a combination of strategies, as described below. In order to address the issues of overgrowth, off-switching and unbalanced CAs, we attempted several parameter changes. First of all, we reduced the maximum strength of the synaptic weights (restricting the weight range to [0, 0.2] instead of the original [0, 1.0]) and increased the strength of the local and global inhibition (parameters α4 and α5 in Appendix A), so as to and trigger the off-switching of the previously activated CA and prevent overgrowth by limiting the total amount of activity allowed within one area at any one time. However, an increase in the overall inhibition level prevented not only overgrowth and over-permanent CAs but also CA formation, and caused unbalanced CAs. The solution that we adopted was to make the presentation of a new stimulus subject to the level of mid-area global inhibition being lower than a specific threshold. This significantly reduced the impact that the strength of the response to one pattern-pair had on the learning of the subsequent one. Notice that in order to prevent overgrowth and off-switching, this threshold could not be set arbitrarily low. Secondly, we reduced the time constant τS (see Appendix A) of the FI- cells (i.e., increased the “speed” of the global-inhibition response) by 70%, so that 105 even sudden “surges” of activity within one area would be quickly suppressed. The response of the global-inhibition cell associated to an area (i.e., the speed with which the input affected the cell’s membrane potential) was originally very slow when compared to that of normal E- and I-Cells (time constants of 2.5 and 5.0, respectively). A very slow response of the global inhibition mechanism meant that activity within an area was essentially unrestricted during the period of time in which the associated cell’s activation was still low; a fast response was required to prevent overactivation within one area, potentially leading to CA overgrowth. Thirdly, we increased the radius ρ of the within- and between-area excitatory projections (see Equation 2.5) to 15 and 19 cells, respectively. A larger radius (a) helped preventing overactivation by making activity more “dispersed”, (b) increased the probability of the “contact” conditions (1),(2) being satisfied, and (c) allowed linking of co-active cells that were normally too far apart to be bound together, increasing the general pattern completion ability of the network. On the other hand, it also meant a higher probability of overlap between patterns and of their merging into a single CA. Thus the parameter ρ had to be carefully chosen to achieve a good compromise between costs and benefits. While the above changes addressed the issues of overlearning/overgrowth, off- switching and unbalanced CAs, they left the problem of CA merging basically unsolved. In order to deal with this obstacle, first of all we randomized the order of pattern presentation during the training (a sequence of patterns that repeats always identically is likely to encourage the merging of patterns that are adjacent in the training sequence). Secondly, we reduced the density of the network connectivity (determined by parameters k and σ in Eq. (2.5)) so as to minimize the probability of CA overlap. Of course, while excessive density caused merging and overlearning, excessive sparseness might re-introduce the problems of contact or unbalanced CAs. Indeed, the projection radius ρ and density of the connectivity had to be calibrated in conjunction, as they determine the average number of synaptic links of a single E-cell: if the density increases, radius must decrease for the total number of synapses to remain constant, and vice versa. Choosing (and maintaining constant) the total number of synapses is important to avoid over-activation and overgrowth. 106 In spite of these changes, the problem of merging was still pervasive. As mentioned in Sec. 3.1.3, the key to addressing this lay in the learning rule used to train the network. 107 Abbreviations ABS Artola-Bröcher-Singer BA Brodmann’s area BCM Bienenstock-Cooper-Munro EEG Electro-encephalography EPSP Excitatory post-synaptic potential ERF/P Event-related field/potential E-cell Excitatory cell FI Feedback inhibition fMRI functional magnetic resonance imaging IF Inferior frontal / prefrontal cortex IPSP Inhibitory post-synaptic potential I-cell Inhibitory cell LTD Long-term depression LTP Long-term potentiation MEG Magneto-encephalography MMN Mismatch Negativity N400 Negative component of ERP peaking at around 400ms PFC Prefrontal cortex SRS Square-root of the summed squares SEM Standard error of the mean SQUID Superconducting Quantum Interference Devices ST Superior temporal gyrus/sulcus WTA Winner-take-all 108 References Abraham, W. C., & Bear, M. F. (1996). Metaplasticity: the plasticity of synaptic plasticity. Trends Neurosci, 19(4), 126-130. Alho, K., Woods, D. L., Algazi, A., & Näätänen, R. (1992). Intermodal selective attention: II. Effects of attentional load on processing of auditory and visual stimuli in central space. Electroencephalography and Clinical Neurophysiology, 82, 356-368. Allport, D. A. (1980). Attention and performance. In G. Claxton (Ed.), Cognitive psychology: New directions (pp. 112-153). London: Routledge and Kegan Paul. Amir, Y., Harel, M., & Malach, R. (1993). Cortical hierarchy reflected in the organization of intrinsic connections in macaque monkey visual cortex. J Comp Neurol, 334(1), 19-46. Artola, A., Bröcher, S., & Singer, W. (1990). Different voltage-dependent thresholds for inducing long- term depression and long-term potentiation in slices of rat visual cortex. Nature, 347, 69-72. Artola, A., & Singer, W. (1993). Long-term depression of excitatory synaptic transmission and its relationship to long-term potentiation. Trends in Neurosciences, 16, 480-487. Baayen, H., Piepenbrock, R., & van Rijn, H. (1993). The CELEX lexical database (CD-Rom). University of Pennsylvania, PA: Linguistic Data Consortium. Baddeley, A. (1986). Working memory. Oxford: Oxford University Press. Barber, H. A., & Kutas, M. (2007). Interplay between computational models and cognitive electrophysiology in visual word recognition. Brain Res Brain Res Rev, 53(1), 98-123. Bear, M. F. (1995). Mechanism for a sliding synaptic modification threshold. Neuron, 15(1), 1-4. Bentin, S., Kutas, M., & Hillyard, S. A. (1995). Semantic processing and memory for attended and unattended words in dichotic listening: behavioral and electrophysiological evidence. J Exp Psychol Hum Percept Perform, 21(1), 54-67. Bi, G. Q., & Poo, M. M. (2001). Synaptic modification by correlated activity: Hebb's postulate revisited. Annual Review of Neuroscience, 24, 139-166. Bienenstock, E. L., Cooper, L. N., & Munro, P. W. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2, 32-48. Braitenberg, V. (1978). Cell assemblies in the cerebral cortex. In R. Heim & G. Palm (Eds.), Theoretical approaches to complex systems (Vol. 21, pp. 171-188). Berlin: Springer. Braitenberg, V. (2001). Brain size and number of neurons: an exercise in synthetic neuroanatomy. Journal of Computational Neuroscience, 10(1), 71-77. Braitenberg, V., & Schüz, A. (1998). Cortex: statistics and geometry of neuronal connectivity (2 ed.). Berlin: Springer. Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press. Bundesen, C. (1990). A Theory of Visual-Attention. Psychological Review, 97(4), 523-547. Bundesen, C., Habekost, T., & Kyllingsbaek, S. (2005). A neural theory of visual attention: Bridging cognition and neurophysiology. Psychological Review, 112(2), 291-328. 109 Buonomano, D. V., & Merzenich, M. M. (1998). Cortical plasticity: from synapses to maps. Annual Review in Neuroscience, 21, 149-186. Catani, M., Jones, D. K., & Ffytche, D. H. (2005). Perisylvian language networks of the human brain. Annals of Neurology, 57(1), 8-16. Chelazzi, L., Duncan, J., Miller, E. K., & Desimone, R. (1998). Responses of neurons in inferior temporal cortex during memory-guided visual search. J Neurophysiol, 80(6), 2918-2940. Chelazzi, L., Miller, E. K., Duncan, J., & Desimone, R. (1993). A neural basis for visual search in inferior temporal cortex. Nature, 363(6427), 345-347. Christiansen, M. H., & Chater, N. (1999). Connectionist natural language processing: The state of the art. Cognitive Science, 23(4), 417-437. Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci, 3(3), 201-215. Corchs, S., & Deco, G. (2002). Large-scale neural model for visual attention: integration of experimental single-cell and fMRI data. Cereb Cortex, 12(4), 339-348. Crepel, F., & Jaillard, D. (1991). Pairing of pre- and postsynaptic activities in cerebellar Purkinje cells induces long-term changes in synaptic efficacy in vitro. J Physiol, 432(1), 123-141. Cristescu, T. C., & Nobre, A. C. (2008). Differential modulation of word recognition by semantic and spatial orienting of attention. Journal of Cognitive Neuroscience, 20(5), 787-801. David, O., & Friston, K. J. (2003). A neural mass model for MEG/EEG: coupling and neuronal dynamics. Neuroimage, 20(3), 1743-1755. Dayan, P., & Abbott, L. F. (2001). Theoretical Neuroscience: computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press. Dayan, P., & Sejnowski, T. J. (1993). The Variance of Covariance Rules for Associative Matrix Memories and Reinforcement Learning. Neural Computation, 5(2), 205-209. Deco, G., & Rolls, E. T. (2005a). Attention, short-term memory, and action selection: A unifying theory. Progress in Neurobiology, 76(4), 236-256. Deco, G., & Rolls, E. T. (2005b). Neurodynamics of biased competition and cooperation for attention: A model with spiking neurons. Journal of Neurophysiology, 94(1), 295-313. Deco, G., Rolls, E. T., & Horwitz, B. (2004). "What'' and "where'' in visual working memory: A computational neurodynamical perspective for integrating fMRI and single-neuron data. Journal of Cognitive Neuroscience, 16(4), 683-701. Dehaene, S., Sergent, C., & Changeux, J. P. (2003). A neuronal network model linking subjective reports and objective physiological data during conscious perception. Proceedings of the National Academy of Sciences of the United States of America, 100(14), 8520-8525. Dell, G. S. (1986). A Spreading-Activation Theory of Retrieval in Sentence Production. Psychol Rev, 93(3), 283-321. Dell, G. S., Chang, F., & Griffin, Z. M. (1999). Connectionist models of language production: lexical access and grammatical encoding. Cognitive Science: A Multidisciplinary Journal, 23(4), 517 - 542. Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annu Rev Neurosci, 18, 193-222. 110 Douglas, R. J., & Martin, K. A. (2004). Neuronal circuits of the neocortex. Annu Rev Neurosci, 27, 419-451. Duncan, J. (1980). The Locus of Interference in the Perception of Simultaneous Stimuli. Psychological Review, 87(3), 272-300. Duncan, J. (1996). Competitive brain systems in selective attention. International Journal of Psychology, 31(3-4), 3343-3343. Duncan, J. (2006). EPS Mid-Career Award 2004 - Brain mechanisms of attention. Quarterly Journal of Experimental Psychology, 59(1), 2-27. Duncan, J., & Humphreys, G. W. (1989). Visual-Search and Stimulus Similarity. Psychological Review, 96(3), 433-458. Eggert, J., & van Hemmen, J. L. (2000). Unifying framework for neuronal assembly dynamics. Physical Review E, 61(2), 1855-1874. Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2-3), 195-225. Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking Innateness: a connectionist perspective on development: MIT Press. Endrass, T., Mohr, B., & Pulvermüller, F. (2004). Enhanced mismatch negativity brain response after binaural word presentation. European Journal of Neuroscience, 19(6), 1653-1660. Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: a TMS study. European Journal of Neuroscience, 15(2), 399- 402. Freeman, W. J. (1978). Models of the dynamics of neural populations. Electroencephalogr Clin Neurophysiol Suppl(34), 9-18. Friedrich, C. K., Eulitz, C., & Lahiri, A. (2006). Not every pseudoword disrupts word recognition: an ERP study. Behav Brain Funct, 2, 36. Fry, D. B. (1966). The development of the phonological system in the normal and deaf child. In F. Smith & G. A. Miller (Eds.), The genesis of language (pp. 187-206). Cambridge, MA: MIT Press. Fukai, T., & Tanaka, S. (1997). A simple neural network exhibiting selective activation of neuronal ensembles: From winner-take-all to winners-share-all. Neural Computation, 9(1), 77-97. Furey, M. L., Tanskanen, T., Beauchamp, M. S., Avikainen, S., Uutela, K., Hari, R., et al. (2006). Dissociation of face-selective cortical responses by attention. Proceedings of the National Academy of Sciences of the United States of America, 103(4), 1065-1070. Fuster, J. M. (1995). Memory in the cerebral cortex. Cambridge, MA: MIT Press. Fuster, J. M. (1997). The prefrontal cortex: anatomy, physiology, and neuropsychology of the frontal lobe (3 ed.). New York: Raven Press. Fuster, J. M. (2003). Cortex and mind: Unifying cognition. Oxford: Oxford University Press. Gabbott, P. L., Somogyi, J., Stewart, M. G., & Hamori, J. (1986). GABA-immunoreactive neurons in the dorsal lateral geniculate nucleus of the rat: characterisation by combined Golgi-impregnation and immunocytochemistry. Exp Brain Res, 61(2), 311-322. 111 Garagnani, M., Wennekers, T., & Pulvermüller, F. (2007). A neuronal model of the language cortex. Neurocomputing, 70, 1914–1919. Garagnani, M., Wennekers, T., & Pulvermüller, F. (2008). A neuroanatomically grounded Hebbian- learning model of attention-language interactions in the human brain. Eur J Neurosci, 27(2), 492- 513. Gaskell, M. G., Hare, M., & Marslen-Wilson, W. D. (1995). A connectionist model of phonological representation in speech perception. Cognitive Science: A Multidisciplinary Journal, 19(4), 407 - 439. Gilbert, C. D., & Wiesel, T. N. (1983). Clustered Intrinsic Connections in Cat Visual-Cortex. Journal of Neuroscience, 3(5), 1116-1133. Grossberg, S. (1976a). Adaptive Pattern-Classification and Universal Recoding .1. Parallel Development and Coding of Neural Feature Detectors. Biological Cybernetics, 23(3), 121-134. Grossberg, S. (1976b). Adaptive Pattern-Classification and Universal Recoding .2. Feedback, Expectation, Olfaction, Illusions. Biological Cybernetics, 23(4), 187-202. Guenther, F. H., Ghosh, S. S., & Tourville, J. A. (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang, 96(3), 280-301. Halgren, E., Dhond, R. P., Christensen, N., Van Petten, C., Marinkovic, K., Lewine, J. D., et al. (2002). N400-like magnetoencephalography responses modulated by semantic context, word frequency, and lexical class in sentences. Neuroimage, 17(3), 1101-1116. Hämäläinen, M. S., Hari, R., Ilmoniemi, R. J., Knuutila, J., & Lounasmaa, O. V. (1993). Magnetoencephalography - theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65, 413-497. Hauk, O., Davis, M. H., Ford, M., Pulvermüller, F., & Marslen-Wilson, W. D. (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage, 30(4), 1383-1400. Hebb, D. O. (1949). The organization of behavior. New York: John Wiley. Hirsch, J. C., Barrionuevo, G., & Crepel, F. (1992). Homo- and heterosynaptic changes in efficacy are expressed in prefrontal neurons: an in vitro study in the rat. Synapse, 12(1), 82-85. Hohlfeld, A., Mierke, K., & Sommer, W. (2004). Is word perception in a second language more vulnerable than in one's native language? Evidence from brain potentials in a dual task setting. Brain and Language, 89(3), 569-579. Holcomb, P. J., & Neville, H. J. (1990). Auditory and visual semantic priming in lexical decision: a comparision using event-related brain potentials. Language and Cognitive Processes, 5, 281-312. Hopfield, J. J. (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proceedings of the National Academy of Sciences of the United States of America- Biological Sciences, 79(8), 2554-2558. Hubel, D. (1995). Eye, brain, and vision (2 ed.). New York: Scientific American Library. Husain, F. T., Tagamets, M. A., Fromm, S. J., Braun, A. R., & Horwitz, B. (2004). Relating neuronal dynamics for auditory object processing to neuroimaging activity: a computational modeling and an fMRI study. Neuroimage, 21(4), 1701-1720. Ilmoniemi, R. J. (1993). Models of source currents in the brain. Brain Topography, 5(4), 331-336. 112 James, W. (1890). The principles of psychology. New York: Holt. Jansen, B. H., & Rit, V. G. (1995). Electroencephalogram and Visual-Evoked Potential Generation in a Mathematical-Model of Coupled Cortical Columns. Biological Cybernetics, 73(4), 357-366. Jefferys, J. G. R., Traub, R. D., & Whittington, M. A. (1996). Neuronal networks for induced '40 Hz' rhythms. Trends in Neurosciences, 19(5), 202-208. Jin, X., Mathers, P. H., Szabo, G., Katarova, Z., & Agmon, A. (2001). Vertical bias in dendritic trees of non-pyramidal neocortical neurons expressing GAD67-GFP in vitro. Cereb Cortex, 11(7), 666-678. Joanisse, M. F., & Seidenberg, M. S. (1999). Impairments in verb morphology after brain injury: A connectionist model. Proceedings of the National Academy of Sciences of the United States of America, 96(13), 7592-7597. Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences of the United States of America, 97(22), 11793-11799. Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall. Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2000). Principles of neural sciences (4 ed.). New York: McGraw-Hill, Health Professions Division. Katz, L. C., & Shatz, C. J. (1996). Synaptic activity and the construction of cortical circuits. Science, 274(5290), 1133-1138. Kirkwood, A., Rioult, M. C., & Bear, M. F. (1996). Experience-dependent modification of synaptic plasticity in visual cortex. Nature, 381(6582), 526-528. Knoblauch, A., & Palm, G. (2002). Scene segmentation by spike synchronization in reciprocally connected visual areas. I. Local effects of cortical feedback. Biol Cybern, 87(3), 151-167. Kohonen, T. (1984). Self-organisation and associative memory. Berlin: Springer. Kohonen, T., & Makisara, K. (1989). The Self-Organizing Feature Maps. Physica Scripta, 39(1), 168- 172. Korpilahti, P., Krause, C. M., Holopainen, I., & Lang, A. H. (2001). Early and late mismatch negativity elicited by words and speech-like stimuli in children. Brain and Language, 76(3), 332-339. Krichmar, J. L., Seth, A. K., Nitz, D. A., Fleischer, J. G., & Edelman, G. M. (2005). Spatial navigation and causal analysis in a brain-based device modeling cortical-hippocampal interactions. Neuroinformatics, 3(3), 197-221. Kujala, A., Alho, K., Valle, S., Sivonen, P., Ilmoniemi, R. J., Alku, P., et al. (2002). Context modulates processing of speech sounds in the right auditory cortex of human subjects. Neurosci Lett, 331(2), 91-94. Kutas, M., & Hillyard, S. A. (1980). Event-related brain potentials to semantically inappropriate and surprisingly large words. Biol Psychol, 11(2), 99-116. Lamme, V. A., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends Neurosci, 23(11), 571-579. Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1-75. Linsker, R. (1988). Self-Organization in a Perceptual Network. Computer, 21(3), 105-117. 113 Maess, B., Herrmann, C. S., Hahne, A., Nakamura, A., & Friederici, A. D. (2006). Localizing the distributed language network responsible for the N400 measured by MEG during auditory sentence processing. Brain Res, 1096(1), 163-172. Makris, N., Meyer, J. W., Bates, J. F., Yeterian, E. H., Kennedy, D. N., & Caviness, V. S. (1999). MRI- Based topographic parcellation of human cerebral white matter and nuclei II. Rationale and applications with systematics of cerebral connectivity. Neuroimage, 9(1), 18-45. Malenka, R. C., & Bear, M. F. (2004). LTP and LTD: An embarrassment of riches. Neuron, 44(1), 5- 21. Malenka, R. C., & Nicoll, R. A. (1999). Neuroscience - Long-term potentiation - A decade of progress? Science, 285(5435), 1870-1874. Mao, Z. H., & Massaquoi, S. G. (2007). Dynamics of winner-take-all competition in recurrent neural networks with lateral inhibition. IEEE transactions on neural networks 18(1), 55-69. Maunsell, J. H., & Newsome, W. T. (1987). Visual processing in monkey extrastriate cortex. Annu Rev Neurosci, 10, 363-401. McClelland, J. L., & Elman, J. L. (1986). The Trace model of speech perception. Cognitive Psychology, 18, 1-86. McClelland, J. L., & Rumelhart, D. E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159-188. Mikkulainen, R., Bednar, J., Choe, Y., & Sirosh, J. (Eds.). (2005). Computational Maps in the Visual Cortex: Springer. Miller, E. K., Gochin, P. M., & Gross, C. G. (1993). Suppression of Visual Responses of Neurons in Inferior Temporal Cortex of the Awake Macaque by Addition of a 2nd Stimulus. Brain Research, 616(1-2), 25-29. Miller, K. D. (1996). Synaptic economics: Competition and cooperation in synaptic plasticity. Neuron, 17(3), 371-374. Miller, K. D., & Mackay, D. J. C. (1994). The Role of Constraints in Hebbian Learning. Neural Computation, 6(1), 100-126. Miller, R., & Wickens, J. R. (1991). Corticostriatal cell assemblies in selective attention and in representation of predictable and controllable events: a general statement of corticostriatal interplay and the role of striatal dopamine. Concepts in Neuroscience, 2, 65-95. Milner, P. M. (1996). Neural representation: some old problems revisited. Journal of Cognitive Neuroscience, 8, 69-77. Moran, J., & Desimone, R. (1985). Selective Attention Gates Visual Processing in the Extrastriate Cortex. Science, 229(4715), 782-784. Moray, N. (1959). Attention and Dichotic Listening: Affective cies and the influence of instructions. Quarterly Journal of Experimental Physiology, 11, 56-60. Mountcastle, V. B. (1997). The columnar organization of the neocortex. Brain, 120, 701-722. Näätänen, R. (2001). The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology, 38(1), 1-21. Näätänen, R., Gaillard, A. W., & Mäntysalo, S. (1978). Early selective-attention effect on evoked potential reinterpreted. Acta Psychologica, 42, 313-329. 114 Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., et al. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385(6615), 432-434. Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). The mismatch negativity (MMN): towards the optimal paradigm. Clin Neurophysiol, 115(1), 140-144. Navon, D., & Gopher, D. (1979). Economy of the Human-Processing System. Psychological Review, 86(3), 214-255. Norris, D. (1994). Shortlist - a Connectionist Model of Continuous Speech Recognition. Cognition, 52(3), 189-234. Nunez, P. (1974). Brain Wave-Equation - Model for Eeg. Electroencephalogr Clin Neurophysiol 37(4), 426-426. O'Reilly, R. C. (1998). Six principles for biologically based computational models of cortical cognition. Trends in Cognitive Sciences, 2(11), 455-462. Oldfield, R. C. (1971). The assessment and analysis of handedness: the Edinburgh Inventory. Neuropsychologia, 9, 97-113. Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607-609. Otten, L. J., Rugg, M. D., & Doyle, M. C. (1993). Modulation of Event-Related Potentials by Word Repetition - the Role of Visual Selective Attention. Psychophysiology, 30(6), 559-571. Page, M. (2000). Connectionist modelling in psychology: a localist manifesto. Behav Brain Sci, 23(4), 443-467; discussion 467-512. Palm, G. (1982). Neural assemblies. Berlin: Springer. Palm, G. (1987). Associative memory and threshold control in neural networks. In J. L. Casti & A. Karlqvist (Eds.), Real brains, artificial minds (pp. 165-179). New York: North-Holland. Pandya, D. N., & Yeterian, E. H. (1985). Architecture and connections of cortical association areas. In A. Peters & E. G. Jones (Eds.), Cerebral cortex. Vol. 4. Association and auditory cortices (pp. 3- 61). London: Plenum Press. Parker, G. J., Luzzi, S., Alexander, D. C., Wheeler-Kingshott, C. A., Ciccarelli, O., & Lambon Ralph, M. A. (2005). Lateralization of ventral and dorsal auditory-language pathways in the human brain. Neuroimage, 24(3), 656-666. Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience, 8(12), 976-987. Penke, M., & Westermann, G. (2006). Broca's area and inflectional morphology: evidence from broca's aphasia and computer modeling. Cortex, 42(4), 563-576. Petkov, C. I., Kayser, C., Augath, M., & Logothetis, N. K. (2006). Functional imaging reveals numerous fields in the monkey auditory cortex. PLoS Biol, 4(7), e215. Petrides, M., & Pandya, D. N. (2002). Comparative cytoarchitectonic analysis of the human and the macaque ventrolateral prefrontal cortex and corticocortical connection patterns in the monkey. Eur J Neurosci, 16(2), 291-310. 115 Pettigrew, C. M., Murdoch, B. E., Ponton, C. W., Finnigan, S., Alku, P., Kei, J., et al. (2004). Automatic auditory processing of english words as indexed by the mismatch negativity, using a multiple deviant paradigm. Ear Hear, 25(3), 284-301. Phaf, R. H., Vanderheijden, A. H. C., & Hudson, P. T. W. (1990). Slam - a Connectionist Model for Attention in Visual Selection Tasks. Cognitive Psychology, 22(3), 273-341. Plaut, D. C., & Gonnerman, L. M. (2000). Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing? Language and Cognitive Processes, 15(4- 5), 445-485. Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding normal and impaired word reading: computational principles in quasi-regular domains. Psychological Review, 103, 56-115. Plunkett, K., & Marchman, V. (1993). From rote learning to system building: acquiring verb morphology in children and connectionist nets. Cognition, 48(1), 21-69. Press, W. H., Teukolski, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical Recipes in C: the art of scientific computing. (2nd ed.): Cambridge University Press. Pulvermüller, F. (1992). Constituents of a neurological theory of language. Concepts in Neuroscience, 3, 157-200. Pulvermüller, F. (1999). Words in the brain's language. Behavioral and Brain Sciences, 22, 253-336. Pulvermüller, F. (2001). Brain reflections of words and their meaning. Trends in Cognitive Sciences, 5(12), 517-524. Pulvermüller, F. (2003). The neuroscience of language. Cambridge: Cambridge University Press. Pulvermüller, F. (2007). Brain processes of word recognition as revealed by neurophysiological imaging. In G. Gaskell (Ed.), Oxford Handbook of Psycholinguistics (pp. 119-140). Oxford: Oxford University Press. Pulvermüller, F., Eulitz, C., Pantev, C., Mohr, B., Feige, B., Lutzenberger, W., et al. (1996). High- frequency cortical responses reflect lexical processing: An MEG study. Electroencephalogr Clin Neurophysiol, 98(1), 76-85. Pulvermüller, F., Kujala, T., Shtyrov, Y., Simola, J., Tiitinen, H., Alku, P., et al. (2001). Memory traces for words as revealed by the mismatch negativity. Neuroimage, 14(3), 607-616. Pulvermüller, F., & Preissl, H. (1991). A cell assembly model of language. Network: Computation in Neural Systems, 2, 455-468. Pulvermüller, F., & Shtyrov, Y. (2006). Language outside the focus of attention: the mismatch negativity as a tool for studying higher cognitive processes. Progress in Neurobiology, 79(1), 49- 71. Pulvermüller, F., Shtyrov, Y., Hasting, A. S., & Carlyon, R. P. (2008). Syntax as a reflex: Neurophysiological evidence for early automaticity of grammatical processing. Brain Lang, 104(3), 244-253. Pulvermüller, F., Shtyrov, Y., & Ilmoniemi, R. J. (2003). Spatio-temporal patterns of neural language processing: an MEG study using Minimum-Norm Current Estimates. Neuroimage, 20, 1020-1025. Pulvermüller, F., Shtyrov, Y., & Ilmoniemi, R. J. (2005). Brain signatures of meaning access in action word recognition. Journal of Cognitive Neuroscience, 17(6), 884-892. 116 Pulvermüller, F., Shtyrov, Y., Kujala, T., & Näätänen, R. (2004). Word-specific cortical activity as revealed by the mismatch negativity. Psychophysiology, 41(1), 106-112. Rabinovich, M. I., Huerta, R., Volkovskii, A., Abarbanel, H. D., Stopfer, M., & Laurent, G. (2000). Dynamical coding of sensory information with competitive networks. Journal of Physiololgy, Paris, 94(5-6), 465-471. Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of "what" and "where" in auditory cortex. Proc Natl Acad Sci U S A, 97(22), 11800-11806. Raz, A., & Buhle, J. (2006). Typologies of attentional networks. Nat Rev Neurosci, 7(5), 367-379. Reddy, L., & Kanwisher, N. (2006). Coding of visual objects in the ventral stream. Current Opinion in Neurobiology, 16(4), 408-414. Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat Neurosci, 2(11), 1019-1025. Rilling, J. K., Glasser, M. F., Preuss, T. M., Ma, X., Zhao, T., Hu, X., et al. (2008). The evolution of the arcuate fasciculus revealed with comparative DTI. Nat Neurosci, 11(4), 426-428. Rioult-Pedotti, M. S., Friedman, D., & Donoghue, J. P. (2000). Learning-induced LTP in neocortex. Science, 290(5491), 533-536. Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews. Neuroscience, 2(9), 661-670. Rockel, A. J., Hiorns, R. W., & Powell, T. P. (1980). The basic uniformity in structure of the neocortex. Brain, 103(2), 221-244. Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L., Hodges, J. R., et al. (2004). Structure and deterioration of semantic memory: a neuropsychological and computational investigation. Psychol Rev, 111(1), 205-235. Rogers, T. T., & McClelland, J. L. (1994). Semantic cognition. Cambridge, MA: MIT Press. Rolls, E. T., & Deco, G. (2002). Computational Neuroscience of Vision: Oxford University Press. Rolls, E. T., & Tovee, M. J. (1995). Sparseness of the Neuronal Representation of Stimuli in the Primate Temporal Visual-Cortex. Journal of Neurophysiology, 73(2), 713-726. Romanski, L. M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P. S., & Rauschecker, J. P. (1999). Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat Neurosci, 2(12), 1131-1136. Rumelhart, D. E., Hinton, G., & Williams, R. (1986). Learning internal representations by backpropagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: explorations in the mircrostructure of cognition. Cambridge, MA: MIT Press. Sabri, M., Binder, J. R., Desai, R., Medler, D. A., Leitl, M. D., & Liebenthal, E. (2008). Attentional and linguistic interactions in speech perception. Neuroimage, 39(3), 1444-1456. Sato, T. (1989). Interactions of Visual-Stimuli in the Receptive-Fields of Inferior Temporal Neurons in Awake Macaques. Experimental Brain Research, 77(1), 23-30. Schneider, W. X. (1995). VAM: a neuro-cognitive model for visual attention control of segmentation, object recognition and space-based motor actions. Visual Cognition, 2, 331-376. 117 Schröger, E., Näätänen, R., & Paavilainen, P. (1992). Event-related potentials reveal how non-attended complex sound patterns are represented by the human brain. Neuroscience Letters, 146, 183-186. Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. (2000). Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123 Pt 12, 2400-2406. Segalowitz, S. J., & Zheng, X. (2008). An ERP study of category priming: Evidence of early lexical semantic access. Biol Psychol. Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychol Rev, 96(4), 523-568. Sejnowski, T. J. (1977). Storing Covariance with Nonlinearly Interacting Neurons. Journal of Mathematical Biology, 4(4), 303-321. Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145-168. Sereno, S. C., Rayner, K., & Posner, M. I. (1998). Establishing a time line for word recognition: evidence from eye movements and event-related potentials. NeuroReport, 13, 2195-2200. Shastri, L. (2001). Biological Grounding of Recruitment Learning and Vicinal Algorithms in Long- Term Potentiation. In J. Austin, S. Wermter & D. Willshaw (Eds.), Emergent neural computational architectures based on neuroscience, Lecture Notes in Computer Science (Vol. 2036, pp. 348-367). Berlin: Springer-Verlag. Shastri, L., & Ajjanagadde, V. (1993). From simple associations to systematic reasoning: a connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 16, 417-494. Shtyrov, Y., Pihko, E., & Pulvermüller, F. (2005). Determinants of dominance: Is language laterality explained by physical or linguistic features of speech? Neuroimage, 27(1), 37-47. Shtyrov, Y., & Pulvermüller, F. (2002). Neurophysiological evidence of memory traces for words in the human brain. Neuroreport, 13, 521-525. Somogyi, P., Cowey, A., Halasz, N., & Freund, T. F. (1981). Vertical organization of neurones accumulating 3H-GABA in visual cortex of rhesus monkey. Nature, 294(5843), 761-763. Song, S., Miller, K. D., & Abbott, L. F. (2000). Competitive Hebbian learning through spike-timing- dependent synaptic plasticity. Nature Neuroscience, 3(9), 919-926. Sperling, G. A. (1960). The information available in brief visual presentation. Psychological Monographs, 74, 498(11). Stanton, P. K., & Sejnowski, T. J. (1989). Associative long-term depression in the hippocampus induced by hebbian covariance. Nature, 339(6221), 215-218. Stevens, C. F. (1989). How Cortical Interconnectedness Varies with Network Size. Neural Computation, 1(4), 473-479. Suffczynski, P., Kalitzin, S., Pfurtscheller, G., & Lopes da Silva, F. H. (2001). Computational model of thalamo-cortical networks: dynamical control of alpha rhythms in relation to focal attention. Int J Psychophysiol, 43(1), 25-40. Szabo, M., Almeida, R., Deco, G., & Stetter, M. (2004). Cooperation and biased competition model can explain attentional filtering in the prefrontal cortex. Eur J Neurosci, 19(7), 1969-1977. 118 Szymanski, M. D., Yund, E. W., & Woods, D. L. (1999). Phonemes, intensity and attention: differential effects on the mismatch negativity (MMN). J Acoust Soc Am, 106(6), 3492-3505. Tagamets, M. A., & Horwitz, B. (1998). Integrating electrophysiological and anatomical experimental data to create a large-scale model that simulates a delayed match-to-sample human brain imaging study. Cerebral Cortex, 8(4), 310-320. Taulu, S., & Kajola, M. (2005). Presentation of electromagnetic multichannel data: The signal space separation method. Journal of Applied Physics, 97(12), 124905. Taulu, S., Kajola, M., & Simola, J. (2004). Suppression of interference and artifacts by the Signal Space Separation Method. Brain Topogr, 16(4), 269-275. Tsumoto, T. (1992). Long-term potentiation and long-term depression in the neocortex. Progress in Neurobiology, 39(2), 209-228. Turrigiano, G. G., Leslie, K. R., Desai, N. S., Rutherford, L. C., & Nelson, S. B. (1998). Activity- dependent scaling of quantal amplitude in neocortical neurons. Nature, 391(6670), 892-896. Turrigiano, G. G., & Nelson, S. B. (2004). Homeostatic plasticity in the developing nervous system. Nat Rev Neurosci, 5(2), 97-107. Uutela, K., Hämäläinen, M., & Somersalo, E. (1999). Visualization of magnetoencephalographic data using minimum current estimates. Neuroimage, 10(2), 173-180. Walley, R. E., & Weiden, T. D. (1973). Lateral Inhibition and Cognitive Masking - Neuropsychological Theory of Attention. Psychological Review, 80(4), 284-302. Watkins, K. E., & Paus, T. (2004). Modulation of motor excitability during speech perception: the role of Broca's area. J Cogn Neurosci, 16(6), 978-987. Watkins, K. E., Strafella, A. P., & Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41(8), 989-994. Wendling, F., Bellanger, J. J., Bartolomei, F., & Chauvel, P. (2000). Relevance of nonlinear lumped- parameter models in the analysis of depth-EEG epileptic signals. Biological Cybernetics, 83(4), 367-378. Wennekers, T., & Palm, G. (2007). Modelling generic cognitive functions with operational Hebbian cell assemblies. In M. L. Weiss (Ed.), Neural Network Research Horizons (pp. 225-294): Nova Science. Wennekers, T., Sommer, F., & Aertsen, A. (2003). Theories in bioscience - Editorial: Cell assemblies. Theory in Biosciences, 122(1), 1-4. Westermann, G., & Miranda, E. R. (2004). A new model of sensorimotor coupling in the development of speech. Brain and Language, 89(2), 393-400. Wickens, J. R. (1993). A theory of the striatum. Oxford: Pergamon Press. Willshaw, D. J., Buneman, O. P., & Longuet-Higgins, H. C. (1969). Non-holographic associative memory. Nature, 222(197), 960-962. Wilson, H. R., & Cowan, J. D. (1973). A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik, 13, 35-80. Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nat Neurosci, 7(7), 701-702. 119 Woldorff, M. G., Hillyard, S. A., Gallen, C. C., Hampson, S. R., & Bloom, F. E. (1998). Magnetoencephalographic recordings demonstrate attentional modulation of mismatch-related neural activity in human auditory cortex. Psychophysiology, 35, 283-292. Wood, N., & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited - How Frequent Are Attention Shifts to Ones Name in an Irrelevant Auditory Channel. Journal of Experimental Psychology-Learning Memory and Cognition, 21(1), 255-260. Woods, D. L., Alho, K., & Algazi, A. (1992). Intermodal selective attention. I. Effects on event-related potentials to lateralized auditory and visual stimuli. Electroencephalogr Clin Neurophysiol, 82(5), 341-355. Young, M. P. (2000). The architecture of visual cortex and inferential processes in vision. Spat Vis, 13(2-3), 137-146. Yuille, A. L., & Geiger, D. (2003). Winner-Take-All Mechanisms. In M. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks (pp. 1056-1060). Cambridge, MA: MIT Press. Zatorre, R. J., Meyer, E., Gjedde, A., & Evans, A. C. (1996). PET studies of phonetic processing of speech: review, replication, and reanalysis. Cereb Cortex, 6(1), 21-30. Zipser, D., Kehoe, B., Littlewort, G., & Fuster, J. (1993). A spiking network model of short-term active memory. Journal of Neuroscience, 13(8), 3406-3420.