Dimensions of cognition, behaviour, and mental health in struggling learners: A spotlight on girls

Abstract
Background
Fewer girls than boys are identified as struggling at school for suspected problems in attention, learning and/or memory. The objectives of this study were to: i) identify dimensions of cognition, behaviour and mental health in a unique transdiagnostic sample of struggling learners; ii) test whether these constructs were equivalent for boys and girls; and iii) compare their performance across the dimensions.
Methods
805 school-aged children, identified by practitioners as experiencing problems in cognition and learning, completed cognitive assessments, and parents/carers rated their behaviour and mental health problems.
Results
Three cognitive [Executive, Speed, Phonological], three behavioural [Cognitive Control, Emotion Regulation, Behaviour Regulation], and two mental health [Internalising, Externalising] dimensions distinguished the sample. Dimensions were structurally comparable between boys and girls, but differences in severity were present: girls had greater impairments on performance-based measures of cognition; boys were rated as having more severe externalising problems.
Conclusions
Gender biases to stereotypically male behaviours are prevalent among practitioners, even when the focus is on identifying cognitive and learning difficulties. This underscores the need to include cognitive and female-representative criteria in diagnostic systems to identify girls whose difficulties could easily go undetected.

mixed sample of struggling learners with and without diagnosed difficulties who were identified by health and education practitioners as having problems in attention, learning and/or memory.

WHY MORE BOYS THAN GIRLS?
One possibility for the high male:female sex ratio might relate to a reliance on diagnostic systems that have emerged from descriptions and categorisations of overt behaviours (discussed in Kreiser & White, 2014; Mowlem, Agnew-Blais, et al., 2019). Criteria related to overt and externalising behaviours, such as hyperactivity for ADHD, are likely recognised more easily by practitioners and tolerated less well by parents and teachers (Gaub & Carlson, 1997). Add to that socially constructed gender-biased or stereotypical views of boys as being disruptive, and the problem becomes apparent: overt behaviours are more likely to raise concern and are expected to be more prevalent in boys. Indeed, studies have shown that boys are rated as more disruptive (e.g., Sciutto et al., 2004).
Another explanation is that symptom manifestations differ between boys and girls. The way in which girls express their difficulties may preclude diagnoses, or make their challenges more difficult to detect. Girls with ADHD often present with predominantly inattentive and internalising symptoms (Biederman et al., 2002; Levy et al., 2005; Rucklidge & Tannock, 2001). In contrast, boys typically present with hyperactive/impulsive symptoms and externalising behaviours (Abikoff et al., 2002; Quinn, 2008). Similarly, some autistic girls use behavioural camouflaging strategies, appearing less autistic in social interactions (Dean et al., 2017; Hull et al., 2020) and more able from others' perspectives (Hiller et al., 2014), meaning they are less likely to receive a diagnosis. The same is also true in schools, where boys are more likely to manifest externalising behaviours and receive a referral for special education assessment (Dhuey & Lipscomb, 2010).

SEX DIFFERENCES IN AT-RISK DEVELOPMENTAL POPULATIONS
Extant research investigating sex differences in cognition, behaviour and mental health in developmental populations has produced mixed results (e.g., Duvall et al., 2020; Gershon, 2002; Gur & Gur, 2016; Gur & Gur, 2017; Mandy et al., 2012; Mayes et al., 2020; Rucklidge, 2008). While some meta-analyses report that girls show more cognitive impairments than boys (see Gershon, 2002; Gur & Gur, 2016; Gur & Gur, 2017; Hull et al., 2017), other large-scale studies report no differences between boys and girls (e.g., Duvall et al., 2020), or alternatively report strengths in language and memory for girls, and in spatial tasks and speed for boys (Gur & Gur, 2016). Similarly, while externalising behaviours such as conduct problems, hyperactivity and aggression are reported to be more common in boys (e.g., Biederman et al., 2002; Mandy et al., 2012; Willcutt & Pennington, 2000), and internalising problems such as anxiety and depression more so in girls (Gershon, 2002; Hull et al., 2017; Kreiser & White, 2014), this is not always the case (e.g., Lecavalier, 2006; Mayes et al., 2020; Murphy et al., 2009; Rucklidge, 2008). In their review of sex differences in ADHD, for example, Rucklidge (2008) noted that, although aggression and externalising behaviours were generally more common in boys than in girls, these findings were not always consistent across studies. Mayes et al. (2020) made similar observations for autistic children; boys and girls did not differ in externalising and internalising problems. One striking and unifying feature is that much of this literature is dominated by the study of clinical samples of diagnosed children (e.g., discussed in Kreiser & White, 2014; Mowlem, Agnew-Blais, et al., 2019; Mowlem, Rosenqvist, et al., 2019).
This likely biases our understanding towards a male phenotype, both because clinical samples typically include a greater number of boys, and because girls included in such samples are likely to present with overt behaviours more similar to those typically expressed in boys (Mowlem, Rosenqvist, et al., 2019). Recruiting selective samples based on the presence of a particular diagnosis means that we do not understand sex differences either in children with milder needs, or in those with more complex and co-occurring needs who are often excluded from studies on the basis that their comorbid problems are considered a confound.
Developmental problems are increasingly studied using transdiagnostic approaches. These aim to identify dimensions of difficulty that occur across individuals irrespective of diagnostic status (Cuthbert & Insel, 2013). The dimensions studied focus on characteristics and mechanisms that may not align with any conventional diagnostic category (Astle et al., 2021). A spectrum of study designs can yield transdiagnostic insights, including those that recruit via functional definition. These relax recruitment criteria to sample broader populations of individuals with additional needs who would not necessarily be represented in diagnostic-based approaches: they replace diagnostic criteria-based selection with sampling based on functionally defined needs (Astle et al., 2021). This approach offers an important alternative way to understand sex differences in struggling learners.
Rather than focussing on children with diagnoses informed by diagnostic criteria, which may be biased towards stereotypically male behaviours, transdiagnostic sampling based on functional needs provides the opportunity to recruit children with a broader range of developmental and learning difficulties (e.g., Casey et al., 2014): it provides a way to explore sex differences in the common struggling learner, who may not conform in presentation to standard "male-biased" diagnostic criteria, as well as children who have existing diagnoses.

Key points
• Our understanding of sex differences in at-risk developmental populations is dominated by studies of clinical samples, likely biasing knowledge towards a male phenotype
• Male and female phenotypes were characterised in a large transdiagnostic sample of children identified by practitioners as experiencing problems in cognition and learning, irrespective of diagnostic status
• Practitioners recognised more boys than girls as struggling
• Dimensions of cognition, behaviour and mental health were structurally invariant between boys and girls, but performance-based cognitive problems were more severe in girls, and behavioural difficulties and externalising problems greater in boys
• These findings illustrate the profile of struggling girls and highlight systemic and implicit biases in the fields of healthcare and education that need to be addressed to provide appropriate support

THE CURRENT STUDY
The current study adopts a transdiagnostic, functional needs-based, approach to characterise male and female phenotypes in a large mixed sample of children described as struggling in school. The goal was to recruit a highly heterogeneous sample of children varying in both the severity and nature of their learning-related problems, which was not biased towards classically "male" behaviours. This could not be achieved using the recruitment methods usually applied in the sex differences literature: depending exclusively on children with recognized disorders through specialist clinics would exclude children with difficulties that are not captured by diagnostic rubrics, which are likely to be girls who are struggling, but who do not present with overt behavioural problems.
The sample included children who were identified as experiencing problems in attention, learning and/or memory by education and health professionals. It included children with relatively mild problems judged to be compromising their academic progress, who would likely not meet diagnostic thresholds, in addition to many children whose more marked problems would: some children had a single diagnosis, others had multiple diagnoses, but the majority were undiagnosed despite coming to the attention of a professional for experiencing difficulties that were affecting their school progress. By adopting a transdiagnostic perspective, we were therefore able to include children who are not currently represented in the literature on sex differences, specifically those with milder problems who are unlikely to meet diagnostic thresholds, those with presentations that did not fit the "male-biased" behaviours defined by diagnostic criteria, and those with complex and co-occurring problems. This enabled us to: i) test whether recruitment based on functional needs rather than diagnostic status replicated the high boy:girl ratio documented in studies using diagnosis-based recruitment, and ii) explore whether there are sex differences in the types and severity of problems experienced by this broader population of children who are struggling.
Consistent with data-driven approaches adopted across transdiagnostic studies (e.g., Holmes et al., 2021; Kotov et al., 2017; Mercier et al., 2018; Reininghaus et al., 2019; Sokolova et al., 2017), a latent variable approach was used to identify dimensions of difficulty in the whole sample, and then to test whether these dimensions differed in structure and severity between boys and girls. Identifying dimensions side-steps debates about which of two different measures sharing common variance represents a core deficit or difference, and instead identifies the major sources of variance across all measures in a dataset. In this case, these are the broad dimensions of cognition, mental health and behaviour that may or may not differ between boys and girls.
We broadly classified multiple individual tasks and behaviour ratings into three domains: cognition, behaviour, and mental health.
Performance-based tasks capturing the processing efficiency of cognitive abilities in structured conditions were used to index function in the cognitive domain. The tasks selected for inclusion were those that were administered to the whole sample, and which were included in a study that previously identified the cognitive dimensions differentiating performance in this sample (see Holmes et al., 2020). This earlier study identified three cognitive dimensions (executive function, processing speed and phonological processing) using age-normed scores (Holmes et al., 2020). Here we included the same tasks and participants but used age-regressed raw scores in place of the age-normed scores used by Holmes et al. (2020), because some of the measures factor sex into their age standardization (Mayes et al., 2020). In the interests of replication, we adopted the same analytic approach as Holmes et al. (2020) to identify cognitive dimensions using these scores.
Despite some content overlap, parent ratings of behaviour were categorised a priori as either behaviour or mental health (see Table S1), based on both common uses of the measures in research (Alloway et al., 2009;Fink et al., 2015;Patalay et al., 2015) and their use in clinical and educational practice. Scales used widely to measure externalising and internalising symptoms were classified as mental health, while those capturing symptoms associated with cognitive or neurodevelopmental difficulties were classified as behaviour. The parent ratings of behaviour included observations of the children's cognitive behaviours in everyday settings. We use the terminology of our test instruments to describe the observed measures of cognitive function throughout, meaning the same terms (e.g., executive functions or working memory) are used to refer to both objectively measured cognitive abilities that we have classed as "cognitive" and to subjective ratings of cognitive behaviours that we have classed as "behaviour". Despite this overlap in terminology, we conceptualise objective cognitive task performance and everyday cognitive behaviours as separate constructs, consistent with an extensive literature suggesting they provide non-overlapping information, and that functioning in the two domains makes independent contributions to clinical and academic problems (Soto et al., 2020;Toplak et al., 2013).
Existing evidence for differences in the manifestation of cognitive, behavioural and mental health problems between boys and girls with neurodevelopmental problems is mixed. This, combined with the unique nature of our sample, motivated our choice to conduct all analyses in a data-driven and exploratory fashion. We, therefore, had no specific predictions about whether the factor compositions would be similar for male and female struggling learners, or whether the severity of impairments would differ between boys and girls on specific dimensions.

Procedure and measures
The cognitive, behavioural and mental health data from the Centre for Attention, Learning and Memory (CALM) cohort were used (see Table S1 for a description of the tasks). Recruitment details and testing procedures are described in the study protocol (Holmes et al., 2019). Ethical approval was granted by the National Health Service (REC: 13/EE/0157). Parents/caregivers provided written consent and child verbal assent was obtained.
Raw scores were used in all analyses as some measures factor sex into their age standardization (Mayes et al., 2020). To control for age, raw scores were regressed on age and the residuals were used.
Higher raw scores were associated with better performance for the cognitive tasks, but greater severity for the behavioural and mental health questionnaires. Residuals for Rapid Naming, Simple Reaction Time (SRT; TEA-Ch2; Manly et al., 2016) and Prosocial Behaviour (SDQ; Goodman, 1997) were reverse coded to streamline the interpretation of respective cognitive and mental health measures.
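The age adjustment and reverse coding described above can be sketched as follows. This is a minimal illustration only, assuming a simple linear regression of each raw score on age with complete data; the function names (`residualise_on_age`, `reverse_code`) and the example values are hypothetical and are not taken from the study.

```python
def residualise_on_age(scores, ages):
    """Return residuals from an ordinary least-squares regression of scores on age."""
    n = len(scores)
    if n != len(ages) or n < 2:
        raise ValueError("scores and ages must have equal length of at least 2")
    mean_age = sum(ages) / n
    mean_score = sum(scores) / n
    # Sum of squares of age and cross-products with the score
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_age
    # Residual = observed score minus the score predicted from age
    return [s - (intercept + slope * a) for a, s in zip(ages, scores)]


def reverse_code(residuals):
    """Flip the sign so that higher values share one direction of interpretation."""
    return [-r for r in residuals]


# Hypothetical example: reaction times (ms), where lower raw scores are better
ages = [6.0, 8.0, 10.0, 12.0]
reaction_times = [900.0, 820.0, 700.0, 650.0]
rt_resid = residualise_on_age(reaction_times, ages)
rt_reversed = reverse_code(rt_resid)
```

After reverse coding, a positive value on the hypothetical reaction-time measure indicates faster-than-expected responding for the child's age, matching the direction of the other cognitive scores.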
Missing data were handled using full information maximum likelihood estimation in all models (Rosseel, 2012).
Performance was close to age-appropriate levels for Mr X and Cancellation (see Table 1). All other cognitive scores were approximately one standard deviation below age-normed population means.
Behavioural problems were elevated for the whole sample (see Table S3), with the exception of Organisational problems (BRIEF; Gioia et al., 2000). The majority of mental health ratings on the RCADS-P (Chorpita et al., 2000) were elevated, but still within age-typical and subclinical bounds (i.e., RCADS-P T score less than 65).

Analysis plan
Analyses were conducted in four steps: exploratory factor analysis (EFA), confirmatory factor analysis (CFA), multigroup CFA with measurement invariance (Steenkamp & Baumgartner, 1998; van de Schoot et al., 2013), and comparisons of latent means (intercepts). A detailed description of this approach is provided in the Supporting Information. Parallel analysis was used to determine the maximum number of factors to extract in EFAs for the cognitive, behavioural and mental health data. Parallel analysis involves simulations that create random datasets with properties similar to the true data: estimated numbers of factors are extracted and compared to a permuted baseline, and extraction is stopped when eigenvalues fall within the 95% confidence interval of eigenvalues from the simulated data, revealing the optimal number of factors to extract from the true data. For the EFA, factor structures were considered interpretable if they provided a good fit to the data and there was a minimum of two primary loadings per latent construct (note that two loadings are considered acceptable with large sample sizes; Costello & Osborne, 2005). The labelling of the factors reflected the constellation of the highest loading variables. For the cognitive domain (performance-based tasks), the factor structure and labelling were based on a previous study using the same cohort data and cognitive tasks (see Holmes et al., 2020). All analyses were conducted in R version 4.0.3 using the psych (2.0.12; Revelle, 2020), lavaan (0.6-7; Rosseel, 2012) and semTools (0.5-4; Jorgensen et al., 2020) packages.
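The logic of parallel analysis can be illustrated with a short sketch. The analyses reported here were run in R; the Python code below is not the authors' implementation but a simplified version of Horn's procedure, assuming eigenvalues of the correlation matrix, normally distributed random data, and a 95th-percentile cut-off. The function name `parallel_analysis` and its parameters are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Suggest how many factors to retain by comparing eigenvalues to random data.

    data: array of shape (n_observations, n_variables) with no missing values.
    Returns (n_factors, observed_eigenvalues, simulated_thresholds).
    """
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape

    # Eigenvalues of the observed correlation matrix, largest first
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues from random datasets with the same dimensions
    simulated = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        noise = rng.standard_normal((n_obs, n_vars))
        simulated[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    thresholds = np.percentile(simulated, percentile, axis=0)

    # Stop at the first eigenvalue that does not exceed its random-data threshold
    n_factors = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        n_factors += 1
    return n_factors, observed, thresholds
```

The returned count is best read as an upper bound on the number of factors to explore, in line with the use of parallel analysis above to set the maximum number of factors before interpretability is assessed.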

Descriptive statistics
Descriptive statistics for boys and girls are presented in Table 1.
Additional descriptive statistics for the whole sample are provided in Table S3. Correlations between the measures are provided in Table S4-S6.
Considerably more boys than girls were referred: 552 boys and 253 girls. Comparisons between boys and girls revealed girls performed more poorly than boys on the majority of cognitive measures (see Table 1). Boys were rated higher than girls on most of the behaviour rating scales, except WM (BRIEF; p = 0.05), Organisation  Table 1). Boys also had elevated ratings on mental health subscales measuring conduct problems, hyperactivity, and prosocial behaviours (all SDQ; all ps < 0.05; see Table 1).
Table S7). Despite the good fit, this model was difficult to interpret because the fourth factor representing visual STM had only one indicator and the factor reflecting phonological processing and attention had loadings from measures with little in common. For these reasons, a three-factor model was tested.

Dimensions of cognition, behaviour and mental health
Fit statistics for a three-factor solution indicated that this model was a good fit, χ2(42) = 105.5, p < 0.001, RMSEA = 0.044 (90% confidence interval [CI] = 0.033, 0.054), CFI = 0.947, RMSR = 0.030 (see Table S7). The first factor was most strongly associated with measures that draw on executive resources (Dot Matrix, Backward Digit Recall, Mr X and Matrix Reasoning). The second factor was linked mostly to speeded tasks or tasks that were completed under time constraints (SRT, Rapid Naming, Cancellation, and Vigil). This factor was also linked to tasks that were not speeded (Alliteration, Delayed Recall, and Following Instructions), but that might be performed better if they are performed quickly (e.g., due to less forgetting time). The third factor was associated with measures

Behaviour
Parallel and EFA analyses identified a three-factor solution as an acceptable fit to the behavioural data (see Table S8), χ2  Table S10).
Group differences in latent means were explored by comparing the freely estimated and constrained models (see Figure 1). The

Behaviour
The behavioural model met conditions for configural and metric invariance but not for scalar invariance (see Table S10). Modification indices for subtest mean scores revealed discrepancies for Organisation, Planning and WM. Allowing these intercepts to vary freely between groups improved the model fit and partial scalar invariance was achieved, Δχ2(6) = 11.69, p = 0.07 (see Figure 2).
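The chi-square difference test behind the partial scalar invariance result can be checked with a few lines of code. This sketch uses the closed-form upper-tail probability of the chi-square distribution, which holds only for even degrees of freedom; the function name `chi2_sf_even_df` is hypothetical, and the 0.05 threshold is the conventional cut-off, assumed here rather than stated in the text.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x < 0:
        raise ValueError("the chi-square statistic must be non-negative")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Values reported for the partially constrained behavioural model
p_value = chi2_sf_even_df(11.69, 6)

# A non-significant difference means the added constraints do not worsen fit
invariance_retained = p_value > 0.05
```

With the reported values the probability is approximately 0.069, consistent with the p = 0.07 given above, so the constrained model does not fit significantly worse than the less constrained one.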

Mental health
For mental health, there was no significant deterioration of model fit with increasing constraints. The conditions of configural, metric and scalar invariance were met indicating that the overall structure, loadings and intercepts were similar across groups (see Table S10 and Figure 3).
Comparisons between boys and girls revealed that constraining intercepts for the Externalising factor significantly degraded the fit,

DISCUSSION
This is the first study, to our knowledge, to adopt a transdiagnostic dimensional approach to understanding sex differences in children with developmental difficulties. The key findings were that more boys than girls were referred, and while dimensions of cognition, behaviour and mental health were invariant across boys and girls, cognitive problems were more severe in girls and behavioural difficulties and externalising problems greater in boys.

Prevalence of boys and girls
Prevalence estimates indicating a high boy:girl ratio for developmental difficulties are drawn predominantly from studies of clinical populations (e.g., autistic children or those with ADHD, discussed in Kreiser & White, 2014; Mowlem, Agnew-Blais, et al., 2019; Mowlem, Rosenqvist, et al., 2019). Using a novel transdiagnostic sampling frame, which was based on functional need and aimed to represent the full spectrum of children with learning-related difficulties, including those with problems that are less likely to fit with the behaviours described by diagnostic criteria, we also found a high boy:girl ratio, with roughly twice as many boys as girls referred. This might reflect implicit gender biases and stereotyping (e.g., discussed in Anderson, 1997), or different manifestations, drivers, and expressions of difficulties in boys and girls (e.g., Dhuey & Lipscomb, 2010; Hiller et al., 2014), but we suspect it is also related to practitioners using heuristics for diagnostic criteria that emphasize overt behaviours.
Referrers to this study identified children based on observations of cognitive and learning problems. Despite this, health and education practitioners referred more children with behavioural difficulties than cognitive problems, suggesting they were spotting and raising concern for overt behaviours more easily than cognitive problems. This bias towards male-focussed diagnostic criteria makes it less likely for a girl to be diagnosed with conditions such as ASD (Lai et al., 2015), and here we see it extends to the broader population of children struggling at school. Moving forward it will be important to decrease these biases towards diagnostic criteria, and increase knowledge of the female phenotype among professionals involved in referrals, to ensure we meet the needs of girls who are struggling.

Differences between boys and girls
Three broad dimensions underpinning performance on cognitive tasks were associated with measures that were largely spatial or Processing). These factors correspond to those previously identified in a study investigating the cognitive dimensions of learning in the same children from the CALM cohort (Holmes et al., 2020), which used age-normed scores that factor sex into their standardisation. Using age-regressed raw scores so as not to mask sex differences, this study shows again that the key constructs that distinguish cognitive abilities in typically developing children and adults also differentiate cognitive test performance in struggling learners.
The cognitive dimensions were invariant across boys and girls, indicating that the overall latent structure of cognitive skills does not differ between sexes. There were no differences in scores on the phonological or speed dimensions, but girls were more impaired on the executive function dimension. These data support the notion that girls must show greater cognitive deficits for educational or health practitioners to notice their struggles (e.g., Dworzynski et al., 2012; Gaub & Carlson, 1997). They also suggest the biggest driver of problems for girls in our sample is performance-based executive function difficulties. Executive functions are associated with learning outcomes in typical and neurodiverse groups (e.g., Peng et al., 2018; Swanson & Ashbaker, 2000). However, there is evidence that girls may draw more on these resources than boys: girls take a more effortful, mastery-based approach to learning that draws on general higher-order cognitive skills, while boys draw more on domain-specific knowledge and skills during learning (Brunner et al., 2008; Kenney-Benson et al., 2006). If this is the case, then impairments in executive function might be expected to have a more significant impact on girls' school progress, and this might explain why the girls referred to the CALM cohort were characterised by more severe executive function problems than the boys.
Castellanos et al., 2006). According to these models, ADHD symptoms arise as a consequence of impairments in two neurobiological pathways: one serves cool cognitive functions such as working memory, planning and switching, and the other hot executive functions that contribute to hyperactivity/impulsivity and emotional-reward dysregulation (Zelazo & Müller, 2002).
The third dimension, Emotional Regulation, resembles part of a broader self-regulation concept, which is also linked to increased risks for ADHD (Walcott & Landau, 2004) and other psychopathologies (e.g., McLaughlin et al., 2011; Röll et al., 2012). For mental health, two dimensions emerged, internalising and externalising. These align with models of child psychopathology (Achenbach, 1966; McElroy et al., 2018; Patalay et al., 2015).
Dimensions of behaviour and mental health were the same for boys and girls, but the severity of their impairments differed on specific dimensions. There were no sex differences on the internalising symptoms dimension, consistent both with other recent findings from the same cohort  and with evidence from other developmental populations (e.g., Mayes et al., 2020). Symptoms on this dimension were elevated for both boys and girls. Internalising problems have been linked to stressful and negative life events (Kim et al., 2003;March-Llanes et al., 2017), which are likely common among our sample, and may explain why symptoms were elevated for both sexes. These elevated levels may explain why there were no sex differences. Overall, girls had fewer externalising problems and fewer difficulties across all three dimensions of behaviour than boys. This could mean that externalising symptoms and overt behaviours commonly associated with ADHD are genuinely more prevalent and manifest in boys (e.g., Abikoff et al., 2002). Alternatively, elevated problem behaviours in boys could reflect socially constructed gender-biased or stereotypical views of boys as being disruptive, and the application and use of diagnostic criteria that emphasise overt behaviours (Hiller et al., 2014;Mowlem, Agnew-Blais, et al., 2019).

Limitations
While there are many strengths to this study, several limitations need to be acknowledged. Our novel sampling approach broadens the study of sex differences in neurodevelopmental populations to include a more representative sample than is typical, but there are some drawbacks. First, our recruitment approach relied on practitioner referral, opening the possibility of gender bias in referrals.
Second, while our sampling approach was critical to addressing the study goals, it is unclear whether our findings will generalize to samples recruited using different selection criteria. In terms of assessments, we made a priori choices about the classification of measures as cognitive, behavioural, or mental health, considering differences in measurement type (objective task performance or subjective questionnaire rating) and their categorisation and use in both previous studies and in practice.
It is possible that classifying our measures in a different way would produce different results, although the sex differences observed at the individual task-level align with the patterns of differences observed at the dimensional level providing confidence in the primary outcomes. A final issue concerns the labelling of latent factors.
For simplicity and clarity, labels were assigned to each factor, as is standard practice in the field. The labels reflected the hypothesized dimension underlying differences in performance, based on the constellation of tasks or subscale scores with the highest loadings on each factor, but they do not reflect a rigid mapping between each measure and each factor. For example, the second factor in the cognitive model is labelled processing speed because the tasks loading most highly on this factor were either speeded tasks (scores were based on RTs) or completed under time constraints. Labelling this factor as processing speed does not imply that either the Alliteration or Following Instructions tasks are measures of processing speed. The challenge of assigning appropriate labels to latent constructs is not unique to this study and does not detract from the benefits of having theory-guided labels to aid the interpretability of our findings.

CONCLUSION
This study shows that when health and education professionals identify children with cognitive and learning problems, they recognise more boys than girls. Despite this, girls who were referred showed greater difficulties on performance-based measures than boys, with significantly greater impairments in executive functioning. They exhibited fewer externalising problems and were rated as having fewer behavioural cognitive difficulties than boys. These results underscore the need to include cognitive and female-representative criteria in diagnostic systems. Including these criteria, and/or routinely administering performance-based cognitive assessments in schools, may help to identify girls whose difficulties could easily go undetected. By raising awareness of the profile of struggling girls, and drawing attention to the systemic and implicit bias present in the fields of both healthcare and education, we have the potential to increase the likelihood that girls' difficulties will be recognised.

We acknowledge that sex can interact with gender in different ways; while some individuals' identities are informed by both sex and gender, others are not (Gur & Gur, 2016; Lai et al., 2015). However, we use the term "sex" throughout the paper, as the majority of neurodevelopmental studies, and indeed our own, record biological sex at birth.