These authors contributed equally to this work
There are many monitoring environments, such as railway control, in which lapses of attention can have tragic consequences. Problematically, sustained monitoring for rare targets is difficult, with more misses and longer reaction times over time. What changes in the brain underpin these ‘vigilance decrements’? We designed a multiple-object monitoring (MOM) paradigm to examine how the neural representation of information varied with target frequency and time performing the task. Behavioural performance decreased over time for the rare target (monitoring) condition, but not for the frequent target (active) condition. This was mirrored in neural decoding using magnetoencephalography: over the course of the experiment, coding of critical information declined more in the monitoring condition than in the active condition. We developed new analyses that predicted behavioural errors from the neural data more than a second before they occurred. This facilitates pre-empting behavioural errors due to lapses in attention and provides new insight into the neural correlates of vigilance decrements.
When people monitor displays for rare targets, they are slower to respond and more likely to miss those targets relative to frequent target conditions (
To date, most vigilance and rare target studies have used simple displays with static stimuli. Traditional vigilance tasks, inspired by radar operators in WWII, require participants to respond to infrequent visual events on otherwise blank screens, and show more targets are missed as time on task increases (
Despite these efforts, modern environments (e.g., rail and air traffic control) have additional challenges not encapsulated by these measures. These include multiple moving objects, potentially appearing at different times, and moving simultaneously in different directions. When an object moves through space, its neural representation has to be continuously updated so that we perceive the object as having the same identity. Tracking moving objects also requires considerable neural computation: in addition to spatial remapping, for example, we need to predict direction, speed, and the object's distance from a particular destination. These features cannot be studied using static stimuli; they require objects that shift across space over time. In addition, operators have complex displays requiring selection of some items while ignoring others. We therefore need new approaches to study vigilance decrements in situations that more closely resemble the real-life environments in which humans are now operating. Developing these methods will provide a new perspective on fundamental questions of how the brain implements sustained attention in moving displays, and the way in which monitoring changes the encoding of information compared with active task involvement. These new methods may also provide avenues to optimise performance in high-risk monitoring environments.
The brain regions involved in maintaining attention over time have been studied using functional magnetic resonance imaging (fMRI), which measures changes in cerebral blood flow (
Other fMRI studies of vigilance have focused on the default mode network, composed of discrete areas in the lateral and medial parietal, medial prefrontal, and medial and lateral temporal cortices such as posterior cingulate cortex (PCC) and ventral anterior cingulate cortex (vACC), which is thought to be active during ‘resting state’ and less active during tasks (
Detecting changes in brain activation that correlate with lapses of attention can be particularly challenging with fMRI, given that it has poor temporal resolution. Electroencephalography (EEG), which records electrical activity at the scalp, has much better temporal resolution, and has been the other major approach for examining changes in brain activity during sustained attention tasks. Frequency band analyses have shown that low-frequency alpha (8–10.9 Hz) oscillations predict task workload and performance during monitoring of simulated air traffic (static) displays with rare targets, while frontal theta band (4–7.9 Hz) activity predicts task workload only in later stages of the experiment (
Understanding the neural basis of decreases in performance over time under vigilance conditions is not just theoretically important, it also has potential real-world applications. In particular, if we could identify a reliable neural signature of attentional lapses, then we could potentially intervene prior to any overt error. For example, with the development of autonomous vehicles, being able to detect when a driver is not engaged, combined with information about a potential threat, could allow emergency braking procedures to be initiated. Previous studies have used physiological measures such as pupil size (
In this study, we developed a new task, multiple-object monitoring (MOM), which includes key features of real-life situations confronting human operators in high-risk environments. These features include moving objects, varying levels of target frequency, and a requirement to detect and avoid collisions. A key feature of our MOM task is that it allows measurement of the specific decrements in performance during vigilance (sustaining attention in a situation where only infrequent responses are needed) separate from more general decreases in performance simply due to doing a task for an extended period. Surprisingly, this is not typically the case in vigilance tasks. We recorded neural data using the highly sensitive method of magnetoencephalography (
Participants completed the MOM task during which they monitored several dots moving on visible trajectories towards a centrally presented fixed object (
(
In the first block of trials (i.e., the first 110 s, excluding the two practice blocks), participants missed 29% of targets in the Active condition and 40% of targets in the Monitoring condition. However, note that the number of targets in any single block is necessarily very low for the Monitoring condition (a single block contains 16 targets for Active but only two for Monitoring). The pattern becomes more robust over blocks, and
The percentage of miss trials (
We used multivariate pattern analysis (i.e., decoding) to extract two types of information from MEG data about each dot’s movement on the screen: information about the
With so much going on in the display at one time, we first needed to verify that we could successfully decode the major aspects of the moving stimuli, relative to chance. The full data figures and details are presented in the Supplementary materials: we were able to decode both
As the behavioural results showed (
(
Two left graphs: Attended dots; Two right graphs: Unattended (‘distractor’) dots. (
Left and right panels show the results without (repeated from
(
In contrast, there was no sustained main effect of Target Frequency on the same
There was also no sustained main effect of the Time on Task on information about the
The same analysis for the representation of the task-relevant
Although eye-movements should not drive the classifiers due to our design, it is still important to verify that the results replicate when standard artefact removal is applied. We can also use eye-movement data as an additional measure, examining blinks, saccades and fixations for effects of our attention and vigilance manipulations.
First, to make sure that our neural decoding results replicate after eye-related artefact removal, we repeated our analyses on the data after eye-artefact removal, which provided analogous results to the original analysis (see the decoding results with and without artefact removal in
Second, we conducted a post hoc analysis to explore whether eye movement data showed the same patterns of vigilance decrements and therefore could explain our decoding results. We extracted the proportion of eye blinks, saccades, and fixations per trial as well as the duration of those fixations from the eye-tracking data for
Together, these results suggest that while vigilance conditions had little or no impact on coding of the
Using graph-theory-based univariate connectivity analysis, it has been shown that the connectivity between relevant sensory areas and ‘vigilance-related’ cognitive areas changes prior to lapses in attention (behavioural errors;
(
Results showed strong evidence (Bayes factor ANOVA, BF = 6.3 × 10²¹) for higher informational connectivity for trials with Attended compared to Unattended dots, and moderate evidence for higher connectivity in Active compared to Monitoring conditions (Bayes factor ANOVA, BF = 3.4;
We also compared the connectivity for the
The results presented in
First, we evaluated the representation of the less relevant information – the
(
(
For the
We then repeated the same procedure on the representation of the most task-relevant
In principle, the average decoding levels could reflect either ‘all or none’ misses or graded drops in information, and it is possible that some miss trials contain a good representation but the target is missed for other reasons (e.g., a response-level error). Because neural data are noisy, multivariate decoding requires cross-validation across subsamples of the data, and each trial at each distance can only be classified correctly or incorrectly by a two-way classifier, we tend not to compare decoding accuracies trial by trial, but rather on average (
Please note that the results presented so far were from
Finally, we asked whether we could use this information to predict the behavioural outcome of each trial. To do so, we developed a new method that classified trials based on their behavioural outcomes (
(
Left column shows the result for the early and right shows the result for the late blocks. (
The prediction accuracy of behavioural outcome was above chance level (68% vs. 50%; BF > 10) even when the dot had only been on the screen for 80 ms, which corresponds to our furthest distance #15 (1160 ms prior to deflection point;
The prediction of behavioural outcome (
This study developed new methods to gain insights into how attention, the frequency of target events, and the time doing a task affect the representation of information in the brain. Our new MOM task evoked reliable specific vigilance decrements in both accuracy and RT in a situation that more closely resembles real-life modern tasks than classic vigilance tasks. Using the sensitive analysis method of MVPA, we showed that neural coding decreased for relevant task information more when targets were infrequent than frequent (at longer task durations), providing neural correlates of the behavioural vigilance decrements. We also developed a novel informational brain connectivity analysis, which showed that the correlation between information coding across peri-occipital and peri-frontal areas varied with different levels of attention, target frequency, and the time on the task. Finally, we utilised our recent error data analysis to predict forthcoming behavioural misses with high accuracy based on the neural data. In the following sections, we explain each of these findings in detail and compare them with relevant literature.
First, the MOM task includes key features of real-world monitoring situations that are not usually part of other vigilance tasks (e.g.,
Second, the high sensitivity of MVPA to extract information from neural signals allowed us to investigate the temporal dynamics of information processing along the time course of each trial. The manipulation of attention showed a strong overall effect with enhanced representation of both the less important
One explanation for the decrease in decoding accuracy for task-relevant information could be that when people monitor for rare targets, they process or encode the relevant sensory information
It is important to note that previous studies have tried other physiological/behavioural measures to determine participants’ vigilance or alertness, such as pupil size (
Third, our information-based brain connectivity method showed weaker connectivity between the peri-frontal attentional network and the peri-occipital visual areas of the brain in the unattended and monitoring conditions (
Our connectivity method follows the recent major shift in the literature from univariate to multivariate informational connectivity analyses (
Fourth, building upon our recently developed method of error analysis (
Our error prediction results showed a large decline in the crucial task-relevant (i.e.,
The overall goal of this study was to understand how neural representation of dynamic displays was affected by attention and target frequency, and whether reliable changes in behaviour over time could be predicted on the basis of neural patterns. We observed that the neural representation of critically relevant information in the brain decreases over time, especially when targets are infrequent. This neural representation was particularly poor on trials where participants missed the target. We used this observation to predict behavioural outcome of individual trials and showed that we could accurately predict behavioural outcome more than a second before action was needed. These results provide new insights about how vigilance decrements impact information coding in the brain and propose an avenue for predicting behavioural errors using novel neuroimaging analysis techniques.
We tested 21 right-handed participants (10 male, 11 female, mean age = 23.4 years [SD = 4.7 years], all Macquarie University students) with normal or corrected to normal vision. The Human Research Ethics Committee of Macquarie University approved the experimental protocols and the participants gave informed consent before participating in the experiment. We reimbursed each participant AU$40 for their time completing the MEG experiment, which lasted for 2 hr including setup.
We recorded neural activity using a whole-head MEG system (KIT, Kanazawa, Japan) with 160 coaxial first-order gradiometers, at a sampling rate of 1000 Hz. We projected the visual stimuli onto a mirror at a distance of 113 cm above participants’ heads while they were in the MEG. An InFocus IN5108 LCD back projection system (InFocus, Portland, Oregon, USA), located outside the magnetically shielded room, presented the dynamically moving stimuli, controlled by a desktop computer (Windows 10; Core i5 CPU; 16 GB RAM; NVIDIA GeForce GTX 1060 6 GB Graphics Card) using MATLAB with Psychtoolbox 3.0 extension (
The task was to avoid collisions of relevant moving dots with the central object by pressing the space bar if the dot passed a deflection point in a visible predicted trajectory without changing direction to avoid the central object (see
The stimuli were moving dots in one of two colours that followed visible trajectories and covered a visual area of 3.8 × 5° of visual angle (dva;
Target dots deviated from the visible trajectory at the deflection point and continued moving towards the central object. The participant had to push the space bar to prevent a ‘collision’. If the response was made before the dot reached the centre of the object, the dot deflected, and this was counted as a ‘hit’. If the response came after this point, the dot continued straight, and this was counted as a ‘miss’, even if the button was pressed before the dot had completely passed through the central object.
The time from dot onset in the periphery to the point of deflection was 1226 ± 10 (mean ± SD) ms. Target (and distractor event) dots took 410 ± 10 (mean ± SD) ms to cross from the deflection point to the collision point. In total, each dot moved across the display for 2005 ± 12 (mean ± SD) ms before starting to fade away after either deflection or travel through the object. The time delay between the onsets of different dots (ISI) was 1660 ± 890 (mean ± SD) ms. There were 1920 dots presented in the whole experiment (~56 min). Each 110 s block contained 64 dots, 32 (50%) in red, and 32 (50%) in green, while the central static object and trajectories were presented in white on a black background.
There were two target frequency conditions. In ‘Monitoring’ blocks, target dots were ~6.2% of cued-colour dots (2 out of 32 dots). In ‘Active’ blocks, target dots were 50% of cued-colour dots (16 out of 32 dots). The same proportion of dots in the non-cued colour failed to deflect; these were distractors (see
The time between the appearance of target dots varied unpredictably, with distractors and correctly deflecting dots (events) intervening. In Monitoring blocks, there was an average time between targets of 57.88 (±36.03 SD) s. In Active blocks, there was an average time between targets of 7.20 (±6.36 SD) s.
Feedback: On target trials, if the participant pressed the space bar in time, this ‘hit’ was indicated by a specific tone and deflection of the target dot. There were three types of potential false alarm, all indicated by an error tone and no change in the trajectory of the dot. These were if the participant responded: (1) too early, while the dot was still on the trajectory; (2) when the dot was not a target and had been deflected automatically (‘event’ in
MEG data were filtered online using band-pass filters in the range of 0.03–200 Hz and notch-filtered at 50 Hz. We then imported the data into MATLAB and epoched them from −100 to 3000 ms relative to the trial onset time. We performed all the analyses once without and once with standard eye-artefact removal (post hoc, explained below) to see if eye movements and blinks had a significant impact on our results and interpretations. Finally, we down-sampled the data to 200 Hz for the decoding of our two key measures:
There are two practical reasons why eye-related artefacts (e.g., eye-blinks and saccades) should not dominate our classification procedure. First, the decoding analysis is time-resolved and computed in small time windows (5 ms and 80 ms, for
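The offline preprocessing described above (epoching from −100 to 3000 ms relative to trial onset, then down-sampling from 1000 Hz to 200 Hz) can be sketched as follows. This is a minimal illustration, not the actual pipeline: it assumes the online band-pass and notch filters have already been applied, and the array shapes and onset indices are invented for the example.

```python
import numpy as np
from scipy.signal import decimate

def epoch_and_downsample(raw, onsets, fs=1000, fs_new=200, tmin=-0.1, tmax=3.0):
    """Cut continuous sensor data into trial epochs and down-sample.

    raw    : array (n_sensors, n_samples); assumes the online band-pass
             (0.03-200 Hz) and 50 Hz notch filters were already applied
    onsets : sample indices of trial onsets
    Returns an array (n_trials, n_sensors, n_times) sampled at fs_new.
    """
    n0 = int(round(tmin * fs))
    n1 = int(round(tmax * fs))
    epochs = np.stack([raw[:, s + n0:s + n1] for s in onsets])
    # down-sample 1000 Hz -> 200 Hz (factor 5) with built-in anti-alias filter
    return decimate(epochs, q=fs // fs_new, axis=2)

# Illustrative continuous recording: 4 sensors, 15 s at 1000 Hz, 3 trial onsets
rng = np.random.default_rng(0)
raw = rng.standard_normal((4, 15000))
epochs = epoch_and_downsample(raw, onsets=[500, 6000, 10000])
```

Each 3100-sample epoch is reduced to 620 time points at 200 Hz.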
We measured the information contained in the multivariate (multi-sensor) patterns of MEG data by training a linear discriminant analysis (LDA) classifier using a set of training trials from two categories (e.g., for the
We decoded two major task features from the neural data: (1) the
We decoded left vs. right
For the decoding of
Note that the ‘
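As an illustration of the time-resolved decoding approach described above, the sketch below runs cross-validated LDA classification at each time point of simulated sensor data using scikit-learn. The data, shapes, and injected class signal are purely illustrative; the real analysis used the MEG sensor patterns and window sizes described in the text.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold

def time_resolved_decoding(X, y, n_folds=10):
    """Cross-validated LDA decoding accuracy at each time point.

    X : array (n_trials, n_sensors, n_times); y : binary labels (n_trials,)
    Returns mean decoding accuracy per time point.
    """
    n_trials, n_sensors, n_times = X.shape
    acc = np.zeros(n_times)
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    for t in range(n_times):
        fold_acc = []
        for train, test in cv.split(X[:, :, t], y):
            clf = LinearDiscriminantAnalysis()
            clf.fit(X[train, :, t], y[train])
            fold_acc.append(clf.score(X[test, :, t], y[test]))
        acc[t] = np.mean(fold_acc)
    return acc

# Simulated data: 80 trials, 20 sensors, 10 time points; one class gains a
# signal on one sensor from time point 5 onwards
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 20, 10))
y = np.repeat([0, 1], 40)
X[y == 1, 0, 5:] += 2.0
accuracy = time_resolved_decoding(X, y)
```

Accuracy hovers near chance (0.5) before the signal appears and rises well above chance afterwards, mirroring how above-chance decoding onset is read off the real data.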
To evaluate possible modulations of brain connectivity between the attentional networks of the frontal brain and the occipital visual areas, we used a simplified version of our recently developed RSA-based informational connectivity analysis (
Connectivity was calculated separately for
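A minimal sketch of this informational connectivity idea: decode the stimulus separately within two sensor groups, then correlate the trial-by-trial decoding outcomes between the groups (Spearman correlation). The sensor groupings and simulated data below are purely illustrative stand-ins for the peri-occipital and peri-frontal selections used in the paper.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict

def trialwise_correctness(X_region, y, n_folds=5):
    """1/0 correctness of each trial under cross-validated LDA, one sensor group."""
    pred = cross_val_predict(LinearDiscriminantAnalysis(), X_region, y, cv=n_folds)
    return (pred == y).astype(float)

def informational_connectivity(X, y, occ_idx, fro_idx):
    """Spearman correlation of trial-wise decoding outcomes across sensor groups."""
    rho, _ = spearmanr(trialwise_correctness(X[:, occ_idx], y),
                       trialwise_correctness(X[:, fro_idx], y))
    return rho

# Simulated data: on alternating class-1 trials the signal is strong or absent,
# injected identically into one 'occipital' and one 'frontal' sensor, so both
# groups tend to succeed and fail on the same trials.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 20))
strength = np.zeros(200)
strength[100::2] = 4.0
X[:, 0] += strength      # 'occipital' signal sensor
X[:, 10] += strength     # 'frontal' signal sensor
rho = informational_connectivity(X, y, np.arange(10), np.arange(10, 20))
```

Because both groups carry (or fail to carry) the stimulus information on the same trials, their decoding outcomes correlate positively; with regionally independent information the correlation would sit near zero.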
Next, we asked what information was coded in the brain when participants missed targets. To study information coding in the brain on
For these error data analyses, the number of folds for cross-validation was determined based on the proportion of
We developed a new method to predict, based on the most task-relevant information in the neural signal, whether or not a participant would press the button for a target dot in time to deflect it on a particular trial. This method includes three steps, with the third step being slightly different for the left-out testing participant vs. the other 20 participants. First, for every participant, we trained 105 classifiers using ~80% of
To give more detail on the second and third steps: when the validation/testing dots were at distance #15, we averaged the accuracies of the 14 classifiers trained to classify dots at distance #15 versus each other distance. When the dot reached distance #14, we additionally included the accuracies of the classifiers trained to classify distance #14 versus all other distances, giving 27 classifier accuracies in total. By the time the dot reached distance #1, we therefore had 105 classifier accuracies to average for predicting the behavioural outcome of the trial. Each classifier's accuracy was either 1 or 0, corresponding to correct or incorrect classification of the dot's distance, respectively. Note that accumulating classifier accuracies, compared with using the classifier accuracy at each distance independently, provides a more robust and smoother classification measure for deciding on the label of a trial. The validation set, which was different from the testing set, allowed us to set the decision threshold within each subject using the validation data from the 20 participants, and then to test our prediction classifiers on a separate testing set from the 21st participant, iteratively. The optimal threshold was 1.54 (±0.2) times the SD below the decoding accuracy on the validation set across participants.
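The accumulation logic can be sketched as follows. The per-distance classifier outcomes here are simulated, and the threshold value is an illustrative constant rather than the fitted one described above.

```python
import numpy as np

def accumulate_prediction(outcomes_by_distance, threshold):
    """Accumulate pairwise-classifier outcomes as the dot approaches the object.

    outcomes_by_distance : dict mapping distance (15 down to 1) to an array of
        binary outcomes (1 = correct) from the new pairwise classifiers that
        become available at that distance (14 at #15, 13 more at #14, ...,
        105 in total by distance #1).
    threshold : accumulated-accuracy cut-off below which a miss is predicted.
    Returns a list of (distance, accumulated_accuracy, predicted_label).
    """
    pooled, trace = [], []
    for d in range(15, 0, -1):
        pooled.extend(outcomes_by_distance[d])
        acc = float(np.mean(pooled))
        trace.append((d, acc, 'correct' if acc >= threshold else 'miss'))
    return trace

# Simulated trials: a well-represented trial (classifiers ~80% correct) and a
# poorly represented one (~40% correct)
rng = np.random.default_rng(0)
good_trial = {d: rng.binomial(1, 0.8, size=d - 1) for d in range(15, 0, -1)}
bad_trial = {d: rng.binomial(1, 0.4, size=d - 1) for d in range(15, 0, -1)}
good_trace = accumulate_prediction(good_trial, threshold=0.6)
bad_trace = accumulate_prediction(bad_trial, threshold=0.6)
```

As more classifier outcomes accumulate, the running accuracy stabilises, which is why the accumulated measure is smoother than a distance-by-distance decision.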
To determine the evidence for the null and the alternative hypotheses, we used Bayes analyses as implemented by Krekelberg (
Specifically, for the behavioural data, we asked whether there was a difference between Active and Monitoring conditions in terms of miss rates and RTs. Accordingly, we calculated the Bayes factor as the probability of the data under alternative (i.e., difference) relative to the null (i.e., no difference) hypothesis in each block separately. In the decoding, we repeated the same procedure to evaluate the evidence for the alternative hypothesis of a difference between decoding accuracies across conditions (e.g., Active vs. Monitoring and Attended vs. Unattended) vs. the null hypothesis of no difference between them, at every time point/distance. To evaluate evidence for the alternative of above-chance decoding accuracy vs. the null hypothesis of no difference from chance, we calculated the Bayes factor between the distribution of actual accuracies obtained and a set of 1000 random accuracies obtained by randomising the class labels across the same pair of conditions (null distribution) at every time point/distance.
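For illustration only, a Bayes factor for a paired comparison such as Active vs. Monitoring can be approximated with the BIC-based formula of Wagenmakers (2007); note that the analyses reported here used Krekelberg's implementation with JZS priors, which will give somewhat different values. The per-participant accuracies below are simulated.

```python
import numpy as np

def bf10_paired_bic(x, y):
    """Approximate Bayes factor (BF10) for a paired comparison via the BIC
    approximation (Wagenmakers, 2007): BF01 ~ sqrt(n) * (1 + t^2/(n-1))^(-n/2),
    where t is the one-sample t statistic of the difference scores."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    t2 = n * d.mean() ** 2 / d.var(ddof=1)   # squared one-sample t statistic
    bf01 = np.sqrt(n) * (1.0 + t2 / (n - 1)) ** (-n / 2.0)
    return 1.0 / bf01

# Simulated per-participant decoding accuracies (n = 21, values illustrative)
rng = np.random.default_rng(2)
active = 0.70 + rng.normal(0, 0.05, 21)
monitoring = 0.60 + rng.normal(0, 0.05, 21)
bf_diff = bf10_paired_bic(active, monitoring)   # evidence for a difference
null_a = 0.65 + rng.normal(0, 0.05, 21)
null_b = 0.65 + rng.normal(0, 0.05, 21)
bf_null = bf10_paired_bic(null_a, null_b)       # conditions actually identical
```

With a genuine 10% accuracy difference the approximate BF10 is large (strong evidence for a difference), while for identical conditions it stays small, matching the interpretation thresholds used in the text.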
To evaluate the evidence for the alternative of main effects of different factors (Attention, Target Frequency, and Time on Task) in decoding, we used Bayes factor ANOVA (
The priors for all Bayes factor analyses were determined based on Jeffreys-Zellner-Siow priors (
This work was funded by an Australian Research Council (ARC) Discovery Project grant to ANR and AW (DP170101780). AW was supported by an ARC Future Fellowship (FT170100105) and MRC intramural funding SUAG/052/G101400. H K-R was supported by Newton International Fellowship from Royal Society (NIF\R1\192608). We thank Denise Moerel, Mark Wiggins, Jeremy Wolfe, and William Helton for contributions to an earlier design of the MOM task.
No competing interests declared
Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing
Conceptualization, Formal analysis, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing
Human subjects: The Human Research Ethics Committee of Macquarie University approved the experimental protocols and the participants gave informed consent before participating in the experiment. The approval identifier is 52020297914411.
We have shared the Magnetoencephalography data (i.e. time series) as well as behavioral data in Matlab '.mat' format on the Open Science Framework website at
The following dataset was generated:
Our first analysis verified that we could decode the important aspects of the display, relative to chance, given the overlapping moving stimuli. Here, we give the detailed results of this analysis.
We started with the information about the
All conditions were decodable above chance until at least 385 ms post-stimulus onset (BF > 3;
The most task-relevant feature of the motion is the distance between the moving dot and the central object, with the deflection point of the trajectories being the key decision point. We therefore tested for decoding of
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
In our modern work environment there are many situations where humans have to pay sustained attention in order to catch infrequent computer errors, such as while monitoring railway systems. Combining a novel multiple-object monitoring task with computationally sophisticated analyses of human magnetoencephalography (MEG) data, Karimi-Rouzbahani and colleagues find that increasing the rarity of targets leads to a worse neural representation of a crucial target feature (distance to a potential collision). They were also able to predict whether participants would catch or miss a target based on their neural data, which may prove a first step towards developing methods to pre-empt such potentially disastrous errors.
Thank you for submitting your article "Neural signatures of vigilance decrements predict behavioural errors before they occur" for consideration by
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
As the editors have judged that your manuscript is of interest, but as described below that additional analyses are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at
Summary:
Karimi-Rouzbahani and colleagues investigate vigilance and sustained monitoring, using a multiple-object monitoring task in combination with magnetoencephalography (MEG) recordings in humans to investigate the neural coding and decoding-based connectivity of vigilance decrements. Using computationally sophisticated multivariate analyses of the MEG data, they found that increasing the rarity of targets led to weaker decoding accuracy for the crucial feature (distance to an object), and weaker decoding was also found for misses compared to correct responses.
While the reviewers agreed the study was interesting, they also had concerns about the approach and the interpretation of the results.
Essential revisions:
1. The introduction makes it clear that the authors acknowledge that there may be multiple sources of interference contributing to declining vigilance over time: the encoding of sensory information, appropriate responses to the stimuli, or a combination of both. In the introduction, it would help if the authors review how infrequent targets affect response patterns. In addition, it would help if the theoretical approach and assumptions of the authors were explicitly stated. For instance, the a priori assumptions surrounding the connectivity analysis should be acknowledged and discussed in the interpretation of the pattern of results (e.g., p. 32, line 658). Specifically, the focus on connectivity between frontal and occipital areas seems to assume the effects are related to sensory processing alone, but this does not preclude other influences. For instance, effects could also occur on response patterns. These considerations should be added as caveats to the interpretation.
2. It is not clear what role eye fixations play here. Participants could freely scan the display, so the retinotopic representations would change depending on where the participants fixate, but at the same time the authors claim that eye position did not matter. Materials and methods, Page 11: The authors state that "We did not perform eye-blink artefact removal because it has been shown that blink artefacts are successfully ignored by multivariate classifiers as long as they are not systematically different between decoded conditions (Grootswagers et al., 2017)." This is not a sufficiently convincing argument. Firstly, the cited paper makes a theoretical argument rather than showing this empirically. Secondly, even if this were true, the frequency of eye-related artefacts seems to be of crucial importance for a paradigm that involves moving stimuli (and no fixation). There could indeed be systematic differences between conditions that are then picked up by the classifier (i.e. if more eye-blinks are related to tiredness and in turn decreased vigilance). The authors should show that their results replicate if standard artefact removal is performed on the data.
Relatedly, on page 16 the authors claim that "If the prediction from the MEG decoding was stronger than that of the eye tracking, it would mean that there was information in the neural signal over and above any artefact associated with eye movement." This statement is problematic: Firstly, such a result might only mean that prediction from MEG decoding is stronger than decoding from eye-movements, but not relate to "artefacts" in general, to which blinks would also count. Secondly, given that the signal underlying both analyses is entirely different (and the number of features), it is not valid to directly compare the results between these analyses. More detailed analyses of fixations and fixation duration on targets and distractors might indeed be strongly related to behaviour. What is decodable at a given time might just be driven by what participants are looking at.
3. One key finding was that while classifying the direction of the dots was modulated by attention, it was insensitive to many features that were captured by a classifier trained to decode the distance from the deflection. This is surprising since both are spatial features that seem hard to separate. In addition, the procedures to decode direction vs distance were very different. Do these differences still hold if the procedure used to train the two classifiers is more analogous or matched?
4. The distance classifier was trained using only correct trials. Then in the testing stage, it was generalized to either correct or miss trials. While there is a rationale for using correct trials only, could the decoding of error prediction be an artifact of the training sample, reflecting the fact that misses were not included in the training set?
5. By accumulating classifiers across time, it looks like classifier prediction improves closer to deflection. However, this could also be due to the fact that the total amount of information provided to the classifier increased. Is there a way to control for the total amount of information at different timepoints (e.g., by using a trailing window lag rather than accumulation), or contrast the classifier that derives from accumulating information with the classifier trained moment-by-moment?
6. Predicting miss trials: The implicit assumption here is that there is "less representation" for miss trials compared to correct trials (e.g., of distance to object). But even for miss trials, the representation is significantly above chance. However, maybe the lower accuracy for the miss trials resulted from on average more trials in which the target was not represented at all rather than a weaker representation across all trials. This would call into question the interpretation of a decline in coding. In other words, on a single trial, a representation might only be present (but could result in a miss for other reasons) or not present (which would be the case for many miss trials), and the lower averages for misses would then be the result of more trials in which the information was completely absent.
It could be that the results of the subsequent analysis (predicting misses and correct responses before they occur) are in conflict with this more pessimistic interpretation. If we understand this correctly, here the classifier predicts Distance to Object for each individual trial, and Figure 6B shows that while there is a clear difference between the correct and miss trials, the latter can still be predicted above chance level but never exceed the threshold? If this is true for all single trials, this would indeed speak for a weak but "unused" representation on miss trials. But for this the authors need to show how many of the miss trials per participant had a chance-level accuracy (i.e. might be truly unrepresented), and how many were above chance but did not exceed the threshold (i.e. might have been "less represented").
7. The relationship between the vigilance decrement and error prediction. Is vigilance decrement driving the error prediction? That is, if errors increase later on, and the signal goes down, then maybe the classifier is worse. Alternatively, maybe the classifier predictions do not necessarily monotonically decrease throughout the experiment. Is the classifier equally successful at predicting errors early and late?
8. When decoding distance, active decoding declines from early to late, even though performance does not decline (or even slightly improves from early to late). This discrepancy seems hard to explain. Is this decline in classification driven by differences in the total signal from early to late?
9. Classifier performance was extremely high almost immediately after trial onset. Does the classifier perform at chance before the trial onset, or does this reflect sustained but not stimulus-specific information?
10. The connectivity analysis appears to be just a correlation of decoding results between two regions of interest. This means that if one "region" allows for decoding the distance to the object, the other one does too. However, this alone does not equal connectivity. It could simply mean that patterns across the entire brain allow for decoding the same information. For example, it would not be surprising to find that both ROIs correlate more strongly for correct trials (i.e. the brain has obviously represented the relevant information) than for errors (i.e. the brain has failed to represent the information), without this necessarily being related to connectivity at all. The more parsimonious interpretation here is that information might have been represented across all channels at this time. The authors show no evidence that only these two (arbitrarily selected) "regions" encode the information while others do not. To show evidence for meaningful connectivity, (a) the spread of information should be limited to small sub-regions, and (b) the decoding results in one "region" should predict the results in another region in time (as for DCM).
11. The display of the results is very dense, and it is not always clear whether decoding for a specific variable was above chance or not. The authors often focused on relative differences, making it difficult to fully understand the meaning of the full pattern of results. The Bayes-factor plots in the decoding results figures are so cramped that it is very difficult to actually see the individual dots and to unpack all of this (e.g., Figure 3). Could this complexity be somehow reduced, maybe by dividing the panels into separate figures? The two top panels in Figure 3B should also include the chance level as in A. It looks like the accuracy is very low for unattended trials, which is only true in comparison to attended trials, but (as also shown in Supplementary Figure 1) the information was clearly also encoded in unattended trials, which is important for interpreting the results.
12. While this is methodologically interesting work, there is no convincing case made for what exactly the contribution of this study is for theories of vigilance. It seems that the findings can be reduced to that a lack of decodability of relevant target features from brain activity predicts that participants will miss the target. This alone, however, does not seem to be very novel. Even if the issues above are addressed, the study only demonstrates that with less attention to the target, there is less evidence of representations of the relevant features of targets in the brain. The authors also find the expected decrements for rare targets and when participants do not actively monitor the targets. How do these findings contribute to "theories of vigilance", as claimed by the authors?
Essential revisions:
1. The introduction makes it clear that the authors acknowledge that there may be multiple sources of interference contributing to declining vigilance over time: the encoding of sensory information, appropriate responses to the stimuli, or a combination of both. In the introduction, it would help if the authors review how infrequent targets affect response patterns.
We added the relevant information about response patterns to the Introduction as below:
“To date, most vigilance and rare target studies have used simple displays with static stimuli. […] Overall, vigilance decrements in terms of poorer performance can be seen in both accuracy and in reaction times, depending on the task.”
In addition, it would help if the theoretical approach and assumptions of the authors were explicitly stated. For instance, the a priori assumptions surrounding the connectivity analysis should be acknowledged and discussed in the interpretation of the pattern of results (e.g., p. 32, line 658). Specifically, the focus on connectivity between frontal and occipital areas seems to assume the effects are related to sensory processing alone, but this does not preclude other influences. For instance, effects could also occur on response patterns. These considerations should be added as caveats to the interpretation.
We have now carefully reviewed the manuscript to be sure our assumptions and approach for the connectivity analyses are explicit. We have added the suggested material to the interpretation of the pattern of results, and acknowledge the potential for other influences on the connectivity results as caveats to our interpretation.
We now limit our discussion of the connectivity results as relevant to evaluating sensory aspects of information encoding (in the Materials and methods section) as below:
“There are a few considerations in the implementation and interpretation of our connectivity analysis. First, it reflects the similarity of the way a pair of brain areas encode “distance” information during the whole trial. This means that we could not incorporate time into the evaluation of connectivity, as we have done elsewhere (Karimi-Rouzbahani et al., 2019; Karimi-Rouzbahani et al., 2020). Second, rather than a simple correlation of magnitudes of decoding accuracy between two regions of interest, our connectivity measure reflects a correlation of the patterns of decoding accuracies across conditions (i.e., distances here). Finally, our connectivity analysis evaluates sensory information encoding, rather than other aspects of cognitive or motor information encoding, which might also have been affected by our experimental manipulations.”
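To make the measure concrete, the correlation-of-patterns idea can be sketched as a correlation between the per-distance decoding-accuracy profiles of two regions. This is a minimal illustration only: the accuracy values and region names are hypothetical, and a hand-rolled Pearson correlation stands in for the full pipeline described in the Materials and methods.

```python
# Hypothetical per-distance decoding accuracies (15 distances) for two
# regions of interest; all values are illustrative, not real data.
acc_frontal = [0.55, 0.58, 0.60, 0.63, 0.66, 0.68, 0.70, 0.72,
               0.74, 0.76, 0.78, 0.80, 0.82, 0.84, 0.86]
acc_occipital = [0.52, 0.56, 0.59, 0.61, 0.65, 0.67, 0.69, 0.71,
                 0.73, 0.75, 0.77, 0.79, 0.81, 0.83, 0.85]

def informational_connectivity(acc_a, acc_b):
    # Pearson correlation of the *patterns* of decoding accuracies across
    # the distance conditions, not of overall decoding magnitudes.
    n = len(acc_a)
    mean_a, mean_b = sum(acc_a) / n, sum(acc_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(acc_a, acc_b))
    sd_a = sum((x - mean_a) ** 2 for x in acc_a) ** 0.5
    sd_b = sum((y - mean_b) ** 2 for y in acc_b) ** 0.5
    return cov / (sd_a * sd_b)

r = informational_connectivity(acc_frontal, acc_occipital)
```

Because the measure correlates patterns across the 15 distance conditions, two regions with quite different overall decoding magnitudes can still show high informational connectivity.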
We now provide the rationale and our predictions about the impact of visual and auditory attention on our connectivity metric (in the Results section) based on the literature, as below.
“In line with attentional effects on sensory perception, we predicted that connectivity between the frontal attentional and sensory networks should be lower when not attending (vs. attending; Goddard et al., 2019). Behavioural errors were also previously predicted by reduced connection between sensory and ‘vigilance-related’ frontal brain areas (Ekman et al., 2012; Sadaghiani et al., 2015). Therefore, we predicted a decline in connectivity when targets were lower in frequency, and with increased time on task, as these led to increased errors in behaviour, specifically under vigilance conditions in our task (i.e., late blocks in Monitoring vs. late blocks in Active; Figure 2).”
We have toned down our conclusions (a) and added the possibility that other factors could also contribute to our vigilance decrement effects in the Discussion, as below (b).
a) “One explanation for the decrease in decoding accuracy for task-relevant information could be that when people monitor for rare targets, they process or encode the relevant sensory information less effectively as time passes, relative to conditions in which they are actively engaged in completing the task.”
b) “Apart from sensory information coding and sensory-based informational connectivity, which were evaluated here and provide plausible neural correlates for the vigilance decrement, there may be other correlates we have not addressed. Effects on response-level selection, for example, independently or in conjunction with sensory information coding, could also affect performance under vigilance conditions, and need further research.”
2. It is not clear what role eye fixations play here. Participants could freely scan the display, so the retinotopic representations would change depending on where the participants fixate, but at the same time the authors claim that eye position did not matter.
We did not mean to claim that eye position does not matter at all, but rather that our design minimises the effect of eye-related artefacts on the classifiers. We have carefully revised the manuscript to ensure this is clear (detailed response and additional analyses below).
Materials and methods, Page 11: The authors state that "We did not perform eye-blink artefact removal because it has been shown that blink artefacts are successfully ignored by multivariate classifiers as long as they are not systematically different between decoded conditions (Grootswagers et al., 2017)." This is not a sufficiently convincing argument. Firstly, the cited paper makes a theoretical argument rather than showing this empirically. Secondly, even if this were true, the frequency of eye-related artefacts seems to be of crucial importance for a paradigm that involves moving stimuli (and no fixation). There could indeed be systematic differences between conditions that are then picked up by the classifier (i.e. if more eye-blinks are related to tiredness and in turn decreased vigilance). The authors should show that their results replicate if standard artefact removal is performed on the data.
We appreciate the point here. There are theoretical and practical arguments that eye-related artefacts should not drive our effects, but to be sure we also now present our results with standard artefact removal as well.
Overall increases in eye-related artefacts (such as blinks) with time on task would not be an issue, as our design relies on comparisons between Active and Monitoring, so any general effects should have negligible impact. For these comparisons, however, there may indeed be differences in the number of eye blinks; these conditions involve different levels of attentional recruitment, which has previously been shown to correlate with the frequency of eye blinks (Nakano et al., 2013). Thus, we certainly do not want to claim that eye-related artefacts do not matter at all, but, importantly, there are two practical reasons why eye blinks should not dominate our classification procedure. First, the decoding analysis is time-resolved and computed in small time windows (5 ms and 80 ms for direction and distance information decoding, respectively). For eye blink patterns to be picked up by the classifier, they would need to occur at consistent time points across trials of the same condition, and not in the other condition, which seems implausible. Second, our MEG helmet does not have the most frontal sensors, where eye-related artefacts most strongly affect neural activations (Mognon et al., 2011), although we appreciate that this does not rule out their presence altogether.
To check empirically that eye-related artefacts were not driving our effects, we re-ran our analyses with standard artefact removal as requested. We see the same pattern of results as before, for both the key task-relevant feature of distance-to-object and the less relevant feature of direction of approach. We present the full comparative analysis in Figure 3—figure supplement 2. In the paper we now state that the results replicate with artefact removal and present the additional eye-movement-corrected results in the supplementary materials.
“…, we also did a post-hoc analysis in which we removed these using the “runica” Independent Component Analysis (ICA) algorithm as implemented in EEGLAB. We used the ADJUST plugin (Mognon et al., 2011) for EEGLAB to decide which ICA components were related to eye artefacts and should be removed. This toolbox extracts spatiotemporal features from components to measure quantitatively whether a component is related to eye movements or blinks. For all subjects except two, we identified and removed only one component attributed to eye artefacts (i.e., located frontally and surpassing ADJUST’s threshold). For the other two participants, we identified and removed two components with these characteristics.”
Figure 3—figure supplement 2B shows the decoding results for the key task-relevant feature of distance-to-object without and with eye-related artefact removal, in the left and right panels, respectively. The main effects of attention and time on the task and the key interaction between target frequency and time on the task remain after eye artefact removal, replicating our initial pattern of results.
Figure 3—figure supplement 2A shows the decoding results for the direction of approach information without and with eye artefact removal. The results again replicate those of the original analysis: as before there is a main effect of Attention but no main effect of Time on Task or Target Frequency, and no interaction.
We also checked to see if our trial outcome prediction (Figure 6D) could be driven by eye artefacts by repeating our prediction procedure using the eye-movement corrected MEG data. The results (
The results are for the left-out participant (averaged over all participants) using the threshold obtained from all the other participants as a function of distance/time from the deflection point. Figure 6D shows the result without eye artefact removal and
In the Materials and methods section, we removed the sentence “We did not perform eye-blink artefact removal because it has been shown that blink artefacts are successfully ignored by multivariate classifiers as long as they are not systematically different between decoded conditions (Grootswagers et al., 2017).”
We also added the following explanations and the figures to the manuscript in the Results section to cover this point.
“Although eye-movements should not drive the classifiers due to our design (see Materials and methods), it is still important to verify that the results replicate when standard artefact removal is applied. We can also use eye-movement data as an additional measure, examining blinks, saccades and fixations for effects of our attention and vigilance manipulations.
First, to make sure that our neural decoding results replicate after eye-related artefact removal, we repeated our analyses on the data after eye-artefact removal (see Materials and methods), which provided analogous results to the original analysis (see the decoding results without and with artefact removal in Figure 3—figure supplement 2). Specifically, for our crucial distance to object data, the main effects of Attention and Time on Task and the key interaction between Target Frequency and Time on Task remain after eye-artefact removal, replicating our initial pattern of results.
Second, we conducted a post-hoc analysis to explore whether eye movement data showed the same patterns of vigilance decrements and therefore could explain our decoding results. We extracted the proportion of eye blinks, saccades and fixations per trial, as well as the duration of those fixations, from the eye-tracking data for correct trials (-100 to 1400 ms aligned to the stimulus onset time), and statistically compared them across our critical conditions (Figure 3—figure supplement 3). We saw strong evidence (BF = 4.8e8) for a difference in the number of eye blinks between attention conditions: there were more eye blinks for the Unattended (distractor) than Attended (potential target) colour dots. We also observed moderate evidence (BF = 3.4) for a difference in the number of fixations, with more fixations in Unattended vs. Attended conditions. These results suggest that there are systematic differences in the number of eye blinks and fixations due to our attentional manipulation, consistent with previous observations showing that the frequency of eye blinks can be affected by the level of attentional recruitment (Nakano et al., 2013). However, there was either insufficient evidence (0.3 < BF < 3) or moderate or strong evidence for no differences (0.1 < BF < 0.3 and BF < 0.1, respectively) in the number of eye blinks and saccades across our Active, Monitoring, Early and Late blocks, where we observed our ‘vigilance decrement’ effects in decoding. Therefore, this suggests that the main vigilance decrement effects in decoding, which were evident as an interaction between Target Frequency (Active vs. Monitoring) and Time on Task (Early vs. Late) (Figure 3), were not driven by eye movements.”
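For reference, the Bayes-factor evidence bands used throughout this passage can be summarised in a small helper function. This is a sketch (the function name is ours); the thresholds are the conventional bands quoted in the text.

```python
def bf_category(bf):
    # Conventional Bayes-factor evidence bands, as used in the passage above.
    if bf > 10:
        return "strong evidence for a difference"
    if bf > 3:
        return "moderate evidence for a difference"
    if bf > 0.3:
        return "insufficient evidence"
    if bf > 0.1:
        return "moderate evidence for no difference"
    return "strong evidence for no difference"
```

For example, the reported BF = 4.8e8 for eye blinks falls in the strong-evidence band, while BF = 3.4 for fixations falls in the moderate band.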
Relatedly, on page 16 the authors claim that "If the prediction from the MEG decoding was stronger than that of the eye tracking, it would mean that there was information in the neural signal over and above any artefact associated with eye movement." This statement is problematic: Firstly, such a result might only mean that prediction from MEG decoding is stronger than decoding from eye-movements, but not relate to "artefacts" in general, which would also include blinks. Secondly, given that the signal underlying both analyses is entirely different (as is the number of features), it is not valid to directly compare the results between these analyses. More detailed analyses of fixations and fixation duration on targets and distractors might indeed be strongly related to behaviour. What is decodable at a given time might just be driven by what participants are looking at.
We take the point on the issues with this comparison, and so have removed the analysis from the manuscript, replacing it instead with more detailed analyses of the eye movement data:
We extracted the proportion of eye blinks, saccades and fixations per trial, as well as the duration of those fixations, from the eye-tracking data for correct trials (-100 to 1400 ms aligned to the stimulus onset time), and statistically compared them across our critical conditions (Figure 3—figure supplement 3). We saw strong evidence (BF = 4.8e8) for a difference in the number of eye blinks between attention conditions: there were more eye blinks for Unattended (distractor) than Attended (potential target) colour dots. We also observed moderate evidence (BF = 3.4) for a difference in the number of fixations, with more fixations in Unattended vs. Attended conditions. These results suggest that there are systematic differences in the number of eye blinks and fixations due to our attentional manipulation, consistent with Nakano et al. (2013). However, we observed either insufficient evidence (0.3 < BF < 3) or moderate to strong evidence for no difference (0.1 < BF < 0.3 and BF < 0.1, respectively) in the number of eye blinks and saccades across our Active, Monitoring, Early and Late blocks, where we observed our ‘vigilance decrement’ effects in decoding. Consistent with the replication of the results with artefact removal presented above, this suggests that the main vigilance decrement effects in decoding, which were evident as an interaction between Target Frequency (Active vs. Monitoring) and Time on Task (Early vs. Late) (Figure 3), were not driven by eye movements.
This information has also been added to the supplementary materials (Figure 3—figure supplement 3) and referred to in the manuscript (text quoted under previous bullet point).
3. One key finding was that while classifying the direction of the dots was modulated by attention, it was insensitive to many features that were captured by a classifier trained to decode the distance from the deflection. This is surprising since both are spatial features that seem hard to separate.
Yes, we see vigilance decrement effects for the distance information but not the direction of approach. Although they both rely on similar features of the visual display, the direction information classifier is likely to be driven primarily by the large visual difference between the categories (approach from the left vs approach from the right). In the key distance measure, we collapse across left and right approaching dots, which means the classifier has to use much more subtle differences (and is therefore more likely to be sensitive to other modulations). Moreover, the two types of information also differ in their importance to the task: Only the distance information is relevant to deciding whether an item is a target.
We have added to the Discussion noting this point.
“The less relevant information about direction of approach was modulated by attention, but its representation was not detectably affected by target frequency and time on task, and was noisier, but not noticeably attenuated, on error trials. The relative stability of these representations might reflect the large visual difference between stimuli approaching from the top left vs bottom right of the screen. In contrast, the task-relevant information of distance to object was affected by attention, target frequency and time on task and was dramatically attenuated on errors. The difference might reflect the fact that only the distance information is relevant to deciding whether an item is a target, and/or the classifier having to rely on much more subtle differences to distinguish the distance categories, which collapsed over stimuli appearing on the left and right sides of the display, removing the major visual signal.”
In addition, the procedures to decode direction vs distance were very different. Do these differences still hold if the procedure used to train the two classifiers is more analogous or matched?
In terms of technical differences in the decoding procedure between distance and direction information, we cannot directly compare the two types of information on an analogous platform because they have to be defined differently. There are a different number of classes in decoding for the two types of information: only two classes for the direction information (left vs. right), compared to 15 classes for the distance information.
We have added the following paragraph to the Materials and methods section to clarify the point:
“Note that the ‘direction of approach’ and ‘distance to object’ information cannot be directly compared on an analogous platform, as the two types of information are defined differently. There are also different numbers of classes in decoding for the two types of information: only two classes for the direction information (left vs. right), compared to 15 classes for the distance information.”
4. The distance classifier was trained using only correct trials. Then in the testing stage, it was generalized to either correct or miss trials. While there is a rationale for using correct trials only, could the decoding of error prediction be an artifact of the training sample, reflecting the fact that misses were not included in the training set?
No, we do not think there is any way it could be an artefact. Our hypothesis is that correct trials contain information which is missing from miss trials. In other words, miss trials are in some way different from correct trials. Thus, it is crucial to use only correct trials in the training set. Please note that our approach is different from most conventional studies, in which people directly discriminate correct and miss trials by feeding both types of trials to classifiers in the training phase and testing the classifiers on the left-out correct and miss trials (i.e., without any feature extraction; as in Bode and Stahl, 2014). While this standard approach might lead to a higher classification performance, we developed our new approach for two main reasons. First, in the real world and many vigilance studies, there is usually not enough miss data to train classifiers. Second, we wanted to directly test whether the neural representations of correct trials contain some information which is (on average) less observable in miss trials. The result of conventional methods can reflect general differences between correct and miss trials (i.e., general level of attention, not time-locked to stimulus presentation), but cannot inform us about whether the difference reflects changes in information coding in the correct vs. miss trials; our approach allows this more specific inference.
In our approach, we trained our classifiers on correct trials and tested them on both correct and miss trials. Crucially, we tested the trained classifiers only on unseen data for both correct and miss trials. Specifically, when testing the classifiers, we used only the correct trials which were not used in the training phase. Therefore, there is no artefactual reason that the testing trials should be more similar to the training-phase trials for the correct compared to miss trials; the decoding prediction works because the correct testing trials have more similar neural representations to the correct training trials than the miss testing trials do.
We have added an explanation of the difference between approaches to the manuscript to ensure this point is clearer to the reader.
“Our method is different from the conventional method of error prediction, in which people directly discriminate correct and miss trials by feeding both types of trials to classifiers in the training phase and testing the classifiers on the left-out correct and miss trials (e.g., Bode and Stahl, 2014). Our method only uses correct trials for training, which makes its implementation plausible for real-world situations, since we usually have plenty of correct trials and only a few miss trials (i.e., cases when the railway controller diverts the trains correctly vs. misses and a collision happens). Moreover, it allows us to directly test whether the neural representations of correct trials contain information which is (on average) less observable in miss trials. We statistically compared the two types of trials and showed a large advantage in the level of information contained at the individual-trial level in correct vs. miss trials.”
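The logic of the approach (train only on correct trials, then test on unseen correct trials and on miss trials) can be illustrated with a deliberately simplified toy model. This is not the authors' MEG pipeline: it uses a one-dimensional synthetic "neural" feature and a nearest-centroid classifier, purely to show why held-out correct trials generalise while weakly represented miss trials fall toward chance.

```python
import random
import statistics

random.seed(0)

# Synthetic trials: (feature, class). Correct trials carry a strong class
# signal; miss trials a weak one. Classes 0/1 stand in for two conditions.
def make_trials(n_per_class, signal):
    trials = [(random.gauss(cls * signal, 1.0), cls)
              for cls in (0, 1) for _ in range(n_per_class)]
    random.shuffle(trials)
    return trials

correct_trials = make_trials(200, signal=2.0)
miss_trials = make_trials(200, signal=0.2)

# Train a minimal nearest-centroid classifier on HALF of the correct
# trials only, mirroring the train-on-correct-only scheme.
train = correct_trials[:200]
held_out_correct = correct_trials[200:]
centroids = {c: statistics.mean(x for x, cc in train if cc == c)
             for c in (0, 1)}

def classify(x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(trials):
    return sum(classify(x) == c for x, c in trials) / len(trials)

acc_correct = accuracy(held_out_correct)  # unseen correct trials
acc_miss = accuracy(miss_trials)          # all miss trials
```

With these synthetic parameters, the classifier generalises well to unseen correct trials but performs near chance on miss trials, mirroring the inference the authors draw: the testing trials are unseen in both cases, so any advantage for correct trials reflects their representations, not training-set overlap.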
5. By accumulating classifiers across time, it looks like classifier prediction improves closer to deflection. However, this could also be due to the fact that the total amount of information provided to the classifier increased. Is there a way to control for the total amount of information at different timepoints (e.g., by using a trailing window lag rather than accumulation), or contrast the classifier that derives from accumulating information with the classifier trained moment-by-moment?
Although it is likely that some of the increase in information reflects increased attention as the dot approaches the object, we think the answer is primarily yes: the improved prediction power closer to the central object is likely due to the accumulation of information (Figure 6D), and it would decline if we used only a subsample of the accumulated information. We took this approach because the main purpose of our prediction analysis was to predict the outcome of the trial with maximal accuracy. We added the following sentence to the manuscript to clarify the point.
“In this analysis, the goal was to maximise the accuracy of predicting behaviour. For that purpose, we accumulated classification accuracies along the distances. Moreover, as each classifier performs a binary classification for each testing dot at each distance, the accumulation of classification accuracies also prevented spurious classification accuracies from driving the decision, providing smooth “accumulated” accuracies for predicting behaviour.”
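The accumulation itself can be sketched in a few lines: per-distance binary outcomes are folded into a running accuracy, so a single spurious misclassification cannot drive the final decision. The outcome values below are illustrative only.

```python
# Minimal sketch: accumulate binary classification outcomes across the
# 15 distances into a running ("accumulated") accuracy.
def accumulated_accuracy(outcomes):
    # outcomes: 1 = dot classified correctly at that distance, 0 = not
    running, correct_so_far = [], 0
    for i, outcome in enumerate(outcomes, start=1):
        correct_so_far += outcome
        running.append(correct_so_far / i)
    return running

# A hypothetical trial: mostly correct classifications across 15 distances.
trace = accumulated_accuracy([1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1])
```

The running average smooths out isolated errors, which is the point made in the quoted passage: the decision rests on accumulated evidence rather than on any single distance.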
6. Predicting miss trials: The implicit assumption here is that there is "less representation" for miss trials compared to correct trials (e.g., of distance to object). But even for miss trials, the representation is significantly above chance. However, maybe the lower accuracy for the miss trials resulted from, on average, more trials in which the target was not represented at all, rather than a weaker representation across all trials. This would call into question the interpretation of a decline in coding. In other words, on a single trial, a representation might only be present (but could result in a miss for other reasons) or not present (which would be the case for many miss trials), and the lower averages for misses would then be the result of more trials in which the information was completely absent.
It could be that the results of the subsequent analysis (predicting misses and correct responses before they occur) are in conflict with this more pessimistic interpretation. If we understand this correctly, here the classifier predicts Distance to Object for each individual trial, and Figure 6B shows that while there is a clear difference between the correct and miss trials, the latter can still be predicted above chance level but never exceeds the threshold? If this is true for all single trials, this would indeed suggest a weak but "unused" representation on miss trials. But for this, the authors need to show how many of the miss trials per participant had a chance-level accuracy (i.e. might be truly unrepresented), and how many were above chance but did not exceed the threshold (i.e. might have been "less represented").
This is a really good point. Yes, in principle, the average decoding levels could be composed of ‘all or none’ misses or graded drops in information, and it is possible that on some miss trials there is a good representation but the target is missed for other reasons (e.g., a response-level error). As neural data are noisy and multivariate decoding needs cross-validation across subsamples of the data, and because each trial, at each distance, can only be classified correctly or incorrectly by a two-way classifier, we tend not to compare the decoding accuracies in a trial-by-trial manner, but rather on average (Grootswagers et al., 2017). However, if we look at an individual dataset and examine all the miss trials (averaged over the 15 distances and cross-validation runs) in our distance-to-object decoding, we can get some insights into the underlying distributions.
We show the distribution of individual trial decoding accuracies for all participants on correct (Figure 5—figure supplement 1A) and miss (Figure 5—figure supplement 1B) trials. The vertical axis shows the number of trials in each accuracy bin of the histogram and the horizontal axis shows the decoding accuracy for each trial obtained by averaging its decoding accuracies over cross-validation folds (i.e., done by subsampling the correct trials into train and test sets and repeating the procedure until all correct trials are used once as training data and once as testing data) and distances. We calculated the percentage of miss trials for which there was strong evidence (BF>10) for above-chance decoding accuracies. To do this, we generated a null distribution with 100*N trials, where we produced 1000 decoding accuracies for each trial by randomizing the labels of distances for that trial. We used the same procedure for Bayes analyses as detailed in the manuscript.
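The label-shuffling null can be sketched as follows. The labels and trial below are synthetic (the real analysis shuffled the distance labels of each trial and used the Bayes procedure described in the manuscript); the sketch only shows how repeated shuffling yields a chance-level reference distribution for a single trial's accuracy.

```python
import random

random.seed(1)

# Synthetic two-way labels at 15 distances, and a perfectly decoded trial.
true_labels = [0, 1] * 7 + [0]
predicted = list(true_labels)

def shuffled_accuracy(labels, preds):
    # One null sample: shuffle the labels, recompute single-trial accuracy.
    shuffled = labels[:]
    random.shuffle(shuffled)
    return sum(p == t for p, t in zip(preds, shuffled)) / len(shuffled)

null = [shuffled_accuracy(true_labels, predicted) for _ in range(1000)]
observed = sum(p == t for p, t in zip(predicted, true_labels)) / len(true_labels)
```

A trial counts as showing evidence of information when its observed accuracy sits far in the upper tail of this null distribution, which is centred near chance.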
The histograms of individual miss trials suggest a single distribution centred around chance decoding or slightly above (Figure 5—figure supplement 1B). This means that on an individual miss trial, there may be higher or lower decoding, but it is nowhere near the consistent high decoding levels we see for correct trials (Figure 5—figure supplement 1A). This seems consistent with an interpretation that on (most) miss trials, information is less present than on correct trials. Presumably it is this difference that allows our second level classifier to successfully predict the behavioural outcome on >80% of trials.
In contrast, for the correct trials, all trials (100%) for all subjects showed above-chance (>50%) decoding accuracy, with average accuracies around 80%. This suggests that, as opposed to miss trials, in which some trials showed some distance information and some did not, on correct trials all trials reflected the task-related information.
In order to quantify the overlap between correct and miss trials at the individual-trial level (as opposed to the group-level Bayes factor analysis in the manuscript (Figure 5)), we calculated Cohen’s d (Cohen, 1969) between the two distributions. As the results show (Figure 5—figure supplement 2C), there is a large difference (d > 2) between the two distributions for every participant and condition. The d values were mostly higher than 3, which corresponds to less than 7% overlap between the decoding accuracies obtained for the correct and miss trials.
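For clarity, a minimal sketch of the pooled-standard-deviation form of Cohen’s d we used, with illustrative (not actual) per-trial accuracy values:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation (Cohen, 1969)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Illustrative use: per-trial decoding accuracies for correct vs. miss trials
correct = [78, 80, 82, 79, 81]   # centred near 80%
miss = [48, 50, 52, 49, 51]      # centred near chance (50%)
d = cohens_d(correct, miss)      # well above the d > 2 "large difference" criterion
```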
Overall, this additional analysis demonstrates that although the miss trials vary somewhat in levels of information (as measured by decoding), with some trials representing the distance information and others not representing it at all, very few miss trials are as informative as the least informative correct trials (the distributions overlap by less than ~7%). The miss trials with high decoding are presumably those on which our second-level classifier makes the wrong prediction. We have revised the description in the manuscript to make this clearer and added the following paragraph and analyses.
“In principle, the average decoding levels could be composed of ‘all or none’ misses or graded drops in information, and it is possible that on some miss trials there is a good representation but the target is missed for other reasons (e.g., a response-level error). As neural data are noisy and multivariate decoding needs cross-validation across subsamples of the data, and because each trial, at each distance, can only be classified correctly or incorrectly by a two-way classifier, we tend not to compare the decoding accuracies in a trial-by-trial manner, but rather on average (Grootswagers et al., 2017). However, if we look at an individual dataset and examine all the miss trials (averaged over the 15 distances and cross-validation runs) in our distance-to-object decoding, we can get some insights into the underlying distributions (Figure 5—figure supplement 1). Results showed that, for all participants, the distribution of classifier accuracies for both correct and miss trials followed approximately normal distributions. However, while the distribution of decoding accuracies for correct trials was centred around 80%, the decoding accuracies for individual miss trials were centred around chance-level. We evaluated the difference in the distribution of classification accuracies between the two types of trials using Cohen’s d. Cohen’s d was approximately 3 or higher for all participants and conditions, indicating a large (d > 2; Cohen, 1969) difference between the distribution of correct and miss trials. Therefore, although the miss trials vary somewhat in levels of information, very few (< 7%) miss trials are as informative as the least informative correct trials. These results are consistent with the interpretation that there was less effective representation of the crucial information about the distance from the object preceding a behavioural miss.”
7. The relationship between the vigilance decrement and error prediction. Is vigilance decrement driving the error prediction? That is, if errors increase later on, and the signal goes down, then maybe the classifier is worse. Alternatively, maybe the classifier predictions do not necessarily monotonically decrease throughout the experiment. Is the classifier equally successful at predicting errors early and late?
Thanks for the nice question. Our error prediction results were initially obtained from the whole dataset, including all blocks of trials. To answer the reviewer’s question, we now split the blocks into the first 5 (early) and the last 5 (late) blocks and repeated the error prediction procedure on the early and late blocks separately. To remove the potential confound of the number of trials, we equalised the number of trials across the early and late time windows. As decoding of distances decreased along the time course of the experiment on correct trials (Figure 3), we would predict that there should be less difference in decoding of correct and miss trials in the later vs. earlier blocks. The new analysis bears this out: prediction accuracy for the trial outcome (correct vs. miss) declined in later stages of the experiment (moderate to strong evidence (BF > 3) for higher predictability of the trial outcome in early vs. late blocks). Importantly, even with this decline in prediction accuracy, it is still possible to predict the behavioural outcome in the late blocks with well above-chance accuracy.
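A schematic of the trial-equalisation step is sketched below; the function name and trial counts are hypothetical, and stand in for whatever trial structure the actual datasets have:

```python
import numpy as np

def equalise_trials(early_trials, late_trials, seed=0):
    """Randomly subsample the larger set so that the early and late windows
    contribute the same number of trials to the error-prediction analysis."""
    rng = np.random.default_rng(seed)
    n = min(len(early_trials), len(late_trials))
    pick = lambda trials: [trials[i] for i in rng.choice(len(trials), n, replace=False)]
    return pick(early_trials), pick(late_trials)

# Illustrative use with dummy trial indices for 5 early vs. 5 late blocks
early = list(range(120))   # e.g., 120 usable trials in blocks 1-5
late = list(range(90))     # e.g., 90 usable trials in blocks 6-10
early_eq, late_eq = equalise_trials(early, late)
```

Sampling without replacement keeps each retained trial unique, so any early/late difference in prediction accuracy cannot be attributed to unequal amounts of training data.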
We have added these results to the supplementary material of the paper (Figure 6—figure supplement 1).
“The prediction of behavioural outcome (Figure 6) was performed using the data from the whole dataset. However, it is possible that the prediction would not be as accurate in later stages of the experiment (compared to the earlier stages) as the decoding performance of the distance information declined in general in later stages (Figure 3B). To test this, we performed the behavioural prediction procedure on datasets obtained from the first 5 (early) and the last 5 (late) stages of the experiment (Figure 6—figure supplement 1). There was strong evidence for a decline in the prediction power in the late vs. early blocks of trials. However, even with the decline in prediction accuracy, it is still possible to predict the behavioural outcome in the late blocks with well above-chance accuracy (up to 75%).”
8. When decoding distance, active decoding declines from early to late, even though performance does not decline (or even slightly improves from early to late). This discrepancy seems hard to explain. Is this decline in classification driven by differences in the total signal from early to late?
Thanks for the question. We explicitly define the vigilance effects as the difference between Active and Monitoring conditions to ensure that we are not interpreting general task effects like this one as vigilance decrements. This is important because otherwise effects that are not specific to maintaining vigilance (i.e., sustaining attention in the situation where only infrequent responses are necessary) could be misinterpreted. In this case, the decline could be driven by a number of general factors that are not specific to vigilance, such as fatigue, but also equipment effects like fluctuations in the MEG recording system’s baseline (e.g., due to warming up). Our crucial comparisons for both behaviour and neural correlates are the increase in miss rate and reaction time for Monitoring vs. Active from early to late blocks, and the greater decline in distance decoding information (from early to late blocks) for Monitoring than for Active (Figure 3B; interaction between Target Frequency and Time on the task). We have now added the following sentence to the Discussion and amended the manuscript to ensure this is clear.
“Note that our vigilance decrement effects are defined as the difference between Active and Monitoring conditions, which allows us to be sure that we are not interpreting general task (e.g., participant fatigue) or hardware-related effects as vigilance decrements. For example, the drop in decoding over time for both Active and Monitoring that is seen in Figure 3 might reflect some of the general changes in the characteristics of the recording hardware over the course of the experiment (e.g., the MEG system warming up), but our design allows us to dissociate these from the key vigilance effects we are interested in.”
9. Classifier performance was extremely high almost immediately after trial onset. Does the classifier perform at chance before the trial onset, or does this reflect sustained but not stimulus-specific information?
Thanks for pointing out that we were missing this information – yes, the classifier performs at chance in the pre-stimulus onset time. We have now added this to the modified figures in the revised manuscript.
10. The connectivity analysis appears to be just a correlation of decoding results between two regions of interest. This means, if one "region" allows for decoding the distance to the object, the other one does too. However, this alone does not equal connectivity. It could simply mean that patterns across the entire brain allow for decoding the same information. For example, it would not be surprising to find that both ROIs correlate more strongly for correct trials (i.e. the brain has obviously represented the relevant information) than for errors (i.e. the brain has failed to represent the information), without this necessarily being related to connectivity at all. The more parsimonious interpretation here is that information might have been represented across all channels at this time. The authors show no evidence that only these two (arbitrarily selected) "regions" encode the information while others do not. To show evidence for meaningful connectivity, (a) the spread of information should be limited to small sub-regions, and (b) the decoding results in one "region" should predict the results in another region in time (as for DCM).
Thanks for the important point. Actually, our connectivity analysis is not simply a correlation of magnitudes of decoding accuracy between two regions of interest, but rather a correlation of the patterns of decoding accuracies across conditions (i.e., across distances). Our approach follows the concept of informational connectivity (explained in more detail below) which measures how much similarity in information coding there is between two brain areas across conditions, which is interpreted as reflecting their potential connectivity. Therefore, rather than the average magnitude of decoding accuracy (high vs. low), the connectivity is driven by the correlation between the patterns of decoding accuracies either across time (Coutanche and Thompson-Schill, 2013) or across conditions (Kietzmann et al., 2018). We used the latter (i.e., RDMs) here to study connectivity. This is a critical difference because high classification values in two regions will not necessarily correspond to high connectivity in our analysis.
Accordingly, the difference in classification levels between ‘correct’ and ‘miss’ trials should not determine the connectivity – it is more the consistency of the pattern (see the example below). Our connectivity relies on (Spearman’s) correlation (which normalizes absolute amplitude), and as such it is unaffected by absolute decoding values in the pairs of input vectors: connectivity will be high only if the two areas encode the information across conditions similarly, rather than if they code the information very efficiently across all conditions (i.e., maximum decoding values). For example, assume that we have four brain areas A, B, C and D with (simplified and vectorized) distance RDMs (as in our work) with decoding values of [95 91 97 92], [96 98 99 94], [57 51 55 54] and [58 52 59 55], respectively. The inter-area correlation/connectivity matrix would then be:

| Area | A | B | C | D |
|---|---|---|---|---|
| A | 1 | 0.4 | 0.8 | 1 |
| B | 0.4 | 1 | 0 | 0.4 |
| C | 0.8 | 0 | 1 | 0.8 |
| D | 1 | 0.4 | 0.8 | 1 |
Although mathematically our connectivity should be unaffected by absolute decoding values, we acknowledge that potentially noisier patterns of distance information in the brain on miss vs. correct trials could result in apparently lower connectivity for misses. We therefore added the following paragraph to the manuscript acknowledging this possibility:
“While our connectivity is unaffected by the absolute levels of information encoding in the brain on miss vs. correct trials, potentially noisier patterns of information encoding in miss (vs. correct) trials could result in the lower level of connectivity observed on miss (vs. correct) trials. Therefore, the lower level of connectivity for miss vs. correct trials observed here could result from the pair of regions representing two distinct sets of information (i.e., becoming in some sense less connected) or representing similar information distorted by a higher level of noise.”
The more parsimonious interpretation here is that information might have been represented across all channels at this time. The authors show no evidence that only these two (arbitrarily selected) "regions" encode the information while others do not. To show evidence for meaningful connectivity, (a) the spread of information should be limited to small sub-regions, and (b) the decoding results in one "region" should predict the results in another region in time (as for DCM).
a. Yes, it is possible that the whole brain may process the information with the same pattern of decoding but changing the ROIs to smaller ones would not rule out this potential scenario (which applies to all connectivity analyses, even the conventional ones). We avoid making claims about the spatial specificity of our connectivity effect, as we are using MEG (as reflected in the names we chose for the regions: peri-occipital and peri-frontal). Please note though that these sub-regions were not arbitrary, but rather based on areas known to be involved in vision and attention, and based on previous attention work which showed a flow of information across the two areas (Goddard et al., 2016; Goddard et al., 2019).
It is very important for the interpretation of our results that, rather than making any claims about the absolute existence or magnitude of potential connectivity in the brain, we compared our connectivity indices across conditions. In other words, we do not seek to test whether connectivity exists or not between our ROIs, but rather whether any such connectivity varies with our manipulations of vigilance. Therefore, even if the entire brain were responding similarly, the modulation of the connectivity metric is only explainable by the manipulations across our conditions.
b. We could not check the time course of our connectivity as in our previous work (Goddard et al., 2016; Karimi-Rouzbahani et al., 2020), because our distance information involves the whole trial and the direction information does not have enough conditions to make RDMs (please see the informational connectivity text below). Therefore, we clarified in the manuscript that:
“Informational connectivity, on the other hand, is measured either through calculating the correlation between temporally resolved patterns of decoding accuracies across a pair of areas (Coutanche and Thompson-Schill, 2013) or the correlation between representational dissimilarity matrices (RDMs) obtained from a pair of areas (Kietzmann et al., 2018; Goddard et al., 2016; Goddard et al., 2019; Karimi-Rouzbahani et al., 2019; Karimi-Rouzbahani et al., 2020). Either measure captures how much similarity in information coding there is between two brain areas across conditions, which is interpreted as reflecting their potential informational connectivity, and is less affected by absolute activity values than conventional univariate connectivity measures (Anzellotti & Coutanche, 2018).”
And added the following considerations to the methods:
“First, it reflects the similarity of the way a pair of brain areas encode “distance” information during the whole trial. This means that we could not use the component of time in the evaluation of our connectivity as we have implemented elsewhere (Karimi-Rouzbahani et al., 2019; Karimi-Rouzbahani et al., 2020). Second, rather than a simple correlation of magnitudes of decoding accuracy between two regions of interest, our connectivity measure reflects a correlation of the patterns of decoding accuracies across conditions (i.e., distances here). Finally, our connectivity analysis evaluates sensory information encoding, rather than other aspects of cognitive or motor information encoding, which might have also been affected by our experimental manipulations.”
11. The display of the results is very dense, and it is not always clear whether decoding for a specific variable was above chance or not. The authors often focused on relative differences, making it difficult to fully understand the meaning of the full pattern of results. The Bayes-factor plots in the decoding results figures are so cramped that it is very difficult to actually see the individual dots and to unpack all of this (e.g., Figure 3). Could this complexity be somehow reduced, maybe by dividing the panels into separate figures? The two top panels in Figure 3B should also include the chance level as in A. It looks like the accuracy is very low for unattended trials, which is only true in comparison to attended trials, but (as also shown in Supplementary Figure 1) it was clearly also encoded in unattended trials, which is very important for interpreting the results.
We have extensively revised our figures, and expanded the Bayes plots; we hope they are now clear. We have split the panels in figures into Active and Monitoring panels, added the chance level line, and the pre-stimulus decoding values. We also reduced the density of Bayes Factor dots by down-sampling, and improved their appearance using a log scale and colour coding.
Regarding the relative differences, our design focuses on these because this allows us to be more specific about the effects that reflect actual vigilance decrements. This differs from many vigilance studies, and provides the opportunity for more specific inference. We have ensured this is clearer in the revision.
We hope the revised text and figures enhance the interpretability of the relative differences.
12. While this is methodologically interesting work, there is no convincing case made for what exactly the contribution of this study is for theories of vigilance. It seems that the findings can be reduced to that a lack of decodability of relevant target features from brain activity predicts that participants will miss the target. This alone, however, does not seem to be very novel. Even if the issues above are addressed, the study only demonstrates that with less attention to the target, there is less evidence of representations of the relevant features of targets in the brain. The authors also find the expected decrements for rare targets and when participants do not actively monitor the targets. How do these findings contribute to "theories of vigilance", as claimed by the authors?
This work makes three clear contributions to vigilance research. First, we present a novel multiple-object-monitoring paradigm that clearly evokes specific vigilance decrements in a context that mimics real-world monitoring scenarios. Our design controls for general experiment-level effects that are not specific to vigilance conditions, which, as mentioned above, is surprisingly rare in the vigilance literature (a point we now make clearer in the revision). This is an important contribution to the field as it provides a tool for further studies and allows us to address our hypotheses in a new and more realistic context.
Second, we showed that behavioural vigilance decrements are reflected in the neural representation of information. Previous studies have only provided coarse-grained correlates of vigilance decrements, such as increases in α-band power (Kamzanova et al., 2014; Mazaheri et al., 2009; O’Connell et al., 2009). Here, we show that the neural representation of task-related information (i.e., distance) is affected by target frequency. While we agree that this is clearly a plausible prediction, it is a major step forward for a field that has had limited success in exploring specific neural correlates.
Third, we showed that change in neural representation of information between miss trials and correct trials can be used to predict the behavioural outcome on a given trial. This involves new methods that will be widely applicable, contributes to the global endeavour to link brain and behaviour, and provides a foundation for further research into potential applications for industries where detecting lapses of attention (as measured by a drop in specific task-relevant information) could prevent tragic accidents, such as rail and air traffic control.
Although we mentioned the major theories of vigilance in the paper, the theories themselves are underspecified, making it difficult to directly test them. We therefore deliberately avoided making strong claims about how our results falsified (or otherwise) the theories: they simply do not contain enough specificity to do this. Nonetheless, to avoid the implication that we provide a direct test of these theories, we removed the relevant paragraph in the Discussion and carefully revised the paper to be explicit that the goal is not to adjudicate between the descriptive cognitive theories but rather to (a) provide a specific tool for studying vigilance in situations that mimic real-world challenges; (b) understand what changes in the information encoded in the brain when vigilant attention lapses; and (c) develop a method that can use neural data to predict behavioural outcomes.