Global and national surveillance of SARS-CoV-2 epidemiology is mostly based on targeted schemes focused on testing individuals with symptoms. These tested groups are often unrepresentative of the wider population and exhibit test positivity rates that are biased upwards compared with the true population prevalence. Such data are routinely used to infer infection prevalence and the effective reproduction number, R_{t}, which inform public health policy. Here, we describe a causal framework that provides debiased fine-scale spatiotemporal estimates by combining targeted test counts with data from a randomized surveillance study in the United Kingdom called REACT. Our probabilistic model includes a bias parameter that captures the increased probability of an infected individual being tested, relative to a non-infected individual, and transforms observed test counts into debiased estimates of the true underlying local prevalence and R_{t}. We validated our approach on held-out REACT data over a 7-month period. Furthermore, our local estimates of R_{t} are indicative of 1-week- and 2-week-ahead changes in SARS-CoV-2-positive case numbers. We also observed increases in estimated local prevalence and R_{t} that reflect the spread of the Alpha and Delta variants. Our results illustrate how randomized surveys can augment targeted testing to improve statistical accuracy in monitoring the spread of emerging and ongoing infectious diseases.

A causal debiasing framework provides accurate estimates of local prevalence and effective reproduction number for surveillance of SARS-CoV-2 cases using data from randomized testing schemes to model ascertainment bias in targeted subpopulation data.

These authors contributed equally: George Nicholson, Brieuc Lehmann.

The spread of the new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the ensuing outbreaks of coronavirus disease 2019 (COVID-19) have placed a substantial burden on public health in the United Kingdom. As of 14 July 2021, the number of people recorded to have died in the United Kingdom within 28 days of a positive SARS-CoV-2 test was 128,530 (refs. ^{1,2}). In response to the ongoing epidemic, the UK government has implemented a number of non-pharmaceutical interventions to reduce the transmission of SARS-CoV-2, ranging from localized measures, such as the closure of bars and restaurants, to full national lockdowns^{3}. The localized measures have been employed through a regional tier system, with lower tier local authorities (LTLAs) being placed under varying levels of restrictions according to data such as the number of positive polymerase chain reaction (PCR) tests returned in the area over a 7-day interval (local weekly positive tests)^{4}. Following a third national lockdown that began on 6 January 2021, the United Kingdom has undergone a staged relaxation of restrictions, with lockdown rules ending on 19 July 2021 (ref. ^{5}).

In the United Kingdom, there are two major ongoing studies that undertake randomized survey testing to provide insight into the prevalence of SARS-CoV-2. Since April 2020, the Office for National Statistics (ONS) COVID-19 Infection Survey (CIS) has tested a random sample of people living in the community, with longitudinal follow-up^{6}. The survey is designed to be representative of the UK population, with individuals aged two years and over in private households randomly selected from address lists and previous ONS surveys, although it does not explicitly cover care homes, the sheltering population, student halls or individuals currently in hospital. The REal-time Assessment of Community Transmission (REACT) study is a second nationally representative prevalence survey of SARS-CoV-2, based on repeated cross-sectional samples from a representative subpopulation defined via (stratified) random sampling from the National Health Service patient register of England^{7,8}. Importantly, both surveys recruit participants regardless of symptom status and therefore largely avoid issues arising from ascertainment bias when estimating prevalence. The ONS CIS uses multilevel regression and post-stratification to account for any residual ascertainment effects due to non-response^{6}, whereas the REACT study uses survey weights for this purpose.

While randomized surveillance testing readily provides an accurate statistical estimate of the prevalence of PCR positivity, precision can be low at finer spatiotemporal scales (for example, at the LTLA level), even in large studies such as the ONS CIS and REACT surveys. Our major goal here is to unlock the information in non-randomized testing under arbitrary, unknown ascertainment bias. Although we expect the methods to apply broadly, here we focus on Pillar 1 and Pillar 2 (Pillar 1+2) PCR tests conducted in England between 31 May 2020 and 20 June 2021 (lateral flow device (LFD) tests are not included; further details are provided in ref. ^{9}). Pillar 2 tests comprise “swab testing for the wider population”^{9}. Pillar 1+2 testing has more capacity than the randomized programmes, but the protocol incurs ascertainment bias because testing is directed at those at increased risk of being infected, such as frontline workers, contacts traced to a COVID-19 case or the subpopulation presenting with COVID-19 symptoms, such as loss of taste and smell^{9}. Hence, raw prevalence estimates from Pillar 1+2 data (as a proportion of the tested population) will tend to be biased upwards and cannot be used directly to estimate the unknown infection rate in a region. In contrast, as a proportion of the entire population, the bias is downwards, as not all individuals with infection in the area are captured. Furthermore, the degree of upward bias may be influenced by overall testing capacity and uptake. In addition, the raw prevalence estimates tend not to capture asymptomatic infection, even though there is evidence that asymptomatic individuals can contribute to viral transmission^{10,11}.
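To make the direction of this bias concrete, here is a minimal sketch under simplifying assumptions that are ours rather than the study's model: an infected individual's odds of being tested are exp(δ) times those of a non-infected individual, and only a small fraction of the population is tested, so that this odds ratio approximately equals the ratio of testing probabilities. Expected test positivity is then approximately exp(δ)π / (exp(δ)π + 1 − π) for true prevalence π. The function names and example values (π = 1%, δ = 3) are hypothetical.

```python
import math


def expected_positivity(prevalence: float, delta: float) -> float:
    """Approximate positivity among those tested when infected people have
    exp(delta) times higher odds of being tested (assumes a small sampling
    fraction, so odds ratio ~ ratio of testing probabilities)."""
    omega = math.exp(delta)
    return omega * prevalence / (omega * prevalence + 1.0 - prevalence)


def debias_positivity(positivity: float, delta: float) -> float:
    """Invert expected_positivity for a KNOWN delta. In practice delta is
    unknown and has to be estimated from external (e.g. randomized) data."""
    omega = math.exp(delta)
    return positivity / (omega * (1.0 - positivity) + positivity)


# Hypothetical example: 1% true prevalence, delta = 3 (odds ratio e^3, about 20).
p = expected_positivity(0.01, 3.0)
print(f"expected positivity: {p:.3f}")  # far above the true 1% prevalence
print(f"recovered prevalence: {debias_positivity(p, 3.0):.4f}")
```

Because the inversion needs δ, which is unknown in practice, raw positivity cannot be debiased on its own; some external information about the bias, such as that provided by randomized surveillance data, is required.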

Combining data from multiple surveillance schemes can improve estimates for prevalence. For example, Manzi et al.^{12} incorporated information from multiple, biased, commercial surveys to provide more accurate and precise estimates of smoking prevalence in local authorities across the East of England. A number of geostatistical frameworks for infectious disease modelling based on multiple diagnostic tests have been developed^{13–15}. These accommodate different sources of heterogeneity among the tests to deliver more reliable and precise inferences on disease prevalence.

To understand the ascertainment bias problem and to enable a statistical approach to correction, it is helpful to consider a simplified causal model^{16,17} for Pillar 1+2 data. This is represented by a directed acyclic graph (DAG), shown in Fig.

In addition to prevalence, there are a number of epidemiological parameters that may be useful for informing localized non-pharmaceutical interventions. One variable of particular interest is the (time-varying) effective reproduction number R_{t}, which is defined roughly as the average number of secondary infections caused by an infectious individual. When R_{t} > 1, the epidemic will continue to grow. The current pandemic has spurred the development of models that incorporate multiple sources of data to estimate important epidemiological parameters (see Supplementary Table 1 and refs. ^{18–25}, and refs. ^{26,27} for reviews), many of which have a particular focus on R_{t}.

Within this urgent and fast-developing area of research, it is important to define clearly the aspects in which our method contributes. First, we have developed methods to infer unbiased local prevalence from targeted testing data. This is important in its own right because being able to estimate local prevalence accurately from targeted testing data adds an important facet to existing COVID-19 monitoring capabilities. Here, we focus on weekly period prevalence and explicitly target the number of infectious individuals via a correction to the estimated PCR-positive numbers. Second, our method outputs bias-adjusted cross-sectional prevalence likelihoods, that is, for each week t, the likelihood of local prevalence given the numbers of positive and total targeted tests in that week. This allows prevalence information from targeted data to be embedded coherently and modularly into complex spatiotemporal epidemiological models, including those synthesizing multiple data types. We exemplify this by implementing a susceptible-infectious-recovered (SIR) model around our ascertainment model likelihood. Third, our local ascertainment model is based on targeted testing data alone, with both the number of positive tests and the total number of tests being modelled. This has two important benefits: spatiotemporal variation in testing uptake and capacity is explicitly conditioned on (via the total number of tests), and differential test specificity and sensitivity can be naturally incorporated into our causal ascertainment model.

Figure _{t} ∣ _{t} of _{t}).

With reference to the causal DAG in Fig.

Our approach combines randomized surveillance data (REACT) and targeted surveillance data (Pillars 1+2) to infer the bias parameters δ_{1:T}, which are then applied to each constituent local region (LTLA) in the local prevalence analyses. Figure

Empirical Bayes (EB) prior on the bias parameters δ_{1:T}. Left: heterogeneous bias across the nine PHE regions. Right: London only. The thick curves show the prior means and the narrow curves show 95% credible intervals. Note that δ = 3 corresponds to the odds of being tested being e^{3} ≈ 20 times higher in individuals with infection than in individuals without infection.

Equipped with a coarse-scale (PHE-region level) EB prior on bias

The

The cross-sectional debiased likelihood can be introduced modularly into a wide variety of downstream epidemiological models. We illustrate this by using the likelihood as an input to a simple SIR epidemic model. A scatter plot shows estimated local prevalence against the estimated R_{t} at the most recent time point (the week of 20 June 2021), with each point corresponding to a single LTLA. The scatter plot provides a quick visual representation of regions where transmission rates and/or prevalence are relatively high. To illustrate, we label five LTLAs with high prevalence and/or R_{t} estimates. The estimated longitudinal prevalence and R_{t} traces for this subset of LTLAs indicate whether R_{t} is increasing or decreasing.

_{t} at a selection of LTLAs. The vertical line and horizontal line in _{t} = 1; when _{t} > 1, the number of cases occurring in a population will increase. In

A fortnightly sequence of maps shows the estimated local prevalence and R_{t}, with each LTLA coloured according to its estimated prevalence or R_{t}. Zoom-in boxes display the local fine-scale structure for London.

Estimated local prevalence and R_{t} in England from 13 September 2020 to 20 June 2021.

A striking feature of the maps in Fig. ^{28}. Similarly, the increase in R_{t} from May 2021 onwards is in accordance with the spread of the Delta VoC 21APR-02 (lineage B.1.617.2), which is estimated to have a reproduction number approximately 60% higher than that of the Alpha VoC^{29}.

Similar to a previous study^{28}, we characterized the relationship between the estimated local R_{t} and the frequency of Alpha VoC 202012/01, as approximated by the frequency of S-gene target failure (SGTF)^{30}. We mapped SGTF frequency alongside estimated local prevalence and R_{t} from mid-November 2020 to mid-December 2020. The increase in frequency of the VoC was initially isolated to the South East but then spread outwards, accompanied by a corresponding increase in both estimated local prevalence and R_{t}. We observe a strong positive association between the local VoC frequency and estimated local R_{t}, which is consistent with the increased transmissibility of this VoC identified in ref. ^{28}.

Maps of estimated local prevalence (left), estimated local R_{t} (middle) and frequency of SGTF (right), and scatter plots of SGTF frequency against estimated R_{t} (far right). Grey-coloured LTLAs denote missing data.

We performed a similar analysis for the Delta VoC 21APR-02 using data provided by the Wellcome Sanger Institute’s Covid-19 Genomics Initiative^{31}. Maps of Delta variant frequency, estimated local prevalence and R_{t} from the end of April 2021 to the start of June 2021 are shown in Extended Data. Over this period the Delta VoC became the dominant variant and, in contrast to the Alpha VoC, its spread was not isolated to a single region of England. We again observe a strong positive association between the local VoC frequency and estimated local R_{t}. A simple linear regression of R_{t} against Delta frequency for the week of 23 May 2021 indicated an increase in transmissibility of 0.55 (0.39–0.71) due to the Delta VoC, which is in accordance with estimates obtained in ref. ^{29}.

We assessed the performance of debiased fine-scale (LTLA-level) prevalence estimates by measuring how well they predict LTLA-level REACT data. The validation is best described in terms of coarse-scale REACT training data and contemporaneous fine-scale REACT test data. The training data are REACT PHE-region-level counts and Pillar 1+2 LTLA-level counts of positive and total tests for the week at the centre of the REACT round to be predicted. The test data are REACT LTLA-level counts of positive and total tests aggregated across the relevant REACT sampling round. Figure

REACT and ONS CIS are among the most comprehensive randomized surveillance studies in the world. We therefore assessed how well the debiasing model holds up when only coarser-scale or more limited randomized testing data are available. First, to investigate the downstream effects of ultra-coarse-scale randomized surveillance data, we aggregated all REACT data to the national level, estimated the

R_{t} measures whether the number of infectious individuals in the population is increasing (R_{t} > 1) or decreasing (R_{t} < 1) at time point t. We compared local R_{t} estimates with the future change in local case numbers. For validation purposes, we performed one-step-ahead predictions and compared them with out-of-training-sample observed statistics (fold-change in raw case numbers from baseline). The results were stratified according to baseline case numbers, and we examined predictions 1 week and 2 weeks ahead. Each point corresponds to an (LTLA, week) pair, and the results are for the period 18 October 2020 to 20 June 2021. Across each of the six scenarios presented, there is strong evidence of an association between R_{t} and future change in case numbers (^{−16}). The strength of association between R_{t} and 1-week-ahead case numbers has Spearman’s
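A minimal sketch of this kind of check, assuming per-(LTLA, week) arrays of R_t estimates and of baseline and future case counts, computes the fold-change from baseline and Spearman's rank correlation with `scipy.stats.spearmanr`. The helper names, the pseudocount and the toy values are hypothetical, and the baseline threshold is only a simple stand-in for stratifying by baseline case numbers.

```python
import numpy as np
from scipy.stats import spearmanr


def fold_change(baseline_cases, future_cases, pseudocount=1.0):
    """Fold-change in raw case numbers from baseline. The pseudocount is an
    assumption that avoids division by zero in LTLAs with no cases."""
    baseline = np.asarray(baseline_cases, dtype=float)
    future = np.asarray(future_cases, dtype=float)
    return (future + pseudocount) / (baseline + pseudocount)


def rt_vs_future_change(rt_estimates, baseline_cases, future_cases, min_baseline=0):
    """Spearman's rho (and p-value) between R_t and fold-change in cases,
    restricted to (LTLA, week) pairs with baseline cases >= min_baseline."""
    rt = np.asarray(rt_estimates, dtype=float)
    base = np.asarray(baseline_cases, dtype=float)
    future = np.asarray(future_cases, dtype=float)
    keep = base >= min_baseline
    rho, p_value = spearmanr(rt[keep], fold_change(base[keep], future[keep]))
    return rho, p_value


# Toy (LTLA, week) pairs: R_t estimates and case counts one week apart.
rt = [0.8, 0.9, 1.0, 1.2, 1.5]
baseline = [100, 120, 80, 60, 40]
one_week_ahead = [85, 110, 82, 75, 70]
rho, p = rt_vs_future_change(rt, baseline, one_week_ahead)
print(f"Spearman's rho = {rho:.2f}")
```

A rank correlation is a natural choice here because fold-changes are heavily skewed and only a monotone, not linear, relationship with R_t is expected.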

We extracted estimates of R_{t} based on our debiasing model likelihood implemented within a standard SIR model and compared them with R_{t} estimates output by the Imperial College team’s Epidemia model, available at the Imperial College COVID-19 website^{32}. A cross-method comparison of longitudinal traces of R_{t} for a subset of LTLAs is shown in Extended Data Fig.

The current standard practice internationally is to summarize SARS-CoV-2 infection rates by counting the number of individuals testing positive in a local area over a period of time, typically 1 week. The resulting statistic, cases per 100,000, is used to characterize and monitor the spatiotemporal state of an epidemic alongside other epidemiological measures such as R_{t}. Problematically, however, cases per 100,000 are not straightforward to interpret, as the data are subject to a number of unknown biasing influences such as (1) variation in testing capacity, (2) ascertainment bias on who is (self-)selected to be tested and (3) imperfect sensitivity and specificity of diagnostic tests. These factors, among others, make it difficult to quantify the true underlying local incidence or prevalence of SARS-CoV-2 infection, which implicitly places the burden of adjusting for such biases on policymakers. To address this problem, we developed an integrative causal model that can be used to debias raw case numbers and accurately estimate the number of individuals with infection in a local area.

The flexible statistical framework allows simultaneous and coherent incorporation of a number of important features. First, it corrects for ascertainment bias that results from preferential testing based on symptom status or other confounders. It also accounts for variation in testing capacity by modelling the total number of tests conducted locally. Second, it can incorporate the use of different SARS-CoV-2 testing assays, such as LFD and PCR, including adjustment for assay-specific sensitivity and specificity. Third, it infers the number of infectious individuals, whereas PCR tests may also detect positive individuals at non-infectious stages. Finally, the model outputs week-specific debiased prevalence with uncertainty (via a marginal likelihood), which allows modular interoperability with other models. We illustrated this with an SIR epidemic model implementation that estimated local transmission rates while accounting for vaccine- and disease-induced immunity in the population. Our modelling work illustrates the benefits of having both a rolling randomized surveillance survey and targeted testing (for example, of frontline healthcare staff and symptomatic individuals). While targeted testing data are routinely collected internationally, the United Kingdom has led the way in introducing regular national randomized surveillance surveys such as REACT^{7,8} and ONS CIS^{6}. Ongoing international pandemic preparedness can benefit from sampling designs that combine random sampling with targeted testing so that the two can most powerfully complement and strengthen one another. Our model depends on the availability of randomized surveillance data. Future studies in other countries, in collaboration with local experts, could further validate the breadth of utility of our debiasing framework and show how it contributes to global public health responses.

Since randomized surveillance data are currently rare internationally, there would be utility in extending the causal framework to address situations where targeted testing is accompanied by semi-randomized data with a well-known selection process (such as routine tests for healthcare workers, in care homes or regular testing at schools). Extending the current framework would begin with careful empirical exploration of the relationship between test positivity rate in such semi-randomized settings and comparable local prevalence (for example, in relevant age strata). The wealth of data available in the United Kingdom provides a good starting point for such exploratory work, which can be used to develop more complex causal models transferable to new semi-randomized contexts.

The Alan Turing Institute Ethics Advisory Group provided guidelines for this study’s procedures and advised that Health Research Authority approval is not required for this research.

The primary target of inference is prevalence,

We applied the debiasing framework to test-count data aggregated into non-overlapping weeks. This has two clear advantages. First, by aggregating to weekly-level data, we obviate the need to account for weekday effects that can be driven, for example, by logistical constraints or by individuals self-selecting to submit samples more readily on some weekdays than on others. Second, fitting a weekly model is computationally less intensive than fitting a model to daily test counts. The potential disadvantage of binning data by week is that high-frequency effects cannot be detected. Although it is possible in principle to adapt the framework to analyse daily testing data, daily variation is likely to be confounded by weekday testing effects and so may be difficult to detect and interpret. Furthermore, while we use non-overlapping weekly data for model fitting, it is possible to output rolling weekly estimates, in particular to obtain prevalence estimates that are as up to date as the data permit. However, complete testing data are typically subject to a reporting lag of 4–5 days^{33}.
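The two aggregation schemes described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a daily series of test counts that starts on the first day of a week; the function names and toy counts are hypothetical.

```python
import numpy as np


def weekly_counts(daily_counts, n_days_per_week=7):
    """Sum daily counts into non-overlapping weeks; trailing days that do not
    fill a complete week are dropped."""
    daily = np.asarray(daily_counts)
    n_weeks = len(daily) // n_days_per_week
    return daily[: n_weeks * n_days_per_week].reshape(n_weeks, n_days_per_week).sum(axis=1)


def rolling_weekly_counts(daily_counts, n_days_per_week=7):
    """Rolling 7-day sums, one for each day from the 7th day onwards."""
    daily = np.asarray(daily_counts)
    window = np.ones(n_days_per_week, dtype=daily.dtype)
    return np.convolve(daily, window, mode="valid")


# Toy daily positive-test counts covering 15 days.
positives = [3, 5, 2, 4, 6, 1, 0, 7, 8, 2, 3, 1, 4, 5, 9]
print(weekly_counts(positives))          # two complete weeks; day 15 is dropped
print(rolling_weekly_counts(positives))  # nine overlapping 7-day windows
```

In practice, the most recent few days of a rolling window would be treated as incomplete because of the reporting lag noted above.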

Suppose that out of a total

PCR tests are sensitive and can detect the presence of SARS-CoV-2 both days before and weeks after an individual is infectious. It is usually desirable for prevalence to represent the proportion of a population that is infectious. We can obtain a likelihood for the number of infectious individuals

The conditional distribution ^{34} and the interval of PCR positivity in individuals with SARS-CoV-2 infection^{35}. More precisely, we specified the infectious time interval for an average individual with infection in the population to span 1–11 days after infection (the empirical range of the generation time reported in ref. ^{34}). We then calculated the posterior probability of a positive PCR occurring 1–11 days after infection (based on ref. ^{35}). We incorporated the effects of changing incidence in the calculations; this is important because, for example, if incidence is rising steeply, the majority of people who would test PCR positive in the population are those who were relatively recently infected. Full details can be found in
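The effect of changing incidence can be illustrated with a toy calculation. The sketch below assumes a hypothetical curve giving the probability of testing PCR positive d days after infection, and exponentially changing incidence with daily growth rate r; neither is the positivity curve or incidence model used in this study. It computes the fraction of currently PCR-positive individuals who were infected 1–11 days earlier, the window treated as infectious here.

```python
import numpy as np


def infectious_fraction_of_pcr_positives(p_pos_by_day, growth_rate,
                                         infectious_days=(1, 11)):
    """Fraction of PCR positives on day t infected within the infectious window
    (inclusive, in days since infection).

    p_pos_by_day[d]: ASSUMED probability of a positive PCR d days after infection.
    Incidence d days before t is taken to be proportional to
    exp(-growth_rate * d), i.e. exponential growth (growth_rate > 0) or
    decline (growth_rate < 0) up to day t."""
    p_pos = np.asarray(p_pos_by_day, dtype=float)
    days = np.arange(len(p_pos))
    relative_incidence = np.exp(-growth_rate * days)
    positives_by_age = relative_incidence * p_pos  # expected positives by days since infection
    lo, hi = infectious_days
    in_window = (days >= lo) & (days <= hi)
    return positives_by_age[in_window].sum() / positives_by_age.sum()


# Hypothetical positivity curve: rises over 3 days, then decays slowly.
days = np.arange(40)
p_pos = np.where(days < 3, days / 3.0, np.exp(-(days - 3) / 10.0))

for r in (-0.05, 0.0, 0.05):
    frac = infectious_fraction_of_pcr_positives(p_pos, r)
    print(f"growth rate {r:+.2f}/day: {frac:.2f} of PCR positives infectious")
```

With rising incidence, more of the PCR-positive pool consists of recent infections, so the infectious fraction increases, which matches the reasoning in the text.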

In contrast to the randomized surveillance likelihood in equation (

We introduce the following parameters:

The likelihood in equation (_{β} denoting the number of false-negative test results. An analogous adjustment can be made to the randomized surveillance likelihood in equation (

We leveraged spatially coarse-scale randomized surveillance data to specify an EB prior on bias parameters _{j}). We explicitly use the superscripts LTLA (_{j}) in step 4 below, where notation from both coarse and fine scale appear together. All quantities in steps 1–3 are implicitly superscripted (_{j}), but these are suppressed for notational clarity. For computational efficiency, we handle prevalence in a reduced-dimension space of bins as described in

Infer prevalence from unbiased testing data. At a coarse geographic level (PHE region _{j}), estimate prevalence from randomized surveillance data _{t} of _{t}. Represent the posterior at time

Learn _{t} from accurate prevalence. At a coarse geographic level, for each _{t} by coupling biased data _{t} of _{t} with accurate prevalence information _{t} fixed at

Specify smooth EB prior on δ_{1:T}. A smooth prior on δ_{1:T} is specified as follows:_{δ}) imparts a user-specified degree of longitudinal smoothness, thereby sharing information on _{t}, in the absence of random surveillance data, is encapsulated in a Gaussian with large variance _{δ}) corresponds to a stationary autoregressive, AR(1), process of the form_{T×T}) having elements

Infer cross-sectional local prevalence from biased testing data. At a fine-scale geographic level (LTLA _{j}), having observed
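The bias-coupling, smoothing and local-inference steps above can be sketched under explicit assumptions. The sketch below assumes that the number of positive targeted tests follows Fisher's noncentral hypergeometric distribution, in which an infected individual's odds of being tested are exp(δ) times those of a non-infected individual; that the smooth prior on δ_{1:T} has a stationary AR(1) covariance sd²·ρ^|i−j|; and that the local likelihood for the number infected is obtained by integrating δ_t over an evenly spaced grid against its Gaussian marginal prior. All function names, grids and numbers are hypothetical; this is an illustration of the technique, not the authors' implementation.

```python
import numpy as np
from scipy.special import gammaln, logsumexp
from scipy.stats import norm


def log_choose(n, k):
    """Log binomial coefficient log C(n, k), vectorized over k."""
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)


def fnch_logpmf(k, pop_size, n_infected, n_tested, delta):
    """Log P(k positives | n_infected, n_tested, delta) under Fisher's
    noncentral hypergeometric distribution with odds ratio exp(delta).
    Vectorized over an array of delta values."""
    delta = np.atleast_1d(np.asarray(delta, dtype=float))
    lo = max(0, n_tested - (pop_size - n_infected))
    hi = min(n_tested, n_infected)
    if not lo <= k <= hi:
        return np.full(delta.shape, -np.inf)
    support = np.arange(lo, hi + 1)
    base = (log_choose(n_infected, support)
            + log_choose(pop_size - n_infected, n_tested - support))
    log_terms = base[None, :] + delta[:, None] * support[None, :]
    return log_terms[:, k - lo] - logsumexp(log_terms, axis=1)


def ar1_covariance(n_weeks, sd, rho):
    """Covariance matrix of a stationary AR(1) process: sd^2 * rho^|i - j|."""
    idx = np.arange(n_weeks)
    return sd**2 * rho ** np.abs(idx[:, None] - idx[None, :])


def local_log_likelihood(k, n_tested, pop_size, infected_grid,
                         delta_mean, delta_sd, n_delta=41):
    """Log-likelihood of each candidate number infected in one LTLA-week, with
    delta integrated out numerically against a N(delta_mean, delta_sd^2) prior
    (e.g. the week-t marginal of a Gaussian prior on delta_{1:T})."""
    deltas = delta_mean + delta_sd * np.linspace(-4.0, 4.0, n_delta)
    log_w = norm.logpdf(deltas, delta_mean, delta_sd)
    log_w -= logsumexp(log_w)  # normalized quadrature weights on the delta grid
    return np.array([
        logsumexp(fnch_logpmf(k, pop_size, n_inf, n_tested, deltas) + log_w)
        for n_inf in infected_grid
    ])


# Hypothetical LTLA-week: 100,000 residents, 2,000 targeted tests, 120 positive.
grid = np.arange(200, 2001, 50)  # candidate numbers of infected residents
ll = local_log_likelihood(120, 2_000, 100_000, grid, delta_mean=2.5, delta_sd=0.3)
print("most likely number infected:", grid[np.argmax(ll)])

# Hypothetical smooth prior over 5 weeks: sd 0.5, lag-1 correlation 0.9.
prior_cov = ar1_covariance(n_weeks=5, sd=0.5, rho=0.9)
```

Setting δ = 0 recovers the ordinary (central) hypergeometric distribution, that is, untargeted random testing, which is a useful sanity check on any implementation of this likelihood.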

The methods can be adapted in a straightforward manner to the situation in which the randomized surveillance study uses a different assay to the targeted testing. For a concrete example, we could use REACT PCR prevalence posterior _{t} of _{t}. Equation (

The cross-sectional analysis described above in “_{t}.

We illustrate this via a Bayesian implementation of a stochastic epidemic model whereby individuals become immune through population vaccination and/or exposure to COVID-19 (Supplementary Fig. ^{36}, chapter 3). Details of the transition probability calculations are given in the

We place priors on ^{+} measured as a proportion of the population; this proportion then gets mapped to prevalence intervals on subpopulation counts as described in “Interval-based prevalence inference—set-up and assumptions” in the _{t} at each time point _{t}, for example a Uniform(0.5, 2.5).
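One way to simulate a model of this kind is sketched below. It assumes a discrete-time chain-binomial SIR model with weekly steps, an externally supplied number of weekly vaccinations that move susceptible individuals into the immune compartment, and transmission driven by a reproduction parameter r_t drawn from a Uniform(0.5, 2.5) prior, as in the example above. The recovery hazard, the infection-hazard parameterization and all names are assumptions for illustration, not the authors' specification.

```python
import numpy as np


def simulate_sir(pop_size, n_infectious0, n_immune0, r_by_week, vaccinated_by_week,
                 recovery_rate=1.0, seed=0):
    """Discrete-time chain-binomial SIR with vaccination (weekly steps).

    r_by_week[t]: transmission parameter beta_t / recovery_rate, so that the
      effective reproduction number in week t is r_by_week[t] * S / pop_size.
    vaccinated_by_week[t]: susceptibles newly immunized in week t
      (applied after that week's infections and recoveries).
    recovery_rate: ASSUMED weekly recovery hazard.
    Returns an array of (S, I, R) rows; R counts all immune individuals."""
    rng = np.random.default_rng(seed)
    S = pop_size - n_infectious0 - n_immune0
    I, R = n_infectious0, n_immune0
    history = [(S, I, R)]
    for r_t, n_vacc in zip(r_by_week, vaccinated_by_week):
        beta = r_t * recovery_rate
        p_infect = 1.0 - np.exp(-beta * I / pop_size)  # per-susceptible risk this week
        new_inf = rng.binomial(S, p_infect)
        new_rec = rng.binomial(I, 1.0 - np.exp(-recovery_rate))
        S -= new_inf
        I += new_inf - new_rec
        R += new_rec
        vacc = min(n_vacc, S)  # cannot vaccinate more susceptibles than remain
        S -= vacc
        R += vacc
        history.append((S, I, R))
    return np.array(history)


# Draw weekly reproduction parameters from the Uniform(0.5, 2.5) prior.
prior_rng = np.random.default_rng(1)
r_draws = prior_rng.uniform(0.5, 2.5, size=10)
traj = simulate_sir(100_000, 200, 10_000, r_draws, [500] * 10)
```

In an inference setting, the simulated number of infectious individuals in each week would be linked to the debiased prevalence likelihood rather than simply simulated forwards.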

We performed inference under the model represented in the DAG in Supplementary Fig. ^{+}) using separate Gibbs updates. For sampling (^{+}), we represented the joint full conditional as^{new} from ^{+} ∣ ^{new}).

The sampling distribution on prevalence can be expressed as

We expressed the full conditional for

The prior joint distribution of

The update involves sampling from_{t} into an evenly spaced grid and sample from the hidden Markov model defined in equation (^{37}. The transition probabilities are given by equation (_{t} space) and the emission probabilities given by equation (37) (
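Sampling a path from a hidden Markov model defined on an evenly spaced grid can be done by forward filtering, backward sampling. The sketch below is a generic implementation of that technique, assuming a row-stochastic transition matrix over grid states and per-time log emission probabilities (for example, log-likelihoods of the test counts at each grid value); the function name and toy inputs are hypothetical and this is not the code used for this study.

```python
import numpy as np
from scipy.special import logsumexp


def ffbs(log_init, log_trans, log_emis, rng):
    """Draw one state path from an HMM by forward filtering, backward sampling.

    log_init: (K,) log initial probabilities over grid states.
    log_trans: (K, K) log transition matrix, rows indexed by the from-state.
    log_emis: (T, K) log emission probabilities for each time and state.
    Returns an integer array of T sampled grid indices."""
    n_times, n_states = log_emis.shape
    log_alpha = np.empty((n_times, n_states))
    log_alpha[0] = log_init + log_emis[0]
    for t in range(1, n_times):
        # alpha_t(j) = emis_t(j) * sum_i alpha_{t-1}(i) * trans(i, j)
        log_alpha[t] = log_emis[t] + logsumexp(log_alpha[t - 1][:, None] + log_trans, axis=0)
    path = np.empty(n_times, dtype=int)
    last = log_alpha[-1] - logsumexp(log_alpha[-1])
    path[-1] = rng.choice(n_states, p=np.exp(last))
    for t in range(n_times - 2, -1, -1):
        # p(x_t = i | x_{t+1}, data up to t) is proportional to alpha_t(i) * trans(i, x_{t+1})
        log_p = log_alpha[t] + log_trans[:, path[t + 1]]
        path[t] = rng.choice(n_states, p=np.exp(log_p - logsumexp(log_p)))
    return path


# Toy example: 5 grid states, sticky transitions, emissions favouring state 2.
K, T = 5, 6
trans = np.full((K, K), 0.2 / (K - 1))
np.fill_diagonal(trans, 0.8)
log_emis = np.full((T, K), -50.0)
log_emis[:, 2] = 0.0
path = ffbs(np.log(np.full(K, 1.0 / K)), np.log(trans), log_emis, np.random.default_rng(0))
```

Working in log space throughout avoids underflow when emission likelihoods of test counts are very small at grid values far from the data.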

Further information on research design is available in the

B.L. was supported by the UK Engineering and Physical Sciences Research Council through the Bayes4Health programme (grant number EP/R018561/1) and gratefully acknowledges funding from Jesus College, Oxford. K.B.P. is supported by the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Healthcare Associated Infections and Antimicrobial Resistance at the University of Oxford in partnership with Public Health England (PHE) NIHR200915 and the Huo Family Foundation. S.R. is supported by MRC programme grant MC_UU_00002/10, The Alan Turing Institute grant TU/B/000092, and the EPSRC Bayes4Health programme grant EP/R018561/1. M.B. acknowledges partial support from the MRC Centre for Environment and Health, which is currently funded by the Medical Research Council MR/S019669/1. G.N. and C.H. acknowledge support from the Medical Research Council Programme Leaders award MC_UP_A390_1107. C.H. acknowledges support from The Alan Turing Institute, Health Data Research UK and the UK Engineering and Physical Sciences Research Council through the Bayes4Health programme grant. Infrastructure support for the Department of Epidemiology and Biostatistics is also provided by the NIHR Imperial BRC. Authors at the Alan Turing Institute and Royal Statistical Society Statistical Modelling and Machine Learning Laboratory gratefully acknowledge funding from the Joint Biosecurity Centre, a part of NHS Test and Trace within the Department for Health and Social Care. The computational aspects of this research were supported by the Wellcome Trust Core Award grant number 203141/Z/16/Z (to B.L.) and the NIHR Oxford BRC. The views expressed are those of the authors and not necessarily those of the National Health Service, the NIHR, the Department of Health, the Joint Biosecurity Centre or PHE.

G.N., B.L. and C.H. conceived and designed the research. G.N., B.L., T.P., R.J., J.L., R.E.K. and A.-M.M. acquired, analysed or interpreted the data. G.N., B.L., R.J. and J.L. created new software used in the work. G.N., B.L., R.J., T.P., K.B.P., P.J.D., S.R., M.B. and C.H. wrote the paper.

The data underlying the Alpha VoC 202012/01 analysis were accessed via the UK Health Security Agency Data Science Hub (DaSH) data platform; they are not publicly available and can only be accessed using approved UK government email domains such as @test-and-trace.nhs.uk. For the remainder of the results presented here, the data are publicly available. Randomized surveillance data come from the REACT study^{7,8}. R_{t} estimates output by the Imperial College team’s Epidemia model^{38,39} were obtained from

The R scripts^{40} used to generate the results in this manuscript are available in the following Git repository:

The authors declare no competing interests.

Directed paths characterise conditional probability distributions, in contrast to the paths showing transitions between model compartments in Supplementary Fig. _{t} of _{t}. A prior on _{t} parameterized (_{δ} imparts temporal smoothness on δ_{1:T}. Effective reproduction numbers are denoted R_{1:T}, and the number of immune individuals by

Grey-coloured areas denote LTLAs where the total number of variant sequencing assays performed (across all variants) is less than 10; in these cases, Delta variant frequency estimates are omitted because of their high standard error.

Each point corresponds to an LTLA. Each scatter plot compares Pillar 1+2 prevalence estimates against unbiased estimates from the REACT study. Panels (a,c) show REACT round 10 data (11–30 March 2021), and panels (b,d) show round 11 (15 April–3 May 2021). Uncorrected results are shown in panels (a,b) and bias-corrected cross-sectional estimates in panels (c,d). Horizontal grey lines are 95% exact binomial confidence intervals from the REACT data. Vertical black lines in panels (a,b) are 95% exact binomial confidence intervals from the raw, non-debiased Pillar 1+2 data. Vertical black lines in panels (c,d) are 95% posterior credible intervals from the debiased Pillar 1+2 data. Neither set of prevalence estimates has been corrected for false positives or false negatives. Note that in panels (c,d), the interval widths are systematically narrower for the debiased Pillar 1+2 data than for the REACT data, pointing to the useful information content in debiased Pillar 1+2 data. The number of independent tests underlying each mean and (horizontal) CI for the REACT data varied between 289 and 1,894. The number of independent tests underlying each mean and (vertical) CI for the Pillar 1+2 data varied between 977 and 29,998.

Each point corresponds to an LTLA. Each scatter plot compares Pillar 1+2 prevalence estimates against unbiased estimates from the REACT study. From left to right, the columns of panels show results from REACT round 7 (13 November–3 December 2020), round 8 (6–22 January 2021) and round 9 (4–23 February 2021). On the vertical axes, panels (a–c) show uncorrected test positivity rates; (d–f) show bias-corrected prevalence estimates; (g–i) show bias-corrected prevalence estimates where the bias

Each point corresponds to an (LTLA, week) pair, predicting future case numbers in the LTLA using

For each of the nine PHE regions, we present the constituent LTLA whose name is ranked top alphabetically.

Supplementary Figs. 1–7, Supplementary Table 1, Discussion of methodological assumptions and caveats, Supplementary Results.

Reporting Summary

is available for this paper at

The online version contains supplementary material available at