Estimating viral prevalence with data fusion for adaptive two-phase pooled sampling.
The COVID-19 pandemic has highlighted the importance of efficient sampling strategies and statistical methods for monitoring infection prevalence, both in humans and in reservoir hosts. Pooled testing can be an efficient tool for learning pathogen prevalence in a population. Typically, pooled testing requires a second-phase retesting procedure to identify infected individuals, but when the goal is solely to learn prevalence in a population, such as a reservoir host, there are more efficient methods for allocating the second-phase samples.To estimate pathogen prevalence in a population, this manuscript presents an approach for data fusion with two-phased testing of pooled samples that allows more efficient estimation of prevalence with less samples than traditional methods. The first phase uses pooled samples to estimate the population prevalence and inform efficient strategies for the second phase. To combine information from both phases, we introduce a Bayesian data fusion procedure that combines pooled samples with individual samples for joint inferences about the population prevalence.Data fusion procedures result in more efficient estimation of prevalence than traditional procedures that only use individual samples or a single phase of pooled sampling.The manuscript presents guidance on implementing the first-phase and second-phase sampling plans using data fusion. Such methods can be used to assess the risk of pathogen spillover from reservoir hosts to humans, or to track pathogens such as SARS-CoV-2 in populations.