Estimating viral prevalence with data fusion for adaptive two-phase pooled sampling.

Published version


Peel, Alison J 
Madden, Wyatt 
Ruiz Aravena, Manuel 
Morris, Aaron 


The COVID-19 pandemic has highlighted the importance of efficient sampling strategies and statistical methods for monitoring infection prevalence, both in humans and in reservoir hosts. Pooled testing can be an efficient tool for learning pathogen prevalence in a population. Typically, pooled testing requires a second-phase retesting procedure to identify infected individuals, but when the goal is solely to learn prevalence in a population, such as a reservoir host, there are more efficient methods for allocating the second-phase samples. To estimate pathogen prevalence in a population, this manuscript presents an approach for data fusion with two-phase testing of pooled samples that allows more efficient estimation of prevalence with fewer samples than traditional methods. The first phase uses pooled samples to estimate the population prevalence and inform efficient strategies for the second phase. To combine information from both phases, we introduce a Bayesian data fusion procedure that combines pooled samples with individual samples for joint inferences about the population prevalence. Data fusion procedures result in more efficient estimation of prevalence than traditional procedures that only use individual samples or a single phase of pooled sampling. The manuscript presents guidance on implementing the first-phase and second-phase sampling plans using data fusion. Such methods can be used to assess the risk of pathogen spillover from reservoir hosts to humans, or to track pathogens such as SARS-CoV-2 in populations.
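As a rough illustration of how pooled and individual results can inform a single prevalence estimate, the sketch below evaluates a joint posterior on a grid. It is not the manuscript's procedure: it assumes a perfect assay (no false positives or negatives), a uniform prior on prevalence, and equal-sized pools, and the function names fused_posterior and posterior_mean are hypothetical. A pool of size m tests positive with probability 1 - (1 - p)^m, while an individual tests positive with probability p; the two likelihoods are multiplied because the samples are assumed independent.

```python
import math

def fused_posterior(pos_pools, n_pools, pool_size, pos_indiv, n_indiv, grid_size=2000):
    """Grid posterior for prevalence p from pooled plus individual test counts.

    Assumes a perfect assay, a uniform prior, and pool_size >= 1.
    """
    # Midpoint grid keeps p strictly inside (0, 1), so every log below is finite.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_post = []
    for p in grid:
        log_neg = math.log1p(-p)             # log P(one individual is negative)
        log_pool_neg = pool_size * log_neg   # log P(every member of a pool is negative)
        log_pool_pos = math.log(-math.expm1(log_pool_neg))  # log P(pool is positive)
        log_post.append(
            pos_pools * log_pool_pos + (n_pools - pos_pools) * log_pool_neg
            + pos_indiv * math.log(p) + (n_indiv - pos_indiv) * log_neg
        )
    # Subtract the peak before exponentiating to avoid underflow, then normalize.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]

def posterior_mean(grid, weights):
    """Posterior mean of prevalence under the normalized grid weights."""
    return sum(p * w for p, w in zip(grid, weights))

# Illustrative counts only (not data from the manuscript):
# 4 of 20 pools of size 5 positive, plus 2 of 30 individual samples positive.
grid, weights = fused_posterior(pos_pools=4, n_pools=20, pool_size=5,
                                pos_indiv=2, n_indiv=30)
print(round(posterior_mean(grid, weights), 4))
```

A grid is used here only because it keeps the sketch dependency-free; with imperfect tests or informative priors, the same likelihood structure would typically be handed to a sampler instead.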



ORIGINAL RESEARCH, adaptive sampling, Bayesian statistics, group testing

Journal Title

Ecol Evol

Defense Advanced Research Projects Agency (PREEMPT D18AC000031)