Precise identification of cell states altered in disease using healthy single-cell references.
Joint analysis of single-cell genomics data from diseased tissues and a healthy reference can reveal altered cell states. We investigate whether integrated collections of data from healthy individuals (cell atlases) are suitable references for disease-state identification and whether matched control samples are needed to minimize false discoveries. We demonstrate that using a reference atlas for latent space learning, followed by differential analysis against matched controls, improves identification of disease-associated cells, especially when multiple cell types are perturbed. Additionally, when an atlas is available, reducing the number of control samples does not increase false discovery rates. Jointly analyzing data from a COVID-19 cohort and a blood cell atlas, we improve detection of infection-related cell states linked to distinct clinical severities. Similarly, we study disease states in pulmonary fibrosis using a healthy lung atlas, characterizing two distinct aberrant basal states. Our analysis provides guidelines for designing disease cohort studies and optimizing cell atlas use.
Acknowledgements: We thank M. Morgan and R. Lindeboom for critical reading of the manuscript, and R. Elmentaite, A. Missarova and all members of the Marioni and Teichmann laboratories for valuable discussions and feedback on this project. The PBMC studies included in this work were selected using the materials from the Chan–Zuckerberg Initiative workshop on ‘Assembling Tissue References’, which were kindly shared by L. Dratva. J.C.M. acknowledges core funding from Cancer Research UK (C9545/A29580) and the European Molecular Biology Laboratory. E.D., A.-M.C., A.J.O., K.B.M. and S.A.T. acknowledge Wellcome Sanger core funding (WT206194).
Funder: Core funding from the European Molecular Biology Laboratory.
Funder: Core funding from the Wellcome Sanger Institute (WT206194).