
Use of machine learning to identify a T cell response to SARS-CoV-2.

Published version





Shoukat, M Saad 
Foers, Andrew D 
Woodmansey, Stephen 
Evans, Shelley C 
Fowler, Anna 


The identification of SARS-CoV-2-specific T cell receptor (TCR) sequences is critical for understanding T cell responses to SARS-CoV-2. Accordingly, we reanalyze publicly available data from SARS-CoV-2-recovered patients who had low-severity disease (n = 17) and SARS-CoV-2 infection-naive (control) individuals (n = 39). Applying a machine learning approach to TCR beta (TRB) repertoire data, we can classify patient/control samples with a training sensitivity, specificity, and accuracy of 88.2%, 100%, and 96.4% and a testing sensitivity, specificity, and accuracy of 82.4%, 97.4%, and 92.9%, respectively. Interestingly, the same machine learning approach cannot separate SARS-CoV-2 recovered from SARS-CoV-2 infection-naive individual samples on the basis of B cell receptor (immunoglobulin heavy chain; IGH) repertoire data, suggesting that the T cell response to SARS-CoV-2 may be more stereotyped and longer lived. Following validation in larger cohorts, our method may be useful in detecting protective immunity acquired through natural infection or in determining the longevity of vaccine-induced immunity.
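The reported percentages can be checked against the cohort sizes given above (17 recovered, 39 control). The following is a minimal sketch, not the authors' code: the confusion-matrix counts are not stated in the abstract and are inferred here as the only whole-number counts that reproduce the reported figures, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)                 # recovered samples correctly identified
    specificity = tn / (tn + fp)                 # control samples correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # all samples correctly identified
    return sensitivity, specificity, accuracy


# Cohort sizes from the abstract.
N_RECOVERED = 17
N_CONTROL = 39

# ASSUMPTION: counts inferred from the reported percentages, not given in the source.
training = classification_metrics(tp=15, fn=2, tn=39, fp=0)
testing = classification_metrics(tp=14, fn=3, tn=38, fp=1)

for name, metrics in (("training", training), ("testing", testing)):
    sens, spec, acc = (round(100 * m, 1) for m in metrics)
    print(f"{name}: sensitivity={sens}%, specificity={spec}%, accuracy={acc}%")
```

Under these inferred counts, training corresponds to 2 missed recovered samples and no misclassified controls, and testing to 3 missed recovered samples and 1 misclassified control, out of 56 samples in total.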



B cell receptor repertoire; SARS-CoV-2; T cell receptor repertoire; adaptive immunity; antibody; coronavirus; hierarchical clustering; infection; machine learning; Amino Acid Sequence; COVID-19; Cluster Analysis; Complementarity Determining Regions; High-Throughput Nucleotide Sequencing; Humans; Machine Learning; Principal Component Analysis; Receptors, Antigen, B-Cell; Receptors, Antigen, T-Cell; SARS-CoV-2; Sequence Analysis, DNA; T-Lymphocytes

Journal Title

Cell Rep Med


Elsevier BV