
Prognostic biomarker discovery from omics data using machine learning approaches





Garg, Manik 



Prognostic biomarkers can help clinicians identify high-risk patients and administer appropriate therapies. Omics data from patient samples can be used to find such biomarkers. Moreover, molecular biomarkers can help to understand disease mechanisms. Machine learning and related data analysis methods can be applied to omics data to determine these biomarkers reliably. In this thesis, I aimed to identify reproducible prognostic biomarkers in three different diseases: Alzheimer’s disease (AD), primary melanoma and coronavirus disease 2019 (COVID-19).

In my second chapter, I contributed to the discovery of a new metabolic signature that predicts which patients with mild cognitive impairment will later develop AD. Thousands of un-annotated metabolic features could potentially differentiate such patients, so to avoid overfitting we shortlisted only those features associated with genetic variants. After annotating the top-ranking features, we hypothesized about their potential links to AD based on an extensive literature review. My contribution to this chapter was mainly to apply machine learning methods to refine and optimise the signature.

In my third chapter, I analysed RNA-sequencing data derived from primary melanomas resected from stage IIB-IIIC patients (7th edition of the American Joint Committee on Cancer staging manual) enrolled in a prospective phase III randomized clinical trial. This led to the identification of a 121-gene expression signature that can predict poor outcomes and identify patients with a high absolute risk of death within 5 years. The prognostic ability of this signature was validated in 4 independent datasets. I also found that patients with a higher signature score (indicating poor outcomes) had lower levels of tumour-infiltrating lymphocytes, suggesting that these patients have immune-cell-deprived tumours that may warrant specialized treatment strategies.

In my fourth chapter, I performed a meta-analysis of 10 published single-cell RNA sequencing datasets to validate reported immune response changes associated with COVID-19 progression. I found that 8 out of 20 published immune response changes were consistently reproducible across multiple datasets. In my fifth chapter, I studied how the immune response of recovered patients varies with the severity of their COVID-19 infection. Here, I showed that patients who had recovered from mild/moderate COVID-19 infection had immune responses close to those of healthy individuals within 27-47 days of symptom onset, whereas those who had recovered from severe/critical infection still showed an altered immune response.

The results described are published in four peer-reviewed journal papers, and specific contributions are highlighted in the text. Overall, the work presented in this thesis demonstrates various ways in which omics data can be used for prognostic biomarker discovery. It also contributes to current knowledge of prognostic biomarkers in AD, primary melanoma and COVID-19.





Brazma, Alvis


Alzheimer's disease, Biomarker, COVID-19, Functional genomics, Omics, Primary melanoma, Prognostic


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
EMBL predoctoral fellowship