Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning.

Accepted version
Peer-reviewed

Type

Article

Authors

Pandurangan, Arun Prasad (ORCID: https://orcid.org/0000-0001-7168-7143)

Abstract

Next-generation sequencing methods have not only allowed an understanding of genome sequence variation during the evolution of organisms but have also provided invaluable information about genetic variants in inherited disease and the emergence of resistance to drugs in cancers and infectious disease. A challenge is to distinguish mutations that are drivers of disease or drug resistance from passengers that are neutral or even selectively advantageous to the organism. This requires an understanding of the impacts of missense mutations on gene expression and regulation, and of how they disrupt protein function by modulating protein stability or disturbing interactions with proteins, nucleic acids, small-molecule ligands, and other biological molecules. Experimental approaches to understanding differences between wild-type and mutant proteins are the most accurate but are also time-consuming and costly. Computational tools used to predict the impacts of mutations can provide useful information more quickly. Here, we focus on two widely used structure-based approaches, originally developed in the Blundell lab: site-directed mutator (SDM), a statistical approach to analyze amino acid substitutions, and mutation cutoff scanning matrix (mCSM), which uses graph-based signatures to represent the wild-type structural environment and machine learning to predict the effect of mutations on protein stability. We also describe DUET, which uses machine learning to combine the two approaches. We briefly discuss the development of mCSM for understanding the impacts of mutations on interfaces with other proteins, nucleic acids, and ligands, and we exemplify the wide application of these approaches to understand human genetic disorders and drug resistance mutations relevant to cancer and mycobacterial infections.
STATEMENT FOR A BROADER AUDIENCE: Genetic or somatic changes in human genes can lead to mutations in proteins that give rise to genetic disorders or cancer, and changes in the genes of pathogens can lead to drug resistance. The computer software described here, based on statistical approaches or machine learning, uses information from genome sequencing of humans and pathogens, together with experimental or modeled 3D structures of the gene products (the proteins), to predict the impacts of mutations in genetic disease, cancer, and drug resistance.

Keywords

amino acid substitution probabilities; drug resistance; genetic disorders; machine learning; mutations; protein stability and interactions; protein structure; Computational Biology; Genetic Predisposition to Disease; High-Throughput Nucleotide Sequencing; Humans; Models, Molecular; Mutation; Protein Binding; Protein Conformation; Protein Stability; Proteins; Sequence Analysis, DNA; Software

Journal Title

Protein Science

Journal ISSN

0961-8368
1469-896X

Volume Title

29

Publisher

Wiley

Rights

All rights reserved

Sponsorship

Bill & Melinda Gates Foundation (via Foundation for the National Institutes of Health (FNIH)) (ABELL11HTB0)
Wellcome Trust (200814/Z/16/Z)
Medical Research Council (MR/N501864/1)
Medical Research Council (MR/M026302/1)
European Commission (260872)