Algorithm development for RNA structure prediction in RNA viruses
Abstract
RNA structures play many different roles in the life cycles of RNA viruses, including translational control, genome replication, subgenomic RNA synthesis, and encapsidation. However, predicting these structures can be challenging, especially for pseudoknots, long-range interactions, and mutually exclusive interactions or switches. Furthermore, it is difficult to separate biologically relevant structures from the vast number of interactions that could theoretically form. Because the sequences of RNA viruses evolve rapidly and, for many species, large numbers of sequenced isolates are available, comparative genomics can be used to predict structures that are conserved and therefore more likely to be biologically relevant.
Here, a new program is presented that aims to fill the gaps among existing RNA structure prediction programs. It searches for all potentially functionally relevant short-range and long-range interactions that are conserved within a multiple sequence alignment, which allows pseudoknots and mutually exclusive structures to be detected. An offset parameter makes it possible to find interactions that do not align perfectly, so that more divergent sequences can be analysed. This can further improve the separation of functional structures from random structures. In addition, a phylogenetic weighting scheme aims to balance alignments that mix an abundance of closely related sequences with a few distantly related sequences, and to mitigate potential problems introduced by sequencing errors. The free energy is taken into account to reduce the number of false positives. The structure prediction program is embedded in a pipeline that, starting from a reference sequence, obtains related sequences, builds a sequence alignment and a phylogenetic tree, and creates all the files needed to perform the analysis. The pipeline is then applied to a selection of viruses. A novel tool is also presented for improving the quality of multiple sequence alignments by trimming poorly aligned regions, and for visualising alignments and alignment processing steps.
The structure prediction pipeline identifies known functionally important structures and suggests new structures for potential experimental follow-up. Different options and adjustable parameters allow for individual workflows. By filling an underrepresented niche, the software is intended to help guide future molecular understanding of RNA viruses.
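
The conservation idea summarised above can be illustrated with a minimal sketch. It is not the program presented here: the function names, the scoring rule, and the toy alignment are assumptions made for illustration, gaps are ignored, and the free energy term is left out. The sketch asks what weighted fraction of aligned sequences can form a given stem, optionally allowing the 3' strand to shift by a few alignment columns (the role played by the offset parameter), with per-sequence weights standing in for a phylogenetic weighting scheme.

```python
# Illustrative sketch only; all names and the scoring rule are hypothetical.
# Canonical Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_supported(seq, i, j, length, offset=0):
    """Return True if `seq` can form `length` stacked pairs starting at
    alignment columns (i, j), with the 3' strand allowed to shift by up
    to `offset` columns in either direction."""
    for shift in range(-offset, offset + 1):
        ok = True
        for k in range(length):
            a = i + k          # position on the 5' strand
            b = j - k + shift  # position on the 3' strand
            if not (0 <= a < b < len(seq)) or (seq[a], seq[b]) not in PAIRS:
                ok = False
                break
        if ok:
            return True
    return False


def weighted_conservation(alignment, weights, i, j, length, offset=0):
    """Weighted fraction of sequences that support the stem.
    `weights` are assumed to come from some phylogenetic weighting scheme."""
    supported = sum(
        w for seq, w in zip(alignment, weights)
        if stem_supported(seq, i, j, length, offset)
    )
    return supported / sum(weights)


if __name__ == "__main__":
    # Toy alignment (assumed data): the second sequence covaries (A-U instead
    # of G-C), the third forms the stem shifted by one column, the fourth
    # cannot form it at all.
    alignment = ["GGGAAAACCC", "GGAAAAAUCC", "GGGAAACCCA", "AAAAAAAAAA"]
    weights = [1.0, 1.0, 1.0, 1.0]
    print(weighted_conservation(alignment, weights, 0, 9, 3, offset=0))
    print(weighted_conservation(alignment, weights, 0, 9, 3, offset=1))
```

In this toy case, allowing an offset of one column raises the support for the stem, which is the effect the offset parameter is meant to capture for imperfectly aligned interactions. A real analysis would also need to handle gaps and filter candidates by free energy.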