PymiRa: A rapid and accurate classification tool for small non-coding RNAs, including microRNAs.

Accepted version
Peer-reviewed

Abstract

Small non-coding RNAs (sncRNAs; < 200 nucleotides in length) are of increasing research interest due to their key regulatory roles in a host of fundamental biological processes. For example, microRNAs (miRNAs), a specific class of sncRNAs, regulate gene expression through messenger RNA (mRNA) interactions, and their dysregulation is associated with disease. Classifying sncRNAs is an important bioinformatic task in small RNA-sequencing pipelines. Here we have developed an aligner called PymiRa, written in Python, to identify and quantify miRNAs from FASTA/FASTQ sequencing files. Unlike other approaches, PymiRa utilises a Burrows-Wheeler algorithm to align an input file against a reference hairpin precursor FASTA file derived from miRBase, the online miRNA registry, permitting up to two mismatches at the 3' end of a read. Previous tools used either a Burrows-Wheeler genome alignment or dynamic programming alignment to precursors; we demonstrate that combining both approaches yields improved results and efficiency. Importantly, the PymiRa aligner accounts for the 3' post-transcriptional modifications that miRNAs typically undergo. PymiRa is a fast, accurate, and publicly accessible aligner, available via GitHub and/or a webserver, for the identification of sncRNAs, including miRNAs, enabling accurate counts to be produced as part of a small RNA-sequencing pipeline. PymiRa will undergo relevant revisions over time, e.g. with miRBase version updates. The PymiRa aligner will facilitate a deeper biological understanding of the landscape of sncRNA expression in normal physiological conditions and of its dysregulation in disease states, including cancer.
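
The alignment strategy described in the abstract can be illustrated with a minimal sketch. This is not PymiRa's code: the function names, the toy hairpin sequences, and the choice to build one small FM-index per hairpin are all assumptions made for illustration. The sketch also assumes one particular reading of "up to two mismatches at the 3' end", namely that every base except the last two must match the precursor exactly, and that the last two bases are then compared directly.

```python
from collections import Counter


def build_fm_index(text):
    """Build a tiny FM-index (suffix array, first-column offsets, occurrence table)."""
    text += "$"  # sentinel that sorts before A, C, G, U
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # Burrows-Wheeler transform
    counts = Counter(text)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total  # row where suffixes starting with ch begin
        total += counts[ch]
    # occ[ch][i] = number of times ch appears in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ


def backward_search(pattern, sa, first, occ):
    """Return the sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


def index_hairpins(hairpins):
    """Index each hairpin once so that many reads can be aligned against it."""
    return {name: (seq, build_fm_index(seq)) for name, seq in hairpins.items()}


def align_read(read, indexed, max_3p_mismatches=2):
    """Return (hairpin name, start, 3' mismatches) for the best hit, or None."""
    if len(read) <= max_3p_mismatches:
        return None
    seed = read[:len(read) - max_3p_mismatches]  # must match exactly
    tail = read[len(seed):]                      # 3' bases that may differ
    best = None
    for name, (seq, (sa, first, occ)) in indexed.items():
        for start in backward_search(seed, sa, first, occ):
            ref_tail = seq[start + len(seed):start + len(read)]
            mismatches = sum(1 for a, b in zip(tail, ref_tail) if a != b)
            # Bases that run past the end of the hairpin count as mismatches.
            mismatches += len(tail) - len(ref_tail)
            if best is None or mismatches < best[2]:
                best = (name, start, mismatches)
    return best


# Toy sequences invented for this example; they are not miRBase entries.
toy_hairpins = {
    "toy-hairpin-a": "GGAUCCGAUUAGCAUCGGAAC",
    "toy-hairpin-b": "UUGCAUGCCAUGGCUAAGCUU",
}
toy_index = index_hairpins(toy_hairpins)
exact_hit = align_read("CCGAUUAGCAUC", toy_index)
modified_hit = align_read("CCGAUUAGCAAA", toy_index)  # two altered 3' bases
internal_miss = align_read("CCGAAUAGCAUC", toy_index)  # mismatch inside the seed
```

In this sketch a read with two altered 3' bases still maps to its precursor, while a read with an internal mismatch is rejected. Counting the names returned by `align_read` over all reads would give per-precursor counts of the kind a small RNA-sequencing pipeline consumes.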

Journal Title

PLoS Comput Biol

Journal ISSN

1553-734X
1553-7358

Publisher

Public Library of Science (PLoS)

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

This work was funded through the Vice-Chancellor Award Biosciences Doctoral Training Partnership (DTP) PhD Studentship, University of Cambridge, UK, funded by the Doctoral Landscape Awards under UK Research and Innovation (UKRI), with support from the Waldmann fund (Department of Pathology, University of Cambridge, UK) for ZS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.