The carbon footprint of bioinformatics
Authors
Grealey, Jason
Lannelongue, Loïc
Saw, Woei-Yuh
Marten, Jonathan
Meric, Guillaume
Ruiz-Carmona, Sergio
Publication Date
2021
Journal Title
Molecular Biology and Evolution
ISSN
0737-4038
Publisher
Oxford University Press (OUP)
Type
Article
This Version
AM
Citation
Grealey, J., Lannelongue, L., Saw, W., Marten, J., Meric, G., Ruiz-Carmona, S., & Inouye, M. (2021). The carbon footprint of bioinformatics. Molecular Biology and Evolution. https://doi.org/10.1101/2021.03.08.434372
Abstract
Bioinformatic research relies on large-scale computational infrastructures which have a non-zero carbon footprint. So far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this study, we estimate the bioinformatic carbon footprint (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org). We assess (i) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics and molecular simulations, as well as (ii) computation strategies, such as parallelisation, CPU (central processing unit) vs. GPU (graphics processing unit), cloud vs. local computing infrastructure and geography. In particular, for GWAS, we found that biobank-scale analyses emitted substantial kgCO2e and that simple software upgrades could make GWAS greener, e.g. upgrading from BOLT-LMM v1 to v2.3 reduced carbon footprint by 73%. Switching from the average data centre to a more efficient data centre can reduce carbon footprint by ~34%. Memory over-allocation can be a substantial contributor to an algorithm's carbon footprint. The use of faster processors or greater parallelisation reduces run time but can lead to a greater, sometimes substantially greater, carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimise kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
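The abstract does not state the formula behind its estimates, so the following is only a minimal sketch of how a footprint in kgCO2e might be computed from run time, hardware power draw, data-centre efficiency and the carbon intensity of the local grid. The function name, its parameters and all numeric values are hypothetical placeholders chosen for illustration; they are not taken from the paper or from the Green Algorithms calculator.

```python
def footprint_kgco2e(runtime_h, n_cores, watts_per_core, memory_gb,
                     watts_per_gb, pue, carbon_intensity_g_per_kwh,
                     core_usage=1.0):
    """Rough carbon footprint of one compute job, in kgCO2e.

    Assumed model (an illustration, not the paper's stated method):
      power  = cores * watts_per_core * usage + allocated_memory * watts_per_gb
      energy = runtime * power * PUE
      carbon = energy * carbon intensity of the electricity used
    """
    # Power drawn by the processors and by the *allocated* memory, in watts.
    power_w = n_cores * watts_per_core * core_usage + memory_gb * watts_per_gb
    # PUE (power usage effectiveness) scales for data-centre overheads
    # such as cooling; convert watt-hours to kilowatt-hours.
    energy_kwh = runtime_h * power_w * pue / 1000.0
    # Carbon intensity is given in gCO2e per kWh; convert grams to kilograms.
    return energy_kwh * carbon_intensity_g_per_kwh / 1000.0


if __name__ == "__main__":
    # Hypothetical job: every number here is a placeholder.
    job = dict(runtime_h=10, n_cores=8, watts_per_core=10.0,
               watts_per_gb=0.4, carbon_intensity_g_per_kwh=500.0)
    baseline = footprint_kgco2e(memory_gb=64, pue=1.5, **job)
    over_allocated = footprint_kgco2e(memory_gb=256, pue=1.5, **job)
    efficient_dc = footprint_kgco2e(memory_gb=64, pue=1.2, **job)
    print(f"baseline:            {baseline:.3f} kgCO2e")
    print(f"4x memory requested: {over_allocated:.3f} kgCO2e")
    print(f"lower-PUE centre:    {efficient_dc:.3f} kgCO2e")
```

Under this assumed model, the footprint grows with the memory requested rather than the memory used, which is one way to read the abstract's point about over-allocation. Because PUE is a plain multiplier, moving to a more efficient data centre reduces the estimate proportionally.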
Sponsorship
Medical Research Council (MR/L003120/1)
British Heart Foundation (None)
British Heart Foundation (RG/18/13/33946)
Identifiers
External DOI: https://doi.org/10.1101/2021.03.08.434372
This record's URL: https://www.repository.cam.ac.uk/handle/1810/334169