Research data supporting "Introgression dynamics of sex-linked chromosomal inversions shape the Malawi cichlid radiation"


Description

This repository contains data supporting Blumer et al. (2025), “Introgression dynamics of sex-linked chromosomal inversions shape the Malawi cichlid radiation”.

IMPORTANT NOTICE: The Malawi cichlid samples were collected ethically under prescribed permits, and the results and data are published under an Access and Benefit Sharing agreement with the Government of Malawi. We acknowledge the contributions of the Malawi Department of Fisheries and the Government of Malawi for their assistance in the collection of samples and the generation of data and results. THIS DATA IS MADE AVAILABLE ON AN OPEN ACCESS BASIS FOR RESEARCH USE ONLY. Any person who wishes to use this data for any form of commercial purpose must first enter into a commercial licensing and benefit sharing arrangement with the Government of Malawi. For this reason, these data are made available under a CC BY-NC licence.

Contents of the repository:

  1. 22 VCF files: chr1.vcf.gz to chr20.vcf.gz, chr22.vcf.gz and chr23.vcf.gz. These are the primary files of genotypes at biallelic SNPs used for the genetic analysis in the paper. We collected and whole-genome sequenced 1,375 Lake Malawi cichlids, of which 118 have been previously reported by us (Malinsky et al., 2018). We aligned all sequencing data to the Astatotilapia calliptera reference genome (fAstCal1.2; NCBI RefSeq: GCF_900246225.1) using bwa-mem (Li and Durbin, 2009) and called variants following the bcftools workflow (Danecek et al., 2021). The final callset contains 84 million biallelic SNPs that passed stringent quality control. Please note that, owing to chromosome naming conventions, there is no "chr21" in Malawi cichlids.

  2. fAstCal1.2.ancestral.fa.gz: We inferred the ancestral sequence of the fAstCal1.2 reference genome (NCBI RefSeq: GCF_900246225.1) from whole-genome alignments with two outgroup species (Oreochromis niloticus and Cyphotilapia frontosa).

  3. fAstCal1.6.fa.gz: We used a new Hi-C dataset of A. calliptera to re-scaffold the fAstCal1.2 reference genome (NCBI RefSeq: GCF_900246225.1) after breaking it up at assembly gaps. The contigs are the same as in fAstCal1.2. We provide this sequence because it was used for some analyses in the paper, but we do not recommend using it as a reference genome for Astatotilapia calliptera. For all genotype analyses in the paper we used fAstCal1.2, and for future studies we recommend using the new fAstCal68.1 reference genome (GCA_964374335.1), which is substantially more complete and accurate.

  4. fDipLim2.1.fa.gz: This is a chromosome-level assembly for Diplotaxodon limnothrissa, built from PacBio HiFi reads and Hi-C data, that was used in the paper. It is, however, somewhat fragmented, and we plan to release a new, more contiguous and complete assembly to the public nucleotide databases in the near future.
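
Because the chromosome numbering skips "chr21", a loop over chr1 to chr23 would look for a file that does not exist. The following is a minimal sketch of building the list of the 22 per-chromosome VCF file names described in item 1. The function name `vcf_filenames` and the `directory` argument are illustrative assumptions, not part of the dataset.

```python
from pathlib import Path

# Chromosome numbers present in the callset: 1-20, 22 and 23 (there is no chr21).
CHROMOSOMES = [n for n in range(1, 24) if n != 21]


def vcf_filenames(directory="."):
    """Return the expected per-chromosome VCF paths inside `directory`.

    `directory` is a hypothetical download location chosen by the user.
    """
    return [Path(directory) / f"chr{n}.vcf.gz" for n in CHROMOSOMES]
```

Calling `vcf_filenames("data")` would yield paths such as `data/chr1.vcf.gz`, which can then be checked with `Path.exists()` before any analysis starts.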

For queries, please contact Richard Durbin, rd109@cam.ac.uk.

If you wish to use this dataset in your research, please cite the corresponding publication: https://www.science.org/doi/10.1126/science.adr9961

References:

Malinsky M, Svardal H, Tyers AM, Miska EA, Genner MJ, Turner GF and Durbin R (2018). Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nature Ecology & Evolution, 2(12):1940-55.

Li H and Durbin R (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14):1754-60.

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM and Li H (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2):giab008.

Software / Usage instructions

The sequence data are in gzip-compressed FASTA files (.fa.gz; https://en.wikipedia.org/wiki/FASTA_format) and the genetic variant information is in gzip-compressed VCF files (.vcf.gz; https://en.wikipedia.org/wiki/Variant_Call_Format); both Wikipedia pages link to further documentation. Please note that this dataset is for non-commercial use only. Any person who wishes to use this data for any form of commercial purpose must first enter into a commercial licensing and benefit sharing arrangement with the Government of Malawi. Please refer to the methods section of the publication for details on how the dataset was created.
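
As a first check after downloading, both file types can be inspected with the Python standard library alone. The sketch below reads the sample names from a VCF header and the sequence lengths from a FASTA file. The function names are illustrative assumptions, and the code relies only on the general layout of the two formats (a tab-separated `#CHROM` header line with nine fixed columns in VCF, and `>` header lines in FASTA), not on anything specific to this dataset.

```python
import gzip


def vcf_sample_names(path):
    """Return the sample names from the #CHROM header line of a gzipped VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The nine fixed columns are CHROM, POS, ID, REF, ALT, QUAL,
                # FILTER, INFO and FORMAT; sample names follow them.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a #CHROM line
    return []


def fasta_lengths(path):
    """Map each sequence name in a gzipped FASTA file to its length."""
    lengths = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the name.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

For example, `len(vcf_sample_names("chr1.vcf.gz"))` gives the number of samples in that file, and `fasta_lengths("fAstCal1.6.fa.gz")` lists the scaffolds with their sizes. For analyses over all 84 million SNPs, a dedicated tool such as bcftools is likely to be much faster than pure Python.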

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

Sponsorship

The authors gratefully acknowledge support from the Research Foundation – Flanders (FWO) (G047521N), the Wellcome Trust (grant 207492), the Research Fund of the University of Antwerp (BOF), and the Cambridge-Africa ALBORADA Research Fund.