
Chromosome-scale assembly comparison of the Korean Reference Genome KOREF from PromethION and PacBio with Hi-C mapping information.

Published version
Peer-reviewed

Type

Article

Authors

Kim, Hui-Su 
Jeon, Sungwon 
Kim, Changjae 
Kim, Yeon Kyung 
Cho, Yun Sung 

Abstract

BACKGROUND: Long DNA reads produced by single-molecule and pore-based sequencers are more suitable for assembly and structural variation discovery than short-read DNA fragments. For de novo assembly, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are the favorite options. However, PacBio's SMRT sequencing is expensive for a full human genome assembly and costs more than $40,000 US for 30× coverage as of 2019. ONT PromethION sequencing, on the other hand, is 1/12 the price of PacBio for the same coverage. This study aimed to compare the cost-effectiveness of ONT PromethION and PacBio's SMRT sequencing in relation to the quality.

FINDINGS: We performed whole-genome de novo assemblies and comparison to construct an improved version of KOREF, the Korean reference genome, using sequencing data produced by PromethION and PacBio. With PromethION, an assembly using sequenced reads with 64× coverage (193 Gb, 3 flowcell sequencing) resulted in 3,725 contigs with N50s of 16.7 Mb and a total genome length of 2.8 Gb. It was comparable to a KOREF assembly constructed using PacBio at 62× coverage (188 Gb, 2,695 contigs, and N50s of 17.9 Mb). When we applied Hi-C-derived long-range mapping data, an even higher quality assembly for the 64× coverage was achieved, resulting in 3,179 scaffolds with an N50 of 56.4 Mb.

CONCLUSION: The pore-based PromethION approach provided a high-quality chromosome-scale human genome assembly at a low cost with long maximum contig and scaffold lengths and was more cost-effective than PacBio at comparable quality measurements.
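
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how an N50 and a sequencing depth are conventionally computed. It is not the authors' pipeline: the function names `n50` and `coverage` and the toy contig lengths are hypothetical, and only the figures marked as coming from the abstract (193 Gb at 64×, 188 Gb at 62×, more than $40,000 for 30× PacBio, a 1/12 price ratio) are taken from the source.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


# Hypothetical toy contig lengths (NOT data from the study).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Figures from the abstract: yield in Gb and the stated coverage.
# Dividing them gives the genome size implied by each coverage claim.
implied_genome_gb_ont = 193 / 64     # PromethION
implied_genome_gb_pacbio = 188 / 62  # PacBio

# Cost arithmetic from the abstract: PacBio costs "more than $40,000"
# for 30x and PromethION is "1/12 the price", so this is a lower bound.
ont_cost_30x_lower_bound = 40_000 / 12

if __name__ == "__main__":
    print(f"toy N50: {toy_n50}")
    print(f"implied genome size (ONT): {implied_genome_gb_ont:.2f} Gb")
    print(f"implied genome size (PacBio): {implied_genome_gb_pacbio:.2f} Gb")
    print(f"ONT 30x cost, lower bound: ${ont_cost_30x_lower_bound:,.0f}")
```

Both coverage figures imply a genome size of roughly 3.0 Gb, slightly larger than the 2.8 Gb assembled length. This suggests the coverage was computed against an estimated genome size rather than the assembly length, although the abstract does not say so.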

Keywords

Hi-C; KOREF; Korean reference genome; PromethION; nanopore sequencing; single-molecule sequencing; Chromosomes, Human; Contig Mapping; Cost-Benefit Analysis; Databases, Genetic; High-Throughput Nucleotide Sequencing; Humans; Republic of Korea; Single Molecule Imaging; Whole Genome Sequencing

Journal Title

Gigascience

Journal ISSN

2047-217X

Volume Title

8

Publisher

Oxford University Press (OUP)

Sponsorship

European Research Council (647787)
This work was supported by the U-K BRAND Research Fund (1.190007.01) of Ulsan National Institute of Science and Technology; the Research Project Funded by Ulsan City Research Fund (1.190033.01) of Ulsan National Institute of Science and Technology; and Clinomics internal funding for KOREF sequencing using a PromethION machine.