Efficient iterative Hi-C scaffolder based on N-best neighbors.

cam.issuedOnline: 2021-11-27
dc.contributor.author: Guan, Dengfeng
dc.contributor.author: McCarthy, Shane A
dc.contributor.author: Ning, Zemin
dc.contributor.author: Wang, Guohua
dc.contributor.author: Wang, Yadong
dc.contributor.author: Durbin, Richard
dc.contributor.orcid: McCarthy, Shane [0000-0002-2715-4187]
dc.contributor.orcid: Durbin, Richard [0000-0002-9130-1006]
dc.date.accessioned: 2022-01-10T12:45:38Z
dc.date.available: 2022-01-10T12:45:38Z
dc.date.issued: 2021-11-27
dc.date.submitted: 2021-05-02
dc.date.updated: 2022-01-10T12:45:38Z
dc.description.abstract: BACKGROUND: Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long-read data alone is unlikely to produce a chromosome-scale assembly for most eukaryotic species, inexpensive Hi-C sequencing, which captures the chromosomal contact profile of a genome, is now widely used to complete the task. However, existing Hi-C-based scaffolding tools either require the chromosome number as a priori input or lack the ability to build highly continuous scaffolds. RESULTS: We designed and developed pin_hic, a novel Hi-C-based scaffolding tool that uses contact information from Hi-C reads to iteratively construct a scaffolding graph based on the N-best neighbors of contigs. After scaffolding, it identifies potential misjoins and breaks them to maintain scaffolding accuracy. In tests on three long-read-based de novo assemblies from three different species, we demonstrate that pin_hic is more efficient than current state-of-the-art tools and generates much more continuous scaffolds while achieving higher or comparable accuracy. CONCLUSIONS: Pin_hic is an efficient Hi-C-based scaffolding tool that can be useful for building chromosome-scale assemblies. As many sequencing projects have been launched in recent years, we believe pin_hic has the potential to be applied in these projects and to make a meaningful contribution.
dc.identifier.doi: 10.17863/CAM.79936
dc.identifier.eissn: 1471-2105
dc.identifier.issn: 1471-2105
dc.identifier.other: s12859-021-04453-5
dc.identifier.other: 4453
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/332486
dc.language: en
dc.language.iso: eng
dc.publisher: Springer Science and Business Media LLC
dc.publisher.url: http://dx.doi.org/10.1186/s12859-021-04453-5
dc.subject: Hi-C
dc.subject: Scaffolding
dc.subject: Chromosomes
dc.subject: Genome
dc.subject: Genomics
dc.subject: High-Throughput Nucleotide Sequencing
dc.subject: Sequence Analysis, DNA
dc.title: Efficient iterative Hi-C scaffolder based on N-best neighbors.
dc.type: Article
dcterms.dateAccepted: 2021-09-07
prism.issueIdentifier: 1
prism.publicationName: BMC Bioinformatics
prism.volume: 22
pubs.funder-project-id: National Natural Science Foundation of China (2017YFC0907503, 2018YFC0910504, 2017YFC1201201)
pubs.funder-project-id: Wellcome Trust (WT207492)
rioxxterms.licenseref.uri: http://creativecommons.org/licenses/by/4.0/
rioxxterms.version: VoR
rioxxterms.versionofrecord: 10.1186/s12859-021-04453-5

Files

Original bundle
Name: 12859_2021_Article_4453_nlm.xml
Size: 102.55 KB
Format: Extensible Markup Language
Description: Bibliographic metadata
Licence: http://creativecommons.org/licenses/by/4.0/
Name: 12859_2021_Article_4453.pdf
Size: 1.7 MB
Format: Adobe Portable Document Format
Description: Published version
Licence: http://creativecommons.org/licenses/by/4.0/
Name: 12859_2021_4453_MOESM1_ESM.pdf
Size: 2.7 MB
Format: Adobe Portable Document Format
Description: Published version
Licence: http://creativecommons.org/licenses/by/4.0/