Comparison of long read sequencing technologies in resolving bacteria and fly genomes

Background The newest generation of DNA sequencing technology is distinguished by the ability to generate reads hundreds of kilobases in length, and the increased availability of long read data has democratized the genome sequencing and assembly process. PacBio and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. Released in 2019, the PacBio Sequel II platform advertises substantial enhancements over previous PacBio systems.

Results We used whole-genome sequencing data produced by two PacBio platforms (Sequel II and RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. Sequel II assemblies had higher contiguity and consensus accuracy relative to other methods, even after accounting for differences in sequencing throughput. ONT Rapid Sequencing libraries had the fewest chimeric reads and provided superior quantification of E. coli plasmids relative to ligation-based libraries.

The quality of assemblies can be enhanced by adopting hybrid approaches that use Illumina libraries for bacterial genome assemblies or combined ONT and Sequel II libraries for eukaryotic genome assemblies. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs.

Conclusions The ideal choice of long read technology may depend on several factors, including the question or hypothesis under examination. No single technology outperformed the others in all metrics examined.

Authors: Eric S. Tvedte, Mark Gasser, Benjamin C. Sparklin, Jane Michalski, Xuechu Zhao, Robin Bromley, Luke J. Tallon, Lisa Sadzewicz, David A. Rasko, Julie C. Dunning Hotopp