Nanopore sequencing accuracy

For many years Oxford Nanopore has continuously iterated our technology to improve its performance. We continue to improve the nanopore sensing system, through updates to analytical methods and new chemistries. This page guides you on what to expect from the nanopore sequencing system, and which tools to choose to achieve these results.

Introduction

Nanopore DNA and RNA sequencing accuracy can be measured in a number of ways, and the relevant metric for a scientist will depend on the specific experiments being performed.

As with all systems, it is critical to choose the most up-to-date tools for the analysis you are interested in, and the quality of the sample can also influence the outcome. With so many relevant variables, clear guidelines are important. Below we define several types of accuracy measurement and include recommendations for best performance.

Raw read accuracy

Nanopore sequencing provides direct electronic analysis of the target molecule, rather than sequencing a synthetic copy or using surrogate markers such as fluorescence. Basecalling algorithms are then used to provide an interpretable output of the sequencing reads. Nanopore basecalling algorithms are continuously improved to enhance accuracy over time, also allowing new methods to be applied to previously sequenced raw data.

Direct sequencing avoids sources of bias such as PCR and gives native information about the target molecule. We define raw read accuracy as the accuracy achieved when reading a single DNA or RNA fragment/molecule once. Applications for which raw read accuracy is relevant include those where time-to-result may be critical, but at this time most applications are more likely to focus on variant calling, consensus accuracy or other metrics. Improvements in raw read accuracy can drive improvements in other accuracy metrics.

Single molecule accuracy is similar to raw read accuracy, but in the case of duplex reads it combines the basecalled data from the template and complement strands of a single DNA molecule into a higher-quality basecall. Duplex basecalling can deliver data in excess of Q30, and perfect reads from DNA molecules tens of kilobases in length.

Latest updates to nanopore sequencing achieve:

Flow cell | Kit | Sequencing & basecalling parameters | Sample | Raw read accuracy | Output
R10.4.1 | Ligation Sequencing Kit V14 | 400 bps, 5 kHz, HAC basecalling | Human HG002 | 99.0% (Q20) | ●●●
R10.4.1 | Ligation Sequencing Kit V14 | 400 bps, 5 kHz, SUP basecalling | Human HG002 | 99.5% (Q23) | ●●●
R10.4.1 | Ligation Sequencing Kit V14 | 400 bps, 5 kHz, Duplex basecalling | Human HG002 | >99.9% (Q30) |
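
The Q-scores quoted next to the accuracy percentages follow the standard Phred scale, Q = -10 × log10(error rate). The short Python sketch below shows the conversion in both directions; the function names are illustrative and are not part of any Oxford Nanopore software.

```python
import math


def accuracy_to_q(accuracy: float) -> float:
    """Convert a fractional accuracy (e.g. 0.99) to a Phred-scaled Q-score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def q_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled Q-score back to a fractional accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


# Accuracies taken from the raw read accuracy figures quoted in this section.
for acc in (0.99, 0.995, 0.999):
    print(f"{acc:.1%} -> Q{accuracy_to_q(acc):.1f}")
# 99.0% -> Q20.0
# 99.5% -> Q23.0
# 99.9% -> Q30.0
```

Because the scale is logarithmic, each additional 10 Q-score points corresponds to a tenfold reduction in the error rate.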

Variant calling

Single nucleotide variants (SNVs), small indels and structural variants (SVs) are critical for our understanding of how genomic changes drive phenotypes. The ability of nanopore technology to sequence any length of nucleic acid molecule allows for unprecedented resolution of complex structural variants, as well as identification and haplotype phasing of single nucleotide alterations.

The ability to accurately call variants is often expressed as precision and recall values, generated from reads covering the position of interest multiple times. Precision is the proportion of calls in the call set that are correct, whereas recall is the percentage of variants present in the genome that are found in the call set.

The latest harmonic mean of precision and recall (F1 score) for nanopore chemistries can be found in Figure 1. The tool chain used to achieve similar metrics is reported in the figure legend.
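
As a minimal illustration of how these three metrics relate, the Python sketch below computes precision, recall and F1 from counts of true positive, false positive and false negative variant calls. The function name and the example counts are hypothetical; in practice these counts come from comparing a call set against a truth set with a dedicated benchmarking tool.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from variant-call counts.

    tp: calls that match a variant in the truth set
    fp: calls that are not in the truth set
    fn: truth-set variants missing from the call set
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=990, fp=5, fn=10)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

The harmonic mean penalises an imbalance between the two values, so a high F1 score requires both few false calls and few missed variants.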

Read more about structural variation and small variant calling & phasing.

Oxford Nanopore Technologies Open Datasets: SV, SNP.

Latest updates to nanopore sequencing achieve:

Figure 1: The latest accuracy data obtained on V14 kit and R10.4.1 flow cells, measured as F1 (harmonic mean of precision and recall) for variant calling, using nanopore sequencing data for the human genome (HG002 cell line) at several read depths (Kit V14, 400 bps, 5 kHz). Variant calling was performed with the latest version of the Clair3 variant caller, and variants were compared against the Genome In A Bottle consortium’s HG002 truthset (v4.2.1). SNVs and indels are represented with solid colours, while indels in CDS regions are displayed with white boxes. A) F1 score for Super accuracy basecalling models (SUP, v4.2.0) using Dorado v0.2.5. B) F1 score for High accuracy basecalling models (HAC, v4.2.0) using Dorado v0.2.5.

Consensus accuracy

Building a consensus sequence involves combining multiple copies of a specific DNA/RNA region, sequenced in separate reads, into a single high-quality sequence. Because the copies are combined into one sequence, random errors in individual reads are averaged and so ‘cancelled’ out, producing a more accurate ‘consensus’ sequence to work from.
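
To see why random errors tend to cancel, consider a deliberately simplified model: each read is wrong at a given position independently with the same probability, and the consensus is wrong only if more than half of the reads are wrong there. The Python sketch below evaluates that binomial probability. The model and the function name are assumptions made for illustration; real assemblers and polishers are far more sophisticated, and systematic (non-random) errors do not cancel in this way.

```python
from math import comb


def majority_error(per_read_error: float, depth: int) -> float:
    """Probability that more than half of `depth` independent reads
    are wrong at a single position (toy binomial model)."""
    k_min = depth // 2 + 1  # smallest number of wrong reads that outvotes the rest
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(k_min, depth + 1)
    )


# Assumed per-read error of 1% (Q20), evaluated at a few read depths.
for depth in (1, 3, 5, 11):
    print(f"depth {depth:>2}: consensus error ~ {majority_error(0.01, depth):.3g}")
```

Under these assumptions the modelled consensus error falls steeply as depth increases, which is the intuition behind consensus accuracy exceeding raw read accuracy.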

Find out more about assembly & whole-genome sequencing.

Latest updates to nanopore sequencing achieve:

Flow cell | Kit | Consensus accuracy | Sequencing & basecalling parameters | Analysis tools | Sample
R10.4.1 | Ligation Sequencing Kit V14, Ultra-long Sequencing Kit V14 | Telomere-to-telomere (T2T): 99.994%*, 18 full chromosome haplotype-resolved, N50 >135 Mb | 400 bps, 5 kHz, simplex SUP, duplex | Assembly with Verkko, phasing with Gfase | Human HG002
R10.4.1 | Ligation Sequencing Kit V14 | Q50 at 10-20x | 400 bps, 4 kHz, simplex SUP | Assembly with Flye | Zymo mock community (bacterial)

*Generated by combining approx. 40x duplex, 40x ultra-long and 40x Pore-C

Single molecule consensus

Consensus generation can also be applied to specific regions of interest, by combining multiple exact copies of a single original fragment or molecule into a single high-quality sequence. These exact copies could be sequenced together in a single read, for example when generated by circular or linear amplification, or could be associated with one another through a unique molecular identifier (UMI). Combining multiple copies in this way gives higher confidence in the accuracy of the resulting sequence.
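
The UMI-based approach can be sketched in a few lines of Python: reads carrying the same UMI are grouped, and a column-wise majority vote is taken within each group. This is a toy illustration under strong assumptions: the UMIs are already extracted and error-free, and the reads in each group are already aligned and of equal length. The function names and the `min_copies` threshold are hypothetical; production pipelines cluster UMIs with error tolerance and use alignment and polishing rather than a simple vote.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Column-wise majority vote over equal-length, pre-aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def umi_consensus(reads, min_copies=3):
    """Group (umi, sequence) pairs by UMI and return {umi: consensus}.

    Groups with fewer than `min_copies` reads are dropped, because a
    vote over one or two copies cannot correct an error.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return {
        umi: consensus(seqs)
        for umi, seqs in groups.items()
        if len(seqs) >= min_copies
    }


# Made-up reads for illustration only.
reads = [
    ("AACGT", "ACGTACGT"),
    ("AACGT", "ACGTACCT"),  # one random error
    ("AACGT", "ACGTACGT"),
    ("GGTCA", "TTGACCAA"),
    ("GGTCA", "TTGACCAA"),
    ("GGTCA", "TAGACCAA"),  # one random error
    ("CCATG", "ACGTACGT"),  # single copy, dropped
]
print(umi_consensus(reads))
# {'AACGT': 'ACGTACGT', 'GGTCA': 'TTGACCAA'}
```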

Applications where single molecule consensus could be particularly useful include liquid biopsy low-frequency variant detection, or 16S sequencing.

Latest updates to nanopore sequencing achieve:

Chemistry | Single molecule consensus accuracy | Analytical tools | Sample
R10.3 | ~99.995%, Q45 | UMI | rRNA amplicons (15X)

Covering all of the genome

To create an accurate picture of the genome, it is important for a sequencing technology to reach all parts of it, even the parts which are difficult to map. Genomes are littered with repetitive and low-complexity regions, which are difficult to sequence and align using traditional technologies. For example, it is estimated that short-read technology reaches only 92% of the human genome; the remaining 8%, which contains many disease-relevant genes, is excluded from the dataset. Nanopore technology has been shown to reduce these “dark” areas of the genome by 81%, shedding light on parts of the genome not sequenced by any other technology (Ebbert, 2019), and giving a more complete picture. Ultra-long nanopore sequencing reads were central to completing the human genome, allowing the resolution of repetitive regions that were unresolvable with other technologies (Nurk, et al., Science, 2022).

Tuning accuracy for your experimental need

Want to fine-tune accuracy based on your needs? Choose between duplex and simplex basecalling models.

Simplex reads: generated by reading a single strand through a nanopore. Accuracy can be fine-tuned with basecalling models:

  • Fast basecalling: fastest, least computationally intense, highest compatibility with real-time basecalling on device
  • High Accuracy basecalling (HAC): highly accurate, intermediate speed and computational requirement. Good compatibility with real-time basecalling on device
  • Super accuracy basecalling (SUP): the most accurate, more computationally intense

Note: modified basecalling (e.g. 5mC and 5hmC) can be performed alongside any of the basecalling methods mentioned above.

Higher-quality reads are now available from the “squiggle”: since the MinKNOW 23.04 release, the sampling frequency has been increased from 4000 to 5000 samples per second (5 kHz), providing more data points for basecalling. As a result, read accuracies are enhanced for both simplex and duplex reads and for all basecalling models.
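
A rough sense of what the higher sampling frequency means per base can be had by dividing the sampling rate by the sequencing speed. The Python sketch below assumes that "400 bps" in the tables on this page denotes an average translocation speed of 400 bases per second, and that this speed is constant; in practice the speed varies along a read, so these are averages only. The function name is illustrative.

```python
def samples_per_base(sampling_hz: float, speed_bases_per_s: float) -> float:
    """Average number of signal samples recorded per base."""
    return sampling_hz / speed_bases_per_s


# Assumed average speed: 400 bases per second.
for hz in (4000, 5000):
    print(f"{hz} Hz at 400 bps -> {samples_per_base(hz, 400):.1f} samples per base")
# 4000 Hz at 400 bps -> 10.0 samples per base
# 5000 Hz at 400 bps -> 12.5 samples per base
```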

Our latest Q20+ chemistry enables duplex reads: the second strand can follow the first through the same nanopore, producing information from two orthogonal signals, merged into one consensus sequence. Single molecule accuracy of duplex is ~Q30 or higher. A specific basecaller for duplex reads is available.

Interested in accessing high-duplex flow cells? Register your interest here.

Explore simplex sequencing

Basecalling model: FAST, HAC and SUP (simplex); Duplex (duplex)

Recommended for:

  • FAST: fast analysis: genomic variants (SVs, SNVs, etc.), phasing, de novo assembly, etc.
  • HAC: high accuracy analysis: genomic variants (SVs, SNVs, etc.), phasing, de novo assembly, etc.
  • SUP: super accuracy analysis: genomic variants (SVs, SNVs, etc.), phasing, de novo assembly, etc.
  • Duplex: highest quality and accuracy: de novo assembly, T2T

Output: ●●● ●●● ●●●

Computation requirements: ●● ●●●● ●●●●

Table 1. Recommendations, output and computational requirements of simplex and duplex reads sequencing in combination with available basecalling models.

Base modifications

The four ‘canonical’ bases (A, C, G and T in DNA and A, C, G and U in RNA) can be biologically modified by the addition of chemical groups, for example through methylation. These modifications can significantly alter gene expression and are implicated in a range of diseases including cancer. Scientists are only just beginning to scratch the surface of how newly recognised epigenetic changes impact function; RNA alone, for example, is known to possess over 170 distinct modifications.

Oxford Nanopore’s technology can sequence DNA or RNA molecules directly, enabling direct, real-time detection of 5mC, 5hmC and 6mA.

This allows for detection of these base modifications with no additional experiments or sample preparation steps required, and modification information is accessible through onboard software. In contrast, traditional technologies can require a separate process called bisulphite sequencing, which uses aggressive sample treatment and has a number of limitations.

Figure 2. Oxford Nanopore 5mC data comparison to synthetic (A) and bisulfite sequencing data (B). Basecalling of 5mC on synthetic strands with known composition is extremely accurate with precision, recall and F1 score all above 99% (A). Oxford Nanopore data for the human sample HG002 shows much higher confidence CpGs (>90%) at a much lower depth than bisulfite whole genome sequencing (B). All data reported in this figure was generated with V14 chemistry and R10.4.1 pores with 400 bps speed, 4 kHz using SUP basecalling models.

Test accuracy

Sequencing may be used to perform a certain biological test, for example presence or absence of a particular organism, species identification, testing for one or more genetic variants, or to perform multi-omics testing in one assay. Test accuracy can be defined as the ability of the technology to answer that question correctly every time, and this can be quantified by identifying the proportion of true and false positives and negatives among a total number of cases. Test accuracy is an important metric for areas such as food safety, and microbial surveillance. Nanopore sequencing has been shown to be effective at accurately performing many different types of tests. Browse the resource centre for examples.
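
The proportions described above are usually summarised as sensitivity, specificity and overall accuracy. The Python sketch below shows one common way to compute them from the four outcome counts. The function name and the example counts are hypothetical and do not correspond to any study referenced on this page.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Summarise a test's performance from its four outcome counts.

    tp/fn: positive cases called correctly / missed
    tn/fp: negative cases called correctly / called positive in error
    """
    def ratio(numerator: int, denominator: int) -> float:
        # Guard against empty categories.
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true positives detected
        "specificity": ratio(tn, tn + fp),  # share of true negatives detected
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=90, fp=5, tn=95, fn=10))
```

Reporting sensitivity and specificity separately matters when positive and negative cases are unbalanced, because overall accuracy alone can hide a high rate of missed positives.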

For these examples, the analysis pipeline is specific to the test in question, but tool recommendations can be found in the protocol builder.

In 2020, the UK Government published a study of 23,000 samples showing that Oxford Nanopore’s first regulated test has gold-standard accuracy. Read the study.

Future developments

Our goal is to enable the genetic analysis of anything, by anyone, anywhere, and as such we are pursuing constant iterative performance improvements. We continue to improve the nanopore sensing system, boosting accuracy through updates to analytical methods and new chemistries. Latest releases can be found in the Nanopore Community, or in the News section.
