How basecalling works
In nanopore sequencing, a single molecule passes through a nanopore that has an ionic current flowing across it. As the molecule passes through the pore, it disrupts the current and changes the electrical signal in a characteristic way. During nucleic acid sequencing, basecalling algorithms decode these signal changes to determine the DNA or RNA sequence in real time.
Capturing the signal
When DNA or RNA is sequenced through nanopores, the characteristic electrical signals are recorded by MinKNOW™, the software that controls Oxford Nanopore Technologies sequencing devices. The complete electrical signal is known as a ‘squiggle’. MinKNOW processes the squiggle into reads in real time, with each read corresponding to a single strand of sequenced DNA or RNA. These reads contain canonical bases and can also include base modifications such as methylation.
The nanopore structure
The structure of the nanopore determines the information contained within a squiggle, which is the raw signal produced by molecules passing through the nanopore, before any basecalling is applied. Different nanopores contain different ‘readers’. The earlier R9 nanopore had a single reader in the middle of the barrel, whereas the newer R10 nanopore has two readers spaced along its length, meaning more bases within a DNA or RNA strand can contribute to the squiggle at any one time. This improves signal capture around homopolymer regions, where the same nucleotide is repeated consecutively on a DNA or RNA strand.
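To make the idea of a homopolymer region concrete, the following minimal sketch locates runs of identical bases in an already basecalled sequence. The function name `homopolymer_runs` and the `min_length` threshold are illustrative assumptions, not part of any Oxford Nanopore tool.

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=3):
    """Return (start, base, length) for each run of identical bases.

    min_length is an arbitrary illustrative threshold.
    """
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


# A made-up sequence containing a run of five Ts and a run of three As.
print(homopolymer_runs("ACGTTTTTGAAAC"))
```

Regions like the ones reported here are where the signal changes least from base to base, which is why the number of readers in the pore matters for them.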
Basecallers
Basecalling algorithms process the raw signal to decode the sequence of bases within strands of DNA or RNA, and the results are stored in BAM or FASTQ files. Dorado, the default basecaller integrated within MinKNOW, can perform basecalling during or after sequencing, depending on experimental needs. All basecalling software and base modification models are first released as open-source tools on the Oxford Nanopore GitHub to provide the latest features and accuracy improvements as early as possible. This open access to the newest software tools lets researchers provide feedback and help shape the progress of nanopore technology, whilst benefitting from new performance features before they are integrated into MinKNOW.
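As an illustration of what basecalled output looks like, here is a minimal sketch that reads records from FASTQ text. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; the read name `read_1` and its contents are invented for the example, and a production pipeline would normally use an established parsing library instead.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_scores) from a FASTQ text stream.

    Assumes exactly four lines per record: header, sequence, '+', quality.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of stream
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        # Phred+33: quality score = ASCII code of the character minus 33.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


# Hypothetical single-read FASTQ content.
example = io.StringIO("@read_1\nACGT\n+\nI5?+\n")
records = list(read_fastq(example))
for name, sequence, scores in records:
    print(name, sequence, scores)
```

Each base is paired with a quality score, so downstream tools can weigh confident calls more heavily than uncertain ones.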
Neural networks
The basecalling algorithms currently deployed by Oxford Nanopore Technologies are based on neural networks (a type of machine learning model), which predict base sequences from the raw signal. These computational neural networks are loosely modelled on biological neural networks within the human brain, with layers of ‘nodes’ (equivalent to neurons) passing data between themselves to arrive at a predicted base sequence. Crucially, just like a human brain, these neural networks can learn and improve their predictions over time. Oxford Nanopore Technologies uses a variety of neural network architectures, including transformer models and recurrent neural networks, to develop basecalling algorithms. This variety of architectures allows information from across the entire raw signal to better inform the basecaller output. Alternative basecalling algorithms are continuously being developed and assessed by Oxford Nanopore to improve the accuracy and speed of basecalling models.
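The text above does not describe how a network's output is turned into a base sequence. As a rough illustration only, the sketch below uses greedy CTC-style decoding, a common technique for sequence models: take the most likely symbol at each time step, collapse consecutive repeats, and drop a special blank symbol. The alphabet, the blank symbol and the probability values are all assumptions made for this toy example, and this is not presented as the decoder Dorado uses.

```python
# Assumed output alphabet; "-" is the blank symbol at index 0.
ALPHABET = ["-", "A", "C", "G", "T"]


def greedy_decode(probabilities):
    """Greedy CTC-style decoding of per-time-step symbol probabilities."""
    # Index of the most likely symbol at each time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in probabilities]
    bases = []
    previous = None
    for index in best:
        # Emit a base only when the symbol changes and is not the blank.
        if index != previous and index != 0:
            bases.append(ALPHABET[index])
        previous = index
    return "".join(bases)


# Made-up network output: six time steps, five symbol probabilities each.
toy_output = [
    [0.10, 0.80, 0.05, 0.03, 0.02],  # A
    [0.10, 0.70, 0.10, 0.05, 0.05],  # A again, collapsed into the first A
    [0.90, 0.04, 0.02, 0.02, 0.02],  # blank separates repeated bases
    [0.05, 0.85, 0.04, 0.03, 0.03],  # A, a new base after the blank
    [0.05, 0.05, 0.80, 0.05, 0.05],  # C
    [0.05, 0.05, 0.05, 0.05, 0.80],  # T
]
print(greedy_decode(toy_output))
```

The blank symbol is what lets this scheme represent two identical bases in a row, which ties back to why homopolymer regions are a hard case.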
Machine learning
The basecalling algorithms provided by Oxford Nanopore Technologies ‘learn’ to determine nucleotide sequences more accurately through machine learning, the broader field to which neural networks belong, in order to minimise errors during basecalling. The algorithms are trained on data from known sequences, which guides them to predict bases correctly without human input. Once trained, they are validated on a subset of data that was not included in the training dataset, to determine the accuracy of the basecaller model. The training dataset strongly influences the performance of a basecaller; therefore, the default basecalling models are trained on a wide variety of data, including a mixture of native and amplified DNA or RNA from multiple organisms, such as plants and animals, as well as bacterial and viral genomes. This approach can also be fine-tuned to create modification-aware basecallers, because training datasets can include native nucleic acid sequences with specific base modifications present.
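The train-then-validate idea can be sketched as follows. This is a simplified illustration: the function names, the 20% validation fraction, and the identity measure (one minus edit distance divided by the longer sequence length) are assumptions for the example. Published accuracy figures are typically derived from alignments against a reference, not from this exact formula.

```python
import random


def split_dataset(examples, validation_fraction=0.2, seed=0):
    """Shuffle examples and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * validation_fraction)
    return shuffled[:cut], shuffled[cut:]


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def identity(predicted, truth):
    """Illustrative accuracy: 1 - edit distance / longer sequence length."""
    longest = max(len(predicted), len(truth), 1)
    return 1 - edit_distance(predicted, truth) / longest


# Ten placeholder example IDs: eight go to training, two are held out.
train, validation = split_dataset(range(10))
print(len(train), len(validation))

# A predicted sequence compared with its known true sequence.
print(identity("ACGT", "ACGA"))
```

Keeping the validation examples out of training is what makes the measured accuracy an estimate of performance on unseen data.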
Outstanding accuracy improvements with basecalling updates
Advancements in both the Oxford Nanopore Technologies platform and machine learning have consistently improved basecalling accuracy, including raw-read single-molecule accuracy, consensus accuracy, and the completeness of genome assemblies.
Basecalling acceleration
During sequencing experiments, MinKNOW streams signal data in real time, meaning basecalling can begin even before a DNA or RNA strand has finished passing through a nanopore. For basecallers to keep up with sequencing, they use graphics processing units (GPUs) to calculate multiple values in parallel and provide real-time data. Devices with integrated compute (for example, GridION and PromethION 24) feature onboard GPUs to enable real-time basecalling. This setup is compatible with modification calling, barcode demultiplexing, and alignment to a reference genome during live sequencing.
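The streaming behaviour described above can be pictured with a small sketch that hands over signal samples in fixed-size chunks as they arrive, without waiting for the full read. The function name `stream_chunks`, the chunk size, and the integer samples are hypothetical; this is not MinKNOW's or Dorado's actual interface.

```python
def stream_chunks(samples, chunk_size):
    """Yield lists of chunk_size samples as soon as each chunk is full."""
    buffer = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == chunk_size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # final partial chunk once the strand has finished


# Stand-in for a live signal: ten samples arriving one at a time.
signal = iter(range(10))
chunks = list(stream_chunks(signal, chunk_size=4))
print(chunks)
```

In a real system, chunks from many pores would be batched together so that a GPU can process them in parallel.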