Sequencing data analysis: from myths to mastery
Sequencing data analysis has a reputation for being expensive, complex, and accessible only to specialists, but is that still true? In a recent interview with Technology Networks, Dr Anthony Doran (Associate Director of Bioinformatics Field Applications at Oxford Nanopore Technologies) revisits these assumptions and explores how advances in compute, data formats, and analysis tools are changing the landscape. Anthony shares how modern, scalable approaches are making sequencing analysis efficient, informative, and accessible to researchers everywhere.
This blog post is a curated summary of the interview, covering the key insights Anthony shared.
There's a perception that sequencing requires expensive hardware or specialist infrastructure. Is that still the case?
With Oxford Nanopore sequencing, you can scale your computational needs to match your sample throughput. If you’re running just a few samples, a small device like MinION is all you need. For large cohorts, you can scale up accordingly. Our GridION, PromethION 2 Integrated, and PromethION 24 devices include built-in compute for intermediate processing, so you don’t need extra infrastructure for basecalling or post-basecalling analysis. As storage and compute become more efficient, we want to highlight those improvements, because they ultimately mean lower costs.
From portable to high-throughput, Oxford Nanopore provides scalable sequencing devices suitable for all applications.
Sequencing data is often thought to be huge and difficult to manage. Is that really true?
Big projects naturally create a lot of data, but the file formats we use today are far more efficient. We generate the BAM file format, which is the standard format for storing sequencing reads and their associated information, including genomic positions and, with Oxford Nanopore sequencing, methylation data. As a bioinformatician, I love how efficient the BAM format is compared with older formats like FASTQ. So, while scaling to thousands of samples has its own challenges, the data per sample has become much more manageable over the last couple of years.
How difficult is this to scale to large cohort studies?
Small studies have always been really achievable, but as genomics has become more accessible, large cohort projects have grown quickly. They do bring challenges around infrastructure, automation, and processing large amounts of data, but that’s exactly where we’ve invested a lot of effort. Our analysis pipelines are now highly optimised, user-friendly, and scalable. We’ve supported projects sequencing 50,000 genomes alongside other collaborators, and in one case over 100,000 genomes on a single project. A decade ago, that would have been almost impossible for any technology, but today it’s something we handle routinely.
Oxford Nanopore sequencing is powering novel insights across major population-scale genomics projects worldwide. HPRC: Human Pangenome Reference Consortium; CPC: Chinese Pangenome Consortium; APR: Arab Pangenome Reference.
In the past, sequencing analysis tools were seen as limited or hard to use. How would you describe the landscape today?
They absolutely used to be difficult to use; that’s why I became a bioinformatician! But as sequencing has matured, we now have many analysis tools built not just for experts but for any researcher. Our EPI2ME platform is a great example: it gives you a simple point-and-click interface to the same bioinformatics pipelines you’d run on the command line. That makes analysis far more accessible and saves teams from having to build everything themselves. Overall, bioinformatics has become much more user-friendly for the wider community.
The EPI2ME platform provides preconfigured workflows for a wide range of applications and is available via an intuitive desktop app or the command line.
Can any researcher now analyse their sequencing data, or do you still need the support of a bioinformatician?
Well, as a bioinformatician, I'm obviously not going to say that you can just get rid of all bioinformaticians! But what’s really exciting is that researchers now have a choice and can analyse their data without being a specialist. Bioinformaticians can then focus on exploring biology and analysis not represented by current pipelines. For anyone without the time or background in bioinformatics, user-friendly tools make analysis much quicker and far more accessible than in the past.
Accuracy is one of the most talked about aspects of sequencing. How should people think about it today?
Accuracy used to be judged mainly by Q scores — an indication of basecalling confidence — but that doesn’t tell the whole story. As we’ve learnt more about reference genomes and genomics in general, the way we think about accuracy has evolved. For example, over recent years, we’ve seen research highlighting regions of the genome that are invisible to legacy short-read sequencing technologies. So, when talking about accuracy, it’s important to consider how much of the genome you can actually access and what level of detail you can resolve.
The Q score in isolation doesn’t really answer that question; it doesn't tell you where in the genome you will be able to identify variants and doesn't tell you what types of genetic variation you can detect. Long Oxford Nanopore reads resolve regions short reads can’t, letting you detect structural variants, phase haplotypes, and see methylation — in one go. The same is true in transcriptomics — you capture full transcripts rather than fragments, providing isoform-level insights. So today, accuracy is less about a single metric and more about the completeness and richness of the biological information you can uncover.
Reveal more biology — in one go. Oxford Nanopore sequencing delivers highly accurate genomic, epigenomic, and transcriptomic data for comprehensive variant detection.
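For readers less familiar with the Q score discussed in this answer, here is a minimal sketch of the standard Phred relationship, Q = -10 × log10(P), where P is the estimated probability that a base call is wrong. The function names are illustrative and not part of any Oxford Nanopore tool. The example shows why Q20 corresponds to roughly 1 error in 100 bases and Q30 to 1 in 1,000, and why a read-level Q score is usually derived from averaged error probabilities rather than averaged Q values.

```python
import math
from typing import Iterable


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def mean_read_q(qualities: Iterable[float]) -> float:
    """Summarise per-base Q scores as one read-level Q score.

    The averaging is done on error probabilities, because the Q scale is
    logarithmic and a plain mean of Q values would overstate quality.
    """
    probs = [phred_to_error_probability(q) for q in qualities]
    if not probs:
        raise ValueError("at least one quality value is required")
    return error_probability_to_phred(sum(probs) / len(probs))


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(f"Q{q}: about 1 error in {round(1 / phred_to_error_probability(q))} bases")
    print(f"Read-level Q for bases at Q10 and Q30: {mean_read_q([10, 30]):.1f}")
```

As the answer above notes, this number describes confidence in individual base calls only. It says nothing about which regions of the genome a read can reach or which variant types it can resolve.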
Sequencing technology has evolved rapidly over the last decade. How do researchers keep up without disrupting their day-to-day work?
Genomics moves fast, especially bioinformatics, and that’s certainly something we understand at Oxford Nanopore. We provide end-to-end workflows covering sample and library prep, sequencing, and analysis to help bring the latest capabilities to researchers in the most streamlined way possible. We also standardise specific software versions and analysis pipelines, which is especially important for long-term, large-scale projects. By making our software updates optional, our customers can choose either to benefit immediately from the latest algorithms and approaches or to wait until ongoing projects are completed, ensuring consistency and reproducibility.
The underlying nanopore technology doesn’t change between these iterations and updates, and this consistency enables our targeted sequencing approach, adaptive sampling, which performs real-time enrichment of specific genomic regions. Because adaptive sampling is based on mapping reads to the genome, adding new target sites doesn’t require revalidation, unlike traditional primer-based workflows, where each iteration of primer design needs testing and validation. We can just pull up the new targets file, run adaptive sampling, and the expectations will be the same.
Adaptive sampling is a flexible, on-device method that streamlines your workflow by enriching targets in real time — with no extra wet-lab steps or lengthy primer optimisation.
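To illustrate why adding targets needs no new assay design, here is a deliberately simplified sketch of the decision described above. It is not Oxford Nanopore's implementation. It assumes each read has already been mapped to reference coordinates and that targets are listed in a BED-style text file (chromosome, start, end). The names `parse_targets` and `decide` are hypothetical.

```python
from typing import Dict, List, Tuple

# Chromosome name -> list of half-open (start, end) target intervals.
Targets = Dict[str, List[Tuple[int, int]]]


def parse_targets(bed_text: str) -> Targets:
    """Parse BED-style lines (chrom, start, end) into a lookup table."""
    targets: Targets = {}
    for line in bed_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end = line.split()[:3]
        targets.setdefault(chrom, []).append((int(start), int(end)))
    return targets


def decide(targets: Targets, chrom: str, read_start: int, read_end: int) -> str:
    """Return "sequence" if the mapped read overlaps a target, else "eject"."""
    for start, end in targets.get(chrom, []):
        if read_start < end and read_end > start:  # half-open interval overlap
            return "sequence"
    return "eject"


if __name__ == "__main__":
    targets = parse_targets("chr1\t1000\t5000\nchr7\t200\t900\n")
    print(decide(targets, "chr1", 4500, 12000))  # overlaps the chr1 target
    print(decide(targets, "chr2", 100, 8000))    # no targets on chr2
```

In this sketch, adding a new region means adding one line to the targets text. The decision logic itself stays the same, which mirrors the point made above about avoiding a new round of primer design and validation.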
This interview has been summarised and edited for clarity and flow. Watch the full interview with Dr Anthony Doran.
As sequencing continues to advance, the data analysis challenges that once held researchers back are rapidly disappearing. With scalable devices, user-friendly analysis tools, and richer biological insights, Oxford Nanopore now enables every researcher to access multidimensional insights — in one go.
Don’t just take our word for it: hear from customers who have used nanopore sequencing to achieve scalability without compromise.
Oxford Nanopore Technologies products are not intended for use for health assessment or to diagnose, treat, mitigate, cure, or prevent any disease or condition.