Analysis solutions for nanopore sequencing data
Nanopore sequencing presents a number of significant advantages which allow the sequencing process to be tailored to your requirements:
- Real-time basecalling, enabling immediate access to results
- Stop sequencing as soon as sufficient data has been obtained
- Stop, wash and reuse a flow cell
- Onboard basecalling with Guppy means that neither a local infrastructure nor a stable internet connection is needed
The nanopore sequencing analysis workflow is simple and easy to follow, with five steps from raw data acquisition to analysis completion and experimental interpretation. From the moment data acquisition begins, analysis can be performed in real time. As detailed on this page, Oxford Nanopore provides solutions at each stage.
Primary data acquisition with MinKNOW
MinKNOW, the operating software that drives nanopore sequencing devices, carries out several core tasks: data acquisition, real-time analysis and feedback, local basecalling, and data streaming. It also provides device control, including selection of the run parameters and sample identification and tracking, and ensures that the platform chemistry is performing correctly to run the samples. MinKNOW produces FAST5 (HDF5) files and/or FASTQ files, according to your preference. FAST5 files contain raw signal data that can be used for basecalling.
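As an illustration of what the basecalled FASTQ output looks like to downstream code, the following is a minimal sketch of a FASTQ reader in Python. It assumes the common four-lines-per-read layout and Phred+33 quality encoding; the example reads and the names `read_fastq` and `mean_qscore` are made up for illustration and are not part of MinKNOW or Guppy.

```python
import math
from io import StringIO
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per read (assumed layout)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # The read name is the first whitespace-delimited token after '@'.
        yield FastqRecord(header[1:].split()[0], sequence, quality)


def mean_qscore(quality: str) -> float:
    """Mean quality, averaged over error probabilities (Phred+33 assumed)."""
    probs = [10 ** (-(ord(char) - 33) / 10) for char in quality]
    return -10 * math.log10(sum(probs) / len(probs))


# Hypothetical two-read example; note the second quality string starts with '+'.
EXAMPLE = "@read_1 runid=example\nACGTACGT\n+\nIIIIIIII\n@read_2\nACG\n+\n+5?\n"

for record in read_fastq(StringIO(EXAMPLE)):
    print(record.name, len(record.sequence), round(mean_qscore(record.quality), 2))
```

Reading by line count rather than by searching for `@` or `+` matters because both characters are valid quality symbols, as the second example read shows.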
Basecalling and primary data analysis with Guppy
Guppy is a data processing toolkit that contains Oxford Nanopore Technologies' basecalling algorithms and several bioinformatic post-processing features. It is provided as binaries to run on Windows, OS X and Linux platforms, and is also integrated with MinKNOW, the Oxford Nanopore device control software.
Early downstream analysis components such as barcoding/demultiplexing, adapter trimming and alignment are contained within Guppy. Furthermore, Guppy now performs modified basecalling (5mC, 6mA and CpG) from the raw signal data, producing an additional FAST5 file of modified base probabilities.
Which analysis approach should I use?
|EPI2ME||EPI2ME Labs||Protocol & analysis tutorials||Community developed tools||Custom analysis pipelines|
|Bioinformatic capability needed|
|How||Use the cloud-based EPI2ME platform for real-time analysis workflows.||Use EPI2ME Labs for local, post-run analysis and data exploration.||Get analysis recommendations and clear tutorials on the use of open-source tools.||Run open-source tools written and developed by the Nanopore Community.||All the data, raw or basecalled, can be used in custom analysis pipelines written by the user for specific applications.|
|Where||EPI2ME Labs||Community||User defined|
EPI2ME: real-time analysis workflows
EPI2ME is a cloud-based data analysis platform, offering easy access to several workflows for end-to-end analysis of nanopore data in real-time. An intuitive graphical interface facilitates the interpretation of individual or multiple barcoded samples. Full QC metrics give feedback on run performance and include number of reads, read length distribution and quality scores.
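The QC metrics named above (number of reads, read length distribution and quality scores) can be approximated offline from per-read lengths and quality scores. The sketch below is not how EPI2ME computes its reports; it is a small Python illustration with hypothetical function names (`n50`, `summarise`) and made-up input values.

```python
from statistics import mean


def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no reads supplied")


def summarise(lengths, qscores):
    """Collect a few headline run statistics into a dictionary."""
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean_length": mean(lengths),
        "n50": n50(lengths),
        "mean_qscore": mean(qscores),
    }


# Hypothetical per-read values, for illustration only.
example_lengths = [2000, 3000, 4000, 5000, 6000]
example_qscores = [9.5, 10.5, 11.0, 12.0, 12.0]
print(summarise(example_lengths, example_qscores))
```

N50 is reported alongside the mean because long-read length distributions are typically skewed, so the mean alone can understate how much of the data sits in long reads.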
Analysis workflows provided on the EPI2ME platform are described in the table below. For more detailed descriptions and tutorial videos for each workflow, please follow the “Analysis workflow” link.
|Rapid species identification of fungi, bacteria, viruses, or archaea, based on execution of the Centrifuge classification engine. Output: Real-time building taxonomic tree. Watch WIMP video|
|Real-time identification of bacteria and archaea at genus level using the 16S rRNA gene; with nanopore long reads, sequencing of the full-length 16S rRNA gene is achieved. The experimental workflow involves 16S gene amplification from a biological sample and nanopore sequencing. The 16S Rapid Sequencing Kit is available to buy on the nanopore store. Output: Classification report, including the number of reads analysed and their taxonomic classification. Watch 16S video|
|Comprehensive analysis of antimicrobial resistance (AMR) in individual and metagenomic samples. The ARMA workflow identifies the genes responsible for antibiotic resistance from FASTQ data, using species identification with WIMP and AMR identification via ARMA, which is integrated with the CARD database. Output: Report detailing the resistance genes found and gene overviews (depth of coverage, alignment details, and the resistance profile from the CARD database for specific genes). Watch ARMA video|
Human genome analysis
|Thorough exome alignment and analysis. Can be used for a variety of analyses, including amplicon sequencing, sequence capture and sequence enrichment. Reads are aligned to the human exome using the minimap2 aligner. Output: Report providing sequencing and alignment metrics, listed on a per-gene basis.|
|One-click analysis of structural variation (deletions, insertions and duplications) within a human whole-genome dataset versus the reference genome. Output: Report providing a searchable list of variants and their genomic location, and a VCF file of the results. Watch SV caller video|
|FASTQ alignment of reads in real-time against the GRCh38 human reference using minimap2. Output: Alignment report displaying depth of coverage across each chromosome, and accuracy of alignment.|
|Custom reference FASTA file upload to EPI2ME for subsequent read alignment using the Custom Alignment workflow. Output: Report of alignment success.|
|Tailor sequencing analysis to your specific requirements without the need for complex bioinformatics pipelines, by uploading and aligning to a custom FASTA reference. Reads are aligned to the uploaded reference using the minimap2 aligner. Output: Report stating the success of the alignment, including depth of coverage across the reference, alignment accuracies, and number of reads analysed per barcode.|
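Several of the alignment reports above quote depth of coverage across a reference. As a rough sketch of what that metric means, and not a description of the EPI2ME implementation, the Python below computes per-base depth from alignment intervals. It assumes half-open, 0-based `(start, end)` coordinates already extracted from an aligner's output; the function name and the example intervals are hypothetical.

```python
from itertools import accumulate


def depth_per_base(ref_length, alignments):
    """Per-base depth from half-open, 0-based (start, end) alignment intervals."""
    delta = [0] * (ref_length + 1)
    for start, end in alignments:
        # Clamp to the reference so out-of-range coordinates cannot index past the array.
        start = min(max(start, 0), ref_length)
        end = min(max(end, start), ref_length)
        delta[start] += 1  # coverage rises where an alignment begins
        delta[end] -= 1    # and falls just past where it ends
    return list(accumulate(delta[:-1]))


# Three hypothetical alignments on a 10-base reference.
depth = depth_per_base(10, [(0, 5), (3, 8), (4, 10)])
print(depth)
print(sum(depth) / len(depth))  # mean depth across the reference
```

The difference-array approach touches each alignment once and each reference position once, which keeps the calculation cheap even for long references.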
Quality control and raw data processing
|Demultiplexing of barcodes in sequencing data; the barcoding option can be selected within other workflows, such as WIMP or Human Exome. Output: Demultiplexed reads returned in individual subfolders; a barcoding report is also produced, including the number of reads per barcode.|
|DNA QC. It is advised to run a Lambda control experiment to try out your nanopore sequencing platform before sequencing your own samples. Output: Report detailing sequence length, accuracy, quality score, and the amount of data generated.|
|RNA QC. It is advised to run an RNA control experiment to try out your nanopore sequencing platform before sequencing your own samples. Output: Report detailing sequence length, accuracy, quality score, and the amount of data generated.|
|A check to ensure that the EPI2ME cloud-based analysis is functioning correctly, and that basecalled reads can be uploaded and downloaded. Output: Confirmation of the success of read uploading and downloading.|
EPI2ME Labs: simplifying bioinformatics workflows
EPI2ME Labs is an innovative bioinformatics solution, designed to assist you in developing your skills and confidence in the analysis of your nanopore sequencing data. The tutorials included provide best-practice examples of how to analyse and explore nanopore sequencing data, using both open-source software and our own research tools.
EPI2ME Labs provides a number of beneficial features:
- Minimal installation: analysis and workflows delivered through your web browser
- Interactivity: explore your data interactively, such as browsing lists of genes and genomic regions, with extensive graphical views
- Customisation: “cutting-and-pasting” content between EPI2ME Labs notebooks (and inclusion of your own Python code in the notebook) enables you to create customised templates. EPI2ME Labs can also be integrated with your Github and/or Google accounts.
- Sharing workbooks and results: Google Colaboratory makes it simple to share notebook-based workflows and results with collaborators.
- Transparency: bioinformatics code and logic are provided, and can be customised as you wish, although there is no requirement to use the command line.
- Browsers, viewers, and widgets provide further interactivity and ease of use
EPI2ME Labs is a notebook environment; a notebook is a reproducible document that integrates computer commands, graphical and tabular results, and accompanying text. Notebooks are accessed using Google Colaboratory; this provides a web interface to the notebooks (no software installation is required), and enables sharing of notebooks with others.
What’s the difference between EPI2ME and EPI2ME Labs?
EPI2ME is a cloud-based platform, with a graphical interface and simple, single-click solutions: no bioinformatics experience is needed. The platform provides pre-configured analysis workflows, and is focused on the real-time analysis of your data.
EPI2ME Labs runs locally and is currently supported on the GridION. It is customisable, with the freedom to develop your own workflows and databases, and to modify outputs according to your preferences. EPI2ME Labs is designed to advance your bioinformatics skills, and assist you in tailoring analysis to your individual requirements. EPI2ME Labs is not designed to be a real-time analysis solution.
|EPI2ME||EPI2ME Labs|
|Aim||Provide simple, one-click analysis solutions||Provide bioinformatics best practices and training|
|Focus||Simple, rapid, real-time analysis||Customisable, exploratory, post-run analysis|
EPI2ME Labs tutorials
EPI2ME Labs includes integrated tutorials, requiring no additional installation and providing step-by-step support with dynamic, interactive outputs. Tutorials are linked to from the EPI2ME Labs landing page, and a list of the current tutorials can be found here.
The following analysis tutorials are currently available in EPI2ME Labs, with more in development:
- Guidance for the review of sequencing summary statistics (BasicQC)
- Identification, orientation, and trimming of full-length cDNA transcripts (pychopper)
- Assembly of smaller genomes, including consensus polishing and assembly benchmarking
- Evaluation of Cas9-mediated, PCR-free targeted sequencing
- Analysis of base modifications (5mC in CpG context)
- An introduction to FASTQ
- Coming soon: Differential gene expression; structural variation calling
Protocol builder and analysis tutorials
The protocol builder provides recommended extraction protocols, library preparation methods, and downstream analysis workflows, enabling you to build a bespoke end-to-end protocol to suit your specific requirements.
A set of tutorials is now available on tools for analysing your nanopore sequencing data. Each tutorial provides clear step-by-step instructions and example data. Tutorials currently available include:
- Differential transcript usage and gene expression analyses using pipeline-transcriptome-de
- Annotation of gene transcripts and novel gene isoform discovery with pinfish
- Differential gene expression analysis with DESeq2
- Guidance for the review of sequencing summary statistics with BasicQC, based on the sequencing_summary.txt file produced by the Guppy basecalling software
- Structural variation calling with pipeline-structural-variation
- Evaluation of read-mapping characteristics from a Cas-mediated PCR-free enrichment
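To give a flavour of the kind of review the BasicQC tutorial covers, here is a minimal Python sketch that tallies pass and fail reads from a sequencing_summary.txt file. It is not the BasicQC code. It assumes the file is tab-separated and that the column names `sequence_length_template`, `mean_qscore_template` and `passes_filtering` are present; check these against the header of your own file, as they may differ between software versions. The example rows are made up.

```python
import csv
from io import StringIO

# Hypothetical excerpt of a sequencing summary; column names are assumptions.
SUMMARY = (
    "read_id\tsequence_length_template\tmean_qscore_template\tpasses_filtering\n"
    "read_1\t1200\t11.5\tTRUE\n"
    "read_2\t300\t6.2\tFALSE\n"
    "read_3\t4800\t12.5\tTRUE\n"
)


def summarise_sequencing_summary(handle):
    """Count pass/fail reads and total passing bases from a tab-separated summary."""
    pass_lengths = []
    pass_qscores = []
    fail_reads = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["passes_filtering"].strip().upper() == "TRUE":
            pass_lengths.append(int(row["sequence_length_template"]))
            pass_qscores.append(float(row["mean_qscore_template"]))
        else:
            fail_reads += 1
    return {
        "pass_reads": len(pass_lengths),
        "fail_reads": fail_reads,
        "pass_bases": sum(pass_lengths),
        "mean_pass_qscore": (
            sum(pass_qscores) / len(pass_qscores) if pass_qscores else None
        ),
    }


print(summarise_sequencing_summary(StringIO(SUMMARY)))
```

For a real run, replace `StringIO(SUMMARY)` with an open file handle, for example `open("sequencing_summary.txt", newline="")`.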
Oxford Nanopore Technologies also has its own Github page featuring a wide variety of analysis tools, including those featured in our analysis tutorials, tailored specifically to the analysis of nanopore long-read sequencing data. Please refer to the “Custom analysis pipelines” tab for further information regarding the Github page.
To customise your analysis, FAST5 and FASTQ files produced by MinKNOW can be taken forward into a variety of analysis tools developed by users of nanopore technology. These tools are designed both to work with the long reads produced by nanopore sequencing, and to use real-time analysis wherever it is needed.
Such tools are available in the resources section and have a wide variety of applications, from data processing (e.g. demultiplexing and filtering) to assembly and variant calling.
Custom analysis pipelines
The standard FAST5/FASTQ data output from nanopore sequencing devices allows data utilisation in a variety of downstream analysis platforms and custom user-developed pipelines, tailored to your specific application.
Achieve the greatest flexibility by writing your own custom scripts that work from either FAST5 or FASTQ sequencing data, and explore new routes of analysis tailored to your unique requirements. The research software Taiyaki can be used for training neural network models for basecalling of nanopore sequencing reads. This software is available from the Oxford Nanopore GitHub page.
Other key tools available on our Github page include:
- Medaka consensus tool: a tool for base-space consensus polishing, used by the team at Oxford Nanopore
- Medaka variant caller: a tool for calling small variants on nanopore data, used by the team at Oxford Nanopore
- Research basecallers: enabling users to deploy the latest algorithms from the research teams at Oxford Nanopore