London Calling 2017: Day 2 updates

Plenary: Nick Loman

The first plenary lecture of the morning was from Nick Loman, who works in the Institute for Microbiology and Infection at the University of Birmingham. Nick started the presentation by showcasing the portable nature of the MinION, with sequencing carried out across a range of locations, including on public transport, in the Arctic and Antarctic, and even in space! This portability meant Nick and his team were able to carry out real-time in-field sequencing during the Ebola outbreak. Fast sequencing and sharing of the data allowed them to identify cross-border transmissions, links between cases, and the sources of flare-ups, including one in 2016 caused by a survivor infected >500 days previously. More recently, Nick worked on sequencing the Zika outbreak. Metagenomic sequencing is especially challenging for Zika virus because of the low viral load in samples, so the team created the ‘Primal Scheme’ tool for rapidly designing multiplex PCR panels. This now has validated schemes for both Zika and yellow fever, among others, making real-time viral outbreak surveillance a powerful tool for preventing future epidemics.


Nick also gave an update on his work towards the 1 Mb read. His longest read so far is 886 kb, one of 7 reads that together cover the whole E. coli genome. Recent improvements to the technique include extracting E. coli DNA in an agarose plug to keep the whole chromosome intact. Increasingly long reads should eventually allow us to complete a human genome, and have implications for the study of bacterial resistance and metagenomics. Nick’s wish-list for the future includes faster local basecalling and ultra-low input.

Lightning talks:

Niranjan Nagarajan from the University of Singapore showcased his work exploring resistance in gut bacteria. By carrying out metagenomic sequencing of patient samples, he was able to study the dynamics of resistance plasmids. Use of the MinION data significantly boosted plasmid assembly, allowing annotation of the resistome from a single patient sample.

Michael Boemo from the University of Oxford presented work studying chromosome replication dynamics. Adding thymidine analogues to a replicating chromosome creates regions of modified bases that can be detected using the MinION. A hidden Markov model was trained to detect the base analogues, with promising results in a pilot system. Michael will be moving on to in vivo systems once the 1D² system is available.
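
To give a feel for how a hidden Markov model can flag a stretch of modified bases, here is a minimal Viterbi decoding sketch. It is not Michael's model: the two states, the Gaussian emission parameters, the transition probabilities and the toy signal are all made-up assumptions, and no training step is shown.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log-density of a normal distribution, used as the emission model."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(signal, states, log_start, log_trans, emission):
    """Return the most likely state sequence for a list of signal values."""
    # scores[s] = best log-probability of any path ending in state s
    scores = {s: log_start[s] + gaussian_logpdf(signal[0], *emission[s]) for s in states}
    backpointers = []
    for x in signal[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best previous state to arrive from, given the old scores
            prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new_scores[s] = scores[prev] + log_trans[prev][s] + gaussian_logpdf(x, *emission[s])
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical parameters: analogue-containing positions shift the signal by +3 units.
STATES = ("native", "analogue")
LOG_START = {"native": math.log(0.9), "analogue": math.log(0.1)}
LOG_TRANS = {
    "native": {"native": math.log(0.9), "analogue": math.log(0.1)},
    "analogue": {"native": math.log(0.1), "analogue": math.log(0.9)},
}
EMISSION = {"native": (0.0, 1.0), "analogue": (3.0, 1.0)}  # (mean, sd)

signal = [0.1, -0.3, 0.2, 2.9, 3.2, 3.1, 2.7, 0.0, -0.2]
labels = viterbi(signal, STATES, LOG_START, LOG_TRANS, EMISSION)
print(labels)
```

On this toy signal the decoder labels the four shifted values in the middle as "analogue" and the rest as "native".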

Celine Bigot is currently carrying out her postdoc at the Centre National De Genotypage, working on the PathoTRACK project. They are aiming to identify threat agents in metagenomic samples, and have set up a test using a standard mixed microbial community with three different sequencing systems: the Illumina MiSeq, the ONT MinION and the Ion Torrent PGM. Preliminary results with WIMP (What’s In My Pot) analysis on the MinION found all microbial species in only 5 hours, some to strain level.

Scott Gigante from the Walter & Eliza Hall Institute of Medical Research discussed his work using neural networks to convert raw nanopore signals into basecalls. He is hoping to move away from hidden Markov models, and feels the nanonet local basecaller provides an excellent opportunity for researchers to program neural networks to address a wide range of biological questions.

Ben Matern from the Department of Transplantation Immunology at the Maastricht University Medical Center has been working on identifying human leukocyte antigen splice variants by MinION cDNA sequencing. Using the splice-aware aligner GMAP, he identified exons in aligned regions and built in-house software to interpret and summarise expression profiles as an alternative approach to quantitative expression analysis. As a next step, he wants to directly sequence the resulting mRNA variants using MinION RNA sequencing.

Benjamin Istace from Genoscope has been using Oxford Nanopore Technologies’ long-read capabilities to produce a high-quality reference banana genome. He used gDNA and BluePippin size selection to keep the longest fragments, followed by library prep and sequencing on R9.4 flow cells. Assembly was carried out with Canu, with the longest reads run on SMARTdenovo, and finalized with Pilon. Only 4 nanopore runs were required for the genome assembly.

Matthew McCabe from Teagasc has been working on identifying DNA and RNA viral species associated with bovine respiratory diseases (BRD) using a PCR-free rapid sequencing kit. Current qPCR diagnostics detect viruses in <30% of BRD cases, and so there is a requirement for rapid untargeted BRD viral pathogen diagnosis – preferably in a form that can be used on the farm! Using MinION sequencing on three combined lung cultures, he managed to correctly identify 99.6% of the viral reads.

Libby Snell from Oxford Nanopore presented the new direct RNA sequencing kit, the only technology capable of reading bases from native RNA. The current kit can sequence RNA in just 2 hours and Oxford Nanopore are working on reducing the total time to 30 minutes. As discussed yesterday in Miten Jain’s presentation, the sequencer is also capable of detecting modified RNA bases, and will be compatible with VolTRAX.

Breakout: Targeted Techniques

Andy Heron: Enrichment and sensitivity methods

- Andy Heron, Senior Director of Advanced Research from ONT, gave a whistle-stop tour of the current Cas9-related sensitivity and targeted techniques being developed at Oxford Nanopore. A big emphasis was given to the fact that the nanopore actually becomes saturated for sequencing at very low sample amounts; the difficulty remains in getting that DNA to the pore. Nanogram amounts of DNA will completely occupy a pore, which in conjunction with Cas9 enrichment means all sequenced reads are on the target of interest. The Cas9-mediated techniques lead to 100-1000x enrichment of targets over background samples, with future R9.5 chemistry upgrades providing methods of delivering those straight to the pore. Time and cost improvements will result from incorporation of these approaches into Oxford Nanopore products and protocols.

Tslil Gabrieli: Cas9-Assisted targeting of chromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping

- Tslil Gabrieli of Tel Aviv University covered their use of Cas9 for extracting very large and complex regions from the genome, making it possible to target regions known for disease-causing structural variation without having to carry out whole genome sequencing. An advantage of CATCH is that only the flanking sequence needs to be known; the target sequence itself can be unknown. Using CATCH, pulsed-field gel electrophoresis and the Oxford Nanopore low input sequencing kit, they obtained >70x coverage of 200 kb fragments. They also functionalised Cas9 to aid in optical mapping of the DNA, which in conjunction with the CATCH process allowed them to create high-quality hybrid scaffolds and to detect structural variants and more targeted mutations.

Alfonso Benitez-Páez: multi-locus amplicon sequencing approach to study microbial diversity at species level

- Finally, Alfonso Benitez-Páez from IATA presented his work on the study of microbial communities in complex samples. A special interest still remains in characterising the microbiome of the human gut, with "who's there?" and "what are they doing?" being the predominant questions asked. 16S-based methods are still the major approach used, but his team wondered if it was possible to selectively sequence a larger genomic region common to all bacteria. PCR of the rrn region led to long fragment amplification of ~5 kb, but required compiling a reference database of almost 70,000 bacterial genomes to make the analysis possible. They showed that microbial mock communities could be accurately represented using R9 and R9.4 chemistry, outperforming the results seen with other sequencing platforms. Nanopore sequencing therefore presents a viable alternative to 16S sequencing for identifying microbial communities affecting human health, and in other areas.


Breakout: RNA & cDNA

John Tyson from the University of British Columbia shared his first experiences of the direct RNA kit and some work on targeted full-length cDNAs. John discussed why he is using nanopore sequencing to study the transcriptome: it captures full-length transcripts, providing full splice variant information, transcript start and stop sites and, with the direct RNA kit, base modifications. In his first experiments with the direct RNA kit, John used an RNA sample that had been stored in the freezer for 18 years. Despite its age, the sample generated 1.1 million reads (1.3 Gb), with a 1.2 kb average read length and the longest mappable read over 14 kb. By comparing the basecalls in direct RNA and cDNA, John has identified sites that are likely to contain modified bases, which he plans to follow up next. John went on to discuss his research on low-abundance calcium channel transcripts. He used a targeted cDNA approach and is using the splice variant information provided by the long reads to study isoforms associated with epilepsy.

Rachael Workman from Johns Hopkins University shared her work comparing the direct RNA kit and a cDNA strand-switching library prep using C. elegans. Rachael found the approaches to be comparable in many respects, showing good concordance between the two when looking at transcript abundance. Interestingly, a number of transcripts identified with direct RNA were not detected with the strand-switching approach. Rachael also demonstrated how homopolymers present in the transcripts are successfully identified in both the RNA and cDNA using the transducer-based basecalling method implemented in the latest software. Rachael has high hopes of using the direct RNA kit to look at polyA tail length.

Next, Chris Vollmers from UC Santa Cruz explained how single cell gene expression profiling is the next step in understanding individual cell function. Chris showed how the long read sequencing technologies used in the MinION device make it possible to gather information on the complex diversity of transcript isoforms even at the single cell level.

Using a PCR-based cDNA approach Chris was able to amplify tiny amounts of reverse transcribed mRNA from single B cells. Furthermore each sample was multiplexed using their own homebrew barcoding method.

Comparing results generated by Oxford Nanopore’s long read technology and other short read sequencing methods showed that both approaches had incredibly high correlations in terms of gene expression profiling. This convincingly showed how expression profiling was very much possible on the MinION device from complex sample types such as human.

Chris went on to talk about the advantages of long read sequencing in the detection of splice variants. He showed that, using tiny amounts of input from known synthetic mRNA transcripts with a high number of splice variants, reliable isoform identification was achieved with their analysis pipeline Mandalorion.

Andrew Smith from UC Santa Cruz presented work detailing the detection of modified bases in 16S ribosomal RNA using direct RNA sequencing. The small subunit of the ribosome has many essential roles in cellular function and is often used as a phylogenetic marker gene.

Using a customised oligonucleotide to target the 16S rRNA allowed the selective addition of the sequencing adapter onto the 16S rRNA itself. Andrew was able to see very good full-length coverage of the 16S rRNA, with tens to hundreds of thousands of reads. Subsequently Andrew and his team used this approach to get high quality taxonomic identifications of particular microbial taxa.

Moving on to modifications in the 16S rRNA, Andrew and his team compared two strains of E. coli, one of which was unable to methylate RNA. A modified ribonucleotide was consistently detected at an a priori predicted position, highlighting the potential use of this type of analysis in detecting methylation-based antibiotic resistance.

Breakout: pathogens and infectious disease

First up, we had Oliver Pybus discussing the genomic epidemiology of Zika virus in the Americas. He described how, in the past, he had to collaborate with other teams to generate the data he needed to test new epidemiology methods; now the MinION allows him (and others like him) to take back control. The portability also opens up the possibility of directly analysing samples from patients on-site.

Zika is a flavivirus spread (for the most part) by mosquitoes that causes mild fevers. Oliver traced the history of Zika: it had been characterised and catalogued some time ago, along with hundreds of thousands of obscure viruses, and was not investigated any further until the first expected transmission event in the Americas in 2015. He showed the subsequent rise in microcephaly in children born during that time and how, in February 2016, the WHO designated the Zika outbreak a Public Health Emergency of International Concern.

So how can genetic investigation of Zika help the children with microcephaly and their parents? Oliver’s work aimed to: characterise the genetic diversity of the Zika population, which is essential for vaccine response; characterise mutations that are directly associated with severe disease (unfortunately none have been found); and finally, track the spread of Zika across Brazil. Evolutionary analysis of Zika genomes helps track what has happened, reveals any links with rises in microcephaly, and can help predict what will happen in the future.

ZIBRA: there were an estimated 37 million people infected, with only 10 genome sequences available - not a good ratio for surveillance! Oliver and his team travelled the coast of Brazil sampling individuals and mosquitoes. At first there were issues with low viral load in samples, but this was mitigated by Josh Quick’s Primal Scheme protocol. Interestingly, Oliver investigated the genome coverage required for reliable phylogenetic placement, and found that only 60% was needed. When Zika was first detected it was found all over Brazil, suggesting cryptic transmission. The phylogeny reveals that the virus likely came into the Americas via the Pacific, matching anecdotal evidence, and that it had half a season to spread across the country before the issue was raised as a public health concern. He closed his presentation with early findings from his study which aims to track and characterise yellow fever.
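
As a rough illustration of how such a coverage cut-off might be applied, the sketch below computes the fraction of a consensus genome with an unambiguous base call and compares it with the 60% figure from the talk. Treating "coverage" as the fraction of non-N positions is an assumption on our part, and the function names and example sequence are hypothetical.

```python
MIN_COVERAGE = 0.60  # threshold quoted in the talk

def genome_coverage(consensus):
    """Fraction of positions with an unambiguous base call (A, C, G or T)."""
    if not consensus:
        return 0.0
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / len(consensus)

def usable_for_phylogeny(consensus, min_coverage=MIN_COVERAGE):
    """True if enough of the genome is called to attempt phylogenetic placement."""
    return genome_coverage(consensus) >= min_coverage

# Toy consensus: 20 positions, 6 of them masked as N (made-up data)
example = "ACGTNNNNACGTACGTNNAC"
print(genome_coverage(example), usable_for_phylogeny(example))
```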

Satomi Misuhashi was up next with her talk “Portable system for rapid bacterial composition analysis using MinION”. As a clinician, Satomi was interested in leveraging the “sequence anything, anywhere” and read-until features of nanopore technology to investigate the composition of a mock community (BEI HG-782D). She created a 16S classification pipeline to compare multiple nanopore protocols and contrast them with short read technology (Ion PGM). Satomi was particularly interested in rapid diagnosis and (disregarding protocol advice) attempted the rapid protocol with amplicon sequencing, with excellent results. Comparing a BLAST-based and a Centrifuge-based pipeline with a custom 16S database (GenomeSync, made available for download online), Satomi measured the proportions of classifications at both species and genus rank within her samples and found them broadly comparable. She showed that the community composition found in the dataset was comparable at different time points (hours apart). She found that species-level classification was better with BLAST than with Centrifuge, but that the analysis took significantly longer to complete. Satomi highlighted the utility of metagenomic sequencing for unculturable infections.

Zamin Iqbal is currently sequencing 100K TB genomes from around the world. There were 10 million recorded cases of TB in 2016, and the WHO estimates that 75% of multi-drug resistant TB is never diagnosed. This is partly because diagnostics are very slow, expensive and simply inaccessible for many populations.

The goal is a cheap, portable, same-day point-of-care test, as this is what is required to tackle what is arguably the number one killer pathogen today. Rapid PCR-based tests have failed to resolve this diagnostic deficit, as they only target robust, “high-hitting” antimicrobial resistance loci, miss the majority of the genome and are unable to scale effectively. The UK, with Public Health England, is the first country to introduce routine DNA sequencing for TB diagnosis. Zam’s Mykrobe application is now part of this clinical pipeline.

Zam highlighted the opportunities and challenges in rolling out nanopore sequencing of TB straight from biological samples. In particular, he has been investigating the reduction of background host DNA and SNP calling with nanopore data. Zam found that 98% of variants were detected within 4 hours of sequencing. This was validated by randomising “SNP” positions in the genome, which gave relatively consistent numbers. He posed the question: can depth compensate for error rate? And the answer is yes! The current pipeline takes 6.5 hours of prep and 5 hours of sequencing (including classification and variant calling). Zam is currently collaborating with ONT to produce a concrete TB diagnostic solution with a 2-4 hour turnaround. He has collaborators around the world looking to roll out and trial TB diagnostics in the field in India, Africa, South America and Asia as soon as possible. Watch this space!
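
One way to build intuition for the depth question is a toy majority-vote model, sketched below. It is not Zam's pipeline: it assumes errors are independent between reads and that a site is miscalled whenever at least half of the reads covering it are wrong, which real nanopore data (with systematic errors) does not strictly satisfy. The function name and the 10% per-read error rate are illustrative assumptions.

```python
from math import comb

def consensus_error(per_read_error, depth):
    """Probability that at least half of `depth` independent reads are wrong at a site."""
    first_losing = (depth + 1) // 2  # smallest number of wrong reads that loses the vote
    return sum(
        comb(depth, k) * per_read_error ** k * (1 - per_read_error) ** (depth - k)
        for k in range(first_losing, depth + 1)
    )

# Odd depths are used so that ties cannot occur.
for depth in (1, 3, 11, 31):
    print(depth, consensus_error(0.10, depth))
```

Under these assumptions the consensus error falls steeply as depth grows, which is the sense in which depth can compensate for per-read error.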

Finally, Justin O’Grady gave his talk “Developing rapid sample-to-result diagnostic workflow on the MinION”. Looking to the future, Justin believes that sequencing diagnostics will eventually take over from culture, as it is totally sufficient for many applications. He started the talk with advice to those looking to start their own rapid diagnostic workflows, including choosing the appropriate technology and making sensible decisions on the trade-offs between starting material, turnaround time and detail of result. Next he presented three case studies: urinary tract infections (UTI), hospital-acquired pneumonia (HAP) and childhood meningitis. The UTI project involved samples with a high bacterial load, but also high white cell counts (about a 1:10 ratio of bacteria to human cells), so human DNA depletion was required. Sequencing was done using multiple technologies and the results were found to be comparable, the main difference being time to diagnosis. The MinION’s run-until feature allowed Justin to see the bacterial make-up of his samples in real time using EPI2ME’s ‘What’s In My Pot’ workflow, and comparison against the CARD database (antibiotic resistance genes) pulled out the resistance profile. Next, Justin spoke about his work on the Inhale project detecting and diagnosing HAP. He compared two commercial multiplex PCR tests (BioFire FilmArray and Curetis Unyvero) with MinION metagenomic sequencing. He optimised methods for sample extraction from sputum. The challenges with this prep were the large amount of host cell contamination, depletion of non-human DNA after host removal and contamination of the lung microbiome. The ONT rapid low-input kit allowed Justin to clean up samples really well, and infections were identifiable through the EPI2ME WIMP workflow.

Finally, Justin introduced a quick pilot study in childhood meningitis. Unlike with sputum, the sample site is sterile, but there is still a high host background. He was able to identify infection in a culture-positive sample within 5-6 hours with EPI2ME WIMP. He then tried the same protocol, but without enrichment for non-host DNA, on a culture-negative sample and detected pneumonia without amplification. The sample had poor coverage, but the result is still promising.

Breakout: Structural variation - revealing the inaccessible genome

Professor Wigard Kloosterman from University Medical Center Utrecht in the Netherlands gave the first presentation of the breakout session on his work mapping structural variations in patient genomes. Professor Kloosterman has been developing a bioinformatics pipeline called NanoSV for mapping genomic structural variants in patients with congenital abnormalities. Testing on a simulated chromothripsis model recovered almost 100% of the introduced breaks at 20-fold coverage with a false positive rate of ~1%. He tested the system further on a patient with 40 de novo chromosomal breaks, identifying all of them from 16-fold coverage data with NanoSV. Read-based phasing was used to identify the parental source of the breakpoints, and showed that all germline chromothripsis breakpoints occurred on paternal rather than maternal chromosomes. His team are working on resolving the long-range structure of the chromothripsis events, taking advantage of the long nanopore reads.

The second presentation was from Tomas Sesani at the University of Utah, presenting his work observing conflict-driven genome dynamics in viral evolution. Dr. Sesani is studying Orthopoxvirus vaccinia, a relative of the smallpox virus, to determine the chromosomal changes that occur in response to evolutionary stress. Using an experimental evolution approach, the team detected two main genetic responses in the virus: changes in gene copy number and a single nucleotide polymorphism (SNP). The long Oxford Nanopore reads allowed his team to study the precise copy number of duplicated genes in individual virus genomes over time, and track the SNP within copy number variable regions of the genome. In the future, Dr. Sesani will be looking to identify recombination signatures in nanopore reads, track DNA exchange between viruses, and develop a rapid bench-to-sequencer system for tracking experimental evolution.

For the last breakout presentation, Dr. Sudha Rao from Genotypic Technology in India presented her report on a proof-of-concept study evaluating technologies for mapping balanced translocation breakpoints in a patient with epilepsy. Current methods use karyotyping, which provides a resolution of around 5 Mb, and short-read techniques, which struggle to identify multiple breakpoints. Dr. Rao used the MinION sequencer with Sniffles for structural variation calling. The resulting read mapping was proportional to chromosome size and the team identified reads matching the region of translocation. Current challenges include increasing the coverage to reduce false positive levels and improving the speed of analysis.

Plenary session: larger genomes.

Dr. Raymond Hulzink, a scientific researcher at KeyGene in the Netherlands, started the panel by discussing the use of Oxford Nanopore sequencing technology for fungal and plant genomes. Plant genomes are challenging not only due to their length, repetitive nature, and heterozygous or polyploid nature, but also due to difficulties in DNA extraction. Higher quality DNA results in longer reads, so Dr. Hulzink’s team explored optimizing the DNA extraction process, removing the nucleus from the plant cell prior to lysis to reduce contamination with secondary metabolites, polysaccharides and mitochondrial or chloroplast DNA. Agarose block lysis was then used to preserve the structure of the chromosomes as far as possible. Dr. Hulzink stressed the importance of investment in next generation plant extraction protocols, to provide high quality long reads for nanopore sequencing. Dr. Hulzink also gave an update on his current work, sequencing and assembling the 450 Mb genome of a melon variety using MinION R9.4 flow cells. Currently they have around 9.8 million reads and over 150-fold coverage, and are working on assembly using the new Albacore to resolve long homopolymers and simple repeat regions.
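
As a back-of-the-envelope check on those melon figures, fold coverage is simply total sequenced bases divided by genome size, so the quoted numbers imply a rough mean read length. The sketch below uses the rounded values from the talk (450 Mb, 150-fold, 9.8 million reads); the function name is hypothetical and the result is only an approximate lower bound, since the coverage was quoted as "over" 150-fold.

```python
def implied_mean_read_length(genome_size_bp, fold_coverage, read_count):
    """Mean read length implied by coverage = total bases / genome size."""
    total_bases = genome_size_bp * fold_coverage
    return total_bases / read_count

mean_len = implied_mean_read_length(450e6, 150, 9.8e6)
print(f"total bases: {450e6 * 150 / 1e9:.1f} Gb, implied mean read length: {mean_len:.0f} bp")
```

With these inputs the total comes to 67.5 Gb, or a mean read length a little under 7 kb.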

Dr. Ivo Gut is the Director of the Centro Nacional de Análisis Genómico (CNAG-CRG), one of the largest genome sequencing centres in Europe. He presented the de novo sequencing of the hummingbird and Houbara bustard genomes using Oxford Nanopore sequencing. 20-fold coverage of the hummingbird genome was used for the de novo assembly, producing a contig N50 of 2.7 Mb. The Houbara bustard is an endangered flightless bird. A phenol-chloroform protocol was used to extract the bustard DNA, which was assembled using a mix of Illumina short reads and nanopore long reads. The team achieved an 8 Mb contig N50, and now have 181 contigs of over 1 Mb in size. With the long reads, Dr. Gut feels there is no longer any need for scaffold systems, but that traditional-style DNA extraction techniques are becoming important for producing high-quality DNA for sequencing. These successful outcomes with nanopore sequencing will allow the CNAG-CRG to simplify their de novo assembly strategies, reducing overall turnaround time and cost.
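
For readers unfamiliar with the metric, contig N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of the calculation, using made-up contig lengths rather than data from the talk:

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")

# Toy assembly: total length 54, so the N50 is reached once 27 bases are covered
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 10 + 9 + 8 = 27, so this prints 8
```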

Dr. Christiaan Henkel from Leiden University gave an update on the development of the TULIP algorithm for lightweight assembly, which uses seed sequences in the genome to quickly assemble long nanopore reads. The algorithm was tested on the European eel genome, and produced successful results when compared with an older Illumina draft of the genome. Dr. Henkel will be using the algorithm to assemble the 35 Gb tulip genome, which has chromosomes of >3 Gb in size; these would take up to a month to individually thread through the nanopore! The first tulip to be sequenced will be the Orange Sherpa variety, and the PromethION will be used to gather as much data as possible by running multiple flow cells. A preliminary sequencing run on the MinION showed that only around 10% of the genome consists of highly repetitive regions, which are interspersed with unique sections. This indicates that the lightweight TULIP algorithm should be a viable approach to whole genome assembly.

Kazuhara Arakawa, Untangling the spider silk genes using nanopore long reads

Professor Kazuhara Arakawa, of the Institute for Advanced Biosciences, Keio University, gave the final plenary presentation of the conference. His lab is working on a range of different organisms including extremophiles such as tardigrades and Arthrobacter, with the full Arthrobacter genome sequenced in <1 hour. He is currently working on the genes for spider silk, which is very strong and elastic, with potential to be used as a renewable protein material. Spiders produce many different types of silk with different ratios of elasticity to strength. The silk is made from proteins called spidroins, the majority of which are monophyletic and contain repetitive amino acid sequences. The team aim to sequence 1000 spiders collected from all over the world, to obtain a mix of spidroin sequences. They have also created a gland-specific transcriptome for different spider species and have identified three novel proteins that are highly expressed and localised to the silk glands.