Nanopore Community Meeting 2019: Day 1

The first day of Nanopore Community Meeting 2019 in New York has been a day packed full of talks. Videos and recordings of all the talks will be online soon, but for now, here are a few highlights.

Plenary: Andrew Beggs

Ultra-high-resolution HLA typing

Andrew introduced his first topic, ultra-high-resolution HLA typing using Oxford Nanopore sequencing, by asking: why type HLA at all? He explained that organ and stem cell transplantation are critically important, life-saving treatments.

Andrew explained that kidney dialysis is very important but incredibly expensive; transplantation gets around these issues, but it also isn't cheap, costing $17,000 for the first year and then ~$5,000 in subsequent years.

HLA mismatching increases the risk of transplant rejection, so it is "critical to get a good match". However, HLA typing is difficult for a number of reasons - the locus is highly polymorphic, it is co-dominantly inherited, and HLA gene expression is also important to measure.

Current methods for HLA typing, such as serology and sequence-specific PCR amplification ("like the first version of HD"), give low resolution; short-read NGS gives higher resolution (4-field), but it cannot easily phase haplotypes and struggles with homozygosity. Moreover, all of these methods are expensive - surely there is a better way?

Andrew then introduced his solution: nanopore-based HLA typing. This also provides 4-field (8-digit) resolution and, exploiting the inherent advantages of long reads, involves long-range PCR. Whereas previous PCR assays used 96-well plates, this is a "single-tube assay with a 150 minute turn-around".

The approach costs a total of $109 and requires 60 ng of input DNA for the PCR reaction. After PCR, ligation library preparation is performed (Ligation Sequencing Kit, SQK-LSK109), and 12 samples are then sequenced in multiplex on a single MinION Flow Cell. The run is basecalled in real time with Guppy, and finally HLA assembly and calling are carried out with HLA-LA, which uses reference graph assembly. The total time of the workflow is 5.5 hours.

When the workflow was tested on 33 reference samples, it outperformed current technologies: class I calls showed 100% concordance at the first field, with only one sample showing a second-field mismatch, and on closer inspection the nanopore sequencing result, not the short-read sequencing result, turned out to be correct. Reflecting on this, Andrew said that "accuracy at the moment outperforms current state of the art".

For class II calls, concordance was also 100% at the first field; one sample had a second-field error in DPA1, which was corrected once indels were polished out of the data.
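HLA alleles are named as colon-separated fields (for example, A*02:01:01:01 at 4-field resolution). This means "concordance at the first field" versus "a second-field mismatch" can be made concrete by counting how many leading fields two calls share. The sketch below is illustrative only. The function names are hypothetical, it is not part of HLA-LA, and it ignores expression suffixes such as N or L.

```python
def split_allele(allele: str) -> tuple[str, list[str]]:
    """Split an HLA allele name such as 'A*02:01:01:01' into gene and fields."""
    gene, _, fields = allele.partition("*")
    return gene, fields.split(":")


def shared_fields(call_a: str, call_b: str) -> int:
    """Count the leading fields on which two calls for the same gene agree."""
    gene_a, fields_a = split_allele(call_a)
    gene_b, fields_b = split_allele(call_b)
    if gene_a != gene_b:
        raise ValueError(f"calls are for different genes: {gene_a} vs {gene_b}")
    depth = 0
    for field_a, field_b in zip(fields_a, fields_b):
        if field_a != field_b:
            break
        depth += 1
    return depth


def concordant_at(call_a: str, call_b: str, level: int) -> bool:
    """True if both calls agree on at least `level` leading fields."""
    return shared_fields(call_a, call_b) >= level


# Hypothetical calls, not data from the talk
print(shared_fields("DPA1*01:03:01:01", "DPA1*01:04"))     # 1: first field only
print(concordant_at("A*02:01:01:01", "A*02:01:01:02", 2))  # True
```

When one call has lower resolution than the other, only the fields both calls report are compared. This is one reasonable convention, but not the only one.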

Haplotyping was performed using the WhatsHap tool, and runs of homozygosity could also be called as part of the HLA algorithm they were using.

Andrew stated that they have been testing the R10 pores, and these show substantially lower numbers of mismatches during alignment. They have also run a single sample on a Flongle Flow Cell in 2 hours, and he suggested that HLA could be called in only 50 minutes from 8 samples multiplexed on the MinION.

To conclude this section, Andrew said that their approach can type HLA within a day, at a much higher resolution than technologies that are as fast, and faster than technologies of equivalent resolution. He outlined future work he will be doing in this area, such as Cas9 enrichment, combined DNA/RNA expression, and incorporating SNP typing into the assay. He pointed out that there is great potential here to democratise sequencing. At the moment, reference laboratories perform these sorts of assays, but there is no reason that they cannot be taken into the field, thanks to the portability and speed of nanopore technology.

CNV resolution of clinical samples on the Flongle

"Wouldn't it be really cool if we could try doing CNV calling on a Flongle?"

The second part of Andrew’s talk discussed “quick and dirty CNV calling” on the Flongle. Many human diseases are caused by germline CNVs, and CNVs are associated with cancers, such as EGFR amplification in lung cancer. Library prep (SQK-LSK109) was performed on 1 µg of input DNA from blood or tumour samples, followed by an 8-hour Flongle run. This produced ~0.05x depth of coverage of the whole human genome.

Early results from this work on a few colorectal cancer samples found concordance between the nanopore and short-read WGS data. SV calling used Sniffles, and CNV calling used QDNAseq (a Bioconductor package). Known translocations and deletions were also detected, as was loss of heterozygosity, although not with high confidence. Andrew suggested that generating more reads on a MinION, or using an enrichment-type method, could improve this.
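At ~0.05x coverage, individual variants cannot be genotyped, but copy number can still be estimated by counting reads in large genomic bins. This is the idea behind read-depth tools such as QDNAseq. The following is a deliberately simplified sketch that assumes a single chromosome, read start positions as input and arbitrary ±0.3 log2 thresholds. QDNAseq additionally corrects for GC content and mappability and segments the profile, and none of that is done here.

```python
import math
from statistics import median


def bin_counts(read_starts: list[int], chrom_length: int, bin_size: int) -> list[int]:
    """Count reads whose start position falls in each fixed-size bin."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def log2_ratios(counts: list[int]) -> list[float]:
    """Normalise each bin to the median of non-empty bins, on a log2 scale."""
    ref = median(c for c in counts if c > 0)
    return [math.log2(c / ref) if c > 0 else float("-inf") for c in counts]


def call_bins(ratios: list[float], gain: float = 0.3, loss: float = -0.3) -> list[str]:
    """Label each bin as gain, loss or neutral using fixed (assumed) thresholds."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral" for r in ratios]


# Toy example: four 100 bp bins holding 10, 10, 20 and 5 reads
starts = (list(range(0, 100, 10)) + list(range(100, 200, 10))
          + list(range(200, 300, 5)) + list(range(300, 400, 20)))
print(call_bins(log2_ratios(bin_counts(starts, 400, 100))))
# ['neutral', 'neutral', 'gain', 'loss']
```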

Clinical whole-genome sequencing using the PromethION

To introduce this section, Andrew discussed the UK 100K Genomes Project (GP), which has sequenced >20,000 human tumour genomes.

Clinical whole-genome sequencing (WGS) is likely to transform patient care; many patients with advanced/metastatic disease have had treatment changes due to WGS findings. However, the workflow of the UK 100K GP was relatively slow, with an average turn-around time of 4-6 weeks, which is "too slow for patient care". Short-read WGS, which was used for the UK 100K GP, also struggles to provide high-quality SV calls because of its limited read length.

Andrew said that this needs to be quicker, so, taking some samples from the UK 100K GP, they "had a go" at doing just that.

Andrew discussed their approach to clinical WGS and variant calling using the PromethION sequencing platform. Starting from 3 µg of DNA from Genomics England (GeL) samples, library prep was performed with the Ligation Sequencing Kit. This was followed by 72-hour sequencing runs and a custom bioinformatics pipeline comprising alignment (Minimap2), variant calling with various tools (Clair, Longshot and Sniffles), and methylation calling (Nanopolish).

One of the challenges they faced was that the human genomes sequenced on the PromethION generated so much data that the data transfer requirements clashed with what was available at the genome centre. Thanks to the BEAR/Castles team, they managed to reduce their computational burden.

So far, 12 samples have been processed via this pipeline (48 will be processed in total), with a median flow cell output of 100 Gbases and a longest read of 1.14 Mbp ("we should be part of the long read club really").
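A median output of 100 Gbases per flow cell can be translated into approximate depth of coverage by dividing by the genome size. The sketch below assumes a haploid human genome of ~3.1 Gb and ignores unmappable regions and read filtering, so the figures are rough estimates only. The 60x target in the example is hypothetical.

```python
import math

HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def mean_depth(total_bases: float, genome_size: float = HUMAN_GENOME_BP) -> float:
    """Average depth of coverage if all bases mapped evenly."""
    return total_bases / genome_size


def flow_cells_needed(target_depth: float, bases_per_flow_cell: float,
                      genome_size: float = HUMAN_GENOME_BP) -> int:
    """Number of flow cells required to reach a target mean depth."""
    return math.ceil(target_depth * genome_size / bases_per_flow_cell)


print(f"{mean_depth(100e9):.1f}x per flow cell")  # 32.3x per flow cell
print(flow_cells_needed(60, 100e9))               # 2, for a hypothetical 60x target
```

By the same arithmetic, the ~0.05x Flongle run described for CNV calling corresponds to roughly 155 Mb of sequence.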

They observed reduced output with very long read lengths, but shearing before library prep increased the yield. For variant calling, SNV accuracy was comparable to short-read sequencing. Many SVs were identified in the cancer samples that were not seen in the short-read WGS data, "which was definitely fascinating", and most of these were intronic. CNVs were “relatively straightforward” to call from the PromethION data, including complex CNVs and loss of heterozygosity. Bin size was reduced to 15 kb, and calling used only QDNAseq (a Bioconductor package).

Andrew described how you "can detect fusions much more easily at the DNA level" than with short-read sequencing. Fusions were detected with Sniffles, although Andrew suggested that it may be preferable to detect fusions from RNA sequencing data. Nanopolish and methplotlib were used to call and plot methylation. Hypermethylation of the MLH1 promoter region, as is commonly observed in colorectal cancer, was detected, and Andrew said that this is a drug target. He stated that methylation detection from PromethION data had a much higher resolution "compared to anything else we do" in terms of methylation calling with other technologies. It was a "piece of cake" and "in fact we are going to move all our methylation assays onto PromethION and nanopolish".
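Per-read methylation calls are usually aggregated into a per-site methylation frequency before regions such as a promoter are compared. The sketch below assumes each call is a (chromosome, position, log-likelihood ratio) tuple in which a positive ratio favours methylation, as in nanopolish's call-methylation output. It also assumes that calls with |ratio| below 2.0 are discarded as ambiguous. The threshold and the data are assumptions for illustration, not Andrew's pipeline.

```python
from collections import defaultdict


def methylation_frequency(calls, threshold: float = 2.0) -> dict:
    """Aggregate per-read calls into the fraction of confident calls that are methylated."""
    called = defaultdict(int)
    methylated = defaultdict(int)
    for chrom, pos, llr in calls:
        if abs(llr) < threshold:
            continue  # too close to zero to call either way
        site = (chrom, pos)
        called[site] += 1
        if llr > 0:  # positive log-likelihood ratio favours the methylated model
            methylated[site] += 1
    return {site: methylated[site] / n for site, n in called.items()}


# Hypothetical calls at two CpG sites
calls = [
    ("chr3", 100, 3.1), ("chr3", 100, -2.5), ("chr3", 100, 0.4),
    ("chr3", 100, 5.0), ("chr3", 200, -4.0),
]
print(methylation_frequency(calls))
# {('chr3', 100): 0.6666666666666666, ('chr3', 200): 0.0}
```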

In conclusion, Andrew said that clinical WGS on the PromethION “has the potential to be game changing”, although we are still in the “beta” stage of its application. We need better variant calling tools, plus a clinical pipeline and ISO accreditation, which they are going to work with Genomics England to achieve. In terms of accuracy, nanopore sequencing data is comparable to short-read data, and “in some ways better”. Moreover, with multiplexing, clinical WGS will be possible in under 24 hours, and this will be "a transformation for clinical genetics".

What could be the future of clinical nanopore sequencing?
Andrew suggested that targeted amplicon sequencing on the Flongle is ideal for the clinic. He described clinical WGS as “potentially a game changer” for nanopore sequencing, since it costs the same as short-read sequencing yet yields much more information from a run, such as methylation and SV calls.

It is "a very exciting time to be involved in nanopore sequencing", which has the potential to radically change clinical genetics in the next few years.

Plenary: Sissel Juul & Eoghan Harrington

Kicking off the second plenary of NCM 2019 was Sissel Juul, Director of Genomic Applications at Oxford Nanopore. Sissel began by introducing the applications teams. In Oxford, United Kingdom, one group focuses on sample technology. The Genomic Applications team is split across Oxford, New York and San Francisco, and showcases the technology through high-impact biological applications. Last but not least, the Applications Support team travels widely to provide in-depth support for customers.

Sissel highlighted the main vision of Oxford Nanopore: to enable the sequencing of anything, by anyone, anywhere. Focusing on the "anything" part of that goal, she noted that, because of the long-read capacity of nanopore sequencing, discussion of the technology generally features just that: long reads. However, Sissel said, "really, there's no reason why you'd only sequence long reads with nanopores". She showed how typical fragment lengths vary by sample type, from very short degraded DNA to very long mammalian chromosomal DNA, and explained that many applications involve samples with short fragments. Whilst the majority of reads sequenced on nanopore platforms are ~1.5-150 kb, in this talk Sissel focused on the other end of that scale. As nanopores sequence molecules one after another, she noted, sequencing short fragments does not result in a loss of throughput compared with sequencing long reads.

In her first example of "the short of it", Sissel introduced the work of the Applications team using "longish short reads" to call variants in a cystic fibrosis panel. Cystic fibrosis is a recessive genetic condition affecting around one in 3,000 people. It can result from many different mutations in the gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR) protein on chromosome 7, and ~139 of these mutations have been verified as clinically relevant. Genetic testing for CFTR mutations enables carrier screening, confirmation of clinical diagnosis, prenatal diagnosis and determination of optimal treatment options. This is achieved via panels that detect the 139 mutations and variants across 80 amplicons. With ultra-short reads, however, paralogous pseudogenes cause problems when mapping reads.

The Applications team demonstrated long-read amplicon sequencing of the same mutations. By extending the amplicons to ~1,500 bp each, they could span the same variants with 24 amplicons, and the longer reads enabled variant phasing and unambiguous identification of the paralogous genes. Sissel displayed an alignment showing the expected known SNPs in a sample, plus one unexpected SNP. This additional SNP fell outside the range of the short-read panel but was picked up by the longer amplicons used here, and it has been associated with cystic fibrosis in the literature. The team aimed for 150x depth of coverage for each target and found that this could be achieved on a MinION Flow Cell in five minutes. Sissel suggested that, to make the most of the throughput of a flow cell, samples would ideally be sequenced in multiplex, or on a Flongle Flow Cell.

The panel was then tested on 23 samples, each with known cystic fibrosis-related variants. The targets were enriched via the 24-amplicon multiplexed PCR, then each sample was uniquely barcoded, enabling all 23 to be sequenced on a single MinION Flow Cell. Of the 36 known mutations across the samples, 34 were correctly called and no false positives were seen, giving a sensitivity of 94.44% and a specificity of 100%. Sissel noted that for the 2 false negatives, the correct mutations were visible in the data, suggesting a bioinformatics issue that the team is now investigating.
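The reported sensitivity follows directly from the counts: 34 true positives and 2 false negatives. Specificity (true negatives divided by true negatives plus false positives) is 100% whenever no false positives are seen, provided there are true negatives to count. The helper below also estimates how much sequence one sample needs for 24 amplicons of ~1,500 bp at 150x. It is a back-of-the-envelope sketch with hypothetical function names, and it ignores uneven amplicon balance and barcode losses.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of known mutations that were called."""
    return true_positives / (true_positives + false_negatives)


def bases_per_sample(n_amplicons: int, amplicon_bp: int, target_depth: int) -> int:
    """Sequence needed to cover every amplicon at the target depth."""
    return n_amplicons * amplicon_bp * target_depth


print(f"sensitivity: {100 * sensitivity(34, 2):.2f}%")  # sensitivity: 94.44%
per_sample = bases_per_sample(24, 1_500, 150)
print(f"{per_sample / 1e6:.1f} Mb per sample, "
      f"{23 * per_sample / 1e6:.1f} Mb for 23 samples")  # 5.4 Mb per sample, 124.2 Mb for 23 samples
```

Roughly 124 Mb is a small fraction of a MinION Flow Cell's typical output. This fits Sissel's suggestion to multiplex samples or use a Flongle Flow Cell.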

Looking at shorter reads still, Sissel then discussed the team's work in progress with "ultra-short" cell-free DNA (cfDNA): fragments of DNA present in blood plasma. In cancer patients, tumour DNA is also present, representing 0.1-10% of cfDNA, with more tumour DNA present at more advanced stages of disease. To complicate things further, these fragments are typically only 100-200 bp in length. Sissel explained that detecting circulating tumour DNA as early as possible in the disease requires detecting low-frequency variants in oncogenes carried on these fragments with high accuracy. To enable this, the team developed a protocol combining target enrichment and unique molecular identifiers (UMIs). Using UMIs to cluster all the reads originating from a single initial molecule, Sissel explained, serves two functions. Firstly, it enables any PCR bias to be detected and controlled for. Secondly, the reads in each cluster can be used to polish one another, generating high-accuracy single-molecule consensus reads. In this method, UMIs were incorporated into the primers used to amplify all the cfDNA in a sample, then biotinylated probes were used to enrich for regions of interest. The captured on-target DNA was then further amplified and sequenced. In analysis, the UMIs were used to cluster reads: all reads sharing a UMI derive from a single parent strand, and each cluster was polished into a consensus sequence.
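The UMI step can be illustrated with a toy version. Reads whose UMIs differ by at most one mismatch are grouped greedily, small clusters are discarded, and each cluster is collapsed by per-position majority vote. This is a simplification under stated assumptions: the real workflow uses vsearch for clustering and Spoa, Racon and Medaka for consensus. This sketch instead assumes equal-length UMIs and reads that are already aligned to the same length, and the function names and minimum cluster size are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Mismatches between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_umi(reads, max_dist: int = 1):
    """Greedy clustering: each read joins the first cluster whose seed UMI is close enough."""
    clusters = []  # list of (seed_umi, [sequences])
    for umi, seq in reads:
        for seed, members in clusters:
            if hamming(umi, seed) <= max_dist:
                members.append(seq)
                break
        else:
            clusters.append((umi, [seq]))
    return clusters


def consensus(sequences):
    """Per-position majority vote over pre-aligned, equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


def consensus_reads(reads, min_cluster_size: int = 3):
    """One consensus sequence per UMI cluster that has enough member reads."""
    return [consensus(members) for _, members in cluster_by_umi(reads)
            if len(members) >= min_cluster_size]


reads = [
    ("AACCGGTT", "ACGTACGT"),
    ("AACCGGTA", "ACGTACCT"),  # one UMI mismatch, one read error
    ("AACCGGTT", "ACGAACGT"),  # another read error
    ("TTGGCCAA", "GGGGCCCC"),  # singleton cluster, filtered out
]
print(consensus_reads(reads))  # ['ACGTACGT']
```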

The workflow was tested on human gDNA from NA14097, carrying a known SNP in BRCA1, spiked into the well-studied human genome NA12878 to a final variant frequency of 5%. The Roche Avenio ctDNA kit, a pan-cancer assay, was used to target oncogene-specific fragments from 17 cancer-associated genes. DNA was fragmented to ~160 bp to reproduce the typical length of cfDNA. After UMI addition, enrichment and amplification, samples were sequenced and aligned to the NA12878 reference genome using minimap2. Reads were then clustered by UMI using vsearch and filtered, and high-accuracy consensus reads were generated using Spoa, Racon and Medaka, all publicly available software. A cluster size of 8 gave a read accuracy of 99%, whilst a cluster size of 20 reached 100% read accuracy. SNPs were called from these high-accuracy consensus reads using varscan2, enabling quantification of the low-frequency variant, which was successfully detected at the expected 5%. In future, the team will focus on detecting variants at lower frequencies, widening the selection of loci and testing the workflow on cfDNA samples.
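Quantifying a 5% variant means estimating an allele fraction from a finite number of consensus reads, so it helps to attach an uncertainty to the estimate. As an illustration (this is not how varscan2 works), the Wilson score interval for a binomial proportion can be computed as below. The read counts in the example are hypothetical.

```python
import math


def wilson_interval(alt_reads: int, depth: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a variant allele fraction."""
    p = alt_reads / depth
    denom = 1 + z * z / depth
    centre = (p + z * z / (2 * depth)) / denom
    half_width = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(50, 1000)  # 50 alt reads out of 1,000 consensus reads
print(f"VAF 5.0%, 95% CI {100 * low:.1f}-{100 * high:.1f}%")  # about 3.8-6.5%
```

With fewer consensus reads, the interval widens. This is one reason why lowering the detectable variant frequency requires deeper, high-accuracy coverage.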

Sissel then handed over to her colleague Eoghan Harrington, Associate Director of Genomic Applications Bioinformatics. He began by introducing the Pore-C project, a collaboration with Marcin Imielinski's lab at Weill Cornell and NYGC. Pore-C is Oxford Nanopore's method of chromatin conformation capture ("3C"). It uses long-read nanopore sequencing to assess the 3D organisation of DNA that is close in spatial proximity but not in sequence. The preprint, "Nanopore sequencing of DNA concatemers reveals higher-order features of chromatin structure" (author et al.), compares Pore-C with the chromatin conformation capture methods Hi-C and SPRITE. It also discusses the use of Pore-C in assessing structural variation and improving assemblies. The end-to-end Pore-C workflow is now available to members of the Nanopore Community: the protocol is online, whilst tools and pipelines for Pore-C data analysis, plus sample data, can be found on GitHub.

Eoghan then focused in more detail on chromatin conformation capture, explaining that traditional methods work by measuring how close in proximity two points in a genome are: "pairwise contacts". The 3D structure of chromatin can then be visualised from pairwise contact maps. Eoghan explained that these methods use pairwise contacts because they rely on short reads, in which further spatial information is not visible. Displaying the Pore-C protocol, Eoghan briefly explained its steps. Chromatin is cross-linked and digested with a restriction enzyme, and DNA in close proximity is ligated. This forms long concatemers of spatially proximal DNA fragments, which are subsequently purified, prepared and sequenced. In analysis, Pore-C tools then identify the restriction fragments present in each read. Because long nanopore reads can capture multi-fragment concatemers in a single read, each read provides many pairwise contacts. These include direct pairwise contacts between fragments that are adjacent in the read. They also include virtual pairwise contacts between fragments that are not adjacent but are associated through their presence within the same long read. Long-range information is encoded in these virtual pairwise contacts, maximising the span of the associations.
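The distinction between direct and virtual contacts can be made concrete. A read containing n restriction fragments yields n - 1 direct contacts (adjacent fragments) and (n - 1)(n - 2)/2 virtual contacts, for n(n - 1)/2 pairwise contacts in total. The sketch below enumerates both from an ordered list of fragment identifiers. The identifiers are hypothetical, and this is not the API of the Pore-C tools.

```python
from itertools import combinations


def pairwise_contacts(fragments):
    """Split all fragment pairs in one read into direct (adjacent) and virtual contacts."""
    direct, virtual = [], []
    for i, j in combinations(range(len(fragments)), 2):
        pair = (fragments[i], fragments[j])
        (direct if j == i + 1 else virtual).append(pair)
    return direct, virtual


read = ["frag_12", "frag_907", "frag_33", "frag_4051"]  # hypothetical fragment IDs
direct, virtual = pairwise_contacts(read)
print(len(direct), len(virtual))  # 3 3
print(virtual)  # [('frag_12', 'frag_33'), ('frag_12', 'frag_4051'), ('frag_907', 'frag_4051')]
```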

Having discussed "the short of it", Eoghan then moved on to "the long of it", asking: why analyse only pairwise interactions? Pairwise contacts are compatible with available analysis and visualisation tools, but breaking Pore-C data down in this way loses the higher-order information. Pore-C reads can go beyond this. Eoghan demonstrated how complex queries could be set up over the multiple contacts in full-length Pore-C reads, identifying subsets of reads with specific higher-order contact patterns to gain more spatial information and identify long-range interactions. Their preprint demonstrates this, showing how reads that span the A/B compartments of a chromosome can be pulled out.
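A higher-order query of this kind can be expressed as a filter over whole reads rather than over pairs. The sketch below assumes each read has already been reduced to a list of genomic bin IDs and that every bin has an A or B compartment label. It keeps reads with at least a minimum number of fragments that touch both compartments. The data structures and names are hypothetical.

```python
def reads_spanning(reads: dict, compartment_of: dict,
                   required=frozenset({"A", "B"}), min_fragments: int = 3) -> list:
    """IDs of multi-fragment reads whose fragments cover every required compartment."""
    hits = []
    for read_id, bins in reads.items():
        if len(bins) < min_fragments:
            continue  # too few fragments to carry higher-order information
        labels = {compartment_of.get(b) for b in bins}
        if required <= labels:
            hits.append(read_id)
    return hits


compartment_of = {0: "A", 1: "A", 2: "B", 3: "B"}  # hypothetical bin labels
reads = {"r1": [0, 1, 2], "r2": [0, 2], "r3": [2, 3, 3], "r4": [0, 3, 1, 2]}
print(reads_spanning(reads, compartment_of))  # ['r1', 'r4']
```

Because the query operates on the full read, it retains exactly the higher-order information that is lost when reads are broken into pairs.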

Eoghan described how chromosomes tend to occupy their own spaces within a nucleus, known as chromosome territories. This is also the case for homologous chromosomes, so in Pore-C, reads generally represent a particular allelic phase. Furthermore, where 3C methods traditionally require PCR, Pore-C is PCR-free, enabling preservation and detection of base modifications in sequencing - Eoghan noted that this means that "epiallelic" information can also be assessed, to investigate imprinted loci and cell-specific methylation.

Closing the plenary, Eoghan introduced one more case study in which many of these types of Pore-C information were brought together. He described how known SNPs can be used to phase data: here, a Genome in a Bottle sample with full phasing information was analysed via Pore-C, with the data split into two buckets to build allele-specific contact maps for entire chromosomes. Comparing the chromatin conformation of the two alleles of this chromosome, Eoghan showed how distinct differences in the structure were visible. He then revealed the reason ("if I had longer, I'd build some suspense around this"): the chromosomes in question were the two X chromosomes in this female human genome sample, and the allelic differences seen are the result of X inactivation. Eoghan pointed out how the inactive (Xi) chromosome data shows the two expected superdomains, whilst the "checkerboard" data for the active (Xa) chromosome results from its expected A/B compartment structure. The other hallmark of X inactivation is differential methylation. In Xi, inactivated genes feature hypermethylated promoters, whilst the smaller proportion of genes that "escape" inactivation feature methylation more similar to that on the Xa chromosome; this was the precise pattern of methylation identified in the native DNA Pore-C reads.
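Splitting reads into allele-specific buckets amounts to checking which haplotype's base a read carries at each phased heterozygous SNP it covers. The sketch below assumes phased SNPs are given as a mapping from a hypothetical SNP ID to a pair of haplotype bases. It also assumes the bases observed in each read have already been extracted. Reads with no informative or only conflicting evidence are left unassigned. This illustrates the principle rather than the Pore-C pipeline itself.

```python
def assign_haplotype(read_alleles: dict, phased_snps: dict, min_margin: int = 1) -> str:
    """Assign a read to hap1 or hap2 by counting SNP alleles that match each haplotype."""
    hap1 = hap2 = 0
    for snp_id, base in read_alleles.items():
        if snp_id not in phased_snps:
            continue  # SNP not phased, so it carries no haplotype information
        hap1_base, hap2_base = phased_snps[snp_id]
        if base == hap1_base:
            hap1 += 1
        elif base == hap2_base:
            hap2 += 1
    if hap1 - hap2 >= min_margin:
        return "hap1"
    if hap2 - hap1 >= min_margin:
        return "hap2"
    return "unassigned"


phased = {"snp1": ("A", "G"), "snp2": ("C", "T"), "snp3": ("G", "A")}
print(assign_haplotype({"snp1": "A", "snp2": "C"}, phased))  # hap1
print(assign_haplotype({"snp1": "G", "snp3": "A"}, phased))  # hap2
print(assign_haplotype({"snp1": "A", "snp2": "T"}, phased))  # unassigned
```

Contacts from reads assigned to each haplotype can then be accumulated into separate contact maps, one per allele.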

To read more about the studies featured in this plenary, visit the "Posters" section of the Resource Centre and read the Pore-C preprint.

A selection of lightning talks

Marcela Aguilera Flores – Culture-free detection of boxwood blight to improve disease diagnosis and prevention

The ornamental bush boxwood is the best-selling woody plant in the USA, with a value of $126 million a year. Boxwood blight, caused by the fungal species Calonectria pseudonaviculata (Cps) and Calonectria henricotiae (only seen in Europe), is the most aggressive disease of boxwood, causing defoliation. It spreads easily by contact, can be asymptomatic, and is very hard to get rid of once a plant is infected.

Marcela explained that current methods for blight identification involve long incubation periods in high-humidity environments, by which time it could be too late to prevent the spread of disease. Fast and early diagnosis is needed to prevent, or at least reduce, its spread. The goal of Marcela’s work was to use the MinION for faster detection of Cps in metagenomic samples from boxwood with different levels of infection.

Five DNA extraction methods were initially tested to determine the optimal method. Once DNA had been extracted and prepared for sequencing, samples were sequenced on the MinION. For the bioinformatics analysis, Marcela first used the EPI2ME workflow "What's in my pot?" (WIMP) to identify fungi in the samples, so that analysis was "as fast as possible". Greater resolution was then achieved using MetaMaps, BLAST and a custom database. This correctly identified the exact species, C. pseudonaviculata, in the infected samples.

In conclusion, Marcela stated that she was able to detect Cps in small quantities in all of the metagenomic samples, and that the MinION enabled rapid pathogen detection. Her future work will involve genome assembly to help determine whether unclassified reads came from the plant, and improved fungal DNA extraction so that the workflow can be used even for asymptomatic plants.

Audrey Bollas - Single-molecule long-read sequencing reveals the chromatin basis of gene expression

Audrey opened her talk by stating that there is great complexity in the organisation of the genome, with different levels of folding required for packaging the genome into chromosomes. It is known that there is a relationship between these degrees of folding and the control of gene expression, and the research of Audrey and her team involves determining this relationship.

One technique that can be used to investigate this is NOMe-seq (nucleosome occupancy and methylome sequencing). This method treats the sample with a methyltransferase so that endogenous CpG-specific 5mC methylation can be distinguished from exogenous GpC-specific 5mC methylation, revealing chromatin state. Nucleosomes protect the DNA wrapped around them from being methylated by the exogenous methyltransferase. By contrast, cytosines exposed in the linker sequences between nucleosomes are preferentially methylated to 5mC.

Audrey described how their method, MeSMLR-seq - methyltransferase treatment followed by nanopore single-molecule long read sequencing - can be used to map nucleosome occupancy at the single-molecule level, to differentiate open/accessible from closed/inaccessible regions of the genome. Furthermore, Audrey stated that, because they used yeast cells in their experiments, single molecules represented single cells in these instances.

Like NOMe-seq, MeSMLR-seq profiles 5mC modifications at GpC sites, which are introduced preferentially in regions of open chromatin. However, with long-read nanopore sequencing, methylation can be measured directly via changes in the current, and long reads enable multiple genes and nucleosomes to be spanned in single reads.
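Separating the exogenous accessibility signal from endogenous methylation starts with classifying each cytosine by its sequence context. A common convention in NOMe-seq analyses is to score GpC sites not followed by G (GCH) as accessibility probes and CpG sites not preceded by G (HCG) as endogenous methylation. GCG sites are dropped because the two contexts overlap there. The sketch below applies that convention to the forward strand only. It is an illustration, not the MeSMLR-seq software, and the per-site calls in the example are hypothetical.

```python
def classify_cytosines(seq: str) -> tuple[list[int], list[int]]:
    """Return 0-based forward-strand C positions in GCH and HCG contexts, skipping GCG."""
    seq = seq.upper()
    gch, hcg = [], []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        prev_base = seq[i - 1] if i > 0 else ""
        next_base = seq[i + 1] if i + 1 < len(seq) else ""
        if prev_base == "G" and next_base == "G":
            continue  # GCG: methylation could come from either context
        if prev_base == "G":
            gch.append(i)  # GpC: probe for accessibility
        elif next_base == "G":
            hcg.append(i)  # CpG: endogenous methylation context
    return gch, hcg


def accessibility(gpc_methylated: list[bool]) -> float:
    """Fraction of GpC sites in a window called methylated (i.e. accessible)."""
    return sum(gpc_methylated) / len(gpc_methylated)


print(classify_cytosines("AGCTCGAGCGT"))         # ([2], [4])
print(accessibility([True, True, False, True]))  # 0.75
```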

Presenting data from two genomic loci, Audrey described the transcriptional start site of the transcriptionally silent AUA1 gene. There, nucleosome positioning showed high heterogeneity, but nucleosome spacing showed high uniformity. The actively transcribed EMW1 gene showed the converse: low heterogeneity in nucleosome positioning but low uniformity in nucleosome spacing. This shows how MeSMLR-seq can relate nucleosome positioning and chromatin accessibility to the transcriptional activity of genes.

Next, Audrey showed an example demonstrating a relationship between accessibility and coexpression of the glucose transporter genes HXT3 and HXT6: as glucose concentration increases, HXT3 expression reduces, whereas the expression of HXT6 increases. Her team found that changes in chromatin status followed these gene expression changes.

In conclusion, Audrey stated that MeSMLR-seq enables the long-range measurement of chromatin accessibility, and phasing of nucleosomes at the single-molecule/cell level, to demonstrate the link between chromatin accessibility and gene expression.

Together with single-cell RNA sequencing, MeSMLR-seq can be used to reveal the quantitative link between chromatin accessibility and gene transcription.

For more information about MeSMLR-seq, check out the publication from Audrey's team in the Resource Centre section of our website.

Eric Bortz – Nanopore sequencing of novel highly pathogenic avian influenza: rapid pathotyping and novel defective interfering viral RNA

Influenza is an RNA virus associated with severe respiratory and systemic disease. Eric focused on the highly pathogenic avian influenza (HPAI) viruses, which comprise a diverse array of subtypes defined by their HA and NA protein sequences. Although wild birds are the major reservoir of HPAI viruses, both wild birds and domestic poultry suffer disease and mortality from them. Eric described his involvement in a capacity-building project in Ukraine, where outbreaks of the H5N8 strain were detected in 2016/2017.

Eric discussed his workflow used for sequencing and detecting HPAI viruses using the MinION. This involved sequencing of cDNA amplicons produced from RT-PCR of 8 gene segments across the viral genome. Samples were barcoded with the Native Barcoding Expansion pack, then prepared for sequencing in multiplex with the Ligation Sequencing Kit. Minimap2 was used for reference-based genome assembly, and Medaka/Canu were used for de novo assembly.

In a representative ~21-hour amplicon run on the MinION with 6 barcoded full-genome cDNA pools, >7.7 Gbases of data and 7.23 million reads were produced. The run was basecalled and demultiplexed live using Guppy. Live basecalling enabled a quick, real-time assembly check on a small number of reads before the sequencing run was complete. This check was performed in Geneious and confirmed the presence of influenza virus in the sample.
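A quick sanity check on the run statistics is to divide yield by read count. Taking the reported figures at face value (>7.7 Gbases from 7.23 million reads), the mean read length comes out at roughly 1.07 kb. This is within the range expected for amplicons of influenza A gene segments, which are roughly 0.9 to 2.3 kb long. The even split across the six barcodes below is an assumption, since real barcode yields vary.

```python
def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bases."""
    return total_bases / n_reads


total_bases, n_reads, n_barcodes = 7.7e9, 7_230_000, 6
print(f"mean read length: {mean_read_length(total_bases, n_reads):,.0f} bases")  # 1,065
# Assumes reads are split evenly across barcodes, which real runs rarely achieve
print(f"~{n_reads / n_barcodes / 1e6:.1f} million reads per barcode")  # ~1.2
```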

In terms of findings, Eric described how sequencing a range of samples from different wild birds revealed genomic reassortment in the mute swan and other wild birds in the context of these outbreaks.

To wrap up his talk, Eric discussed how rapid subtyping and pathotyping can be achieved using nanopore sequencing; furthermore, this could be performed on the Flongle for a very cost-effective solution.

"Going deeper" into the influenza genome, MinION sequencing can also be used to identify novel, small defective interfering RNA species of influenza that can affect the host cell response to the virus and so the longevity of the virus in its host.

Spotlight Session

The Spotlight Session, a part of the agenda designed to give exposure to early career scientists, kicked off with Nanopore’s Leila Luheshi introducing the format: a 2-minute pitch by each speaker, followed by a vote on who should stay on to present a full-length talk. The runners-up would then give their talks in the Mini Theatre immediately afterwards, so no one was left without the opportunity to present.

First to the stage was Alessia di Lillo, from the FIRC Institute of Molecular Oncology, discussing long noncoding RNAs involved in DNA damage response. Alessia showed the mechanism of action for these RNAs, and described their experiments to do a functional knockdown of the lncRNAs to inhibit the response pathways. Finishing her pitch, Alessia highlighted that the RNAs of interest were particularly important in telomere shortening and endogenous DNA damage, and could not be picked up by short read sequencing.

Louise Cerdeira, from the University of São Paulo, was next to pitch, introducing “LUISA”, a low-cost unit for sequencing applications. Louise detailed LUISA's computational specifications, showing that it would be a cost-effective option for nanopore data analysis, particularly in resource-limited settings. The unit had been used in OneHealth projects in Brazil and could be powered entirely by solar energy, making it an attractive option for field expeditions too.

The final pitch came from Lewis Stevens, from Northwestern University, who discussed his efforts to establish reference genomes in the field. Lewis noted that “we’ve all heard of Caenorhabditis”, the genus of one of the most important model organisms, Caenorhabditis elegans. C. elegans, Lewis described, inhabits rotting fruit, which he called “a fairly boring place for a nematode to live”. By contrast, C. bovis, the subject of his experiments, lives in the ears of African cattle species. Because of this, C. bovis has the potential to teach us about the parasitic properties of nematodes. This is of key importance, as an estimated 1.5 billion people are currently infected with nematodes.

Lewis explained that he faced a number of challenges with this project to investigate C. bovis, not least that “no one in the worm community really knew it existed”. To pursue his goal of characterising its genome, Lewis took samples from a livestock market in Kenya, took them to a lab, and ended up with a chromosome-scale reference genome that will allow him to interrogate how parasitic worms evolve.

After all three pitches were given, voting opened, and a nail-biting minute passed in which the number of votes rose and rose. When the minute was up, Lewis Stevens emerged as the clear winner, taking approximately 40% of the votes, where Alessia and Louise were close runners-up with approximately 30% each.

Invited back to the stage, Lewis went on to give his full talk with the comment: “so you’ve all agreed to listen to me talk about worms, brilliant”.

Reference genomes from the field: the genome of Caenorhabditis bovis

Lewis Stevens opened his talk by stressing the significance of nematodes as parasites: ~1.5 billion people worldwide are currently infected with nematodes. He introduced Meloidogyne spp., which cause huge economic losses every year, and Onchocerca volvulus, the organism responsible for river blindness. The nematode Caenorhabditis elegans has been extensively studied as a model organism, but its applicability as a model for nematode parasitism is limited. In fact, all Caenorhabditis species currently in culture are free-living, and most were isolated from rotting fruits and flowers. In contrast, the species C. bovis is parasitic and “does things slightly differently”; it is responsible for parasitic otitis and, in severe cases, mortality. Overall, Lewis described, we know very little about C. bovis: only a handful of papers, from the 1980s and 1990s, exist.

Lewis went on to explain how he contacted Eric Fevre, who had an initiative named ZooLink, a surveillance program at the International Livestock Research Institute to study emerging zoonotic diseases in livestock. Thinking the process through, Lewis realised that it would be nearly impossible to take cattle samples out of the country of origin, and exporting extracted DNA would also have its challenges. The only logical solution, therefore, was to attempt sequencing in situ. The plan, Lewis described, was to “go to Kenya, find the worm, sequence it” with a MinION.

When on site in Kenya, Lewis explained how the vets “did all the hard work of sticking fingers in cows’ ears”, but many samples contained nothing. Just when the team were giving up hope though, a worm crawled out of one of the samples!

In a field laboratory close to the sampling sites, C. bovis from the samples were cultured on horse blood agar. Lewis described the worms as “incredibly happy” on these plates – multiplying hugely in number and generating 8 μg of DNA from the extraction process. After extraction, the DNA was needle-sheared and then prepared for nanopore sequencing in a slightly modified "one pot" reaction using the Ligation Sequencing Kit. The read lengths of the two runs were very different, but this was attributed to the fact that for one sample, the DNA was “essentially cooked” in the rush to get it into solution.

The long reads were assembled via wtdbg2, corrected with Medaka and polished with the incorporation of short read data using Racon and Pilon.

The resulting C. bovis version 1 assembly spanned 62.7 Mb, with 35 contigs and a contig N50 of 7.6 Mb, reaching a BUSCO completeness of 95.2%. Half the genome was contained in 4 contigs, and two of those contigs represented complete chromosomes – “not bad for data generated in rural Kenya”. Phylogenetic analysis of the assembled genome revealed that C. bovis diverged early. Its closest relative in analysis was C. plicata, another ecologically unusual Caenorhabditis species: the nematode has been isolated from a dead elephant in Kenya and a dead pine marten in Germany, and spreads via carrion beetles. Given these results, Lewis asked: could there be a clade of vertebrate-associated Caenorhabditis species that is, as of yet, largely undiscovered?
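Contig N50, and the number of contigs holding half the assembly (often called L50), are both read off the same cumulative sum over contigs sorted from longest to shortest. A minimal sketch, using made-up contig lengths rather than the C. bovis assembly:

```python
def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50): the contig length and count at which half the assembly is reached."""
    total = sum(contig_lengths)
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no contigs supplied")


print(n50_l50([10, 8, 5, 3, 2, 2]))  # (8, 2)
```

For the C. bovis assembly, these two numbers correspond to the reported contig N50 of 7.6 Mb and to half the genome being contained in 4 contigs.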

Delving deeper into the C. bovis genome, Lewis noted that it exhibits low heterozygosity, to the point of being essentially homozygous. He contrasted this with the C. brenneri genome, which has been described as harbouring "the most molecular diversity of any eukaryote". Lewis suggested that the low diversity seen in the C. bovis genome results from the transport of a very small number of worms between parts of a population, resulting in extremely limited gene flow. He then discussed expansions in gene families associated with parasitism, with one potentially conferring resistance to anthelmintic drugs and others perhaps modulating the immune system of its mammalian hosts.

Finally, Lewis described his planned next steps for this study, highlighting that "to understand the biology of C. bovis, we need more than just a genome". To study the worm further, the team aims to export live cultures to the United Kingdom and to the Caenorhabditis Genetics Center (CGC) at the University of Minnesota. However, the biology of this worm in situ is still not fully understood, which will require many more projects in Africa with local teams. Lewis closed by thanking the teams supporting his project, in particular the vets who were “so willing to put their fingers in cows’ ears to look for a worm they’d never heard of”.

Technology update: Clive Brown

The day concluded with an update from Clive Brown, Chief Technology Officer, Oxford Nanopore Technologies - the video will be online in a separate post shortly.