The real Simon Pure

Dan walked on to REM’s “Imitation of Life”, and together with the talk title, “The real Simon Pure”, this foreshadowed the focus of his presentation: the sequencing of native DNA on the Oxford Nanopore platform. Before getting into the meat of the presentation, however, Dan introduced the Applications team at Oxford Nanopore. Outlining the roles of the different groups across the globe, he said that the Oxford contingent specialises in sample prep technology and kits, whilst the American group is more focused on showcase projects demonstrating what Oxford Nanopore sequencing can do.

Moving on, Dan explained that, much like the play “A bold stroke for a wife” from which his talk title originated, his presentation would focus on fakery or, more importantly, on finding the real answers. Elaborating on this, Dan gave a nod to the recent bioRxiv paper by Ebbert et al., “Systematic analysis of dark and camouflaged genes: disease-relevant genes hiding in plain sight”, to highlight that many areas of the genome are inaccessible to amplification-based sequencing approaches. To demonstrate this point, Dan brought up a coverage plot of chromosome 21 generated by Oxford Nanopore native DNA sequencing. Here, coverage was relatively even across the whole 1 Mb region displayed. However, when the coverage from a PCR-amplified version was overlaid, large coverage drops were obvious, with some regions missed completely. To back up his point further, Dan showed a histogram of read coverage by GC content across a whole human genome. Where PCR had been used, the GC content formed a normal distribution around 40%. However, regions which could only be sequenced from native DNA, and not through PCR, had a bimodal GC distribution with peaks towards the extremes. To give further examples, Dan brought up a wealth of scientific literature highlighting problems with using amplification-based approaches to sequence genomes with known GC biases. Specifically, Dan talked about one example where PCR had significantly shifted the measured GC content of a sequenced organism. Explaining why this happens, Dan said that PCR prefers GC-neutral areas of DNA and preferentially amplifies shorter DNA fragments. Whilst this is fine for targeted approaches, when a whole genome is examined via amplification these factors result in longer or GC-rich PCR templates amplifying less efficiently, or not at all. To get as unbiased a representation of a whole genome as possible, the native DNA itself must be sequenced. 
As an additional point, any amplification-based approach will lose epigenetic marks, as these are not carried over on to the copied strands during PCR. Dan reiterated that the Oxford Nanopore platforms are the only sequencing solution that allows DNA and RNA themselves to be sequenced without ever needing to synthesise DNA copies. This removes the potential biases introduced through PCR drop-out and through the loss of methylation signatures. Dan then went on to show examples of why this is such an important option to have in a researcher’s arsenal of molecular tools when attempting to answer an array of different biological questions.
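
As a rough illustration of the GC-content histogram described above, the following Python sketch bins sequences by their GC percentage. The function names (`gc_fraction`, `gc_histogram`), the 10% bin width and the toy sequences are assumptions made for this example; they are not taken from Dan's analysis, which looked at read coverage across a whole human genome.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(seqs, bin_width=10):
    """Count sequences per GC-percentage bin, keyed by the bin's lower edge.

    bin_width is an illustrative choice; 100% GC is placed in the top bin.
    """
    hist = Counter()
    for seq in seqs:
        pct = 100 * gc_fraction(seq)
        lower_edge = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        hist[lower_edge] += 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # Toy sequences only: one AT-rich, one balanced, one GC-rich.
    print(gc_histogram(["AAAATTTA", "ATGCATGC", "GGCCGGCG"]))
```

Comparing such a histogram for a native library against a PCR-amplified one is one simple way to see whether the extremes of the GC range are under-represented.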

DNA modifications, specifically methylation, became a recurring theme throughout Dan’s talk. After a brief introduction stating that “DNA methylation of cytosine residues in eukaryotes alters gene expression patterns”, he moved on to show how this may be relevant in the debilitating inherited disease Friedreich’s Ataxia. Friedreich’s Ataxia is one of the most common inherited recessive neurodegenerative diseases, affecting 1 in 50,000 people and resulting in loss of motor skills and eventually death. Approximately 1 in 112 people carry a single copy of the disease allele, and inheritance follows typical Mendelian genetics, with each child of two carriers having a 25% chance of inheriting the terminal disease. Two mechanisms have been proposed for the loss of function of the gene encoding the frataxin protein; both prevent RNA polymerase from progressing through the gene, resulting in a loss of transcription. The first involves a GAA repeat expansion between exons 1 and 2 that causes triplex DNA to form: healthy individuals typically have fewer than 20 copies, whereas affected individuals show in excess of 1000. The second involves hypermethylation of the 2 kb running up to the repeat expansion, which further inhibits RNA polymerase progression. For this study, Cas9 was used to excise the frataxin gene, and Oxford Nanopore long-read sequencing of the native DNA successfully found this low-complexity repeat expansion in parental carriers and their affected child. In the parents, the repeat expansion was observed on one allele, identifying them as carriers; the same repeat expansion was detected in the child, but in its homozygous form. When the number of repeats was calculated, simply by taking the number of bases in the low-complexity region and dividing by three, the expansion count mirrored that obtained by Southern blot analysis. 
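
The incidence and carrier figures quoted above can be checked against each other with a little arithmetic. The sketch below assumes Hardy-Weinberg equilibrium, which is an assumption of this example and was not stated in the talk; `carrier_frequency` is a name chosen here for illustration.

```python
import math


def carrier_frequency(incidence):
    """Carrier (heterozygote) frequency for a recessive disease.

    Assumes Hardy-Weinberg equilibrium: incidence = q**2, carriers = 2*p*q.
    """
    q = math.sqrt(incidence)  # disease allele frequency
    p = 1 - q                 # healthy allele frequency
    return 2 * p * q


if __name__ == "__main__":
    carriers = carrier_frequency(1 / 50_000)
    print(f"about 1 in {1 / carriers:.0f} people are carriers")
    # Each child of two carriers inherits the disease allele from each
    # parent with probability 1/2, giving 1/2 * 1/2 = 25%.
    print(f"risk per child of two carriers: {0.5 * 0.5:.0%}")
```

Under that assumption, an incidence of 1 in 50,000 corresponds to roughly 1 carrier in 112 people, in line with the figure quoted.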
In terms of methylation, the 2 kb upstream of the repeat expansion was analysed for CpG methylation using Nanopolish; hypermethylation was observed, with reads containing longer GAA repeats showing more methylation than the wild type. Dan stressed that the sequence of interest contained long GAA repeats which would be difficult for polymerases to process through. Furthermore, the need to maintain the methylation patterns meant that sequencing the native DNA with Cas9-mediated targeting was the only way to get the results presented here (poster: https://nanoporetech.com/resource-centre/resolving-highly-complex-rearrangements-genomic-architecture-using-long-0).
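
The repeat-count arithmetic described above (bases in the low-complexity region divided by three) can be sketched in Python as follows. This is a minimal illustration that looks for the longest exact run of GAA units in a single sequence; real nanopore reads contain errors that would interrupt an exact match, so the study's own analysis will have differed. The function name `longest_gaa_tract` and the toy read are hypothetical.

```python
import re


def longest_gaa_tract(seq):
    """Return (start, end, repeat_count) for the longest exact GAA run.

    Coordinates are 0-based and end-exclusive. Returns None if the
    sequence contains no GAA unit at all.
    """
    tracts = list(re.finditer(r"(?:GAA)+", seq.upper()))
    if not tracts:
        return None
    best = max(tracts, key=lambda m: m.end() - m.start())
    tract_bases = best.end() - best.start()
    # Number of bases in the repeat region divided by three.
    return best.start(), best.end(), tract_bases // 3


if __name__ == "__main__":
    # Toy read: made-up flanking sequence around 25 GAA units.
    read = "ACGT" + "GAA" * 25 + "TTGC"
    print(longest_gaa_tract(read))
```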

Dan followed this with a methylation story focusing on bacteria. Bacterial DNA methylation is often used as a defence mechanism against invading bacteriophages, protecting the host DNA from restriction endonucleases while allowing the invading phage nucleic acids to be destroyed. In this section of his talk, however, Dan suggested that methylation patterns could be used to cluster the genomes of closely related organisms after their DNA has been co-extracted from a metagenomic pool. Here, two strains of E. coli were pooled, one of which had two methyltransferase genes knocked out. This strain was both DAM and DCM deficient, meaning that only the other strain could methylate DNA in a gAtc and a cCwgg context respectively. DNA was extracted from this mixed population and Oxford Nanopore’s Tombo methylation caller was used to find methylated bases in the native DNA. Using the median DCM and DAM values for both genomes, each could be distinctly separated across the two dimensions. To confirm that this was in fact the case, sequences from each cluster were assembled, and a MUMmer plot of one against the other showed a 2 kb insertion leading to the inactivation of the DAM gene. As an added bonus, plasmids associated with each organism also showed the expected methylation patterns and could therefore be linked to their organism of origin (poster: https://nanoporetech.com/resource-centre/using-long-native-reads-partition-and-assemble-genomes-complex-metagenomic-samples).
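
To make the two-dimensional separation a little more concrete, here is a small Python sketch. It locates the methylated base in each gAtc and cCwgg site of a sequence and assigns a sequence to a cluster from its median methylated fraction in each context. The per-site fractions are assumed to come from a methylation caller, but the input format, the function names and the 0.5 threshold are assumptions of this example and do not reproduce Tombo's output or the analysis shown in the talk.

```python
import re
from statistics import median

DAM_MOTIF = re.compile("GATC")      # Dam methylates the A (offset 1)
DCM_MOTIF = re.compile("CC[AT]GG")  # Dcm methylates the second C (offset 1)


def methylation_targets(seq, motif, offset=1):
    """0-based positions of the methylated base in each motif hit.

    Only the forward strand of the given sequence is scanned.
    """
    return [m.start() + offset for m in motif.finditer(seq.upper())]


def methylation_profile(dam_fractions, dcm_fractions):
    """Summarise a sequence as (median Dam fraction, median Dcm fraction)."""
    return median(dam_fractions), median(dcm_fractions)


def assign_strain(profile, threshold=0.5):
    """Place a profile in one of two clusters; threshold is illustrative."""
    dam, dcm = profile
    if dam >= threshold and dcm >= threshold:
        return "methylating strain"
    if dam < threshold and dcm < threshold:
        return "dam-/dcm- strain"
    return "unassigned"


if __name__ == "__main__":
    print(methylation_targets("TTGATCAACCAGGTT", DAM_MOTIF))
    print(methylation_targets("TTGATCAACCAGGTT", DCM_MOTIF))
    # Hypothetical per-site methylated fractions for two contigs.
    print(assign_strain(methylation_profile([0.9, 0.8, 0.95], [0.7, 0.85])))
    print(assign_strain(methylation_profile([0.05, 0.1, 0.0], [0.1, 0.02])))
```

The same profile could be computed for a plasmid and compared with the two genome clusters, which is the idea behind linking plasmids to their organism of origin.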

Hinting at spoilers for a future talk, Dan spoke about another way to link plasmids to their host, using a newly developed chromatin capture method called MetaPore-C. Here, DNA is cross-linked with proteins within the cells themselves, then cut, and the free DNA ends are ligated together. As a result, DNA that is close together in 3D space becomes ligated and is sequenced as a single concatemer molecule. Plasmids and host genome are therefore physically linked via ligation and sequenced together, allowing taxonomic information from the host genome to be assigned to each plasmid. As a teaser, Dan only briefly described an ongoing project using this method to track antibiotic resistance plasmids through a bacterial population, before segueing into how long-read Pore-C has other uses, for example aiding the de novo assembly of whole genomes.
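
The plasmid-to-host linking idea above can be sketched as a simple counting exercise: for every concatemer, note which plasmid and which host contigs its fragments mapped to, and tally the co-occurrences. The Python below is a hypothetical illustration only; the input format (one list of contig names per concatemer), the function names and the contig names are all assumptions, and the MetaPore-C analysis itself was not described in this level of detail.

```python
from collections import Counter, defaultdict


def link_plasmids(concatemers, plasmids):
    """Count, per plasmid, how many concatemers also touch each host contig.

    concatemers: iterable of lists of contig names, one name per fragment.
    plasmids: names of the contigs known to be plasmids.
    """
    plasmids = set(plasmids)
    links = defaultdict(Counter)
    for fragments in concatemers:
        contigs = set(fragments)  # count each contig once per concatemer
        for plasmid in contigs & plasmids:
            for host in contigs - plasmids:
                links[plasmid][host] += 1
    return links


def best_host(links):
    """Pick the most frequently linked host contig for each plasmid."""
    return {p: counts.most_common(1)[0][0] for p, counts in links.items()}


if __name__ == "__main__":
    # Made-up concatemers: each inner list is one sequenced molecule.
    reads = [
        ["plasmid_A", "genome_1", "genome_1"],
        ["plasmid_A", "genome_1"],
        ["plasmid_A", "genome_2"],
        ["plasmid_B", "genome_2", "genome_2"],
        ["genome_1", "genome_1"],
    ]
    print(best_host(link_plasmids(reads, {"plasmid_A", "plasmid_B"})))
```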

Using the same idea of cross-linking DNA in the host cells prior to cutting and ligating ends together, this application of Pore-C exploits the proximity of DNA in 3D space to aid assemblies, find copy number variations, and identify genomic rearrangements. The output of Pore-C is often displayed as a “contact map”, where genome position is denoted on the X and Y axes and pixel intensity represents read depth for a given section of genome. When this was performed on the well sequenced and studied human reference genome NA12878, the contact map showed that most reads mapped to the locations they were expected to, i.e. along the diagonal. Dan then showed a contact map for a breast cancer cell line. In this case, vertical and horizontal lines could be seen away from the diagonal, and he said these were indicative of copy number changes. Furthermore, darker points located off the centre line suggested rearrangement events. For the NA12878 assembly, Pore-C was used in conjunction with 44X coverage of standard Oxford Nanopore reads with a read N50 of 40 kb, and 11X coverage of ultra-long reads in excess of 100 kb. As an example of how Pore-C can be used to correct an assembly, the tool SALSA2 was used to find the optimal assembly based upon the contact map data. Displaying the results for chromosome 8, Dan showed that contigs were merged, inverted or split based upon the Pore-C data, resulting in a 129 Mb scaffold spanning 90% of the whole chromosome. The overall contig N50 of the human genome after correction with SALSA2 was 36.2 Mb, resulting in potentially one of the most contiguous diploid human genomes to date. Summarising this section, Dan said that Pore-C can be used to verify and improve de novo assemblies and to observe copy number changes and rearrangements in the contact map. 
Furthermore, this information can be used to scaffold contigs (poster: https://nanoporetech.com/resource-centre/pore-c-using-nanopore-reads-delineate-long-range-interactions-between-genomic-0).
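
Two of the quantities in this section lend themselves to a short sketch: the contact map and the N50 statistic. The Python below builds a binned contact matrix for a single chromosome from the positions of the fragments in each concatemer, and computes N50 as the length at which the running total of lengths, sorted longest first, reaches half of the overall total. The input format, bin size and function names are assumptions made for illustration; this is not the Pore-C or SALSA2 implementation.

```python
from itertools import combinations


def contact_map(concatemers, chrom_length, bin_size):
    """Symmetric matrix of pairwise contact counts between genome bins.

    concatemers: iterable of lists of 0-based fragment positions, all on
    the same chromosome (a simplifying assumption of this sketch).
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for positions in concatemers:
        bins = [pos // bin_size for pos in positions]
        # Every pair of fragments in one concatemer counts as one contact.
        for a, b in combinations(bins, 2):
            matrix[a][b] += 1
            if a != b:
                matrix[b][a] += 1
    return matrix


def n50(lengths):
    """Length L such that sequences of length >= L hold half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one length")


if __name__ == "__main__":
    # One toy concatemer with two nearby fragments and one distant one.
    for row in contact_map([[10, 20, 250]], chrom_length=300, bin_size=100):
        print(row)
    print(n50([2, 3, 4, 5, 6]))
```

In a real contact map most counts sit on or near the diagonal, so off-diagonal signal of the kind Dan described stands out when the matrix is drawn as an image.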

Wrapping up, Dan said that although his talk had mainly focused on the benefits of native DNA sequencing, sometimes PCR is the best tool for the job, for example when you have limited DNA, or a lot of background, and are targeting a specific region of interest. However, it is only native nucleic acid sequencing on the Oxford Nanopore platform that can give you the lowest bias, the longest reads, and epigenetic modifications all in the same experiment.

Authors: Dan Turner