Accurate detection of m6A RNA modifications in native RNA sequences using third-generation sequencing

Opening this year’s London Calling conference, Eva Maria Novoa from the Center for Genomic Regulation in Spain provided an update on her team’s work using nanopore sequencing to directly detect base modifications in native RNA molecules. Eva Maria started her presentation by highlighting how the process of protein generation – from transcription of DNA to RNA, and subsequent translation of RNA to protein – is much more complex than we were taught at school. We now know that many more factors play a role in this process, requiring the study of the epigenome, epitranscriptome, and post-translational protein modifications. Furthermore, these factors are intrinsically linked, interacting with each other to add a further layer of complexity to functional studies.

Eva Maria showed a chart revealing that there have been significantly fewer studies of the translatome (i.e. all translated RNA molecules) than the transcriptome, which she attributed to the lack of suitable analysis technologies, rather than a lack of importance. It was originally thought that RNA modifications were a structural feature of tRNA or rRNA; however, in 2011, a publication revealed that the m6A modification, which was known to exist in mRNA, was reversible. This led to the realisation that the modification may have functional properties and, as a result, pushed the development of techniques to better analyse these modifications.

According to Eva Maria, more than 170 different RNA modifications have now been identified and, of these, over 70 have already been linked to human diseases, including cancer and neurological disorders. The first genome-wide method for the analysis of the modified base m6A, m6A-Seq, was published in 2012. Since then there has been an exponential increase in the number of publications using this method, and m6A has been shown to have a pivotal role in a range of cellular functions such as cell differentiation, stress response, mRNA stability, and sex determination. However, Eva Maria described how this method, which relies on traditional sequencing by synthesis (SBS) analysis, has a number of limitations for the detection of base modifications. For example, it requires selective antibodies or chemicals, which only exist for a handful of modification types and which may also exhibit cross-reactivity. Further, the methodology is complex and only provides an indirect measure of modification state. She argued that better methods were required. Fortuitously, a move to the Garvan Institute of Medical Research brought her together with Martin Smith, who was using nanopore sequencing for direct RNA sequencing. It was apparent that this technique might be able to solve many of the challenges faced by current technology. Eva Maria described how she was ‘super excited’ when Oxford Nanopore released the first Direct RNA Sequencing Kit.

Unlike traditional sequencing technologies, nanopore RNA sequencing requires no amplification or reverse transcription steps, allowing the retention and detection of base modifications alongside the nucleotide sequence. As with standard nucleotides, base modifications give a measurable and characteristic disruption to the electrical current applied to the nanopore, allowing their direct detection and identification. However, Eva Maria revealed, it was not always straightforward to associate every single change in current intensity with the presence of a modification. The team quickly realised that they needed a training set in order to build a specific basecalling algorithm for modified bases. To this end, they designed sequences that covered all possible 5-mers containing the modified base. They sequenced modified and unmodified versions of these sequences to obtain a set of features, which could ideally be classified by modification state using machine learning. This was benchmarked using m6A. When they mapped the resulting reads back to the sequence set, they saw a large number of sequencing errors in the m6A-modified sequences. They realised that these basecalling errors could be used, in addition to current intensity, to improve the identification of modifications. The revised algorithm delivered 90% accuracy for calling m6A. The algorithm was then validated on wild-type yeast and an Ime4 knockout strain, which lacks m6A. The data showed that, as anticipated, the basecalling features changed in the wild type due to the presence of m6A but not in the knockout strain – allowing detection of the modification at single-molecule resolution. Inspired by these results, they are now building data sets for other base modifications, including 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), and pseudouridine (pU).
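
To make the training-set idea above more concrete, the following Python sketch enumerates every RNA 5-mer and then classifies a site as modified or unmodified from per-site features. It is an illustration only and not the team's algorithm: the nearest-centroid classifier is a simple stand-in for whatever machine learning model was actually used, the feature names (mismatch frequency, deletion frequency, current shift) are assumptions based on the description of basecalling errors and current intensity, and the numbers in `toy` are invented purely so the example runs.

```python
from itertools import product
from statistics import mean

BASES = "ACGU"


def all_kmers(k=5):
    """Return every RNA k-mer over A, C, G, U (4**k sequences)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


kmers = all_kmers(5)
# 5-mers that contain at least one A, i.e. a position that could carry m6A.
with_a = [s for s in kmers if "A" in s]
# 5-mers with the A at the central position.
centre_a = [s for s in kmers if s[2] == "A"]


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def train(samples):
    """samples maps a label to a list of feature vectors; returns label -> centroid."""
    return {label: centroid(rows) for label, rows in samples.items()}


def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((f - x) ** 2 for f, x in zip(features, c))
    return min(model, key=lambda label: sq_dist(model[label]))


# Hypothetical per-site features, invented for illustration:
# (mismatch frequency, deletion frequency, mean current shift)
toy = {
    "unmodified": [[0.02, 0.01, 0.0], [0.03, 0.02, 0.1], [0.01, 0.01, -0.1]],
    "m6A": [[0.20, 0.08, 0.9], [0.25, 0.10, 1.1], [0.18, 0.07, 1.0]],
}
model = train(toy)

print(len(kmers), len(with_a), len(centre_a))
print(predict(model, [0.22, 0.09, 1.0]))
```

In practice the features would be extracted from aligned reads and scaled before training, and a held-out set of modified and unmodified sequences would be needed to estimate accuracy; none of that is shown here.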

In collaboration with Martin Smith at the Garvan Institute of Medical Research, the team are also developing a barcoding methodology to further reduce sequencing costs, for which initial results have yielded an accuracy of 98.9% with 80% recovery. The methodology and data behind this work will be released shortly. According to Eva Maria, ‘The establishment of the Oxford Nanopore platform as a tool to map virtually any given modification will allow us to query the epitranscriptome in ways that, until now, had not been possible’.

Authors: Eva Maria Novoa