Enhancing plant gene annotation and gene expression analysis

Duckweed is the fastest growing plant on Earth. This attribute, combined with its small genome, minimal gene set, aquatic lifestyle, and transformation system, is behind its recent resurgence as a model research organism. Furthermore, some species of duckweed, such as Wolffia arrhiza, known as khai-nam (or ‘eggs of the water’) in Thailand, are edible, offering the potential for a new generation of fast-growing, nutritious, and sustainable crops.

Duckweed comprises five genera (Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia), and its genome sizes span an order of magnitude, from 150 Mb (Spirodela) to 1,881 Mb (Wolffia).

Because Spirodela polyrhiza has the largest body, smallest genome, and fewest genes among the duckweeds, researchers at the J. Craig Venter Institute (JCVI) sought to further characterise its genome and study gene expression using full-length nanopore cDNA sequencing reads [1]. Initial genomic analysis using long nanopore sequencing reads and optical mapping allowed the generation of a highly contiguous genome assembly with chromosome-arm level resolution [2]. The assembly also allowed the identification of errors in previous reference genomes created using short-read sequencing technology.

‘We demonstrate here the capability of the low-cost, long-read sequencing technologies such as the Oxford Nanopore platform to provide genome-wide, sequence-based validation of sequence assemblies at high resolution as well as to identify likely regions of mis-assembly in a genome draft’ [2]

Daily light-dark cycles and the internal circadian clock drive most plant gene processes to specific times of the day. As such, time-of-day (TOD) sampling can provide greater insight into different gene expression networks. Utilising the long-read capability of nanopore technology, the JCVI team performed full-length cDNA (FL-cDNA) transcriptome sequencing of Spirodela polyrhiza samples taken every four hours over two days. The resultant expression data were found to be highly concordant with those obtained using short-read sequencing technology; however, as the nanopore reads were full length, far fewer of them were required to identify cycling genes (i.e. genes whose expression levels change depending on the time of day) [1]. The FL-cDNA reads were also found to provide more accurate gene models than those obtained using traditional short-read technology (Figure 8).

Figure 8: Full-length cDNA sequencing using nanopore technology allowed the creation of more accurate gene models than provided by short-read sequencing technology. Figure kindly provided by Professor Todd Michael, JCVI, US [1].
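To illustrate what identifying a cycling gene from samples taken every four hours over two days can look like, the following is a minimal sketch only. The source does not state which method the JCVI team used; cosinor regression (fitting a 24-hour cosine curve) is swapped in here as one common approach. The function name `cosinor_fit`, the 24-hour period, and the expression values are all hypothetical, and the data are synthetic.

```python
import math


def cosinor_fit(times_h, expression, period_h=24.0):
    """Fit y = mesor + a*cos(w*t) + b*sin(w*t) to one gene's expression.

    Assumption: samples are evenly spaced and span a whole number of
    periods (e.g. every 4 h over 48 h). Only then are the cosine and sine
    terms orthogonal, so these simple projections equal the least-squares fit.
    """
    n = len(times_h)
    omega = 2.0 * math.pi / period_h
    mesor = sum(expression) / n  # rhythm-adjusted mean expression
    a = 2.0 / n * sum(y * math.cos(omega * t) for t, y in zip(times_h, expression))
    b = 2.0 / n * sum(y * math.sin(omega * t) for t, y in zip(times_h, expression))
    fitted = [mesor + a * math.cos(omega * t) + b * math.sin(omega * t) for t in times_h]
    ss_res = sum((y - f) ** 2 for y, f in zip(expression, fitted))
    ss_tot = sum((y - mesor) ** 2 for y in expression)
    return {
        "mesor": mesor,
        "amplitude": math.hypot(a, b),  # half the peak-to-trough range
        "peak_h": (math.atan2(b, a) / omega) % period_h,  # time of peak expression
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0,  # variance explained
    }


# Hypothetical sampling scheme: every four hours over two days (12 timepoints).
times_h = list(range(0, 48, 4))

# Synthetic data: one gene peaking 8 h into each day, one with constant expression.
cycling_gene = [10 + 5 * math.cos(2 * math.pi * (t - 8) / 24) for t in times_h]
flat_gene = [10.0] * len(times_h)

cycling = cosinor_fit(times_h, cycling_gene)
flat = cosinor_fit(times_h, flat_gene)
print(cycling)
print(flat)
```

In this sketch a gene would be called cycling when the fitted amplitude and the variance explained are both high; any real analysis would also need replicate handling, noise modelling, and multiple-testing correction, none of which are shown here.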

Critically, the long nanopore reads also revealed many more alternative gene transcripts, enabling more powerful, comprehensive analyses (Figure 9).


Figure 9: Full-length nanopore cDNA sequencing reads enabled the identification of more transcript isoforms than a traditional short-read RNA-Seq methodology. Figure kindly provided by Professor Todd Michael, JCVI, US [1].

Using FL-cDNA reads to analyse the expression of LHY, a key circadian clock gene, the team were able to correlate alternative isoform expression with different timepoints. In addition, a number of novel transcripts were identified, which the team plan to validate further. The FL-cDNA reads also allowed expression analysis of paralogous genes (caused by tandem repeats and whole-genome duplications), revealing that these genes often have distinctly different expression levels and cycling. Lead researcher Professor Todd Michael described this as a ‘game-changer’ in plant gene expression analysis, allowing greater insights into polyploidy and recent genome duplications [1]. Highlighting the accessibility of nanopore technology, all of the library preparation and sequencing work was undertaken by an undergraduate student.

This case study is taken from the plant research white paper.

  1. Michael, T. Full-length cDNA sequencing coupled to time-of-day sampling enables enhanced gene prediction in the fastest growing plant on Earth. Presentation. Available at: https://nanoporetech.com/resource-centre [Accessed: 15 December 2019]
  2. Hoang, P.N.T. et al. Generating a high-confidence reference genome map of the Greater Duckweed by integration of cytogenomic, optical mapping, and Oxford Nanopore technologies. Plant J. 96(3):670-684 (2018).