Understanding the genomes of weird and wonderful plants

From identifying leaves to creating a plant sequencing lab

Todd Michael’s love of plants can be traced as far back as the seventh grade. Tasked with collecting and identifying leaves, he was struck by the differences between trees: their diverse shapes and sizes. He recalls: ‘I became really fascinated with this idea that there must be a code in there that makes trees and plants be different.’ Set on investigating this further, Todd asked his guidance counsellor whether there was a job that would allow him to figure out this code, only to be told that ‘that’s not a thing’ and that he should perhaps try engineering. Undeterred, Todd began to collect and grow seedlings, a habit he maintains to this day.

‘My back yard is full of unique plants that I’ve collected. Always, when I sequence a plant, I have it growing somewhere: either at my house or at the lab.’

By the time Todd started his PhD at Dartmouth College (USA), the project to sequence the Arabidopsis genome for the first time had only just begun. He chose to study the circadian clock; Todd highlights the importance of this underlying mechanism in enabling plants to leverage the time of day: ‘plants can’t move, so they pretty much have to bundle everything together into their genetics… to survive in one place.’ After continuing his research at the Salk Institute for Biological Studies, Todd had the opportunity to start his own lab and genome centre at Rutgers University, where he began to integrate plant genome sequencing into his work. Since then, Todd has headed up sequencing centres across several companies and institutions, where he has sequenced and analysed numerous, diverse plant genomes. His goal, then and now, is ‘to find all these weird plants that are doing interesting things, and then sequence their genomes and try to understand: how has their genome evolved to do these special things?’

What makes a ‘weird plant’?

Todd and his team consider several features when choosing plants to sequence. Firstly, they look for those with unusual morphological features, ranging from the corkscrew leaves of the carnivorous plant Genlisea to the tiny, aquatic duckweed, the fastest-growing plant on Earth. Todd shares the example of Welwitschia: a plant that exists only in the Namib Desert which, despite living for as long as 8,000 years, produces only two leaves. During the day, Welwitschia’s leaves act ‘like big sponges’, with morphological changes enabling them to suck in water through their stomata. Welwitschia is one of many plants in which Todd is investigating telomere length using long nanopore sequencing reads. Despite its longevity, sequencing revealed the plant to have relatively modest telomeres, much shorter than those of trees and of some species that are primarily propagated clonally, such as cannabis.

A Welwitschia plant. Image credit: Todd Michael.

Secondly, the team are interested in how different types of photosynthesis have evolved. They recently published the genome sequence of Isoetes: a lycophyte that is able to perform CAM photosynthesis underwater, despite this mechanism generally occurring in plants growing in dry regions [1]. Todd explains: ‘it turns out it’s just about water use efficiency and, specifically, carbon availability. Plants solve the problem the best way possible’. Finally, they consider the size of plant genomes: Genlisea and Utricularia, for example, have the smallest known genomes of flowering plants.

A Genlisea plant. Image credit: Todd Michael.

Todd’s work with plant genomes also encompasses the identification and sequencing of new plant model organisms. What makes a good model organism? Todd explains that no single organism can fulfil every aspect that may be required; it depends on the biological question being asked. One example is Wolffia, a duckweed composed of only ~4,000 cells, which can double in under a day. Wolffia has about half of the typical number of genes found in plant genomes, yet retains all the core gene families. Where most plants have evolved by whole-genome duplication and then subsequently purged some of their paralogous genes, Wolffia has ‘basically purged everything, right down to the core gene set’. This makes Wolffia an ideal model system for knock-out studies, whilst its lack of organs could make it a candidate chassis for synthetic biology [2].

The importance of the time of day

Could the times at which we sample and sequence impact what we see? Todd stresses that, especially with plants, it’s important to always consider the time of sampling. As a post-doc, before sequencing was an option, Todd investigated Arabidopsis microarray data with time-of-day in mind. The result: ‘it turns out that most people do their experiments right after coffee time, so around ten o’clock in the morning’.

In 2019, Todd used nanopore cDNA sequencing to investigate the impact of the circadian clock on gene expression in the duckweed Spirodela polyrhiza [3]. Sequencing full-length transcripts from samples taken at four-hour intervals across two days, the team were able to correlate alternative isoform expression of LHY, a key circadian clock gene, with different time points. They also observed that different paralogous genes often displayed different expression levels and cycling.

Species of duckweed. Image credit: Todd Michael.

More recently, Todd has been investigating heterosis (the improved vigour seen in hybrid offspring), describing how ‘whereas nanopore was like the holy grail of sequencing, the holy grail of plant biology…is heterosis’. Building on their work with duckweed, his team are now combining full-length transcript sequencing with time-of-day (TOD) analysis to assess paralogs in crops such as maize. Where short-read sequencing cannot distinguish between paralogs due to their high similarity, this method is enabling them to identify strong signals that the circadian clock is influencing heterosis.

‘Just like Oxford Nanopore opened up the opportunity to…change how we look at plant genomes, I think extending that into the space of transcriptomics is really now allowing us to look more closely at this very important biological question.’

The democratisation of sequencing

Since the early days of sequencing plant genomes, Todd notes, the accessibility of sequencing technology has changed considerably. In 2017, he and his colleagues sequenced the Arabidopsis genome on a single MinION Flow Cell, demonstrating that anyone could do this: ‘what took ten years, and millions of dollars, hundreds of millions of dollars, for sequencing the first Arabidopsis genome, we basically could do in a week for a couple of hundred dollars. In other words, sequencing should become part of how people approach experiments’ [4]. Todd anticipates that this greater access to sequencing will shift the way experiments are approached: now that experiments can be designed within a week or a single day (more easily, even, than setting up a qPCR for a handful of genes), it is easy to ask biological questions and quickly conduct hypothesis-driven sequencing in your own lab.

‘In terms of the democratisation of sequencing, that’s what I really love about nanopore – everybody can jump in’.

Todd describes how, in collaboration with Ian Henderson (Department of Plant Sciences, University of Cambridge), he used nanopore sequencing to revisit a previously inaccessible region of the Arabidopsis genome: the centromeres [5]. He explains that it hadn’t been possible to characterise these sequences using an alternative long-read sequencing platform, whilst their methylation state could not be determined using short-read bisulfite sequencing due to the repetitive nature of the sequences. With long nanopore reads, ‘we were able to answer both questions at once, so that was super exciting’. The sequences showed a repetitive structure, with nested changes revealing a ‘vibrant evolutionary history’. The next challenge? Completing the ribosomal sequences in full. Todd also expects that the use of ultra-long nanopore reads will make telomeric regions much easier to finish.

The future of plant genomics

Another reason that Todd loves plant genomes, he explains, is their complexity. Plant genomes are frequently large and often feature high levels of heterozygosity, polyploidy, and repetition: the same features that have, until recently, hindered their assembly. He emphasises how, with the move towards scientists performing nanopore sequencing of plant genomes in their own labs rather than sending samples away to a core facility, researchers can now set out their experiment with a goal of telomere-to-telomere assembly from the outset. Todd notes how, in recent publications, including the assembly of the vanilla genome, nanopore sequencing has been used to generate haplotype-resolved assemblies [6]. He explains that, for plants, the picture is still more complicated than haplotypes alone, with the need to sequence subgenomes that are the result of polyploidy. He thinks that, whilst short-read chromatin conformation capture methods can help resolve clearly distinct subgenomes, long nanopore reads generated with Pore-C could allow the resolution of haplotypes in even more complex subgenomes that result from polyploidy.

Todd highlights how sequencing a single plant genome is not enough to tell its story — that adopting a pan-genomics approach, or perhaps ‘sequencing a species, instead of a genome’, will help reveal the multiple structural and genomic aspects making up an organism. He shares the example of the duckweed Lemna: a plant that can be found in almost every pond outside of the North and South Poles. This plant can grow very rapidly in response to a small amount of nutrients and, whilst individuals appear very similar, a deeper look at their genomes has revealed variation in polyploidy that helps shed light on how the population functions.

At the other end of the scale, Todd describes the ‘next big thing’ of single-cell sequencing of genomes, transcriptomes, and methylomes, and a future of tissue-specific analysis. He describes the baobab (Adansonia digitata), a plant that he currently works on, which can live for as long as 5,000 years: ‘there are cells that have to be thousands of years old, replicated faithfully, so there’s some really interesting things going on at the individual cell level that I think that kind of technology will allow us to approach’.

With so many technological advances in recent years, Todd describes how the number of published plant genome sequences is continuing to grow. He recalls how he wrote a paper when there were 50 plant genomes available, the next when there were 100, the next, 200. And yet, ‘there are so many interesting plants out there to sequence still. It is great there’s a lot of people participating in this endeavour’.

Want to find out more?

Watch Todd Michael’s presentation, Unfinished business: solving highly repetitive plant telomere, centromere, and ribosomal arrays with long reads, from the PAG 2022 Symposium

1. Wickell, D. et al. Underwater CAM photosynthesis elucidated by Isoetes genome. Nat Commun. 12(1):6348. DOI: 10.1038/s41467-021-26644-7 (2021).

2. Lam, E., and Michael, T.P. Wolffia, a minimalist plant and synthetic biology chassis. Trends Plant Sci. 27(5):430-439. DOI: 10.1016/j.tplants.2021.11.014 (2021).

3. Hoang, P.N.T. et al. Generating a high-confidence reference genome map of the Greater Duckweed by integration of cytogenomic, optical mapping, and Oxford Nanopore technologies. Plant J. 96(3):670-684. DOI: 10.1111/tpj.14049 (2018).

4. Michael, T.P. et al. High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat Commun. 9(1):541. DOI: 10.1038/s41467-018-03016-2 (2018).

5. Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science. 374(6569). DOI: 10.1126/science.abi7489 (2021).

6. Hasing, T. et al. A phased Vanilla planifolia genome enables genetic improvement of flavour and production. Nat Food. 1:811-819. DOI: 10.1038/s43016-020-00197-2 (2020).