Characterising alternative splicing in the human cortex with ultra-deep nanopore sequencing

With the goal of ‘understanding both the causes and consequences of molecular variation in the human brain’, Jonathan and his group use a multiomic approach to study how the regulation of gene expression changes across development, and how these features relate to neuropsychiatric and neurodegenerative disease.

‘The advent of long-read sequencing has really enabled us to ask new questions’

Jonathan highlighted the significance of alternative splicing: occurring in an estimated 95% of human genes, it plays an important role in the developmental control of gene expression. The transcripts resulting from alternative splicing can produce different proteins, adding ‘an extra layer of diversity to the way our genetic material is transcribed’ and potentially producing antagonistic effects on function. Traditional short-read sequencing struggles to analyse alternative splicing events comprehensively because short reads typically cannot cover the entire length of a transcript. In contrast, nanopore sequencing can span complete transcripts end to end in single reads, allowing isoforms to be identified unambiguously. Jonathan described how long nanopore reads have ‘really transformed our ability to characterise things such as alternative splicing and isoform diversity in the brain’.

Jonathan presented his team’s recent work utilising nanopore sequencing to characterise human regulatory genomic variation throughout development and across the life course. This work builds on a pilot study in which long nanopore reads revealed ‘considerable transcript diversity’ in the human and mouse cortex, including many novel transcripts¹. As part of this, they developed a data analysis pipeline, FICLE, for the accurate analysis of alternative splicing and visualisation of isoforms from long nanopore reads².

‘the ability to read directly through entire transcripts now enables us to unambiguously look at the actual structure of the transcripts that are being expressed’

In their recent study, Jonathan and his team selected RNA from human cortex tissue research samples from around 50 pre- and post-natal donors, ranging in age from six weeks post-conception to 95 years. These were prepared using the cDNA-PCR Barcoding Kit and sequenced to ultra-high depth of coverage on high-output PromethION Flow Cells, generating an average of eight million reads per sample.

Sharing the results, Jonathan illustrated the ‘huge’ number of different transcripts expressed in the brain. Beyond the identification of new isoforms, Jonathan emphasised that many of the novel exons identified through whole-transcript nanopore sequencing may contain genetic variants that would be missed by traditional whole-exome short-read sequencing approaches.

These novel findings enabled the addition of new transcript annotations, which in turn add to the information available for the interpretation of whole-genome sequencing data. Additionally, approximately 1% of the identified transcripts arose from gene fusions.

In total, almost three million transcripts were identified from nanopore sequencing data. Many represented novel transcripts, and around 40% were missing from existing annotations. In some instances, these novel transcripts were the most abundant isoform identified in the cortex. Whilst many of the identified isoforms were individually rare, Jonathan emphasised that this does not necessarily reflect their potential importance or protein-coding ability. Large numbers of novel transcripts were identified from genes implicated in neurodegenerative diseases including Alzheimer’s disease (AD), Parkinson’s disease, and amyotrophic lateral sclerosis.
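One common way to decide whether a multi-exon transcript is ‘missing from existing annotations’ is to compare its chain of splice junctions with those of annotated transcripts. The sketch below illustrates that idea only; it is not the group’s pipeline, and the function names and coordinates are hypothetical.

```python
from typing import Iterable

# An intron chain: ordered (intron_start, intron_end) pairs for one transcript.
IntronChain = tuple[tuple[int, int], ...]


def intron_chain(exons: Iterable[tuple[int, int]]) -> IntronChain:
    """Derive the intron chain from exon (start, end) intervals."""
    ordered = sorted(exons)
    # Each intron runs from the end of one exon to the start of the next.
    return tuple((left[1], right[0]) for left, right in zip(ordered, ordered[1:]))


def classify(exons: Iterable[tuple[int, int]], annotated: set[IntronChain]) -> str:
    """Label a transcript by whether its splice junctions match an annotation."""
    chain = intron_chain(exons)
    if not chain:
        return "mono-exonic"  # no junctions to compare
    return "annotated" if chain in annotated else "novel"


# Hypothetical reference annotation containing one three-exon transcript.
annotated_chains = {intron_chain([(100, 200), (300, 400), (500, 600)])}

# Same junctions, slightly different transcript ends: still matches.
full_match = classify([(90, 200), (300, 400), (500, 650)], annotated_chains)
# Middle exon skipped: a junction combination not in the annotation.
exon_skip = classify([(100, 200), (500, 600)], annotated_chains)
```

Matching on junctions rather than on exact transcript ends is a deliberate simplification here, since read ends tend to vary more than splice sites do.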

Using a computational approach to assess the potential significance of these novel transcripts, the group found that a large proportion had ‘very high coding potential’, with many of these ‘enriched for highly conserved bases’. This suggests that they represent functional coding sequences that are missing from current annotations and could harbour pathogenic variants associated with human disease.

Next, Jonathan described how he and his team integrated these novel annotations from their nanopore data with whole-genome sequencing data from Genomics England, to explore instances in which de novo mutations overlapped novel coding sequences in genes linked to known dominant developmental disorders. In one example, a novel coding exon was identified in transcripts of the gene TBR1 using nanopore sequencing. They then found a whole-genome dataset from a clinical research sample which featured a de novo stop codon in this gene, a variant that was not reported in existing databases. This research sample was from a study participant who did not have a diagnosis, but ‘their phenotype matched closely with other known mutations from that gene’, suggesting that the variant in this previously hidden novel exon could be pathogenic.

‘We even see... changes in transcript expression where you don’t see changes in the gene-level expression, [which] reinforces the power of long-read sequencing’

The nanopore sequencing data also allowed an in-depth comparison of isoform expression between pre- and post-natal cortex samples, revealing differential isoform expression of nearly 7,500 transcripts. Jonathan emphasised the significance of this isoform-level resolution: in some instances, any expression changes would have been invisible from a gene-level view.

Using transcripts from a key glutamate receptor gene implicated in neurodevelopmental disorders, including schizophrenia, as an example, Jonathan showed that gene-level expression remained the same between pre- and post-natal samples, while a ‘dramatic shift’ was visible in isoform expression. The team also identified transcripts that were differentially expressed between clinical research samples from female and male subjects, some of which also showed an interaction with development, with the difference changing between pre- and post-natal research samples.
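The contrast between gene-level and isoform-level views can be made concrete with a toy calculation. The sketch below uses invented read counts for a hypothetical two-isoform gene (not data from the study): the gene-level total is identical in both groups, yet the share of reads assigned to each isoform reverses.

```python
def gene_level_total(isoform_counts: dict[str, float]) -> float:
    """Gene-level expression: the sum over all isoforms of the gene."""
    return sum(isoform_counts.values())


def isoform_usage(isoform_counts: dict[str, float]) -> dict[str, float]:
    """Fraction of the gene's expression contributed by each isoform."""
    total = gene_level_total(isoform_counts)
    if total == 0:
        raise ValueError("gene has no expression in this sample")
    return {name: count / total for name, count in isoform_counts.items()}


def largest_usage_shift(before: dict[str, float], after: dict[str, float]) -> float:
    """Largest absolute change in isoform usage between two conditions."""
    usage_before = isoform_usage(before)
    usage_after = isoform_usage(after)
    names = set(before) | set(after)
    return max(
        abs(usage_after.get(name, 0.0) - usage_before.get(name, 0.0))
        for name in names
    )


# Invented counts for illustration only.
prenatal = {"isoform_A": 90.0, "isoform_B": 10.0}
postnatal = {"isoform_A": 10.0, "isoform_B": 90.0}

same_gene_level = gene_level_total(prenatal) == gene_level_total(postnatal)
shift = largest_usage_shift(prenatal, postnatal)
```

Here `same_gene_level` is true even though `shift` is large, which is the situation a gene-level analysis cannot detect. A real analysis would also need replicates and a statistical test, which this sketch omits.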

Transcriptomic, genomic, and epigenomic data from a single platform

The group are now expanding their study in multiple directions. In ongoing work, they are further investigating transcript expression in disease by characterising post-mortem tissue samples from individuals who had psychiatric or neurodegenerative phenotypes. They are also using a mouse model of AD. Jonathan presented an example in which a specific transcript of TREM2, one of the key genes associated with AD through genome-wide association studies (GWAS), was significantly upregulated in the context of elevated tau pathology. Expanding beyond brain tissue, they are also characterising research samples from other organs across development, including the pancreas. Furthermore, they are using single-nucleus sequencing to reveal potential differences between cell populations across development. Notably, this approach has already revealed transcript expression differences for an AD-related gene between neurons and oligodendrocytes.

‘One of the real powers of this technology is not just being able to detect DNA methylation, but also to directly read out DNA hydroxymethylation’

Finally, Jonathan discussed epigenetics, the primary focus of his group’s research. Harnessing ‘the power of [nanopore] sequencing to characterise DNA modifications’, they used direct sequencing of native DNA to profile methylation across the human genome. With nanopore sequencing, epigenetic modifications can be characterised alongside canonical bases in a single sequencing run, without the need for special library preparation such as bisulfite conversion. Comparing neurons and oligodendrocytes, he illustrated the consistent difference in methylation patterns observed in a marker of excitatory neurons. Going further, the team used the capacity of nanopore sequencing to directly detect 5-hydroxymethylcytosine (5hmC), a modification that is extremely challenging to distinguish from 5-methylcytosine (5mC) with traditional sequencing technology. The results revealed varying levels of this modification between the two cell types. Jonathan highlighted that 5hmC is critical in regulating alternative splicing and expression of brain-expressed genes.
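To illustrate why separate 5mC and 5hmC calls matter, the sketch below summarises per-read calls at a single cytosine into modification fractions. The input format is an assumption made for this example: one label per read, using ‘C’ for unmodified and the single-letter codes ‘m’ (5mC) and ‘h’ (5hmC), as in the SAM base-modification tags. The function name and the counts are hypothetical.

```python
from collections import Counter

LABELS = ("C", "m", "h")  # unmodified, 5mC, 5hmC


def modification_fractions(calls: list[str]) -> dict[str, float]:
    """Fraction of reads at one site carrying each cytosine state."""
    counts = Counter(calls)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        raise ValueError("no reads cover this site")
    return {label: counts[label] / total for label in LABELS}


# Invented calls for one site: 5 unmodified, 3 with 5mC, 2 with 5hmC.
site_calls = ["C"] * 5 + ["m"] * 3 + ["h"] * 2
fractions = modification_fractions(site_calls)

# Standard bisulfite conversion does not separate 5mC from 5hmC, so it
# would report only their combined fraction at this site.
combined_modified = fractions["m"] + fractions["h"]
```

With separate calls, two cell types could share the same `combined_modified` value while differing in how it splits between 5mC and 5hmC.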

1. Leung, S.K. et al. Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing. Cell Rep. 37(7):110022 (2021). DOI: https://doi.org/10.1016/j.celrep.2021.110022