Interview: Exploring complex disease in Asian populations with PromethION

with Professor Jianjun Liu of the Genome Institute of Singapore

Conducted and written by Jonathan Pugh

“long-read sequencing will become the main platform for clinical diagnosis, just because they have a potential to reveal all classes and types of variants”*

Professor Jianjun Liu discussing his thoughts on the future of clinical diagnosis and the development of long-read sequencing. *Oxford Nanopore Technologies products are currently for research use only.

Health-related technologies exist in an ever-changing ecosystem, where continual developments drive new insights into human health and disease. One example of this is the use of wearable technology to help track and manage diabetes; companies such as Alphabet, Apple, and Amazon are currently active in this area, developing wearables, such as wristbands, smartwatches, skin patches, and even contact lenses. This is a prime example of how, as new technologies become available, people revisit existing diseases in new, previously inconceivable ways. This is also true for investigations focusing on genomic data, where advancements in technology allow researchers to return to existing diseases and demonstrate how new tools give them greater insights. For example, in 2019, a group of researchers used nanopore sequencing to identify, for the first time, repeat expansions in human genes that evolved through segmental duplication. This illuminated a new genetic basis for a known condition (neuronal intranuclear inclusion disease) which previously hadn’t been possible due to technological limitations.

In short, well-characterised diseases can continue to surprise us when assessed in new ways. This may be even more true with complex diseases — ones where the disease burden is caused by the interaction of multiple genomic and environmental factors. To learn more about how nanopore sequencing is helping unlock new findings for complex disease, I spoke with Professor Jianjun Liu (JJ) of the Genome Institute of Singapore (GIS), an expert with over 20 years of experience in pursuing and determining genetic variants that influence complex disease susceptibility, progression, and treatment outcome.

Same diseases, new techniques

When JJ reels off the list of diseases he is working on, at first, they sound very diverse and, in some cases, completely unrelated to each other: infectious disease caused by Mycobacterium leprae; cancers associated with Epstein-Barr virus; neurological and neuropsychiatric conditions such as schizophrenia; neurocognition deficiency and its contribution to psychiatric disorders and Alzheimer’s disease; kidney disease and in particular the chronic condition IgA nephropathy; germline genetics and their role in cancer development; somatic mutations and how they influence cancer progression; and finally, B cell lymphomas. However, there is something that links every single one of these, and it is “the resource that can power the genetic studies of complex diseases”: the reference genome. Over the course of his career, JJ has seen the genetic disease field develop, and it is now moving away from array-based research to whole-genome sequencing-based analysis.

The Asian Reference Genome

Whilst some studies are moving to de novo assembly as a default for human genomic investigations, JJ makes it clear that, from his point of view, for “population-scale genetic studies … the analysis is still really referenced based”. The problem for JJ is that for almost as long as human reference genomes have existed, they have been “Caucasian centric … I think it’s time to generate [a] newer reference… [one] particularly useful for the Asian population”. He’s referring to the work of the Asian Reference Genome Project (ARGP), and now is definitely the right time to go about this, given not only the recently heralded completion of the human genome (still with a Caucasian focus) but also the first reference genome from an individual of African descent, announced in June 2021. Both of these relied upon developments in long-read nanopore sequencing.

“...the PromethION is primarily used for the Asian Reference Genome project because it’s powerful, it’s high-throughput”

Population-scale genome sequencing is something GIS and JJ have experience in already, having released a study in 2019 where whole-genome sequencing (WGS) was performed on almost 5,000 Singaporeans. In this particular case they used the data for genome-wide association studies (GWAS; finding known variants which may relate to a specific phenotype) and imputation (statistical inference of variants which may be related to a phenotype and not yet known). This study greatly increased the density of Asian genomic data in public databases; however, it still relied on existing reference genomes. According to JJ, “the PromethION is primarily used for the Asian Reference Genome project because it’s powerful, it’s high-throughput”, and they are sequencing samples from three ethnic groups within Singapore: Chinese, Malay, and Indian. The team are currently in phase one of the project, which involves sequencing 100 genomes spread across each ethnic group; following this, they aim to scale to a larger reference panel. “At 100 [genomes] you’re talking about 30 per ethnic group right, it’s not really big enough to capture … a good portion of genetic diversity of each population, so I think we need to go bigger to be able to have a good representation of genetic variation in each ethnic group. This is the reason why only the PromethION has been really used for the Asian Reference Genome project”. According to JJ, 500 or 600 genomes would represent a suitably wide range of genetic diversity.

The genetic material for this work is derived from fresh blood in order to extract “the high molecular weight DNA template”. Currently GIS is running “about 12” libraries in parallel on the PromethION at any time, and “in the near future we might be able to do 24 libraries”. This development has been aided by improvements to the basecalling software, with JJ stating “real-time basecalling on the machine is possible now, so we are evaluating [the move] to … 24 genomes a week”. They are aiming for 50–60x coverage per genome, with their standard read lengths coming in around 20–30 kb. They expect read length to further increase in the future and are “in the middle for implementing that … We do recognise for de novo assembly analysis that ultra-long reads can be extremely useful”. The benefits aren’t only in assembly as “for some very complex structural variants, you may have to really rely on these ultra-long reads to help you resolve it”, and JJ anticipates they can expect reads of 100 kb and longer. In-depth analysis of the data has not yet started, with GIS currently developing “a state-of-the-art pipeline for Oxford Nanopore data analysis”. Once complete, they intend to run the first 100 genomes through this pipeline, with the first results anticipated in mid to late 2021.
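To put those targets in perspective, the rough sketch below estimates how much sequence data 50–60x coverage implies per genome, and how many reads that represents at the 20–30 kb read lengths JJ mentions. It is a back-of-envelope illustration only, not GIS’s own calculation: the human genome size of roughly 3.1 billion bases is an assumption that does not come from the interview, and the function names are hypothetical.

```python
# Back-of-envelope estimate: coverage = total sequenced bases / genome size.
# ASSUMPTION: a human genome of ~3.1 billion bases (not stated in the interview).
ASSUMED_GENOME_SIZE_BP = 3_100_000_000


def bases_required(coverage, genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Total bases needed to reach the target mean coverage."""
    return coverage * genome_size_bp


def reads_required(coverage, mean_read_length_bp,
                   genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Approximate number of reads needed at a given mean read length."""
    return bases_required(coverage, genome_size_bp) / mean_read_length_bp


if __name__ == "__main__":
    # Coverage and read-length ranges quoted in the interview.
    for coverage in (50, 60):
        for read_length_bp in (20_000, 30_000):
            gigabases = bases_required(coverage) / 1e9
            million_reads = reads_required(coverage, read_length_bp) / 1e6
            print(f"{coverage}x at {read_length_bp // 1000} kb reads: "
                  f"~{gigabases:.0f} Gb, ~{million_reads:.1f} million reads")
```

Longer reads reduce the number of reads needed for the same coverage, which is one reason the move towards ultra-long reads matters for throughput as well as for assembly.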

Beyond canonical bases

Whilst it is clear that JJ’s passion currently resides with WGS and assembly for the ARGP, he is also aware of the other benefits that nanopore sequencing can provide. He noted that, with nanopore sequencing, “you already have the signal” required to identify base modifications and “bisulphite conversion will maybe give some bias …[that] direct methylation might allow us to evaluate”. Improved results when compared to ‘gold-standard’ whole-genome bisulphite sequencing (WGBS) have already been demonstrated by researchers at the University of Cambridge working on human mitochondrial DNA. Their analysis of 55 publicly available datasets demonstrated that 58% of WGBS samples exhibited bias. By optimising library preparation for nanopore sequencing, they presented accurate single nucleotide variant (SNV) and CpG methylation calling, thus overcoming the limitations of WGBS. This theme is continued in work by the Applications team at Oxford Nanopore Technologies, which demonstrated a lesser impact of GC bias, a lower read depth requirement, and greater reproducibility in nanopore 5-methylcytosine (5mC) data compared to results derived from WGBS. The ability of nanopore sequencing to detect modified nucleotides within long reads also makes it ideal for detecting and phasing allele-specific methylation, so JJ has the perfect technology to help him explore his belief that “phased methylation data will be important”.

“Oxford Nanopore will give you a better assembly … it has great application for sequencing novel species”

Looking to the future

With complex disease having such a close association with clinically relevant disorders, JJ is clear that this represents a very real future application for long-read sequencing. “Over there [in the clinical space*] the accuracy and completeness of the information is important. We understand that short-read sequencing is unable to resolve all the cases…[based] on my knowledge in Singapore, if you do [short-read] WGS … you are only able to resolve maybe half of cases. Other cases we don’t understand because short reads cannot resolve SVs or copy number with sufficient accuracy, and all of these [are] actually more likely to be present in serious genetic diseases. It doesn’t matter how deep you sequence, you won’t be able to get it. So, I think … long-read sequencing will become the main platform for clinical diagnosis, just because they have a potential to reveal all classes and types of variants”*.

There are still more possibilities in the research space as well. “Oxford Nanopore will give you a better assembly” states JJ, “So it has great application for sequencing novel species. There are [long-read] efforts looking at plants, animals, and I think long-read sequencing and de novo assembly analysis is even better because they don’t even have a reference. Human has a reference, even if it’s not perfect for all the populations … For many of these other species you really have to do de novo [or] there’s nothing else for you to look at”. Some of this work has already begun, including the genome of the durian (which I had to search online for and, once I did, I understood what JJ meant by “either you love it or you hate it”), with additional work planned to sequence Singapore’s other local fruits and primate species. Regarding conservation, “now it is recognised as a research area, I think that we will have a consolidated and coordinated effort in this area”.

With such promising aims and ambitions, I look forward to following the progress of JJ and the GIS over the coming months, to see how the drive for creating a relevant reference genome for his country goes on to improve healthcare for Singapore as a whole. Talking to JJ, it is clear that an exciting future is on the horizon, with the rapid development of sequencing technologies set to deliver unprecedented insights into human health, resolving the cause of complex genomic diseases.

Jonathan Pugh is an Associate Director at Oxford Nanopore Technologies and has spent almost 10 years developing and introducing nanopore sequencing technology.

Want to learn more?

Discover more uses of nanopore sequencing for clinical research

See how long reads enhance SNV calling and phasing

Read about efforts to sequence endangered species

*Oxford Nanopore Technologies products are currently for research use only.