A new benchmark: discussing telomere-to-telomere assemblies with the founder of GrandOmics

The foundations and vision of GrandOmics

Could you share how GrandOmics was established and what the company's vision is?

My lifelong goal was to become an entrepreneur who possesses a profound understanding of science and technology while simultaneously embodying the spirit of a scientist well-versed in the intricacies of business management. The year 2011 marked a turning point in my career, as I made the bold decision to depart from BGI and pursue my entrepreneurial aspirations. In the nascent stages of my entrepreneurial venture, the company's headquarters comprised merely two modest offices, totaling slightly over 100 square meters. Since 2012 we have been committed to long-read sequencing, venturing into diverse domains within the realm of technological services.

Our overarching vision is to emerge as a global leader in the realm of genomics and medical research, crafting an innovative, internationally renowned enterprise that stands on a par with pioneering companies in Silicon Valley.

Could you give an overview of what services GrandOmics provides?

Our offerings encompass third-generation sequencing technology for genome and transcriptome analysis, including single-cell transcriptomics, among others. Additionally, we are actively engaged in accumulating knowledge in the realms of algorithms, software, and databases.

Furthermore, we are developing reagent kits for long-read sequencing. We are also working towards mastering third-generation sequencing platforms, with the hope of gradually turning the vision of precision medicine into reality through our dedicated efforts. Our fundamental values revolve around ‘pushing the boundaries of science, pushing the limits of technology, upholding the ethics of humanity, and fostering a sense of human compassion.’ We are committed to exploring uncharted territory, even where the global scientific community has yet to venture.

Which nanopore sequencing devices does GrandOmics have?

‘Since I purchased the first nanopore MinION sequencer with a credit card in the UK in 2017, Oxford Nanopore Technologies has been one of our most important strategic partners.’

Our laboratory is equipped with a range of Oxford Nanopore sequencers, including the PromethION 48 (P48), MinION, GridION and PromethION 2 Solo (P2 Solo). The P48, as a high-throughput sequencer, is ideally suited for centralised laboratory settings and research service provision. Nanopore sequencers are portable, with low environmental requirements, making them ideal for areas that require rapid on-site testing. This was exemplified during the COVID-19 pandemic over the past three years.

Tackling T2T assemblies: the benefits and the barriers

Can you explain why T2T assemblies have been a goal in genomics research and what benefits they can provide over previously published assemblies?

Literally speaking, T2T (telomere-to-telomere) refers to assembling a genome with no gaps, where every base pair from one telomere to the other is accurately sequenced.

Although in 2003 the International Human Genome Project claimed to have ‘completed’ the mapping of the human genome, it wasn't until the release of CHM13-T2T last year that the final 8% of the genome was finally filled in. When we couldn't ‘see’ these regions, it was even more challenging to determine their functionality and value. If unknown regions exist in the most extensively studied human genome, then there will be even more blind spots and misconceptions in the genomes of other species.

I believe that T2T assembly offers us a perspective to comprehensively and completely examine the genome, enabling us to study genomic questions in a way that is fundamentally distinct from any previous genome assembly versions.

What have been the barriers to producing T2T assemblies historically?

Genomes contain numerous highly similar repetitive sequences. In the human genome, for example, these complex regions collectively account for over 50% of the sequence. Short reads could not span these large repeats, which made it very difficult to distinguish and assemble these regions precisely.

It was not until the emergence of long-read sequencing technologies that a definitive solution was found. However, obtaining complete end-to-end T2T assemblies often required substantial sequencing and computational resources, which could be costly. Moreover, dealing with large-scale genomic data, including storage and analysis, presented significant computational challenges. The high cost of sequencing and the demands on computational resources remain barriers to the widespread application of T2T assembly in the current landscape.

Current focuses and perspectives of using nanopore sequencing technology

What projects using nanopore technology is GrandOmics currently involved in?

Currently, our primary focus lies in genome assembly, especially T2T assembly that involves ultra-long sequencing reads. Typically, we recommend that our clients generate 150x or greater coverage of BAC-Long data (BAC: bacterial artificial chromosome) to achieve optimal assembly results. In addition, we complement the assembly process with data from Oxford Nanopore’s end-to-end workflow for chromatin conformation capture, Pore-C. Our empirical findings demonstrate that Pore-C data has distinct advantages over Hi-C in resolving multi-way chromatin interactions.
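As a rough illustration of what a 150x recommendation implies, the sketch below converts a target depth into the total sequencing yield required. The function name and the 3.1 Gb genome size (an approximate human genome) are assumptions made for this example, not figures from GrandOmics.

```python
def required_yield_gb(genome_size_gb: float, depth: float) -> float:
    """Total sequenced bases (in Gb) needed to reach a mean depth of coverage.

    Mean depth is total bases divided by genome size, so the required
    yield is simply genome size multiplied by depth.
    """
    if genome_size_gb <= 0 or depth <= 0:
        raise ValueError("genome size and depth must both be positive")
    return genome_size_gb * depth


# Assumed example: a ~3.1 Gb genome sequenced to 150x mean depth.
yield_gb = required_yield_gb(3.1, 150)
print(f"Approximate yield needed: {yield_gb:.0f} Gb")
```

The same arithmetic applies to any species: a smaller plant genome at the same depth needs proportionally less data.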

Beyond T2T assembly, we also conduct research on full-length single-cell transcriptomes using the nanopore platform. We have mature solutions for both manual selection methods like SCAN-seq and seamless integration with automated sorting platforms, such as 10x Genomics. As for resequencing, we plan to carry it out primarily on the PromethION 2 device, paired with GrandBox, our in-house integrated system for bioinformatics analysis, annotation and reporting.

Are there key genomic regions or structures which are particularly challenging to assemble, and what insights can they provide?

Based on our current experience, regions rich in repetitive sequences pose the main challenges in genome assembly. These areas include centromeres, pericentromeric regions, subtelomeric regions, ribosomal DNA regions, and tandem repeat regions. Additionally, the Y chromosome and highly polymorphic loci like the major histocompatibility complex represent key difficulty zones in assembly. These regions have been demonstrated to offer valuable insights into genome architecture, evolution, and disease susceptibility.

Why did you decide to use nanopore technology for working on T2T assemblies, and how has it benefited you?

‘Oxford Nanopore technology stands out as the commercial platform with the longest read lengths available today. Its natural advantage is particularly evident when addressing ultra-long repetitive sequence regions.’

When combined with our proprietary BAC-Long sequencing approach, we effortlessly achieve ultra-long read lengths with an N50 of up to 150 kilobases. In addition to widespread applications in human T2T assembly, we have partnered with numerous researchers to achieve T2T genome assembly in various species, including maize, rice, Arabidopsis, potatoes, watermelon, and cowpeas. This collaborative effort has yielded high-quality T2T genome publications in prominent journals, such as Nature Genetics and Cell Research. Oxford Nanopore's ultra-long sequencing technology has played a pivotal role in facilitating T2T genome assemblies, especially excelling in the assembly of centromeres, telomeres, and repetitive regions.
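For readers unfamiliar with the N50 metric quoted above, the following is a minimal sketch of how a read-length N50 is commonly computed: it is the length L such that reads of length L or longer contain at least half of all sequenced bases. The function name and the toy read lengths are assumptions for illustration and are not GrandOmics data.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until
    # at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical read lengths in bases (illustrative only).
reads = [200_000, 150_000, 150_000, 50_000, 20_000, 10_000]
print(n50(reads))
```

Because the metric weights reads by the bases they carry, a handful of very long reads can raise the N50 considerably even when many shorter reads are present.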

Whilst working with nanopore technology at GrandOmics, are there any other highlights or successes that you wish to share about your work?

In 2019 we obtained the world's first P48 sequencer. I vividly remember the excitement that permeated our team at that time, as the arrival of the P48 finally addressed the long-standing issue of insufficient throughput. We worked collectively as a team and achieved an astounding feat: completing 96 third-generation, whole-genome sequencing projects within 96 hours, generating a total of 7.6 Tb of data.

We also collaborated with Professor Fuchou Tang to develop SCAN-seq, a third-generation, full-length transcriptome sequencing technology based on the nanopore platform. SCAN-seq has taken the field of single-cell research from ‘one gene, one phenotype’ to a new level of ‘one RNA isoform, one phenotype.’

Furthermore, our NextDenovo series of assembly software, developed from the outset using Oxford Nanopore data, has been downloaded over 11,000 times on GitHub. Compared to similar tools, it stands out for its lower error rates and faster speeds.

Looking to the future

Do you think T2T assemblies will become standard practice in the future, and what do you think the impact will be?

‘I wholeheartedly believe that T2T assembly will become a routine, and even a standard, technique in many critical fields.’

We have collaborated with the National Medical Products Administration to set standards for T2T genome assembly. In the field of genome assembly, T2T assembly has already replaced simpler third-generation assembly techniques, establishing a new benchmark.

Let's consider this scenario: when we can easily achieve T2T assembly for individuals, it means that every SNV and structural variation in an individual's genome will be revealed without any gaps or errors. Patients with well-defined family histories of genetic diseases will receive a clear diagnosis, as opposed to the current situation where the detection rate for genetic diseases is often less than 50%. However, achieving the desired technological outcomes with just one or two T2T assemblies is not feasible. When we attempt to directly link a trait to genome-level sequences or variations, comparing them to a single reference genome is often insufficient to determine whether a variation is genuinely linked to the trait or simply a population-level polymorphism.

We firmly believe that T2T pan-genomics is the ultimate tool for genomic research. In line with this belief, we have made substantial investments in research and development, developing a complete set of technical solutions.

What are you most excited about for the future of genomics?

Personally, I am hopeful for the true clinical implementation of long-read sequencing technology. Over the past decade that I've been focusing on long-read sequencing, this pioneering technology has made rapid advancements in scientific research, yielding numerous significant research outcomes. However, it has yet to make the decisive leap into clinical translation.

The Human Genome Project is finally complete, while initiatives like the Human Pangenome Project and Chinese Pangenome Project are officially underway. These developments mark the beginning of the integration of long-read sequencing into genomic medicine. I believe that in the next 3-5 years, long-read sequencing will achieve a ‘landmark in medical application’ in the diagnosis of rare diseases, and will subsequently expand into more complex and larger markets, such as cancer. At that time, we will be equipped with our cutting-edge application tools, ready to embrace the era of long-read genomic medicine.

Huang, Z. et al. Evolutionary analysis of a complete chicken genome. Proceedings of the National Academy of Sciences of the United States of America 120(8): e2216641120 (2023).

Hu, J. et al. An efficient error correction and accurate assembly tool for noisy long reads. bioRxiv (2023).

Hu, J. et al. NextPolish2: a repeat-aware polishing tool for genomes assembled using HiFi long reads. bioRxiv (2023).

Gao, Y. et al. A pangenome reference of 36 Chinese populations. Nature 619:112-121 (2023).

Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nature Genetics 55:1221-1231 (2023).

Peng, C. et al. Large-scale snake genome analyses provide insights into vertebrate development. Cell (2023).

Yang, C. et al. The complete and fully-phased diploid genome of a male Han Chinese. Cell Research 33:745-761 (2023).

Hou, X. et al. A near-complete assembly of an Arabidopsis thaliana genome. Molecular Plant 15(8): 1247-1250 (2022).

Shang, L. et al. A complete assembly of the rice Nipponbare reference genome. Molecular Plant 16(8): 1232-1236 (2023).

He, Y. et al. T2T-YAO: a telomere-to-telomere assembled diploid reference genome for Han Chinese. Genomics, Proteomics & Bioinformatics (2023).