Danny Miller – The 1000 Genomes Project Oxford Nanopore Sequencing Consortium: expanding our understanding of human genetic variation
Danny Miller (University of Washington, USA) opened his talk at the Nanopore Community Meeting Singapore by highlighting that, in contrast to other types of medical testing, in which multiple tests can be ordered at once to reach an answer quickly, clinical genetic testing for rare paediatric diseases currently involves ordering one test at a time. A family will often have to enter a diagnostic odyssey, going through stepwise tests such as microarray, repeat expansion testing, and finally exome or whole-genome short-read sequencing. This process can take years, and at the end of it fewer than half of families will receive a diagnosis. Danny asked: ‘why can’t we do this in a single step?’ Could long nanopore reads offer the potential to address these challenges?
‘The way that I think of long-read sequencing is that it simplifies the analysis of complex genetic regions’
Danny introduced how long nanopore reads can characterise complex regions of the genome. Where repetitive sequences are often intractable to traditional short-read sequencing technology, limiting analysis of repeat expansions, ‘long reads simplify it by simply, in many cases, just spanning the length of the repeat region itself’. He described the ‘information-rich’ nature of nanopore reads, in which epigenetic modifications can be analysed alongside genomic variants, and emphasised how ‘this is important because we know that if we use other technologies, like short-read sequencing, we miss a lot of things’. Short-read sequencing, Danny noted, misses ‘at least half’ of structural variants (SVs) across the genome.
To put this potential into context, Danny shared the currently used clinical testing pathway for Beckwith-Wiedemann syndrome. This begins with a test to assess methylation at three loci; if negative or uninformative, this is followed by further tests for different genomic variants. In contrast, with whole-genome nanopore sequencing of a clinical research sample, Danny demonstrated that it was possible to detect SVs, copy number variants (CNVs), and single nucleotide polymorphisms (SNPs) from the same dataset. He shared an example in which nanopore sequencing of a clinical research sample revealed potential uniparental isodisomy and a pathogenic SNP, where the current diagnostic pathway would require 4–5 separate tests.
Danny illustrated the many reasons for the currently low rates of diagnosis through several examples. Firstly, not all genotype-phenotype relationships are yet understood, and many genes are difficult to sequence with traditional sequencing technologies. He shared an example of a newborn who had respiratory failure at birth, but later recovered well. Short-read sequencing of the subject and their mother revealed a likely pathogenic 2 bp deletion in the gene HYDIN, which could suggest a recessive ciliopathy, but a second variant was not found. However, Danny noted that the 400 kb gene features a large segmental duplication which is known to be poorly covered by short-read sequencing, limiting the ability to find or rule out a variant in this region. As part of a study in collaboration with Oxford Nanopore, Danny and his team performed nanopore sequencing of a clinical research sample, using adaptive sampling (a bioinformatics-based, real-time targeted sequencing method) to access this region with long nanopore reads. The data suggested that a second variant was not present. Clinical testing of a cell sample later confirmed that the cells were healthy.
Secondly, many SVs are missed by currently used technologies. He shared an example of a subject with prolonged bleeding. A bleeding disorder panel identified a SNP in the gene FGA which was associated with the phenotype of a recessive disorder but, once again, only one variant could be found. Again using adaptive sampling to enrich the gene in a clinical research sample, they were able to characterise and phase the first variant — a maternally inherited stop codon — and reveal the second, previously hidden pathogenic variant: a 4 kb deletion covering the first exon of FGA, suggesting the presence of the recessive bleeding disorder.
Another challenge for currently used diagnostic tests is the characterisation of segmental duplications and, in particular, regions where they overlap. Giving an example of a family in which two siblings were affected by Fanconi anaemia, Danny described how multiple clinical tests spanning a long period revealed only a single variant in the gene FANCD2. Sequencing a clinical research sample from an affected subject with adaptive sampling to target this gene, the long nanopore reads characterised a 400 bp deletion in a region featuring two overlapping segmental duplications: ‘so you can imagine why it’s so hard for short reads to evaluate this’.
In a fourth example, Danny described a subject with a suspected glycogen storage disorder. Previously, a panel had found a single pathogenic variant in the gene AGL, but no second variant; next, SNP and exon-level arrays had come back negative. Whole-genome short-read sequencing of a clinical research sample had suggested a potential second variant — a translocation — but subsequent optical mapping again came back negative. A clinical research sample was then sequenced with long nanopore reads, and the data phased. This revealed that the first variant was maternally inherited, and characterised a likely second variant: a paternally-inherited Alu insertion, likely to disrupt splicing.
Looking back on these examples, Danny stressed that three of them featured variants whose frequency is truly unknown, due to the difficulty in reliably capturing them with traditional technology. This, he explained, is what makes the 1000 Genomes Project so important. Comprising ~3,200 samples, the collection has been used to evaluate patterns of healthy human variation, but ‘one thing that’s missing is long-read sequencing of this collection’. To capture variation across challenging genomic regions, Danny and his team selected 800 research samples from the project for nanopore sequencing. Sequencing to a minimum depth of 30x, the first 100 samples produced read length N50s of 40–80 kb; variants were then called using the CARD pipeline [1]. Crucially, he described how ‘as you add individuals, you get more and more novel structural variants’, with more novel SVs observed in research samples from individuals of African ancestry, indicating their underrepresentation in previous datasets. The team are now developing applications to help filter and prioritise these variants.
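For readers unfamiliar with the read length N50 quoted above, the following is a minimal sketch of how that statistic is conventionally computed. The function name and the example read lengths are illustrative assumptions and do not come from the consortium's data or pipeline.

```python
def read_length_n50(lengths):
    """Return the N50 of a set of read lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("at least one read length is required")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length


# Hypothetical read lengths in bases, chosen only for illustration.
example_reads = [80_000, 60_000, 40_000, 20_000, 10_000, 5_000]
print(read_length_n50(example_reads))
```

Because the statistic is weighted by bases rather than by read count, a handful of very long reads can raise the N50 even when most reads are short.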
Concluding his presentation, Danny expressed his belief that long nanopore reads have the potential to ‘change our approach to clinical genetic testing within five years’, as ‘a single test that will replace our current stepwise approach’. He emphasised the potential future utility of the technology to identify variants from haplotype-resolved assemblies that could increase diagnostic rate and reduce turnaround times around the world, including in low-resource settings where such technology is currently inaccessible.
Ahmad Abou Tayoun – Nanopore sequencing as a potential diagnostic tool for genetic diseases in the Middle East
Ahmad Abou Tayoun (Al Jalila Children’s Specialty Hospital, United Arab Emirates) began his talk at London Calling 2023 by describing how ‘rare diseases are individually rare but collectively common’: with a cumulative prevalence of 6%, these conditions affect around 400 million individuals globally. However, rare diseases can be very challenging to diagnose; patients and their families often go through a diagnostic odyssey spanning five years or more, during which time they may undergo recurrent hospitalisation and many tests. Due to a lack of genomic diagnostic facilities, Ahmad suggested that diagnostic odysseys may be expected to be longer in the Middle East than elsewhere.
Ahmad introduced Al Jalila Children’s Specialty Hospital, which houses a multi-disciplinary paediatric centre and the facility for in-house clinical genomic testing through whole-exome and whole-genome short-read sequencing, targeted panels, and microarrays. Though the use of sequencing-based methods has increased the diagnostic rate up to 40%, from 30% without these methods, this still leaves the majority of families without an answer. Ahmad also noted that if a patient receives a negative result for a chromosomal microarray (which is the case 80% of the time), they are unlikely to be able to subsequently access an exome sequencing test, and vice versa. It is these challenges that motivate Ahmad and his team to investigate the potential to ‘establish a comprehensive, accurate, single assay that can capture SVs, SNVs, CNVs, and indels’.
To study the potential of long nanopore reads to capture all this information from a single dataset, Ahmad and his colleagues conducted a pilot project, in collaboration with Oxford Nanopore, to sequence 50 clinical research samples. Ten of these had known pathogenic variants, while the rest had received negative results — ‘we expect something missed by short-read sequencing’. Ahmad emphasised that the data obtained demonstrated the potential of nanopore sequencing to ‘detect all types of different variants that have been otherwise detected by different validated clinical assays’ — covering SVs, SNVs, trisomy, uniparental disomy, aneuploidies, and methylation disorders.
Ahmad shared several examples from the pilot project. The first was that of a clinical research sample from a subject with global developmental delay. Long nanopore reads revealed the presence of an inverted deletion/duplication on chromosome 8, which ‘would have clearly been missed by whole-exome sequencing’. Chromosomal array analysis of this research sample later validated this finding. In a second example, the team sequenced a clinical research sample from a subject expected to have Angelman syndrome, an imprinting disorder caused by aberrant methylation on chromosome 15. Through PCR-free nanopore sequencing of native DNA, it was possible to directly study methylation at this locus, revealing a lack of methylation in both alleles consistent with Angelman syndrome.
In another example, a clinical research sample was sequenced from a subject with the eye disorder anterior segment dysgenesis. The diagnostic odyssey for this subject had featured a negative result for targeted sequencing of the gene PAX6. Whole-exome trio sequencing had identified a frameshift heterozygous variant in a gene associated with autosomal recessive eye disease — but as only one variant could be found, the test remained inconclusive. Sequencing a research sample with long nanopore reads enabled the characterisation of a previously hidden 79 kb deletion in this gene, in trans with the frameshift mutation. Primers were generated across the breakpoints, and this mutation was confirmed via PCR. Phased analysis demonstrated compound heterozygosity, indicating that these variants were pathogenic.
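The phasing logic behind a compound heterozygosity call can be sketched as follows. This is a simplified illustration rather than the team's analysis: the `PhasedVariant` type, the `phase_relationship` function, the placeholder gene name `GENE_X`, and the phase block label are all hypothetical. It assumes each variant has already been assigned to haplotype 1 or 2 within a phase block by an upstream phasing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhasedVariant:
    name: str
    gene: str
    haplotype: int    # 1 or 2, as assigned by an upstream phasing step
    phase_block: str  # haplotype labels are only comparable within a block


def phase_relationship(a, b):
    """Classify two variants as 'cis', 'trans', or 'unknown'."""
    if a.gene != b.gene or a.phase_block != b.phase_block:
        # Different genes or different phase blocks: labels are not comparable.
        return "unknown"
    return "cis" if a.haplotype == b.haplotype else "trans"


# Placeholder values for illustration only.
frameshift = PhasedVariant("frameshift", "GENE_X", haplotype=1, phase_block="block_1")
deletion = PhasedVariant("79 kb deletion", "GENE_X", haplotype=2, phase_block="block_1")

# Two damaging variants in trans leave no intact copy of the gene,
# which is the pattern expected for a recessive condition.
print(phase_relationship(frameshift, deletion))
```

Long reads help here because a single phase block is more likely to span both variants, so the "unknown" case arises less often than with short reads.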
Ahmad then introduced another study, in collaboration with Oxford Nanopore and Asuragen, focusing on spinal muscular atrophy (SMA). The most common form of SMA, accounting for ~95% of cases, results from the biallelic deletion of exon 7 in the gene SMN1, while ~5% result from SNVs. Furthermore, ~5–30% are healthy carriers who carry two SMN1 copies in cis. He explained that this condition was selected as a prototype for rare diseases as it is the second most common recessive fatal disorder in several populations, the current diagnosis rate at the hospital is >30%, and it is one of the first rare disorders for which life-saving gene therapy is available. Effective screening for this disease would require a ‘comprehensive, accurate, cost-effective’ method of characterising variants in the gene SMN1 and its paralog SMN2; however, the high level of identity between the two makes this difficult using traditional short-read technology, and the locus features variable copy numbers due to frequent unequal crossovers.
‘long-read sequencing…has great potential of becoming a powerful tool for rare disorders in general’
The team developed two amplicon-based approaches to enrich and sequence exons 1–8 of SMN1/2 from clinical research samples using long nanopore reads. One method uses the ratio of a C/T variant that differs between the gene and paralog to call SNVs, while the other uses depth of coverage and paralog-specific variants to study copy numbers. Sequencing clinical research samples with these methods and comparing the data to previously conducted clinically validated tests, they found 100% concordance for SMN1 copy number variants and 97% for SMN2. The team expect to reach 100% concordance for the latter with bioinformatics optimisation. The data also allowed the identification of silent carrier variants; Ahmad shared an example in which nanopore sequencing of research samples from a family affected by SMA demonstrated the capacity to characterise the molecular basis of the inheritance. Where the currently used clinically validated test would have indicated a low risk of a child inheriting the disease in this pedigree, the nanopore sequencing data would have revealed the hidden silent carrier configuration of two copies of SMN1 in cis, indicating a 25% chance of inheriting the disease. Next, they plan to test the potential of the method to detect pathogenic variants.
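The silent carrier reasoning can be made concrete with a small enumeration. This is a minimal sketch under simplifying assumptions: each parent is described only by the number of SMN1 copies on each chromosome, a child is treated as affected when they inherit zero SMN1 copies, and pathogenic SNVs and de novo events are ignored. The function name and genotype tuples are illustrative, not part of the assay described in the talk.

```python
from fractions import Fraction
from itertools import product


def risk_of_zero_copies(parent_a, parent_b):
    """Probability that a child inherits no SMN1 copies.

    Each parent is a pair of haplotypes, written as the number of SMN1
    copies on each chromosome. Each haplotype is passed on with
    probability 1/2.
    """
    outcomes = list(product(parent_a, parent_b))
    affected = sum(1 for a, b in outcomes if a + b == 0)
    return Fraction(affected, len(outcomes))


silent_carrier = (2, 0)   # two copies in cis, none on the other chromosome
typical_carrier = (1, 0)  # one copy in total
non_carrier = (1, 1)      # one copy on each chromosome

# A dosage-only test sees two SMN1 copies in both of these genotypes.
print(sum(silent_carrier), sum(non_carrier))

print(risk_of_zero_copies(silent_carrier, typical_carrier))  # 1/4
print(risk_of_zero_copies(non_carrier, typical_carrier))     # 0
```

The two genotypes with identical total copy number give different risks, which is why resolving the arrangement of copies on each chromosome matters for this pedigree.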
Concluding his presentation, Ahmad highlighted that long nanopore sequencing reads have ‘great potential of becoming a powerful tool for rare disorders’ and that targeted sequencing of challenging loci demonstrates ‘high accuracy combined with fast results, efficient pooling of samples, and reduced costs’.
1. Kolmogorov, M. et al. Nat. Methods 20:1483–1492 (2023). DOI: https://doi.org/10.1038/s41592-023-01993-x