Tim Mercer: Improving the precision of nanopore sequencing with a synthetic human genome


The complexity of human DNA sequences, together with the occurrence of technical errors and artifacts, confounds the analysis of the human genome. However, these errors can be understood and mitigated with the use of reference standards. We have developed internal synthetic RNA and DNA controls, termed sequins (sequencing spike-ins), that can be used to understand and improve nanopore sequencing. Sequins mirror human genomic, transcript and microbial sequences of interest. Due to the chiral properties of DNA, sequins retain the nucleotide content and repetitiveness of the original DNA sequence, and recapitulate many of the same errors and biases during nanopore sequencing. Because their sequence is synthetic, sequins can be added directly to an RNA/DNA sample prior to sequencing, and analyzed as internal qualitative and quantitative controls in the output library. We have built sequins that represent hundreds of clinically important features of the human genome, including genes and mutations associated with cancer and inherited disease, diverse structural variants, mitochondria and immune receptors. We use these sequins during nanopore sequencing of the human genome, where they provide an ideal internal ground-truth set by which we can evaluate the diagnosis of germline and somatic mutations, resolve complex structural variants (such as oncogenic translocations and papillomavirus insertions), and perform rapid HLA typing. We have also developed a set of sequins that comprise a synthetic transcriptome of hundreds of spliced human gene isoforms. When added to RNA samples, these sequins can help measure gene expression, resolve spliced isoforms, and assess the diagnosis of fusion genes in cancer samples. Finally, we have built sequins that represent a synthetic community of microbial genomes that can assess pathogen detection, resolve strains and variants, and enable improved normalization between multiple samples.
In each of the above applications, we show how sequins can both measure and mitigate technical errors that occur during nanopore sequencing. By comparison to sequins, we can minimize the impact of base-calling errors, and thereby improve the resolution of difficult and refractory sequences, and ultimately improve diagnostic power and yield. Together, these studies show how reference RNA and DNA standards provide a simple yet effective approach to improving the standardization, accuracy and performance of nanopore sequencing.