Telo-Seq - info sheet and recommendations

The Telomere Sequencing (Telo-Seq) end-to-end protocol is currently available to MinION and GridION users via registration, for research use only. In this first release the workflow is not fully supported, but the current version of the protocol is available using this link.


Introduction

Telomeres in health and disease

Telomeres are stretches of repetitive eukaryotic DNA found at the end of each linear chromosome (Schmidt et al. 2024). In humans they consist of repetitive motifs of n(TTAGGG) ending in a single-stranded 3’ G-rich overhang (see Figure 1) (Smoom et al. 2023).

Figure 1. The telomeric 3' overhang. In this example the overhang starts with 'TTAGGG'.


Telomere attrition occurs during genome replication because chromosome ends cannot be fully replicated end-to-end. Telomeres provide padding: they do not contain genes, do not require complete replication, and may shorten without impacting gene expression. However, this shortening can only occur a finite number of times before the telomere becomes too short for replication. This point is known as ‘the Hayflick limit’ and results in cellular senescence (Lulkiewicz et al. 2020). Telomere length may therefore be correlated with cellular aging. Many oncogenic cells avoid senescence by activating telomerase or alternative mechanisms to elongate telomeres. Telomere characterisation through sequencing can improve understanding of their role in health and disease (Schmidt et al. 2024).

In humans there are 22 pairs of autosomal chromosomes, along with a pair of sex chromosomes (XX or XY), making up 23 chromosome pairs. Both maternal and paternal chromosomes have telomeres on the P and Q arms (see Figure 2), for a total of 92 individual telomere arms (23 pairs × 2 homologues × 2 arms).

Figure 2. Inheritance of parental chromosomes and their contribution to individual telomere arms.


Telo-Seq overview

Telo-Seq aims to measure telomere length accurately and assign each telomere to a chromosome arm. Telo-Seq uses the telomeric 3’ overhang to ligate custom ‘Telo-Adapters’ onto the end of each chromosome arm (see Figure 3). The DNA is then subjected to restriction digestion and subsequent 3’ dA-tailing. The restriction enzyme digests most of the chromosome, leaving the telomere and sub-telomere intact. A complementary splint is then annealed to the Telo-Adapter to create a cohesive end compatible with the sequencing adapter, which is subsequently ligated.

Figure 3. Overview of the Telo-Seq library preparation.


Telo-Seq is currently in registration-based early access. Please register here to gain access to the Telo-Seq protocol and analysis pipeline.


Telomeric enrichment

To demonstrate the extent of telomeric enrichment that is possible through Telo-Seq, Table 1 shows the reported telomeric read outputs from a standard sequencing run with the Ligation Sequencing Kit V14 (SQK-LSK114) compared to Telo-Seq runs (with 1–15 μg input) from the same high molecular weight (HMW) genomic DNA (gDNA) sample. Without telomeric enrichment, there are very few telomeric reads in a standard sequencing run.

Table 1. Telo-Seq telomeric read enrichment when compared to a conventional SQK-LSK114 library prep. Aggregate statistics of multiple MinION flow cells run for 48 hours. At all inputs tested, Telo-Seq demonstrates a significant increase in telomeric reads compared to standard approaches using the Ligation Sequencing Kit V14 (SQK-LSK114).


The Telo-Seq protocol recommends an input of 15 µg of HMW gDNA. As Figure 4 illustrates, starting with a higher mass of DNA is beneficial for Telo-Seq performance. Telo-Seq can be used to provide sequencing data for telomere length estimation and specific chromosome arm mapping (see Analysis section). Global estimation of telomere length may be achieved with 300–500 telomeric reads, whereas specific chromosomal arm telomere length estimation requires >1,000 reads to achieve >10X coverage per arm (see Length estimation section). Therefore, it is important to ensure that the input mass for the Telo-Seq protocol has been considered for the intended experimental objective.

Figure 4. An input titration of starting HMW DNA demonstrating that Telo-Seq performance improves with increased input mass.


Length estimation

The Telo-Seq pipeline may be used to determine telomere lengths of each individual chromosomal arm, or in the absence of a genomic reference, report the global telomere length of the sample.

Where specific chromosomal arm coverage is required, we recommend a minimum of 10X coverage per telomere arm. 10X coverage yields an accurate median telomere length measurement, with increased coverage yielding increased precision (see Figure 5).

Spread across 92 telomere arms, 10X coverage may be achieved with 1 k telomeric reads when a genome reference is used for mapping. Coverage may be uneven across chromosomal arms, so 1 k telomeric reads is the minimum recommendation for this application and higher coverage is preferable. With fewer than 1 k telomeric reads, the coverage of individual chromosomal arms starts to decrease, which may impact the accuracy of the telomere length measurement.
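The arithmetic behind the 1 k read recommendation can be sketched as follows. This is a minimal illustration, not part of the Telo-Seq pipeline: the function name is hypothetical, and it assumes every telomeric read maps and that reads spread evenly across arms, which the text notes is not the case in practice.

```python
# Arm count from the text: 23 chromosome pairs, two homologues per pair,
# two arms (P and Q) per chromosome.
CHROMOSOME_PAIRS = 23
HOMOLOGUES_PER_PAIR = 2
ARMS_PER_CHROMOSOME = 2
TELOMERE_ARMS = CHROMOSOME_PAIRS * HOMOLOGUES_PER_PAIR * ARMS_PER_CHROMOSOME


def mean_arm_coverage(telomeric_reads: int, arms: int = TELOMERE_ARMS) -> float:
    """Average reads per arm, assuming all reads map and spread evenly (an idealisation)."""
    if arms <= 0:
        raise ValueError("arms must be positive")
    return telomeric_reads / arms


# Read counts taken from the examples in this document.
for reads in (400, 1_000, 20_000):
    print(f"{reads:>6} telomeric reads: {mean_arm_coverage(reads):.1f}X mean per arm")
```

Under these assumptions 1 k reads gives a mean of roughly 10.9X per arm, only just above the 10X target, which is why uneven coverage makes additional reads advisable.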

Where global telomere measurement is required, 300–500 telomeric reads are sufficient for a representative measurement without the need for a genome reference. However, it is important to note that global telomeric length measurement will not represent all the chromosomal arms equally. This is demonstrated in Figure 5, where global mean and median telomere length diverge with decreased sample depth.

Figure 5. Violin plots of telomere length distributions for down-sampled read sets. Blue violins indicate the distribution of reads plotted against length. Median telomere length is plotted in orange and mean telomere length in grey. Data sets are plotted against single arm alignment on the left and global arm alignments on the right. Top) non-down-sampled 20 k telomeric reads aligned to each chromosomal arm at maximum coverage. Middle) 1 k down-sampled telomeric reads represent each chromosomal arm in a similar trend compared to the non-down-sampled data, with >10X coverage of each arm. Bottom) 400 down-sampled telomeric reads, where read coverage per arm is <10X and a global telomere length measurement is therefore more appropriate.
*In this example two chromosome arms are identical and therefore 91 chromosome arms are reported (Chr13_paternal_P = Chr13_paternal_P and Chr22_PATERNAL_P). Chr13_paternal_P telomere length shows a large distribution because the two arms have distinct telomere lengths but cannot be distinguished by sequence.


Sample input considerations

Fragment distribution

Optimal Telo-Seq performance is observed when most of the DNA fragments in the sample are longer than 10 kb. This is due to the inherent length of the telomere and sub-telomere. Sequencing long fragments allows the capture of chromosomal context for arm placement and alignment. To successfully align telomeric reads to a genomic reference, reads must contain sequence homology to the non-telomeric chromosomal sequence. For this reason, we recommend DNA inputs for Telo-Seq should not contain fragments shorter than 8 kb as shorter fragments will be less likely to map to chromosome arms, which may result in poor coverage of some chromosomes.

Several DNA extraction methods have been tested at Oxford Nanopore Technologies. Optimal fragment distributions for Telo-Seq performance have been observed in the following extraction methods:


Extraction methods shown to provide less appropriate DNA extractions are the QIAGEN DNeasy and QIAGEN Genomic-tip methods; use of these kits is not recommended for Telo-Seq.

If you do not have access to a means of assessing HMW gDNA fragment distributions, such as pulsed-field gel electrophoresis or an Agilent Femtopulse, consider performing an SQK-LSK114 library preparation using 1 μg of your HMW gDNA extract to assess the fragment distribution of the sample. This library may be sequenced on Flongle. Typically, samples that yield an LSK114 read N50 of >15 kb are appropriate for Telo-Seq.

If the sample you plan to use for Telo-Seq has a large percentage of fragments below 10 kb (or a read N50 <15 kb as determined by LSK114 sequencing), consider using the Short Fragment Eliminator kit (EXP-SFE001). EXP-SFE001 size selects HMW gDNA by depleting short fragments (<10 kb). Its use on samples containing a high percentage of fragments below 10 kb has been shown to be beneficial for Telo-Seq performance (see Figure 6).

Figure 6. Telo-Seq performance of samples with sub-optimal fragment distributions before and after size selection using the EXP-SFE001 kit. Depletion of fragments <10 kb as measured by Agilent Femtopulse has a positive impact on Telo-Seq performance.
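The read N50 and short-fragment checks described in this section can be approximated from a list of read lengths, for example those from an LSK114 test run. The sketch below is illustrative only: the function names are hypothetical, and the short-fragment fraction is computed by read count, which is an assumption (the text does not say whether the percentage is by count or by mass).

```python
def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    total_bases = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total_bases:
            return length


def fraction_shorter_than(lengths, threshold_bp=10_000):
    """Fraction of reads (by count, an assumption) shorter than threshold_bp."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    return sum(1 for length in lengths if length < threshold_bp) / len(lengths)


def meets_n50_guideline(lengths, min_n50_bp=15_000):
    """Apply the read N50 > 15 kb guideline quoted above."""
    return read_n50(lengths) > min_n50_bp


# Toy read lengths in bp, purely for illustration.
example = [2_000, 4_000, 12_000, 20_000, 30_000]
print(read_n50(example), fraction_shorter_than(example), meets_n50_guideline(example))
```

A sample can pass the N50 guideline while still holding many short reads, as in the toy data here, so the two figures are worth checking together before deciding on size selection.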


Sample origin

Telo-Seq development and validation at Oxford Nanopore Technologies has been performed primarily using HMW gDNA extracted from GM24385 cell culture, where the telomere and sub-telomere are an average of 8 kb long. In principle, Telo-Seq should be compatible with any DNA sample containing the repetitive telomeric n(TTAGGG) motif, although some organisms may have much longer telomeres or sub-telomeres, which could impact chromosomal mapping. It is important to consider the cut site positions of the restriction enzyme. EcoRV is used in the protocol because most of the human chromosome cut site positions are 2–10 kb upstream from the telomeric 3’ overhang. We recommend carrying out an in silico digestion of the reference genome to determine theoretical cut sites and to check whether there is any cleavage within the telomere or sub-telomere.
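An in silico digestion of this kind can be sketched in a few lines. The example below is a minimal illustration rather than the tooling used during Telo-Seq development. It relies on the EcoRV recognition site GATATC, which is cut bluntly between GAT and ATC and is palindromic, so scanning one strand is sufficient. The function names and the `telomere_at_end` flag are hypothetical, and the sequence is assumed to be a single chromosome arm held as a plain string.

```python
ECORV_SITE = "GATATC"
ECORV_CUT_OFFSET = 3  # blunt cut between GAT and ATC


def ecorv_cut_positions(sequence):
    """Return 0-based cut coordinates for every EcoRV site in the sequence."""
    seq = sequence.upper()
    positions = []
    start = seq.find(ECORV_SITE)
    while start != -1:
        positions.append(start + ECORV_CUT_OFFSET)
        start = seq.find(ECORV_SITE, start + 1)
    return positions


def terminal_fragment_length(sequence, telomere_at_end=True):
    """Length of the fragment left attached to the telomere after digestion.

    telomere_at_end=True assumes the telomere sits at the end of the given
    sequence; pass False when it sits at the start. Returns None when the
    sequence contains no EcoRV site.
    """
    cuts = ecorv_cut_positions(sequence)
    if not cuts:
        return None
    if telomere_at_end:
        return len(sequence) - cuts[-1]
    return cuts[0]


# Toy arm: a short sub-telomere containing one EcoRV site, then three repeats.
toy_arm = "AAGATATCCC" + "TTAGGG" * 3
print(ecorv_cut_positions(toy_arm), terminal_fragment_length(toy_arm))
```

Comparing the terminal fragment length for each arm against the expected telomere plus sub-telomere length shows whether the enzyme is likely to cut inside the region that needs to stay intact.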


Pre-hybridisation of the Telo-Adapters and Telo-Splint

Telomeres with the repetitive telomeric n(TTAGGG) motif may present a 3’ overhang in one of six different frames (see Figure 7). In humans there is evidence that there is a dominant frame of GGTTAG (Smoom et al. 2023).

Figure 7. The different frames of the n(TTAGGG) telomere 3' overhang. The first 7 bases of the 3’ overhang are highlighted in blue; this is where the complementary Telo-Adapter ligates during the Telo-Adapter ligation.
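The six frames are simply the six rotations of the repeat unit, which can be enumerated as below. This is an illustrative sketch with a hypothetical function name; it does not model which end of the overhang defines the frame, nor the Telo-Adapter sequences themselves.

```python
TELOMERE_REPEAT = "TTAGGG"


def overhang_frames(repeat=TELOMERE_REPEAT):
    """Return every rotation of the repeat unit, one per possible frame."""
    return [repeat[i:] + repeat[:i] for i in range(len(repeat))]


for frame in overhang_frames():
    print(frame)
```

The frame GGTTAG reported as dominant in humans appears among the six rotations.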


To account for this, Telo-Seq uses a mix of six Telo-Adapters that make up the ‘Telo-mix’. Each of the six adapters represents a different frame so that all possible telomeric overhangs may be adapted. Figure 8 shows how pre-hybridisation of the Telo-Splint to the Telo-Adapters improves Telo-Seq performance. The protocol also includes a downstream splint annealing step which follows the Telo-Adapter ligation to maximise splinting. The inclusion of this splinting step helps improve performance consistency.

Figure 8. Pre-hybridisation of the Telo-Adapter with the Telo-Splint increases the enrichment of Telo-Seq.


Sequencing set-up and run parameters

We recommend the following parameters in MinKNOW:

  • Flow cell type: R10.4.1.
  • The latest release of MinKNOW (23.07.12 or newer).
  • Guppy (7.1.4) or Dorado (0.4.3 or newer).
  • Kit Selection: The standard Ligation sequencing kit script (SQK-LSK114).
  • Run options: Runtime limit of 48 hours, default minimum read length of 200 bp.
  • Analysis: HAC or ideally SUP basecalling (see the SUP vs HAC basecalling section of this document for more information).
  • Output: POD5 and FASTQ or BAM, default qscore threshold.

Example sequencing performance

Telo-Seq development and validation at Oxford Nanopore Technologies has been performed using primarily HMW gDNA extracted from GM24385 cell culture. Therefore, these expected outputs are based on the performance of Telo-Seq with this sample. A different output may be expected for alternative samples.

Table 2. Representative outputs of Telo-Seq performed using HMW gDNA extracted from GM24385 cell culture.


Figure 9. A representative read length distribution for Telo-Seq.


Figure 10. The total Gb sequenced increases over time; output plateaus at 48 hours of sequencing.


Figure 11. Qscore distribution over 48 hours of sequencing.


Figure 12. The activity of the pores over 48 hours of sequencing. It is expected that a proportion of pores will remain ‘Open’ for the duration of the run.


Figure 13. The health of the flow cell deteriorates more rapidly than with non-Telo-Seq experiments.


Figure 14. The translocation speed and flow cell temperature over 48 hours of sequencing.


Flow cell washing and reloading

If only a global estimation of telomere length is required, it may not be necessary to run a sequencing experiment for as long as 48 hours. For example: 300–500 telomeric reads are required for global length; with 3% reads on target, 10–17 k raw reads would be required. However, as the percentage of reads on target cannot be ascertained during sequencing, we recommend gathering an excess of reads to mitigate low telomeric outputs (see Table 3). Once sufficient raw reads have been accumulated, the experiment may be stopped, and the flow cell can be washed for later use using this method.

Table 3. Example outputs of nuclease-flushed flow cells. Two 5 μg gDNA input Telo-Seq libraries were prepared. The first library was loaded onto the MinION flow cell and sequenced for 2 hours to collect sufficient reads for a global telomere length estimation. After stopping the run, the flow cell was nuclease flushed and re-primed. The second library was loaded onto the MinION flow cell and sequenced for 2 hours to collect sufficient reads for a global telomere length estimation.


If individual chromosome arm telomere length estimation is required, >1 k telomeric reads are required. For example, with 3% reads on target, 34 k raw reads would be required for >1 k telomeric reads. A higher precision median telomere length measurement may be achieved with higher coverage; therefore, it is strongly recommended that an excess of reads is gathered. Note that flow cell flushing becomes less effective the longer the flow cell is run. After 48 hours the flow cell will be exhausted and flow cell flushing for re-loading is not advised beyond this point.
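The raw read targets quoted in this section follow from dividing the required telomeric reads by the on-target percentage. A minimal sketch, with a hypothetical function name and the 3% on-target figure taken from the examples above (the true figure is not known during sequencing):

```python
import math


def raw_reads_required(target_telomeric_reads, on_target_percent):
    """Raw reads needed to expect the target number of telomeric reads."""
    if not 0 < on_target_percent <= 100:
        raise ValueError("on_target_percent must be in the range (0, 100]")
    return math.ceil(target_telomeric_reads * 100 / on_target_percent)


# Targets from the text: 300-500 reads for global length, 1 k for per-arm length.
for target in (300, 500, 1_000):
    print(f"{target:>5} telomeric reads at 3% on target: {raw_reads_required(target, 3)} raw reads")
```

At 3% on target this gives 10,000, 16,667 and 33,334 raw reads, in line with the 10–17 k and 34 k figures in the text. Because the on-target percentage varies between samples, these are lower bounds and an excess of reads remains advisable.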


Analysis

Pass vs failed reads

When using the Guppy basecaller, sequencing reads are binned according to their qscore. Reads with a qscore below 9 are filtered into a fastq_fail output folder. Due to the inherent nature of telomeric reads, some may have qscores below this threshold. Therefore, to ensure all sequenced telomeric reads are considered in the analysis, we suggest concatenating all fastq_pass and fastq_fail files together. If basecalling all the POD5 files with Dorado on the command line, no qscore threshold is applied unless one is set through an optional parameter.
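One way to combine the pass and fail reads is sketched below. It is an illustration under stated assumptions rather than a required step of the pipeline: the function name is hypothetical, the run folder is assumed to contain fastq_pass and fastq_fail sub-folders, and the files are assumed to be gzip-compressed with a .fastq.gz suffix. Gzip files can be joined byte for byte because a concatenation of gzip members is itself a valid gzip file.

```python
import shutil
from pathlib import Path


def concatenate_fastq(run_dir, output_path, subdirs=("fastq_pass", "fastq_fail")):
    """Join every .fastq.gz under the given sub-folders into one gzip file.

    Returns the number of input files that were concatenated.
    """
    run_dir = Path(run_dir)
    inputs = []
    for subdir in subdirs:
        # rglob also picks up files nested in per-barcode folders, if present.
        inputs.extend(sorted((run_dir / subdir).rglob("*.fastq.gz")))
    with open(output_path, "wb") as combined:
        for path in inputs:
            with open(path, "rb") as source:
                shutil.copyfileobj(source, combined)
    return len(inputs)
```

For example, `concatenate_fastq("my_run", "my_run/all_reads.fastq.gz")` would write a single file holding both pass and fail reads, where `my_run` is a placeholder for the run output folder.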

SUP vs HAC basecalling

While the telomere itself is a highly repetitive polymer of n(TTAGGG), it contains minor variations within the repeating sequence. For this reason, we recommend using the SUP basecaller model for the greatest sequencing accuracy. Figure 15 demonstrates the gains in on-target telomeric reads that may be achieved through SUP basecalling vs HAC basecalling.

Figure 15. A comparison of SUP vs HAC basecalling. SUP basecalling may yield a higher percentage of telomeric reads than HAC basecalling.


Analysis pipeline

Telo-Seq is currently in registration-based early access. Please register here to gain access to the Telo-Seq protocol and pipeline. Following registration, you will be provided with a link to the pipeline repository.

There are two pathways to utilise when analysing Telo-Seq data, based on the desired output:

  • Pathway 1: Sample-only telomeric read counts and telomere length (‘Raw Reads’ - Unmapped). This takes ~5 minutes to run.
  • Pathway 2: Sample-specific chromosome arm telomeric read counts and telomere length. This takes ~1–3 hours to run (with 16 and 8 threads respectively). This pathway includes the pathway 1 results.

With pathway 2 there are three different filtering conditions reported in addition to sample only: ‘No Filter’, ‘Lenient’ and ‘Strict’. These filtering conditions are designed to support different user requirements when mapping to a reference. For instance, maternal and paternal reference arms may differ by as little as one variant, so to reduce mis-mapping in high coverage samples, removal of reads that are not full length is recommended (strict). However, if coverage is low and/or the sample is fragmented, a small number of mis-mapped reads may be tolerated for the gain in coverage from reads with shorter sub-telomere length (lenient).

To minimise mis-mapping of a sample, the ‘strict’ filter is recommended. This filter uses only full-length reads that extend to the enzyme cut site, ensuring the reads span the sub-telomere. Removing fragmented reads reduces mis-mapping and noise and provides higher accuracy in chromosome arm length estimation. However, because of its strict nature, this filter also removes some fragmented yet potentially useful telomeric reads.

If output is limited, it is advisable to use the ‘lenient’ filter. This filter uses reads that contain a complete telomere and at least 80 bp of the sub-telomere for the chromosome arm length estimation.

The ‘No filters’ setting will use all reads with a mapping quality (MAPQ) score above the default of 10 for the chromosome arm length estimation; this threshold can be modified by the user.

  • Raw reads = unmapped telomere reads
  • Mapped (no filters) = no additional filters
  • Mapped (lenient) = keep reads where the end mapping position is at least 80 bases beyond last telomere motif.
  • Mapped (strict) = keep reads where the start mapping position is before last telomere motif identification and end mapping position is within 25 bases of cut site (with exception of cut sites beyond 45 kb).
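The lenient and strict rules listed above can be expressed as simple position checks. The sketch below is not the pipeline's implementation: the `MappedRead` record, its field names and the coordinate convention (distance in bp from the chromosome end, increasing towards the cut site) are all assumptions made for illustration. The exception for cut sites beyond 45 kb and the mapping quality threshold are deliberately not modelled.

```python
from dataclasses import dataclass

LENIENT_MIN_BEYOND_MOTIF_BP = 80  # from the 'lenient' definition above
STRICT_CUT_SITE_WINDOW_BP = 25    # from the 'strict' definition above


@dataclass
class MappedRead:
    """Hypothetical record; positions are distances (bp) from the chromosome end."""
    map_start: int            # where the alignment begins, nearest the telomere tip
    map_end: int              # where the alignment ends, towards the cut site
    last_telomere_motif: int  # position of the last identified telomere motif
    cut_site: int             # position of the restriction enzyme cut site


def passes_lenient(read):
    """Alignment ends at least 80 bases beyond the last telomere motif."""
    return read.map_end >= read.last_telomere_motif + LENIENT_MIN_BEYOND_MOTIF_BP


def passes_strict(read):
    """Alignment starts before the last motif and ends within 25 bases of the cut site."""
    starts_in_telomere = read.map_start < read.last_telomere_motif
    ends_at_cut_site = abs(read.map_end - read.cut_site) <= STRICT_CUT_SITE_WINDOW_BP
    return starts_in_telomere and ends_at_cut_site


# Toy reads: one full length, one truncated in the sub-telomere.
full_length = MappedRead(map_start=0, map_end=5_010, last_telomere_motif=3_000, cut_site=5_000)
truncated = MappedRead(map_start=0, map_end=3_500, last_telomere_motif=3_000, cut_site=5_000)
print(passes_strict(full_length), passes_lenient(truncated), passes_strict(truncated))
```

In this toy case the truncated read is kept by the lenient filter but dropped by the strict filter, which is the trade-off between coverage and mis-mapping described above.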

Pipeline output

Replicate Telo-Seq runs performed using HMW gDNA extracted from GM24385 cell culture have shown consistent coverage (see Figure 16) and telomere length distribution (see Figure 17) across the maternal and paternal chromosome arms when using a matching reference (HG002 v1.0 T2T). In this example two chromosome arms are identical, which is why 91 chromosome arms are used (Chr13_paternal_P = Chr13_paternal_P and Chr22_PATERNAL_P). Chr13_paternal_P telomere length shows a large distribution because the two arms have distinct telomere lengths but cannot be distinguished by sequence. Samples that do not match the reference will experience mis-mapping, so for chromosome arm analysis it is recommended to generate a sample-specific reference.

Figure 16. Chromosome arm (single haplotype) coverage.
For chr22, the coverage of one haplotype arm is included in chr13 due to sequence identity. The chr21 maternal and paternal P arms have highly repetitive sub-telomeres and cut sites ~65–150 kb upstream of the telomeric overhang, a fragment length that is rare within the input sample fragment distribution. The chr5 maternal and paternal P arms have very short sub-telomeres, which are also limited by size selection. With these exceptions, there is generally consistent capture across the maternal and paternal arms.


Figure 17. Chromosome arm (maternal/paternal) telomere length.
For chr22, the coverage of one haplotype arm is included in chr13 due to sequence identity. The chr21 maternal and paternal P arms have highly repetitive sub-telomeres and cut sites ~65–150 kb upstream of the telomeric overhang, a fragment length that is rare within the input sample fragment distribution. The chr5 maternal and paternal P arms have very short sub-telomeres, which are also limited by size selection. With these exceptions, there is generally consistent capture across the maternal and paternal arms.


Methylation

Telo-Seq is a native library preparation; as such, DNA modifications are retained on the nucleic acids that are sequenced. Sequencing data may be interrogated for these modifications, such as methylation. However, this is not currently supported as part of the Telo-Seq early access.


References

Schmidt, T.T., Tyer, C., Rughani, P. et al. High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer. Nat Commun 15, 5149 (2024). https://doi.org/10.1038/s41467-024-48917-7

Lulkiewicz, M., Bajsert, J., Kopczynski, P. et al. Telomere length: how the length makes a difference. Mol Biol Rep 47, 7181–7188 (2020). https://doi.org/10.1007/s11033-020-05551-y

Smoom, R, et al. Telomouse—a mouse model with human-length telomeres generated by a single amino acid change in RTEL1. Nat Commun 14: 6708 (Oct 2023). https://doi.org/10.1038/s41467-023-42534-6

Change log

Version Change
v2, Oct 2024 Addition of reference to article: High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer.
v1, Nov 2023 Initial publication

Last updated: 10/16/2024
