Assessing DNA Methylation as a Predictor for Non-Coding RNA Expression

5-methylcytosine (5mC) is an epigenetic modification of DNA linked to gene regulation. In human genomes, 5mC occurs predominantly at cytosines within Cytosine-phosphate-Guanine (CpG) dinucleotides [3]. Clusters of these motifs are called CpG islands. The amount of methylation near promoter regions, together with other regulatory factors, is inversely correlated with gene expression: low methylation is associated with expression, and high methylation with silencing.
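To make the CpG terminology concrete, here is a small illustrative sketch that locates CpG dinucleotides in a sequence and applies the classic Gardiner-Garden and Frommer (1987) CpG-island heuristic (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The function names are hypothetical, and this is not part of the pipeline described here.

```python
from typing import Dict, List


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide on the given strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cpg_stats(seq: str) -> Dict[str, float]:
    """Length, GC fraction and observed/expected CpG ratio of a sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = len(cpg_positions(s))
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG ratio: CpG count * length / (C count * G count).
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Heuristic CpG-island test with Gardiner-Garden and Frommer style thresholds."""
    st = cpg_stats(seq)
    return (st["length"] >= min_len
            and st["gc_fraction"] > min_gc
            and st["obs_exp"] > min_obs_exp)
```

For example, `cpg_positions("ACGTCGGC")` returns `[1, 4]`, the two sites at which a 5mC call could be made on this strand.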

Here we assess de novo assemblies built from ultra-long (UL) Nanopore reads, paired with 5mC methylation calls from the same UL data, as a strategy for predicting non-coding RNA expression.

1. De novo assembly with Flye from UL data, with 5mC methylation calling.
2. tRNA gene identification with tRNAscan-SE.
3. Methylation scoring of the 500, 1,000, and 5,000 bases upstream of each tRNA.
4. Prediction of tRNA expression from the methylation score of the 1,000 bases upstream (score > 15: not expressed; score ≤ 15: expressed).
5. Comparison of predictions with observed direct tRNA sequencing results.
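Steps 3 and 4 can be sketched as follows. The source does not define the methylation score, so this sketch assumes it is the mean percent methylation over CpG sites in the upstream window, taken from per-CpG calls on the assembly; the `TRNA` record, the `MethylCalls` layout, and all function names are hypothetical. Upstream is taken relative to strand: on the minus strand, the 5' flank lies at higher coordinates.

```python
from bisect import bisect_left
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class TRNA:
    """Hypothetical tRNA record (e.g. parsed from tRNAscan-SE output)."""
    name: str
    contig: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


# Hypothetical input: contig -> list of (CpG position, percent methylated),
# sorted by position.
MethylCalls = Dict[str, List[Tuple[int, float]]]


def upstream_window(t: TRNA, window: int) -> Tuple[int, int]:
    """Half-open interval [lo, hi) covering `window` bases 5' of the tRNA."""
    if t.strand == "+":
        return max(0, t.start - window), t.start
    # On the minus strand the 5' end is the higher coordinate.
    return t.end, t.end + window


def methylation_score(t: TRNA, calls: MethylCalls, window: int) -> Optional[float]:
    """Mean percent methylation over CpG calls in the upstream window.

    Returns None when the window contains no CpG calls.
    """
    sites = calls.get(t.contig, [])
    positions = [pos for pos, _ in sites]
    lo, hi = upstream_window(t, window)
    i, j = bisect_left(positions, lo), bisect_left(positions, hi)
    values = [pct for _, pct in sites[i:j]]
    return mean(values) if values else None


def predict_expressed(score: Optional[float], cutoff: float) -> bool:
    """Score <= cutoff -> expressed; score > cutoff -> not expressed."""
    if score is None:
        return True  # assumption: no CpG evidence is treated as "expressed"
    return score <= cutoff


# Toy example with made-up values (illustration only).
calls: MethylCalls = {"ctg1": [(100, 80.0), (450, 10.0), (900, 20.0), (1200, 90.0)]}
plus_trna = TRNA("tRNA-1", "ctg1", 1000, 1072, "+")
minus_trna = TRNA("tRNA-2", "ctg1", 300, 372, "-")
```

Windows with no CpG calls return `None`; treating them as expressed is an arbitrary choice made here, and excluding them from evaluation would be an equally reasonable option.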

We obtained an accuracy of 84.35%, precision of 86.36%, recall of 97.24%, and F1 score of 91.40% with our best-performing parameters: a 0.5 kb upstream window and a 5mC cut-off of 30.
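The reported metrics follow the standard confusion-matrix definitions. The sketch below computes them, assuming "expressed" is the positive class, and sweeps window sizes and cut-offs. Selecting the best combination by F1 score is also an assumption, since the source does not say which metric "best-performing" refers to. The input layout (`scores_by_window`, `observed`) and the toy values are made up for illustration.

```python
from itertools import product
from typing import Dict, Iterable, Optional, Sequence, Tuple


def classification_metrics(predicted: Sequence[bool],
                           observed: Sequence[bool]) -> Dict[str, float]:
    """Confusion-matrix metrics with "expressed" (True) as the positive class."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


def sweep(scores_by_window: Dict[int, Dict[str, float]],
          observed: Dict[str, bool],
          cutoffs: Iterable[float]) -> Optional[Tuple[int, float, Dict[str, float]]]:
    """Return (window, cutoff, metrics) with the highest F1 (assumed criterion)."""
    best = None
    for window, cutoff in product(sorted(scores_by_window), cutoffs):
        scores = scores_by_window[window]
        names = sorted(scores)
        predicted = [scores[name] <= cutoff for name in names]  # <= cutoff -> expressed
        truth = [observed[name] for name in names]
        metrics = classification_metrics(predicted, truth)
        if best is None or metrics["f1"] > best[2]["f1"]:
            best = (window, cutoff, metrics)
    return best


# Toy example with made-up scores and labels (illustration only).
example = classification_metrics([True, True, False, True], [True, False, False, True])
best_window, best_cutoff, best_metrics = sweep(
    {500: {"a": 10.0, "b": 40.0, "c": 35.0},
     1000: {"a": 12.0, "b": 20.0, "c": 14.0}},
    {"a": True, "b": False, "c": True},
    cutoffs=[15, 30],
)
```

F1 is the harmonic mean of precision and recall, so it rewards parameter sets that keep both high rather than maximising recall alone.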
