JAFFAL: Detecting fusion genes with long read transcriptome sequencing

Massively parallel short read transcriptome sequencing has greatly expanded our knowledge of fusion genes, which are drivers of tumor initiation and progression. In cancer, many fusions are also important diagnostic markers and targets for therapy. Long read transcriptome sequencing allows the full length of fusion transcripts to be discovered; however, these data have a high error rate, and fusion finding algorithms designed for short reads do not work on them.

While numerous fusion finding algorithms now exist for short read RNA sequencing data, methods to detect fusions using third generation or long read sequencing data are lacking. Here we present JAFFAL, a method to identify fusions from long read transcriptome sequencing. We validated JAFFAL using simulation, cell line and patient data from Nanopore and PacBio.

We show that fusions can be accurately detected in long read data with JAFFAL, which provides better accuracy than other long read fusion finders and accuracy within the range of a state-of-the-art method applied to short read data. By comparing Nanopore transcriptome sequencing protocols, we find that numerous chimeric molecules are generated during cDNA library preparation that are absent when RNA is sequenced directly.

Finally, we demonstrate that JAFFAL enables fusions to be detected at the level of individual cells when applied to long read single cell sequencing. JAFFAL is open source and available as part of the JAFFA package at https://github.com/Oshlack/JAFFA/wiki.

Authors: Nadia M Davidson, Ying Chen, Georgina L Ryland, Piers Blombery, Jonathan Goeke, Alicia Oshlack