Species-level resolution using the RESCUE pipeline and nanopore sequencing

Joseph Petrone is a bioinformatician at the American Type Culture Collection (ATCC), involved in the curation and expansion of the ATCC genome portal resource for deposited strains. Joseph’s research focusses on applying nanopore sequencing for the development of a microbial genome assembly pipeline that overcomes taxon-specific challenges. We caught up with Joseph to discuss his research into microbial genomics, how nanopore long-read sequencing technology is improving our understanding of microbial taxonomic classification and the impact this will have on microbial phylogenetic research.

You can hear about Joseph’s work in his webinar “Targeted nanopore sequencing for bacterial and viral classification” on Tuesday the 12th of October 2023.


What are your current research interests?

My current interests involve the expansion and curation of the ATCC® genome portal as a resource for an authentic genomic database with direct linkages to deposited strains. As an exercise in assembling the genomes across our entire collection of microbes, a current interest involves the development of an assembly pipeline that mitigates taxon-specific challenges. In turn, we hope to polish a pipeline to assemble all microbes regardless of their taxonomy.

What first ignited your interest in bioinformatics and what led you to focus on microbial genomics?

I would have to first credit the “spark” in my interest in bioinformatics to Dr. Valerie de Crecy-Lagard. Throughout her classes, Dr. de Crecy-Lagard opened the door to the world of bioinformatics and the tools available to the public. As the class progressed, we were tasked with generating working predictions about a protein’s function based on the output of multiple tools. The way the class was set up, the “unknown” proteins we characterized belonged to sets of classifications that were incorrect due to issues in databases and depositions of similar proteins. This is when I really got excited about the world of bioinformatics. Researching in Dr. Eric Triplett’s lab, I fell deeper in love with bioinformatics as I began to utilize comparative genomics to draw testable conclusions about the metabolism of Liberibacter spp. As my research branched out, I started finding a niche that sat at the intersection of all my focuses. Whole genome extraction and nanopore sequencing, targeted 16S analysis, high-throughput rpsL screening, and writing code for these analyses guided me into a field that could be summed up as bioinformatics with a love for microbial genomics.

How is nanopore sequencing changing the way we approach microbial classification? How has it benefitted your work?

Nanopore sequencing has forever changed the next-generation sequencing field and how we approach microbial classification. No longer limited by the length of reads, researchers can finally begin the jump towards high-throughput microbial classification using long reads. As our read length increases in experiments, I like to overuse the analogy of comparing it to reading a book. As our sentences become “paragraphs”, the story becomes more complete. While this technology has existed before, any researcher who used other long-read technologies before nanopore sequencing can sympathize with the difficult and unmodifiable library prep, the inability to afford those sequencers in the lab, and the hours of work just trying to figure out the software for data analysis. When Oxford Nanopore Technologies rolled out into our labs, all of these points of difficulty were immediately resolved while simultaneously opening the door to a community-based company where researchers felt empowered to try experiments themselves and were rewarded by the novel triumphs of our peers. Nanopore sequencing has benefitted my work in all aspects, but one of my favorite parts is the live data analysis. Instead of submitting my sample to a core for analysis, I had the ability to start a daunting experiment, load up a flow cell, and immediately see the data being generated. It felt comforting knowing that I was always able to stop a sequencing run, wash, and re-load a flow cell if I wanted to change any constraints of my experiments.

What impact could the ability to accurately classify bacterial species with deeper taxonomic resolution have for researchers?

The ability to classify bacterial taxa with deeper classification truly opens the world to microbial ecologists. As a field of research, my viewpoint is that we grew too comfortable saying that a specific genus of bacteria can be correlated to a specific disease manifestation or protection. Working with various strains of the same species, a researcher could comfortably tell you that even species-level analysis has its biases, as genomic differences between strains have a phenotypical impact. As long-read sequencing and analysis continue to develop, the real benefit comes from the deeper classification of previous correlational studies. The deeper we can go in bacterial classification, the more specific researchers will be able to be in identifying correlations to these taxa. That in turn will open the floodgates to the “next steps” of research where one can work with a more targeted set of microbes to test the validity of these correlations.

What have been the main challenges in your work and how have you approached them?

One of the biggest challenges of my work would have to be “future-proofing” a study. As the fields of research were constantly changing in the background, I tried to approach all my projects with a mindset of “will this still be relevant in 10 years?”. While that was a daunting challenge in itself, it really opened my eyes to the realization that most tools and software that are commonly used weren’t recently created, and some software isn’t even maintained on its GitHub pages. This mentality also opened my eyes to the fact that a lot of the current databases we were using were collections of entries from decades ago with underlying data deposited from pyrosequencing, Solexa, and Ion Torrent.

What’s next for your research?

Next for my research is the expansion of the ATCC® genome portal. As our collection of available authentic genomes grows, it also opens the door for the downstream tools and analyses we will be able to create that utilize this data. As my heart is always true to microbial genomics, the exercise in assembling and curating the genomes across ATCC’s wide taxonomy of microbes really allows us to learn the intricacies of taxon-specific genomics. Additionally, we hope an authentic database will open up possibilities for downstream researchers who were drawing incorrect conclusions about their research due to the quality of the databases they were using. This means enhanced confidence in whole genome analysis, predictive metabolomics, target amplicon databases, and so much more!