Dr. Elisa D’Angelo, Associate Professor

Affiliation: Department of Plant and Soil Sciences, College of Agriculture, Food, and Environment, University of Kentucky

Project Description

This request is driven by the need to determine how changing physical and chemical conditions affect which microorganisms are present in environmental samples and what they are doing. To do this, we will use a relatively new approach called RNA-Seq (i.e., metatranscriptomics).

RNA-Seq is a powerful analytical method that simultaneously determines the identity and gene expression of millions of active microbial cells in a soil or water sample, more than 99% of which cannot be cultured. No other technique has thus far made this possible.

Briefly, RNA-Seq involves extracting the ribonucleic acid (RNA) synthesized by the millions of actively growing microbial cells in a sample and sequencing those RNA molecules (after conversion to complementary DNA) on a next-generation sequencer such as the Illumina NextSeq 550. The sequencer generates tens of millions of sequences (reads) per sample, each derived from RNA produced by active microbial cells in the sample.
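Illumina sequencers typically deliver reads as FASTQ files, with four lines per read: a header starting with `@`, the base sequence, a `+` separator, and a quality string in which each character encodes a per-base Phred score (Phred+33 on current instruments). The minimal Python sketch below, which is illustrative only, reads such records and decodes their quality scores. It assumes uncompressed files with each sequence on a single line. Real runs are usually gzip-compressed and could be opened with `gzip.open(path, "rt")` instead. The function names are hypothetical and are not part of SAMSA2.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-lines-per-read FASTQ stream (hypothetical helper)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred33_scores(quality: str) -> list[int]:
    """Decode Phred+33 quality characters into integer scores."""
    return [ord(c) - 33 for c in quality]


def mean_quality(record: FastqRecord) -> float:
    """Average per-base quality of one read (0.0 for an empty read)."""
    scores = phred33_scores(record.quality)
    return sum(scores) / len(scores) if scores else 0.0


# Two toy reads: 'I' encodes Q40 (high quality), '#' encodes Q2 (very low).
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nGGCCAATT\n+\n#####III\n"
)
records = list(read_fastq(example))
print(len(records))  # 2
print([round(mean_quality(r), 2) for r in records])  # [40.0, 16.25]
```

Because the reader is a generator, it handles one read at a time and never loads a whole file into memory. That matters when each sample contains tens of millions of reads.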

In collaboration with UK faculty Drs. Jason Unrine, Olga Unrine, and Mark Farman, and UK Ph.D. student Anik Mahmoud, we plan to use RNA-Seq to evaluate changes in the abundance of active microbial populations, and in the genes they express, in anaerobic bioreactors for treating coal slurry impoundment waste. We hypothesize that microbial populations will differ significantly among bioreactor types and over time. To test this, RNA samples will be collected every 3 months over one year from bioreactors maintained under different conditions in the laboratory.
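The sampling plan can be written down as a sample table, one row per RNA sample, which is the form that differential expression tools expect as sample metadata. The Python sketch below makes several assumptions. The condition names and the number of replicates are hypothetical placeholders, because the bioreactor conditions are not specified here. It also assumes sampling at the start and then every 3 months through month 12.

```python
from itertools import product

# Hypothetical placeholders: the actual bioreactor conditions and replicate
# numbers are not specified in this proposal.
CONDITIONS = ["condition_A", "condition_B", "condition_C"]
REPLICATES = 3
# Assumes sampling at month 0 and then every 3 months for one year.
SAMPLING_MONTHS = list(range(0, 13, 3))  # [0, 3, 6, 9, 12]


def build_sample_table(conditions, months, replicates):
    """Return one row per RNA sample: condition x time point x replicate."""
    rows = []
    for condition, month, rep in product(conditions, months, range(1, replicates + 1)):
        rows.append({
            "sample_id": f"{condition}_m{month:02d}_r{rep}",
            "condition": condition,
            "month": month,
            "replicate": rep,
        })
    return rows


table = build_sample_table(CONDITIONS, SAMPLING_MONTHS, REPLICATES)
print(len(table))  # 45 (3 conditions x 5 time points x 3 replicates)
print(table[0]["sample_id"])  # condition_A_m00_r1
```

A table of this shape could serve as the column metadata for DESeq2, with a design formula such as `~ condition + month` (time treated as a factor). The row count also gives a quick check on how many libraries must be sequenced and processed.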

RNA sequences will be analyzed with a bioinformatics pipeline called SAMSA2. It combines several software programs that remove low-quality and non-mRNA sequences from the data set (e.g., Trimmomatic, SortMeRNA), match sequences to those in well-annotated databases (e.g., DIAMOND alignments against NCBI databases), and identify statistically significant differences in gene expression between experimental treatments (DESeq2). SAMSA2 and all other programs in the pipeline are open source and freely available online. A description of the SAMSA2 pipeline is available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963165/.
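Before testing for differential expression, DESeq2 corrects for differences in sequencing depth between samples using median-of-ratios size factors. The Python sketch below re-implements that single normalization step for illustration only. It is not the DESeq2 package, which is an R/Bioconductor library, and it omits dispersion estimation and statistical testing. It assumes the annotation step has already produced a table of read counts per gene (or functional category) per sample. The toy counts are made up.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors, one per sample.

    `counts` maps each gene (or functional category) to its read counts,
    one entry per sample. Genes with a zero count in any sample are skipped
    because their geometric mean across samples would be zero.
    """
    if not counts:
        raise ValueError("empty count table")
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene, row in counts.items():
        if len(row) != n_samples:
            raise ValueError(f"{gene!r} has {len(row)} counts, expected {n_samples}")
        if any(c == 0 for c in row):
            continue
        # Log of this gene's geometric mean across samples (the pseudo-reference).
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has a nonzero count in every sample")
    # Size factor = exponentiated median log ratio of each sample to the reference.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: dict[str, list[int]], factors: list[float]) -> dict[str, list[float]]:
    """Divide each sample's counts by that sample's size factor."""
    return {gene: [c / s for c, s in zip(row, factors)] for gene, row in counts.items()}


# Made-up toy counts: sample 2 was sequenced twice as deeply as sample 1.
toy = {
    "gene1": [10, 20],
    "gene2": [100, 200],
    "gene3": [5, 10],
    "gene4": [0, 7],  # skipped when estimating size factors
}
factors = size_factors(toy)
print([round(f, 4) for f in factors])  # [0.7071, 1.4142]
print(normalize(toy, factors)["gene2"])
```

After normalization, each of gene1 through gene3 has equal values in both samples. This shows the intended effect: the twofold difference caused by sequencing depth alone is removed, so it is not mistaken for a change in expression. Taking the median rather than the mean keeps a few strongly changing genes from distorting the size factors.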

A typical RNA-Seq experiment generates several terabytes of data, and according to the SAMSA2 documentation, analysis requires 128 GB of allocated RAM. For these reasons, RNA-Seq data analysis is feasible only on high-performance computing systems such as those available at the Center for Computational Sciences at the University of Kentucky. To conduct the research described here, we request an allocation of 5,000 hours of compute time.

If this allocation request is approved, we plan to collect preliminary RNA sequence data, analyze it with the SAMSA2 bioinformatics pipeline on the UK HPC system, and include the results in an NSF grant proposal for funding in September 2019.