Hunt, Arthur G

Hunt Lab Introduction

PI – Arthur G. Hunt, Department of Plant and Soil Sciences

My laboratory studies mechanisms of messenger RNA polyadenylation in plants and other eukaryotes. This research is multifaceted, involving genetics, molecular biology, and biochemistry. A key part of the toolkit is high throughput sequencing, which is used to study regulation (especially in the context of alternative polyadenylation), to characterize mutants, and to assess the activities of the poly(A) complex in vivo and in vitro.

Genetic approaches are used to study the roles of subunits of the complex that mediates mRNA polyadenylation. This complex (the PolyAdenylation Complex, or PAC) is composed of several subcomplexes, namely Cleavage and Polyadenylation Specificity Factor (CPSF), Cleavage Stimulation Factor (CstF), and Cleavage Factors I and II (CFIm, CFIIm), as well as scaffolding proteins (symplekin, FIP1, and RBBP6) and enzymes (poly(A) polymerase and nuclear poly(A) binding protein). In Arabidopsis, several mutants with altered expression of these subunits have been isolated. For example, a null mutant that does not express the 30 kDa subunit of CPSF (CPSF30) has been the focus of my laboratory for 20 years. The study of these mutants reveals roles for the subunits in development and in the responses of plants to environmental cues.

One approach for characterizing these mutants involves the adaptation of high throughput sequencing technologies. These are used in my laboratory to study the impacts of mutations on overall gene expression using standard RNASeq approaches. My laboratory pioneered an adaptation of RNASeq methods for the study of poly(A) site choice. These studies entail the production of sequencing libraries that query the mRNA-poly(A) junctions of transcripts, followed by computational analysis of the mapped sequencing reads. My laboratory has also adapted high throughput sequencing as an adjunct for RT-PCR and RACE studies, and as a means to study the biochemical (RNA-binding) activities of PAC subunits. The sequencing toolkit has been expanded to study epigenetic phenomena: DNA methylation, ChIPSeq determinations of RNA polymerase distributions, and patterns of m6A modifications of RNA. We have adopted several sequencing technologies to these ends: Illumina (MiSeq, HiSeq, NextSeq), Ion Torrent, and PacBio. We plan on adding Oxford Nanopore sequencing to this toolkit.
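To illustrate the computational step in poly(A) site analysis described above, the following is a minimal Python sketch and not the laboratory's actual pipeline. It assumes that reads spanning mRNA-poly(A) junctions have already been mapped and reduced to a list of 3'-end genomic coordinates for one gene on one strand. The function names (cluster_poly_a_sites, site_usage) and the 24-nucleotide clustering window are hypothetical choices made for illustration only.

```python
from collections import Counter


def cluster_poly_a_sites(positions, window=24):
    """Group read 3'-end coordinates into candidate poly(A) site clusters.

    A new cluster starts whenever the gap to the previous occupied
    position exceeds `window` (an assumed value, not a published one).
    """
    counts = Counter(positions)
    clusters = []
    for pos in sorted(counts):
        if clusters and pos - clusters[-1]["end"] <= window:
            cluster = clusters[-1]
            cluster["end"] = pos
            cluster["reads"] += counts[pos]
            # Track the most heavily used coordinate as the cluster peak.
            if counts[pos] > cluster["peak_reads"]:
                cluster["peak"] = pos
                cluster["peak_reads"] = counts[pos]
        else:
            clusters.append({
                "start": pos,
                "end": pos,
                "reads": counts[pos],
                "peak": pos,
                "peak_reads": counts[pos],
            })
    return clusters


def site_usage(clusters):
    """Return the fraction of reads assigned to each cluster."""
    total = sum(cluster["reads"] for cluster in clusters)
    if total == 0:
        return []
    return [cluster["reads"] / total for cluster in clusters]


if __name__ == "__main__":
    # Hypothetical 3'-end coordinates for a gene with two poly(A) sites.
    ends = [100, 100, 100, 101, 105, 300, 302, 310]
    sites = cluster_poly_a_sites(ends)
    for site, fraction in zip(sites, site_usage(sites)):
        print(site["start"], site["end"], site["peak"], round(fraction, 3))
```

Comparing the usage fractions of each cluster between a wild-type and a mutant sample is one simple way to flag a shift in poly(A) site choice. A real analysis would also need replicate-aware statistics and filtering of internally primed reads.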

In addition, we routinely draw from sequencing data that are available in public repositories (such as SRA and ENA) to test various hypotheses. For example, this approach is being used to analyze single cell RNASeq data to further characterize alternative polyadenylation at the cellular level.

I have adapted our in-house library preparation and data analysis pipelines to develop an RNASeq module that is taught in an upper-level undergraduate laboratory (ABT495). This module takes students through RNASeq library production and introduces them to different approaches for the analysis of RNASeq data. Specifically, students in this class are given an introduction to selected command-line tools and to a popular and powerful web-based platform (Galaxy).

The following projects will be conducted using resources available on the HPC Cluster:

Genome-wide determinations of poly(A) site choice in plants and other eukaryotes.
Genome-wide determinations of gene expression using RNASeq approaches.
Patterns of DNA methylation in plants.
Global determinations of m6A modifications of RNAs in plants, using DARTSeq and other approaches.
ChIPSeq experiments that focus on RNA polymerase II (including various phosphorylated isoforms) and PAC subunits.
Genome and transcriptome assembly and annotation.
Application of high throughput sequencing to the study of RNA binding by various PAC subunits.
Development of wet-bench and computational methodologies for novel applications of high throughput sequencing.
Adapting single cell transcriptomics to the study of alternative polyadenylation.

Students:

Lichun Zhou, Graduate, added on MCC cluster, 09/26/2022

Michael J Schlueter, Undergraduate, added on MCC cluster, 09/26/2022

Caleb C Gooden, Undergraduate, added on MCC cluster, 09/26/2022

Trinity M Love, Undergraduate, added on MCC cluster, 09/26/2022

Computational methods:

We use a range of bioinformatics tools for sequence analysis, phylogenetic analyses, high throughput read mapping, and downstream analyses. Many projects are conducted on desktop computers using commercially available software (primarily CLC Genomics Workbench), but we work with some datasets that exceed the computing power of the most powerful desktops. For example, current projects begin with raw data files of between 100 and 300 GB in size. On a desktop, CLC and command-line tools often cannot handle files of this size owing to memory and disk storage limitations.

Software:

Listed on the HPC website (these are standard bioinformatics packages that we have used from time to time, and that I expect we may use through the HPC):

Anaconda

Augustus

Augustus-braker

Bcftools

Bcl2fastq2

Beagle-lib

Beast

Bedtools

Bioconductor-DESeq

Bioconductor-edgeR

BLAST

BLAT

Bonito-0.0.8

Bowtie

BRAKER

BUSCO

BWA

CD-HIT

Cell Ranger

Cufflinks

Cutadapt

FastQC

GATK

HISAT2

HTSeq

HyPhy

Java

MACS2

Maker

MapSplice

MashMap

Miniconda3

Minimap2

MrBayes

NumPy

Picard

R

RSEM

Samtools

SRA toolkit

STAR

StringTie

TopHat

Trinity

VCFtools

If possible, I would like to install a few additional packages (especially DEXSeq, but also one or two that have been developed in my lab).

Collaborator:

University of Texas Rio Grande Valley – Dr. Manohar Chakrabarti (Manohar Chakrabarti, manohar.chakrabarti@utrgv.edu).

Projects