Hunt, Arthur G
Hunt Lab Introduction
PI – Arthur G. Hunt, Department of Plant and Soil Sciences
My laboratory studies mechanisms of messenger RNA polyadenylation in plants and other eukaryotes. This research is multifaceted, involving genetics, molecular biology, and biochemistry. A key part of the toolkit is high throughput sequencing, which is used to study regulation (especially in the context of alternative polyadenylation), to characterize mutants, and to assess the activities of the poly(A) complex in vivo and in vitro.
Genetic approaches are used to study the roles of subunits of the complex that mediates mRNA polyadenylation. This complex (the PolyAdenylation Complex, or PAC) is composed of several subcomplexes – Cleavage and Polyadenylation Specificity Factor (CPSF), Cleavage stimulation Factor (CstF), Cleavage Factors I and II (CFIm, CFIIm) – as well as scaffolding proteins (symplekin, FIP1, and RBBP6) and enzymes (poly(A) polymerase and nuclear poly(A) binding protein). In Arabidopsis, several mutants with altered expression of these subunits have been isolated. For example, a null mutant that does not express the 30 kD subunit of CPSF (CPSF30) has been the focus of my laboratory for 20 years. The study of these mutants reveals roles for the subunits in development and in the responses of plants to environmental cues.
One approach for characterizing these mutants involves the adaptation of high throughput sequencing technologies. These are used in my laboratory to study the impacts of mutations on overall gene expression using standard RNASeq approaches. My laboratory pioneered an adaptation of RNASeq methods for the study of poly(A) site choice. These studies entail the production of sequencing libraries that query the mRNA-poly(A) junctions of transcripts, and subsequent computational analysis of the mapped sequencing reads. My laboratory has also adapted high throughput sequencing as an adjunct for RT-PCR and RACE studies, and as a means to study the biochemical (RNA-binding) activities of PAC subunits. The sequencing toolkit has been expanded to study epigenetic phenomena – DNA methylation, ChIPSeq determinations of RNA polymerase distributions, and patterns of m6A modifications of RNA. We have adopted several sequencing technologies to these ends – Illumina (MiSeq, HiSeq, NextSeq), Ion Torrent, and PacBio. We plan on adding Oxford Nanopore sequencing to this toolkit.
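As a rough illustration of the kind of computational step that follows read mapping in a poly(A) site study, the sketch below groups the mapped 3' ends of reads into candidate poly(A) site clusters and counts the reads supporting each one. It is a minimal sketch under stated assumptions, not the laboratory's pipeline: the function name `cluster_poly_a_sites`, the input format (tuples of chromosome, strand, and position), and the `max_gap` window of 24 nucleotides are all hypothetical choices made for the example.

```python
from collections import defaultdict


def cluster_poly_a_sites(read_ends, max_gap=24):
    """Group mapped read 3' ends into candidate poly(A) site clusters.

    read_ends: iterable of (chrom, strand, position) tuples, one per read
               (hypothetical input format chosen for this sketch).
    max_gap:   hypothetical window; a position joins the current cluster
               when it lies within max_gap of the previous position.

    Returns a list of (chrom, strand, start, end, read_count) tuples.
    """
    # Collect positions separately for each chromosome/strand pair.
    by_key = defaultdict(list)
    for chrom, strand, pos in read_ends:
        by_key[(chrom, strand)].append(pos)

    clusters = []
    for (chrom, strand), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                # Close enough to the previous end: same cluster.
                count += 1
            else:
                # Gap too large: emit the finished cluster, start a new one.
                clusters.append((chrom, strand, start, prev, count))
                start = pos
                count = 1
            prev = pos
        clusters.append((chrom, strand, start, prev, count))
    return clusters
```

Comparing the read counts of clusters within the same gene between two samples (for example, wild type and a mutant) would then give a simple view of shifts in poly(A) site choice.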
In addition, we routinely draw from sequencing data that are available in public repositories (such as SRA and ENA) to test various hypotheses. For example, this approach is being used to analyze single cell RNASeq data to further characterize alternative polyadenylation at the cellular level.
I have adapted our in-house library preparation and data analysis pipelines to develop an RNASeq module that is taught in an upper-level undergraduate laboratory (ABT495). This module takes students through RNASeq library production and introduces them to different approaches for the analysis of RNASeq data. Specifically, in this class, students are introduced to selected command-line tools and to a popular and powerful web-based platform (Galaxy).
The following projects will be conducted using resources available on the HPC Cluster:
- Genome-wide determinations of poly(A) site choice in plants and other eukaryotes.
- Genome-wide determinations of gene expression using RNASeq approaches.
- Patterns of DNA methylation in plants.
- Global determinations of m6A modifications of RNAs in plants, using DARTSeq and other approaches.
- ChIPSeq experiments that focus on RNA polymerase II (including various phosphorylated isoforms) and PAC subunits.
- Genome and transcriptome assembly and annotation.
- Application of high throughput sequencing to the study of RNA binding by various PAC subunits.
- Development of wet-bench and computational methodologies for novel applications of high throughput sequencing.
- Adapting single cell transcriptomics to the study of alternative polyadenylation.
Students:
Lichun Zhou, Graduate, added on MCC cluster, 09/26/2022
Michael J Schlueter, Undergraduate, added on MCC cluster, 09/26/2022
Caleb C Gooden, Undergraduate, added on MCC cluster, 09/26/2022
Trinity M Love, Undergraduate, added on MCC cluster, 09/26/2022
Computational methods:
We use a range of bioinformatics tools for sequence analysis, phylogenetic analyses, high throughput read mapping, and downstream analyses. Many projects are conducted on desktop computers using commercially available software (primarily CLC Genomics Workbench), but we work with some datasets that exceed the computing power of the most powerful desktops. For example, current projects begin with raw data files of between 100 and 300 GB in size, which CLC and command line tools often cannot handle on these machines owing to memory and disk storage limitations.
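To make the memory constraint concrete, the sketch below shows one way that a very large raw read file can be summarized without loading it into memory, by streaming it one line at a time. It is an illustrative example only, not a description of our pipelines: the function name `fastq_stats` is hypothetical, and the code assumes a standard four-line-per-record FASTQ file, optionally gzip-compressed.

```python
import gzip


def fastq_stats(path):
    """Count reads and bases in a FASTQ file by streaming it line by line.

    Assumes four lines per record (header, sequence, '+', qualities).
    Files whose names end in '.gz' are read through gzip. Only one line
    is held in memory at a time, so file size is not limited by RAM.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    n_reads = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # The second line of each four-line record is the sequence.
            if i % 4 == 1:
                n_reads += 1
                n_bases += len(line.rstrip("\r\n"))
    return n_reads, n_bases
```

Streaming keeps memory use flat, but the run time for files of hundreds of gigabytes is still substantial, and read mapping and assembly have much larger memory and storage needs than a simple pass of this kind.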
Software:
Listed on the HPC website (these are standard bioinformatics packages that we have used from time to time, and that I expect we may use through the HPC):
Anaconda
Augustus
Augustus-braker
Bcftools
Bcl2fastq2
Beagle-lib
Beast
Bedtools
Bioconductor-DESeq
Bioconductor-edgeR
BLAST
BLAT
Bonito-0.0.8
Bowtie
BRAKER
BUSCO
BWA
CD-HIT
Cell Ranger
Cufflinks
Cutadapt
FastQC
GATK
HISAT2
HTSeq
HyPhy
Java
MACS2
Maker
MapSplice
MashMap
Miniconda3
Minimap2
MrBayes
NumPy
Picard
R
RSEM
Samtools
SRA toolkit
STAR
StringTie
TopHat
Trinity
VCFtools
If possible, I would like to install a few additional packages (especially DEXSeq, but also one or two that have been developed in my lab).
Collaborator:
University of Texas Rio Grande Valley – Dr. Manohar Chakrabarti (manohar.chakrabarti@utrgv.edu).
Grants:
Publications:
Center for Computational Sciences