Liu, Jinze


Transcriptome analyses using RNA-seq data

This project consists of several components, each addressing a major research question in genome and transcriptome analysis using high-throughput RNA sequencing (RNA-seq) data. Current components include RNA-seq read mapping, transcript reconstruction, transcript expression quantification, gene fusion studies, and differential transcription analysis. The goal is to study the mechanisms of gene expression and transcription and to identify factors associated with cell development and disease. The data analyzed include RNA-seq data sets from human breast cancer projects and horse genome projects. The research is supervised by Dr. Jinze Liu (Associate Professor, Computer Science) and is primarily conducted by Yin Hu (PhD student, Computer Science), Yan Huang (PhD student, Computer Science), and Zheng Zeng (staff, Computer Science). The software used includes several open-source packages developed by Dr. Liu's group, such as MapSplice and DiffSplice, and other publicly available packages such as SAMtools, Cufflinks, and RSEM. Custom C/C++, Python, and shell scripts may also be used throughout the analyses.

Analyses of clinical exome sequencing data

The goal of this project is to identify genomic alterations (such as point mutations, insertions, deletions, and copy number variations) associated with a series of clinical outcomes in estrogen receptor (ER)-positive breast cancer patients, based on whole exome sequencing data. Patient samples were collected by Drs. Suleiman Massarweh (Associate Professor, Markey Cancer Center) and Esther P. Black (Associate Professor, Pharmaceutical Sciences). Exome sequencing was conducted by Otogenetics (Norcross, GA). Data analyses are being performed on HPC resources by Yin Hu (PhD student, Computer Science) and Zheng Zeng (staff, Computer Science) under the direction of Drs. Jinze Liu (Associate Professor, Computer Science) and Chi Wang (Assistant Professor, Markey Cancer Center). The data analysis pipeline involves sequence alignment, SNP and genotype calling, and association analysis with clinical outcomes. All analyses are based on publicly available software, including BWA, Picard, and GATK, as well as custom C, Java, and R scripts.
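
The last stage of this pipeline, association analysis, can be illustrated with a minimal sketch. It assumes the simplest possible setting: a binary clinical outcome and a single variant, tested with a two-sided Fisher's exact test on a 2x2 table of carrier status against outcome. The function name and the counts are hypothetical and are not taken from the project's scripts, which may use different statistical methods.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: variant carriers / non-carriers.
    Columns: patients with / without the clinical outcome.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x carriers with the outcome.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, for illustration only (not data from this project).
p_value = fisher_exact_two_sided(8, 2, 3, 9)
print(f"p = {p_value:.4f}")
```

In an exome-wide analysis a test like this would be repeated for many variants, so the resulting p-values would need a multiple-testing correction before interpretation.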

Transcript quantification using RNA-seq data

In humans, over 95% of protein-coding genes are alternatively spliced. This project aims to estimate the expression levels of the resulting alternative transcripts in the transcriptome.

Personnel:

PI: Dr. Jinze Liu, PhD

Caylin Hickey, Staff

Students:

Yan Huang
Yin Hu
Eamonn Magner

Software developed:

MultiSplice

Software:

C, C++, Matlab

Publications:

A Robust Method for Transcript Quantification with RNA-Seq Data. Yan Huang et al. Journal of Computational Biology, Volume 20, Number 3, 2013, pp. 167–187. Mary Ann Liebert, Inc.
DOI: 10.1089/cmb.2012.0230

Funding:

Grants: NSF (CAREER award, grant number 1054631 to J.L.; ABI/EF, grant number 0850237 to J.L. and J.F.P.) and NIH (grant number P20RR016481 to J.L.)

Transcript Assembly using RNA-seq data

Develop a method for transcript reconstruction to detect novel transcripts and complete the transcript database.

Personnel:

PI: Dr. Jinze Liu

PhD students:

Yan Huang
Yin Hu
Eamonn Magner

Software developed:

Astroid

Software:

C, C++

Publications:

Isoform reconstruction through Molecule-level inference. In preparation.

Funding:

Grants: NSF (CAREER award, grant number 1054631 to J.L.; ABI/EF, grant number 0850237 to J.L. and J.F.P.) and NIH (grant number P20RR016481 to J.L.)

Differential expression analysis on the transcriptome

Detect differentially expressed transcripts between diseased cells and normal cells.

Personnel:

PI: Dr. Jinze Liu
PhD students:
Yin Hu
Yan Huang
Eamonn Magner

Software developed:

DiffSplice

Software:

C, C++

Publications:

DiffSplice: the Genome-Wide Detection of Differential Splicing Events with RNA-seq. Yin Hu et al. Nucleic Acids Research, 2012.
doi: 10.1093/nar/gks1026.

Funding:

Grants: NSF (CAREER award, grant number 1054631 to J.L.; ABI/EF, grant number 0850237 to J.L. and J.F.P.) and NIH (grant number P20RR016481 to J.L.)

Genomics in Cancer for the Appalachian population of Kentucky


The goal of this project is to identify genomic alterations (such as point mutations, insertions, deletions, and copy number variations) in lung cancer patients from Appalachian Kentucky, based on whole exome sequencing data. The project is led by Dr. Susanne Arnold (Professor, Markey Cancer Center). Data analyses will be performed on HPC resources by Jinpeng Liu (MS student, Computer Science) under the direction of Drs. Jinze Liu (Associate Professor, Computer Science) and Chi Wang (Assistant Professor, Markey Cancer Center). The data analysis pipeline involves sequence alignment, SNP and genotype calling, and association analysis with clinical outcomes. All analyses are based on publicly available software.

PI Lead on Project

Dr. Susanne Arnold, Markey Cancer Center

Students

Jinpeng Liu, Graduate

Collaborators

Dr. Jinze Liu, Computer Science
Dr. Chi Wang, Markey Cancer Center

Software

* Cutadapt (https://code.google.com/p/cutadapt/)
* BWA (http://bio-bwa.sourceforge.net/)
* SAMtools (http://samtools.sourceforge.net/)
* Picard (http://picard.sourceforge.net/)
* GATK (http://www.broadinstitute.org/gatk/index.php)
* SnpEff (http://snpeff.sourceforge.net/)

Publications


Research project in Bioinformatics and NGS Analysis


Students

Xinan Liu - Graduate
Luan Pham - Graduate
Tawfig M Salem - Graduate
Joel Lowery - Graduate
Eamonn Magner - Graduate

Publications

2014

  1. The Cancer Genome Atlas Research Network, “Comprehensive Molecular Profiling of Lung Adenocarcinoma”, to be published, Nature, July 31, 2014.
  2. Huang Y*, Hu Y*, Liu J, “Piecing the puzzle together: a revisit to transcript reconstruction problem in RNA-Seq”, accepted, BMC Bioinformatics, 2014.

2013

Bioinformatics

  1. Huang Y, Hu Y, Jones CD, MacLeod JN, Chiang DY, Liu Y, Prins JF, and Liu J, “A Robust Method for Transcript Quantification with RNA-seq Data”, Journal of Computational Biology, 2013, 20(3): 167-187.
  2. Hu Y, Huang Y, Du Y, Orellana CF, Singh D, Johnson AR, Monroy A, Kuan PF, Hammond SM, Makowski L, Randell SH, Chiang DY, Hayes DN, Jones C, Liu Y, Prins JF, Liu J, "DiffSplice: the genome-wide detection of differential splicing events with RNA-seq", Nucleic Acids Res. 2013 Jan;41(2):e39.
  3. Engström PG, Steijger T, Sipos B, Grant GR, Kahles A, Alioto T, Behr J, Bohnert R, Campagna D, Davis CA, Dobin A, Gingeras TR, Harrow J, Jean G, Kosarev P, Li S, Liu J, Mason CE, Molodtsov V, Ning V, Ponsting H, Prins JF, Ribeca P, Seledtsov I, Solovyev V, Valle V, Vitulo V, Wang K, Wu TD, Zeller G, Rätsch G, Goldman N, Hubbard TJ, Harrow J, Guigó R, Bertone P, “Systematic evaluation of spliced aligners for RNA-seq data”, Nature Methods, doi:10.1038/nmeth.2722, PMID: 24185836, Nov. 2013.
  4. Cabanski CR, Wilkerson MD, Soloway M, Parker JS, Liu J, Prins JF, Marron JS, Perou CM, Hayes DN. "BlackOPs: increasing confidence in variant detection through mappability filtering." Nucleic Acids Res. 2013 Aug 8.
  5. Jeck WR, Sorrentino JA, Wang K, Slevin MK, Burd CE, Liu J, Marzluff WF, Sharpless NE. “Circular RNAs are abundant, conserved, and associated with ALU repeats”, RNA 19(2):141-57, doi:10.1261/rna.035667.112, PMID: 23249747, Feb. 2013.
  6. Coleman SJ, Zeng Z, Hestand MS, Liu J, MacLeod JN, "Correction: Analysis of Unannotated Equine Transcripts Identified by mRNA Sequencing", PLoS One, 8(9), Sep. 19, 2013.
  7. Schardl CL, Young CA, Hesse U, Amyotte SG, Andreeva K, Calie PJ, Fleetwood DJ, Haws DC, Moore N, Oeser B, Panaccione DG, Schweri KK, Voisey CR, Farman ML, Jaromczyk JW, Roe BA, O'Sullivan DM, Scott B, Tudzynski P, An Z, Arnaoudova EG, Bullock CT, Charlton ND, Chen L, Cox M, Dinkins RD, Florea S, Glenn AE, Gordon A, Güldener U, Harris DR, Hollin W, Jaromczyk J, Johnson RD, Khan AK, Leistner E, Leuchtmann A, Li C, Liu J, Liu J, Liu M, Mace W, Machado C, Nagabhyru P, Pan J, Schmid J, Sugawara K, Steiner U, Takach JE, Tanaka E, Webb JS, Wilson EV, Wiseman JL, Yoshida R, Zeng Z, "Plant-symbiotic fungi as chemical engineers: multi-genome analysis of the clavicipitaceae reveals dynamics of alkaloid loci", PLoS Genet. 2013;9(2):e1003323.

2012

Bioinformatics

  1. Huang Y, Hu Y, Jones CD, MacLeod JN, Chiang DY, Liu Y, Prins JF, and Liu J, “A Robust Method for Transcript Quantification with RNA-seq Data”, 16th Annual International Conference on Research in Computational Molecular Biology (RECOMB), 127-147, Barcelona, Spain, April 2012. (acceptance rate <15%, 31/200+)
  2. The Cancer Genome Atlas Research Network (Over 150 co-authors), “Comprehensive genomic characterization of squamous cell lung cancers”, Nature, 489(7417):519–525, DOI:10.1038/nature11404, PMID: 22960745, 27 September 2012.
  3. Fardo DW, Liu J, Demeo DL, Silverman EK, and Vansteelandt S, “Gene-environment interaction testing in family-based association studies with phenotypically ascertained samples: a causal inference approach”, Biostatistics 13(3): 468-481, 2012.

2011

Bioinformatics

  1. Singh D, Orellana CF, Hu Y, Jones CD, Liu Y, Chiang DY, Liu J, and Prins JF, “FDM: A Graph-based Statistical Method to Detect Differential Transcription using RNA-seq Data”, Bioinformatics, 2011. 27(19): 2633–2640, DOI: 10.1093/bioinformatics/btr458. (Also presented in ISMB hitseq 2011)
  2. Fardo DW, Druen AR, Liu J, Mire L, Infante-Rivard J, and Breheny P, “Exploration and comparison of methods for combining population- and family-based genetic association using the Genetic Analysis Workshop 17 mini-exome”, BMC Proceedings, 2011, 5(Suppl 9):S28. DOI:10.1186/1753-6561-5-S9-S28.


2010

Bioinformatics

  1. Hu Y, Wang K, He X, Chiang DY, Prins JF, Liu J. A Probabilistic Framework for Aligning Paired-end RNA-seq Data. Bioinformatics 2010; doi: 10.1093/bioinformatics/btq336. Also presented at the HiTSeq special interest group, ISMB 2010.
  2. Wang K, Singh D, Zeng Z, Huang Y, Coleman S, He X, Perou C, MacLeod JN, Chiang DY, Prins JF, Liu J. MapSplice: Mapping RNA-seq Reads for Splice Junction Discovery. Nucleic Acids Research Methods Online, 2010, doi:10.1093/nar/GKQ622.
  3. Coleman SJ, Zeng Z, Wang K, Luo S, Khrebtukova I, Mienaltowski MJ, Schroth GP, Liu J, MacLeod JN. Structural Annotation of Equine Protein-coding Genes Determined by mRNA-sequencing, Animal Genetics, Volume 41, Issue Supplement s2, pp 121–130, 2010, DOI: 10.1111/j.1365-2052.2010.02118.x
  4. Huggins P, Li W, Haws D, Friedrich T, Liu J, Yoshida R. Bayes estimators for phylogenetic reconstruction. Systematic Biology. In press. Available at http://arxiv.org/abs/0911.0645.
  5. Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, and Tropsha A. Functional Neighbors: Relationships between Non-homologous Protein Families Inferred Using Family-Specific Fingerprints. IEEE Transactions on Information Technology in Biomedicine, 2010, in press.


Grants

Liu, Jinze. IIS-1054631, "CAREER: Algorithms and Applications for Next Generation High-Throughput Sequencing Technologies", National Science Foundation, 4/15/2011 - 3/31/2016, $503,509
Liu, Jinze. 5-32779, "Unlocking transcript diversity via differential analyses of splice graphs", University of North Carolina, 5/23/2012 - 3/31/2015, $450,559


Center for Computational Sciences