

Hongbin Zhang -- Department of Biostatistics, College of Public Health


Research Overview

My methodological research in Statistical Computing has focused on handling data complexities such as measurement error, missing data, truncation, and various forms of censoring in longitudinal and cohort studies. I develop methods that address multiple complexities simultaneously, aiming to balance accuracy against computational cost through maximum likelihood, approximate likelihood, and other approaches, including Bayesian methods. Below, I describe two ongoing projects.


HIV Surveillance

Funded by an NIH R21 (R21AI147933), on which I am the PI, I have been developing joint modeling methods to estimate antiretroviral therapy (ART) initiation time, leveraging the HIV Registry data of New York City (NYC). I use a random change-point model within a mixture model framework to address left truncation, missing data, and population heterogeneity. The findings will inform jurisdictions across the United States in setting or adjusting HIV care and treatment policies. The sample of roughly 30,000 people living with HIV in NYC from 2006 to 2015 poses a number of computational challenges. For example, besides the Monte Carlo Expectation-Maximization (MCEM) algorithm and linearization approaches, we (joined by my postdoctoral fellow Dr. Binod Manandhar, who recently secured a faculty position at Clark Atlanta University) are also exploring other variants of the EM algorithm, such as the Stochastic EM (StEM) and the Stochastic Approximation EM (SAEM).
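To make the stochastic E-step idea concrete, the R sketch below runs a Stochastic EM on a toy two-component Gaussian mixture: latent component labels are drawn from their conditional distribution and then plugged into closed-form complete-data updates. This is only an illustration of the StEM mechanism, not the registry change-point model, and all names here (pi_hat, z_draw, etc.) are invented for the example.

## Toy Stochastic EM (StEM) for a two-component Gaussian mixture
set.seed(1)
n <- 500
z <- rbinom(n, 1, 0.4)                                # latent component labels
y <- rnorm(n, mean = ifelse(z == 1, 2, -1), sd = 1)   # observed data

## Starting values
pi_hat <- 0.5; mu1 <- 1; mu2 <- 0; sigma <- 1

for (iter in 1:200) {
  ## Stochastic E-step: draw labels from their conditional distribution
  p1 <- pi_hat * dnorm(y, mu1, sigma)
  p0 <- (1 - pi_hat) * dnorm(y, mu2, sigma)
  z_draw <- rbinom(n, 1, p1 / (p1 + p0))

  ## M-step: complete-data maximum likelihood given the sampled labels
  pi_hat <- mean(z_draw)
  mu1    <- mean(y[z_draw == 1])
  mu2    <- mean(y[z_draw == 0])
  sigma  <- sqrt(mean((y - ifelse(z_draw == 1, mu1, mu2))^2))
}

c(pi = pi_hat, mu1 = mu1, mu2 = mu2, sigma = sigma)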


Neurocysticercosis

Funded by an NIH R03 (R03NS111189-01A1), on which I am one of the multiple PIs, we have been working on clustering, interval censoring, left censoring, and loss to follow-up in data from a randomized clinical trial conducted in Ecuador for patients with neurocysticercosis. We have completed some work using multistate modeling, and a few extensions are foreseeable, e.g., using competing risk models to handle the missing data, as well as approaches that relax the Markov assumption.
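As a minimal illustration of the interval-censoring piece, the R sketch below writes down the likelihood for a constant-hazard, two-state progressive model observed only at scheduled visits and maximizes it with optim(). It is a simplified stand-in, not the trial analysis: the real data involve more states, clustering, left censoring, and covariates, and the visit schedule and rate used here are made up.

## Interval-censored transition time in a two-state progressive model
set.seed(2)
n      <- 300
lambda <- 0.3                               # true (assumed) transition rate
t_true <- rexp(n, rate = lambda)            # unobserved transition times

## Observation only at scheduled visits -> interval censoring
visits <- seq(0, 6, by = 1)
L <- sapply(t_true, function(t) max(visits[visits < t]))
R <- sapply(t_true, function(t) {
  r <- visits[visits >= t]
  if (length(r)) min(r) else Inf            # Inf = event-free at the last visit
})

## Log-likelihood: P(L < T <= R) for observed intervals, S(L) if right-censored
negloglik <- function(log_lambda) {
  lam <- exp(log_lambda)
  ll  <- ifelse(is.finite(R),
                log(exp(-lam * L) - exp(-lam * R)),
                -lam * L)
  -sum(ll)
}

fit <- optim(log(0.1), negloglik, method = "BFGS")
exp(fit$par)                                # estimated transition rate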


Software:

R, SAS, Stata, Python, and C.


Computational Needs:

A multi-core computing environment that supports submitting multiple (around 100) simulations, i.e., the same code with different seeds, at the same time.
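As a rough picture of that workload, the R sketch below runs 100 replicates of the same (placeholder) simulation in parallel, each with its own seed, using parallel::mclapply; it assumes a Unix-like multi-core node, and one_replicate is a stand-in for the actual model-fitting code.

## Run ~100 simulation replicates in parallel, one seed per replicate
library(parallel)

one_replicate <- function(seed) {
  set.seed(seed)                  # placeholder simulation; the real code
  y <- rnorm(1000)                # would fit the joint / multistate models
  c(seed = seed, mean = mean(y))
}

seeds   <- 1:100
results <- mclapply(seeds, one_replicate, mc.cores = detectCores())
do.call(rbind, results)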
