Zhang, Xiaohua Douglas

Research Activities

This group applies machine learning methods to identify biomarkers for detecting medical conditions, with a focus on early diagnosis and personalized treatment. By analyzing genomic, proteomic, and clinical datasets, we hope to uncover novel biomarkers that signal disease presence and progression.

In one of our projects using the HPC, the primary aim is to identify reliable biomarkers, or combinations of biomarkers, that discriminate between patients with periodontitis and healthy individuals. We train machine learning algorithms on biomarker data combined with clinical labels for periodontitis status, so that the models can uncover patterns and interactions that indicate the presence of the disease, and we seek the most accurate and robust predictive models. Further research will involve a comprehensive comparison of machine learning algorithms to determine which are best suited for predicting periodontitis, including traditional models as well as more advanced techniques such as ensemble and neural network models. A separate branch of the project will use machine learning to analyze patient profiles and identify which individuals are most responsive to specific treatments. By examining demographic, behavioral, and biomarker data, these models aim to uncover patterns that indicate treatment efficacy across patient subgroups.

In another project using the HPC, we will run several machine learning methods, including XGBoost, random forest, artificial neural networks, and partial least squares discriminant analysis (PLS-DA), to analyze cytokine profiling data.

A further project analyzes the large volume of data generated by wearable devices, such as continuous glucose monitors, for diabetes research. This work also requires running various machine learning methods on the HPC.
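
The algorithm comparison planned for the periodontitis project could be organized around k-fold cross-validation, so that every candidate model is scored on the same held-out splits. The sketch below is only an illustration in plain Python: the data are synthetic (two made-up "biomarker" features per subject, not project data), and the two classifiers, a majority-class baseline and a nearest-centroid rule, are simple stand-ins rather than the models named above. All function names are hypothetical.

```python
import random
from statistics import mean


def make_synthetic_profiles(n_per_class=100, seed=0):
    """Synthetic stand-in data: two features per subject, label 0 or 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return X, y


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy of `fit` over k folds.

    `fit(X_train, y_train)` must return a function mapping one sample to a label.
    """
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return mean(scores)


def fit_majority(X, y):
    """Baseline: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def fit_nearest_centroid(X, y):
    """Predict the class whose mean feature vector is closest to the sample."""
    centroids = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


if __name__ == "__main__":
    X, y = make_synthetic_profiles()
    for name, fit in (("majority", fit_majority),
                      ("nearest centroid", fit_nearest_centroid)):
        print(name, round(cross_val_accuracy(fit, X, y), 3))
```

Because each model is passed in as a `fit` function, the same harness could in principle wrap any of the R or Python learners listed for these projects, and the independent folds and models are a natural unit to distribute across HPC nodes.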

List all projects

  1. Saliva biomarker identification for dental diseases

  2. Continuous glucose monitoring analysis for diabetes research

  3. Cytokine profiling analysis for diabetes and dental research

  4. Single-cell RNA-seq analysis for diabetes research

  5. High-throughput antibody screening for diabetes prevention

Computational methods

All computational methods will be implemented in either R or Python, using packages available for these languages. The methods are XGBoost, AdaBoost, LogitBoost, artificial neural networks, random forest, PLS-DA, k-means, and hierarchical clustering, all of which are available at UKY. One method that requires HPC in a current project is boosting. As an ensemble technique, boosting improves predictive accuracy by training weak models sequentially, with each new model learning from the errors of its predecessor. Boosting methods such as XGBoost, AdaBoost, and LogitBoost build a series of learners whose combined predictions are more robust than those of any single learner. However, this approach is inherently resource intensive. Each boosting method requires many iterations, and XGBoost adds further computational demands through regularization and tree pruning. Hyperparameter tuning, which is essential for optimizing model performance, multiplies the runtime further, because every candidate configuration requires its own set of training runs.
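
To make the sequential nature of boosting concrete, the following is a minimal from-scratch sketch of AdaBoost with decision stumps in plain Python. It is an illustration only, not the group's pipeline and not the XGBoost or LogitBoost implementations: the toy data are synthetic, and every function name here is hypothetical. It shows why the rounds cannot run in parallel, since each round reweights the samples based on the previous learner's mistakes.

```python
import math
import random


def make_toy_data(n=120, seed=0):
    """Synthetic two-feature data; label is +1 when x0 + x1 > 1, else -1."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [1 if a + b > 1.0 else -1 for a, b in X]
    return X, y


def stump_predict(stump, x):
    """A stump votes `polarity` when the chosen feature exceeds the threshold."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X, y, n_rounds=20):
    """Train stumps one after another, reweighting samples after each round."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # this learner's vote weight
        ensemble.append((alpha, stump))
        # Misclassified samples (y * h(x) = -1) are multiplied by exp(alpha),
        # so the next stump concentrates on the current errors.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    hits = sum(adaboost_predict(ensemble, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)


if __name__ == "__main__":
    X, y = make_toy_data()
    for rounds in (1, 5, 20):
        model = adaboost_fit(X, y, n_rounds=rounds)
        print(rounds, "rounds: training accuracy", round(accuracy(model, X, y), 3))
```

Each call to `adaboost_fit` repeats the full stump search once per round, and a tuning grid repeats the whole fit for every candidate setting and cross-validation fold. The rounds within one fit are sequential, but those independent fits are the part of the workload that parallelizes across HPC cores.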

List all Software

The software will be either R or Python.

UKY Collaborators

Dr. Craig Miller, Dr. Simon Fisher, Dr. Barbara Nikolajczyk, Dr. Charlotte Peterson, Dr. Jean Fry, Dr. Ila Mishra

Center for Computational Sciences