Yin, Xiangrong*

Not a current user.


Introduction

A. My research group, the Data Science Lab, focuses on sufficient dimension reduction (SDR) and sufficient variable selection (SVS). The goal of SDR and SVS is to reduce a large number of predictors to a few linear combinations and, where possible, to select the informative predictor variables related to a response or responses. This area has wide applications, such as supervised learning in computer science, identifying important genes in biology, discovering important factors in environmental science, and finding effects in business data. In particular, it is closely related to big data, in which the number of predictors is very large or the sample size is huge.

We develop general methodology in SDR and SVS that requires machines with high computing power. Our methods use model-free approaches such as distance covariance (DC). Model-free approaches are general in the sense that they apply to a wide scope of data; however, they are computationally intensive. For instance, we use permutation tests, which require a large number of iterations. Along these lines, we also need to solve optimization problems with nonlinear constraints and penalized algorithms. This adds extra computing, as we must select tuning parameters using information criteria such as AIC and BIC.
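As a rough illustration of the per-test computational load, the following is a minimal sketch of a distance covariance permutation test written in base R. The function names dcov_stat and dcov_perm_test and the data-generating setup are ours for illustration only (the energy package in R provides comparable tools); this is not our proposed index or procedure itself.

    # Minimal sketch: a distance covariance (DC) permutation test in base R.
    # dcov_stat() and dcov_perm_test() are illustrative names, not package functions.

    dcov_stat <- function(x, y) {
      # squared sample distance covariance via double-centered distance matrices
      A <- as.matrix(dist(x)); B <- as.matrix(dist(y))
      A <- A - rowMeans(A) - matrix(colMeans(A), nrow(A), ncol(A), byrow = TRUE) + mean(A)
      B <- B - rowMeans(B) - matrix(colMeans(B), nrow(B), ncol(B), byrow = TRUE) + mean(B)
      mean(A * B)
    }

    dcov_perm_test <- function(x, y, n_perm = 1000) {
      obs  <- dcov_stat(x, y)
      perm <- replicate(n_perm, dcov_stat(x, y[sample(nrow(y)), , drop = FALSE]))
      mean(perm >= obs)                     # permutation p-value
    }

    set.seed(1)
    n <- 500                                # sample size used in our simulations
    x <- matrix(rnorm(n * 10), n)           # 10 predictors (illustrative)
    y <- matrix(x[, 1] + rnorm(n))
    dcov_perm_test(x, y, n_perm = 1000)

Each call of this kind must be repeated thousands of times in our simulation studies, which is why the total cost grows quickly.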

We have developed and tested our code on desktop and laptop machines, and we now need to run large-scale simulations to demonstrate the usefulness of our proposed methods.

Projects

B. 1. Our first project is to develop a new informational index. The simulations require sample sizes up to 500 and permutation tests with 1000 permutations, repeated over 10,000 replications (a rough sketch of this workload follows this list).
2. Our second project is to develop an SDR method based on the index from the first project. This involves a nonlinearly constrained optimization problem, and MATLAB software is required.
3. Our third project is to develop an SVS procedure based on the index from the first project. This again involves large-scale permutation tests.
4. Our fourth project is to develop an efficient method that extends Project 2 by adding an L1 penalty, so that SDR and SVS can be achieved simultaneously.
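The following is a minimal sketch, in R, of the Project 1 workload implied by these numbers: 10,000 replications, each running a 1000-permutation test at sample size 500. It reuses the illustrative dcov_perm_test function sketched above; the data-generating model and the core count are assumptions for illustration, and parallelization through the parallel package is only one possible way to spread the job over allocated cores.

    # Minimal sketch of the Project 1 simulation workload (illustration only).
    library(parallel)

    one_replication <- function(rep_id, n = 500, n_perm = 1000) {
      x <- matrix(rnorm(n * 10), n)          # illustrative data-generating model
      y <- matrix(x[, 1] + rnorm(n))
      dcov_perm_test(x, y, n_perm = n_perm)  # p-value for this replication
    }

    # 10,000 replications; mc.cores would match the allocated core count.
    pvals <- unlist(mclapply(1:10000, one_replication, mc.cores = 16))
    mean(pvals <= 0.05)                      # empirical rejection rate

Even at this moderate sample size, the workload is on the order of 10^7 evaluations of the DC statistic, which is impractical on a single desktop.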

C. We need optimization under nonlinear constraints. The problem related to SVS is an L1-constrained problem, for which solvers are available in R; the problem related to SDR is an L2-constrained problem, for which solvers are so far only available in MATLAB.
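To illustrate the L1 side only, the following is a minimal R sketch using a lasso-type penalized fit from the glmnet package as a stand-in. It is not our SVS procedure; it simply shows how an L1 penalty drives the coefficients of uninformative predictors to zero and why a tuning parameter must be selected (here by cross-validation; in our work, by criteria such as AIC or BIC).

    # Minimal sketch (stand-in): L1-penalized variable selection with glmnet.
    library(glmnet)

    set.seed(1)
    n <- 200; p <- 20
    x <- matrix(rnorm(n * p), n, p)
    y <- x[, 1] - 0.5 * x[, 2] + rnorm(n)    # only the first two predictors matter

    fit <- cv.glmnet(x, y, alpha = 1)        # alpha = 1 gives the L1 (lasso) penalty
    coef(fit, s = "lambda.min")              # sparse coefficients: selected variables

The L2-constrained SDR problem has no comparable off-the-shelf R solver that we are aware of, which is why MATLAB access is requested.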

Software

D. At this point, the main software we will use is R and MATLAB. However, we hope to have access to Fortran and C++ compilers as well, since we may need to run other researchers' algorithms written in these two languages.

Students and Staff

E. The Lab has:
Director: Xiangrong Yin at UKY;
Student: Qingcong Yuan at UKY;
Graduate Student: Jin Xie at UKY.

Center for Computational Sciences