Xiao, Yang

Research Activities

We will be using the HPC resources to conduct research on designing new distributed computing paradigms, including distributed/decentralized machine learning, decentralized applications, and fault-tolerant consensus mechanisms. Our main research goal is to explore whether such distributed systems can be deployed efficiently in practice.

More specific research areas can be found on PI Xiao's research website: https://yang-sec.github.io/

 

Projects

PI Xiao will use the HPC resources for the following projects:

  1. Byzantine Resilient Federated Learning in Sporadically Connected Wireless Networks

  • Status: grant pending from NSF

  • Synopsis: This project aims to build Byzantine resiliency into Federated Learning (FL) systems, particularly for resource-constrained and sporadically connected networks. The research objectives include (1) a systems approach to derive a principled method for assessing the trustworthiness of information from FL nodes, (2) a data-centric approach to develop robust and efficient learning algorithms that keep adversarial influence to a minimum, ensuring robust FL output even in the face of Byzantine inputs, and (3) tackling the most challenging scenario, in which a group of FL nodes must operate in a self-organized fashion.

  • How HPC will help: using parallel GPU-enabled computing instances to simulate decentralized FL nodes and their networking profiles

  • Students involved: Yue Li (yue.li@uky.edu), GRA @ UK CS; Ifteher Alom (ifteheralom@uky.edu), GRA @ UK CS
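To illustrate the kind of Byzantine-robust aggregation this project studies (this is a hypothetical sketch, not project code), one classic alternative to plain FedAvg is coordinate-wise median, which bounds the influence of any minority of Byzantine nodes; the node updates below are made-up numbers:

```python
# Minimal sketch of Byzantine-robust aggregation via coordinate-wise median.
# Not project code; update values are hypothetical.
from statistics import median

def robust_aggregate(updates):
    """Aggregate per-node model updates by taking the median of each
    coordinate, limiting the pull of any minority of Byzantine nodes."""
    return [median(coord) for coord in zip(*updates)]

# Three honest nodes report similar gradients; one Byzantine node sends junk.
honest = [[0.9, -0.2], [1.0, -0.1], [1.1, -0.3]]
byzantine = [[100.0, -100.0]]
aggregate = robust_aggregate(honest + byzantine)
# The aggregate stays close to the honest updates despite the outlier.
```

A plain average of the same four updates would be dragged far off by the Byzantine node, which is why median-style rules are a common baseline for robust FL.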

  2. CAREER: Foundations of Operational Resilience and Secure Communication for Networked Real-Time Systems

  • Status: grant pending from NSF

  • Synopsis: This project aims to develop the operational resilience and communication security foundations to enable the dependable and safe deployment of distributed, networked real-time systems. The primary objectives include: (1) operational resilience and correctness amid component failures and adaptive adversaries on the distributed networks, (2) secure communication against network attacks at the onboard and inter-system levels, (3) real-time assurance of task execution and communication, and (4) scalability and composability to support wide-area deployments.

  • How HPC will help: using high-performance compute instances to run Reinforcement Learning agents for consensus optimization.

  • Students involved: TBD
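As a toy illustration of RL-driven consensus optimization (a hypothetical sketch, not project code), the bandit-style loop below tunes a consensus timeout under an assumed synthetic reward model: timeouts shorter than the message delay trigger a view change, while longer timeouts pay a latency penalty. The timeout candidates and reward shape are invented for illustration:

```python
# Hypothetical sketch: epsilon-greedy RL agent tuning a consensus timeout.
# Candidate timeouts and the reward model are illustrative assumptions.
import random

TIMEOUTS = [50, 100, 200, 400]  # candidate consensus timeouts (ms)

def reward(timeout_ms, rng):
    delay = rng.gauss(90, 20)          # simulated message delay (ms)
    if timeout_ms < delay:
        return -1.0                    # premature timeout -> costly view change
    return 1.0 - timeout_ms / 1000.0   # success, minus a latency penalty

def train(episodes=2000, eps=0.1, seed=0):
    rng = random.Random(seed)
    value = {t: 0.0 for t in TIMEOUTS}  # running mean reward per timeout
    count = {t: 0 for t in TIMEOUTS}
    for _ in range(episodes):
        # Epsilon-greedy: mostly exploit the best-looking timeout, sometimes explore.
        if rng.random() < eps:
            t = rng.choice(TIMEOUTS)
        else:
            t = max(value, key=value.get)
        count[t] += 1
        value[t] += (reward(t, rng) - value[t]) / count[t]  # incremental mean
    return max(value, key=value.get)
```

The agent settles on the shortest timeout that almost never fires prematurely, which is the trade-off a consensus-parameter optimizer must learn; the project's agents would face far richer state and action spaces, hence the need for HPC.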

Computational Methods

Parallel (5 to 50) GPU-enabled computing instances for distributed machine learning applications.
We do not need high-end GPUs such as those used for generative AI; regular consumer-grade ones, such as the NVIDIA GeForce RTX 40 series or equivalent, will suffice.
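The parallel-instance pattern above can be sketched in miniature (hypothetical code, not a project artifact): each worker stands in for one FL node running a local update, and a FedAvg-style average combines the results. The node count and the stand-in "gradient" are illustrative assumptions:

```python
# Hypothetical sketch: simulating parallel FL nodes with worker threads,
# the pattern that would scale out to 5-50 GPU-enabled instances on HPC.
from concurrent.futures import ThreadPoolExecutor

def federated_round(global_weight, n_nodes=5):
    def local_update(node_id):
        # Stand-in for local training: each node nudges the shared weight by
        # a node-specific "gradient" (derived from its id, for illustration).
        gradient = 0.1 * (node_id + 1)
        return global_weight - gradient

    # Run all simulated nodes in parallel, one worker per node.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        updates = list(pool.map(local_update, range(n_nodes)))

    # FedAvg-style aggregation of the per-node results.
    return sum(updates) / len(updates)
```

On the actual cluster, each worker would map to a GPU-backed process doing real training, but the coordinate-update-aggregate round structure is the same.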

 

Software

PyTorch; TensorFlow

 

Group Members

Yue Li, GRA, Computer Science Department @ UK
Ifteher Alom, GRA, Computer Science Department @ UK

Center for Computational Sciences