Smith, Evelyn

Research Activities

The field of machine learning and AI is witnessing a rapid evolution, particularly in integrating human-in-the-loop (HITL) systems, addressing data poisoning threats, enhancing cybersecurity, and advancing unsupervised learning techniques. Our integrated approach aims to create robust, secure, and highly adaptable AI systems, crucial for applications in diverse sectors such as defense, healthcare, finance, and autonomous systems.

Our research focuses on highlighting issues of human vulnerabilities in training ML/AI systems by testing supervised learning, unsupervised learning, and Human-in-the-loop (HITL) systems against various tasks.

Human-in-the-loop (HITL) systems are pivotal in enhancing the performance and reliability of machine learning models. By incorporating human expertise and feedback into the learning process, HITL systems can refine model predictions and improve decision-making accuracy, especially in complex and high-stakes environments. The interaction between human experts and AI systems necessitates the development of intuitive interfaces and adaptive algorithms that can effectively learn from and respond to human input. This symbiotic relationship enhances the transparency and interpretability of AI models, ensuring that human oversight remains a key component of critical decision-making processes.

However, the efficacy of HITL systems and other machine learning models is increasingly challenged by data poisoning attacks. These attacks involve the deliberate introduction of malicious data into the training sets, which can significantly degrade model performance or lead to incorrect predictions. Research in this area focuses on identifying and mitigating these vulnerabilities through robust detection mechanisms and resilient algorithm design. Techniques such as anomaly detection and adversarial training are essential in safeguarding the integrity of the training data and, consequently, the model's reliability.

Cybersecurity plays a crucial role in protecting machine learning systems from various threats, including data breaches, adversarial attacks, and model inversion. Ensuring the security of AI systems is paramount, particularly as they are integrated into sensitive applications. Research efforts are directed towards developing secure architectures, employing cryptographic techniques, and establishing frameworks for continuous monitoring and threat assessment. The emphasis on data privacy, especially in federated learning where data is decentralized, is also a critical aspect of this research. Protecting the confidentiality and integrity of data ensures the trustworthiness of AI systems in handling sensitive information.

Unsupervised learning, which involves training models on unlabeled data to discover underlying patterns, is a significant area of research within this integrated approach. This technique is invaluable when labeled data is scarce or expensive to obtain. Advanced algorithms such as autoencoders, generative adversarial networks (GANs), and self-organizing maps are explored to perform tasks like clustering, dimensionality reduction, and anomaly detection. The challenge lies in evaluating the performance of unsupervised models, as traditional metrics from supervised learning do not directly apply. Innovations in unsupervised learning have the potential to unlock new insights in fields ranging from bioinformatics to natural language processing, where the ability to derive meaningful information from large datasets is essential.

The convergence of HITL systems, data poisoning prevention, cybersecurity, and unsupervised learning represents a comprehensive strategy for advancing machine learning. By addressing the inherent vulnerabilities and enhancing the robustness of AI models, researchers aim to develop systems that are secure, adaptable, and capable of high performance across various applications. This holistic approach is poised to drive significant advancements in machine learning, making AI systems more reliable and effective in tackling real-world challenges.

 

List of Projects

We are working on three projects at this time.

The first project is a testing of machine learning models against industry standard public datasets and datasets provided by partner universities ([https://omnisoc.iu.edu/about/index.html|https://omnisoc.iu.edu/about/index.html]) for their performance at different level of human involvement. This will include attributes of data poisoning, data distribution changes and machine learning architecture changes. The performance outcomes of these AI models will be judged against not only their mathematical preference differentials but also for their cybersecurity, data privacy and risk management attributes. We will be applying to conferences and publications to release our results. We will open source our results and software once we are accepted for a conference or publication under an Apache 2.0 open-source license.

The second project builds off the first project by injecting synthetic datasets into our analysis. These synthetic datasets will be informed by game-theoretic models based on capitalist market assumption and cybersecurity dynamics in an agent-based modeling framework. Utilizing these synthetic datasets, we hope to prove various game theoretic proofs that generate high quality business strategy recommendations for business leaders to use when adopting AI and machine learning technologies into government regulated markets where cybersecurity, data privacy and risks management is of high importance. Industry markets out theory will inform such industries as defense, healthcare, finance, and autonomous systems. We will open source our code results and software once we are accepted for a conference or publication under an Apache 2.0 open-source license.

The third project is about generative multimodal foundational AI models (LLMs) and red-teaming them for various cybersecurity, ethical, and social issues. Hate-speech detection, issues of AI bias and fairness, data privacy and cybersecurity issues are common associated problems with generative AI models. We plan to test foundational AI technologies and create performance metrics based on the issues found in these AI models. The metrics can then be used and generalized for business purposes for specific industry tasks to recommend business strategies to firms looking to adopt generative AI. We will open source our code results and software once we are accepted for a conference or publication under an Apache 2.0 open-source license.

 

Computational Methods

The computational methods that will be used are all open source and currently available freely. All computational methods are compatible with UK resources. Examples of open source software libraries we currently or in the future anticipate to use include Pandas, TensorFlow, PyTorch, and many more.

 

List of Software

The list of software we will be using is constantly change based on the open source ecosystem. As a basic overview we will be using Python 3.6 as a programing language, which may call other programming languages (C++, C, Java, etc.), the Jupyter project, the Anaconda project, and many software libraries. All software we will be using is open source and will not results in a licensing cost.

Group Members

Evelyn Smith, PhD (Assistant Professor, Department of Marketing and Supply Chain, Gatton College of Business and Economics, University of Kentucky)

Antino Kim, PhD (Associate Professor, Department of Operations & Decision Technologies, Kelley School of Business, Indiana University) - Currently working with the NSF funded OmniSoc Project at University of Indiana where we will be receiving some of our research data from.

Samuel Zaruba S., PhD(c) (Lecturer, Department of Accountancy and the Business Analytics Center, Gatton College of Business and Economics, University of Kentucky)

Sean Robinson, PhD (US Gov Pacific Northwest National Laboratory) - Government Advisor to the Project

Aaron McCloud - Senior Software Engineer

 

 

Center for Computational Sciences