Description Length Analysis for Supervised Learning and Graph Models
Date: Thu, November 30, 2023
Time: 10:30am - 12:00pm
Location: Holmes Hall 389
Speaker: Mojtaba Abolfazli, Ph.D. candidate (advisor: Dr. June Zhang)
Abstract
Model selection and anomaly detection remain central problems in many scientific domains, especially as data continue to grow in volume and variety. In today's data-driven era, machine learning is a versatile tool for applications such as medical diagnosis, self-driving cars, and image and text generation. Model selection is key to the success of machine learning: the chosen model must not only perform well on training data but also generalize well to test data. For example, a better image segmentation model in a self-driving car translates to a safer driving experience.
We adopt an information-theoretic approach to address some of the issues with existing methods for model selection and anomaly detection. Our approach relies on description length, measured as the number of bits needed to describe the data exactly. A well-known example of using description length for data analysis is model selection via the minimum description length (MDL) principle. MDL losslessly encodes the data together with the model that describes it, and selects the model that yields the shortest total description length.
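A minimal sketch of the MDL principle described above, assuming a toy setting that is not taken from the talk: the data are a binary sequence, and there are two hypothetical candidate models. The "fair_coin" model has no parameters and spends exactly 1 bit per symbol. The "bernoulli" model uses a two-part code: it first transmits the number of ones k (one of n + 1 values), then the index of the sequence among the C(n, k) sequences with that many ones. Both codes are lossless. The function names are illustrative only, and code lengths are idealized (fractional bits, no rounding up to whole bits).

```python
import math
from typing import Callable, Dict, Sequence, Tuple

def fair_coin_bits(x: Sequence[int]) -> float:
    """Hypothetical model with no parameters: each symbol costs exactly 1 bit."""
    return float(len(x))

def bernoulli_two_part_bits(x: Sequence[int]) -> float:
    """Hypothetical two-part code: model part (count of ones) plus data part (index)."""
    n = len(x)
    k = sum(x)
    model_bits = math.log2(n + 1)           # k is one of n + 1 possible values
    data_bits = math.log2(math.comb(n, k))  # index among C(n, k) sequences with k ones
    return model_bits + data_bits

MODELS: Dict[str, Callable[[Sequence[int]], float]] = {
    "fair_coin": fair_coin_bits,
    "bernoulli": bernoulli_two_part_bits,
}

def select_model_mdl(
    x: Sequence[int], models: Dict[str, Callable[[Sequence[int]], float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the model with the shortest total description length, plus all lengths."""
    lengths = {name: code_len(x) for name, code_len in models.items()}
    best = min(lengths, key=lengths.get)
    return best, lengths

if __name__ == "__main__":
    skewed = [1] * 18 + [0] * 2
    balanced = [0, 1] * 10
    print(select_model_mdl(skewed, MODELS))
    print(select_model_mdl(balanced, MODELS))
```

For the skewed sequence (18 ones out of 20), the two-part code needs about log2(21) + log2(190) ≈ 11.96 bits versus 20 bits for the fair coin, so MDL prefers the Bernoulli model. For the balanced sequence, the extra cost of describing k outweighs any savings (about 21.89 bits versus 20), so the simpler model wins. This is the trade-off between model complexity and fit that MDL captures.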
The aim of this research is to extend MDL to new types of data and problems. This entails developing new lossless coding methods for graphs as well as combining machine learning with coding. The specific problems we examine are: 1) model selection in supervised learning, 2) Gaussian graphical model selection, 3) group anomaly detection, and 4) structure learning in Bayesian networks.
Biography
Mojtaba Abolfazli is a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Hawaii at Manoa. His research interests lie in machine learning and at the intersection of statistical learning and information theory.
Available online; register for connection information at https://forms.gle/yeGtuLSFYqgbEJg86