University of Hawaii

Electrical Engineering

Latent Variable Identification using Identifiable Matrix Factorization Methods

Date: 2018-03-22
Time: 10:00am
Location: Holmes Hall 389
Speaker: Dr. Kejun Huang


Latent variable identification is a unifying problem formulation for unsupervised machine learning and big data analytics. Applications include topic modeling, community detection, and hyperspectral unmixing, to name just a few. Identifiability arises as a fundamental issue, since it amounts to answering whether the latent structure can truly be learned without the help of labeled data. Among the many approaches that have identifiability guarantees, this talk focuses on nonnegative matrix factorization (NMF)-type methods. NMF is widely and successfully used in many applications, but the theoretical understanding of why it is able to identify latent variables used to be very limited. The take-home point of this talk is that a latent variable can be uniquely identified if it is sufficiently scattered, an assumption inspired by convex geometry, using either the plain NMF model or NMF with an additional "volume" regularization. This principle is demonstrated in the application of hidden Markov model (HMM) identification: an HMM can be uniquely identified from pairwise co-occurrences, an approach that is particularly suitable when the number of possible outcomes of the observations is relatively large, as in topic modeling. We show that we can learn higher-quality topics if documents are modeled as observations of HMMs sharing the same emission (topic) probability, compared to the simple but widely used bag-of-words model.


Kejun Huang received the Ph.D. degree in Electrical Engineering from the University of Minnesota, Minneapolis, MN, USA in 2016. He is currently a Postdoctoral Associate at the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA. His research interests include signal processing, machine learning, big data analytics, and optimization, with special focus on identifiability analysis and non-convex algorithm design for latent variable models.