Slow mixing Markov processes
Historically, the field of statistics developed around multiple independent observations of phenomena. Yet progress in modern applications from biology to social networks increasingly requires deeper understanding of observations that are dependent. Dependent observations introduce biases unseen in independent sampling. If not interpreted properly, they often lead to misleading conclusions. This research develops a statistical framework to interpret samples generated by dependent observations. The problem is then turned around to develop algorithms that detect meaningful dependencies in data. Such algorithms are used, for example, to identify genes working together, or find the extent to which social networks are polarized on certain issues.
Stationary properties of Markov processes may not be very well reflected by finite samples, no matter how large the sample size. This bias is often formalized as mixing, and is unseen in independent sampling. This research analyzes Markov samples even before the samples reflect the asymptotic stationary properties, and develops a framework for inference, analysis and simulation of slow mixing Markov processes. Then, algorithmic primitives based on coupling from the past are developed for the widely studied task of community detection in a graph, where vertices are partitioned into clusters so that intra-cluster vertices are connected tighter than those in disparate clusters. As several resampling procedures in machine learning such as cross validation and bootstrap are premised on independent sampling and have no good analogs in the Markov setting, this research develops new resampling approaches for potentially slow mixing Markov processes.
Statistical Framework of Finitely Many Errors
Our work tries to reshape the way we think about some statistical problems. At a high level, consider the nature of scientific discovery. We keep refining theories as we see more data. The natural question therefore is---will the refinements ever end? Will it happen that at some point, we can make up our minds and say with confidence that our inference is good enough and no more data will change it substantially? There are a number of nuances to this broad perspective, and our recent publications listed below summarize some of our latest publications in this line of work. A more complete list of publications is available via my CV here, and for a full list of publications on this topic, please email me. C. Wu and N. Santhanam. Prediction with finitely many errors almost surely. In Proceedings ofThe 24th International Conference on Artificial Intelligence and Statistics, 2021. C. Wu and N. Santhanam. Entropy property testing with finitely many errors’. In Proceedingsof IEEE Symposium on Information Theory, Virtual conference due to covid-19, 2020.
We study the problem of online learning a hypothesis class and a given binary 0-1 loss function, using instances generated independently and identically according to a given distribution. The goal of the online learner is to make only finitely many errors (loss 1) with probability 1 in the infinite horizon. In the binary label case, we show that hypothesis classes are online learnable in the above sense if and only if the class is effectively countable. We extend the results to hypothesis classes where labels can be non-binary. Characterization of non-binary online learnable classes is more involved for general loss functions and is not captured fully by the countability condition even for the ternary label case. In the computational bounded setup, we compare our results with well known results in recursive function learning, showing that the class of all total computable functions is indeed learnable with computable online learners and randomized sampling. Finally, we also show that the finite error guarantee will not be affected even when independent noise is added to the label.Our latest publication on this topic is below, for a more complete listing of results and publications, please email me. C. Wu and N. Santhanam. Non-uniform consistency of online learning with random sampling. InProceedings of the 32nd International Conference on Algorithmic Learning Theory, 2021.