Data Discovery and Anomaly Detection Using Atypicality
Date: Tue, September 05, 2017
Time: 3:00pm
Location: Holmes Hall 389
Speaker: Elyas Sabeti, PhD Candidate
Abstract:
One characteristic of the modern era is the exponential growth of information and its ready availability through networks, including the Internet: "Big Data." The question is what to do with this enormous amount of information. One possibility is to characterize it through statistics, such as averages. Our approach takes the opposite perspective: most of the value in the information lies in the parts that deviate from the average, the parts that are unusual or atypical. The same may be true for venture development and scientific research.
We define atypicality as follows: a sequence is atypical if it can be described (coded) with fewer bits in itself than with the (optimal) code for typical sequences. Using this definition, we introduce an atypicality measure and analyze its properties for both binary and real-valued models. Finally, we apply our atypicality framework to various sources of Big Data, namely heart-rate Holter monitoring, DNA, 15 years of stock market data, and 2 years of oceanographic data, to detect arrhythmias, viral and bacterial infections, unusual stock market behavior, and whale vocalizations, respectively.
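To make the definition concrete, here is a minimal sketch for the binary case, under stated assumptions that are not taken from the talk. It assumes the typical model is an i.i.d. Bernoulli(p0) source, so the optimal typical code length is -log2 P(x | p0). As the "code in itself" it substitutes the Krichevsky-Trofimov (KT) universal code for Bernoulli sources. The measure is the difference of the two lengths, ignoring any overhead for signaling which code was used. The function names `typical_code_length`, `kt_code_length` and `atypicality` are hypothetical.

```python
import math

def typical_code_length(bits, p0):
    """Bits needed to encode `bits` with the optimal code for the ASSUMED
    typical model, i.i.d. Bernoulli(p0): -log2 P(x | p0)."""
    n = len(bits)
    k = sum(bits)  # number of ones
    return -(k * math.log2(p0) + (n - k) * math.log2(1.0 - p0))

def kt_code_length(bits):
    """Bits needed to encode `bits` 'in itself' with the Krichevsky-Trofimov
    universal code (an assumed stand-in for the atypical coder):
    -log2 P_KT(x), where P_KT(x) = G(k+1/2) G(n-k+1/2) / (pi G(n+1))."""
    n = len(bits)
    k = sum(bits)
    log_p = (math.lgamma(k + 0.5) + math.lgamma(n - k + 0.5)
             - math.log(math.pi) - math.lgamma(n + 1))
    return -log_p / math.log(2)  # convert nats to bits

def atypicality(bits, p0):
    """Positive when the sequence is cheaper to describe in itself than with
    the typical code, i.e. atypical in the sense defined above."""
    return typical_code_length(bits, p0) - kt_code_length(bits)

if __name__ == "__main__":
    print(atypicality([1] * 100, 0.5))    # positive: a long run of ones
    print(atypicality([0, 1] * 50, 0.5))  # negative: half ones, as expected
```

In this sketch the KT code depends only on the count of ones. It therefore flags sequences whose symbol frequencies differ from p0, but not structure such as the periodicity of `[0, 1] * 50`. Catching that kind of pattern would require a richer "in itself" code, for example a higher-order context model.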