University of Hawaii

Electrical Engineering

Analytic Pattern Matching: From DNA to Twitter

Date: 2016-02-25           Add to Google Calendar
Time: 2:30pm 4:00pm ROOM CHANGE!! 243
Location: Holmes Hall 243 -- NOTE ROOM CHANGE!
Speaker: Wojciech Szpankowski, Saul Rosen Distinguished Professor of Computer Science and Electrical and Computer Engineering, Purdue University

Repeated patterns and related phenomena in words are known to play a central role in many facets of computer science, telecommunications, coding, data compression, data mining, and molecular biology. One of the most fundamental questions arising in such studies is the frequency of pattern occurrences in a given string known as the text. Applications of these results include gene finding in biology, executing and analyzing tree-like protocols for multiaccess systems, discovering repeated strings in Lempel--Ziv schemes and other data compression algorithms, evaluating string complexity and its randomness, synchronization codes, user searching in wireless communications, and detecting the signatures of an attacker in intrusion detection. In this talk after a brief motivation, we review several pattern matching problems (e.g., exact string matching, constrained pattern matching, generalized pattern matching, and subsequence pattern matching), and then we discuss a few applications (e.g., spike trains of neuronal data, Google search, Lempel-Ziv'77 and Lempel-Ziv'78 data compression schemes, and string complexity used in Twitter classification). Finally, we illustrate our approach to solve these problems using tools of analytic combinatorics, which we discuss in some depth.

Wojciech Szpankowski is Saul Rosen Distinguished Professor of Computer Science and Electrical and Computer Engineering at Purdue University. He received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from Gdansk University of Technology. He held several Visiting Professor/Scholar positions, including Stanford, Hewlett-Packard Labs, INRIA, Ecole Polytechnique, the Newton Institute, Cambridge, UK, and ETH, Zurich. He is a Fellow of IEEE, the Erskine Fellow, and 2010 recipient of the Humboldt Research Award. His research interests cover analysis of algorithms, information theory, bioinformatics, analytic combinatorics, and stability problems of distributed systems. He published the book "Average Case Analysis of Algorithms on Sequences", John Wiley & Sons, 2001. Szpankowski has been a guest editor and an editor of several technical journals, including ACM Transaction on Algorithms, Algorithmica, IEEE Transactions on Information Theory, and Combinatorics, Probability, and Computing. In 2008 he launched the interdisciplinary Institute for Science of Information whose mission is to extend classical information theory to modern settings, including knowledge discovery and information extraction from massive datasets. He is the Director of the NSF Science and Technology Center for Science of Information.