
Theses and Dissertations

On the Compression of Unknown Sources


Date:  Tue, September 11, 2018
Time:  1:00pm
Location:  Holmes Hall 389
Speaker:  Maryam Hosseini, PhD Candidate

Abstract:

Data is usually viewed as the outcome of a probabilistic source, and to gain insight from data we need to know more about its underlying distribution. Although it is very helpful to have a precise characterization of the source, most of the time such information is not available. However, we usually know that the underlying distribution is not completely arbitrary and belongs to a general class of models, such as the class of i.i.d. or Markov distributions. While universal compression of finite-support distributions has been well studied, we look into more involved classes, in particular distributions over countable supports, as well as the relation between compression and estimation in Markov setups without mixing assumptions.

In the first part of the dissertation, we investigate the "compressibility" of a class of distributions. The exact identity of each distribution in the class is unknown, so we aim to find a universal encoder that compresses all distributions in the class. But since the universal encoder does not exactly match the underlying distribution, the average number of bits it uses is higher, and the excess bits used over the entropy are the redundancy (a standard formalization is sketched after the abstract). We study the redundancy of universal encodings of strings generated by independent and identically distributed (i.i.d.) sampling from a set of distributions over a countable support. We show that universal compression of length-n i.i.d. sequences is characterized by how well the tails of the distributions in the collection can be universally described, and we formalize the latter as the tail-redundancy of the collection. We also consider the redundancy of universally compressing strings generated by a binary Markov source without any bound on the memory, and we prove upper and lower bounds on the redundancy that asymptotically match in order.

Beyond the abstract analysis of a collection of unknown distributions, we design and implement an algorithm to compress data obtained from an unknown source. Compression can be lossless or lossy. For lossless compression, Lempel and Ziv proposed an implementable universal algorithm and proved that it achieves the theoretical bound asymptotically (an illustrative parsing sketch follows the abstract). However, many applications can tolerate a certain amount of distortion, which may allow for additional compression. We adapt Codelet parsing, a lossy Lempel-Ziv-type algorithm. It sequentially parses a sequence into phrases, which we call sourcelets, and maps each of them to a codelet in the dictionary. We develop the concept of a "strong match" and use the Cycle Lemma to show that the strong-match method does not remove the optimal leaves from the tree. We study different properties of this dictionary and monitor how its leaves evolve.
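For readers unfamiliar with the term, a common formalization of redundancy is the following (stated here as a hedged sketch; the dissertation may use a different variant). For a lossless code with length function L on length-n strings and a source p with block entropy H_n(p), the per-source redundancy and the worst-case (minimax) redundancy of a collection \mathcal{P} are

    R_n(L, p) = \mathbb{E}_p[ L(X^n) ] - H_n(p)
    R_n(\mathcal{P}) = \inf_L \sup_{p \in \mathcal{P}} R_n(L, p)

so a collection is "compressible" in this sense when R_n(\mathcal{P}) grows sublinearly in n, i.e., the extra cost per symbol vanishes.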
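As background for the lossless case, here is a minimal Python sketch of LZ78-style incremental parsing, the mechanism behind the Lempel-Ziv algorithm mentioned above. Names and structure are illustrative only, not the dissertation's implementation.

    def lz78_parse(s):
        """Parse string s into LZ78 phrases: each phrase is the longest
        previously seen phrase extended by one new symbol."""
        dictionary = {"": 0}   # phrase -> index; the empty phrase has index 0
        phrases = []           # list of (prefix_index, new_symbol) pairs
        current = ""
        for symbol in s:
            candidate = current + symbol
            if candidate in dictionary:
                current = candidate          # keep extending the match
            else:
                dictionary[candidate] = len(dictionary)
                phrases.append((dictionary[current], symbol))
                current = ""
        if current:                          # flush a trailing partial phrase
            phrases.append((dictionary[current], ""))
        return phrases

    # Example: "ababcbababaa" parses into the phrases a, b, ab, c, ba, bab, aa
    print(lz78_parse("ababcbababaa"))

The phrase count for a length-n string governs the code length, which is how the asymptotic optimality mentioned above is proved.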
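The abstract does not spell out Codelet parsing itself, so the following is only a generic sketch of the lossy-parsing idea it builds on: greedily map each source phrase (sourcelet) to a dictionary phrase (codelet) within a distortion budget. The binary alphabet, Hamming distortion, the fixed dictionary, and all names here are assumptions made for illustration.

    def hamming(a, b):
        """Fraction of positions where two equal-length strings differ."""
        return sum(x != y for x, y in zip(a, b)) / len(a)

    def lossy_parse(source, dictionary, max_distortion):
        """Greedy lossy parsing sketch: repeatedly emit the longest codelet
        that matches the next sourcelet within the per-symbol budget."""
        pos, codelets = 0, []
        while pos < len(source):
            best = None
            for codelet in dictionary:       # scan all codelets; keep the longest admissible one
                end = pos + len(codelet)
                if end <= len(source) and hamming(source[pos:end], codelet) <= max_distortion:
                    if best is None or len(codelet) > len(best):
                        best = codelet
            if best is None:                 # fall back to a single literal symbol
                best = source[pos]
            codelets.append(best)
            pos += len(best)
        return codelets

    # Illustrative dictionary and a 10% per-phrase Hamming budget (assumed values).
    print(lossy_parse("0100110110", ["0101", "110", "0", "1"], 0.10))

In a full Lempel-Ziv-style scheme the dictionary (a tree whose leaves are the codelets) would grow as parsing proceeds; a fixed dictionary is used here only to keep the sketch short.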

