Repository logo
 

Codes for Synchronization in Channels and Sources with Edits


Type

Thesis

Change log

Authors

Abroshan, Mahed 

Abstract

Edit channels are a class of communication channels where the output of the channel is an edited version of the input. The edits are considered to be deletions and insertions. DNA-based data storage system is one of the motivations for this model. This thesis studies various problems related to edit channel and also edit synchronization problem. Varshamov-Tenengolts (VT) codes are first introduced. These codes can correct a single deletion or insertion and have a linear-time decoder. The problem of efficient encoding of the non-binary version of VT codes is addressed, where a simple linear-time encoding method to systematically map binary message sequences onto VT codewords is proposed. Another model that is studied is segmented edit channels, where we have the additional assumption that the channel input sequence is implicitly divided into segments such that at most one edit can occur within a segment. A code construction is proposed for this model based on subsets of VT codes chosen with pre-determined prefxes and/or sufxes. Also an upper bound is derived on the rate of any zero-error code for the segmented edit channel in terms of the segment length. This upper bound shows that the rate scaling of the proposed codes as the segment length increases is the same as that of the maximal code. Edit synchronization is another problem studied in this thesis. In this model, there are two remote nodes (encoder and decoder), each having a binary sequence. The sequence X, available at the encoder, is the updated sequence and differs from Y (available at the decoder) by a small number of edits. The goal is to construct a message M, to be sent via a one-way error-free link, such that the decoder can reconstruct X using M and Y. A coding scheme is devised for this one-way synchronization model. The scheme is based on multiple layers of VT codes combined with off-the-shelf linear error-correcting codes and uses a list decoder. Motivated by the sequence reconstruction problem from traces in DNA-based storage, the problem of designing codes for the deletion channel when multiple observations (or traces) are available to the decoder is considered. A simple binary and non-binary code is proposed that splits the codeword into blocks and employs a VT code in each block. The availability of multiple traces helps the decoder to identify deletion-free copies of a block, and to avoid mis-synchronization while decoding. The encoding complexity of the proposed scheme is linear in the codeword length; the decoding complexity is linear in the codeword length and quadratic in the number of deletions and the number of traces. The list decoding technique for the proposed code is also considered.

Description

Date

2019-05-09

Advisors

Guillen i Fabregas, Albert

Keywords

Edit channels, Coding for synchronization, VT codes

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge