Computational Methods for the Measurement of Protein-DNA Interactions


Type

Thesis

Authors

James, Daniel Peter 

Abstract

It is of interest to know where in the genome DNA binding proteins act in order to effect their gene regulatory function.

For many sequence specific DNA binding proteins we plan to predict the
location of their action by having a model of their affinity to short DNA
sequences. Existing and new models of protein sequence specificity are
investigated and their ability to predict genomic locations is evaluated.

Public data from a micro-fluidic experiment is used to fit a matrix model of
binding specificity for a single transcription factor. Physical association
and dissociation constants from the experiment enable a biophysical
interpretation of the data to be made in this case. The matrix model is
shown to provide a better fit to the experimental data than a model
initially published with the data.

Public data from 172 protein binding micro-array experiments is used to fit
a new type of model to 82 unique proteins. Each experiment provides
measurements of the binding specificity of an individual protein to
approximately 40000 DNA probes. Statistical 'DNA word' models are assessed
for their ability to predict held-out data and perform very well in many
cases.

Where available, ChIP-seq data from the ENCODE project is used to assess the
ability of a selection of the DNA word models to predict ChIP-seq peaks and
how they compare to matrix models in doing so. This $\textit{in vivo}$ data
is the closest proxy to the true sites of the proteins' regulatory action
that we have.

Description

Date

2017-02-01

Advisors

Hubbard, Tim
Down, Thomas

Keywords

computational biology, protein binding microarray, binding, prediction

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge