Computational Methods for the Measurement of Protein-DNA Interactions
Authors
James, Daniel Peter
Abstract
It is of interest to know where in the genome DNA-binding proteins act in order to effect their gene regulatory function.
For many sequence-specific DNA-binding proteins we plan to predict the
location of their action by having a model of their affinity to short DNA
sequences. Existing and new models of protein sequence specificity are
investigated and their ability to predict genomic locations is evaluated.
Public data from a microfluidic experiment is used to fit a matrix model of
binding specificity for a single transcription factor. Physical association
and dissociation constants from the experiment enable a biophysical
interpretation of the data to be made in this case. The matrix model is
shown to provide a better fit to the experimental data than a model
initially published with the data.
Public data from 172 protein-binding microarray experiments is used to fit
a new type of model to 82 unique proteins. Each experiment provides
measurements of the binding specificity of an individual protein to
approximately 40,000 DNA probes. Statistical `DNA word' models are assessed
for their ability to predict held-out data and perform very well in many
cases.
Where available, ChIP-seq data from the ENCODE project is used to assess the
ability of a selection of the DNA word models to predict ChIP-seq peaks and
how they compare to matrix models in doing so. This $\textit{in vivo}$ data
is the closest proxy to the true sites of the proteins' regulatory action
that we have.
Date
2017-02-01
Advisors
Hubbard, Tim
Down, Thomas
Keywords
computational biology, protein binding microarray, binding, prediction
Qualification
Doctor of Philosophy (PhD)
Awarding Institution
University of Cambridge