A statistical model relating transcription factor concentrations to positional information in the early Drosophila embryo
The idea that morphogen gradients encode positional information for a developing organism has long been discussed in developmental biology, but only recently have quantitative models been proposed that relate measured transcription factor concentrations to enhancer activity. However, successful models are typically computationally expensive, which limits full exploration and interpretation of the data. This thesis addresses these problems by applying standard statistical techniques to a comprehensive data set, with the even-skipped (eve) locus as a test case.
The first part of the thesis introduces the data set: the precellular Virtual Embryo from the Berkeley Drosophila Transcription Network project. It comprises expression measurements of almost 100 genes in more than 6,000 individual nuclei at six time points. Different modelling approaches are evaluated in the context of this data set, which motivates the choice of logistic regression and the methods used to prepare the data set for further analysis.
The second part applies logistic regression to describe the response of the eve enhancers to known regulating transcription factors such as Hunchback. Predictions of behaviour under regulator perturbation are consistent with experimental results, and the functional form is shown not to be arbitrarily flexible, in terms of both the regulators and the regions of the embryo included.
The third part uses this framework to find minimal explanatory models by statistical model selection. The best-scoring models are found to depend on well-known regulators. The model selection techniques are then extended, using previous biological observations to direct the search, to analyse the eve 2 and eve 3+7 enhancers. The results are consistent with published research, but suggest specific additional hypotheses for the regulation of these enhancers.
The thesis concludes by proposing a general model of positional information and discussing the biological implications of the results. Overall, the results show how transcriptional control can be allocated to discrete enhancers, and that characterising their activity in relatively simple terms is sufficient to explain their precise, spatially defined response to transcription factor concentrations.
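As a rough illustration of the kind of model the abstract describes, the sketch below fits a logistic regression that relates the on/off state of an enhancer in each nucleus to the concentrations of two regulators, and then predicts the response when one regulator is removed. It is not the thesis's implementation: the data are synthetic rather than taken from the Virtual Embryo, and the names `activator`, `repressor`, `true_beta` and `fit_logistic` are hypothetical, chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-nucleus measurements (NOT the Virtual Embryo data).
n_nuclei = 2000
activator = rng.uniform(0.0, 1.0, n_nuclei)  # hypothetical activator concentration
repressor = rng.uniform(0.0, 1.0, n_nuclei)  # hypothetical repressor concentration

# Assumed "true" response used only to generate labels: intercept, activator, repressor.
true_beta = np.array([-1.0, 5.0, -5.0])
X = np.column_stack([np.ones(n_nuclei), activator, repressor])
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
enhancer_on = (rng.uniform(size=n_nuclei) < p_true).astype(float)


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression coefficients by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


beta = fit_logistic(X, enhancer_on)

# Predicted enhancer activity in the unperturbed case and with the repressor removed.
p_wild_type = 1.0 / (1.0 + np.exp(-X @ beta))
X_no_repressor = X.copy()
X_no_repressor[:, 2] = 0.0
p_no_repressor = 1.0 / (1.0 + np.exp(-X_no_repressor @ beta))

print("fitted coefficients:", beta)
print("mean predicted activity, unperturbed:", p_wild_type.mean())
print("mean predicted activity, repressor removed:", p_no_repressor.mean())
```

The fitted coefficients give a direct reading of each regulator's role (positive for an activator, negative for a repressor), which is one reason a simple functional form of this kind is easy to interpret and cheap to refit for many candidate regulator sets.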