Laboratory for Development & Evolution, University Museum of Zoology, Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, UK
Wellcome Trust Sanger Institute, Hinxton, UK
EMBL/CRG Systems Biology Research Unit, Centre de Regulació Genòmica (CRG), Universitat Pompeu Fabra, Dr Aiguader 88, 08003 Barcelona, Spain
Abstract
Background
The use of reverse engineering methods to infer gene regulatory networks by fitting mathematical models to gene expression data is becoming increasingly popular and successful. However, increasing model complexity means that more powerful global optimisation techniques are required for model fitting. The parallel Lam Simulated Annealing (pLSA) algorithm has been used in such approaches, but recent research has shown that island Evolutionary Strategies can produce faster, more reliable results. However, no parallel island Evolutionary Strategy (piES) has yet been demonstrated to be effective for this task.
Results
Here, we present synchronous and asynchronous versions of the piES algorithm, and apply them to a real reverse engineering problem: inferring parameters of the gap gene network. We find that the asynchronous piES exhibits very little communication overhead and shows significant speedup for up to 50 nodes: the piES running on 50 nodes is nearly 10 times faster than the best serial algorithm. We compare the asynchronous piES to pLSA on the same test problem, measuring the time required to reach particular levels of residual error, and show that it converges much faster than pLSA under all optimisation conditions tested.
Conclusions
Our results demonstrate that the piES is consistently faster and more reliable than the pLSA algorithm on this problem, and scales better with increasing numbers of nodes. In addition, the piES is especially well suited to further improvements and adaptations. First, the algorithm's fast initial descent speed and high reliability make it a good candidate for use as part of a global/local hybrid search algorithm. Second, it has the potential to be used as part of a hierarchical evolutionary algorithm, which takes advantage of modern multicore computing architectures.
Background
The driving aim of systems biology is to understand complex regulatory systems. A powerful tool for this is reverse engineering, a top-down approach in which we use data to infer parameter values for a model of an entire system. This differs from the traditional bottom-up approach of building up the larger picture from individually measured simple interactions. Many methods have been developed for reverse engineering gene regulatory networks, most of which are based on expression data from gene expression microarrays. However, most of these approaches do not consider temporal or spatial aspects of gene expression. Examples of this are methods that infer regulatory modules from expression data across different experimental conditions
There are many systems for which the spatial aspects of gene expression are essential. Even in single-celled organisms, spatial localisation of regulatory factors is important
Here, we consider a computational technique that allows the inference of explicitly spatiotemporal developmental gene networks: the gene circuit method
One of the main problems with this approach is that both the number of equations and the number of parameters of the model, and thus the time required to run the optimisation that fits the model to the data, increase rapidly as more genes are considered. It is thus extremely important to design efficient global optimisation algorithms to keep up with the ever-increasing scope of systems biology; efficiency in this case means both directly increasing the speed of the algorithms and allowing them to run efficiently in parallel. The most commonly used algorithm for fitting gene circuit models has been parallel Lam Simulated Annealing (pLSA)
Methods
The Problem: Drosophila Gap Gene Circuits
One of the few developmental systems to which the reverse engineering approach has been applied so far is segment determination in the early Drosophila embryo.
Mathematical Model
We model the gap gene network using the connectionist gene circuit formalism developed by Mjolsness and co-workers
The three main terms of the equation correspond to protein production, decay and diffusion. We will discuss each of these terms separately.
The production term is equal to some fraction of the maximum production rate; this fraction is given by a sigmoid regulation-expression function, which takes on values between zero and one.
The diffusion term
The decay term
The half-life of each protein is given by ln 2 divided by its decay rate.
The model takes cell division into account. The lengths of interphase and mitosis follow a well-determined schedule
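For orientation, the three terms described above combine into an ODE of the standard gene circuit form. The sketch below uses our own generic notation; the paper's exact symbols and regulation-expression function may differ:

```latex
\frac{\mathrm{d}g_i^a}{\mathrm{d}t} =
    \underbrace{R_a\,\Phi\!\Big(\textstyle\sum_b W^{ab} g_i^b + h_a\Big)}_{\text{production}}
  + \underbrace{D_a\big[(g_{i-1}^a - g_i^a) + (g_{i+1}^a - g_i^a)\big]}_{\text{diffusion}}
  - \underbrace{\lambda_a\, g_i^a}_{\text{decay}}
```

Here $g_i^a$ is the concentration of the product of gene $a$ in nucleus $i$, $R_a$ the maximum production rate, $\Phi$ a sigmoid function with values between zero and one, $W^{ab}$ the regulatory effect of gene $b$ on gene $a$, $h_a$ a threshold, $D_a$ the diffusion rate, and $\lambda_a$ the decay rate (so the half-life is $\ln 2 / \lambda_a$).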
Quantitative Spatial Gene Expression Data
Quantitative expression data for segmentation genes in the early
Model Fitting by Optimisation
In our reverse-engineering approach (the gene circuit method), we wish to find estimates for the parameter values that best explain the data. We can frame this as an optimisation problem in which we attempt to find the set of parameter values
where the sum is over all time classes
is the vector of parameters to be estimated, with a length of
As mentioned above, optimisation for complex problems such as these is nontrivial. The system of equations is nonlinear with a large number of parameters to be estimated, and the fitness landscape is multimodal. A full search is impossible, and a local search (moving downhill until it finds the lowest value of the objective function) is likely to get stuck in a local minimum. Thus, we must use a global optimisation algorithm; we shall compare the parallel Lam Simulated Annealing (pLSA) algorithm with our newly developed parallel Evolution Strategy (both synchronous and asynchronous versions).
To get a value for
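In generic notation (the symbols here are ours, not necessarily the paper's), such a least-squares objective can be written as:

```latex
E(q) = \sum_{t} \sum_{i} \sum_{a}
  \big( g_i^a(t; q) - \tilde{g}_i^a(t) \big)^2
```

where $q$ is the parameter vector, $g_i^a(t;q)$ the model output for gene $a$ in nucleus $i$ at time class $t$, $\tilde{g}_i^a(t)$ the corresponding data value, and the optimiser seeks $\hat{q} = \arg\min_q E(q)$.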
Optimisation Algorithms
Search Space Constraints
We do not need or want to search the entire, unbounded parameter space; there are certain values that we know
For the gene network problem, we use a penalty function for the regulatory parameters
where Λ is a control parameter, and
The production, decay and diffusion rates,
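As an illustration of how such a penalty can work (the paper's exact functional form is not reproduced here; the function name and the quadratic form are our assumptions), an out-of-bounds penalty with control parameter Λ might look like:

```python
def penalty(params, bounds, lam=1.0):
    """Quadratic penalty for parameters outside their allowed range.
    lam plays the role of the control parameter Lambda.
    Generic sketch -- the paper's exact functional form may differ."""
    total = 0.0
    for p, (lo, hi) in zip(params, bounds):
        if p < lo:
            total += lam * (lo - p) ** 2   # below lower bound
        elif p > hi:
            total += lam * (p - hi) ** 2   # above upper bound
    return total
```

A penalty of this shape is zero inside the search region and grows smoothly outside it, which steers the optimiser back without introducing hard discontinuities.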
Serial Island Evolution Strategy
The evolutionary algorithm we are using is a parallel Island (
The island ES algorithm operates on
We denote the set of all possible individuals as
Selection is performed to produce a set of
Recombination is then performed on the offspring, using a recombination operator
Second,
where
The
for
Next, we mutate the parameters
for
Finally, we apply exponential smoothing to the step sizes, to reduce fluctuations
for
Every
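The selection, recombination, mutation and step-size smoothing steps described above can be sketched, for a single island, as follows. This is a minimal sketch: the recombination scheme, the learning rate `tau` and the smoothing constant `alpha` are our assumptions, not the paper's exact operators or settings.

```python
import math
import random

def evolve_island(pop, objective, n_offspring, tau=0.5, alpha=0.2):
    """One generation of a simple (mu, lambda)-ES on one island.
    Each individual is a (params, step_sizes) pair."""
    offspring = []
    for _ in range(n_offspring):
        # Recombination: intermediate recombination of two random parents.
        (p1, s1), (p2, s2) = random.sample(pop, 2)
        child = [(a + b) / 2 for a, b in zip(p1, p2)]
        steps = [(a + b) / 2 for a, b in zip(s1, s2)]
        # Mutate step sizes log-normally, then parameters with Gaussian noise.
        steps = [s * math.exp(tau * random.gauss(0, 1)) for s in steps]
        child = [x + s * random.gauss(0, 1) for x, s in zip(child, steps)]
        # Exponential smoothing of step sizes to damp fluctuations.
        steps = [(1 - alpha) * old + alpha * new for old, new in zip(s1, steps)]
        offspring.append((child, steps))
    # (mu, lambda) selection: keep the best mu offspring only.
    offspring.sort(key=lambda ind: objective(ind[0]))
    return offspring[:len(pop)]
```

In a full island ES, several such populations evolve independently and exchange their best individuals every few generations via migration.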
Parallel Island Evolution Strategy
Parallelisation of the serial island ES (iES) relies on running each population on a separate processor. Since selection, recombination and mutation operate strictly within populations, only the migration operation, the checking of termination criteria and the recording of information for log files need to be parallelised. The simplest parallelisation of the serial iES is a synchronous parallel island ES (piES). The algorithm is synchronous in the sense that all communication occurs simultaneously across all processes; when migration or other exchange of information is required, each processor halts until all other processes have caught up, and then all information is exchanged. The synchronous algorithm does not modify the behaviour of the serial algorithm, and is deterministic in the sense that serial and parallel runs with the same set of random seeds will produce exactly the same solution.
Migration occurs according to the following scheme: a node designated the master node generates a migration schedule, in which every population is assigned another population to which it migrates an individual, and this schedule is broadcast to all nodes. The individual nodes then communicate with each other point-to-point, with each node sending the parameter values of its highest-ranking individual to its designated receiver, and replacing its lowest-ranking individual with the best individual of the population for which it is the designated receiver.
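One way the master node's schedule could be generated is as a random derangement, so that every island sends to exactly one other island and no island sends to itself. This is a sketch consistent with the description above; the actual scheme used (ring topology, random assignment, etc.) is not specified here, and the function name is ours.

```python
import random

def migration_schedule(n_islands, rng=random):
    """Assign every island another island to send its best individual to.
    Returns targets, where targets[i] is the island receiving from island i.
    Assumes n_islands >= 2; rejection-samples until no island maps to itself."""
    while True:
        targets = list(range(n_islands))
        rng.shuffle(targets)
        if all(i != t for i, t in enumerate(targets)):
            return targets
```

Because the result is a permutation, every island is also the designated receiver for exactly one other island, matching the replace-worst-with-incoming-best step.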
The collection of data related to descent speed and the checking of termination criteria are performed together. Every
The disadvantage of the synchronous algorithm is that processors spend a significant amount of time idle. The asynchronous piES algorithm avoids this by having the processors communicate asynchronously; for migration and other communication, each processor sends information to a memory buffer associated with the process it is communicating with, which can then receive it at a later time (whenever it is ready to receive), avoiding waiting times.
For migration, every
Parallel Lam Simulated Annealing
We use the parallel Lam Simulated Annealing (pLSA) algorithm developed by Chu
To compensate for processes which leave the quasi-equilibrium regime (due to the increased rate of temperature decrease compared to the serial case), a mixing of states is performed every
In order to avoid the final solution being affected by the initial conditions, the algorithm performs an 'initial burn', in which each processor spends
There are two types of potential stopping conditions that could be used for the algorithm: the absolute condition, and the freeze condition. In the absolute condition, the algorithm terminates after the absolute mean value of
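At the core of any simulated annealing variant, including pLSA, is the Metropolis acceptance rule: improving moves are always accepted, and worsening moves are accepted with a temperature-dependent probability. The sketch below shows only this rule; the Lam schedule's adaptive temperature control, mixing and statistics collection are not reproduced, and the function name is ours.

```python
import math
import random

def metropolis_accept(delta, temperature, rng=random):
    """Metropolis criterion: accept a move that changes the objective
    value by delta at the given annealing temperature."""
    if delta <= 0:
        return True  # downhill moves are always accepted
    # Uphill moves are accepted with probability exp(-delta / T).
    return rng.random() < math.exp(-delta / temperature)
```

At high temperature almost every move is accepted (little selection pressure); as the temperature falls, only near-downhill moves survive, which is why early descent under an annealing schedule is slow compared to an ES with full selection from the start.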
Algorithm Performance Metrics
All algorithm implementations write the current value of the objective function and running times to log files at regular intervals (every 20 generations for the piESs, and every 10000 iterations for pLSA); these log files are used to calculate mean descent curves with standard errors and 95% ranges for each such interval. These curves can be used to compare the value of the objective function for different algorithms at any time during an optimisation run, and give an estimate of the variability in the algorithm's performance. We choose two target values of the objective function
In order to assess how effective our parallelisation of the Evolution Strategy was, we calculate relative and absolute speedup. The relative speedup is defined as
and represents a measure of the efficiency of the parallelisation in terms of communication overhead (if the relative speedup is equal to the number of processors, then the algorithm does not lose speed due to communication overhead). The absolute speedup is defined as
where
We estimated 95% confidence intervals on absolute and relative speedup using Fieller's theorem
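The speedup measures and their confidence intervals can be sketched as follows. This uses the zero-covariance form of Fieller's theorem for the ratio of two independent mean run times, with t = 1.96 approximating the 95% level; the function names are ours.

```python
import math

def speedup(t_serial, t_parallel):
    """Speedup as the ratio of serial to parallel run time.
    'Relative' speedup uses the serial version of the same algorithm;
    'absolute' speedup uses the best serial algorithm."""
    return t_serial / t_parallel

def fieller_ci(a, se_a, b, se_b, t=1.96):
    """Confidence interval for the ratio a/b of two independent means
    with standard errors se_a, se_b (Fieller's theorem, zero covariance)."""
    disc = b**2 * se_a**2 + a**2 * se_b**2 - t**2 * se_a**2 * se_b**2
    denom = b**2 - t**2 * se_b**2
    lo = (a * b - t * math.sqrt(disc)) / denom
    hi = (a * b + t * math.sqrt(disc)) / denom
    return lo, hi
```

When both standard errors are zero the interval collapses to the point estimate a/b, which is a useful sanity check on the formula.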
Note that in the Simulated Annealing literature, the value of the objective function is often called the 'energy', and in the Evolutionary Algorithms literature the same value is often referred to as the 'fitness'. To avoid confusion, we use 'value of the objective function' or 'objective value' instead.
Code Implementation
The parallel (
For both algorithms, we used a parameter scrambling procedure to give the problem different starting conditions; the pLSA algorithm reads in initial parameter values, which were randomised prior to starting each instance of the program, while the piES algorithm generates its starting conditions according to a random seed, which is itself randomised for each optimisation run.
Both implementations were compiled using the Intel C++ Compiler (ICC), and both implementations make use of the QLogic implementation of the Message Passing Interface (MPI). Data analysis was performed using the statistical programming language R
Source code is available from the authors upon request.
Optimisation runs were performed on the Darwin parallel cluster of the University of Cambridge High Performance Computing Facility (HPC;
Results
Analysis of the Serial Island Evolution Strategy
The performance of the serial island ES algorithm is affected by the number of islands it uses
We performed 48 optimisation runs each using the serial algorithm with 1, 2, 4, 8, 20 and 50 islands. The number of individuals on every island was kept constant (125), resulting in a metapopulation size of 125 ×
The amount of time required to reach both 'good-enough' and 'good' solutions is shown in Table
Run Times for Serial iES.

N. Islands   Time (Good-enough)   Solutions × 10^6   Time (Good)     Solutions × 10^6
 1           3:35 (±0:54)          2.4 (±0.7)         9:51 (±1:05)    6.8 (±1.7)
 2           3:49 (±0:42)          2.6 (±0.6)        10:27 (±1:01)    6.6 (±2.1)
 4           4:33 (±1:03)          3.1 (±0.7)        13:24 (±1:00)    9.0 (±2.4)
 8           7:30 (±2:08)          5.3 (±1.8)        28:02 (±0:47)   19.8 (±1.6)
20          10:09 (±1:52)          7.2 (±1.2)        31:17 (±0:58)   23.6 (±0.4)
50          16:03 (±1:39)         12.0 (±1.2)        -               -

The time taken to reach 'good-enough' and 'good' solutions for the serial island ES algorithm with different numbers of islands. The number of ODE solutions required is also given. Times are given in hours and minutes (H:M), and the values in parentheses are 95% confidence intervals.
Behaviour of the serial island ES
Behaviour of the serial island ES. (A) The effect of the number of islands on serial algorithm performance. We plot the inverse of the time needed to reach 'good-enough' and 'good' solutions (values of the objective function less than 550000 (blue) and 350000 (red), respectively) against the number of islands used. (B) Prediction of the maximum achievable absolute speedup of a piES algorithm across different numbers of islands, calculated by assuming that each island is running on a separate processor, and that there is no communication overhead. To achieve this we divide the time needed for an
We calculated the theoretical speed of a perfect parallel algorithm on
Parallelisation Efficiency
To estimate the efficiency of our parallel algorithm, we performed 50 runs each using both synchronous and asynchronous implementations of the piES on 10, 20 and 50 processors. Running times and speedup values are given in Table
Run Times for piES Algorithms.

Algorithm               Time (Good-enough)   Time (Good)    Relative Speedup    Absolute Speedup
Serial iES, 1 island    3:35 (±0:54)         9:51 (±1:05)   -                   -
Sync piES, 10 nodes     0:56 (±0:07)         3:55 (±0:16)    8.7 (7.5-9.8)      3.8 (2.9-4.8)
Sync piES, 20 nodes     0:41 (±0:03)         4:09 (±0:14)   14.9 (12.3-17.4)    5.2 (4.1-6.4)
Sync piES, 50 nodes     0:33 (±0:03)         3:40 (±0:16)   29.2 (25.8-32.6)    6.5 (5.0-8.0)
Async piES, 10 nodes    0:47 (±0:06)         3:34 (±0:11)   10.3 (9.0-11.7)     4.6 (3.5-5.6)
Async piES, 20 nodes    0:40 (±0:05)         3:44 (±0:13)   15.2 (12.4-18.1)    5.4 (4.1-6.7)
Async piES, 50 nodes    0:25 (±0:02)         3:23 (±0:12)   38.5 (34.2-42.8)    8.6 (6.7-10.5)

The time taken to reach 'good-enough' and 'good' solutions for the optimal serial island ES and the two parallel piES algorithms, along with the relative and absolute speedup for each parallel algorithm (the comparison is between 'good-enough' solutions). Times are given in hours and minutes (H:M), and the values in parentheses are 95% confidence intervals.
Speedup Curves for piES Algorithms
Speedup Curves for piES Algorithms. The relative and absolute speedup curves for the synchronous and asynchronous piES algorithms are shown; the solid black line corresponds to perfect speedup, and the broken black line corresponds to the predicted maximum absolute speedup from Figure 1B. Error bars are 95% confidence intervals on the mean.
The absolute speedup (as defined in equation 12) remains significant regardless of the number of nodes used, showing that the parallel algorithm is always faster than the best serial algorithm. As expected, the absolute speedup is generally lower than the relative speedup: these two measures increasingly diverge as the number of processors increases, reflecting the negative effect of adding islands beyond the optimum. However, the asynchronous algorithm continues to gain speed as more nodes are added, all the way up to 50 nodes, with the parallel algorithm running on 50 processors being nearly 10 times faster in absolute terms than the best serial algorithm.
Comparison of Algorithms
To compare all three algorithms (synchronous and asynchronous piES versus pLSA), we ran 50 pLSA runs each on 10, 20 and 50 processors. Example descent curves for the asynchronous piES and pLSA are shown in Figure
Descent Curves for piES and pLSA Algorithms
Descent Curves for piES and pLSA Algorithms. (A) Mean descent curves for the asynchronous piES and pLSA. (B) Mean descent curves for the asynchronous and synchronous piES. Error bars show 95% confidence intervals on the mean. Coloured regions show the area in which 95% of runs fall. All curves are for 10node runs. Results for 20 and 50node runs were similar (data not shown).
A comparison of the times required to reach the 'good-enough' target shows an approximately linear increase in speed with the number of processors for the asynchronous piES (Figure
Number of ODE Solutions for Parallel Algorithms.

Algorithm               Solutions × 10^6 (Good-enough)   Solutions × 10^6 (Good)
Sync piES, 10 nodes      8.4 (±1.1)                       34.5 (±3.7)
Sync piES, 20 nodes     12.2 (±1.0)                       69.8 (±5.0)
Sync piES, 50 nodes     23.0 (±2.2)                      149.7 (±12.1)
Async piES, 10 nodes     8.6 (±1.1)                       37.6 (±3.4)
Async piES, 20 nodes    14.6 (±1.8)                       73.7 (±5.9)
Async piES, 50 nodes    22.4 (±1.9)                      168.0 (±10.1)
pLSA, 10 nodes          21.4 (±4.0)                       26.1 (±5.3)
pLSA, 20 nodes          20.1 (±1.7)                       28.8 (±5.4)
pLSA, 50 nodes          31.5 (±15.9)                      43.3 (±20.7)

The mean number of ODE solutions performed before reaching 'good-enough' and 'good' solutions for the synchronous and asynchronous piES algorithms, and for pLSA.
Comparison of Algorithms for 'Good-enough' Solutions
Comparison of Algorithms for 'Good-enough' Solutions. A comparison of (A) the speed and (B) the robustness or reliability of the three algorithms (asynchronous and synchronous piES, and pLSA) for achieving a 'good-enough' solution (an objective function value of 550000 or less). The speed is the inverse of the time, in hours, taken to achieve the target value, and the robustness or reliability is the proportion of runs that reached the target objective value. Error bars represent 95% confidence intervals on the mean.
The behaviour of the algorithms changes significantly when they are required to reach the 'good' target (Figure
Comparison of Algorithms for 'Good' Solutions
Comparison of Algorithms for 'Good' Solutions. A comparison of (A) the speed and (B) the robustness or reliability of the three algorithms (asynchronous and synchronous piES, and pLSA) for achieving a 'good' solution (an objective function value of 350000 or less). Axes and error bars as in Figure 4.
Discussion
Parallel Efficiency of the Evolutionary Strategy
The results from the investigation of the serial algorithm show an interesting outcome: increasing the number of islands increases the descent speed per generation, even when the number of islands is very large. In the serial case, this increased search capacity is outweighed by the increased computational load of the larger metapopulation once the number of islands rises beyond the optimum of one. On the other hand, it suggests that the algorithm has the capacity to be parallelised across a large number of processors.
The rate at which the serial algorithm speed decreases with the number of islands places a limit on the efficiency of the parallel algorithm (as shown by the predicted limits in Figure
Both synchronous and asynchronous parallel implementations of the island ES scale well with the number of nodes (Figure
Simulated Annealing vs Evolutionary Strategy
The most striking feature of the descent curves of pLSA and the piES algorithm (Figure
First, the piES mean descent curve begins at a much lower objective function value than that of pLSA. This is caused by differences in the initialisation procedure of the two algorithms. The pLSA algorithm begins with a single starting solution for all processors, which undergoes a high temperature 'burn period' to erase dependence on the initial condition. This is followed by a period of statistics collection (in parallel) at constant, high temperature in order to initialise estimator values for the temperature schedule. This means that each of the
Second, the initial speed of descent is much higher for the piES. This is probably due to a particular difference in the early operation of the two algorithms. During the early stages of descent, the annealing temperature is high for pLSA, and thus there is little selection for better solutions; once the temperature is lowered the selection for better solutions increases, but simultaneously the solution is getting closer to the minimum, and the slowing associated with the decreased move size counteracts the lower temperature. The piES algorithm, in contrast, begins a full selection schedule straight away, allowing descent at maximum speed from the very start of the algorithm. Note that the reason the piES can afford to start fast, but pLSA cannot, is that the multi-individual nature of evolutionary algorithms allows a diversity of individuals (and thus lower objective value solutions) to remain despite a decrease in mean objective value; if pLSA were to descend at this rate, it would lose quasi-equilibrium and fail to converge, becoming stuck in a local minimum.
Third, the piES algorithm converges to a lower mean objective value across all runs than pLSA. The reason for this appears to be driven by the unreliable nature of pLSA compared to the more robust performance of the piES, as shown in Figures
The unreliable nature of pLSA has been commented on before
Further Methodological Improvements
The complexity of models in systems biology is constantly increasing, and thus the speed required of optimisation is always growing. The
Comparisons between gap gene circuits with 4 or 6 genes indicate that even a very moderate increase in model complexity can lead to a significant decrease in the reliability of the pLSA algorithm (J. Jaeger; unpublished results). Our observations indicate that the lack of robustness of the pLSA algorithm is due to the fact that most pLSA runs fail. On the other hand, those runs that do converge do so very rapidly. Therefore, the efficiency of the pLSA algorithm could be improved significantly if we found a reliable method to separate failing from promising runs early during optimisation. Such an approach has been suggested previously
While the piES is both faster and more reliable than pLSA, Figure
One method for increasing the speed of both pLSA and piES algorithms comes from the observation that local searches tend to converge very rapidly and reliably to the global minimum, given initial conditions which are sufficiently close to the global solution
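As an illustration of the local-refinement stage of such a global/local hybrid (the specific local method shown here is our choice, not the paper's), a simple coordinate-wise pattern search that polishes a candidate solution could look like:

```python
def local_search(x, objective, step=0.1, shrink=0.5, tol=1e-6):
    """Coordinate-wise pattern search: try moving each parameter up and
    down by 'step'; keep improvements, and shrink the step when a full
    pass yields no improvement. Sketch only."""
    best = objective(x)
    x = list(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                trial = list(x)
                trial[i] += d
                val = objective(trial)
                if val < best:
                    x, best = trial, val
                    improved = True
        if not improved:
            step *= shrink  # refine the search around the current point
    return x, best
```

Given a starting point already close to the global minimum (e.g. the output of a short piES run), such a local method converges rapidly, which is the motivation for the hybrid approach.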
An alternative approach for increasing parallel efficiency, specific to evolutionary algorithms such as the piES, is the hierarchical approach
Conclusions
Progress in systems biology crucially depends on efficient and innovative computational methods. In the case of the gap gene network, it was an innovative approach, the gene circuit method
It was not our intention here to achieve a systematic benchmark comparison of different optimisation strategies. This has been achieved elsewhere
Authors' contributions
LJ implemented both versions of the piES algorithm, performed optimisation runs, algorithm comparison and the statistical analysis of optimisation results. JJ proposed the research and supervised the work. LJ and JJ wrote the manuscript.
Acknowledgements
JJ was supported by the UK Biotechnology and Biological Sciences Research Council (grant number BB/D00513) and the MEC-EMBL agreement for the EMBL/CRG Research Unit in Systems Biology. LJ wishes to thank the Cambridge Computational Biology Institute for funding and support. This work was performed using the Darwin Supercomputer of the University of Cambridge High Performance Computing Service