The diverse effects of phenotypic dominance on hybrid fitness

Abstract When divergent populations interbreed, their alleles are brought together in hybrids. In the initial F1 cross, most divergent loci are heterozygous. Therefore, F1 fitness can be influenced by dominance effects that could not have been selected to function well together. We present a systematic study of these F1 dominance effects by introducing variable phenotypic dominance into Fisher's geometric model. We show that dominance often reduces hybrid fitness, which can generate optimal outbreeding followed by a steady decline in F1 fitness, as is often observed. We also show that “lucky” beneficial effects sometimes arise by chance, which might be important when hybrids can access novel environments. We then show that dominance can lead to violations of Haldane's Rule (reduced fitness of the heterogametic F1) but strengthens Darwin's Corollary (F1 fitness differences between cross directions). Taken together, results show that the effects of dominance on hybrid fitness can be surprisingly difficult to isolate, because they often resemble the effects of uniparental inheritance or expression. Nevertheless, we identify a pattern of environment‐dependent heterosis that only dominance can explain, and for which there is some suggestive evidence. Our results also show how existing data set upper bounds on the size of dominance effects. These bounds could explain why additive models often provide good predictions for later‐generation recombinant hybrids, even when dominance qualitatively changes outcomes for the F1.

: Size-dependent distribution of dominance coefficients   Figure S5. The heterozygous effect of a mutation on a given trait is a mut (1 + δ mut ). Here, a mut is the mutation's additive effect, which is normally distributed with vanishing mean and standard deviation V amut (eq. 60). The dominance coefficient, δ mut , is drawn from a beta distribution, whose parameters vary with the size of of the additive effect according to eqs. 61-62. The aim is to reflect empirical observations that lethal or highly deleterious mutations are often recessive, while small-effect mutations are more likely to be semi-dominant (Orr, 1991;Manna et al., 2011).  Figure S2: Transgressive F1 trait variation can increase or decrease with genetic divergence. Curves show the probability that the F1 trait value lies outside of the range of values in the two parental lines (eq. 48). The variance in the dominance coefficients, V ∆ increases linearly with genomic divergence, d (e.g. eqs. 6-7). If the divergence in the parental phenotypes also increases with d (e.g. due to directional selection) then the probability of transgressive variation decreases with d (grey curve). If the parental phenotypes remain roughly constant, e.g., due to effective stabilizing selection, then the probability of transgression increases (black curve).
Both regimes have been observed in data (Stelkens and Seehausen, 2009  Results also hold in simulations where each mutation had a single value of δ mut applying to all n traits; such that heterozygous effects differed from the homozygous effects in their length but not in their direction. This shows that the effects of dominance reported here do not depend on the variation in direction.     56 is shown for various values of n, namely (dark-to-light) n = 2, 5, 10, 100 and 1000. The expectation (evaluated numerically) is close to the value of the crude approximation 2/n (eq. 20), unless the quantity Υ 2 /V a V δ deviates greatly from unity. This quantity compares the deviations due to dominance, and the deviations due to lack of autosomal co-adaptation with the absent paternal X chromosome (see Figure 5a), and depends mainly on the variance in the dominance effects and the proportion of uniparentally inherited loci (eq. 58).

Appendix 1: Derivations
In this Appendix, we derive the analytical results in the main text. The relevant main text equations are listed next to each section. We present results under a quadratic fitness function, but in case of higher fitness landscape curvature, our results apply to the squared distance to the optimum instead. For some datasets, the curvature can be estimated (eg. if F1 fitness has been measured across a range of parental divergence, assuming selection was predominantly stabilizing) and this can be used to transform fitness measures to obtain squared distances.

Ffitness with biparental inheritance
Many results in the main text are given in terms of the total dominance deviations on each trait, that is, the ∆ i as defined in eq. 4. The strongest assumptions that we use are the normality and isotropy of the ∆ i (i.e. that all ∆ i have equal variances). However, neither assumption is required for all results, and the assumption of isotropy is made largely for simplicity.

Expected log fitness (eqs. 3-11)
To obtain the expected log fitness of the midparent given in eq. 3, we use the definition of cosine similarity that captures the similarity between two vectors, which in our case are the displacements of the two parental lines from the optimum.
With the geometric mean of two numbers defined as the square root of their product, we can replace √ ln w P1 × ln w P2 with −ln w P (g) to obtain eq 3. Note that because the substitution effects are defined with respect to the P1 genotype, the log fitness of the midparent can also be expressed in terms of the substitution effects as ln To calculate the expected log fitness of the F1 we need to consider the heterozygous effects of the substitutions with dominance where we have used the definition of ∆ i provided in eq. 4. After rotating the axes such that the midparent falls short of the optimum only on trait 1 such that z MP,i = 0 for all i ̸ = 1, the cross product simplifies to −2 |lnw MP |∆ 1 . Then we can separate trait 1 from the others to obtain eq.

5.
So far, results are exact. However, we will usually not know the ∆ i and therefore we would like to know what its expected value and variance are under different scenarios of parental divergence.
In other words, we want to find the expected log fitness of the F1, where the expectation is across realisations of the divergence history of the parental lines, i.e. over the complete sets of a ij and δ ij that give rise to different ∆ i . We begin by expanding the brackets in eq. 5: Then we take expectations over the possible values of the dominance deviations, to account for their relative independence of the form of selection acting on the parental lines.
Equations 9-10 follow directly, if we focus only on trait 1.
and so eq. 6 follows directly if we use the notation: For eq. 7, we use equivalent notation to refer to the effects of the individual substitutions, which under stabilizing selection, we can treat as increments in a random walk, with variance where a i δ i is a randomly chosen dominance effect on trait i, and the expectations are over realizations of the evolutionary process. We can also define an equivalent quantity for the additive effects.
where a i is a randomly chosen additive effect on trait i. V δ cannot generally be defined in the same way, except in the special case of isotropy, and no covariance between the squares of the additive and dominance effects. In this case V δ = Var (δ i ) such that it has a more intuitive definition, equal to the variability across realisations of divergence in the dominance coefficients of fixed substitutions. We can see this if we write V aδ in terms of moments of additive effects and dominance coefficients, where similar to the above we use δ i to refer to the dominance coefficient on trait i of a randomly chosen substitution.
With exchangeable parental lines, and recalling that the effects of a substitution are defined with respect to P1, irrespective of whether they are ancestral or derived, it follows that E (a i ) = Without directional dominance Cov (a i , δ i ) = 0, and whenever larger substitutions have no tendency to have more extreme dominance coefficients, then Cov (a 2 i , δ 2 i ) = 0, and we get Under isotropy, the distribution of substitutions is the same across traits, so in this case we obtain the more intuitive interpretation V δ = Var (δ i ) and also V a = Var (a i ), where both are a constant.

Directional selection (eqs. 9-11)
Under directional selection on trait 1, we expect additive effects in the direction of the optimum to be more dominant. Therefore, for these adaptive substitutions (denoted by an asterisk * ), Cov (a * 1 , δ * 1 ) will not vanish. However, we can obtain results in a simplified case in which the initial bout of adaptation took place in both parental lineages, taking them both from an old optimum to a new optimum, and where these optima differed only on trait 1. We will assume that the adaptation took place in exactly d dir substitutions, and that these substitutions affected only trait 1. We will further assume that their additive effects and dominance coefficients covary in sign, but not in size, such that Cov (|a * 1 |, |δ * 1 |) = 0. In other words, large adaptive substitutions are no more likely to be dominant or recessive than small adaptive substitutions, but substitutions pointing from the MRCA towards the optimum (adaptive substitutions that are derived in P2) have δ * 1 , a * 1 > 0 whereas substitutions pointing towards the MRCA and away from the optimum (adaptive substitutions derived in P1) have δ * 1 , a * 1 < 0. Taken together, the chain of adaptive substitutions runs from the new optimum to the old optimum and then back, and so its total length is 2d dir |a * 1 |, and we have d dir |a * 1 | = |ln w MRCA |, where w MRCA is the fitness of ancestral genotype (adapted to the old optimum), in the environment characterized by the new optimum. With these assumptions, we can now derive the moments of ∆ 1 after both the adaptation, and a further period of d − d dir substitutions, accrued under stabilizing selection: where Here, we used the random-walk assumptions, described above, for the substitutions accrued after the adaptation. Equation 35 also uses the stronger assumption that covariance between the dominance effects of the adaptive substitutions can be ignored. After making these assumptions, eq. 11 in the main text follows directly.
We can also give a more complete result that applies during the ongoing adaptation to the new optimum. During the adaptation phase i.e. before the parental lines have reached the optimum, to denote the proportion of the distance to the new optimum the parental lines have already travelled. Considering only the trait under directional selection, we find For the overshoot of the F1, we find These expressions are plotted in Figure S4 for substitutions with dominance coefficients of identical size |δ|.

Variance in F1 log fitness (eq. 12)
To calculate eq. 12 we also require the variance in log F1 fitness. Its behaviour is easiest to see if we assume normality and independence of the ∆ i . In this case, we find: If the midparent remains well adapted, but the dominance deviations are large, then the second term will come to dominate. Because V ∆ grows with the divergence, this will typically hold at large divergences, when d ≫ 1, and so Using the same limit in eq. 6 yields eq. 12.
1.3 F1 fitness in preferred parental environment (eqs. 14, 15) When the parental lines are adapted to two different optima, then the F1 will tend to be fitter in one of the two parental environments (i.e. its "preferred" environment). In that environment, we can write its log fitness using a small modification of eq. 24: If we assume that the parental lines are each optimal in one of the habitats, then distance of midparent from either optimum is half of the distance between the optima, such that ln w MP = ln w P /4, where ln w P is the log fitness of the maladapted parental line in any given habitat. Next, we assume that ∆ 1 is normal, with a vanishing mean, so that E(|∆ 1 |) = Var(∆ 1 )2/π. Equation 14 then follows directly if we assume equal variances for the dominance deviations on all traits, so To derive equation 15, we differentiate eq. 14 by V ∆ , and set to zero. This yields the value of V ∆ that maximises the expected log fitness. Substituting that value into eq. 14 yields eq. 15.
As with eq. 39, we can also derive the variance in lnw F1 pref using the assumptions of normality, independence and vanishing means for the ∆ i : At large divergences, the first term will come to dominate, supporting the assertion in the main text that eq. 12 still holds for lnw F1 pref .

Probability of lucky heterosis (eq. 13)
Now let us consider the F1 between phenotypically identical parental lines, that are both maladapted. To calculate the probability of their F1 being heterotic (Fig 4b), we first note that, with isotropy, such that all V ∆ i = V ∆ , then, from eq. 5, the quantity | ln w F1 |/V ∆ is the sum of n squared normal variables, one with a non-zero mean of | ln w MP |/V ∆ . As such, this quantity will follow a non-central chi-squared distribution, with n degrees of freedom, and non-centrality parameter given by | ln w MP |/V ∆ = | ln w P |/V ∆ . Here, we have used the fact that the midparent will be identical to the two (identical) parental phenotypes. We can write the cumulative distribution function of this variable as where we have used the Marcum Q function (Marcum, 1950). The F1 will be heterotic if it is closer to the optimum than the midparent and so This is the exact result plotted in Figure 4b. This result can also be written in terms of modified Bessel functions of the first kind, as follows: (Helstrom, 1995). Approximating these functions as I j (u) = e u √ 2πu (see, e.g. Guo et al., 2001), yields eq. 13 of the main text. Figure S6 shows similar results when the parental lines are equally maladapted, but no longer phenotypically identical. To derive these results, we simply change the non-centrality parameter (i.e., the first argument of the Q function in eq. 44), which represents the position of the midparent.
When the parents are maladapted in opposite directions, then the midparent matches the optimum, such that this argument is 0 (blue points and lines in Fig. S6). When the parents are maladapted in orthogonal directions (e.g., on different phenotypic traits), then this argument is 4| ln w P |/V ∆ (red points and lines in Fig. S6).

Transgressive segregation in the F1
The procedure used in the previous section can also be used to calculate the probability of transgressive variation in a single trait: i.e., an F1 trait value that lies outside of the range of values among the parental lines (Stelkens and Seehausen, 2009). If there is no directional dominance, then the mean F1 trait value will be the midparental value, and its variance will be equal to the variance of the dominance deviations.
where we have ordered the parents by their trait value. Now, assuming normality, the proba-bility of the F1 trait value lying outside of the parental range is Because V ∆ grows linearly with d, the probability of transgressive variation will increase with d if the divergence between the parental phenotypes also increases with d (at least as fast as √ d).
If, by contrast, the difference between the parental traits remains static, then the probability of transgressive variation will decrease with d. This is illustrated in Figure S2. Both regimes were observed by Stelkens and Seehausen (2009).

Uniparental inheritance and recombinant hybrids
In this section, we present results for a more general type of hybrid, in which some loci are either hemizygous or homozygous. To do this, we will assume that the the parental lines are under stabilizing selection, as described above, such that the dominance deviations undergo a random walk in phenotypic space. We also use existing results for recombinant hybrids, conditional on the F1 fitness (Simon et al., 2018;Schneemann et al., 2020;De Sanctis et al., 2022).

Expected log fitness and bounds on V δ (eqs. 16-18 and 21)
An arbitrary hybrid can be characterized by its hybrid index, 0 ≤ h ≤ 1 (i.e. the proportion of divergent alleles from one of the parental lines) and by its heterozygosity, 0 ≤ p 12 ≤ 1 (the proportion of divergent loci carrying one allele from each parental line). Let us first begin by defining A i as the analogue to ∆ i for the additive effects.
With this notation, the midparental log fitness can be written as It then follows from eq. 3, that when the populations are maladapted from their optimum in random directions, such that E (ln w MP ) = ln w P 2 , then Now, using the assumptions discussed above for stabilizing selection, we can combine eq. 52 with the published results, to yield the expected log fitness of an arbitrary hybrid.
Note that eq. 53 contains three terms, the second capturing the deleterious effects of dominance, and the third capturing the deleterious effects of lack of coadaptation among the additive effects.
Equation 53 also reduces to results in the main text for a fully heterozygous F1, for which h = 1/2 and p 12 = 1. Equation 21 follows from comparing this F1 to the arbitrary hybrid, when parents are well adapted (ln w P = 0). Equations 16-18 also follow simply. For an XO F1 hybrid, all of the heterozygosity is found on the autosomes, such that p 12 = 1 − x while the hybrid index is h = (1 ± x)/2, depending on which parental line contributed the X. It follows, therefore, that (56) So this quantity is not expected to change in any consistent way with the genetic divergence, d. We have not been able to evaluate this expression analytically, but numerical integration shown in Figure S10, shows that, to a rough approximation which agrees with the simulation results reported in the main text (Figure 6c-d) and in Figure   S5e-f. We can also anticipate the typical value of Υ 2 /V aδ , using eq. 16. This shows that suggesting that Υ 2 /V aδ ∼ 1 will often hold for plausible parameter values.
All simulations used α = 1/2. In all cases, we simulated pairs of allopatric diploid populations, which were initially identical with no segregating variation. The ancestral population was phenotypically optimal for all simulations except for those presented in Figures 3e-f (modelling directional selection) and 4d (modelling locally adapted parents), where the ancestor was placed at unit distance from the optimum, such that the ancestral population had a relative fitness of e −α ≈ 61%, and the simulations began with a bout of directional selection towards the optimum.
In all cases, the diverging populations were of constant size N individuals (2N chromosomes), and contained simultaneous hermaphrodites with discrete non-overlapping generations.
To produce the next generation, 2N prospective parents were sampled with replacement, with probabilities proportional to their fitness. Gametes were produced with free recombination among all loci, and assuming infinite sites, and subject to mutation. The total number of mutations was drawn from a Poisson distribution with mean 2N U and distributed randomly across the 2N gametes. There was no back mutation. The mutation rate U was chosen such that multiple mutations generally segregated together (N U ≥ 1). The additive effects of each mutation on each each trait, were drawn independently from a normal distribution, with vanishing means and variance denoted V amut . We set this mutational variance at: Because the homozygous effect of a mutation was twice its additive effect, eq. 60 ensures that the mean selection coefficient acting on a homozygous mutation occurring in an optimally fit individual was approximately −s, irrespective of the number of traits under selection, n.
The heterozygous effect of a mutation was equal to its additive effect, a mut , multiplied by (1 + δ mut ), where δ mut is its dominance coefficient. For simulations under phenotypic additivity, we simply set δ mut = 0 for all mutations. With variable phenotypic dominance, the δ mut were drawn from shifted beta distributions, which we bounded at −1 (for a wholly recessive mutation) and 1 (for a wholly dominant mutation).
For the simulations reported in Figures 3, 4d, 6, and S3, we used the same shifted beta distribution for all mutations, regardless of their additive effects. To do this, we set set E(δ mut ) = 0 and Var(δ mut ) = V δmut . These distributions are shown in Figure 3a. Generating dominance coefficients in this way allowed us to adjust the amount of variance in dominance coefficients of new mutations V δmut , which was reflected in V δ .
For the simulations reported in Figures 2c, 5b and S5, we generated the δ mut values to reflect some of the known properties of large and small effect mutations (Billiard et al., 2021). Specifically, small effect mutations were close to additive with little variance, whereas large effect mutations were predominantly recessive but with a higher variance, such that some would be dominant (Orr, 1991;Manna et al., 2011). This was achieved by sampling the dominance multipliers from a beta distribution such that E(δ mut ) ≡ µ = 1 − 2 1 + exp −2 |amut| √ Va mut and Var(δ mut ) = µ(µ 2 − 1) The distributions generated by eqs. 61-62 are illustrated in Figure S1.
For each set of parameters, we simulated 200 independent populations that each evolved until they fixed 1000 substitutions. We then randomly paired populations to obtain 100 pairs for the for-mation of F1 hybrids. We considered the substitutions that had fixed in the number of generations it took for the first population of the pair to reach 1000 substitutions. To produce the F1, for standard biparental inheritance (and also for homogametic hybrids), we took the heterozygous effect of all substitutions. For our heterogametic hybrids, which were our example case of uniparental inheritance, we assigned a random proportion x of substitutions to be on the X chromosome, and considered their homozygous rather than their heterozygous effects. This is equivalent to a form of dosage compensation, involving up-regulation on the X (see main text). Note that these X-linked substitutions were chosen after simulating the divergence. This meant that all sites were biparentally inherited and expressed during the divergence process. This guaranteed that, for our simulations, the distribution of dominance coefficients did not differ between the X and autosomes, and both matched the mutational input, such that (V δ ≈ V δmut ). While this simulation procedure does not incorporate the expectation that X-linked loci will have more extreme dominance effects (Charlesworth et al., 1987), it allowed us to illustrate the conditions for Haldane's Rule, eq. 18 (see Figure 6). Furthermore, the bound on the size of dominance effects that follows from eq. 18 applies to X-linked loci, so if V δ is smaller for autosomal loci, the bound still applies to V δ across all sites.