Spin Purification in Full-CI Quantum Monte Carlo via a First-Order Penalty Approach

In this article, we demonstrate that a first-order spin penalty scheme can be efficiently applied to the Slater determinant based Full-CI Quantum Monte Carlo (FCIQMC) algorithm, as a practical route toward spin purification. Two crucial applications are presented to demonstrate the validity and robustness of this scheme: the 1Δg ← 3Σg vertical excitation in O2 and key spin gaps in a [Mn3(IV)O4] cluster. In the absence of a robust spin adaptation/purification technique, both applications would be unattainable by Slater determinant based ground state methods, with any starting wave function collapsing into the higher-spin ground state during the optimization. This strategy can be coupled to other algorithms that use the Slater determinant based FCIQMC algorithm as configuration interaction eigensolver, including the Stochastic Generalized Active Space, the similarity-transformed FCIQMC, the tailored-CC, and second-order perturbation theory approaches. Moreover, in contrast to the GUGA-FCIQMC technique, this strategy features both spin projection and total spin adaptation, making it appealing when solving anisotropic Hamiltonians. It also provides spin-resolved reduced density matrices, important for the investigation of spin-dependent properties in polynuclear transition metal clusters, such as the hyperfine-coupling constants.


INTRODUCTION
Strongly open-shell molecules present a number of challenges to quantum chemical methods, arising from the large number of nearly degenerate states with different total spin quantum number, S, which exist in such systems and are in general hard to resolve. In these systems, spin contamination is a major problem for an accurate description of their electronic spectrum. Such systems usually exhibit a strong multireference character, with numerous dominant electronic configurations featuring similar weights in the configuration interaction (CI) expansion. Furthermore, when a high-spin state is the groundstate, states of the same symmetry but with lower spin are impossible to obtain with ground state projective techniques. For these reasons, there has been much interest in recent years in developing spin-adapted approaches, which work in Hilbert spaces of configuration state f unctions (CSFs), rather than Slater determinants (SDs). 1−16 In these approaches, Ŝ2 symmetry is explicitly enforced, ensuring zero spin contamination, and enabling the targeting of any desired spin state. The Graphical Unitary group approach (GUGA) 17−26 is one such example of a fully spin-adapted approach, which was implemented within the stochastic full-CI quantum Monte Carlo (FCIQMC) 9,27−30 and the Stochastic-CASSCF 10,31,32 frameworks. Recently, we have discovered a strategy within GUGA that allows an unprecedented reduction of the multireference character (compression) 32−35 of ground-and excited-state wave functions and the unique possibility to perform state-specific optimizations of ground-and excited-state wave functions. 34,35 These properties arise from a unique block-diagonal structure of the GUGA Hamiltonian matrix, even within the same spinsymmetry sector, that follows chemically/physically motivated molecular orbital transformations. 34 This strategy has been applied to exchange-coupled polynuclear transition metal clusters with a large number of localized open-shell orbitals 32−35 and to one-dimensional Heisenberg and Hubbard model Hamiltonians. 36 In the latter cases, a connection with the concept of alternancy symmetry can be envisioned. 37,38 Other sparse FCI solvers 2,39−50 could also benefit from the enhanced sparsity of the Hamiltonian and wave functions that follow the above-mentioned strategy.
However, such sophisticated approaches to spin adaptation incur a number of complications related to their increased algorithmic complexity, including matrix element calculation and excitation generation process. 9,32 In addition it is possible 51−53 but complicated to describe spin projection properties in a spin adapted basis, which is, e. g., necessary for anisotropic Hamiltonians or the calculation of spin polarization. Furthermore, in systems with a more delocalized character (i. e., covalency), the aforementioned compression advantages of the GUGA method are less evident. For these reasons, it is highly desirable to have a Slater determinant based approach to spin adaptation, which is possible via spinpurification concepts.
For cases with an even number of unpaired electrons, such as the oxygen dimer discussed later, it is also possible to place constraints on the spin by applying time-reversal symmetry and by working with pairs of spin-coupled functions. 54 This reduces the size of the Hilbert space by a factor of 2, while reducing any spin contamination, as in the reduced space either all even or odd spin states can be excluded. However, this strategy cannot separate singlet from quintet, nor can it operate in cases with an odd number of unpaired electrons.
The aim of the present article is to introduce one such method, based on a simple first-order spin-penalty approach, within the context of Slater determinant based FCIQMC. Spin purification techniques, including the first-order spin penalty approach, have recently been discussed in details by Levine and co-workers 55 and have already been utilized in the context of renormalization approaches. 56,57 We build on the existing literature by explaining the origin for a range of optimal spin penalty parameters.
The remainder of the article is organized as follows. In the next section, we explain the theory behind the first-order spinpenalty method and explain the origin of the optimal spin penalty parameter. Section 3 contains two applications, the oxygen dimer and a trinuclear manganese cluster, to showcase the general applicability of this method in FCIQMC. Section 4 contains a summary and conclusion of our results. The Appendix (Section 5) contains detailed derivations of the convergence speed for different choices of the spin penalty parameter.

THEORY
2.1. First-Order Spin Penalty Method. We write the total spin operator, Ŝ2, in terms of the spin projection, Sẑ, and the ladder operators, S+ (raising) and S− (lowering) (see also Given two SDs, |D i ⟩ and |D j ⟩, the expression for ⟨D i |Ŝ2|D j ⟩ is then given by where n α OS is the number of unpaired (open-shell, OS) α electrons. The off-diagonal elements are nonzero only for exchange excitations and are equal to sgn(D i , D j ) = ±1, where the sign is given by the product of Fermionic phase factors. 58 Since exchange excitations require the same orbital configuration in both determinants, the Ŝ2 matrix features an interesting block-diagonal structure. Larger blocks are characterized by a common n α OS value, while the sub-blocks are characterized by a common occupation number vector. This block-diagonal and sparse structure (see Figure 1) is particularly well suited for FCIQMC.
In the first-order spin penalty approach, a modified Hamiltonian is utilized. If J is chosen such that the low-spin state becomes the lowest state in the modified Ĥ′ Hamiltonian, ground state methods, including FCIQMC, will converge to that state. The on-the-fly evaluation of modified Hamiltonian matrix elements does not require additional memory, and has negligible runtime costs for the evaluation of the Ŝ2 correction. Since Ĥand Ŝ2 commute, the eigenstates of Ĥ′ are still eigenstates of Ĥand the eigenvalues of Ĥcan be directly calculated from the corresponding eigenvalues of Ĥ′ by subtracting J·S(S + 1). Note that this subtraction can be performed in a well-defined manner only for converged eigensolutions. For unconverged, intermediate results, for example along FCIQMC dynamics and before stationary conditions are reached, it is necessary to evaluate directly the original Hamiltonian Ĥ. In the present work we calculate the latter as an expectation value from the stochastically sampled one-and two-body reduced density matrices (RDMs).
2.2. Range of Optimal J Values. In the following, we discuss an optimal choice of the J value. For the unique J that makes all high-spin states energetically above or degenerate to the targeted spin state (f irst f lipping point), a spin-symmetrybroken wave function is to be expected, which is an arbitrary admixture of the degenerate spin states in the modified Hamiltonian. This f lipping point satisfies the following relation where E ls,0 and E S,0 are the nonpenalized ground-state energies of the targeted low-spin (ls) and the higher spin (S) state and Δ⟨Ŝ2⟩ = S(S + 1) − S ls (S ls + 1) corresponds to the difference in their spin expectation values. For J values larger than the first flipping point the desired low-spin state is obtained in the longtime limit of the FCIQMC dynamics. However, the speed of convergence and stability of the imaginary-time propagation in FCIQMC depends on how far J is from the first flipping point. At first glance, one could expect that, above the first flipping point, higher J values only improve the speed of convergence, because they increase the energy of the high-spin states, so an imaginary-time propagation with the modified Hamiltonian  (8,6) active space with S z = 0, corresponding to a minimum active space for the singlet state of oxygen. White denotes zero, while blue denotes ±1 entries. The diagonal is omitted. exp(−τ(Ĥ′ − E 0 )) projects them out faster. But, as we (vide infra) and Levine et al. 55 observed, there exist J values beyond which the convergence deteriorates. Intuitively this can be understood, because for increasingly larger J values the modified Hamiltonian can be interpreted as mainly a Ŝ2 operator, corrected by a small perturbation represented by Ĥ. Diagonalization of the pure Ŝ2 operator results in numerous degenerate eigenstates with equivalent spin eigenvalues (not energies), the lowest being S(S + 1) = 0 or + = S S ( 1) 3/4, for even and odd numbers of electrons, respectively. This degeneracy is in part lifted by the Hamiltonian Ĥ. However, for very large J values, the Ĥcorrection becomes relatively small compared to J·Ŝ2, and projecting out the high-energy states of same spin becomes harder.
If we look at qantitative estimates for the speed of convergence (the detailed derivations are provided in the Appendix (Section 5)), an interval of J values exists inside which the speed of convergence is nearly constant and optimal (see Figure 2). The lower bound of this optimal range can be estimated by the second f lipping point where E ls,1 is the energy of the first low-spin excited state of desired spin. The speed of convergence increases proportionally to the energy separation between the lowest and the second-lowest state. For J f,1 < J < J f,2 , we have E ls,0 ′ < E hs,0 ′ < E ls,1 ′ , and the spread between lowest and second to lowest energy state increases with J. For J > J f,2 , we have E ls,0 ′ < E ls,1 ′ < E hs,0 ′ , and the energy separation between lowest and second-lowest energy state is unaffected by J (under the assumption that they have the same spin multiplicity). Thus, for J > J f,2 , but still below the upper bound discussed in the following, the convergence is nearly independent of J (plateau in the speed of convergence, Figure 2).
For optimization techniques based on the imaginary-time Schrodinger equation, such as FCIQMC, the upper bound of the optimal range of J is given by the τ-f lipping point, J f,τ , which denotes the point where the maximum time step starts to be dominated by a 1/J proportionality. In the case of deterministic imaginary-time propagations we have Δτ = (E max ′ − E 0 ′) −1 . If we assume that the highest energy state in Ĥ′ (for increasing J > J f,1 ) and the energetically lowest spin-state have the same spin multiplicity, their energy difference will not be affected by J for a large range of J. For typical full-valence active space calculations this is a well-founded assumption. For J > J f,τ , the spread of Ĥ′ becomes increasingly dominated by the spread of spin expectation values, Thus, larger J values lead to smaller optimal Δτ values, with a consequent reduction of the speed of convergence.
In the case of stochastic imaginary-time propagation, as in FCIQMC, the J f,τ is more complicated to find. The time step has to be chosen differently compared to the deterministic case to achieve stable dynamics. The conventional 9,27 choice for Δτ in a SD basis is which not only depends on the spectrum of Ĥbut also on the excitation generator in use and the determinant pairs i, j where spawns actually happened during a calculation. Note the implicit assumption, that the stochastic Δτ from eq 6 is smaller than the deterministically chosen Δτ, that depends purely on the spread of the spectrum of Ĥ. As in the deterministic case, there will be a J f,τ from which the time step purely follows a (1/ J) dependency. If a weighted excitation generator is in use, the p gen for exchange excitations will increase with J as do the H i,j and depending on the dynamics there will be different pairs i, j fullfilling the minimum of eq 6. Hence unlike the deterministic case there might be changes of Δτ already before reaching J f,τ . We also observe that for too large J the stochastic noise of the Monte Carlo simulation increases.
In summary, the convergence improves with increasing J for J f,1 < J < J f,2 . Also, there exists a J f,τ after which a 1/J dependency of the time step follows, negatively affecting the speed of convergence. Between J f,2 and J f,τ there is a plateau of nearly optimal J values. We would like to point out (see Figure   Figure 2. Number of iterations required to achieve convergence up to 1 × 10 −5 E h using deterministic imaginary-time propagation for the Γ (1/2) state of the manganese cluster in an (9,9) active space. J f,1 was calculated using eq 4, J f,2 was calculated using eq 5, and J f,τ was calculated using the spread (E max ′ − E 0 ′), which was approximated from the spread of diagonal values of Ĥ′.
The Journal of Physical Chemistry A pubs.acs.org/JPCA Article 2) that the convergence deteriorates slower for J > J f,τ , than for J < J f,2 ; i. e., it is better to select a bit too high J values than too small ones. In addition a too large J > J f,τ affects the convergence rate, while a too small J < J f,1 leads to spin contamination. Therefore, it is advisable to choose J values inside the range but closer to J f,τ .
In practical applications, it is difficult to calculate the lower bound J f,2 because it requires knowledge of the spin-state energetics, which are exactly the purpose of the calculation. However, the J f,τ can be estimated from monitoring the time step and stochastic noise during FCIQMC training runs (using low walker populations) for different J values. The first flipping point J f,1 can be found by monitoring the spin expectation value for different J; if it converges to a high-spin state, the J is too small. We assume J f,τ − J f,2 ≈ J f,τ − J f,1 and select a trial J between J f,1 and J f,τ . In the case of stochastic imaginary-time propagation, it is generally advised to also monitor the stochastic noise and reduce J accordingly.

APPLICATION
The robustness of the spin penalty method in SD-based FCIQMC has been explored in two crucial test-case applications. We investigated the vertical 1 Δ g ← 3 Σ g transition in the O 2 molecule using a full-CI expansion in a double-ζ quality basis set, and the vertical Γ (1/2) ← Γ (3/2) (and Γ (9/2) ) transition in a [Mn 3 (IV) O 4 ] trinuclear cluster. In both cases, the ground state is the higher spin-state.
3.1. Oxygen Dimer. We used a distance of 1.203 Å, and correlated 16 electrons in the 28 orbitals of an ANO-RCC-VDZP basis. 59,60 The Full-CI calculations were performed on the basis of the state-specific CASSCF(8,6) orbitals. A spinpure calculation using GUGA-FCIQMC served as reference. The J parameter was set to 0.12 E h .
The ⟨Ĥ⟩ expectation value calculated from RDMs is shown in Figure 3. The triplet converges faster than the singlet, and its total energy nearly matches the energy of the GUGA reference calculation for the same walker number. Generally, convergence with respect to the number of walkers in initiator-FCIQMC is mainly influenced by the compactness of the respective wave function. The triplet calculation started from a SD with |M S | = 1, which is also the spin-pure configuration that dominates the FCI wave function for this electronic state. On the contrary, in the case of the singlet spin state, a multideterminantal wave function is required to correctly describe the spin-pure reference space; therefore, the calculation converges slower with respect to the walker number than the GUGA-based one. However, the wall clock time to achieve the same quality of convergence is roughly comparable, since SD-based FCIQMC is generally faster for a given walker number, as discussed in the literature. 9,33 For all walker populations, the spin expectation values have been used to confirm convergence to the correct spin state and monitor spin contamination. The deviation was larger for the singlet whose mean spin expectation value was 1.30 × 10 −5 , compared to the theoretical 0.
3.2. Manganese Cluster. Two active spaces have been defined to test the spin penalty approach on the [Mn 3 (IV) O 4 ] trinuclear cluster (Figure 4). A small CAS (9,9) is utilized to    (55,38) has been employed to demonstrate the numerical stability of the method in more realistic scenarios. The CAS (9,9) consists of the nine singly occupied t 2g orbitals on the three magnetic centers. The large active space consists of the 15 3d orbitals and their nine electrons, the 12 doubly occupied 2p orbitals of the bridging oxygen atoms, the 10 doubly occupied peripheral lone-pair orbitals pointing at the metal sites, and two doubly occupied orbitals of the −OH group, of σ and π character. A similar active space has been previously chosen for a similar [Mn 3 O 4 ] cluster. 35,64 Through experimental investigation, Armstrong showed that the ground state of this system is a Γ (3/2) spin state. 61 The too small CASSCF (9,9) erroneously predicts a Γ (9/2) ground state. Nonetheless, this small model calculation represents an interesting test case to explore the applicability of the spin penalty strategy. The larger CAS(55,38) describes qualitatively well the spin-state ordering, with a = S 3/2 ground state and a = S 1/2 state at slightly higher energy, in line with Armstrong's findings.
In Figure 5, we show the convergence behavior of the FCIQMC dynamics for different J applied to the CAS (9,9) wave function. For FCIQMC dynamics with J = 0 E h or too small spin penalties (J = 1 × 10 −5 E h ) the flipping point is not reached, and the FCIQMC dynamics converges to the highspin state = S ( 9/2), which is the ground state for the small CAS(9,9) model active space. For J values above the flipping point, the low-spin state wave function is obtained. These results are confirmed by the total spin expectation value (Figure 5b). Speed of convergence increases for larger penalty values, and a large range of J values (1 × 10 −4 E h ≤ J ≤ 2 × 10 −2 E h ) exists that provides stable and fast converging FCIQMC dynamics. Too large J values (>1 × 10 −1 E h ) lead to convergence problems, which is in line with J f,τ = 0.1 E h .
For the CAS(55,38) model active space the competing doublet = S ( 1/2) and quartet = S ( 3/2) spin state wave functions have been optimized. GUGA-FCIQMC has been utilized as a reference. Three choices of J were used that permitted the characterization of the doublet spin state, namely J = 1 × 10 −2 E h , 1 × 10 −3 E h , and 1 × 10 −4 E h . Figure 6 shows the energetics for the = S 1/2 and = S 3/2 states, as a function of the walker population. We notice that all dynamics are stable and fast converging. The choice of the large parameter, J = 1 × 10 −2 E h , results in a nearly exact matching of the spin-purified total energy with the one obtained from the spin-adapted GUGA-FCIQMC approach, at the same walker population.
Lower J values result in lower total energies for low walker populations. The lower energies for smaller J values are not to be interpreted as a faster convergence of the spin penalty approach for lower J. Instead, considering that the spin expectation value for the smaller J = 1 × 10 −3 E h is higher than the expected value ( Figure 7) we are brought to the conclusion that the unconverged wave function (low population) is in a broken-symmetry state, that results from the mixture of the  The Journal of Physical Chemistry A pubs.acs.org/JPCA Article target spin state (Γ (1/2) ) and the higher spin states (for example Γ (3/2) ). Admixing the higher spin states artificially lowers the total energy. For larger walker populations and for larger J, the spin expectation value gets closer to the targeted value, eliminating any spin contamination from the optimized wave function. It is worth noting that the 38 active orbitals have been localized and reordered to reach maximum compression of the GUGA wave function (see refs 33−35 for details). In Figure 11 of ref 34, we have shown that GUGA-FCIQMC converges faster than the Slater determinant FCIQMC counterpart, when using a tailored (in this case localized) MO basis. Thus, we expect that the chosen one-electron basis utilized for the CAS (55,38) calculation in general favors the GUGA-FCIQMC approach. However, we observe very similar convergence of the GUGA and the Slater determinant based spin-penalty approaches (for J = 1 × 10 −2 E h ). Moreover, it is interesting to notice that for an equivalent wall-clock time, the spin penalty approach can be run at higher walker population (2 × 10 8 walkers in the spin penalty method versus 1 × 10 8 walkers in GUGA) and reaches a lower total energy. These results suggest an overall better performance of the spin-purification approach. However, the GUGA strategy has two crucial advantages that we have documented in recent works: 32−35 (a) within GUGA the space of the leading electronic configurations (CSFs) can be greatly reduced and directly connected to physical concepts, and (b) the GUGA CI Hamiltonian matrix has a unique quasiblock-diagonal structure, allowing for unprecedented statespecif ic optimizations of ground and/or excited states.
As a final remark, we observe that the spin-penalty strategy enables the combined S-and M S -adaptation. This scheme is thus more flexible than the GUGA S-adaptation, and allows for the treatment of anisotropic Hamiltonians. One trades simplicity and universality for lower dimensionality when going from the M S -adapted space to the S-adapted one. Moreover, while stochastically sampled higher-order density matrices are already available within the SD-based FCIQMC approach, allowing for multireference second order perturbation theory (PT2) methods, 65,66 GUGA-FCIQMC three-and four-body density matrices are not available, preventing for the moment GUGA-based PT2 strategies. Additionally, it is possible to envision spin-pure similarity-transformed FCIQMC calculations based on transcorrelated methods 67−74 using the current spin penalty approach, while technical difficulties exist within the GUGA scheme, because of the presence of threebody interactions in the transcorrelated Hamiltonian.

CONCLUSION
In conclusion, we have demonstrated that spin purification based on a first-order spin penalty can be efficiently applied to the Slater-determinant based FCIQMC algorithm. We have also explained the origin of an optimal range of J values and that too large J parameters are to be avoided as they result in smaller time steps, deteriorating convergence. The method was successfully applied to calculate the 1 Δ g ← 3 Σ g transition in the O 2 dimer and the Γ (1/2) ←Γ (3/2) (and Γ (9/2) ) transition in a [Mn 3 (IV) O 4 ] trinuclear cluster model. The range of applicability of the spin penalty FCIQMC approach is very broad, including the coupling of large active space Stochastic-CASSCF and Stochastic-GASSCF wave functions to methods capable of recovering dynamic correlation outside the active space, such as MC-PDFT, 75,76 and tailored-CC, 77 and crucially methods that require high-order interactions (for example in the form of three-and four-body RDMs) such as PT2 and similarity-transformed techniques. A large range of chemical systems and models for solid state materials can be investigated, including ferromagnetic superconductors of practical interest, such as UGe 2 78 and URhGe. 79 The method can also be extended to model Hamiltonians, such as the Hubbard model, often used to investigate spin interactions in strongly correlated materials. By this approach we are able to tackle anisotropic Hamiltonians and, as spin-resolved density matrices are available, spin-dependent properties, such as the hyperfine coupling tensors (pivotal in characterizing spin interactions in polynuclear transition metal clusters), are within reach. These aspects will be the subject of future work.

■ 5. APPENDIX
In the following, we prove the existence of an optimal range of J values for the speed of convergence of the first-order spin penalty approach with the modified Hamiltonian in FCIQMC and discuss strategies for selecting it. The derivation that follows applies to any other method that performs a stochastic or nonstochastic linearized imaginarytime propagation. We write the imaginary-time propagation of Ψ(τ = 0) as without renormalization. After linearization we arrive at the projector where E 0 is the ground-state energy and the working equations arise from repeated application of Ĝwhich is nothing else than the power method for the operator Ĝ. We will first discuss the speed of convergence of power methods and apply it to the linear projector Ĝ.

Power Method
The speed of convergence for the power method is given by the following theorem. 80 Theorem 1 (Speed of convergence power method). Let  ∈ A n n , be a real, symmetric matrix with positive eigenvalues λ i sorted in decreasing order and so the largest eigenvalue λ 1 is non-degenerate. The eigenvectors corresponding to λ i are denoted with u i . We denote the starting guess with x (0) and define the guess-vector and guess-eigenvalue of the kth iteration as .
For J > J f,1 the lowest eigenstate u 1 is the same in all cases. Since we also use the same starting guess in all cases the deviation angle of the starting guess θ (0) is the same everywhere and we will omit the sin 2 (θ (0) ) term from further equations.
For further analysis we would like to emphasize that eq 13 is an upper bound, and depending on the energetically higher eigenvalues, λ i , i > 2, differences in convergence behavior may further occur.
We would like to showcase this with the three diagonal matrices from  4,4 : and a starting vector The λ i , 1 ≤ i ≤ 3 eigenvalues were not chosen randomly, but were calculated as the first three eigenvalues of the projector Gî n the (9,9) active space on the manganese cluster for different values of J. A corresponds to J = (J f,1 + J f,2 )/2, B corresponds to J = J f,2 + 0.0001 E h , and C corresponds to = τ J J 2 3 f, . The matrices A, B, and C have the same eigenvectors and the same spread of λ 1 − λ 4 = 1. By the ratio of λ 2 /λ 1 and theorem 1 it is expected that matrices B and C have the same convergence behavior and converge faster than A.
In Figure 8, we see that the convergence in the long-time limit is indeed described by Theorem 1 and is only depending on the ratio of λ 2 /λ 1 . Nevertheless in the beginning there is faster convergence for C because the third component is projected out faster, due to a smaller λ 3 . So even for a constant ratio λ 2 /λ 1 it is still advantageous to spread the distance between λ 1 and the other eigenvalues.

Power Method Bounds in FCIQMC
In order to apply Theorem 1 to Ĝwe need to assume 16) and that the ground-state is non-degenerate. The assumption from eq 16 is usually fulfilled in FCIQMC, since the reduction of stochastic noise often requires even smaller time steps. We denote the eigenvalues of Ĝwith λ i . By construction we have 1 = λ 1 , and by assumption (eq 16), we have λ n ≥ 0 which gives in total 1 = λ 1 > λ 2 ≥··· λ n ≥ 0. Note that the highest eigenvalue λ 1 of Ĝbelongs to the state with the lowest eigenvalue of H. We will write the eigenvalues of the modified Hamiltonian and resulting variables like the time step with a prime superscript, ′. To simplify the discussion herein, we assume deterministic imaginary-time propagation and that Δτ is chosen maximal 0 l s , 0 h s , 0 l s , 1 We want to determine the relevant eigenvalues of By construction we have with ΔE hs = E hs,0 − E ls,0 and Δ⟨S⟩ being the difference of Sê xpectation value between the high-and low-spin state. Because of the assumption for Δτ (eq 17) we get Figure 8. Error against the iteration number for the power method applied to matrices A, B, and C from eq 14.
The Journal of Physical Chemistry A pubs.acs.org/JPCA Article In total, we obtain (λ 1 − λ n ) = 1 and λ = λ λ 2 2 1 . As discussed in the main text we assume that the time step is not yet affected by J, so Δτ′ ≈ Δτ. We conclude that the convergence is bounded as  (23) which gets tighter with increasing J. 5.2.2. Case: J f,2 < J < J f,τ . The ordering of eigenvalues of Ĥ′ is given by Since the energetic spacing between eigenstates of same spin is not affected by J·Ŝ2 we get with ΔE ls = (E ls,1 − E ls,0 ). As in the previous case and by definition of J f,τ we assume that the time step is not yet affected by J, so Δτ′ ≈ Δτ. As before we conclude that the convergence is bounded as which is independent of J. Note that the other eigenvalues of Gl ike λ 3 are still affected by J, and the convergence is still slightly improved by increasing J. 5.2.3. Case: J > J f,τ . A tilde will be used for this case. The ordering of the eigenvalues is given by The energetic spacing between the two lowest states is independent of J and we get: Again, we assume a maximum choice for the time step, so As before, we conclude For very large J, the spread of eigenvalues is dominated by the spacing between the spin eigenvalues, so we get where Δ⟨S⟩ max is the difference between the lowest and highest possible spin state for that system. We get  (33) which gets tighter with decreasing J.

Analyzing All Cases Together
We summarize that the convergence improves with increasing J for J f,1 < J < J f,2 . For J > J f,τ , the convergence deteriorates with increasing J. Between J f,2 and J f,τ , there is a plateau of nearoptimal J values. If we take the derivative of the two bounds (1 − Δτ(ΔE hs + JΔ⟨S⟩)) and after J we can see that in the case of J > J f,τ the rate of change is smaller. This means that when in doubt it is better to choose J > J f,τ than to choose J <J f,2 ; thus, it is better to "overshoot". It might be tempting to equate the two upper bounds from the separate cases to determine a unique J opt . This is not a defined operation because both expressions were separately derived under the assumption of either high or low J. In the scope of one of the bounds, the other one is not valid.

Numerical Tests for J Choice
In order to numerically confirm our derivations for the choice of optimal J values and to stay as close as possible to the assumptions in our previous derivation, we also performed a fully deterministic imaginary-time evolution for the systems that were discussed in our main text. The fully deterministic propagation was achieved by defining a semi-stochastic space for FCIQMC that ranged over the whole Hilbert space. The spacing of eigenvalues for the calculation of Δτ (eq 17) was estimated from the spread of diagonal values of the Hamiltonian matrix.
In Figure 9, we see the number of iterations required to achieve convergence up to 1 × 10 −5 E h in a deterministic imaginary-time propagation for the 1 Δ g state of oxygen in the minimal (8,6) active space. If we keep the doubly logarithmic scale in mind, we see the previous derivations numerically Figure 9. Number of iterations required to achieve convergence up to 1 × 10 −5 E h in a fully deterministic imaginary-time propagation for the 1 Δ g state of oxygen in an (8,6) active space.
The Journal of Physical Chemistry A pubs.acs.org/JPCA Article confirmed. For J f,1 < J < J f,2 , the convergence improves drastically with increasing J. For J f,2 < J < J f,τ , a plateau forms, but it is still better to choose J closer to J f,τ . For J > J f,τ the convergence deteriorates with increasing J. For a stochastic imaginary-time propagation the criterion of convergence up to 1 × 10 −5 E h cannot be easily evaluated. Instead the moving average of the projected energy for three different J values is drawn in Figure 10. For this system, we had J f,2 = 0.03 E h and J f,τ ≈ 0.07 E h . The blue line describes a dynamic with J < J f,2 , and the convergence improves considerably when going to higher J values as in the orange line. For values considerably above J f,τ (green line), the quality of the calculation decreases again.