The controls on the thermal evolution of continental mountain ranges

This paper examines the controls on the thermal evolution of continental mountain ranges and investigates how pressure (P) and temperature (T) estimates from metamorphic rocks can be used to reconstruct the thermal histories of orogenic belts. We show that one‐dimensional models can approximate the temperature structure of most continental mountain belts, and find that the metamorphic conditions typical of such tectonic settings can most easily be achieved by thickening of the mid‐ to upper‐crust above a rigid lower crustal substrate. Homogeneous thickening of the entire lithosphere can only result in these conditions under unusual circumstances. This result implies that thickening above rigid lower crust has been the dominant mode of mountain‐building preserved in the rock record. In this situation, modest rates of vertically distributed internal radiogenic heating (1±0.5 μW/m3) are able to produce typical metamorphic conditions and timescales, without the requirement for an external source of heat. In a given model, peak temperature conditions for rocks originally positioned closely in depth (e.g. 5 km apart) occur at times that can be separated by over 10 Myr, highlighting the spectrum of ages and conditions that can be produced by a single geometrically simple mountain‐building event. The direction of approach in P–T space towards the highest‐T conditions experienced by a rock (i.e. the curvature of the P–T loop) is the most diagnostic feature that allows the thermal and tectonic history of a mountain belt to be estimated, highlighting the importance of determining prograde P–T paths. Combining estimates of the peak‐T conditions with multiple other sources of information (e.g. the timescales of metamorphism or the initial crustal thickness) can also allow the reconstruction of the mountain range evolution under most circumstances. Many configurations of continental mountain‐building result in rocks transiently passing through regions with low average thermal gradients (e.g. 10–15°C/km), suggesting that such gradients should not be used to infer the presence of subduction. Analysis of existing P–T loops from the 3.2 Ga metamorphism in the Barberton terrane suggests that mountain‐building processes at that time and place were similar to those in present‐day mountain ranges, implying that large regions of rigid continental lithosphere were already in existence, and were underthrusting the margins of mountain ranges.

result in these conditions under unusual circumstances. This result implies that thickening above rigid lower crust has been the dominant mode of mountain-building preserved in the rock record. In this situation, modest rates of vertically distributed internal radiogenic heating (1AE0.5 μW/m 3 ) are able to produce typical metamorphic conditions and timescales, without the requirement for an external source of heat. In a given model, peak temperature conditions for rocks originally positioned closely in depth (e.g. 5 km apart) occur at times that can be separated by over 10 Myr, highlighting the spectrum of ages and conditions that can be produced by a single geometrically simple mountain-building event. The direction of approach in P-T space towards the highest-T conditions experienced by a rock (i.e. the curvature of the P-T loop) is the most diagnostic feature that allows the thermal and tectonic history of a mountain belt to be estimated, highlighting the importance of determining prograde P-T paths. Combining estimates of the peak-T conditions with multiple other sources of information (e.g. the timescales of metamorphism or the initial crustal thickness) can also allow the reconstruction of the mountain range evolution under most circumstances. Many configurations of continental mountain-building result in rocks transiently passing through regions with low average thermal gradients (e.g. 10-15 C/km), suggesting that such gradients should not be used to infer the presence of subduction. Analysis of existing P-T loops from the 3.2 Ga metamorphism in the Barberton terrane suggests that mountain-building processes at that time and place were similar to those in present-day mountain ranges, implying that large regions of rigid continental lithosphere were already in existence, and were underthrusting the margins of mountain ranges.

K E Y W O R D S
Barrovian metamorphism, mountain ranges, P-T loops, thermal models 1 | INTRODUCTION This paper examines the controls on the thermal evolution of mountain ranges in continent-continent collision zones, with particular emphasis on understanding the controls on the characteristics recorded by suites of metamorphic rocks. An overall aim is to establish how to extract the maximum possible amount of information from these rocks regarding the behaviour and evolution of the mountain building events that caused their metamorphism. The fields of thermal modelling and thermobarometry (the thermodynamic estimation of the pressures and temperatures experienced by rocks during metamorphism) have progressed in tandem. Early studies of the thermal evolution of mountain ranges focused on one-dimensional models of thermal diffusion, vertical advection and radiogenic heating. After an initial perturbation to a geotherm (due to thickening or overthrusting), the temperature structure evolves towards a new steady-state, set by the new geometry of the lithosphere and the distributions of thermal properties and radiogenic heating (e.g. England & Richardson, 1977;Oxburgh & Turcotte, 1974). The inclusion of erosion in such models results in rocks following P-T 'loops', in which metamorphic conditions reach peaks in pressure (P) and temperature (T), before decreasing ( Figure 1). P-T loops can be either clockwise or anti-clockwise, depending on the relative importance and timing of crustal thickening, erosion and heat movement by igneous intrusion (e.g. Harley, 1989). A key concept resulting from these early studies was that of a 'metamorphic field gradient' (also known as a 'P-T array', 'metamorphic geotherm' or 'piezothermic array'), which represents a line in P-T space joining the highest-temperature points on the P-T loops experienced by rocks undergoing a common metamorphic event but with different starting depths (Figure 1) (e.g. Richardson, 1992;Spear, 1993;Todd & Engi, 1997). Because preserved metamorphic assemblages tend to relate to the peaktemperature conditions the rocks have experienced, owing to dehydration reactions for most lithologies having steep lines in P-T space (Guiraud & Powell, 2001), the field gradient is often one of the main pieces of information that can be obtained from exhumed metamorphic belts.
Ongoing developments in obtaining estimates of the P-T conditions rocks have experienced (e.g. Green et al., 2016;Powell & Holland, 2008a;White et al., 2014) and the timings and duration of metamorphic events (e.g. Kylander- Clark et al., 2013;Taylor et al., 2016) have occurred at the same time as dramatic increases in computing power. The combination of these developments has resulted in suites of numerical models that examine the thermal evolution of mountain ranges in two or three dimensions, sometimes dynamically coupled to the deformation of the ranges (e.g. Batt & Braun, 1997;Bollinger et al., 2006;Craig et al., 2020;Herman et al., 2010;Huerta et al., 1999;Jamieson et al., 2002;Jamieson & Beaumont, 2011;Ruppel & Hodges, 1994;Sonder & Chamberlain, 1992). Such models have the ability to simulate entire orogenic cycles, to provide insights into the important coupling between the thermal and mechanical structures of mountain ranges, and to investigate the controls on the resulting dynamics of the deformation. Nevertheless, one-dimensional models have also continued to be used (e.g. Clark et al., 2015;Clark et al., 2011;Depine et al., 2008;Jackson et al., 2008;McKenzie et al., 2019;Smye et al., 2011;Sandiford & McLaren, 2002). This range of approaches reflects the different strengths and weaknesses of the one-dimensional and higherdimensional approaches, and our understanding of the thermal evolution of the lithosphere benefits from synthesizing the results of both styles of model. Two-and three-dimensional models are able to make use of the spatial patterns of lithologies and deformation, and are able to model all tectonic settings, regardless of the rates and geometries of motion. As such, effects including rapid horizontal advection can be included in the models. However, the computational time required is greater than for one-dimensional models, and a greater number of model parameters are required. Additionally, coupling such thermal models to dynamic deformation models includes an additional computational cost. It is therefore possible to understand in detail the behaviour of a subset of combinations of model parameters, but not to investigate all possible combinations of geologically feasible input parameters. One-dimensional models are therefore used when the aim is to investigate the effects of large ranges of parameter combinations (as is the case here), in model settings where lateral motions are less important than vertical diffusion and vertical motions in controlling the temperature (discussed in detail below), or in cases where models with small numbers of adjustable parameters are desired in order to isolate particular aspects of the model behaviour. In this paper, we seek to explore a wide range of the geologically plausible parameter space and so make use of one-dimensional models. We examine below which mountain belts these models can be applied to, which is most (but not all) of the configurations seen at the present day.
Coupled to observations of metamorphic rocks, deformation structures, and low-temperature thermochronology, a number of important concepts and debates have emerged from previous thermal models. For example, a long-running research theme has revolved around efforts to constrain the rate and distribution of radiogenic heating in the lithosphere, particularly in the crust, with implications for what sources of heat drive metamorphism and melting (e.g. Sandiford & McLaren, 2002). Some models suggest that the tectonic redistribution of radiogenic elements, for example in accretionary wedges, is required in order to reproduce the observed characteristics of metamorphism in mountain belts (e.g. Jamieson et al., 1998). A second class of models emphasizes the convective removal of the lower lithosphere as a means of heat input to drive crustal metamorphism (e.g. Houseman et al., 1981;Ueda et al., 2012). However, other models are able to match a range of observational constraints with modest rates of heat flux from the convecting mantle, and crustal radioactivity playing the dominant role in increasing the temperature within mountain belts (e.g. Craig et al., 2020;. A second prominent open question regarding the thermal evolution of mountain belts concerns the relative importance of vertically coherent thickening of the entire crust or lithosphere (e.g. Chang et al., 2017;England & McKenzie, 1982), and the underthrusting of relatively rigid foreland material (e.g. Argand, 1924;Barazangi & Ni, 1982;Molnar & Lyon-Caen, 1988;Nabelek et al., 2009), which is a balance that probably varies between mountain belts, and even between different parts of the same range (e.g. Flesch F I G U R E 1 (a) The two styles of thickening discussed in this paper: homogeneous thickening of the entire lithosphere (left) and thickening of the upper-to mid-crust above rigid lower crust (right). (b) Example P-T paths, showing the style of loops resulting in different relative magnitudes of ΔT and ΔZ, as defined on the upper right panel. ΔT is the temperature difference between the maximum-pressure and maximum-temperature points of each P-T loop, and ΔZ is the depth difference between these same points. The lower left panel shows the metamorphic field gradient, as defined by the maximum temperature conditions of a series of samples from the same exhumed mountain belt. The centre panel illustrates where the displayed P-T loops plot in ΔZ-ΔT space. The P-T trajectories are commonly known as (clockwise from upper left): isothermal decompression; open P-T loops; isobaric heating; and hairpin P-T loops Penney & Copley, 2021). These different mechanisms of thickening predict, and possibly depend upon, significantly different patterns of thermal evolution, as discussed below. Finally, significant recent attention has focussed on the links between earthquake locations and depths, temperature, the past history of partial melting and metasomatism, and continental rheology and deformation (e.g. Austrheim et al., 1997;Burov, 2010;Copley et al., 2021;Jackson et al., 2021;Menegon et al., 2021;. This debate has partially been based upon, and also motivates, renewed efforts to explore the thermal evolution of the continents, and the causes and rheological consequences of the evolution of metamorphic facies. Some aspects of the debates described above require additional observations in order to address. However, in some respects, we can make progress using the thermal models presented in this paper. We use one-dimensional models, which we show below to be applicable to the majority of mountain belts, and make use of the computing power now available to perform tens of thousands of model runs, varying all input parameters within geologically reasonable bounds. The wide sweep through the parameter space that we use (which is similar in approach to studies that have been undertaken in extensional settings (e.g. Hansen & Nielsen, 2002;Sandiford, 2003) allows us to develop new conclusions regarding the thermal evolution of mountain ranges, and the diagnostic features in the metamorphic record that shed light on key aspects of the evolving temperature structure. In particular, (1) we show that the curvature of P-T loops allows us to place direct constraints on the rates of radiogenic heating, thickening, and erosion, and the initial crustal thickness, which are some of the main uncertainties in models of mountain-building; (2) we describe the characteristics that can be used to infer whether past mountain-building events involved the vertically coherent thickening of the entire crust or the underthrusting of foreland material; (3) we explore the range of average geothermal gradients that are transiently sampled by rocks in evolving mountain belts, thereby demonstrating under what circumstances such gradients can be used to infer the presence of subduction; and (4) we show that these one-dimensional models can drastically narrow down the parameter space that would need to be searched by future higher-dimensional models of a region, thereby allowing such models to more efficiently search only the relevant parts of the parameter space. In tandem, we also show the trade-offs between often poorly known model parameters (e.g. the initial crust and lithosphere thickness, and the rate of radiogenic heating), which should be carefully considered when interpreting the results of previous higher-dimensional models that were not able to explore full ranges in these parameters due to computational constraints.
We begin by describing the methodology of our models and under what conditions our one-dimensional configuration is applicable. We then present the model results and describe the dominant controls on the resulting P-T loops. We then apply our model results to documented suites of metamorphic rocks for which only fragments of the P-T loops have been determined. We demonstrate these fragmentary records can be used to infer the history of mountain building in these regions, producing results in agreement with independent constraints. Finally, we discuss the wider implications of our results for understanding past and present mountain building events using analyses of metamorphic rocks.

| Applicability of calculating temperatures in one dimension
One-dimensional thermal models are not applicable to all mountain ranges. Modelling a vertical column through the lithosphere allows us to model vertical diffusion and the vertical motion of the rocks (which causes advection of the temperature field). Our models are inappropriate for cases where horizontal advection or diffusion controls the temperature structure. The absolute rate of horizontal motion of the entire lithosphere does not affect the temperature field, which is why the cooling of the oceanic lithosphere can be calculated taking into account only vertical diffusion (e.g. Parsons & Sclater, 1978). However, the relative horizontal motion of a depth interval within the lithosphere relative to the rest of a vertical column (i.e. vertical gradients of horizontal velocity) does reduce the applicability of one-dimensional models. This effect arises because the rocks from a neighbouring region are transported into part of the vertical column being considered, bringing their heat with them. A common example of where this process occurs is where subduction advects cool lithosphere into the convecting mantle.
We here outline the situations in which the thermal evolution of a mountain range can be approximated using a one-dimensional model. The relative importance of advection and diffusion of heat is given by the Peclet number, Pe ¼ lu=κ, where l is the characteristic lengthscale, u is the velocity and κ is the thermal diffusivity. However, we cannot directly apply this formulation to mountain ranges in order to establish the relative importance of advection and diffusion, because there are two lengthscales involved: the vertical lengthscale that governs diffusion, and the horizontal lengthscale, coupled with the rate of motion, that governs the advection of heat by the underthrusting of foreland lithosphere on the edges of mountain belts. We therefore proceed by examining the characteristic timescale for each process. For vertical diffusion through a slab of material, the characteristic timescale is given by τ d ¼ l 2 v =π 2 κ (Carslaw & Jaeger, 1959;Molnar & England, 1990;Parsons & Sclater, 1978), where l v is the vertical lengthscale. The timescale for horizontal advection is given by τ a ¼ l h =u, where l h is the horizontal lengthscale. By taking the ratio of these timescales, we can define R ¼ τ d =τ a ¼ l 2 v u=π 2 κl h , where larger values correspond to increasing importance of horizontal advection over vertical diffusion.
The relevant vertical lengthscale is the thickness of the crust, as recent studies have demonstrated that radiogenic heating in, and diffusion through, the crust dominates the diffusive behaviour in mountain belts, and much of the lithospheric mantle changes temperature only modestly during mountain-building (e.g. Craig et al., 2020;McKenzie & Priestley, 2016). If the lithospheric mantle is removed during mountain-building, then a second vertical lengthscale will be introduced into F I G U R E 2 The ratio of diffusive and advective timescales (as described in the text), as a function of crustal thickness and underthrusting velocity, for values of l h of (a) 100 km and (b) 300 km. For values less than one (blue), the diffusive timescale is shorter, and for values greater than one (red), the advective timescale is shorter. Bold contours show powers of 10. Mountain belts that are part of the Himalaya-Tibet-Pamir system are shown as italic labels, and those in the rest of the world are shown as bold labels. In cases where the crustal thickness increases over a horizontal distance greater than 100 km from the range-front, the thickness used on the l h = 100 km graph is that at 100 km from the range-front. In other cases, and on the l h = 300 km graph, the crustal thicknesses used are the maximum observed values. The mountain ranges are (1) northwest Zagros (Khorrami et al., 2019;Motaghi et al., 2017); (2) southeast Zagros (Khorrami et al., 2019;Paul et al., 2006); (3) eastern Greater Caucasus (Gok et al., 2011;Khorrami et al., 2019); (4) western Greater Caucasus (Gok et al., 2011;Khorrami et al., 2019); (5) Kopeh Dagh (Khorrami et al., 2019;Motaghi et al., 2012); (6) Alborz (Khorrami et al., 2019;Radjaee et al., 2010); (7) Peruvian Sub-Andes (Ryan et al., 2016;Villegas-Lanza et al., 2016); (8) Bolivian Sub-Andes (Ryan et al., 2016;Weiss et al., 2016); (9) Columbian Eastern Cordillera (Mora-Paez et al., 2019;Poveda et al., 2015); (10) The Dinarides (Metois et al., 2015;Stipcevic et al., 2020); (11) western Tien Shan (Vinnik et al., 2004;Zubovich et al., 2010); (12) eastern Tien Shan (Li et al., 2020;Lu et al., 2019;Zheng et al., 2017); (13) Longmen Shan (Tian et al., 2020;Zheng et al., 2017);(14) northeast Pamir (Li et al., 2012;Schneider et al., 2019); (15) western Pamir/ Hindu Kush (Ischuk et al., 2013;Schneider et al., 2019); (16) northern Tibet (Tarim margin) Zheng et al., 2017); (17) Qilian Shan (Tian et al., 2014;Zhang et al., 2004); (18) Nepal Himalaya (Priestley et al., 2019;Stevens & Avouac, 2015); (19) Garwal-Kumoan Himalaya (Priestley et al., 2019;Stevens & Avouac, 2015); (20) Pakistan Himalaya (Priestley et al., 2019;Stevens & Avouac, 2015) the formulation, which is the new thickness of the lithosphere. However, this lengthscale is larger than the crustal thickness, and this geological process is itself now not thought to be widespread or significant following the seismological imagining of thick lithosphere beneath the world's mountain belts, and petrological estimates of it having a low enough density to resist convective removal (e.g. McKenzie & Priestley, 2016). The relevant horizontal lengthscale is the distance into the mountain range that rocks are advected by underthrusting, before being exhumed (illustrated on Figure 2). The relevant velocity is the relative horizontal velocity between the underthrusting lithosphere and the surface of the over-riding mountain belt. Care must be taken not to confuse this value with the overall convergence rate between the plates bounding a deformation zone. One of the hallmarks of continental tectonics is that the deformation is spatially distributed, which means that the overall convergence between bounding plates is distributed between numerous active deformation belts (e.g. England & Jackson, 1989;Molnar & Tapponnier, 1975). For example, of the India-Asia convergence, roughly half is accommodated by the Himalaya, and the remainder in a series of mountain belts (e.g. the Tien Shan, the Mongolian Altai) distributed over thousands of kilometres north of the Himalaya (e.g. Calais et al., 2003;England & Molnar, 1997b;Stevens & Avouac, 2015;Zubovich et al., 2010). Similarly, at the longitude of central Iran, the overall Arabia-Eurasia convergence is distributed between the Zagros Mountains, the Alborz mountains and underthrusting beneath the Apsheron-Balkhan sill in the central Caspian (e.g. Allen et al., 2004;Jackson et al., 2002;Vernant et al., 2004). The relevant velocity to use when assessing the importance of horizontal advection is therefore the underthrusting rate on the margin of the mountain belt in question, rather than the overall convergence rate between the plates bounding the entire distributed collision zone.
The value of R is shown on Figure 2, on graphs of crustal thickness against the velocity of underthrusting. A value of 1Â10 À6 m 2 /s was used for the thermal diffusivity (κ), which represents an average of the temperature-dependent values over the relevant temperature range (Miao et al., 2014;McKenzie et al., 2005;Whittington et al., 2009), although the full temperaturedependence of this parameter is used in the calculations described below. Two graphs are shown, which represent different horizontal lengthscales (l h ), the relevant one of which depends on the mountain belt being considered. For example, in some mountain belts, distances of 100 km from the range-front represent the down-dip limit of brittle faulting on range-bounding thrusts, so the temperatures are in the range 300-400 C, the deformation is dominantly brittle, and no mid-to highgrade metamorphism is occurring (e.g. Bollinger et al., 2006;Herman et al., 2010). In such cases, any mid-to high-grade metamorphic rocks now exposed near the range-front must be sourced from significantly greater horizontal distances within the range, and the graph for l h = 300 km is relevant. However, in other mountain belts, high temperatures can be achieved much closer to the range-front (e.g. Sandiford, 2002), and a value of l h of 100 km is more appropriate. For this reason, graphs for l h of 100 and 300 km are shown. Note that this parameter is different to the distance between mapped metamorphic rocks and the adjacent range-front thrust, because l h represents the distance of underthursting required to reach the furthest location sampled within the mountain belt, before subsequent exhumation towards the range-front (as illustrated on Figure 2).
In the regions of Figure 2 shown in blue, the timescale for vertical diffusion is shorter than that for horizontal advection, meaning that one-dimensional models provide an acceptable approximation to the temperature field. Although such models may not be correct in fine detail, vertical diffusion has a dominant enough effect that the principles underlying the thermal evolution can be understood. This applicability is reduced if the crustal thickness is high (which increases the diffusive timescale) or if the rate of underthrusting is high (which reduces the advective timescale). Also shown on Figure 2 are the parameters of 20 present-day mountain belts. In cases where the crustal thickness increases over a horizontal distance greater than 100 km from the range-front, the thickness used on the l h = 100 km graph is that at 100 km from the range-front. In other cases, and on the l h = 300 km graph, the crustal thicknesses used are the maximum observed values. Italic labels show individual mountain belts that are part of the Himalaya-Tibet-Pamir system, and bold labels show a variety of the Earth's other mountain belts (with the key provided in the figure caption). This figure demonstrates that for the parameters relevant to the great majority of present-day mountain belts, one-dimensional models are an acceptable approximation to the temperature structure. The exceptions are in some parts (but not all) of the Himalaya-Tibet-Pamir system, and close to the rangefront (i.e. l h =100 km) in the western Tien Shan. The crustal thickness and rates of motion in ancient mountain belts are less well known, but we expect our onedimensional models to also be applicable in the majority of these mountain belts (e.g. Sandiford, 2002). However, we emphasize that our results should not be applied to the subset of the fastest and thickest mountain belts, such as the central Himalaya.

| Model geometry and parameters
We model advection and diffusion in vertical columns through the lithosphere, and radiogenic heating. Diffusion and radiogenic heating, with temperature-dependent thermal parameters, are given by where T is temperature, t is time, ρ is density, C is heat capacity, k is the thermal conductivity and H is radiogenic heating. The advective component of the thermal evolution is given by where v is the vertical velocity (which incorporates the effects of both thickening and erosion). Diffusion, advection and radiogenic heating operate simultaneously, and we solve these coupled equations using operator splitting methods, as described below. We use temperature-dependent thermal conductivity and heat capacity, which has an important effect on the resulting modelled temperatures (e.g. McKenzie et al., 2005;Whittington et al., 2009). For the crust, we use the granodiorite values of Miao et al. (2014), and for the mantle, the olivine values from Hofmeister (1999) (using the paramaterization of McKenzie et al., 2005). These temperature-dependent thermal parameters are now well enough known that the associated uncertainties in the resulting thermal evolution of the lithosphere are small when compared to the uncertainties associated with other model parameters (e.g. radiogenic heating, the initial geometry and the evolution of thickening and erosion). The upper surface of the model is held at a temperature of 0 C, and the base of the lithosphere at the isentropic temperature for that depth, calculated using a mantle potential temperature of 1315 C. The initial temperature distribution is set by the initial crust and lithosphere thicknesses and the distribution of radiogenic heating. Lithosphere in this configuration is allowed to transfer heat by diffusion for 1 Ga in order to attain thermal steady-state, before thickening begins. We use operator-splitting to separately calculate the effects of diffusion and advection (e.g. Emmerson & McKenzie, 2007;Press et al., 1992;Weller et al., 2019). The diffusive term is solved using a Crank-Nicholson (joint implicit-explicit) finite difference scheme, and the advective term by upwind differencing (e.g. Press et al., 1992). The temperature dependence of the thermal conductivity and heat capacity are incorporated into the models by iteratively looping over each time step until the initial and final temperature distributions for that time-step are consistent with the thermal parameters calculated for the start and end of the time-step.
We model two different styles of thickening of the lithosphere: the homogeneous thickening of the entire lithosphere and the thickening of the upper crust above a rigid lower crustal substrate (Figure 1a). The second configuration is similar to that on the margins of many present-day mountain ranges, where thickening of the upper crust occurs above rigid underthrusting foreland lithosphere (e.g. Craig et al., 2012;Lamb et al., 1997;Ni & Barazangi, 1984;Nissen et al., 2011). The presence of rigid lower crust in such settings is thought to be due to the presence of pre-existing anhydrous lithologies, based on the analysis of earthquake depths, elastic thickness estimates, thermal structures, and the compositions of igneous rocks (e.g. Jackson et al., 2008). In both models, the thickening is achieved by vertically stretching the depth interval that is straining, equivalent to vertical planes thickening by pure shear. Relative to the surface, there is therefore a linear velocity gradient to the base of the thickening layer (either the upper/mid crust, or the entire lithosphere, depending on the model set-up). For thickening above rigid lower crust, the velocity relative to the surface is therefore constant in all material beneath the base of the layer undergoing strain. This style of thickening is directly applicable to folding or spatially continuous ductile strain, and is an acceptable approximation to thickening by the addition of distinct thrust sheets provided that those thrust sheets are thin enough that their diffusive thermal time constant is small compared with the duration of the modelled mountain building. As an example, a 15 km-thick thrust sheet has a diffusive thermal time constant of less than 1 Myr, suggesting that this approximation is valid, given the commonly observed range of thrust sheet thicknesses in regions where that style of shortening occurs (e.g. DeCelles et al., 2001;Searle et al., 2019).
The geometry, rates and timescales of the mountain building and erosion are varied in our models, and Table 1 shows the parameters, ranges and sampling increments we have used. We explore models where the initial crust and lithosphere thicknesses vary in the range 20-50 km and 100-250 km, respectively, in order to span the dominant range of these parameters on the Earth at the present day. This range includes continental margins but excludes regions where thickening is currently underway, as these parameters reflect the situation before thickening began. We do not distinguish between crystalline basement and sedimentary cover, all of which is considered as part of the crustal thickness. This decision is based upon the lengthscale of sedimentary compaction being short enough (e.g. <5 km; Athy, 1930;Hoggard et al., 2016) that in all except the very shallow sedimentary section the rocks have similar thermal parameters to the crystalline basement (e.g. Copley et al., 2009). Additionally, the measured rates of radiogenic heating in sedimentary rocks cover the same range as those from igneous and metamorphic rocks (e.g. Hasterok et al., 2018).
We use a rate of radiogenic heating that is the same throughout the crust, including in the rigid lower crustal substrate in models where it is present (discussed further in Section 2.3) and which is varied in the range 0.5-3.0 μW/m 3 in order to encompass the majority of measurements from crustal rocks (Hasterok et al., 2018). We use no radiogenic heating in the mantle, because this value is poorly known, thought to be at least an order of magnitude lower than the rate of radiogenic heating in the continental crust, and geotherms calculated with no radiogenic heating in the mantle provide a good match to xenolith P-T estimates (e.g. Rudnick et al., 1998;Rudnick & Nyblade, 1999). The duration of thickening is varied between 5 and 55 Myr, which spans the likely duration of thickening at any given point within an orogenic system (e.g. Weller et al., 2021). Individual mountain ranges may persist for longer than this time period, but thickening at any given point within them is unlikely to occur for longer than this value, before thickening then migrates to the edges of the growing range. The vertical thickening rate is varied between 0.5 and 4.5 mm/year for thickening above rigid lower crust. For homogeneous thickening of the entire lithosphere, we use values of 5-30 mm/year, which corresponds to rates of thickening in the crustal portion of the lithosphere of 0.7-6 mm/year for cases where the crust occupies 15-25% of the total initial thickness of the lithosphere. Erosion is modelled as being zero for crustal thicknesses less than 40 km (i.e. the surface is likely to be topographically low or submarine) and is modelled as being linearly dependent on crustal thickness (and so elevation, due to isostatic compensation) for thicknesses above 40 km. We use a relationship between erosion rate and thickness given by E ¼ CðZ c À 40Þ=10 12 , where E is the erosion rate in m/s, Z c is the crustal thickness in km, C is the erosion coefficient, and the factor of 10 12 is present to result in rates of erosion on the order of millimetres per year. We use values of C of 0.5-4.5/s, corresponding to erosion rates for 80 km thick crust of 0.5-5.7 mm/year, spanning estimates from a variety of present-day mountain ranges (e.g. Montgomery & Brandon, 2002). Changing the erosion law to depend on the square or exponential of the thickness would have the effect of steepening the high- Surface temperature 0 C Mantle potential temperature 1315 C pressure sections of our calculated P-T loops and decreasing the gradient of the low-pressure sections. We note that different parts of a mountain belt may have different erosion coefficients, for example if parts of the range are drained by rivers connected to the lowlands and parts are an internally drained plateau. For models of thickening above rigid lower crust, based on observations from present-day mountain ranges, we vary the initial depth to the top of the rigid crust between 5 and 20 km (e.g. Betka et al., 2018;Devlin et al., 2012;Elliott et al., 2016Nissen et al., 2011Scharer et al., 2004;Triep et al., 1995). The depth to the top of the rigid part of the crust increases through time due to the thickening in the overlying material, and decreases due to erosion of the overburden. We run the models for a total of 150 Myr, and thickening begins at the start of the model run, so there is sufficient time for erosion and cooling. In our models, during thickening, we limit the maximum depth of the base of the lithosphere to be 280 km. This value was chosen based upon seismological observations that indicate the thickest lithosphere at the present day is 250-280 km, beneath the Tibetan Plateau (Priestley & McKenzie, 2013). Following erosion of the thickened part of the crust from Tibet, the remaining lithosphere thickness would be similar to the $180-220 km observed in cratonic areas of the continental interiors that have been tectonically stable since formation (Priestley & McKenzie, 2013). Taken together, these observations imply that there may be a maximum thickness of lithosphere that can be attained before the base is removed, possibly by small-scale convection or shear erosion of (potentially metasomatized) material on the base and margins of lithospheric roots (e.g. Cooper & Conrad, 2009;Griffin et al., 2009;Lenardic & Moresi, 1999). Therefore, in our models, if the lithosphere thickens beyond 280 km, at each timestep we remove the base and apply the isentropic temperature at that depth. The time constant for diffusion through the entire lithosphere of such thicknesses is long enough that this measure has a limited effect on the temperatures in the crust, but is included here for completeness.
As kinematic models of mountain building, our results express temperature as a function of depth and time. Confining pressures are much greater than deviatoric stresses (e.g. Copley, 2018;England & Molnar, 1991;Flesch et al., 2001) meaning that we can simply convert from depth to pressure, for comparison with metamorphic pressure estimates. In the following text, we therefore treat depth and pressure as equivalent and convert between them using a crustal density of 2800 kg/m 3 and a mantle density of 3300 kg/m 3 . In all models, the reference frame is fixed to the Earth's surface, so thickening results in downwards advection of rocks, and erosion results in upwards advection.

| Effects not included in the model
We use a constant rate of radiogenic heating throughout the crust, which we do not re-distribute if the temperature exceeds the solidus (e.g. as done by McKenzie & Priestley, 2016). This approach is taken because of recent results suggesting that partial melting has a limited effect on the rates of radiogenic heating in the partitioned melt and residue, due to the growth and/or retention of relatively high-Th accessory minerals in the residue compensating for the removal of other heat-producing elements during anatexis and melt migration (Alessio et al., 2018;Weller et al., 2020;Yakymchuk & Brown, 2019). This result is consistent with the lack of observed correlation between metamorphic grade and rate of radiogenic heating in global compilations (Hasterok et al., 2018). However, the presence of high rates of radiogenic heating in some (but not all) granites (Hasterok et al., 2018) implies that in some cases, a mechanism exists to concentrate the heat-producing elements. Our assumption of constant radiogenic heating throughout the crust means that the values we discuss below represent averages over the thickness of the crust. We note that our assumption contrasts with estimates of the average composition of stratified continental crust (e.g. Rudnick & Gao, 2014;Hacker et al., 2015). It is an open question how to reconcile these models and the observations of limited differences in radiogenic heating between sub-and suprasolidus rocks (Alessio et al., 2018;Bea & Montero, 1999;Weller et al., 2020;Yakymchuk & Brown, 2019). We therefore take the simple approach of assuming constant radiogenic heating throughout the crust. Once we have discussed below the behaviour of the models, we will then return to this point and discuss the effects of redistributing radiogenic heating within the crust, which we show does not significantly affect our results.
We also neglect the effects of shear heating. If differential stresses are large, then at geological strain rates, shear heating can be an important factor in the thermal evolution of a mountain belt (e.g. Burg & Gerya, 2005;Wang et al., 2013). However, a range of observational and modelling studies have found that differential stresses in both the brittle and ductile parts of the lithosphere are significantly lower than predicated by laboratory estimates, and in deformation belts are thought to be dominantly ≤20 MPa, with an upper bound of ≤50 MPa. Such values are required by (1) seismological and geodetic analyses of earthquakes, interseismic and postseismic deformation (e.g. Bollinger et al., 2004;Copley & Jolivet, 2016;Kanamori & Anderson, 1975); (2) the geometry, evolution and thermal effects of fault systems (e.g. Copley, 2018;Herman et al., 2010;Lachenbruch & Sass, 1980); and (3) the deformation that occurs in response to the forces that drive the relative motions of the tectonic plates, and that arise from the lateral gradients in gravitational potential energy that result from crustal thickness contrasts (e.g. Copley & McKenzie, 2007;Dalmayrac & Molnar, 1981;England & Molnar, 1997a;Flesch et al., 2001). For a strain rate of 10 À14 /s, which is as large as is likely to be routinely achieved throughout significant volumes of rock (Fagereng & Biggs, 2019), such differential stresses would result in rates of shear heating of order 10 À7 W/m 3 (as dissipation in the viscous regime is governed by the product of the deviatoric stress and the strain rate). This heating rate is significantly less than that due to radiogenic heating, suggesting the effect can be neglected in our calculations.
We have not included the effects of melt moving heat with it through the crust. This assumption means we limit our analysis to regions with little large-scale plutonism (see Wells, 1980, for an example of the opposite situation, where intrusion dominates). This approach is supported by the limited size of contact metamorphic aureoles surrounding large intrusions, which are generally thinner than the widths of the associated intrusions (e.g Ashworth & Tyler, 1983;Beyssac et al., 2019;Tilley, 1924). This observation implies that the thermal imprint of heat advection by magma only affects a small proportion of the crust, and therefore does not dominate the temperature field except in regions containing multiple large, closely spaced, intrusions, where our models should not be used. The effects of heat transport by magma could be included in future models, once the dynamics of melt formation, movement and emplacement are better understood.
We also do not include the thermal effects of partial melting, expressed as the latent heat of fusion. For typical values of this parameter in crustal rocks (e.g. 300-400 kJ/kg; Clark et al., 2011;Schorn et al., 2018), and the melting of fertile lithologies by large amounts (e.g. 30 vol. %), with a heat capacity of 1200 J/K kg (e.g. Miao et al., 2014), then the resulting temperature perturbation compared to models that do not include this effect will be $100 C. This rough calculation is confirmed by the results of Clark et al. (2011) and Schorn et al. (2018), who both demonstrated that heating to temperatures of over 1000 C resulted in temperature differences of 50 C (Clark et al., 2011) to 100 C (Schorn et al., 2018) between models that did and did not consider latent heat. For crustal sections in which there is a limited volume of rock able to melt by tens of percent at low enough temperatures (Schorn et al. (2018) used a pure pelite composition), this effect will be smaller. We therefore do not include this effect in the models. However, it should be noted that in the hottest models included here, if the crust is dominantly pelitic, then lower crustal temperatures could be up to 100 C lower than we model. Additionally, the rates of heating would be lower than modelled here, in proportion to the different maximum temperatures attained.
Finally, we do not vary through time the rate of thickening in our models, and use a constant rate for the entire duration of mountain building. This decision is due to the variable extent to which changes in the motions of bounding plates occur during mountainbuilding events (e.g. Copley et al., 2010;McQuarrie et al., 2003;Patriat & Achache, 1984), and uncertainties over the ways any changes are accommodated in laterally distributed deformation zones.

| MODEL RESULTS
Of the models generated by all the parameter combinations given in Table 1, we examine in detail those models that result in the conditions typical of exhumed continental mountain belts (sometimes referred to as 'Barrovian' or 'regional' metamorphism). These conditions typically result in a sequence of commonly observed mineral zones, for example in metapelites the successive appearance of chlorite, biotite, garnet, staurolite, kyanite and sillimanite. Such assemblages are commonly observed in regions of past continental collision (e.g. Barrow, 1912;Stipska & Powell, 2005;Moynihan & Pattison, 2013) and hereafter referred to as the 'Barrovian P-T range'. For a model to be considered a successful reproduction of these conditions, the high-temperature point (which is generally the preserved peak assemblage, because dehydration reactions in most lithologies have steep trajectories in P-T space, Guiraud & Powell, 2001) for at least one level within the crust must fall within the range shown by the grey dashed polygon in Figure 3a, corresponding to 450-950 C and 3-12 kbar ($11-45 km depth). These conditions were chosen to be a broad representation of typical conditions experienced by regions of past mountain building. The lower pressure limit is chosen to be close to the upper pressure limit of andalusite stability, and the upper pressure limit to be close to the maximum pressure of staurolite stability in typical pelitic assemblages. The temperature range spans from the greenschist facies to the granulite facies. Such conditions can also be achieved in extensional and back-arc settings (e.g. Currie & Hyndman, 2006;Sandiford & Powell, 1986), but this paper only considers metamorphism related to crustal thickening. Structural observations and the compositions of any associated igneous rocks can be used to distinguish between these settings.
We reject models in which the crust gets unrealistically hot, which we take to mean hotter than 1100 C, but enforce that some part of the crust must reach at least 450 C. The upper temperature limit is based upon observations from exhumed crustal sections, and xenoliths from active mountain belts, that show such temperatures are attained at the present day and in the geological past (e.g. Hacker et al., 2000;Harley, 2004;Hokada, 2001). Although such high temperatures are rarely documented, we include them in order to sample the full range of models that are compatible with known conditions, and to account for the potential sampling bias related to the limited surface exposures of rocks from the lowermost crust of orogenic belts. Successful models are considered to have attained a crustal thickness of at least 45 km, but no more than 90 km (as greater values would be unsupportable by the strength of the lowlands bordering mountain ranges; Dalmayrac & Molnar, 1981;Copley, Avouac, Hollingsworth, et al., 2011). We specify that points within the Barrovian P-T range must have increased in temperature by at least 200 C, and in depth by 5 km, in order to select models in which there has been appreciable prograde metamorphism. We also specify that the metamorphic field gradient in successful models must lie in the range 0-45 C/km for rocks within the P-T window defined above, based on observations from exhumed mountain ranges (e.g. Kohn, 2014a;Todd & Engi, 1997).
3.1 | Homogeneous thickening of the entire lithosphere 3.1.1 | Tectonic settings corresponding to the model Homogeneous thickening of the entire lithosphere requires an absence of significant spatial contrasts in rheology, so that there are negligible vertical gradients in horizontal velocity produced in response to the compressive forces driving the mountain building and the resulting gravitational potential energy contrasts (equivalent to the 'thin viscous sheet' model of England & McKenzie, 1982). Situations such as the underthrusting of relatively rigid foreland material, or a weak décollement horizon, would result in the concentration of thickening in certain depth ranges, as considered below, rather than the homogeneous thickening modelled in this case. In the geological record, homogeneous thickening would be preserved as steeply dipping foliations, unless overprinted by later post-thickening fabrics (e.g. Sandiford, 1989). At the present day, a location where this style of deformation may be occurring is in the Quidam Basin and Qilian Shan on the northeastern margin of the Tibetan Plateau, where spatially widespread thrust faulting occurs in regions distant from any clear signs of underthrusting of rigid lithosphere. Northern Tibet (north of the underthrusting nose of rigid Indian lithosphere) and the Central Altiplano in the Andes may have undergone this style of deformation in the past (e.g. Lamb et al., 1997;Yin & Harrison, 2000), before thickening migrated onto the propagating margins of the mountain ranges.
3.1.2 | Homogeneous thickening: Successful parameter ranges For the case of homogeneous thickening of the entire lithosphere, a small proportion of the tested parameter combinations (596 of 17,280 models, corresponding to 3.4%) are consistent with the Barrovian metamorphic conditions described above. The maximum depths experienced by the modelled rocks depend on the imposed geometry of shortening (i.e. the thicknesses and rates of motion), in competition with the rate of erosion. The peak temperature depends upon the rate of radiogenic heating, the total thickness of the crust, and the timescale over which the crust is thick. Figure 3 shows an example model result: Figure 3a illustrates eight P-T loops corresponding to rocks initially located at 2.5 km depth intervals throughout the 20 km thick crust, and Figure 3b shows the evolving geotherm at selected time intervals (with temperature in the crust shown by thick lines and in the mantle by thin lines). There is a continuum of conditions between these snapshots in time and between the spatial snapshots shown by the P-T loops of the selected rocks. As the lithosphere thickens, isotherms are advected downwards relative to the surface. In Figure 3, the temperatures at a given depth are initially cooler than the starting geotherm, due to this downwards advection relative to the surface. Temperatures then increase to be warmer than the initial geotherm due to radiogenic heating in the thickened crust. After the end of thickening at 35 Myr, erosion results in the rocks being advected towards the surface (at progressively decreasing rates, due to the form of the erosion law used in the models). Heating continues after thickening is complete and lasts until erosion has sufficiently reduced the crustal thickness and therefore reduced the amount of radiogenic heating. After 150 Myr, the thermal perturbation to the crust because of thickening has not equilibrated with the entire thickness of the lithosphere, and temperatures in the lower crust are greater than those in the upper mantle. The geotherm at times ≥30 Myr shows a temperature decrease with depth in the lower crust, resulting from the radiogenic heating causing the mid crust to heat up, and heat is conducted downwards into the cooler mantle (as also found by McKenzie & Priestly, 2008). Rocks that are initially positioned within 5 km of each other in depth have timings of peak temperatures that differ by tens of Myr (the coloured circles on Figure 3a are at intervals of 15 Myr), due to the style of thickening causing their depths to significantly diverge, and so the rates of heating, and cooling due to exhumation, to significantly differ. Diverse timings and grades of peak metamorphism can therefore be produced in a tectonically simple setting, without the presence of multiple tectonic events. Figure 4 shows the range of parameters that are consistent with the constraints described above, showing that some parameters are well-constrained and some not. In the following analysis, we distinguish between models in which the metamorphosed rocks did (black bars) and did not (grey bars) undergo cooling during the orogenic cycle in the model, by examining whether the points within the Barrovian P-T range decreased in temperature by at least 200 C between the peak conditions and the end of the model duration (150 Myr in all cases). The models in which this cooling does not occur are ones where the rocks experience Barrovian conditions at the lower bound of the defined pressure range, and the limited amount of subsequent erosion of the crust back to 40 km thickness does not result in significant cooling. Because of this configuration, such models dominantly start with low crustal thicknesses, enabling prograde metamorphism to occur within relatively thin crust. It is worth distinguishing between the models that do and do not involve decreases in metamorphic conditions at the end of the model run as these two scenarios represent distinct phenomena in the geological record, of rocks either undergoing an orogenic cycle of thickening and erosion, or the construction of normal-thickness crust that can then persist until affected by a subsequent tectonic event.
Models that fit the temperature constraints described above involve rates of radiogenic heating of 1.0-2.5 Â 10 À6 W/m 3 . Lower values result in too little heating to develop sufficiently high temperatures in the upper and middle crust over the range of mountain building durations considered here, and higher values result in too high temperatures and thermal gradients. The temperatures in the lower crust could potentially be lowered if significant melting occurs and the latent heat of fusion is included in the models, but this effect is likely to be less than 50-100 C (Clark et al., 2011;Schorn et al., 2018). The rate and duration of thickening have a strong control on the resulting thermal structure, and trade-off against each other, such that lower rates of thickening must occur for longer if models are to fit the constraints (i.e. a sufficient total amount of thickening must occur). The details of the trade-offs between model parameters are shown in Appendix A.
Only models with relatively low initial crustal thicknesses are able to fit the temperature constraints ( Figure 4). This effect arises because for larger initial crustal thicknesses, if the shallow rocks are to reach Barrovian conditions, the lower parts of the crust become extremely hot (i.e. hotter than the highest estimated metamorphic temperatures; Hokada, 2001). An implication of these results is that homogeneous thickening is only likely to result in Barrovian-type metamorphism if the initial pre-thickening lithosphere is a passive margin or other thinned region (such as the present-day Basin and Range province). England and Thompson (1984) also suggested that homogeneous thickening of the entire lithosphere was unlikely to result in widespread Barrovian metamorphism, and by investigating the full parameter range, we have been able to confirm their result in light of our updated knowledge of continental tectonics and the thermal parameters of the crust. We have also been able to establish that the range of resulting P-T conditions is wider than previously discussed (e.g. Sandiford & Dymoke, 1991).
A feature of Figure 3 that is shared by most homogeneous thickening models that are able to match Barrovian conditions is that large thicknesses (i.e. tens of kilometres) of the lithospheric mantle spend significant amounts of time cooler than 600 C. At these temperatures, the mantle is seismogenic if it is anhydrous (e.g. Jackson et al., 2008). However, widespread seismicity in bands tens of kilometres thick is not observed within the mantle lithosphere beneath mountain ranges. What limited deep seismicity that is present is concentrated close to the Moho in underthrusting foreland lithosphere (e.g. Craig et al., 2012), implying that homogeneous thickening of the entire lithosphere is rare at the present day, or that the mantle is always hydrated during mountain building.
3.1.3 | Homogeneous thickening: Controls on the curvature of P-T loops Figure 5 shows the P-T conditions sampled by the successful models, and also the curvature of the model P-T loops. This curvature is expressed as ΔT and ΔZ, where ΔT is the temperature difference between the maximumpressure and maximum-temperature points of each P-T loop, and ΔZ is the depth difference between these same points (as illustrated on Figure 1b). Figure 5 only includes models in which the rocks that sample Barrovian conditions also experienced cooling (black bars F I G U R E 4 The range of model parameters that result in matches to the Barrovian P-T constraints (dashed box on Figure 3a), for the case of homogeneous thickening of the entire lithosphere. The input parameters span the full width of the histograms. The grey areas of the bars show models where Barrovian metamorphism occurs but less than 200 C subsequent cooling, because the crust does not reach a great enough thickness to induce significant erosion (as described in Section 3) on Figure 4). Rocks that begin at deeper levels in the prethickening lithosphere, in both absolute terms (Figure 5a), and as a proportion of the overall crustal thickness (Figure 5d) have more curved P-T loops (i.e. greater ΔZ and ΔT). This situation is as expected from simple geometrical arguments, because deeper F I G U R E 5 Curvature of P-T loops (expressed as ΔT and ΔZ) for the case of homogeneous thickening of the entire lithosphere, as a function of model parameters. ΔT is the temperature difference between the maximum-pressure and maximum-temperature points of each P-T loop, and ΔZ is the depth difference between these same points. For the interpretation of ΔT and ΔZ in terms of the shape of P-T loops, please refer to Figure 1b, where examples loops are provided for each area of ΔT-ΔZ space. The models shown successfully reproduce Barrovian conditions and underwent subsequent cooling (black bars on Figure 4). (a) shows the dependence on initial depth of the rock before thickening begins. (b) shows the regions of P-T space sampled by successful models, as a function of the initial depth of the rocks before thickening begins, and colour-coded according to the resulting metamorphic field gradient. Each point represents the maximum-T> conditions experienced by a modelled rock. The field gradient is calculated for each point by comparing the high-temperature point of each P-T loop with that on the loop corresponding to rocks that were 5 km shallower in the pre-thickening setup. (c)-(h) show the P-T loop curvature of rocks initially buried at 10 km before thickening begins, as a function of the model parameters listed adjacent to the colour scale bars samples will be moving vertically faster relative to the surface, so for a given rate of heating would be able to more rapidly change their depth before thermal equilibration occurs. A corollary of this concept is that the imposed thickening rate and amount of radiogenic heating control the P-T loop curvature. Increasing the rate of radiogenic heating increases ΔT for a given value of ΔZ (Figure 5c), because greater heating is able to occur due to radiogenic heating over the time when the crust is thick. Lower rates and longer durations of thickening result in lower values of ΔZ and ΔT, because the crust remains closer to thermal equilibrium during the thickening (Figure 5f,h). The other model parameters have no systematic controls on the P-T loop curvature (Figure 5e,g). In the case of the erosion coefficient, although this parameter controls the rate of exhumation for a given crustal thickness, the path of the P-T loop is also controlled by the other aspects of the model (e.g. how hot the crust became before erosion became dominant), so there is no clear correlation between this parameter alone and loop curvature. The erosion coefficient does, however, trade-off against some of the other model parameters (Appendix A).
Only rocks that are initially deep within the crust before thickening begins (e.g. ≥15 km) are able to attain conditions in the centre or upper parts of the Barrovian pressure/depth range (Figure 5b). This effect arises because the long timescale for thermal equilibration through the entire lithosphere means that only initially deep samples have not been exhumed to shallow depths by erosion before peak temperatures are achieved. An implication of this result is that rocks in the mid-to high-P part of Barrovian conditions are not likely to have formed by homogeneous thickening, and also that terranes with large areas of low-P assemblages at a range of T conditions may be an indicator of metamorphism during homogeneous thickening. Figure 5b also shows the metamorphic field gradient, calculated for each depth interval by comparing the high-temperature point of each P-T loop with that on the loop corresponding to rocks that were 5 km shallower before thickening began. For the majority these models, the metamorphic field gradient approximates the average thermal gradient between the high-T conditions and the surface, in contrast to the situation described below.

| Thickening above rigid lower crust
3.2.1 | Tectonic settings corresponding to the model Thickening above rigid lower crust occurs in situations where the upper crust thickens in a region spatially separated from the shortening in the lower crust and the mantle lithosphere. This style of thickening currently operates on the edges of most modern-day mountain ranges (e.g. Ni & Barazangi, 1984;Lamb et al., 1997;Nissen et al., 2011). In these regions, thickening by folding and thrust faulting in the shallow crust is decoupled from relatively undeforming underthrusting foreland lithosphere. We hereafter refer to everything between that decoupling depth and the Moho as 'rigid lower crust'. In the geological record, this situation would be recognized by a change in foliation orientation and style with structural depth. Depending on the relative importance of folding and thrusting, the foliations in the overriding, thickening, crust could be steeply dipping if folding is dominant or more shallowly dipping if faulting dominates. Foliations in the rigid lower crust will be inherited from past deformation episodes, with limited overprinting (e.g. St-Onge & Lucas, 1995;Whyte et al., 2021). In poorly exposed terranes, such features may not be apparent, in which case the thermal characteristics described below will be important in distinguishing this tectonic style from homogeneous thickening. In modernday mountain ranges, the depth at which the material properties and deformation are thought to change depends on a range of factors, including lithology, hydration state and temperature (e.g. Jackson et al., 2008;Nissen et al., 2011). Due to the absence of a single global over-arching cause, we do not impose the depth to the top of the rigid lower crust in our models. We instead use a range of possible values of this parameter and establish how to use petrological information to establish that depth.

| Successful parameter ranges
More of the modelled parameter combinations fit our Barrovian constraints for this case of thickening above rigid lower crust than was the case for homogeneous thickening of the entire lithosphere (8466 models of 57,600, corresponding to 14.7%). An example model is shown in Figure 6, where the depth to the top of the rigid lower crust is initially 15 km, with an initial crustal thickness of 40 km. The P-T paths diverge more in the region above the rigid crust than within it, due to that being the depth interval that is thickening (Figure 6a). Some similar principles hold true as in the case of homogeneous thickening, for example the relative rates of thickening, erosion and radiogenic heating control which sections of P-T loops are hotter or cooler than the original geotherm. In the Figure 6 example, there is limited change in temperature in the lower crust, because the lifespan of the mountain range is too short to allow the heat produced by radiogenic heating in the thickened upper crust to diffuse into the lower crust in significant amounts before erosion reduces the crustal thickness. However, in the upper crust, similarly wide variations in conditions and timescales for originally closely spaced rocks are present as was the case for homogeneous thickening. One of the results of these effects is the variable metamorphic field gradient as a function of depth, shown by the red line in Figure 6a. A further consequence of this style of thickening is that there can be near-isothermal decompression of the lower crust during erosion, resulting in steeply inclined P-T paths. The delay between peak-P and peak-T conditions, due to thickening being able to occur faster than heating, is consistent with the widespread intrusion of granites that occur towards the end of, or post-date, the main foliation-forming deformation events (e.g. Roger et al., 2004;Woodcock et al., 2019).
The successful models that can match the constraints described above have rates of radiogenic heating focused in the range 0.5-2.5 Â 10 À6 W/m 3 , with a peak at 1.0 Â 10 À6 W/m 3 (Figure 7). The average rate required is lower than for the case of homogeneous thickening. A wider range of initial crustal thicknesses can also match the constraints (Figure 7). These effects both arise because of the geometry of the thickening. For thickening above rigid lower crust, the thermal gradient beneath the thickening layer does not immediately change; it is initially advected downwards and then evolves due to diffusion. The heat flux into the base of the thickened upper crust is therefore unchanged, compared to the case of homogeneous thickening for which the heat flux into the upper crust reduces, for example being halved if the entire lithosphere is thickened by a factor of two. Additionally, the dominant thermal perturbation is the vertical stretching of the geotherm in the thickening layer. Because of the limited amplitude and length scale of this perturbation, relaxation towards thermal equilibrium occurs significantly faster than for thermal relaxation of the entire lithosphere, and less radiogenic heating is required. A wider range of parameter combinations can therefore result in Barrovian conditions in the upper-to mid-crust.
There is an upper limit on the combination of the amount of radiogenic heating and the initial crustal thickness, with high values of both resulting in too high temperatures (Appendix A). Because the thickening and heating are dominated by the geometry of the crust, the thickness of the lithosphere plays little role in controlling the thermal evolution, and the full range of tested thicknesses results in successful models. As above, and for the same reasons, there is a clear trade-off between the duration and rate of thickening, between the rate of thickening and rate of erosion and between the rate and F I G U R E 6 Model result from thickening of initially 40 km thick crust above rigid lower crust with an initial depth of 15 km, at a rate of 2.5 mm/year. The initial lithosphere thickness is 150 km, and thickening occurs for a duration of 15 Myr. The rate of radiogenic heating is 1.0 Â 10 À6 W/m 3 , and the erosion coefficient is 1.5/s. The dotted and dashed lines show the initial geotherm and that after 150 Myr. The lines track rocks initially located at 5 km depth intervals from 5 to 40 km, with the deepest point representing the Moho, and the coloured circles indicate times since the start of the model run. The dashed red line is the model metamorphic field gradient. The corner in the depthtemperature-time paths indicates the end of thickening at 15 Myr, at which point erosion then dominates the vertical motions, as the thickened crust continues to increase in temperature. The grey dashed polygon shows the Barrovian conditions for which successful models must contain the peak-T conditions for at least one crustal level. (b) Geotherms at selected intervals throughout the model, with the thick parts of the curves representing the crust. The black circles mark the top of the rigid lower crust at each time interval duration of thickening and the amount of radiogenic heating (Appendix A). These trade-offs are important to consider when interpreting the results of models that only sample part of the parameter space.

| Curvature of P-T loops
A greater range of ΔZ and ΔT is produced than for the case of homogeneous thickening, because a greater variety of thickening geometries, and so rates and styles of thermal evolution, are able to reproduce Barrovian conditions ( Figure 8). Additionally, the wider range of successful geometries means that the resulting models span the full range of the Barrovian P-T space (Figure 8c-e).
The same principle applies as for the case of homogenous thickening, that the rate of radiogenic heating controls the magnitude of ΔT for a given ΔZ, due to the control on the amount of heating that can occur for a given geometry and duration of crustal thickening (Figure 8f). The depth to the top of the rigid lower crust also controls the curvature of the P-T loops (Figure 8b). For a given amount of thickening, a thinner layer that is accommodating the mountain building results in greater strain within that layer, resulting in a greater ΔZ for rocks within that layer (Figure 8b). However, the correspondingly lower maximum crustal thicknesses that result from successful models with thin deforming layers result in a lowering of ΔT. These effects combine to increase ΔZ relative to ΔT when compared to the homogeneous thickening case. The initial depth of the rock plays a less important role than was the case for homogeneous thickening (Figure 8a) as this effect is overwhelmed by whether the rock was within the straining upper crust or the rigid lower crust, and by the rate of strain in the upper crust. Because thickening is concentrated in the upper crust, a greater range of ΔZ is present in models that reproduce Barrovian conditions than was the case for homogeneous thickening (Figure 8). These high-ΔZ models are absent in the successful subset of homogeneous thickening models because high rates of thickening in the entire lithosphere result in such low thermal gradients that heating to Barrovian conditions is not able to occur during the timescale of cycles of mountain-building. The effect described above, of the depth-dependent metamorphic field gradients that can be produced by thickening above rigid lower crust, can be seen in Figure 8c-e. For samples initially at 5 km depth, the field gradient represents the average geothermal gradient between that sample and the surface, whereas for samples initially deeper (e.g. 15 km), the field gradient can differ significantly from the average geothermal gradient. It is therefore apparent that in cases of thickening above rigid lower crust, the curvature of the P-T loop is F I G U R E 7 The range of model parameters that result in matches to the Barrovian constraints, for the case of thickening above rigid lower crust. The input parameters span the full width of the histograms. The grey areas of the bars show models where Barrovian metamorphism occurs but less than 200 C cooling, because the crust does not reach a great enough thickness to induce significant erosion (as described in Section 3) F I G U R E 8 Legend on next page. more sensitive to the model parameters than the metamorphic field gradient. As was the case with homogenous thickening, the rate and geometry of thickening also play a role in the P-T loop curvature and for the same reasons ( Figure 8).

| THE EFFECTS OF REDISTRIBUTING RADIOGENIC HEATING
This section discusses the effects of redistributing radiogenic heating during a modelled orogenic cycle, and so tests the effects of our assumption made above that this value is the same throughout the crust. Figure 9 shows the result of a calculation identical to that in Figure 6, with the exception that where rocks exceed the solidus temperature, a proportion of the radiogenic heating is removed, and redistributed evenly through any overlying rocks that are below the solidus temperature. This process simulates the effects of radiogenic elements being transported from the lower crust to the upper crust by melts, although recent studies have questioned the importance of this effect (Alessio et al., 2018;Weller et al., 2020;Yakymchuk & Brown, 2019). The solidus temperature depends on composition, pressure and hydration state, and we use a value of 700 C to simulate conditions close to the water-saturated solidus for most lithologies of relevance to the crust (e.g. Ebadi & Johannes, 1991;Lambert & Wyllie, 1972;White et al., 2001). This configuration therefore simulates the scenario in which radiogenic heat redistribution by partial melting F I G U R E 8 Curvature of P-T loops (expressed as ΔT and ΔZ, as defined on Figure 1), as a function of model parameters, for the case of thickening above rigid lower crust. The models shown successfully reproduce Barrovian conditions and underwent cooling (black bars on Figure 7). (a) shows the lack of dependence on initial depth of the rock before thickening begins. (b) shows the effect of the initial depth to the rigid lower crust, for rocks initially buried at a depth of 10 km before thickening begins. (c)-(e) show regions of P-T space sampled by successful models, as a function of the initial depth of the rocks before thickening begins, and colour-coded according to the resulting metamorphic field gradient. The field gradient is calculated for each point by comparing the high-temperature point of each P-T loop with that on the loop corresponding to rocks that were 5 km shallower in the pre-thickening set-up. (f)-(k) show the P-T loop curvature of rocks initially buried at 10 km before thickening begins, as a function of the model parameters listed adjacent to the colour scale bars F I G U R E 9 Calculations identical to those in Figure 6, except that radiogenic heating is re-distributed in the manner described in the text. In both panels, the grey circles and lines show the results on Figure 6, and all other symbols are as for that figure. The vertical green line shows the solidus temperature used in these models. The red lines show the estimates of Ebadi and Johannes (1991) for the granite solidus at water-saturated (left), anhydrous (right) and intermediate (a H2O = 0.3; centre) conditions. These estimates do not reach the highest-pressure parts of these diagrams. The blue lines show the estimates of Lambert and Wyllie (1972) for the water-saturated (left) and anhydrous (right, mostly at higher temperatures than displayed here) basalt solidus. (a) Results after one orogenic cycle of thickening and erosion. (b) Results after the final distribution of radiogenic heating in the model shown in (a) is used as the initial input distribution for an additional cycle of thickening and erosion will be most important, and incomplete water saturation will reduce the effects discussed below. In these models, where the temperature exceeds 700 C, the radiogenic heating is reduced to 0.5 μW/m 3 , which is the median value for granulite-facies rocks in the compilation of Hasterok et al. (2018). The radiogenic heating that is removed is then redistributed evenly through all overlying rocks that are at temperatures lower than the solidus. Figure 9a shows the results for a single orogenic cycle of thickening and erosion. The timescales and pressures of the P-T loops are the same as in the original model, as these quantities depend upon the geometry of the model (i.e. thicknesses and rates), rather than on the rate of radiogenic heating. The peak-T conditions in the upper and middle crust are very similar to those in Figure 6 because these conditions are achieved before the redistribution of radiogenic heating can have an effect. After a significant amount of time has elapsed (>50 Myr), the lower crust is slightly cooler than in the original model, because some of the radiogenic heating has been removed from this depth interval and transferred to the upper crust. The final geotherm is slightly cooler because some of the upper crust, where radiogenic heating has been concentrated, is eroded from the shallow part of the model, meaning that the total vertically integrated rate of heating has decreased. However, all of these effects are modest and are ≤50 C. Figure 9b examines a situation in which rocks that have already undergone one cycle of thickening, erosion and the redistribution of radiogenic heating undergo the same process again. This situation is modelled by taking the final distribution of radiogenic heating from the model in Figure 9a and using it as the starting configuration for a further identical cycle of thickening and erosion. In this case, the initial geotherm is cooler, because of the overall reduction in total radiogenic heating during the first orogenic cycle described above. Peak temperatures are also correspondingly cooler, for the same reason. The size of this effect depends upon depthrocks in the upper and mid crust have peak temperatures within $50 C of the original model shown in Figure 6, but in the lower crust, the difference can be up to $100 C. This effect arises because the lower crust is going through an entire orogenic cycle with the reduced rate of radiogenic heating, whereas in the model in Figure 9a, this effect only became established during the orogenic cycle, once rocks started reaching supra-solidus conditions.
The models shown in Figure 3 are not significantly affected by undertaking the same process as described above, because so little of the crust reaches the solidus temperature. In models of homogeneous thickening of the entire lithosphere in which more of the lower crust exceeds the solidus temperature, then more redistribution of radiogenic heating would occur. In that case, it is possible that the effect could lessen the issue that the homogeneous thickening models able to achieve Barrovian conditions in the upper/mid crust become too hot at depth. However, because that heating is moved into the overlying crust, and because there is time within a model run for diffusion through the crust, that effect alone is insufficient to keep the lower crust at realistically low temperatures. Only by coupling this effect to high rates of erosion that remove significant volumes of enriched upper crust can there be a significant effect on the temperatures. However, in these cases, the overall loss of heating also results in the upper/mid crust struggling to attain Barrovian conditions. However, it should be noted that there is a possibility of this effect being significant under a limited range of specific conditions.
In summary, it is debated whether the redistribution of radiogenic elements during melting and metamorphism is important (Alessio et al., 2018;Bea & Montero, 1999;Weller et al., 2020;Yakymchuk & Brown, 2019), and there is no correlation between heating rate and metamorphic grade (Hasterok et al., 2018). If such a process is included in the models, then the changes to the results are modest, especially over a single orogenic cycle. If the rocks are not always completely water-saturated, then the effects discussed here will be further reduced. We therefore view our assumption of constant radiogenic heating throughout the crust as appropriate, but note that the values we discuss should be taken as averages.

| TEST CASES: DANBA, WOPMAY AND BARBERTON
To test that the approach taken in this paper is able to reproduce metamorphic conditions in well-studied examples, and to further illustrate the controls on the nature of the thermal evolution of mountain belts, we make use of two contrasting metamorphic terranes: Danba in eastern Tibet and Wopmay in Canada. These examples demonstrate that even where only fragments of the P-T histories of the rocks are known (to the level that is common in exhumed orogenic belts), our approach is successful at reproducing independently known aspects of the history of mountain ranges, which we explicitly do not use for model discrimination in order to use them as an independent test. We also consider the case of the Barberton granite-greenstone belt in South Africa, from the perspective of examining the viability of using P-T loops to infer tectonic setting. Weller et al. (2013) studied the P-T-time conditions in the Danba region of eastern Tibet, where recent deformation has exposed a cross-section through an early Jurassic Barrovian metamorphic sequence. The rocks experienced low-P Barrovian metamorphism, with peak conditions of the exposed structural levels varying from 570-690 C at 5.0-6.5 kbar. These peak conditions were approached from lower pressures and temperatures, as inferred from modelling garnet growth zonation, and the thickening was above a detachment level now exposed at the surface. Weller et al. (2013)'s peak P-T constraints are encompassed by the grey ellipse on Figure 10a. The peak P and T constraints from Danba occupy a narrow enough range (due to the amount of structural relief exposed) that they represent closely spaced P-T loops. From our library of model results presented above, we find models which (1) match the Weller et al. (2013) peak conditions with the high-T points on model P-T loops and (2) have prograde paths consistent with the peak-T conditions being approached from lower pressures and temperatures. We therefore select models that matched the peak conditions to within a tolerance of 30 C and 1 kbar and have a maximum ΔZ of 0.5 kbar and ΔT of 50 C. We reject models that only match the observed P-T conditions on prograde or retrograde paths, rather than at peak, and those in which the direction of approach to the peak-T conditions are inconsistent with Weller et al. (2013)'s results. A subset of 14 of our thermal models match these constraints, one of which is shown on Figure 10a.

| Danba: Slow and long-lived thickening and hairpin P-T loops
All the successful models involve thickening above rigid lower crust, which is consistent with the independent observation of a mapped detachment separating the deformed metasediments from the underlying metaigneous basement. Some of the model parameters are well-constrained: the rate of radiogenic heating is 1.0-1.5 Â 10 À6 W/m 3 , the initial crustal thickness is 40 km, the thickening rate is 0.5-1.5 mm/year, the erosion coefficient is 1.5/s and the duration of thickening is 40-60 Myr. The lithosphere thickness is less well constrained but is 200 km or greater, and the initial depth to the top of the rigid lower crust is also not well constrained and could be anywhere in the range 5-15 km. The model shown in Figure 10a has an initial depth to the rigid crust of 10 km, which results in material immediately above that boundary reaching the observed peak conditions, in agreement with the study samples of Weller et al. (2013) being taken from rocks immediately above the detachment. The initial crustal thickness in the models is consistent with the independent information regarding the metamorphic protolith containing shallow-water sediments, implying a crustal thickness resulting in elevations close to sea level (e.g. Huang et al., 2003; Weller F I G U R E 1 0 Example thermal models that match the estimated P-T conditions at Danba and Wopmay. Dashed red lines show the modelled field gradients. (a) For the Danba case, the model represents thickening of initially 40 km thick crust above rigid lower crust whose top is at an initial depth of 10 km. The initial lithosphere thickness is 200 km. Thickening occurs at a rate of 0.5 mm/year for a duration of 55 Myr. The rate of radiogenic heating is 1.5 Â 10 À6 W/m 3 , and the erosion coefficient is 1.5/s. The grey ellipse encompasses the P-T estimates of Weller et al. (2013), which were approached from lower-P and lower-T conditions. (b) For the Wopmay case, the model represents thickening of initially 40 km thick crust above rigid lower crust whose top is initially at 10 km. The initial lithosphere thickness is 150 km. Thickening occurs at a rate of 3.5 mm/year for a duration of 15 Myr. The rate of radiogenic heating is 1.0 Â 10 À6 W/m 3 , and the erosion coefficient is 2.5/s. The grey box shows the peak-T conditions estimated by St-Onge (1987), which were approached from higher-P and lower-T conditions et al., 2013). The example thermal model shown in Figure 10a demonstrates that the slow rates of motion result in the crust staying at close to thermal equilibrium, and P-T paths remaining close to the initial geotherm and the resulting metamorphic field gradient. The rates of heating when approaching peak-T conditions in the successful models are slow (e.g. 4.3 C/Myr in the example shown in Figure 10a), which agrees well with U-Pb monazite dating by Weller et al. (2013), which suggested the studied samples took $20 Myr to increase in temperature by $100 C, giving a rate of heating of $5 C/Myr.

| Wopmay: Rapid thickening and broadly curved P-T loops
The Proterozoic Wopmay orogen strongly contrasts with Danba. P-T paths derived from thermobarometry applied to zoned poikiloblastic garnets indicate rocks approaching their maximum-T conditions (600-650 C and 6-8 kbar) during decompression from significantly higher-pressure conditions (St-Onge, 1987). Due to our neglect of heat movement by magmatism in this paper, we only use the southernmost samples described by St-Onge (1987), which are more than 25 km distant from significantly sized plutons. We select the models from our compilation that can reproduce the observed peak-T conditions, which are similar to each other for the samples we use, and are shown by the grey box in Figure 10b. The fragments of the P-T paths reconstructed by St-Onge (1987) are also consistent between the different samples. The peak-P conditions are not preserved, and the samples are on a decompression path as they approach the maximum-T conditions. The known fragments of the P-T paths show that the samples increased temperature by at least 100 C whilst decompressing by at least 1-2 kbar (3.7-7.4 km). The steepness of the observed parts of the decompression paths implies that the peak-P conditions must have been experienced at depths that are higher than the peak-T conditions by at least double this value. Therefore, we extract from our library of results those models that have samples which match the observed peak-T conditions and which approach those conditions from depths at least 10 km deeper and temperatures at least 100 C cooler (i.e. that ΔZ ≥10 km and ΔT ≥100 C). We reject models that only sample the observed peak-T conditions on prograde or retrograde paths, rather than at peak. In contrast to the Danba case, these constraints are not tight enough to provide good constraints on the models, with over 400 models spanning the range of input parameters able to fit match the imposed conditions. We therefore use the additional constraint suggested by St-Onge and Davis (2018) that peak-T conditions must be achieved within 30 Myr of the end of sedimentation (which we take to be the onset of thickening). This additional constraint reduces the number of acceptable models to 21, an example of which is shown in Figure 10b. All involve thickening above rigid lower crust, which is consistent with the independent observation of a mapped detachment in the region (although the depth to the detachment is not well resolved in the models). Successful models thicken for a short amount of time (10-20 Myr) at a rate of 3.5-4.5 mm/year. The initial crustal thickness is 40 km in the majority of the models, the erosion coefficient is 3.5/s and the rate of radiogenic heating is 1.0 Â 10 À6 W/m 3 . This combination of rapid thickening above rigid lower crust, and rapid erosion, is required in order to thicken the crust and then erode it sufficiently for peak-T conditions to be achieved on the decompression path within $20-30 Myr of the onset of thickening.
In summary, the application of our models to typically reported metamorphic results (i.e. peak conditions and a fragment of the prograde P-T path, in places supplemented by constraints on the timing of metamorphism relative to the onset of thickening) from Danba and Wopmay is able to (1) demonstrate that our approach is able to successfully reproduce independently known aspects of the deformation of the regions (e.g. that in both cases the thickening was above rigid lower crust); (2) constrain previously poorly known parameters (e.g. the rate of radiogenic heating, the pre-thickening crustal thickness and the rate of thickening); and (3) illustrate the differences in mountain belt history that are required to reproduce the observed contrasting styles of P-T loops. Additionally, by establishing the parameter values that result in successful models, this process can be used as a starting point for future two-or three-dimensional models, if required.

| Barberton: Average thermal gradients and inferring tectonic setting
Shown as grey dashed lines on Figure 10 are average thermal gradients, such as would be obtained by dividing a temperature estimate by a pressure estimate from a sample. For the Danba case, the slow rates of motion result in rocks staying close to an average thermal gradient of 30 C/km, as the geotherm is always close to steady state. In contrast, for the Wopmay case, the model rocks experience conditions equivalent to a wide range of apparent thermal gradients during the growth and decay of the mountain range, from <15 C/km to >30 C/km. This result has important implications for using low apparent average gradients to infer the presence of subduction (e.g. Brown, 2008;Ganne et al., 2011;Moyen et al., 2006;Xu et al., 2018), particularly as some high-P metabasic rocks can preserve mineral assemblages close to their high-P position on a given P-T loop due to steep exhumation paths being characterized by more hydrated assemblages (Weller et al., 2015). For recent metamorphic events, high-temperature geochronometers can have sufficient resolution to estimate the rate of exhumation, which can aid inferring the tectonic environment of metamorphism (e.g. Parrish et al., 2006). However, in ancient metamorphic belts, the resolution of chronometers often means that exhumation rate information is not available, so tectonic inferences rely more upon the P-T conditions of metamorphism.
To investigate whether low average thermal gradients reliably indicate subduction, we will use the P-T paths developed for the $3.2 Ga metamorphism in the Inyoni shear zone of the Barberton terrane of South Africa by Moyen et al. (2006), which juxtaposes the two domains of this composite terrane. P-T estimates for garnet growth in amphibolites indicate conditions of 550-700 C and 10-17 kbar, implying low average thermal thermal gradients of 12-15 C/km, which were used as evidence for Archean subduction. P-T estimates for garnet breakdown provide constraints on another part of the P-T loop at higher temperatures (600-850 C) and lower pressures (7-10 kbar), representing a clockwise P-T loop (as also found by Peng et al., 2019). A range of our models is able to reproduce the observed P-T paths, all involving thickening above rigid lower crust, and rates of radiogenic heating in the range 0.5-1.0 Â 10 À6 W/m 3 . Most successful models involve thickening for ≥30 Myr, and the other parameters are poorly constrained. An example of a model that successfully reproduces the Barberton P-T estimates is shown in Figure 11, which is also consistent with the P-T conditions inferred for other structural levels within the deformation belt (as summarized in Moyen et al., 2006). The passage of the modelled rocks through the region of $600 C and $11 kbar is consistent with the generation of syn-tectonic trondjhemitic melts, as also observed in the shear zone, as this region of P-T space is the lowest-T solidus position in hydrated mafic rocks . Our results show that average thermal gradients at peak-P conditions in the range seen at Barberton do not unambiguously imply subduction. Although continental lithosphere in thermal steady state would not be expected to have average thermal gradients as low as 10 C/km, Figure 11 demonstrates that low values can be achieved as the thermal structure is evolving, due to thickening being able to occur faster than heating. Sandiford and Dymoke (1991) discussed similar effects for the case of homogeneous thickening. Our models alone do not necessarily imply that the metamorphism of these rocks occurred in a continental mountain range, for which further information is necessary, as discussed below.
The rate of radiogenic heating we estimate at Barberton is similar to the more geologically recent examples from Wopmay and Danba described above, which may be surprising given that the decay of radiogenic isotopes means that the total rate of radiogenic heating on Earth has decreased over this time period. This similarity is likely to be related to lithology, and the variation in rates of radiogenic heating seen within and between lithologies at the present day (Hasterok et al., 2018). Therefore, although the bulk rate of heating will have reduced through time, the specific lithology (averaged throughout the crust, as this bulk value is what our models are sensitive to) and geological history of each location mean that we would not expect all ancient terranes to exhibit more vertically averaged heating than all modern ones, because of the likelihood of such heating being unevenly distributed on a global scale. 6 | DISCUSSION 6.1 | Constraining mountain belt characteristics using P-T path curvature The comparison of Danba and Wopmay reveals that the peak-T metamorphic conditions of a given sample, along F I G U R E 1 1 Example thermal model that matches the estimated P-T conditions from Barberton. The grey ellipses are P-T estimates from Moyen et al. (2006). The model represents thickening of initially 30 km thick crust above rigid lower crust whose top is initially at 10 km. The initial lithosphere thickness is 250 km. Thickening occurs at a rate of 1.5 mm/year for a duration of 45 Myr. The rate of radiogenic heating is 1.1 Â 10 À6 W/m 3 , and the erosion coefficient is 1.5/s with one or two other pieces of information, are sufficient to place constraints on the geometry and rates of thickening and erosion, and the rate of radiogenic heating. The additional pieces of information that can be diagnostic include the direction of approach to the peak-T conditions in P-T space (i.e. the curvature of the P-T loop; Figure 1b) or the timing of peak conditions relative to the onset of thickening. Examining the curvature of P-T loops, in addition to peak conditions and/or timing, therefore provides a promising avenue with which to reconstruct the characteristics of both modern and ancient mountain-building events. As discussed above, the curvature of the P-T paths is more diagnostic of the nature of a mountain-building episode than the metamorphic field gradient. Once the range of possible model solutions has been calculated, as done here, then such a database can then act as a library to be searched for a given combination of observations and inferences (as done above for Danba, Wopmay, and Barberton), or alternatively used in reverse to establish which observations will be most diagnostic in a particular tectonic setting. Figure 12 illustrates the controls on the prograde P-T loop curvature, as illustrated by the calculations above and the examples from Danba and Wopmay. A sample approaching its high-T conditions from lower P and T (i.e. low ΔT and low ΔZ), along the geotherm, can be achieved by slow rates of thickening so that the geotherm remains close to steady state. This situation also requires slow erosion, if mountain-building or appreciable metamorphism is to occur (e.g. Sandiford, 2002). A sample approaching the high-T conditions from significantly lower temperatures but higher pressures (i.e. high ΔT and high ΔZ) can be achieved by rapid thickening and rapid erosion, such that the sample achieves its deepest location before the temperature has equilibrated through the crust, and heating continues as erosion is exhuming the sample. This situation can be enhanced, and the effect can occur for lower rates of thickening and erosion, if the rate of radiogenic heating is high. For a sample to approach the high-T conditions from higher pressures but similar temperatures (i.e. low ΔT and high ΔZ, commonly described as isothermal decompression; Brown & Dallmeyer, 1996;Whitney et al., 2004) requires the crust to have had time to reach high temperatures whilst thick, and for rapid erosion to result in exhumation with limited change in temperature. This situation can be enhanced by low rates of radiogenic heating, which limits the rate of temperature change during exhumation. None of our models reproduce isobaric heating and cooling paths (i.e. small ΔZ, large ΔT), confirming the established view that such trajectories are mainly related to local heating adjacent to intrusions, which is an effect not included in our model setup.
The situations discussed above are an expression of the concept that the thermal evolution is controlled by the relative timescales of thermal equilibration, and thickness changes due to thickening and erosion. Rough estimates of the timescales of deformation can therefore be obtained from the orientation of approach of samples to the high-T conditions (although these estimates will generally not be as accurate as those based on geochronology). The homogenization of mineral zoning and sparsity of inclusions that can occur in the granulite facies (e.g. Harley, 2004) is likely to remove much of the record of the prograde metamorphic path. In such cases, the constraints provided by the timings of metamorphism and/or the initial crustal thickness (e.g. from sedimentary facies distributions where such information is available) provide the best model constraints. Although the metamorphic field gradient is less sensitive than the curvature of P-T loops to the characteristics of the mountainbuilding events, combining this information with dating constraints may be the most promising avenue to pursue in chemically and texturally homogenized high-grade terranes, provided that the apparent field gradient represents a coherent slice of crust, rather than widely spaced lithologies juxtaposed by subsequent deformation.

| Relationship of model results to current levels of precision of P-T estimates
When examining the curvature of predicted P-T paths ( Figures 5 and 8), the resolving power of current thermobarometric techniques should be considered. A wide range of techniques is available, including traditional calibrated thermometers (e.g. Ferry & Spear, 1978; F I G U R E 1 2 A summary of the factors affecting the curvature of P-T loops for thickening above rigid lower crust Holland & Blundy, 1994) and barometers (e.g. Holdaway, 2001) and their statistically optimal combination (Powell & Holland, 1992), pseudosection (bespoke phase diagram) modelling , element substitution methods (e.g. Henry et al., 2005;Tomkins et al., 2007;Watson et al., 2006) and Raman-based thermometers (e.g. Beyssac et al., 2002) and barometers (e.g. Kohn, 2014b). The reported accuracy of each technique commonly varies as a function of pressure and/or temperature, and a given technique is generally only applicable to restricted bulk compositions and/or P-T conditions, all of which negates generalized statements regarding uncertainty. In some circumstances and for some techniques, precision is estimated to be as good as AE10 C (Watson et al., 2006) and AE0.2 kbar (Kohn, 2014b), but for most crustal P-T conditions, errors of AE50 C and AE1 kbar (2σ) are likely more reasonable, taking into account a wider range of potential sources of uncertainty than are typically formally estimated for a given thermobarometer (Powell & Holland, 2008b;Palin, Weller, et al., 2016). Ultrahighpressure and -temperature metamorphic conditions are potentially characterized by at least twice this uncertainty (Powell & Holland, 2008b), but the diversification and refinement of thermobarometric techniques are improving this situation (e.g. Kohn, 2014b;Tomkins et al., 2007). Nevertheless, for metamorphic conditions of most relevance to this study, it can be seen that the differences in P-T loop curvature documented above are resolvable, with variations in values of ΔT and ΔZ of ≥50 C and ≥5 km ($1.5 kbar) being diagnostic of the nature of the orogenic cycle (Figures 5 and 8).

| Timing of peak conditions
The thermal models presented above indicate the wide range of metamorphic timings that can be produced for a geometrically simple model. The timescales of metamorphism in the models above vary greatly, and the timing of peak-T for different structural levels within a given model can also vary by large amounts (i.e. up to tens of millions of years for rocks initially 5-10 km apart in depth). Slower rates of thickening and erosion result in greater similarity of the timing of peak-T for rocks at different structual levels (e.g. Figure 10a, where the peak-T point occurs at similar times for most structural levels, in contrast to Figures 10b and 11), providing a possible avenue to further constrain the rates of thickening if multiple structural levels are exposed in an exhumed mountain belt. However, a corollary of this point is that age estimates that are widely spaced in time may not represent separate metamorphic events (e.g. Huang et al., 2003) but may rather represent the differences in timing of peak temperatures that can result within even geometrically simple mountain ranges. Although our models reproduce the tens of millions of year timescales often documented for the metamorphic evolution of collisional mountain belts (e.g. Weller et al., 2021), significantly shorter timescales have been suggested for some metamorphic events (e.g. Ague & Baxter, 2007;Smye et al., 2011;Spear et al., 2012), including as brief as less than one million years. Such timescales are not reproduced by any combination of parameters in our models and so requires other driving mechanisms beyond diffusion, radiogenic heating and vertical advection by thickening.

| Radiogenic heating and craton formation
We have found that the rates of radiogenic heating required to match Barrovian metamorphic conditions in general, and the specific examples of Danba, Wopmay and Barberton, are relatively modest (e.g. 1.0-1.5 Â 10 À6 W/m 3 ). Because of our assumption regarding the radiogenic heating being evenly distributed throughout the crust, these estimates should be taken as bulk averages. It is possible that variations with depth could be present and result in similar patterns of thermal evolution as those modelled here, although extreme concentrations of radiogenic heating will result in changes to the shape of the geotherms. Our estimates are consistent with the total rates of heating proposed based upon petrological arguments (e.g. Hacker et al., 2015;Rudnick & Gao, 2014), if not their spatial distribution. Our results suggest that for the Barrovian metamorphism we have studied there need not be significant spatial concentration of high heat-production material (e.g. in accretionary prisms; Jamieson et al., 1998) or external sources of heat (e.g. from enhanced heat flux from the convecting mantle; Houseman et al., 1981;Li et al., 2016) and that radiogenic heating in thickened crust is sufficient to result in the observed styles of metamorphism (e.g. . In some cases, the rate of radiogenic heating required to produce Barrovian conditions could depend upon the convergence rate. For example, in regions of extremely fast convergence, the underthrusting plate will have a cooling effect on the over-riding material (not considered in our one-dimensional approach), which would increase the rate of heating required in order to achieve a given temperature increase. Figure 2 gives an indication of the rate of convergence and thickness of crust that would be required in order for this consideration to apply, which is only in the handful of thickest and fastest-converging ranges at the present day.
The rates of radiogenic heating we have discussed can raise the temperature in the mid-crust above that in the underlying crust and mantle, as also found by McKenzie and Priestley (2016). The size and location of the temperature inversion are a result of the geometry of the thickening and the rate of radiogenic heating. Much attention has focussed on the origin and interpretation of inverted metamorphic grades in crustal sections, often revolving around rapid thrust motion or flow in crustal channels (e.g. Beaumont et al., 2004;England & Molnar, 1993;Gibson et al., 1999;Hubbard, 1989;Searle et al., 2006). Our models presented here (and those of McKenzie & Priestly, 2016) demonstrate that radiogenic heating in thickened crust can also result in inverted metamorphic grade and highlights that analysis of the timing and the magnitude of the temperature inversion, and any associated deformation, is required to distinguish between these possibilities. In particular, the models presented here show that the lengthscale of the temperature inversion due to radiogenic heating is larger than for the other potential causes. A difference between the models presented here and the instantaneous thickening models of McKenzie and Priestley (2016) is that where thickening occurs above rigid lower crust, it is possible to achieve granulite-facies conditions faster, with lower rates of radiogenic heating, and without there being a temperature inversion in the crust (e.g. Figures 6 and 10b). This finding implies that less time is required for craton formation (by melt extraction yielding an anhydrous granulite-facies terrane) and that the granulite-facies mid-crustal rocks commonly observed in Archean continental interiors may be underlain by similarly high-grade rocks.

| Dominant style of mountain building and constraints on ancient tectonics
The results presented above suggest that the majority of Barrovian P-T conditions cannot be achieved by homogeneous thickening except in extreme cases (e.g. by the rocks close to the base of crust that is very thin before thickening begins). The prevalence of Barrovian conditions in continental mountain ranges thereby implies that most mountain ranges are built by thickening above rigid lower crust. Such a situation implies rheological contrasts within the crust, between weaker overthrusting material and a strong underthrusting substrate (e.g. Copley, Avouac, & Wernicke, 2011;Jackson et al., 2008;Ni & Barazangi, 1984;Watts et al., 1995). This situation highlights the importance of pre-existing strength contrasts in controlling the tectonics of mountain building. The modelled P-T loops also raise the possibility of regions of homogeneous thickening being mistaken for contact metamorphism, due to the prevalence of low-P assemblages covering a wide range of temperature conditions. Although unlikely to be an issue in well-exposed terranes, in which the spacing of isograds and the geometry relative to intrusions will be diagnostic of the presence or absence of contact metamorphism (e.g. Pattison & Tinkham, 2009), this effect could cause ambiguities in regions with limited outcrop.
A potential use of our models is to examine when in geological history thickening above rigid lower crust became the dominant mechanism of mountain-building, and by doing so establish by what time large regions of strong mid/lower crust had been produced. The Barberton example in Figure 11 shows that P-T loops implying shortening above rigid lower crust were already in existence by 3.2 Ga. Our thermal models alone cannot distinguish between a continental collision and a subduction setting. However, the low Th/Nb ratio in the Barberton greenstones has been used to suggest that formation may have occurred in a non-subduction setting (e.g. Hawkesworth et al., 2019). In that case, our results suggest that there were regions of strong continental crust present by 3.2 Ga, in order to result in metamorphism by mountain building above rigid lower crust. This feature, in turn, may imply that there had already been significant amounts of high-grade metamorphism, dehydration through partial meting, and cooling to form craton-like bodies, in the early-to mid-Archean, as this process is the only one currently known to generate large regions of crust that are strong enough to provide a rigid base to upper-crustal mountain building (e.g. Copley, Avouac, & Wernicke, 2011;Jackson et al., 2008). Such a situation is in line with the suggestion of England and Bickle (1984) that mountain range crustal thicknesses, and so lithosphere strengths, were similar in the Archean to the present day.

| CONCLUSIONS
This paper has used one-dimensional models to describe the main controls on the thermal evolution of continental mountain ranges. The combination of peak-T conditions with constraints on the prograde evolution (e.g. from garnet zonation or inclusions), and so the degree of curvature of P-T loops, provides strong constraints on the rate of vertically distributed radiogenic heating and the characteristics of mountain-building, including the style, rate and geometry of thickening. The geological record of regional metamorphism is dominated by metamorphic rocks produced by thickening above rigid lower crust (rather than by homogeneous thickening of the entire lithosphere) and requires only modest rates of radiogenic heating (0.5-2.5 Â 10 À6 W/m 3 ). Our results cast doubt on the ability to use average thermal gradients to infer the presence of subduction, as a range of model scenarios result in rocks transiently passing through 'cool' thermal gradients at near-peak pressure conditions. In contrast, our results present a promising avenue for examining when mountain-building events began to resemble those at the present day, which the data from Barberton implies may have been as early as the Paleoarchean.

ACKNOWLEDEGEMENTS
This work was partly supported by COMET, which is the NERC Centre for the Observation and Modelling of Earthquakes, Volcanoes and Tectonics, a partnership between UK Universities and the British Geological Survey. We thank Sandra McLaren, Andrew Smye and two anonymous reviewers for comments on the manuscript and Richard White for editorial handling.
ORCID Alex Copley https://orcid.org/0000-0003-0362-0494 Owen Weller https://orcid.org/0000-0001-8872-8618 A P P END I X A : TRADE-OFFS BETWEEN MODEL PARAMETERS Figures A1 and A2 show the trade-offs between the parameters of models that are able to reproduce Barrovian metamorphic conditions. Where there is a diagonal stripe of successful models, these parameters trade off against each other. Where there is a horizontal or vertical stripe, one parameter has a strong control on which models are successful, and the other has less control. Where the entire plot is the same colour, wide ranges of either parameter can result in successful models, but the parameters do not depend upon each other.
F I G U R E A 1 The trade-offs between the parameters of models that are successfully able to reproduce Barrovian metamorphic conditions, for the case of homogeneous thickening of the entire lithosphere. The colours show the number of successful models plotting within each region of the graph. Grey regions contain no successful models F I G U R E A 2 The trade-offs between the parameters of models that are successfully able to reproduce Barrovian metamorphic conditions, for the case of thickening above rigid lower crust. The colours show the number of successful models plotting within each region of the graph. Grey regions contain no successful models