Efficient Surrogate Construction for Chemical Processes: Case Studies on Pressure Swing Adsorption and Gas-to-Liquids

We propose a workflow that reduces the time required for data generation when building statistical digital twins. The methodology is particularly relevant for real-world engineering problems where data generation is expensive. A prerequisite for building surrogates is sufficient input/output data, yet over-sampling hardly improves the regression accuracy. The time for data generation can be reduced via (1) reduction of the average time spent on generating an individual data point and (2) reduction of the total number of data points, achieved by slowing the sampling rate as the surrogate quality improves. Two case studies from the field of carbon capture and utilization are used, one dynamic and one steady-state process: pressure swing adsorption (PSA) and Gas-to-Liquids (GTL). With the proposed methodology, the time for surrogate generation is reduced by 88% for PSA and 60% for GTL.


Introduction
Surrogates, or data-driven models, play an increasingly important role in the evaluation of engineering systems. In recent years, machine learning has added new members to the 'surrogate family' and boosted the application of surrogates. [1][2][3][4] One of the prerequisites for building surrogates is data, the generation of which can be time-consuming and prohibitively expensive for real-world engineering systems when no preliminary data is available.
Conventional sampling methods can lead to under- or over-sampling issues. 5 Our aim is to develop a workflow that reduces the time spent on data generation by reducing the total number of required data points and by improving data effectiveness (i.e., reducing the average time spent on generating an individual data point). To set up such a methodology, it is helpful to review prior work on surrogate modelling and smart sampling, which is done in the remainder of this section.
The evaluation of real-world engineering systems is expensive. With physical models and inputs, computer simulations can accurately deliver information about systems. Although cheaper than experimental or industrial data, simulations can be considerably slow when the systems involve multi-scale, multiphase phenomena and dynamic behaviour. 4 Data-driven surrogates can represent the original physical models by building a relationship between inputs and outputs (responses). For some complex systems, no physical models are available at all, and data-driven surrogates together with the design of experiments (DoE) are then the only choice. [6][7][8] Surrogates are cheap to evaluate and can be employed directly for optimization, control, and design problems in many engineering fields, e.g. chemical engineering, 1,2,9,10 pharmaceutical manufacturing, 11 supply chain management, 12,13 and aerospace engineering. [14][15][16]

Traditional surrogate formulations include polynomials, moving least-squares, and radial basis functions, but their low accuracy is often criticized. [1][2][3] In machine learning, artificial neural networks (ANNs) and Gaussian processes (GPs) have been identified as promising formulations for surrogate generation. [17][18][19][20] ANNs are regarded as universal approximators and allow multiple outputs to be fitted simultaneously. Their flexible structure and choice of activation functions enable ANNs to fit linear and nonlinear input/output relationships accurately. GPs belong to the non-parametric model family, offer excellent fitting performance, and can predict uncertainty. 18 However, GPs are generally formulated for a single output, and formulations that fit multiple outputs are complex. An alternative is to use an independent GP for each output, but the correlations among outputs are then lost, which can worsen predictions for highly coupled outputs. 21,22 Data is one of the prerequisites to train a surrogate.
When the available data is limited, data generation is necessary, but it can be extremely expensive for real-world engineering systems. 17 To demonstrate this, we consider an example of a chemical process, pressure swing adsorption (PSA). [23][24][25] Details about PSA can be found in Section 4, where we use PSA as one of the case studies. Data is randomly sampled, while the surrogate is trained iteratively. As shown in Figure 1, data generation takes orders of magnitude longer than surrogate training, which is a common situation in chemical process systems.
Insufficient data cannot yield a good surrogate, yet Garud et al. report that simply increasing the data quantity does not necessarily improve surrogate performance either. 5 Thus, the quality of surrogates relies on both data quantity and data quality. To reduce the time for data generation, the first objective is to obtain good-enough surrogates with the minimum amount of data. There are two types of sampling methods: one-shot and sequential (adaptive). 1 The former samples the design space uniformly in one go and then builds a surrogate, while the latter samples data in batches and refines the surrogate iteratively. The one-shot approach is straightforward but may result in under- or over-sampling, causing either poor regression or inefficiency. In recent years, sequential methods have become popular because they are reported to better balance regression performance and efficiency. 1 However, over-sampling is still hard to avoid with a typical sequential method. To demonstrate this, we again use the PSA example. The mean squared error (MSE) is employed to evaluate the regression performance. The data is sequentially sampled with a stochastic space-filling method. A stochastic method has anytime behaviour, 26 so sampling can be stopped at any time. We plot the fitting performance against time to inspect the termination condition, as shown in Figure 2. We divide the sequential sampling into two equal parts based on the number of data points (approximately equivalent to time, because the time for surrogate training is negligible compared to data generation, as shown in Figure 1). The plot indicates that the regression improvement in the first half is significantly greater than that in the second half.
This suggests that, beyond a point, additional data is not worth collecting, even though infinitely many data points would theoretically be required to fill the design space completely. In other words, owing to the anytime behaviour, we should stop sampling once a certain fitting performance is achieved. We also noticed that the MSE values fluctuate continuously, which makes it challenging to determine an optimal termination criterion.
Figure 2. Illustration of why too many data points might not be worth sampling.
Meanwhile, as the number of generated data points increases linearly, the time for surrogate training appears to increase exponentially (Figure 1). Consequently, surrogate training itself can become extremely time-consuming when the number of sampled data points is high. Over-sampling therefore brings unnecessary time costs for data generation and extra effort for surrogate training. To avoid over-sampling, it is beneficial to spot the non-improvement trend as early as possible.
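The diminishing-returns behaviour of sequential sampling can be reproduced in miniature with a self-contained sketch. The test function, batch sizes, and the piecewise-linear interpolant (a deliberately simple stand-in for a surrogate) are all illustrative choices, not taken from the case studies:

```python
import math
import random

def f(x):
    """Toy 'expensive simulation': a smooth nonlinear response."""
    return math.sin(3 * x) + 0.5 * x

def interp(xs, ys, x):
    """Piecewise-linear interpolation, standing in for a trained surrogate."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - w) * ys[i] + w * ys[i + 1]

random.seed(0)
test_grid = [i / 200 for i in range(201)]
samples = [random.random() for _ in range(5)]         # initial batch
mse_history = []
for batch in range(10):                               # sequential batches
    xs = sorted(samples)
    ys = [f(x) for x in xs]
    mse = sum((interp(xs, ys, x) - f(x)) ** 2 for x in test_grid) / len(test_grid)
    mse_history.append(mse)
    samples.extend(random.random() for _ in range(5)) # add 5 points per batch
```

With the fixed seed, the MSE drop across the first batches dominates the drop across the later ones, mirroring the two halves compared in Figure 2.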
The second objective in sampling concerns the improvement of data quality. The design space for sampling is initially based on limited prior experience or even random guesses, so an infeasible design space is commonly unavoidable. Consequently, some inputs that happen to be sampled from the infeasible design space lead to unexpected outputs, such as non-converged simulations or experimental failures. Such outputs introduce significant errors into the surrogate construction; they are ineffective data and should be screened out. To increase data effectiveness, a classifier can be constructed to distinguish between infeasible and feasible design spaces. This use of a classifier has been demonstrated in prior work: Ibrahim et al. reported that a support vector machine (SVM) can implement a feasibility constraint to filter out inputs that lead to non-converged simulations, 27 and Cao et al. adopted a Bayes classifier to improve the design space for experimental formulation conditions. 6

In model-based adaptive sampling, each new sample is placed by applying a derivative-free optimization technique to maximize the error between the surrogate and the original model. To identify one optimal sample placement, many new data points must be generated for evaluation during the optimization; consequently, this approach in fact generates far more data points than the number of optimally placed ones. An alternative is to employ GP-based surrogates, which carry uncertainty information, so that new points can be sampled to reduce the predicted uncertainty. 32,33 However, model-based methods can result in over-sampling in local spaces and a loss of generality. 5 For example, Gonzalez-Garay et al. reported a surrogate-based optimization method for a chemical flowsheet in which over-sampling was performed to refine the surrogate near the optimal solution. 34 The obtained surrogate may not be applicable to other case studies if the objective changes, because only the local space near the optimum is highly exploited.
To develop a more generic surrogate, a model-free sampling strategy is more advantageous because it guarantees randomness. In this sense, we should consider homogeneous sampling methods such as Latin hypercube sampling (LHS), Sobol sampling, and Monte Carlo sampling. 5 Meanwhile, exploration should be well balanced with exploitation during sequential sampling. On the one hand, exploration samples the entire design space homogeneously; on the other, exploitation focuses on sampling the complex/nonlinear regions. A general strategy is to divide the overall design space into several sub-spaces, rank them by a score criterion, and then exploit the sub-space with the worst score. Since this is out of the scope of this work, the reader is referred to the cited literature for details. [35][36][37]

Overall, prior studies focus on the sampling methods themselves, with little attention given to reducing the total time for data generation. In this work we aim to obtain good-enough surrogate models via efficient sequential data generation. The efficiency comes from:
- reduction of the total number of sampling points: the sampling rate gradually slows down as the regression performance improves, so the non-improvement trend is spotted at an earlier stage;
- reduction of the average time spent on generating an individual data point: a classifier is trained to screen out unpromising data inputs, avoiding time spent on unnecessary simulations or experiments.
The remaining sections are structured as follows. Section 2 proposes the overall workflow for surrogate construction. Section 3 presents the state of the art behind the two principles for efficient data generation. Section 4 presents two case studies on chemical processes, followed by conclusions and outlook in the final section.

Workflow for surrogate construction
This section presents the workflow for surrogate generation. To enhance the surrogate quality globally, this work focuses on exploration-based sampling (an exploitation-based method would distort the evaluation of the influence of the sampling rate). We select Latin hypercube sampling (LHS) because it does not lose generality and delivers a well-distributed sampling result. 5 As shown in Figure 3, the algorithm samples initial data by LHS. Then, simulations or experiments generate the corresponding outputs. With the initial data points (or together with a few iterations), an SVM classifier is trained to separate the feasible design space from the infeasible one. Data inputs from the infeasible region are discarded, while inputs in the feasible region are passed to the simulator for outputs. To fit multiple outputs simultaneously, an ANN is selected as the surrogate type. In the following iterations, data is sampled in batches by LHS for surrogate refinement, and the sampling rate gradually slows down.

We briefly describe the procedure for data generation. Process simulators such as Aspen Plus, Dymola, or gPROMS are powerful tools for process modelling. Still, they are not flexible for data storage and are limited in their access to high-level statistical packages. Therefore, it is necessary to establish an interface between process simulators and high-level programming languages (e.g., MATLAB, Python). Advantages of MATLAB over other languages are its reliable optimization and machine-learning toolboxes, which are well-established for commercial use. The automation of process simulators from MATLAB has previously been reported in process design 38-40 and control. 41 As shown in Figure 4, the inputs are sampled by LHS in MATLAB and passed to the simulators, e.g., Dymola for dynamic simulations and Aspen Plus for steady-state process simulations. The obtained outputs are sent back to MATLAB for data collection.
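The LHS step can be sketched in a few lines. The workflow itself uses MATLAB's toolboxes; the following is only an illustrative pure-Python sketch of the Latin hypercube idea: each dimension is divided into n equal strata, one point is drawn per stratum, and the strata are permuted independently per dimension.

```python
import random

def latin_hypercube(n, d, rng=random):
    """Draw n points in [0, 1)^d with exactly one point per stratum per dimension."""
    points = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # one uniform sample inside each of the n equal strata of dimension j
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)  # decouple the strata orderings across dimensions
        for i in range(n):
            points[i][j] = strata[i]
    return points

random.seed(1)
batch = latin_hypercube(8, 2)
# every dimension has exactly one point in each stratum [i/8, (i+1)/8)
for j in range(2):
    assert sorted(int(p[j] * 8) for p in batch) == list(range(8))
```

In the workflow, such unit-cube points would then be rescaled to the physical bounds of each input before being passed to Dymola or Aspen Plus.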
The obtained dataset is divided into three parts, training, validation, and test, at a ratio of 70% / 20% / 10%. Random search is used to optimize the network structure, e.g., the number of layers and the number of neurons. We select the hyperbolic tangent (tanh) as the activation function.

State of the art for efficient data generation
The two essential elements, the classifier and the slow-down sampling, are detailed in this section.

SVM classifier
The sampled points might fall in the infeasible design space due to extreme operating conditions in experiments (e.g., unexpected reactions at high temperature), non-converged recycle streams, or integration failures on stiff models during computational simulations. A classifier can be trained to pre-treat the data inputs: only the selected inputs are passed on to the simulation or experiment stage, saving the average time spent per data point.
SVM is a machine-learning technique primarily for classification. It was initially proposed as a linear classifier, and Vapnik et al. expanded it into a nonlinear classifier in 1995. 42 It is a mature and reliable method whose success has been proven in pattern recognition and computer vision. 43 For a typical chemical process, high-dimensional features (multiple inputs) and nonlinearity are unavoidable. Ibrahim has demonstrated the successful application of SVM in chemical process engineering. 27 An SVM toolbox is available in MATLAB, so SVM is selected as the classifier in this work.
The training process for the SVM is similar to the steps for surrogate training, with two differences. First, only the dataset from the initial several iterations is used to train the classifier, because the classifier in this work is only expected to give a rough separation between the infeasible and feasible design spaces, so iterative refinement of the classifier is unnecessary. Second, the classifier output is binary: 0 if the simulation outputs fall in the infeasible space, and 1 if they fall in the feasible one. The data inputs together with these labels are then used to train the SVM.
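The label-then-filter logic can be sketched as follows. The paper trains the classifier with MATLAB's SVM toolbox; here a minimal perceptron serves as a deliberately simple linear stand-in, and the feasibility rule (convergence when x + y < 1) is purely hypothetical:

```python
import random

def feasible_rule(x, y):
    """Hypothetical feasibility rule standing in for simulator convergence."""
    return 1 if x + y < 1.0 else 0

def train_perceptron(data, epochs=2000, lr=0.1):
    """Minimal linear classifier used here as a stand-in for the SVM."""
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for (x, y), t in data:
            pred = 1 if w0 * x + w1 * y + b > 0 else 0
            w0 += lr * (t - pred) * x
            w1 += lr * (t - pred) * y
            b += lr * (t - pred)
    return lambda x, y: 1 if w0 * x + w1 * y + b > 0 else 0

random.seed(2)
# Initial iterations: sample inputs, run them, and record 0/1 convergence labels.
inputs = [(random.random(), random.random()) for _ in range(200)]
# Points very close to the (unknown) boundary are skipped for a clean margin.
inputs = [(x, y) for x, y in inputs if abs(x + y - 1.0) > 0.1]
inputs += [(0.2, 0.2), (0.9, 0.9)]
data = [((x, y), feasible_rule(x, y)) for x, y in inputs]
classify = train_perceptron(data)

# Later iterations: only inputs predicted feasible reach the simulator.
candidates = [(random.random(), random.random()) for _ in range(50)]
promising = [p for p in candidates if classify(*p) == 1]
```

In the actual workflow the labels come from observed simulation outcomes rather than a known rule, and the nonlinear SVM replaces the linear stand-in.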

Slow-down sampling
As illustrated in Figure 2, too much data is not worth sampling, while data generation is usually more time-demanding than surrogate training. Herein, we define the sampling rate as the number of newly added samples per iteration, $n_{added}$. A slow-down sampling strategy is considered: slow down the sampling rate as the quality of the iteratively refined surrogates improves.
To achieve this, the improvement rate of the surrogate quality needs to be quantified. MSE is used to quantify the regression performance, and a moving mean ($\overline{MSE}$) smooths the fluctuation of the MSE curve. The improvement rate of the surrogate is defined as the MSE decrease per added data point, called $slope_{MSE}$, Equation (1). A scaling function scales the slope to a relative value, $relative\ slope$, Equation (2); this value generally lies between -1 and 1 (due to the fluctuation, the slope can occasionally be positive). The added-ratio function, Equation (3), converts the relative slope to an added ratio (typically between 0 and 1):

$added\ ratio_i = ratio(|relative\ slope_i|)$ (3)

The number of added samples is then calculated by Equation (4), $n_{added,i} = added\ ratio_i \cdot n_{max}$, where $n_{max}$ is the maximum number of samples added per iteration.

The ratio function in Equation (3) can be directly proportional to $|relative\ slope_i|$. To further decrease the sampling rate near the optimal surrogate, we instead use a trigonometric-type function, Equation (5). With the trigonometric-type ratio function, the sampling rate drops sharply as the relative slope approaches 0, compared with direct proportionality (Figure 5).
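The slow-down chain (smoothed MSE, relative slope, ratio function, batch size) can be sketched as below. The window length, the normalization inside `relative_slope`, and the particular trigonometric form `(1 - cos(pi*s))/2` are illustrative assumptions standing in for Equations (1)-(5), chosen only so that the ratio stays below direct proportionality near zero slope:

```python
import math

def moving_mean(values, window=3):
    """Smoothed MSE curve; its decrease per added sample is the slope."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def relative_slope(mse_smooth, n_added_last, scale=1.2):
    """Assumed scaling: MSE decrease per added sample, mapped into [-1, 1]."""
    slope = (mse_smooth[-1] - mse_smooth[-2]) / n_added_last
    return max(-1.0, min(1.0, scale * slope / (abs(mse_smooth[-2]) + 1e-12)))

def added_ratio(rel_slope):
    """Trigonometric-type ratio: flat near rel_slope = 0, 1 at |rel_slope| = 1."""
    s = abs(rel_slope)
    return (1.0 - math.cos(math.pi * s)) / 2.0

def n_added(rel_slope, n_max=20):
    """Next batch size from the added ratio and the maximum batch size n_max."""
    return math.ceil(added_ratio(rel_slope) * n_max)
```

A steep MSE decrease thus yields a near-maximal batch, while a plateau shrinks the batch far faster than a proportional rule would.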
To better demonstrate the slow-down principle, we use a simple example of two reactions in series. The two reactions are assumed to obey first-order kinetics, as written in Equations (6)-(8).
Based on this physical model, the concentration profiles of the three species are simulated. An ANN-based surrogate is iteratively refined by sequential sampling. As Figure 6a indicates, the fitting performance improves (MSE decreases) as more data is added. Meanwhile, the decrease of the MSE becomes slower (Figure 6a) and the absolute value of the relative slope tends to become smaller (Figure 6b). Consequently, the added ratio decreases (Figure 6c), and the sampling rate shows the same trend (Figure 6d). The algorithm is terminated when $|relative\ slope_i| < 0.02$, at which point 25 data points in total have been sampled. Figure 6. 'Slow-down' principle for the sampling rate during surrogate construction of the two-reaction example (the scaling factor of the added-ratio function is set to S = 1.2).
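The physical model behind this example can be reproduced with a stand-alone integration. The species labels (A → B → C), rate constants, and time step below are assumed values for illustration, not the parameters of Equations (6)-(8):

```python
# Hypothetical rate constants (1/s) for two first-order reactions in series
k1, k2 = 1.0, 0.5

def simulate(cA0=1.0, t_end=10.0, dt=1e-3):
    """Explicit-Euler integration of dcA/dt = -k1*cA,
    dcB/dt = k1*cA - k2*cB, dcC/dt = k2*cB."""
    cA, cB, cC = cA0, 0.0, 0.0
    t = 0.0
    profile = [(t, cA, cB, cC)]
    while t < t_end:
        dA = -k1 * cA
        dB = k1 * cA - k2 * cB
        dC = k2 * cB
        cA, cB, cC = cA + dt * dA, cB + dt * dB, cC + dt * dC
        t += dt
        profile.append((t, cA, cB, cC))
    return profile

profile = simulate()
```

Such a cheap model makes it easy to verify the slow-down sampling end to end before applying it to expensive simulators.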
To evaluate the performance of the surrogate more directly, we simulate the concentration profiles of the three species using the physical model and the surrogate model, respectively. Figure 7 shows that the regression performance of the iteratively refined surrogate gradually improves. For each iteration, the training data is fitted 60 times with random ANN structures. A boxplot shows the distribution of the MSE values of these ANN structures on the validation dataset, and the selected ANN is used to predict the outputs of the test dataset. As shown in Figure 8, the MSE values decrease for both the validation and the test dataset; in other words, the fitting performance gradually improves as more sampled data is added.
The slow-down principle is well reproducible, as demonstrated with the peaks function example in the Supplementary Information (SI, Figures S1-S2). Sequential sampling is performed four times on the peaks function: the sampling trends are similar across the four runs, and the total numbers of sampled data points are close to each other (between 190 and 220).

Case studies
The two case studies come from two processes in carbon capture and utilization (CCU): pressure swing adsorption (PSA) and Gas-to-Liquids (GTL), the latter starting from combined (steam + CO2) reforming of natural gas.

Case study 1: surrogate generation for PSA, a dynamic process built in Dymola
Pressure swing adsorption (PSA) is a cyclic dynamic process for gas separation. Toward the net-zero 2050 objective, PSA is regarded as a promising technology for CO2 capture from fossil-fuel-based processes. 44-46 By continuously varying the pressure, the process alternates between adsorption and desorption (Figure 9). Eventually, PSA reaches a cyclic steady state (CSS), where consecutive cycles have the same profile. Table 1 describes the inputs and outputs of the PSA system. In this case, PSA is applied to capture CO2 from the flue gas of a natural gas power plant.
Due to the low CO2 concentration in the flue gas (~4%), one PSA unit cannot guarantee the required purity (90% for carbon capture), so PSA units in series can be an option. In this work, we mainly focus on the performance of the first PSA unit, where the recovery of CO2 should be high, and the purity of CO2 should improve as well. A trade-off between recovery and purity is reported, 39,44 so the CO2 purity cannot be too high given the priority on recovery. Therefore, we trained an SVM classifier to choose samples with high recovery (upper part, >50%) and intermediate purity (middle part, 25%-75%). The classifier's performance is shown in Figure 10; eventually, only 23% of the initially sampled data falls in the desired space. The slow-down sampling is applied to collect data iteratively. Figure 11 indicates that the regression improvement is no longer significant after 10 iterations, while the corresponding sampling rate decreases. The relative slope fluctuates significantly along the iterations (Figure 11b), while the added-ratio function helps smooth the fluctuation (Figure 11c).
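The labelling rule used to train this classifier (recovery above 50%, purity between 25% and 75%) can be written directly. The (recovery, purity) pairs below are placeholders for illustration, not PSA simulation results:

```python
def desired(recovery, purity):
    """Label used for SVM training: 1 inside the targeted performance window."""
    return 1 if (recovery > 0.50 and 0.25 < purity < 0.75) else 0

# Placeholder (recovery, purity) pairs, not actual simulation outputs
samples = [(0.80, 0.40), (0.30, 0.60), (0.90, 0.90), (0.55, 0.30)]
labels = [desired(r, p) for r, p in samples]
```

The input conditions paired with these 0/1 labels are what the SVM is trained on; the surrogate itself is trained only on points labelled 1.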
Eventually, we terminate the algorithm after 50 iterations, since the MSE value hardly decreases after the 40th iteration. The effects of the two principles can be combined to improve the efficiency of surrogate generation for PSA. As shown in Figure 13, with equal-rate sampling and no classifier, the time spent on surrogate generation for PSA is 3.9e6 s; this is reduced by 88% (to 4.8e5 s) when the two principles are applied. A separate dataset was used to test the performance of the iteratively refined surrogate for PSA.
We employed boxplots of the relative errors between the surrogate predictions and the rigorous simulations for the three outputs: recovery of CO2, purity of CO2 in the product flow, and energy consumption of the system. As shown in Figure 14, all outputs are predicted well, with relative errors smaller than 5%.

Case study 2: surrogate generation for GTL, a steady-state flowsheet in Aspen Plus
Gas-to-Liquids is a classical chemical process for the production of fuels. 47,48 We built a flowsheet in Aspen Plus (detailed information in SI, Figure S3). The process starts with the combined (steam + CO2) reforming of natural gas to syngas, followed by Fischer-Tropsch (FT) synthesis to fuels. A recycle stream is split: one part is sent to reforming, the other to the FT reactor.
The combined reforming, the FT model (kinetics and chain-growth probability for the product distribution), the simulation (convergence), and a typical input-output data point of the GTL system are presented in Section S3 of the Supplementary Information. To seek the optimal operating condition, we may optimize some decision variables under uncertainty (inputs for the surrogate) and evaluate the corresponding process performance (outputs for the surrogate), as shown in Table 2. Aspen simulations suffer from non-convergence issues when improper operating conditions are given or the recycle stream tolerance is set too tight. 27,49 Such problems also occur in our case.
Initially, MATLAB is used to run the Aspen model in the first two iterations, where nearly 20% of the simulations crashed due to non-convergence issues (Figure 15). The inputs/outputs of the first two iterations are used to train an SVM classifier, which is then applied to screen out potentially non-converged inputs in the following iterations, thus improving the percentage of effective data (Figure 15). In the final iteration, less than 9% of the inputs resulted in ineffective data with the SVM pre-treatment. Notably, non-converged simulations in Aspen Plus usually take a long time to stop and still deliver invalid outputs. The slow-down sampling is applied to collect data iteratively. The relative slope in Figure 16b fluctuates more significantly than in the PSA case study (Figure 11), probably because GTL has more outputs to fit and the regression is more complex. The trend in Figure 16a indicates that the regression improvement is no longer significant after the 25th iteration. Eventually, we terminate the algorithm after 40 iterations to avoid unnecessary computational cost.
The two principles separately improve the sampling efficiency when building surrogates for GTL. As shown in Figure 17a, the slow-down sampling has a higher chance of terminating earlier than equal-rate sampling, achieving a similar fitting performance (MSE = 1e-4) with fewer data points (13112 for slow-down sampling vs. 18287 for equal-rate sampling). The trend in Figure 17b shows that the SVM classifier reduces the average time spent per data point by nearly 50%. Overall, the effects of the two principles can be combined to improve the efficiency of surrogate generation for GTL. As shown in Figure 18, the total time spent on surrogate generation for GTL is reduced by 60% compared to equal-rate sampling without a classifier. A separate test dataset is used to evaluate the performance of the surrogate obtained in the final iteration. We employ boxplots of the relative errors between the surrogate predictions and the rigorous simulations for the 10 outputs. As shown in Figure 19, most outputs are predicted well, with relative errors smaller than 5%, and some even smaller than 1%, e.g., the mass flowrates of the fuel products. The fitting of the utilities is not ideal: the relative error of the electricity consumption can go up to 15%. This is probably due to insufficient feature selection for utility fitting. For example, the electricity consumption is related to the pumps and compressors, but no relevant features were selected as inputs for the surrogate training; likewise, no features related to heat exchangers were chosen, so the fitting performance for the utilities is not as good as for the mass flowrates. However, the motivation behind a surrogate is to build a reduced-order model that replaces the original full-order physical model, so sacrificing some accuracy is unavoidable but acceptable. Figure 19. Prediction performance of the final surrogate for GTL.

Conclusions and outlook
This work developed an efficient workflow for surrogate generation for engineering systems (typically $t_{data} \gg t_{training}$) regarding data quality and quantity. The proposed workflow reduces the time for data generation via two principles: (1) a classifier is trained to avoid the undesired design space during data generation and improve the data quality; (2) a slow-down principle is applied to data generation, and meanwhile more time is allocated to surrogate training (still negligible compared to data generation). The slow-down sampling significantly lowers the probability of over-sampling (data quantity). With the proposed workflow, the time for surrogate generation is shown to be reduced by 88% for the PSA and 60% for the GTL case study.
The primary goal of this work was to investigate the influence of the sampling rate on surrogate generation. Sampling was desired to be homogeneous, which exploitation-based methods may disturb; as a result, we only considered exploration-based methods in the current workflow. In the future, to further enhance sampling efficiency, the exploration of the global design space should be well balanced with the exploitation of nonlinear/complex local areas. Another task is to determine better termination criteria: we tried to stop the algorithm when the MSE difference between two consecutive iterations approached 0, or when the slope approached 0, but the fluctuation of the MSE values persisted for both the GTL and PSA case studies, which made the tolerance value for termination hard to set. One solution could be to apply feature selection techniques (i.e., automatically adjusting the input variables) to improve the fitting performance and reduce its fluctuation during sequential sampling.
Additionally, this work lays the foundation for the superstructure optimization of an extensive CCU system; the two case studies presented here are two of its process options. The sub-processes of CCU are usually modelled in different process simulators, which causes inconvenience for overall simulation or optimization. This work allows the original sub-process models of CCU to be replaced with machine learning-based surrogates, followed by overall optimization in a high-level interactive platform.

Competing Interests
The authors declare no competing financial interest.

S1. Reproducibility of the slow-down sampling for peaks function
The peaks function is used as a numerical example to demonstrate the reproducibility of the slow-down sampling. It is a function of two variables (Figure S1) and can be accessed via the function 'peaks' in MATLAB; its mathematical form is given as Equation (S.1).
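Equation (S.1) corresponds to MATLAB's built-in peaks function, which can be reproduced for reference as:

```python
import math

def peaks(x, y):
    """MATLAB's peaks function: a smooth 2-D test surface with several extrema."""
    return (3 * (1 - x) ** 2 * math.exp(-x ** 2 - (y + 1) ** 2)
            - 10 * (x / 5 - x ** 3 - y ** 5) * math.exp(-x ** 2 - y ** 2)
            - (1 / 3) * math.exp(-(x + 1) ** 2 - y ** 2))
```

Its mix of peaks and valleys on a small domain makes it a convenient benchmark for sequential sampling.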
The slow-down sampling is applied to generate data sequentially for the surrogate construction (Figure S2). The termination criterion is set as $|relative\ slope| < 0.02$, at which point the slope is regarded as 0. In four trials, the total number of sampled data points falls between 190 and 220. Fitting the peaks function is considerably easier than fitting a typical engineering system, so the termination criterion for the peaks function can be determined reliably. By contrast, for the two case studies in this work, we terminate the sampling when the algorithm reaches a maximum number of iterations. Figure S1. Sampling for the surrogate building of the peaks function. Figure S2. Reproducibility of slow-down sampling for the peaks function. Figure S3. Description of a four-stage PSA for CO2 capture.

S2.1. Model equations
The PSA process is operated in a cyclic mode, adsorbing the desired gas species at a higher pressure and releasing them at a lower pressure. A typical PSA cycle consists of four stages: the gas mixture flows into the column at the high pressure (PH); then the undesired gas species are extracted due to their weaker interactions with the adsorbent, while the column pressure decreases to an intermediate pressure, the blowdown pressure (PI); the column is further evacuated to an even lower pressure, the evacuation pressure (PL), during which the desired product is extracted; finally, the column is fed with the gas mixture until the high pressure (PH) is restored. In some cases, the evacuation pressure (PL) is set to a vacuum level to pursue a higher working capacity and lower energy consumption, resulting in vacuum swing adsorption (VSA). These four stages make up one cycle of PSA, and the repeating cycles purify the desired product cyclically. This work focuses on a four-stage pressure swing adsorption (PSA) process model for CO2 capture (Figure S3), as reported in Haghpanah's work. 1 A column packed with solid adsorbent is considered, and the following assumptions are used to derive the balance equations (Equations S.2-S.20): (1) A one-dimensional dispersed plug-flow model is applied to simulate the bulk fluid flow in the axial direction.
(2) No mass, temperature, or pressure gradient exists in the radius direction.
(3) Ideal gas law is applied for the state of the gas phase.
(4) Ergun equation is used for the pressure drop in the axial direction.
(5) The thermal equilibrium between the gas and solid phase is established instantaneously.
(6) Diffusion through adsorbent pores is considered as molecular diffusion in the macropores.
(7) Multisite Langmuir model is applied to calculate the solid phase saturation loading.
Total mass balance in the gas phase:

$$\frac{\partial C}{\partial t} + \frac{\partial (vC)}{\partial z} + \frac{1-\epsilon}{\epsilon}\sum_{i}\frac{\partial q_i}{\partial t} = 0, \qquad C = \frac{P}{RT}$$

Component mass balances ($N_{\mathrm{comp}}-1$) in the gas phase:

$$\frac{\partial c_i}{\partial t} + \frac{\partial (v c_i)}{\partial z} = \frac{\partial}{\partial z}\!\left(D_L\, C\, \frac{\partial y_i}{\partial z}\right) - \frac{1-\epsilon}{\epsilon}\frac{\partial q_i}{\partial t}$$

Component mass balance in the solid phase (linear driving force):

$$\frac{\partial q_i}{\partial t} = k_i \left(q_i^{*} - q_i\right)$$

Energy balances are written inside the column (gas and solid phases, Equation S.5) and for the column wall, accounting for the heat of adsorption and heat exchange with the surroundings. The pressure drop is given by the Ergun equation:

$$-\frac{\partial P}{\partial z} = \frac{150\,\mu\,(1-\epsilon)^{2}}{4 r_p^{2}\,\epsilon^{3}}\,v + \frac{1.75\,(1-\epsilon)\,\rho}{2 r_p\,\epsilon^{3}}\,|v|\,v$$

The mass transport coefficient is given by

$$k_i = \frac{c_i}{q_i^{*}}\,\frac{15\,\epsilon_p D_{p,i}}{r_p^{2}}$$

and $q_i^{*}$ is obtained from a dual-site Langmuir model:

$$q_i^{*} = \frac{q_{sb,i}\, b_i\, c_i}{1 + \sum_j b_j c_j} + \frac{q_{sd,i}\, d_i\, c_i}{1 + \sum_j d_j c_j}$$

where $q_{sb,i}$ and $q_{sd,i}$ are the solid-phase saturation loadings of sites 1 and 2, respectively. The affinity coefficients $b_i$ and $d_i$ are calculated from an Arrhenius-type temperature dependence:

$$b_i = b_{0,i}\exp\!\left(-\frac{\Delta U_{b,i}}{RT}\right), \qquad d_i = d_{0,i}\exp\!\left(-\frac{\Delta U_{d,i}}{RT}\right)$$

S2.3. Simulation of PSA
The model equations are discretized using a finite volume method, and the partial differential equations (PDEs) are turned into a set of differential-algebraic equations (DAEs). We build this DAE system in Dymola, 2 a Modelica platform; Modelica is an object-oriented modelling language. 3 If the column is discretized into 30 equal volumes ("30" is recommended for both accuracy and efficiency by Haghpanah 1 ), 1220 equations are generated. Binary variables are used to switch the boundary conditions between the four stages (Table S1).
Table S1. Binary variables for four stages.
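As an illustration of the finite-volume treatment (not the full PSA balance equations), the following sketch semi-discretizes a 1-D advection equation over 30 equal volumes with upwind fluxes and marches the resulting ODE system in time:

```python
import numpy as np

# Finite-volume semi-discretization of dc/dt + v * dc/dz = 0:
# the column is split into 30 equal volumes, and the spatial derivative
# becomes an upwind flux difference, leaving one ODE per volume.
N, L, v = 30, 1.0, 1.0           # number of volumes, column length, velocity
dz = L / N
c = np.zeros(N)                   # initial concentration profile (empty column)
c_in = 1.0                        # feed concentration at the inlet

def rhs(c):
    face = np.concatenate(([c_in], c[:-1]))   # upwind face values
    return -v * (c - face) / dz

# explicit Euler time stepping (CFL = v*dt/dz = 0.5 < 1 for stability)
dt, t_end = 0.5 * dz / v, 2.0
for _ in range(int(t_end / dt)):
    c = c + dt * rhs(c)
```

After two residence times, the feed front has swept through the whole column, so the profile approaches the inlet concentration everywhere.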
After completing the four stages, the cycle time ($t_{cycle}$) is re-initialized to 0 and the simulation of another PSA cycle starts. Consequently, the PSA cycle is simulated iteratively until a cyclic steady state (CSS) is reached. Theoretically, when a CSS is reached, the column profile is expected to be the same at the same step in two subsequent cycles. In mathematical terms, when $\|x(t) - x(t + t_{cycle})\| < \delta$, the PSA is deemed to be at CSS. The dynamic simulation of PSA can be found in Figure S4.
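The cycle-to-CSS loop can be sketched as follows. Here `simulate_one_cycle` is a hypothetical stand-in for one full four-stage Dymola cycle; for a self-contained example it is replaced by a toy contraction toward a fixed column profile:

```python
import numpy as np

def run_to_css(simulate_one_cycle, x0, delta=1e-6, max_cycles=1000):
    """Repeat the four-stage cycle until the state profile repeats itself,
    i.e. ||x(t) - x(t + t_cycle)|| < delta (cyclic steady state)."""
    x = x0
    for cycle in range(1, max_cycles + 1):
        x_next = simulate_one_cycle(x)
        if np.linalg.norm(x_next - x) < delta:
            return x_next, cycle
        x = x_next
    raise RuntimeError("no CSS reached within max_cycles")

# stand-in for one PSA cycle: a contraction toward a fixed profile
target = np.linspace(0.1, 0.9, 30)
x_css, n_cycles = run_to_css(lambda x: x + 0.5 * (target - x), np.zeros(30))
```

In the real workflow the norm would be taken over the discretized column profile exported from Dymola after each cycle.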
Figure S4. The dynamic simulation of PSA.
A full dynamic simulation contains many more results than what is shown here, but those results might not be relevant to the process analysis and optimization. This is a motivation to develop a reduced-order surrogate.

S2.4. An example of one input-output data point for PSA surrogate
According to the simulation result shown in Figure S4, we can collect one input-output data point as shown in Table S2. Obviously, this data point carries much less information than a full dynamic simulation result in Dymola, but it contains the essential features of the PSA system regarding the operating condition and the corresponding process output. A sufficient number of such effective data points can be used to train a surrogate for the PSA process.

S3. The description of GTL process
As shown in Figure S5, the Gas-to-Liquid (GTL) process is modelled in Aspen Plus and contains combined reforming, FT synthesis, and product upgrading sections. The process starts with the combined reforming (CO2 + H2O) of natural gas to syngas, followed by FT synthesis for fuels. Since the upgrading section has little influence on the overall performance, 4 we use a single distillation column to simplify it. In this work, more attention is given to the reforming and FT sections, as well as to the flowsheet building in Aspen Plus and its simulation convergence.
Figure S5. Flowsheet for GTL built in Aspen Plus.

S3.1. Why combined reforming?
Reforming is a process of converting natural gas (NG) to syngas (H2 and CO). The relevant approaches range from partial oxidation reforming and autothermal reforming to steam reforming and dry reforming. 5 The main difference among them is the oxidant: the former two approaches require O2 as the oxidizing agent, while the latter two employ steam and CO2, respectively. The most mature technology is steam reforming, which has been commercialized for H2 production for decades. However, no individual reforming technology can provide the desired syngas ratio for FT synthesis, so separation technology would be required to remove the excess component. Alternatively, these reforming options can be combined to adjust the ratio between H2 and CO. Baltrusaitis et al. compared five combination configurations and concluded that the combination of steam reforming and dry reforming is economically favored over the other options. 6 Such a combination can also reduce the carbon footprint by 67% compared with conventional steam reforming. 7 Meanwhile, the presence of steam is preferred because dry reforming suffers from severe coke formation. 8 Another merit of combined reforming (CO2 + steam) is that CO2 is regarded as one of the raw materials; thus, unlike the conventional process (autothermal reforming or steam reforming), a CO2 separation unit is not required before the reforming unit in the process design. Based on thermodynamic analysis, 9 combined reforming has the potential to fully convert the methane while maintaining a high H2 yield of 88%. Therefore, we consider combined reforming (CO2 + steam) for syngas production in this work.
Combined reforming:

$$\mathrm{CH_4 + H_2O \rightleftharpoons CO + 3\,H_2}$$
$$\mathrm{CH_4 + CO_2 \rightleftharpoons 2\,CO + 2\,H_2}$$
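Under the idealized assumption of full conversion and no side reactions, the blend of the two reforming routes needed for a target H2 : CO ratio follows from stoichiometry alone; this is a back-of-the-envelope check, not the Aspen Plus equilibrium result:

```python
# Idealized stoichiometry (full conversion, no water-gas shift):
#   steam reforming: CH4 + H2O -> CO + 3 H2   (H2:CO = 3)
#   dry   reforming: CH4 + CO2 -> 2 CO + 2 H2 (H2:CO = 1)
# Routing a fraction f of CH4 through steam reforming gives
#   H2:CO = (3f + 2(1-f)) / (f + 2(1-f)) = (2 + f) / (2 - f)

def syngas_ratio(f):
    h2 = 3 * f + 2 * (1 - f)   # mol H2 per mol CH4 converted
    co = 1 * f + 2 * (1 - f)   # mol CO per mol CH4 converted
    return h2 / co

# solving (2 + f) / (2 - f) = 2 gives f = 2/3
f = 2 / 3
```

So, in this idealized picture, routing two thirds of the methane through steam reforming hits the lower bound of the desired 2 - 2.2 ratio window.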

S3.2. Fischer-Tropsch (FT) reaction
Fischer-Tropsch (FT) synthesis has a long industrial history of producing high-quality fuels. 10,11 The FT products are commercially favored mainly because they are sulfur-free and nitrogen-free. FT can be classified into two types: low-temperature FT (LTFT) based on cobalt catalysts and high-temperature FT (HTFT) based on iron catalysts. The FT reaction is highly exothermic, and a lower temperature can improve the final conversion with respect to the thermodynamic equilibrium. By contrast, a high temperature tends to terminate the carbon chains, resulting in shorter hydrocarbons. 12 In recent years, LTFT has become more popular because it can produce long-chain hydrocarbons and is less energy-intensive. 11 Additionally, most recent research works focus on cobalt catalysts. 4,11,13-16

S3.2.1. Reaction kinetics
Yates' kinetics is used for the overall CO consumption rate:

$$-r_{\mathrm{CO}} = F\,\frac{a\,P_{\mathrm{CO}}\,P_{\mathrm{H_2}}}{\left(1 + b\,P_{\mathrm{CO}}\right)^{2}}$$

where F is the catalyst improvement factor. 14 Herein, we assume that the value of the catalyst improvement factor is 10 for this work. The variables a and b can be expressed by the following equations:

$$a = a_{0}\exp\!\left(-\frac{E_a}{RT}\right), \qquad b = b_{0}\exp\!\left(-\frac{\Delta_b H}{RT}\right)$$

where $a_0$ is the pre-exponential factor, $E_a$ the activation energy, $b_0$ the adsorption coefficient, and $\Delta_b H$ the adsorption enthalpy.
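A minimal sketch of the Yates-Satterfield rate form with the improvement factor applied as a multiplier; the kinetic parameter values below are illustrative placeholders, not the fitted constants used in the simulation:

```python
import math

R = 8.314  # J/(mol*K)

def yates_rate(p_co, p_h2, T, F=10.0,
               a0=75.0, Ea=37_000.0,       # placeholder pre-exponential factor / activation energy
               b0=1.2e-3, dH=-37_000.0):   # placeholder adsorption coefficient / enthalpy
    """Yates-type CO consumption rate with a catalyst improvement factor F:
       -r_CO = F * a * P_CO * P_H2 / (1 + b * P_CO)^2"""
    a = a0 * math.exp(-Ea / (R * T))    # rate constant, Arrhenius-type
    b = b0 * math.exp(-dH / (R * T))    # adsorption term, grows as T drops (dH < 0)
    return F * a * p_co * p_h2 / (1.0 + b * p_co) ** 2
```

With dH negative, b increases as the temperature decreases, which reproduces the stronger CO adsorption expected at LTFT conditions.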

S3.2.2. Product distribution
Yates' kinetics only describes the consumption rate of CO; no further information is given concerning the products. The Anderson-Schulz-Flory (ASF) mechanism can be used to describe the distribution of FT products. 11 Since most FT products are linear hydrocarbons, the reaction can be regarded as a polymerization process that grows a long chain. Figure S6a demonstrates how a short carbon chain extends to a long chain. The chain growth probability, α, denotes whether the chain continues to grow or not: α for growth, (1 − α) for termination. Figure S6b shows how α determines the product distribution. Typically, FT is used to produce long hydrocarbons, which can be used as waxes or be easily cracked into shorter chains. Hence, the value of α should be high enough (>0.9); in our work, we use 0.93. We plot the corresponding product distribution in Figure S7. Since the properties of hydrocarbons (HCs) are similar, we use several components to represent the whole range of HCs for simplification during the simulation, as listed in Table S4. Based on the distribution of FT products, the molar fraction of each representative HC is calculated as the sum of the molar fractions of the corresponding FT products. In this work, one representative hydrocarbon is used to approximate gasoline, while a heavier representative hydrocarbon stands in for diesel (Table S4). For further simplification, wax is not taken into consideration in the system.
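The ASF distribution for α = 0.93 can be evaluated directly. The carbon-number cut used for the lumping below is illustrative only; the exact groupings are those of Table S4:

```python
import numpy as np

alpha = 0.93                    # chain growth probability used in this work
n = np.arange(1, 101)           # carbon numbers C1..C100

# Anderson-Schulz-Flory distribution
x_n = (1 - alpha) * alpha ** (n - 1)            # mole fraction of C_n
w_n = n * (1 - alpha) ** 2 * alpha ** (n - 1)   # weight fraction of C_n

# lump into a representative cut by summing mole fractions over a carbon range
gasoline_cut = x_n[(n >= 5) & (n <= 11)].sum()  # illustrative C5-C11 range
```

The mole fractions form a geometric series, so truncating at C100 captures essentially the whole distribution at α = 0.93.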

S3.3. Flowsheet establishment of GTL in Aspen Plus
We establish the Aspen Plus process model of GTL mainly by referring to the prior works of Ha et al., 4 Lee et al. 18 and Zhang et al. 21,22 To deal with a petrochemical system like GTL, Peng-Robinson is recommended as the thermodynamic method.

S3.3.1. Combined reforming section
In the reforming section, GTL starts with NG, water and CO2. A typical composition of NG is taken from Bao's work (Table S5). 19 The water is heated to steam before reforming. In the pre-reformer, the heavier carbon components are all converted to CO and CH4. In the reformer, CH4 is converted to syngas with the assistance of CO2 and steam. The reforming is performed at a high temperature between 700 and 1000 ℃, so it is assumed to reach equilibrium. In Aspen Plus, the reformer is modelled by an RGibbs reactor, where the total Gibbs energy is minimized to reach the equilibrium ('Restricted Chemical Equilibrium' is set for the combined reforming reactions). A flowsheet option (design specification) varies the flowrates of H2O and NG to guarantee that the H2 : CO ratio falls in the range of 2 - 2.2 at the reformer outlet. Before the FT section, the mixed stream is cooled down and split into a gas stream (mainly syngas and CO2) and a liquid stream (primarily water).

S3.3.2. Fischer-Tropsch section
In the FT section, the syngas is pressurized before entering the FT reactor. RPlug is chosen to simulate the multi-tubular fixed-bed FT reactor in Aspen Plus. Yates' kinetics is used for the overall CO consumption rate (−r_CO). Since the properties of hydrocarbons (HCs) are similar, we employ four reactions (R1 - R4) to represent the whole range of HCs for simplification during the simulation, as listed in Table S6. The sum of the CO consumption rates of the individual reactions equals the overall CO consumption rate (−r_CO), and the individual reaction rates are then obtained from the reaction stoichiometry. CO2 is reported not to react over the cobalt-based catalyst and can be regarded as an inert gas in the FT reaction. 16,23 Thus, no reaction associated with CO2 is listed in the FT reactor. Based on the information above, we inserted the kinetics into RPlug.
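One way to perform this split is to assign the overall rate in proportion to the CO demand of each lumped cut from the ASF distribution (each mole of a C_n product consumes n moles of CO). The cut names and carbon-number ranges below are illustrative, since the exact groupings are given in Table S6:

```python
alpha = 0.93  # chain growth probability

def co_demand(n_lo, n_hi):
    """CO consumed by ASF products in the C_{n_lo}..C_{n_hi} range
    (mole fraction of C_n times its carbon number n)."""
    return sum(n * (1 - alpha) * alpha ** (n - 1) for n in range(n_lo, n_hi + 1))

# illustrative lumped cuts, one per representative reaction
cuts = {"C1": (1, 1), "C2-C4": (2, 4), "gasoline": (5, 11), "diesel": (12, 20)}
demand = {name: co_demand(lo, hi) for name, (lo, hi) in cuts.items()}
total = sum(demand.values())

# fraction of the overall -r_CO assigned to each lumped reaction
phi = {name: d / total for name, d in demand.items()}
```

Multiplying the overall Yates rate by each fraction, and dividing by the stoichiometric CO coefficient of that reaction, yields the individual reaction rates entered in RPlug.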

S3.3.3. Separation and recovery
Following the FT reactor, a three-phase flash is used to split the mixed stream into gas, liquid HCs and wastewater. As a simplification of the upgrading system, we use a distillation column (RadFrac) to separate gasoline from diesel. For the gas mixture in the simulation, we use an ideal separator (in the real world, PSA can be an option) to recycle all the C1 components. The GTL system contains an inert gas (N2), which must be purged; otherwise, it would gradually accumulate in the recycle stream, making convergence impossible. Herein, the recycle stream is split into a vent stream (to purge) and C1REC, which is subsequently split between the reforming and FT sections.
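The need for a purge can be seen from a simple steady-state inert balance around the recycle loop (a sketch, not the Aspen Plus flowsheet): with a purge split s, the recycled inert settles at feed_inert · (1 − s)/s, whereas with s = 0 it grows without bound.

```python
def inert_in_recycle(feed_inert, purge_frac, n_iter=500):
    """Iterate the recycle loop: inert entering the loop = fresh feed + recycle;
    a fraction purge_frac of the loop stream is vented on each pass."""
    recycle = 0.0
    for _ in range(n_iter):
        loop = feed_inert + recycle
        recycle = (1.0 - purge_frac) * loop
    return recycle

# with a 5% purge, the recycled inert converges to 1.0 * 0.95 / 0.05 = 19
r = inert_in_recycle(1.0, 0.05)
```

The smaller the purge fraction, the larger the inert inventory the loop must carry, which is the trade-off behind sizing the vent split.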

S3.3.4. Utility and its integration
The heating utility is supplied by high-pressure steam and fuel gas at over 1000 ℃. The cooling utility is provided by air and cooling water at room temperature. Pumps and compressors are powered by electricity.
The reforming is highly endothermic, and the reforming section can account for over 50% of the energy consumption of the overall GTL process. A high temperature is required for the reforming reaction; thus, the reformer outflow leaves at an extremely high temperature and needs to be cooled before the FT process. We built three heat exchangers to gradually cool down the reformer outflow, while the recovered heat is used to pre-heat the mixed stream to the gas phase (>100 ℃), to an intermediate temperature for the pre-reformer (~500 ℃), and to a high temperature for the reformer (700 - 1000 ℃).
Additionally, the purge stream contains CO and CH4, which would bring considerable greenhouse gas emissions if emitted directly. With the assistance of air, a burner is used to deal with these C1 components; an RGibbs reactor at 600 ℃ is used to simulate the burner. Owing to the exothermic combustion reactions, the burner releases heat, and waste heat recovery technology 24,25 can be used to recover part of the burner heat (the utilization efficiency is assumed to be η_burner = 60%) to reduce the heating utility of steam or fuel gas.

S3.3.5. Convergence strategy
The modelled GTL system has two flowsheet options (design specifications) to vary the flowrates of H2O and NG, as well as two recycle streams flowing to the reforming/FT sections. As a result, the flowsheet convergence can be tricky. After trial and error, we suggest a two-stage strategy for fast convergence in this case. In the first stage, the mass flows are converged: the Newton algorithm is selected to converge the two design specifications, while the Broyden or Wegstein algorithm is used to tear the stream FTIN. Because the recycle loop and the design specifications interact with each other in our case, the tear stream has to be solved simultaneously with the two design specifications ('Outside-Simultaneous' is recommended for sequencing). After the mass flows are converged, the heat flows are converged in the second stage.
The Wegstein algorithm is used to tear the heating streams H1, H2 and H3, which are grouped into a single convergence block. The heating convergence block is sequenced after the first-stage convergence. This also explains why we use HXFlux to model the heat exchangers: unlike a typical HeatX block involving mass flows (solid lines), HXFlux only deals with heat flows (dashed lines in Figure S5).
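The Wegstein update itself is a bounded secant acceleration of direct substitution; a scalar sketch of the scheme, with typical bounds on the acceleration parameter q, is given below (the toy fixed point stands in for a tear-stream balance):

```python
def wegstein(g, x0, tol=1e-10, max_iter=100, q_min=-5.0, q_max=0.0):
    """Wegstein acceleration for the fixed-point (tear-stream) problem x = g(x).
    The first step is plain direct substitution."""
    x_prev, g_prev = x0, g(x0)
    x = g_prev
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx - x) < tol:
            return x
        s = (gx - g_prev) / (x - x_prev)            # secant slope of g
        q = max(q_min, min(q_max, s / (s - 1.0)))   # bounded acceleration parameter
        x_prev, g_prev = x, gx
        x = q * x + (1.0 - q) * gx                  # accelerated update
    return x

# toy tear stream: x = 0.5*x + 2 has the fixed point x = 4
x_star = wegstein(lambda x: 0.5 * x + 2.0, 0.0)
```

For q = 0 the update reduces to direct substitution, while negative q extrapolates past g(x), which is what speeds up slowly converging recycle loops.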

S3.4. Simulation result of Aspen Plus for GTL
We present a typical simulation result of flowrates and utility consumptions in Table S7 and Table S8. For the FT inlet stream (FTIN), the ratio H2 : CO = 2.16 falls within the desired range for the FT reaction. A full Aspen Plus simulation contains many more results than what is shown here, but those results might not be relevant to the process analysis and optimization. This is a motivation to develop a reduced-order surrogate.

S3.5. An example of one input-output data point for GTL surrogate
According to the simulation results shown in Table S7 and Table S8, we can collect one input-output data point as shown in Table S9. Obviously, this data point carries much less information than a full Aspen Plus simulation result, but it contains the essential features of the GTL process regarding the operating condition and the corresponding process output. A sufficient number of such effective data points can be used to train a surrogate for the GTL process.