Stability of saddle points via explicit coderivatives of pointwise subdifferentials

We derive stability criteria for saddle points of a class of nonsmooth optimization problems in Hilbert spaces arising in PDE-constrained optimization, using metric regularity of infinite-dimensional set-valued mappings. A main ingredient is an explicit pointwise characterization of the Fréchet coderivative of the subdifferential of convex integral functionals. This is applied to several stability properties for parameter identification problems for an elliptic partial differential equation with non-differentiable data fitting terms.

To fix ideas, a prototypical example is the L¹ fitting problem [ ]
( . ) $\min_u \|S(u) - y^\delta\|_{L^1} + \frac{\alpha}{2}\|u\|_{L^2}^2$.
Here, $G(u) = \frac{\alpha}{2}\|u\|_{L^2}^2$ and $K(u) = S(u) - y^\delta$, where S maps u to the solution y of −Δy + uy = f for given f, and $F(y) = \|y\|_{L^1}$. This formulation is appropriate if the given data y^δ is corrupted by impulsive noise. Here, $F^*(\upsilon) = \iota_{\{|\upsilon(x)| \le 1\}}(\upsilon)$, where ι_A denotes the indicator function of the set A in the sense of convex analysis [ ]. A second example is the Morozov (constrained) formulation of inverse problems, appropriate for data subject to uniformly distributed noise [ ],
( . ) $\min_u \frac{\alpha}{2}\|u\|_{L^2}^2 \quad \text{s.t.} \quad |S(u)(x) - y^\delta(x)| \le \delta \text{ a.e. in } \Omega$.
Here, G and K are as before, while $F(y) = \iota_{\{|y(x)| \le \delta\}}(y)$ and hence $F^*(\upsilon) = \delta\|\upsilon\|_{L^1}$. A similar problem arises in optimal control of partial differential equations with state constraints.
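Since both data terms are integral functionals with pointwise integrands, their conjugates can be checked pointwise. The following minimal sketch (our own illustration; the helper names and the grid are hypothetical) approximates the scalar Fenchel conjugate f*(v) = sup_y (vy − f(y)) by brute force on a finite grid, for the integrand |y| of the L¹ term and for the indicator of the tube {|y| ≤ δ} with δ = 0.5. The grid bound only stands in for +∞: where the true conjugate is infinite, the computed value simply grows with the size of the grid.

```python
import math

# Grid y = k/100 on [-10, 10]; the bound 10 acts as a proxy for +infinity.
GRID = [k / 100 for k in range(-1000, 1001)]


def conjugate_on_grid(f, v, grid):
    """Brute-force Fenchel conjugate f*(v) = sup_y (v*y - f(y)) over a finite grid."""
    return max(v * y - f(y) for y in grid)


def abs_value(y):
    # Pointwise integrand of the L^1 data term F(y) = ||y||_{L^1}.
    return abs(y)


def tube_indicator(y, delta=0.5):
    # Pointwise integrand of the constraint F(y) = indicator of {|y(x)| <= delta}.
    return 0.0 if abs(y) <= delta else math.inf
```

On this grid, the conjugate of |·| vanishes for |v| ≤ 1 and equals the grid bound otherwise, mirroring the indicator of {|υ| ≤ 1}, while the conjugate of the tube indicator reproduces δ|v|.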
As we show in Section . , critical points of the saddle-point problem ( . ) may be characterized concisely through the variational inclusion 0 ∈ R(u, υ), where we define the set-valued mapping R: X × Y ⇒ X × Y by
$R(u, \upsilon) := \begin{pmatrix} \partial G(u) + \nabla K(u)^*\upsilon \\ \partial F^*(\upsilon) - K(u) \end{pmatrix}$.
Our goal then is to study the stability of solutions to ( . ) resp. ( . ) through set-valued analysis of this mapping. The main tool is a form of Lipschitz continuity of R^{-1} known as the Aubin property (or pseudo-Lipschitz or Lipschitz-like property). The Aubin property of R^{-1} is also called the metric regularity of R; see, e.g., [ , , , , ]. In contrast, second-order stability analysis for PDE-constrained optimization problems usually focuses on the stability of the optimal values and of minimizers (as opposed to saddle points) and is based on sufficient second-order conditions involving directional derivatives, often in stronger topologies; nonsmoothness typically arises from pointwise constraints or, more recently, sparsity penalties. We refer to [ , ] as well as the literature cited therein.
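As a sanity check of the inclusion 0 ∈ R(u, υ), consider a hypothetical scalar toy version of the L¹ fitting problem in which the solution operator S is replaced by the linear map u ↦ ku (an assumption made purely for illustration), so that G(u) = (α/2)u², K(u) = ku − y, and F* is the indicator of [−1, 1]. The first-order conditions then read αu + kυ = 0 and K(u) ∈ ∂F*(υ), the latter being the normal cone of [−1, 1] at υ. The sketch below (function names are our own) tests both conditions.

```python
def in_normal_cone_interval(x, v, lo=-1.0, hi=1.0, tol=1e-12):
    """Return True if x lies in the normal cone of [lo, hi] at v,
    i.e., in the subdifferential of the indicator of [lo, hi] at v."""
    if v < lo - tol or v > hi + tol:
        return False                  # v infeasible: the subdifferential is empty
    at_lo = abs(v - lo) <= tol
    at_hi = abs(v - hi) <= tol
    if at_lo and at_hi:
        return True                   # degenerate interval: the cone is all of R
    if at_hi:
        return x >= -tol              # upper endpoint: the cone is [0, inf)
    if at_lo:
        return x <= tol               # lower endpoint: the cone is (-inf, 0]
    return abs(x) <= tol              # interior point: the cone is {0}


def zero_in_R(u, v, k, y, alpha, tol=1e-12):
    """Check 0 in R(u, v) for G(u) = alpha/2 u^2, K(u) = k u - y, F* = indicator of [-1, 1]."""
    primal = alpha * u + k * v        # grad G(u) + grad K(u)^* v
    return abs(primal) <= tol and in_normal_cone_interval(k * u - y, v, tol=tol)
```

For k = α = 1 and y = 0.5 the data can be fitted exactly and (u, υ) = (0.5, −0.5) is critical; for y = 3 the dual variable saturates and (u, υ) = (1, −1) is critical.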
Since the problem ( . ) is nonsmooth, the first-order conditions involve proper subdifferential inclusions, and hence the second-order analysis required for showing metric regularity involves set-valued derivatives. Considerable effort has been expended on obtaining explicit representations of these derivatives, although up to now primarily in the finite-dimensional setting, e.g., in [ , , , , ], with a focus on normal cones arising from inequality constraints. The difficulty in the infinite-dimensional setting stems from the fact that there exists a variety of more or less abstract definitions of such objects, see, e.g., [ ], although more explicit characterizations can be obtained in some concrete situations [ ]. Here, by exploiting the fact that the nonsmooth functionals are defined pointwise via convex integrands, we are able to compute regular coderivatives explicitly and pointwise using the finite-dimensional theory from [ ]; see also [ ] for further developments on their calculus. One of the main contributions of this work is therefore to further narrow the gap between the concrete finite-dimensional and the abstract infinite-dimensional settings.
Besides being of inherent interest, e.g., for showing stability of the solution of the inverse problem with respect to δ, metric regularity is also relevant to the convergence of optimization methods. In the context of the saddle-point problem for the Lagrangian ( . ), it is required for the nonlinear primal-dual hybrid gradient method of [ ]. More broadly, through the equivalence [ ] of the Aubin property to the recently prominent Kurdyka-Łojasiewicz property [ , , ], metric regularity is relevant to the convergence of a wide range of descent methods [ ]. It can also be used to directly characterize the convergence of certain basic optimization methods [ , , ]. Metric regularity is also closely related to the concept of tilt stability, mainly studied in finite dimensions, see, e.g., [ , , , , ], but recently also in infinite dimensions [ , ]. An extended concept incorporating tilt stability is that of full stability [ , ]. We also note that when the nonlinear saddle-point problem can be written as the minimization of a difference of convex functions (as any C objective can [ ]), detailed characterizations exist in the finite-dimensional case of local minima [ ] and sensitivity [ ]. Moreover, a set-valued analysis of the solvability of such programs with further symmetric cone structure is performed in [ ]. In certain cases, with a finite-dimensional control u in an otherwise infinite-dimensional problem, it is also possible to do away with the regularizer G [ ].
This work is organised as follows. In Section , we derive pointwise characterizations of second-order subdifferentials or generalized Hessians of integral functionals on L² and give examples for several functionals commonly occurring in variational methods for inverse problems, image processing, and PDE-constrained optimization. These results are used in Section to give an explicit form of the Mordukhovich criterion for set-valued mappings in Hilbert spaces, in particular for those arising from subdifferentials of the integral functionals considered in the preceding section. Section further specializes this to the case of set-valued mappings arising from the first-order optimality conditions ( . ) and gives sufficient conditions for several stability properties such as stability with respect to perturbation of the data. Finally, Section discusses the satisfiability of these conditions in the specific case of the model parameter identification problems ( . ) and ( . ), where it will turn out that stability can only be guaranteed after introducing either a Moreau-Yosida regularization or a projection to a finite-dimensional subspace in F.

L²(Ω)
Sadly, we cannot, as in [ ], directly use the clean finite-dimensional theory from [ ] to show the Aubin property of R^{-1} through the Mordukhovich criterion [ ]. We have to delve into the various complications of the infinite-dimensional setting as presented in [ ]. The first one is the multitude of different definitions of set-valued generalized derivatives. Luckily, however, because of the pointwise nature of the nonsmooth functionals whose second derivatives we require, we will be able to compute the pointwise differentials using the finite-dimensional theory and limit ourselves to the regular coderivative in infinite dimensions. Although the results of this section hold for integral functionals on L^p for any 1 ≤ p < ∞, we restrict the presentation to the Hilbert space L² for simplicity (and since we will make use of the Hilbert space structure of the saddle-point problem ( . ) later anyway).
We first collect some notations and definitions for set-valued mappings in Hilbert spaces, following [ , ] and simplifying the setting of the latter to Hilbert spaces. The symbols X, Y, Q, and W generally stand for (infinite-dimensional) Hilbert spaces, which we identify throughout with their duals via the Riesz isomorphism. For x ∈ X, we denote by B(x, r) the open ball of radius r > 0 around x. We denote the closure of a set A by cl A.
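Definition . . Let U ⊂ X for X a Hilbert space. Then we define the set of Fréchet (or regular) normals to U at u ∈ U by
$N(u; U) := \{z \in X \mid \limsup_{U \ni u' \to u} \frac{\langle z, u' - u\rangle}{\|u' - u\|} \le 0\}$
and the set of tangent vectors by
( . ) $T(u; U) := \{z \in X \mid \exists\, \tau_i \searrow 0,\ u_i \in U \text{ such that } u_i \to u \text{ and } (u_i - u)/\tau_i \to z\}$.
For a convex set U, these coincide with the usual normal and tangent cones of convex analysis.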
For our general results, we will need to impose some geometric regularity assumptions.
Definition . . We say that a tangent vector z ∈ T(u; U) is derivable if there exist an ε > 0 and a curve ξ: [0, ε] → U such that the one-sided derivative ξ'_+(0) exists with ξ'_+(0) = z and ξ(0) = u. We say that U is geometrically derivable if for every u ∈ U, every z ∈ T(u; U) is derivable.
It is easy to see (cf. [ , Prop. . ]) that U is geometrically derivable if and only if T(u; U) for each u ∈ U is defined by a full limit, i.e., if we replace in ( . ) the existence of τ_i by the requirement of the existence of u_i for any sequence τ_i ↘ 0. For any cone V ⊂ X, we also define the polar cone
$V^\circ := \{z \in X \mid \langle z, v\rangle \le 0 \text{ for all } v \in V\}$.
We use the notation R: Q ⇒ W to denote a set-valued mapping R from Q to W, i.e., R(q) ⊂ W for every q ∈ Q. The regular coderivatives of such maps are defined graphically with the help of the normal cones.
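Definition . . Let Q and W be Hilbert spaces, and R: Q ⇒ W with dom R ≠ ∅. We then define the regular coderivative D*R(q|w): W ⇒ Q by
$D^*R(q|w)(\Delta w) := \{\Delta q \in Q \mid (\Delta q, -\Delta w) \in N((q, w); \operatorname{Graph} R)\}$.
We also define the graphical derivative DR(q|w): Q ⇒ W by
$DR(q|w)(\Delta q) := \{\Delta w \in W \mid (\Delta q, \Delta w) \in T((q, w); \operatorname{Graph} R)\}$.
The graphical derivative may also be written as [ , ]
( . ) $DR(q|w)(\Delta q) = \limsup_{t \searrow 0,\ \Delta q' \to \Delta q} \frac{R(q + t\Delta q') - w}{t}$.
Here $\limsup_{t \to \bar t} A_t$ stands for the outer limit of a family of sets {A_t ⊂ W}_{t∈T} over an index set T, defined as
$\limsup_{t \to \bar t} A_t := \{w \in W \mid \exists\, t_i \to \bar t \text{ and } w_i \in A_{t_i} \text{ with } w_i \to w\}$.
Observe that DR(q|w): Q ⇒ W, whereas D*R(q|w): W ⇒ Q. For example, if R(q) = Aq for a bounded linear operator A between Hilbert spaces, then for w = Aq we have DR(q|w)(Δq) = {AΔq} and D*R(q|w)(Δw) = {A*Δw}. The former is immediate from ( . ) (see also, e.g., [ , Ex. . ]), while the latter is contained in [ , Cor. . ]. We say that Graph R is locally closed at (q, w) if there exists a closed neighborhood U ⊂ Q × W of (q, w) such that Graph R ∩ U is closed. For any convex lower semicontinuous function f: Q → (−∞, ∞], the graph of the subdifferential ∂f, considered as a set-valued mapping, is closed. This is an immediate consequence of the definition of the convex subdifferential. Finally, R is called proto-differentiable if Graph R is geometrically derivable.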
Let X = L²(Ω; R^m) for an open domain Ω ⊂ R^n, and let G: X → (−∞, ∞] be given by
( . ) $G(u) := \int_\Omega g(x, u(x))\, dx$.
Here we assume that (i) g is normal, i.e., the epigraphical mapping x ↦ epi g(x, ·) ⊂ R^m × R is closed-valued and measurable, (ii) g is proper and convex, i.e., the mapping z ↦ g(x, z) is proper and convex for each fixed x ∈ Ω.
We call an integrand satisfying (i) to (iii) regular. Note that (i) already implies that the integrand is lower semicontinuous with respect to z for each fixed x ∈ Ω, and that its superposition with any measurable u is measurable. Since X = L²(Ω; R^m) is decomposable, it suffices to have at least one u ∈ dom G to be able to compute pointwise the Fenchel conjugate and the convex subdifferential, where conjugate and subdifferential of the integrand are understood as taken with respect to z for fixed x; see [ , Thm. C] and [ , Cor. F], respectively. In order to calculate D*∂G, we observe that Since ε > 0 was arbitrary, we deduce This proves one direction of ( . ), which therefore holds even without geometric derivability. Now we have to prove the other direction, where we do need this assumption. So, let z ∈ N(u; U). We have to show that z(x) ∈ N(u(x); C(x)) for a.e. x ∈ Ω. Suppose this does not hold. Using the standard polarity relationship N(u(x); C(x)) = [T(u(x); C(x))]°, e.g., from [ , Thm. . ], we can find δ > 0 and a Borel set E ⊂ Ω of finite positive Lebesgue measure such that for each x ∈ E there exists w(x) ∈ T(u(x); C(x)) with |w(x)| = 1 and ⟨z(x), w(x)⟩ ≥ δ. We may without loss of generality take C(x) geometrically derivable for each x ∈ E. By Definition . there then exists for each x ∈ E a curve ξ(·, x): [0, ε(x)] → C(x) such that ξ'_+(0, x) = w(x) and ξ(0, x) = u(x). Let us pick c ∈ (0, δ). By replacing E by a subset of positive measure, we may by Egorov's theorem assume the existence of ε > 0 such that It follows With u_i := ũ_{1/i} for i ∈ N, it follows that lim_{i→∞} L_i > 0. This provides a contradiction to z ∈ N(u; U). Thus z(x) ∈ N(u(x); C(x)) for a.e. x ∈ Ω, finishing the proof of ( . ).
We still have to show ( . ). One inclusion follows from the defining equation ( . ) and the fact that a sequence convergent in L²(Ω) converges, after possibly passing to a subsequence, pointwise almost everywhere.
For the other direction, we take for almost every x ∈ Ω a tangent vector z(x) ∈ T(u(x); C(x)) at u(x) ∈ C(x). For the inclusion ( . ), we only need to consider the case z ∈ L²(Ω; R^m). By geometric derivability, we may find for a.e. x ∈ Ω an ε(x) > 0 and a curve ξ(·, x): [0, ε(x)] → C(x) such that ξ(0, x) = u(x) and ξ'_+(0, x) = z(x). In particular, for any given c > 0, we may find then by ( . ), |ũ_{c,t}(x) − u(x)| ≤ t(c + |z(x)|) for a.e. x ∈ Ω, so that Moreover, ( . ) also gives For each i ∈ N we can find t_i > 0 such that ‖zχ_{Ω∖E_{1/i,t_i}}‖ ≤ 1/i. This follows from Lebesgue's dominated convergence theorem and the fact that L^m(Ω ∖ E_{c,t}) → 0 as t → 0. The estimates ( . ) and ( . ) thus show that u_i := ũ_{1/i,t_i} satisfy u_i → u and (u_i − u)/t_i → z. Therefore z ∈ T(u; U), finishing the proof of ( . ).
As a corollary, we may calculate D * ∂G(u|w) for G of the form ( . ).
Corollary . . Let G: L²(Ω; R^m) → (−∞, ∞] have the form ( . ) for some regular integrand. Then the regular coderivative of ∂G at u for ξ in the direction Δξ, where u, ξ, Δξ ∈ L²(Ω; R^m), is given by Likewise, the graphical derivative at u for ξ in the direction Δu is given by for a.e. x ∈ Ω.
Proof. As we have already remarked at the beginning of the present Section . , the graphs of the pointwise subdifferentials of a regular integrand are geometrically derivable by [ , Prop. . , Ex. . , Thm. . ]. The present result therefore follows by direct application of Proposition . to the set U = Graph ∂G with ∂G given in ( . ).
More generally, we have the following.
for some Borel-measurable and pointwise a.e. proto-differentiable set-valued function p: Ω × R^m ⇒ R^k. Then the regular coderivative of P at q for w in the direction Δw, where q ∈ L²(Ω; R^m) and w, Δw ∈ L²(Ω; R^k), is given by Likewise, Corollary . . Let P be a pointwise set-valued functional as in Corollary . , and let h: Q → W be single-valued and Fréchet differentiable. Then Remark . . Using ( . ), it is not difficult to obtain the characterization ) for a.e. x ∈ Ω of the limiting normal cone of U at u, defined as the outer limit of the regular normal cones N(u'; U) as U ∋ u' → u. The proof is based on L² convergence giving pointwise a.e. convergence for a subsequence and, in the other direction, on reindexing finite-dimensional sequences to get sequences convergent in L². The expression ( . ) then allows obtaining corresponding versions of the corollaries above for the limiting coderivative (which enjoys a richer calculus) in place of the regular coderivative.
However, the stability analysis that is the focus of this work rests on the relation between the regular coderivative and the graphical derivative discussed in the following section, which does not hold for the limiting coderivative (which has a similar relation with the regular derivative). In particular, we cannot work in the same way with the convexified graphical derivative, which is a key step in our analysis (see Section . below). Hence, we do not treat this case in detail here.
Corollary . and Corollary . give us computable expressions for the coderivative of pointwise set-valued mappings in infinite dimensions in terms of the coderivative in finite dimensions. It is, however, often easier to work with the graphical derivative ( . ). From [ , Prop. . ] we find for R: Here, for a general set-valued mapping Γ: Q ⇒ W, the upper adjoint Γ^{*+}: W ⇒ Q is defined via
$\Gamma^{*+}(\Delta w) := \{\Delta q \in Q \mid \langle \Delta q, \Delta q'\rangle \le \langle \Delta w, \Delta w'\rangle \text{ whenever } \Delta w' \in \Gamma(\Delta q')\}$.
In general, the graph of the regular coderivative need not be a convex set. It is often more convenient, and for our analysis sufficient, to work with its convexification. To see this, observe first that by definition of the upper adjoint, and minding the negative sign of Δw in ( . ), the relation ( . ) is equivalent to N((q, w); Graph R) = T((q, w); Graph R)°.
In particular, simply through the definitions of polarity and convexity, the convex hull of the tangent cone satisfies we therefore deduce where the first equality holds due to the finite-dimensional setting, while the second equality holds generally due to the properties of convex hulls and polars.
The following central result shows that for the pointwise functionals that are the focus of this work, both equalities hold even in the infinite-dimensional setting.
Further, if there exists a set E ⊂ Ω with L^m(Ω ∖ E) > 0 and then by constructing we observe for sufficiently large t the condition ⟨Δq, Δq'⟩ > ⟨Δw, Δw'⟩.
Moreover, by the pointwise character of P, also ∆w ∈ DP(q|w)(∆q ). Thus the implication in ( . ) actually holds both ways, which is exactly what we set out to prove. Finally, [DP(q|w)] * + = [ DP(q|w)] * + always holds, as we have already observed in ( . ).
Similarly to Corollary . , we have the following immediate corollary.
Corollary . . Let P be a pointwise set-valued functional as in Theorem . , and let h: Q → W be single-valued and Fréchet differentiable. Then Proof. Let us set R := P + h, and recall from Corollary . that This is the same as saying that ∆q Theorem . this holds if and only if ∆q, ∆q ≤ ∆w, ∆w when ∆w ∈ DP(q|w )(∆q ), or equivalently which is just the same as ∆q, ∆q ≤ ∆w, ∆w when ∆w ∈ DP(q|w )(∆q ) + ∇h(q)∆q .

Now we just use ( . ) to derive ( . ).
Corollary . . Let G: L²(Ω; R^m) → (−∞, ∞] have the form ( . ) for some regular integrand. Then the graphical derivative of ∂G at u for ξ in the direction Δu, where u, ξ, Δu ∈ L²(Ω; R^m), is given by Moreover, Proof. The claim follows directly from Theorem . with p(x, ·) taken as the subdifferential of the integrand with respect to its second argument. Local closedness of the graph of this subdifferential is a consequence of the lower semicontinuity of the integrand.
We now study specific cases of the finite- and infinite-dimensional second-order generalized derivatives relevant to our model problems ( . ) and ( . ). Other examples satisfying the assumptions are the piecewise linear-quadratic "multi-bang" and switching penalties introduced in [ ] and [ ], respectively.
From Corollary . , we immediately obtain Then . .
The following lemma is useful for computing D[∂F*](υ|η) for the problem ( . ). Its claim in the one-dimensional case (m = 1) is illustrated in Figure . Lemma In particular, if m = 1, then as well as Figure : Illustration of the graphical derivative and regular coderivative for The dashed line is Graph ∂f. The dots indicate the base points (z, ζ) where the graphical derivative or coderivative is calculated, and the thick arrows and gray areas the directions of (Δz, Δζ) relative to the base point. The labels (i) etc. indicate the corresponding case of ( . ).
} for the twice continuously differentiable mapping x → x and the polyhedral set (−∞, α] satisfying the constraint qualification. For the full proof of ( . ), using second-order subgradient theory from [ ], we refer to [ ]. For completeness, we provide here an elementary proof of the one-dimensional case ( . ). We have otherwise.
There is a small omission in [ , Lemma . ] that actually causes D(∂f)(z|ζ)(Δz) to be calculated instead. In calculating the subdifferential of ( . ) therein, at the end of the proof of the lemma, the cases ⟨y, w⟩ = 0 and ⟨y, w⟩ < 0 need to be treated separately to give the two different sub-cases of (|z| = α and ζ = 0) in our expression ( . ).
(ii) Let us then suppose |z| = α, Δz = 0, but ζ ≠ 0. In this case, choosing Δz_i = 0, we have by ( . ) free choice of we obtain the second case of ( . ).
(iii) If |z| = α and zΔz > 0, then ∂f(z + t_iΔz_i) = ∅ for large i. Therefore it must hold that zΔz ≤ 0. If Δz ≠ 0, it follows that ζ_i = 0 (for large i). Since ζ is fixed, the limit ( . ) does not exist unless ζ = 0, in which case also Δζ = 0. This is covered by the third case of ( . ).
Likewise, we obtain the empty coderivative if |z| > α, since even ∂f(z) is empty and ζ does not exist. Together, we obtain the final case in ( . ).
Finally, regarding D(∂f)(z|ζ) with m = 1, we see that only the case |z| = α and ζ = 0 is split into two sub-cases in ( . ), yielding an altogether non-convex Graph[D(∂f)(z|ζ)]. Taking the convexification of this set yields ( . ); cf. Figure . Corollary for the cone Proof. The claim about the graphical derivative follows from Corollary . and Lemma . , using the fact that the indicator function of a closed convex set is normal. The regular coderivative formula follows from the more general Proposition . in the appendix. Here, in the derivation of the explicit form of the polar cone V_{∂F*}(υ|η)°, we use the fact that D[∂F*](υ|η)(Δυ) is non-empty if and only if Remark . . If (υ, η) satisfy the strict complementarity condition |υ(x)| < α or |η(x)| > 0 for a.e. x ∈ Ω, the degenerate second and third cases in ( . ) (corresponding to the gray areas in Figure ) do not occur, and the cone simplifies to Note that points x ∈ Ω where a degenerate case occurs are precisely those where there is no graphical regularity of ∂f at (υ(x), η(x)). We refer to [ , Thm. . ] for the definition of this concept, which we do not require in the present work.
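For m = 1 the case distinction of the lemma can be encoded directly. The following sketch is our own transcription, assuming f is the indicator of [−α, α]; it returns the set D(∂f)(z|ζ)(Δz) as a closed interval (lo, hi), possibly with infinite endpoints, or None when the set is empty. The function name and the floating-point tolerance tol are hypothetical choices.

```python
import math


def graph_deriv_box(z, zeta, dz, alpha, tol=1e-12):
    """D(df)(z|zeta)(dz) for f = indicator of [-alpha, alpha] in one dimension.

    Returns (lo, hi) for the closed interval of admissible d(zeta), or None if empty.
    """
    if abs(z) > alpha + tol:
        return None                        # |z| > alpha: even df(z) is empty
    if abs(z) < alpha - tol:
        # interior point: df = {0} nearby, so zeta must vanish and d(zeta) = 0
        return (0.0, 0.0) if abs(zeta) <= tol else None
    s = math.copysign(1.0, z)              # boundary |z| = alpha, outward direction
    if s * zeta < -tol:
        return None                        # zeta not in df(z) = s * [0, inf)
    if s * zeta > tol:
        # zeta strictly inside the normal ray: only dz = 0, any d(zeta)
        return (-math.inf, math.inf) if abs(dz) <= tol else None
    # degenerate case zeta = 0 at the boundary (the two sub-cases)
    if s * dz > tol:
        return None                        # moving outward leaves dom df
    if s * dz < -tol:
        return (0.0, 0.0)                  # moving inward: df stays {0}
    return (0.0, math.inf) if s > 0 else (-math.inf, 0.0)
```

The last three branches are the degenerate point |z| = α, ζ = 0, where the graph of D(∂f)(z|ζ) is the non-convex union of a half-line in Δz and a half-line in Δζ.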
The following lemma is useful for computing D[∂F*] for the problem ( . ). Its claim in the one-dimensional case (m = 1) is illustrated in Figure . Lemma In particular, if m = 1, then as well as Figure : Illustration of the graphical derivative and regular coderivative for The dots indicate the base points (z, ζ) where the graphical derivative or coderivative is calculated, and the thick arrows and gray areas the directions of (Δz, Δζ) relative to the base point. The labels (i) etc. indicate the corresponding case of ( . ).
Proof. In the case m = 1, the proto-differentiability of ∂f* follows from the fact that f* is piecewise linear and hence twice epi-differentiable; see We again proceed by case distinction.
(ii) If z ≠ 0 and ζ = z/|z|, then for any z' = z + tΔz' with Δz' → Δz and t ↘ 0, we also have z' ≠ 0 for small t and hence ∂f*(z') = {z'/|z'|}. The first case in ( . ) now follows immediately from computing the outer limit (iii) If z = 0 and Δz ≠ 0, then z' = tΔz' ≠ 0 and z'/|z'| = Δz'/|Δz'|. Therefore the difference quotients (Δz'/|Δz'| − ζ)/t will only have limits if ζ lies on the boundary of B(0, 1), and indeed ζ = Δz/|Δz|. This gives the limit {ζ}^⊥, i.e., the second case.
(iv) If z = 0, Δz = 0, and |ζ| = 1, then we obtain the limit {Δζ ∈ R^m | ⟨ζ, Δζ⟩ ≤ 0} and hence the third case.
(v) In the same situation, choosing |ζ| < 1 gives the limit R^m and hence the fourth case.
Finally, ( . ) is a trivial specialization of ( . ), while regarding D(∂f*)(z|ζ) with m = 1, we see that only the case z = 0 and |ζ| = 1 is split into two sub-cases in ( . ). These produce an altogether non-convex Graph[D(∂f*)(z|ζ)]. Taking the convexification of this set yields ( . ); cf. Figure . Corollary . . Let f*(z) := δ|z| and for the cone and its polar Proof. The claim about the graphical derivative follows from Corollary . and Lemma . , using the fact that z ↦ |z| is finite-valued and Lipschitz continuous and hence normal. The regular coderivative formula follows from the more general Proposition . in the appendix. To derive the explicit form of the polar cone V_{∂F*}(υ|η)°, we employ the fact that Remark . . If (υ, η) satisfy the strict complementarity condition υ(x) ≠ 0 or |η(x)| < δ for a.e. x ∈ Ω, the degenerate second and third cases in ( . ) (corresponding to the gray areas in Figure ) do not occur, and the cone simplifies to Again, points x ∈ Ω where a degenerate case occurs are precisely those where graphical regularity fails to hold for ∂f* at (υ(x), η(x)).
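In the same spirit, the one-dimensional cases for f*(z) = δ|z| can be written out explicitly. This sketch is our own transcription under the assumption m = 1, where {ζ}^⊥ = {0}; as before it returns the set of admissible Δζ as an interval (lo, hi) or None if empty, and the function name and tolerance are hypothetical.

```python
import math


def graph_deriv_abs(z, zeta, dz, delta=1.0, tol=1e-12):
    """D(df*)(z|zeta)(dz) for f*(z) = delta*|z| in one dimension.

    Returns (lo, hi) for the closed interval of admissible d(zeta), or None if empty.
    """
    if abs(z) > tol:
        # z != 0: df*(z) = {delta*sign(z)}, locally constant
        return (0.0, 0.0) if abs(zeta - delta * math.copysign(1.0, z)) <= tol else None
    if abs(zeta) > delta + tol:
        return None                        # zeta not in df*(0) = [-delta, delta]
    if abs(zeta) < delta - tol:
        # |zeta| < delta: only dz = 0, any d(zeta) (the fourth case)
        return (-math.inf, math.inf) if abs(dz) <= tol else None
    s = math.copysign(1.0, zeta)           # degenerate case z = 0, |zeta| = delta
    if s * dz > tol:
        return (0.0, 0.0)                  # leaving 0 in the direction of zeta
    if s * dz < -tol:
        return None                        # leaving 0 against zeta: incompatible
    return (-math.inf, 0.0) if s > 0 else (0.0, math.inf)
```

Again only the point z = 0, |ζ| = δ produces the two sub-cases whose union is non-convex.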
This example is useful for spatially or temporally varying "tube" constraints, which arise in the regularization of inverse problems subject to variable noise levels [ ]. The indicator function of temporally variable constraints also appears in Moreau's sweeping process, which is a model for several phenomena from nonsmooth mechanics such as elastoplasticity [ ].
Due to the measurability of α and β, the integrand f is proper, convex and normal [ , Ex. . ], such that the subdifferential ∂f(x, ·) can be computed pointwise. Furthermore, f is a.e. proto-differentiable as the indicator function of the convex polyhedral set [α(x), β(x)]; see again [ , Ex. . & Thm. . ]. By simple pointwise application of Lemma . we can thus compute D[∂f(x, ·)]. We therefore deduce the applicability of Corollary . to and obtain a pointwise characterization of D(∂F) similar to Corollary . . Clearly, we can analogously modify Corollary . (squared L²(Ω; R^m) norm) and Corollary . (L¹(Ω; R^m) norm) by, e.g., introducing a spatially varying weight in each norm.
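On a discretization of Ω by finitely many points (an assumption made only for illustration), the pointwise characterization of normal cones to U = {u | u(x) ∈ C(x) for a.e. x ∈ Ω} with C(x) = [α(x), β(x)] reduces to a sign check at each point: z(x) must vanish where the constraint is inactive, be non-negative where u(x) = β(x), and be non-positive where u(x) = α(x). A minimal sketch, with a hypothetical function name:

```python
def in_pointwise_normal_cone(z, u, lower, upper, tol=1e-12):
    """Discrete analogue of z(x) in N(u(x); [lower(x), upper(x)]) at every grid point x."""
    if not (len(z) == len(u) == len(lower) == len(upper)):
        raise ValueError("all grid functions must have the same length")
    for zj, uj, aj, bj in zip(z, u, lower, upper):
        if uj < aj - tol or uj > bj + tol:
            return False                   # u violates the constraint at this point
        if uj > aj + tol and zj < -tol:
            return False                   # not at the lower bound: need z >= 0
        if uj < bj - tol and zj > tol:
            return False                   # not at the upper bound: need z <= 0
    return True
```

Where α(x) = β(x), both sign conditions are switched off and every z(x) is admissible, matching the normal cone of a single point.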
To pave the way towards studying the stability of saddle-point systems in the following section, we now recall general concepts for the study of variational inclusions and develop general results that quickly specialize to saddle-point systems in L².

Our stability analysis is based on the following set-valued Lipschitz property [ , , ], also known as the Aubin property of R^{-1}.
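Definition . . We say that the set-valued mapping R: Q ⇒ W is metrically regular at w̄ for q̄ if Graph R is locally closed and there exist ρ, δ, ℓ > 0 such that
$\inf_{q:\, w \in R(q)} \|q - q'\| \le \ell \inf_{w':\, w' \in R(q')} \|w - w'\| \quad (\|q' - \bar q\| < \delta,\ \|w - \bar w\| < \rho)$.
We denote the infimum over valid constants ℓ by lip R^{-1}(w̄|q̄), or lip R^{-1} for short when there is no ambiguity about the point (w̄, q̄).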
A simplified view, indicating why this concept is useful, can be seen by taking q̄ satisfying 0 ∈ R(q̄). Setting q' = q̄ and w̄ = 0 in ( . ), and using that 0 ∈ R(q̄), we then obtain
$\inf_{q:\, w \in R(q)} \|q - \bar q\| \le \ell \|w\| \quad (\|w\| < \rho)$.
Therefore, if we perturb the variational inclusion 0 ∈ R(q̄), typically an optimality condition, by a small linear perturbation w, we will still find a nearby solution to the perturbed problem. We will later see that for our problems of interest, we can encode variations in data and in an additional Moreau-Yosida regularization parameter into w. We therefore need to estimate lip R^{-1}, for which the following Mordukhovich criterion [ ] will be useful. It is also contained in [ , Thm. . ] and simplified here to our Hilbert space setting from the original Asplund space setting.
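The simplest instance is a linear map R(q) = Aq with invertible A and q̄ = 0 (chosen only for illustration): the perturbed inclusion w ∈ R(q) has the unique solution q = A⁻¹w, so the smallest ℓ with ‖q − q̄‖ ≤ ℓ‖w‖ is ‖A⁻¹‖ = 1/σ_min(A). The sketch below, restricted to 2 × 2 matrices and using hypothetical helper names, compares this closed form with a brute-force maximum of ‖A⁻¹d‖ over sampled unit vectors d.

```python
import math


def inv2(A):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def smallest_singular_value(A):
    """sigma_min(A) from the smaller eigenvalue of the symmetric matrix A^T A."""
    (a, b), (c, d) = A
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    mean = (p + r) / 2
    rad = math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(max(mean - rad, 0.0))


def modulus_bruteforce(A, n=20000):
    """Estimate ||A^{-1}|| = sup over unit d of ||A^{-1} d|| by sampling n directions."""
    Ainv = inv2(A)
    best = 0.0
    for k in range(n):
        t = 2 * math.pi * k / n
        dx, dy = math.cos(t), math.sin(t)
        best = max(best, math.hypot(Ainv[0][0] * dx + Ainv[0][1] * dy,
                                    Ainv[1][0] * dx + Ainv[1][1] * dy))
    return best
```

For A = diag(2, 0.5), both routes give the modulus 2: perturbations in the weakly scaled direction are amplified the most.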
Theorem . . Let R: Q ⇒ W be a set-valued mapping between Hilbert spaces Q and W. Suppose Graph R is locally closed around (q, w) ∈ Graph R. Then Here, for positively homogeneous M: W ⇒ Q, we have defined If R satisfies the regularity assumption D*R(q|w) = [DR(q|w)]^{*+} (which is the case for pointwise mappings due to Theorem . ), we may translate Theorem . to be expressed in terms of the graphical derivative DR, where by the second equation in ( . ) it suffices to consider the convexification of DR. This is the content of the next proposition.
Proposition . . Let R: Q ⇒ W be a set-valued mapping between Hilbert spaces Q and W. Suppose Then Proof. From Definition . of D*R(q|w) and D*R^{-1}(w|q) through N((q, w); Graph R), we observe that Δq ∈ D*R(q|w)(Δw) ⇔ −Δw ∈ D*R^{-1}(w|q)(−Δq).
Applied to R^{-1}, Theorem . therefore gives Referring to ( . ) and the fact that . We now derive necessary and sufficient conditions for the Aubin property to hold for variational inclusions involving second-order set-valued derivatives of pointwise functionals. As seen in Section . , these commonly have the structure of a sum of a linear operator and a cone. In fact, for the following analysis, it suffices that the graphical derivatives merely contain such a sum in order to derive upper bounds; this will be important for treating discretization by projection in Section . . We therefore assume that W = Q = L²(Ω; R^N) and that for some linear operator T := T_q: Q → Q, dependent on q but not w, and a cone V := V(q|w) ⊂ Q, dependent on both q and w. Here we recall from ( . ) that V° is the polar cone of V. Although it will not be needed in our analysis, an explicit characterization of the regular coderivatives of set-valued mappings satisfying ( . ) (with equality) is derived in Appendix for completeness.
Following the reasoning in [ , Prop. . ], we may, using the structural assumption ( . ), continue from Proposition . to derive We illustrate this expression geometrically in Figure . Observe also that if ( . ) holds as an equality, then so does the first inequality in ( . ). That is, in this case We can use ( . ) and the expansions above to estimate lip R^{-1}(w'|q') for R = H_ū and R = R. To study stability and metric regularity, we however still need to pass to This in essence involves a uniform c > 0 in the condition for all (q', w') close to (q, w).
If we assume continuity of the mapping q → T q , we can simplify this condition. The following lemma prepares the way for the stability analysis of saddle points in the next section (cf. ( . ) below).
Lemma . . Let q, w ∈ Q = W = X × Y, and suppose that for (q', w') in a neighborhood U of (q, w), Graph R ∩ U is closed, ( . ) holds, and we have for a cone V(q'|w') ⊂ Y. In addition to these structural assumptions, assume the continuity at q of q' ↦ T_{q'}, and for some c > 0 the bound Moreover, if ( . ) holds as an equality, then lip R^{-1}(w|q) < ∞ if and only if a(q|w; R) > 0.
Proof. Suppose ( . ) holds, and pick c' ∈ (0, c). Whenever t > 0 is small enough and w' and q' satisfy w' ∈ R(q'), ‖q' − q‖ < t and ‖w' − w‖ < t, the bound ( . ), the continuity of q' ↦ T_{q'}, and the inclusion guarantee the estimate The latter says that lip R^{-1}(w'|q') ≤ 1/c'.
By ( . ) and ( . ), therefore Since c' ∈ (0, c) was arbitrary, this proves ( . ). If ( . ) does not hold, and ( . ) holds as an equality, we can, given ε > 0, find for every t > 0 a pair (Δw, z) Thus, by the definition of W_R^t(q|w), we can also find q' and w' satisfying w' ∈ R(q'), ‖q' − q‖ < t and ‖w' − w‖ < t such that Δw ∈ V(q'|w') and z ∈ V(q'|w')°.
Recalling ( . ), which holds as an equality under the present assumption that ( . ) holds as an equality, this implies that lip R^{-1}(w'|q') ≥ 1/ε.
Since t > 0 was arbitrary, we have as well that Finally, since ε > 0 was arbitrary, it follows that lip R^{-1}(w|q) = ∞ if ( . ) does not hold.
We now apply the results of the preceding section to saddle points characterizing minimizers of nonsmooth optimization problems of the form ( . ). In particular, we assume that for a proper, convex, lower semicontinuous f* and, motivated by the problems considered in the next section, . We first write the first-order optimality conditions ( . ) for the problem ( . ) as an inclusion for a set-valued mapping and compute its derivative. For q̄ = (ū, ῡ) to be a saddle point of ( . ), the Lagrangian L has to satisfy
L(ū, υ) ≤ L(ū, ῡ) ≤ L(u, ῡ) (u ∈ X, υ ∈ Y).
Since −L(u, ·) is convex, proper, and lower semicontinuous for any u ∈ X, we deduce from the necessary and sufficient first-order optimality condition 0 ∈ ∂(−L(ū, ·))(ῡ) for convex functions, together with the sum rule [ , Prop. . ], that K(ū) ∈ ∂F*(ῡ). We also see that ū ∈ arg min_u G(u) + ⟨K(u), ῡ⟩.
Since G is convex and K ∈ C (X; Y), we can apply the calculus of Clarke's generalized derivative (which reduces to the Fréchet derivative and the convex subdifferential for differentiable and convex functions, respectively; see, e.g., [ , Chap. . ]) to deduce the overall system of critical point conditions
$-\nabla K(\bar u)^*\bar\upsilon \in \partial G(\bar u), \qquad K(\bar u) \in \partial F^*(\bar\upsilon)$.
This may be rewritten concisely as for the monotone operator This is defined at an arbitrary base point ū ∈ X for the linearization of K. Here and generally we use the notation q = (u, υ) ∈ X × Y and w = (ξ, η) ∈ X × Y for combining (primal, dual) and (co-primal, co-dual) variable pairs, respectively. This nomenclature stems from υ being the dual variable in the original saddle-point problem, whereas the co-primal and co-dual variables generally satisfy w ∈ H_ū(q). Alternatively, we may rewrite the critical point conditions ( . ) as 0 ∈ R(q̄). The mapping R will be useful for general stability analysis, while H_ū is critical for the primal-dual algorithm of [ ].
We can prove the following about these mappings. Proof. Again, the fact that Graph R is locally closed is an immediate consequence of the lower semicontinuity of the convex functionals G and F* and the continuity of ∇K. The expression ( . ) is also again an immediate consequence of Corollary . , where we set h(u, υ) := (∇K(u)*υ, −K(u)) and P(u, υ) := (∂G(u), ∂F*(υ)), and where we denote ∇_u[∇K(u)*] := ∇(ũ ↦ [∇K(ũ)*])(u), using the assumption that K is twice differentiable.
Recalling ( . ) and ( . ), as well as Proposition . , we see that in order to analyze the stability of ( . ), resp. ( . ), we have to compute lip R^{-1}(w'|q') in a neighborhood of (q̄, 0). We will later see that this will be necessary both for ū = u and ū = u .
We now derive sufficient conditions for the Aubin property to hold for saddle points of ( . ). We proceed in several steps. First, we observe that if both D[∂G] and D[∂F*] individually have the form ( . ), then the convexified graphical derivative DH_ū(q|w)(Δq) also has the form ( . ). More precisely for some linear operators Ḡ_q: X → X and F̄_q: Y → Y and K̄_ū := ∇K(ū), as well as the cone Since G is assumed to be quadratic, we have V_{∂G}(u|ξ) ≡ X, which gives the more specific structure We make, of course, the implicit assumption that ξ ∈ ∂G(u) and η ∈ ∂F*(υ); if this does not hold, then the respective graphical derivatives are empty. As we will see, it is difficult in general to guarantee the Aubin property. One way of doing so is to consider a Moreau-Yosida regularization of F, that is, to replace F* by $F^*_\gamma := F^* + \frac{\gamma}{2}\|\cdot\|_Y^2$ for some parameter γ > 0; see, e.g., [ , Chap. . ]. The regular coderivative of the regularized subdifferential satisfies, at least at non-degenerate points and for some cone V_{∂F*}(υ|η), the expression We denote the corresponding operator H_ū by H_{γ,ū}. From Proposition . , we observe that DR(q|w)(Δq) also has the form ( . ) with ( . ), albeit with a different term K̄_ū and with Ḡ_q including the second-order term ∇_u[∇K(u)*] from K.
We now specialize the results of Section to the specific setting considered in this section. We therefore assume that $\bar F_q = \gamma I$ for some $\gamma \ge 0$ and that $V_{\partial G} = X$. For the statement of the next lemma, we drop many of the subscripts and write for short $T := T_q$, $\bar K := \bar K_{\bar u}$, $\bar G := \bar G_q$, and $\tilde V := V_{\partial F^*}(\upsilon|\eta)$.
Suppose $\bar G$ is self-adjoint and positive definite, i.e., there exists $c_G > 0$ such that $\langle \bar G\xi, \xi\rangle \ge c_G\|\xi\|^2$ for all $\xi \in X$. Then there exists $c > 0$ such that if and only if either of the following conditions holds: (i) $\gamma > 0$, in which case $c = c(\gamma, c_G)$; (ii) there exists $c_{K,V} > 0$ such that Proof. We first prove the sufficiency of (i) and (ii). Assume first that $\gamma > 0$. For arbitrary $\lambda \in [0, \gamma]$, we can insert the productive zero and use $(\lambda - \gamma)\langle \nu, \eta\rangle \ge 0$ for all $\nu \in \tilde V^\circ$ and $\eta \in \tilde V$ to obtain This we further estimate by application of Young's inequality for any $\rho_1, \rho_2 > 0$ as Let us choose $\rho_1 = \lambda^{-1}$ and $\rho_2$ as a fixed numerical constant. Then ( . ) becomes so that by ( . ) we therefore require that This holds if $\lambda < \gamma$ is large enough, verifying case (i) including the relationship $c = c(\gamma, c_G)$. Suppose next that $\gamma = 0$. To verify the sufficiency of (ii), we proceed by contradiction, assuming ( . ) not to hold for a constant $c > 0$ determined by $c_{K,V}$, $\|\bar K\|$, and $\|\bar G^{-1}\|$.
Using the standard relation, we arrive at the desired contradiction. This verifies $c = c(\|\bar K\|, c_G, c_{K,V})$.
Having dealt with the sufficient conditions, let us now verify the necessity of ( . ) when $\gamma = 0$. We expand Using the invertibility of $\bar G$ from ( . ), let us choose $\xi = \bar G^{-1}\bar K^*\eta$. Then ( . ) immediately gives the corresponding estimate, showing the necessity of ( . ) and $c_{K,V} \ge c$.
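The dichotomy in the lemma can be seen in the simplest possible case. The following sketch makes three assumptions of our own: finite dimensions, no cone (so $\tilde V = Y$), and a block operator $T = \begin{pmatrix}\bar G & \bar K^* \\ -\bar K & \gamma I\end{pmatrix}$ obtained by linearizing $H_{\bar u}$. Invertibility of this matrix is governed by the Schur complement $\gamma I + \bar K\bar G^{-1}\bar K^*$, which mirrors the choice $\xi = \bar G^{-1}\bar K^*\eta$ in the necessity proof. The matrices are hypothetical; the code reports the smallest singular value of $T$.

```python
import numpy as np

def block_operator(G, K, gamma):
    """Hypothetical finite-dimensional stand-in T = [[G, K^T], [-K, gamma*I]]."""
    m = K.shape[0]
    return np.block([[G, K.T], [-K, gamma * np.eye(m)]])

def smallest_singular_value(A):
    # np.linalg.svd returns the singular values in descending order
    return np.linalg.svd(A, compute_uv=False)[-1]

G = 2.0 * np.eye(3)                                     # self-adjoint, c_G = 2
K_onto = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # K G^{-1} K^T positive definite
K_rank1 = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])  # K G^{-1} K^T singular

results = {}
for gamma in (0.5, 0.0):
    for name, K in (("onto", K_onto), ("rank1", K_rank1)):
        s = smallest_singular_value(block_operator(G, K, gamma))
        results[(gamma, name)] = s
        print(f"gamma={gamma}, K={name}: sigma_min = {s:.3e}")
```

For $\gamma > 0$, the smallest singular value stays positive for both choices of $\bar K$, in line with case (i). For $\gamma = 0$, it vanishes exactly when the analogue of the lower bound in (ii) fails.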
Remark . . It is easily seen that if $\gamma = 0$, then the existence of some $c > 0$ such that is necessary for the satisfaction of ( . ).
We now combine the above low-level lemma with Lemma . .
Lemma . . Let $q, w \in Q = W = X \times Y$ and suppose that for $(q', w')$ in a neighborhood $U$ of $(q, w)$, $\operatorname{Graph} R \cap U$ is closed, ( . ) holds, and we have In addition to these structural assumptions, suppose that the mappings $q' \mapsto \bar G_{q'}$ and $u' \mapsto \bar K_{u'}$ are continuous at $q$ and $u$, respectively. Assume, moreover, that each $\bar G_{q'}$ is self-adjoint and positive definite, i.e., there exists $c_G > 0$ such that $\langle \bar G_{q'}\xi, \xi\rangle \ge c_G\|\xi\|^2$ for all $\xi \in X$. Proof. If $\gamma > 0$, we may directly apply Lemma . . So we take $\gamma = 0$. Suppose first that ( . ) holds. Then $b(q|w; R) =: c_{K,V} > 0$, and ( . ) gives for every $\Delta\eta$ and $\nu$ satisfying That is, using the facts that $0 \in X$ and $0 \in X^\circ$, as well as the expression ( . ), we see that ( . ) holds whenever $\Delta\eta \in \tilde V(\upsilon'|\eta')$ and $\nu \in \tilde V(\upsilon'|\eta')^\circ$ for some $q' = (u', \upsilon')$ and $w' = (\xi', \eta')$ satisfying With $q'$ and $w'$ fixed, Lemma . now shows the existence of a constant $c > 0$ such that with $c$ depending only on $\|\bar K\|$, $c_G$, and $c_{K,V}$. Therefore ( . ) holds for all Applying ( . ) in the expression for $a$ in ( . ) now shows that $a(q|w; R) \ge c$.
In the other direction, to show that $\operatorname{lip} R^{-1}(w|q) = \infty$ if $b(q|w; R) = 0$, we assume to the contrary that $\operatorname{lip} R^{-1}(w|q) < \infty$. Then $a(q|w; R) \ge c$ for some constant $c > 0$. Now we perform the above steps in the opposite direction to show that $b(q|w; R) > 0$, contradicting the premise.
The following theorem, which specializes Lemma . to the specific structure assumed in this section and bounds the relevant quantities from below to obtain conditions that are easier to verify, is one of the main results of this work.
Theorem . . Let $q, w \in Q = W = X \times Y$ and let $U$ be a neighborhood of $(q, w)$. Suppose that for $G$ and $F^*$ of the form ( . ) for some regular integrands, with $f^*$ the integrand of $F^*$. Assume further that $\tilde h \in C^2(X; Y)$ and $G \in C^2(X)$, and that $F^*$ satisfies, for some $\gamma \ge 0$, the inclusion In addition to these structural assumptions, suppose that there exists a constant $c_G > 0$ such that If ( . ) holds as an equality, then ( . ) holds if and only if ( . ) holds.
Moreover, $\operatorname{Graph} R \cap U$ is closed due to the assumptions on $\tilde h$ and $G$ and to $G$ and $F^*$ being convex. Further, ( . ) holds by Corollary . . Condition ( . ) is guaranteed by ( . ), while for ( . ), we first of all observe that We derive for small $t > 0$ and some constant $C > 0$ (depending on $(q, w)$) the inclusion In the final step, we have used the differentiability of $\tilde h$ at $u$. Since we take the supremum over $t > 0$ in ( . ), the scaling factor $C > 0$ disappears, and we deduce from ( . ) and ( . ) that $b(u|w; R) \ge \tilde b(u|w; R)$.
Thus ( . ) guarantees ( . ). Similarly, retracing the steps, we verify that for some $C > 0$. Indeed, using $\nabla G(u) + \nabla\tilde h(u)^*\upsilon = \xi$, we compute for some $C > 0$ that and, in particular, ( . ) guarantees ( . ). Our claims now follow from an application of Lemma . , since its continuity requirements on $\bar G$ and $\bar K$ follow from the assumptions on $\bar G$ and $\tilde h$.
In the remainder of this section, we apply Theorem . to show several stability properties of saddle points of ( . ).
We begin with the simplest example of verifying $\operatorname{lip} H_{\gamma,\bar u}^{-1}(0|\hat q) < \infty$ with fixed $\bar u = \hat u$ for $\hat q$ solving $0 \in H_{\gamma,\hat u}(\hat q)$. This is useful for showing convergence of the primal-dual algorithm of [ ]. By Proposition . , we are in the setting of Theorem . . Indeed, for $R = H_{\gamma,\bar u}$, we obtain an instance of ( . ) with $\tilde h(u) = \nabla K(\bar u)u + c_{\bar u}$.
Furthermore, we also have so we may take $c_G = \alpha$ in ( . ). If $\gamma > 0$, ( . ) is trivially satisfied. By Theorem . , we therefore obtain Thus $H_{\gamma,\hat u}^{-1}$ has the Aubin property at $(0, \hat q)$ provided $\gamma > 0$. We summarize these findings in the following proposition.
Consequently, ( . ) can be expressed in the setting of this proposition as We will return to the issue of verifying (or disproving) the lower bound on $\tilde b$ with specific examples in Section .
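We now want to study the stability of the condition $0 \in H_{\hat u}(\hat q)$ with respect to perturbations of the data $y^\delta$. This of course only makes sense if we set the base point $\bar u$ in $H_{\bar u}$ equal to the solution $\hat u$. Therefore, we define for variations $\Delta y$ in the data $R_{\Delta y}(u, \upsilon) := P(u, \upsilon) + h_{\Delta y}(u, \upsilon)$ with $h_{\Delta y}(u, \upsilon) := (\nabla K(u)^*\upsilon, \Delta y - K(u))$ and $P(u, \upsilon) := \nabla G(u) \times \partial F^*(\upsilon)$.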
We remark that due to the linear dependence of the optimality conditions $0 \in R_{\Delta y}(u, \upsilon)$ on $\Delta y$, the stability with respect to $\Delta y$ can be seen as a form of tilt stability [ , , -, , ] for saddle-point systems.
If $K \in C^2(X; Y)$, then by Proposition . , we can compute $DR_{\Delta y}(q|w)$. In fact, with we see that $R_{\Delta y}$ is an instance of the class covered by Theorem . . Its application directly yields the following proposition.
Proposition . . Let $K \in C^2(X; Y)$ and suppose that $F^*$ satisfies ( . ) and ( . ). Denote by $q_{\Delta y}$ a solution to the optimality conditions ( . ) for the problem Suppose that a solution $\hat q = q_0$ exists and that there exists a constant $c_G > 0$ such that If $\gamma > 0$ or $\tilde b(\hat q|0; R_0) > 0$, then for some $\rho, \ell > 0$ there exist solutions $q_{\Delta y}$ with $\|\hat q - q_{\Delta y}\| \le \ell\|\Delta y\|$ whenever $\|\Delta y\| \le \rho$.

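Finally, we study the stability of the regularized optimality condition $0 \in H_{\hat u,\gamma}(\hat q)$ with respect to the Moreau–Yosida parameter $\gamma$. With $P$ as in the previous section, we now set $\Gamma_\gamma := P_\gamma + h_\gamma$ with $h_\gamma(u, \upsilon) := (\nabla K(u)^*\upsilon, \gamma\upsilon - K(u))$ and $P_\gamma(u, \upsilon)$ Observe that $\Gamma_0 = R_0$. Let $\hat q$ solve $0 \in R_0(\hat q)$. Now ( . ) applied to $\Gamma_\gamma$ at $\hat q$ and $w \in \Gamma_\gamma(\hat q)$ gives, with $w' = 0$ and $q' = \hat q$, the estimate Since $0 \in R_0(\hat q)$, we deduce that $w_\gamma := (0, \gamma\hat\upsilon) \in \Gamma_\gamma(\hat q)$. This quickly leads to the following proposition.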
Proposition . . Let $K \in C^2(X; Y)$, and suppose $F^*$ satisfies ( . ) and ( . ). Denote by $q_\gamma$ a solution to the optimality conditions ( . ) for the problem Suppose a solution $\hat q = q_0$ exists, $\tilde b(\hat q|0; R_0) > 0$, and ( . ) holds. Then for some $\rho, \ell > 0$ there exist solutions $q_\gamma$ with Proof. We may assume that $\hat\upsilon \neq 0$, because otherwise $w_\gamma = 0$ and hence $q_\gamma = \hat q$. With $w_\gamma = (0, \gamma\hat\upsilon)$ as above, we expand ( . ) into In order to derive ( . ), we only need to show that $\operatorname{lip} \Gamma_\gamma^{-1}(w_\gamma|\hat q) < \infty$, with a bound independent of $\gamma$ that can be absorbed into the constant. For this, we simply apply Theorem . to $R = \Gamma_\gamma$ with $\tilde h(u) = K(u)$, and observe that $\tilde b(\hat q|w_\gamma; \Gamma_\gamma) = \tilde b(\hat q|0; R_0)$. This follows from the fact that the expression ( . ) only depends on $\gamma$ through the base point $\eta + \tilde h(u)$, which in this case is $\gamma\hat\upsilon + (-\gamma\hat\upsilon + K(\hat u)) = K(\hat u)$. Observe that ( . ) is equally independent of $\gamma$. We can thus bound $\operatorname{lip} \Gamma_\gamma^{-1}(w_\gamma|\hat q)$ from above uniformly in $\gamma \in [0, \rho]$.
We now discuss the possibility of satisfying the assumptions of the preceding propositions in the context of the motivating parameter identification problems ( . ) and ( . ). Since this will depend on the specific structure of the parameter-to-observation mapping $S$, we consider as a concrete example the problem of recovering the potential term in an elliptic equation.
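Let $S: U \to L^2(\Omega)$ denote the mapping $u \mapsto y$, where $y$ solves $-\Delta y + uy = f$. This operator has the following useful properties [ ]:
(1) The operator $S$ is uniformly bounded in $U \subset X$ and completely continuous: if, for $u \in U$, the sequence $\{u_n\} \subset U$ satisfies $u_n \rightharpoonup u$ in $X$, then $S(u_n) \to S(u)$ in $L^2(\Omega)$.
(2) $S$ is twice Fréchet differentiable.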
(3) There exists a constant $C > 0$ such that
(4) There exists a constant $C > 0$ such that
Furthermore, from the implicit function theorem, the directional Fréchet derivative $\nabla S(u)h$ for given $h \in X$ can be computed as the solution $w \in H^1(\Omega)$ to
( . ) $(\nabla w, \nabla v) + (uw, v) = -(yh, v)$ for all $v \in H^1(\Omega)$.
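The derivative formula can be checked numerically. The following sketch discretizes $-\Delta y + uy = f$ on $\Omega = (0, 1)$ by finite differences and solves the discrete analogue of the weak formulation above for $\nabla S(u)h$. It then compares the result with a central difference quotient of $S$. Several choices are our own simplifying assumptions, not the setting of [ ]: homogeneous Dirichlet boundary conditions, the grid, $f \equiv 1$, $u = 1 + x$, and $h = \sin(\pi x)$.

```python
import numpy as np

def stiffness(n):
    """Finite-difference matrix of -d^2/dx^2 on (0, 1), homogeneous Dirichlet BC, n interior nodes."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def solve_state(u, f, L):
    """Discrete S(u): solve (-Delta + diag(u)) y = f."""
    return np.linalg.solve(L + np.diag(u), f)

def state_derivative(u, f, L, h_dir):
    """Discrete grad S(u) h: solve (-Delta + diag(u)) w = -y*h (pointwise product)."""
    y = solve_state(u, f, L)
    return np.linalg.solve(L + np.diag(u), -y * h_dir)

n = 50
x = np.linspace(0.0, 1.0, n + 2)[1:-1]   # interior grid points
L = stiffness(n)
f = np.ones(n)
u = 1.0 + x                              # positive potential keeps the operator invertible
h_dir = np.sin(np.pi * x)

t = 1e-5
central = (solve_state(u + t * h_dir, f, L) - solve_state(u - t * h_dir, f, L)) / (2 * t)
err = np.max(np.abs(central - state_derivative(u, f, L, h_dir)))
print(f"max |difference quotient - derivative| = {err:.1e}")
```

Since $f > 0$, $u > 0$ and $h > 0$ here, the discrete maximum principle makes the computed derivative strictly negative. This is a quick qualitative check in addition to the difference quotient.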
Other operators satisfying the above assumptions are mappings from a Robin or diffusion coefficient to the solution of the corresponding elliptic partial differential equation [ ].
$L^1$ fitting
Let us first consider the $L^1$ fitting problem ( . ). We are in the setting of ( . )–( . ). More specifically, now where we allow the integral to be possibly infinite if the integrand does not satisfy $f^* \circ \upsilon \in L^1(\Omega)$. We also have Thus, the saddle-point conditions ( . ) for ( . ) are given by does not hold in general (take any orthonormal basis of $L^2(E)$, which converges weakly but not strongly to zero, and use the fact that $\nabla S(\hat u)$ is a compact operator from $L^2(\Omega)$ to $L^2(\Omega)$ due to the Rellich–Kondrachov embedding theorem). Again, $\tilde b(\hat q|0; H_{\hat u}) = 0$.
Therefore, by Proposition . , there is no metric regularity without some sort of regularization. On the other hand, with Moreau–Yosida regularization, i.e., for $\gamma > 0$, we always have metric regularity of $H_{\hat u}$ at $(\hat q, 0)$ by the same proposition.
Data stability. The situation is very similar for stability with respect to data (Proposition . ) when $\gamma = 0$. Comparing ( . ) and ( . ), we see that we have to study whether $\tilde b(\hat q|0; R_0) = \tilde b(\hat q|0; H_{\hat u}) > 0$.
Hence, we again cannot have data stability without Moreau–Yosida regularization. With regularization, i.e., for $\gamma > 0$, we still need to prove ( . ). Using the reverse triangle inequality, the boundedness of the dual variable $\upsilon(x) \in [-1, 1]$ due to the choice of $F^*$, and assumption ( ), we have that for $\alpha$ sufficiently large, and hence data stability.
Stability with respect to $\gamma$. Since Proposition . holds under exactly the same conditions as Proposition . , we deduce that there is no stability with respect to the Moreau–Yosida parameter at $\gamma = 0$. This is to be expected, as any addition of regularization will, whenever $\eta(x) = 0$, immediately force $\upsilon(x) = 0$. At a point $\gamma > 0$, stability can be proved with arguments similar to those in Proposition . .
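The pointwise mechanism behind this lack of stability can be made explicit. The sketch assumes that the regularization replaces $F^*$ by $F^* + \frac{\gamma}{2}\|\cdot\|^2$, with $F^*$ the indicator of $\{|\upsilon(x)| \le 1\}$. The pointwise relation $\eta(x) \in N_{[-1,1]}(\upsilon(x)) + \gamma\upsilon(x)$ then has the unique solution $\upsilon(x) = \operatorname{proj}_{[-1,1]}(\eta(x)/\gamma)$ for $\gamma > 0$. For $\gamma = 0$ and $\eta(x) = 0$, by contrast, every $\upsilon(x) \in [-1, 1]$ is admissible. The helper `dual_l1` is a hypothetical name used only for this illustration.

```python
import numpy as np

def dual_l1(eta, gamma):
    """Pointwise solution v of eta in N_[-1,1](v) + gamma*v for gamma > 0.

    v minimizes indicator_[-1,1](v) + gamma/2*v**2 - eta*v, i.e. v = clip(eta/gamma, -1, 1).
    """
    if gamma <= 0:
        raise ValueError("for gamma = 0 the solution set need not be a singleton")
    return np.clip(np.asarray(eta, dtype=float) / gamma, -1.0, 1.0)

eta = np.array([-2.0, -0.1, 0.0, 0.05, 3.0])
for gamma in (1.0, 0.1, 0.01):
    # the entry with eta = 0 is mapped to v = 0 for every gamma > 0
    print(gamma, dual_l1(eta, gamma))
```

However small $\gamma > 0$ is, points with $\eta(x) = 0$ are pinned to $\upsilon(x) = 0$. A nonzero dual value at such points for $\gamma = 0$ is therefore not the limit of the regularized duals.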

$L^\infty$ fitting
Let us now consider the $L^\infty$ fitting problem ( . ). We are again in the setting of ( . )–( . ), this time with and $G$ and $K$ as in the previous subsection. Hence, the saddle-point conditions ( . ) are now given by

Metric regularity. Again, for metric regularity of $H_{\hat u}$ (Proposition . ) we need to show $\tilde b(\hat q|0; H_{\hat u}) > 0$.

Let us again force $z(x)$. From Corollary . , we obtain that $z \in V_{\partial F^*}(\upsilon'|\eta')$ satisfies If $\hat\upsilon(x) = 0$ a.e. and $\operatorname{ess\,sup}_{x \in \Omega} |\tilde\eta(x)| < \delta$ holds (meaning that the constraint is almost never active), then we can proceed as in Section . to show, for any $\varepsilon > 0$ and small enough $t > 0$, the estimate In consequence, $z \in V_{\partial F^*}(\upsilon'|\eta')$ satisfies and we deduce, following Section . , that $\tilde b(\hat q|0; H_{\hat u}) = \tilde b(\hat q|0; R_0) = 0$.
Therefore, by Proposition . , we have no metric regularity if the constraint ( . ) is almost never active. (Any small change could force it to be active, and hence cause a large change in the dual variable.) However, if the constraint ( . ) is active on an open set $E$, we may also reason as in Section . to show instability. The only way to obtain stability is therefore with Moreau–Yosida regularization.
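The same kind of pointwise computation applies here, where $F^*(\upsilon) = \delta\|\upsilon\|_{L^1}$. With the (assumed) regularization $F^* + \frac{\gamma}{2}\|\cdot\|^2$, the relation $\eta(x) \in \delta\,\partial|\cdot|(\upsilon(x)) + \gamma\upsilon(x)$ is solved by soft thresholding: $\upsilon(x) = \operatorname{sign}(\eta(x))\max(|\eta(x)| - \delta, 0)/\gamma$. The sketch below uses a hypothetical helper `dual_linf`. It shows that $\upsilon$ vanishes where the constraint is inactive ($|\eta(x)| < \delta$). It also shows that a perturbation of size $\varepsilon$ across the threshold changes $\upsilon$ by $\varepsilon/\gamma$, which degenerates as $\gamma \to 0$.

```python
import numpy as np

def dual_linf(eta, gamma, delta):
    """Pointwise solution v of eta in delta*d|v| + gamma*v for gamma > 0 (soft thresholding)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    eta = np.asarray(eta, dtype=float)
    return np.sign(eta) * np.maximum(np.abs(eta) - delta, 0.0) / gamma

delta = 0.1
eta = np.array([-0.5, -0.1, 0.05, 0.1, 0.2])
print(dual_linf(eta, gamma=0.5, delta=delta))

# response of v to a perturbation of size 1e-4 just across the threshold |eta| = delta
for gamma in (1e-1, 1e-3):
    jump = dual_linf(delta + 1e-4, gamma, delta) - dual_linf(delta, gamma, delta)
    print(gamma, float(jump))
```

The ratio between the change in $\upsilon$ and the change in $\eta$ is $1/\gamma$. This is the pointwise counterpart of the missing metric regularity at $\gamma = 0$.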
Data stability. Stability with respect to data (Proposition . ) again requires that Hence, we cannot have data stability without Moreau–Yosida regularization. If $\gamma > 0$, we additionally need to prove ( . ). Using the reverse triangle inequality and assumption ( ), we have that for $\alpha$ sufficiently large, and hence data stability. Since in this case we do not have an a priori bound on $\hat\upsilon$, the choice of $\alpha$ depends on $\hat\upsilon$ and hence on the data $y^\delta$.
Stability with respect to $\gamma$. As in the case of $L^1$ fitting, stability with respect to the Moreau–Yosida parameter only holds at $\gamma > 0$.
Discretization provides an alternative to regularization. Indeed, in practice, the data $y^\delta$ lies in a finite-dimensional subspace $Y_h \subset Y = L^2(\Omega)$. With $P$ the orthogonal projection from $Y$ onto $Y_h$, we then replace the fitting term $F$ by $F_P := F \circ P$. We then have with $y = Py + (I - P)y$. We emphasize that in this approach we only discretize the fitting term, while the nonlinear operator $S$ and the regularizer $G$ remain infinite-dimensional. Hence, From the definition ( . ) of the graphical derivative, we calculate that if either $\upsilon \notin Y_h$ or $\Delta\upsilon \notin Y_h$, Consequently, by basic properties of convex hulls, we also have the inclusion Suppose now that $DF^*$ satisfies ( . ), that is, Since any cone $V$ and subspace $Y_h$ satisfy the easily verified identity Since $\nabla S(\hat u)$ is invertible (as the inverse of a linear partial differential operator; cf. ( . )) and the orthogonal projection $P$ is self-adjoint, the restriction of $P\nabla S(\hat u)\nabla S(\hat u)^*P$ to $Y_h$ is a self-adjoint positive definite operator on the finite-dimensional space $Y_h$ and therefore boundedly invertible. This implies the existence of a constant $c > 0$ such that which yields $\tilde b(\hat q|0; H_{\hat u,P}) = \tilde b(\hat q|0; R_{0,P}) \ge \alpha^{-1}c > 0$ and therefore metric regularity.
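The key step, positive definiteness of $P\nabla S(\hat u)\nabla S(\hat u)^*P$ on $Y_h$ versus the lack of a uniform lower bound on the whole space, can be observed in a toy computation. The sketch rests on several assumptions of our own:

- $\Omega = (0, 1)$ with homogeneous Dirichlet conditions;
- $u \equiv 1$ and $f \equiv 1$;
- the Euclidean inner product on grid values instead of the $L^2$ inner product;
- $N = 6$ piecewise constant cells spanning $Y_h$.

The matrix `J` is a hypothetical discrete stand-in for $\nabla S(\hat u)$, namely $h \mapsto -(-\Delta + u)^{-1}(yh)$.

```python
import numpy as np

def operator_matrix(n, u):
    """-d^2/dx^2 + diag(u) on (0, 1) with homogeneous Dirichlet BC (finite differences)."""
    h = 1.0 / (n + 1)
    L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return L + np.diag(u)

def piecewise_constant_basis(n, N):
    """Columns e_i = chi_i / sqrt(#nodes in cell i), orthonormal in the Euclidean inner product."""
    if n % N:
        raise ValueError("n must be divisible by N")
    size = n // N
    B = np.zeros((n, N))
    for i in range(N):
        B[i * size:(i + 1) * size, i] = 1.0 / np.sqrt(size)
    return B

def gram_minima(n, N):
    M = operator_matrix(n, np.ones(n))
    y = np.linalg.solve(M, np.ones(n))         # discrete state for f = 1 (positive)
    J = -np.linalg.solve(M, np.diag(y))        # discrete grad S(u): h -> -M^{-1}(y*h)
    B = piecewise_constant_basis(n, N)
    full = np.linalg.eigvalsh(J @ J.T).min()              # on the whole grid space
    coarse = np.linalg.eigvalsh(B.T @ J @ J.T @ B).min()  # P J J^T P restricted to span(B)
    return full, coarse

for n in (60, 120, 240):
    full, coarse = gram_minima(n, N=6)
    print(f"n={n}: min eig full = {full:.2e}, restricted to 6 cells = {coarse:.2e}")
```

One should see the smallest eigenvalue on the full grid space shrink rapidly as the grid is refined, which mimics the unbounded inverse in the infinite-dimensional setting. The restricted one should stay away from zero. By the Rayleigh quotient, the restricted minimum can never be smaller than the full one.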
It remains to verify the conditions ( . ). For this, let us further assume that the discretization is piecewise constant, that is, $e_i = |\Omega_i|^{-1/2}\chi_{\Omega_i}$ for some subdomains $\Omega_i \subset \Omega$ with $\sum_{i=1}^N \chi_{\Omega_i} = \chi_\Omega$. For the $L^1$ fitting example, ( . ) then gives For $t > 0$ small enough, ( . ) and ( . ) therefore give This suggests taking $A_i = \{-1, 1\}$ to satisfy ( . c). Then, for ( . a) to be satisfied, the condition ( . ) leaves $B_i = \{0\}$ as the only possibility. Further, for the strict complementarity within ( . a) to hold, it is necessary to impose that the middle two cases in ( . ) and ( . ) do not occur. This strict complementarity condition may also be stated as To verify ( . b), suppose that $\tilde\eta_i \notin B_i$. It may happen that $\tilde\eta\chi_{\Omega_i}$ behaves wildly. Nevertheless, by $\tilde\eta_i \notin B_i$, there are points $x \in \Omega_i$ where necessarily $\tilde\eta(x) \neq 0$. Hence the condition ( . ) forces $\upsilon(x) = \operatorname{sign}\tilde\eta(x)$. Thus $z(x) = 0$, which through $z \in Y_h$ forces $z_i = 0$, proving ( . b).
Thus ( . ) and the estimate in ( . ) hold for projection-regularized $L^1$ fitting under the strict complementarity condition ( . ). For $L^\infty$ fitting, still using piecewise constant discretization, studying ( . ), ( . ), and ( . ), we see that we can take $A_i = \{0\}$ and $B_i = \{-\delta, \delta\}$ to obtain the same results under the strict complementarity condition Note that the strict complementarity conditions ( . ) and ( . ) can always be satisfied after a small perturbation of $\hat\upsilon$, if necessary. Indeed, if $\tilde\eta_i$ is active ($\tilde\eta_i = 0$ resp. $|\tilde\eta_i| = \delta$), then by ( . ) resp. ( . ), $\hat\upsilon_i$ can be made inactive (in which case the strict complementarity condition is satisfied) while maintaining the optimality condition $\tilde\eta \in \partial F_P^*(\hat\upsilon)$. This change in $\hat\upsilon$ can, however, alter the constant in ( . ).
Observe further that for the original infinite-dimensional problem, similar strict complementarity conditions (pointwise almost everywhere) could be derived, but these would not be sufficient to obtain metric regularity, since we still have the problem that the inverse of $\nabla S(\hat u)\nabla S(\hat u)^*$ is unbounded on $L^2(\Omega)$. Further, in the $L^2$ topology, even a strong complementarity condition (with an $\varepsilon$ lower bound in the inequality) would not be sufficient to transport it from the optimal solution to the perturbed variables $\upsilon'$ and $\eta'$.
For stability with respect to perturbations of the data, condition ( . ) is also required. Since this condition is independent of $F$, it will hold under discretization whenever it holds for the original problem (with a possibly different constant $c_G$, since now $\hat\upsilon \in Y_h \subset Y$).
The purpose of this work was to derive explicit stability criteria for solutions to saddle-point problems in Hilbert spaces, in particular those arising from the minimization of nonsmooth nonlinear functionals commonly occurring in parameter identification, image processing, or PDE-constrained optimization problems. Our main results are a pointwise characterization of regular coderivatives of convex subdifferentials of integral functionals and explicit conditions for metric regularity of the corresponding variational inclusions. These make it possible to verify the Aubin property for concrete problems. While the results for our model problems are mostly negative (no regularity unless regularization or discretization is introduced), they are still useful: our function-space analysis provides a unified framework for any conforming regularization; in particular, it shows that the stability properties are independent of the discretization of the unknown parameter. Furthermore, for arbitrarily small fixed Moreau–Yosida parameters, the properties are also independent of the discretization of the data. This is especially important for the convergence of numerical algorithms, where it translates into a discretization-independent number of iterations required to reach a given tolerance.
This work can be extended in a number of directions. In a follow-up paper, we will apply our results on the Aubin property of pointwise set-valued mappings to the convergence analysis of the nonlinear primal-dual extragradient method from [ ] in function spaces. We also plan to investigate the possibility of obtaining partial stability results with respect to only the primal variable without regularization or discretization. An alternative would be to exploit the uniform stability with respect to regularization for fixed discretization, and with respect to discretization for fixed regularization, to obtain a combined convergence for a suitably chosen net $(\gamma, h) \to (0, 0)$; this is related to the adaptive regularization and discretization of inverse problems [ ]. Furthermore, it would be of interest to extend our analysis to include nonsmooth regularizers $G$, which were excluded in the current work for the sake of the presentation. It would also be worthwhile to adapt the stability analysis to make use of the limiting coderivative and its richer calculus, in particular to remove the geometric derivability assumption by working directly with the limiting coderivative. Finally, the pointwise characterization of coderivatives could be useful in deriving more explicit optimality conditions for bilevel optimization problems.
In this appendix, we give an explicit characterization of the regular coderivative for a class of set-valued mappings covering the examples in Section .
Proposition . . For a set-valued operator $R: Q \rightrightarrows Q$ on a Hilbert space $Q$, suppose that for a linear operator $T := T_q: Q \to Q$, dependent on $q$ but not on $w$, and a cone $V(q|w) \subset Q$, dependent on both $q$ and $w$. Then otherwise.
Note that if the cone $V(q|w)$ is closed and convex (in particular, a closed subspace), then $V(q|w)^{\circ\circ} = V(q|w)$.
In Cambridge, T. Valkonen has been supported by the King Abdullah University of Science and Technology (KAUST) Award No. KUK-I --, and EPSRC grants Nr. EP/J / "Sparse & Higher-order Image Restoration" and Nr. EP/M X/ "Efficient computational tools for inverse imaging problems". While in Quito, T. Valkonen has moreover been supported by a Prometeo scholarship of the Senescyt (Ecuadorian Ministry of Science, Technology, Education, and Innovation).

[ ] V., Extension of primal-dual interior point methods to diff-convex problems on symmetric cones, Optimization ( ), –.