Novel higher order regularisation methods for image reconstruction

Konstantinos Papafitsoros
St Edmund's College, University of Cambridge
A thesis submitted for the degree of Doctor of Philosophy
June 2014

Abstract

In this thesis we study novel higher order total variation-based variational methods for digital image reconstruction. These methods are formulated in the context of Tikhonov regularisation. We focus on regularisation techniques in which the regulariser incorporates second order derivatives or a sophisticated combination of first and second order derivatives. The introduction of higher order derivatives in the regularisation process has been shown to be an advantage over the classical first order case, i.e., total variation regularisation, as classical artifacts such as the staircasing effect are significantly reduced or totally eliminated. Also in image inpainting the introduction of higher order derivatives in the regulariser turns out to be crucial to achieve interpolation across large gaps.

First, we introduce, analyse and implement a combined first and second order regularisation method with applications in image denoising, deblurring and inpainting. The method, numerically realised by the split Bregman algorithm, is computationally efficient and capable of giving results comparable with total generalised variation (TGV), a state-of-the-art higher order method. An additional experimental analysis is performed for image inpainting and an online demo is provided on the IPOL website (Image Processing Online).

We also compute and study properties of exact solutions of the one dimensional total generalised variation problem with L^2 data fitting term, for simple piecewise affine data functions, with or without jumps. This gives insight into how this type of regularisation behaves and unravels the role of the TGV parameters.

Finally, we introduce, study and analyse a novel non-local Hessian functional. We prove localisations of the non-local Hessian to the local analogue in several topologies and our analysis results in derivative-free characterisations of higher order Sobolev and BV spaces. An alternative formulation of a non-local Hessian functional is also introduced which is able to produce piecewise affine reconstructions in image denoising, outperforming TGV.

Keywords: Higher order total variation, functions of bounded Hessian, total generalised variation, denoising, deblurring, inpainting, staircasing effect, split Bregman, exact TGV solutions, non-local Hessian, characterisation of higher order Sobolev and BV spaces.

Acknowledgements

I would like to thank at this point all the people that made this journey towards the completion of this Ph.D. thesis a pleasant one and made the last four years truly memorable. Firstly, I would like to thank Carola Schönlieb for being an excellent supervisor from every perspective. Her continuous help and encouragement played a great role in my academic development. She was the one who introduced me to the field of mathematical imaging and showed me what a modern professional mathematician should be. The people I lived with in Cambridge played an important role during my Ph.D. A big thanks goes to Sara Merino for being the best flatmate and friend possible for the last four years. The same amount of gratitude goes as well to Nayia Constantinou and Luca Calatroni. I could not have asked for better flatmates and good friends. A stimulating environment in the office makes the working hours pleasant and never boring.
Bati Sengul and Spencer Hughes were an excellent pair of officemates and friends and I thank them a lot for that. I would like to thank Thalia Sotirchou for her love and support during the demanding last year of the Ph.D. Many thanks to all my friends in CCA and CMS, Marc Briant, Kolyan Ray (my gym-mate), Damon Civin, Ed Mottram, Vaggelis Papoutsellis, Julio Brau, Ioan Manolescu and also Panagiotis Kosteletos for all the nice moments we had together. Moreover, I would like to thank my collaborators Jan Lellmann, Kristian Bredies and Daniel Spector as well as Martin Benning and Tuomo Valkonen for useful discussions. I am grateful to the CCA directors Arieh Iserles and James Norris for giving me the opportunity to be part of it and to Emma Hacking for providing excellent administrative support. I also acknowledge the financial support provided by the Engineering and Physical Sciences Research Council, the Cambridge Centre for Analysis, the Department of Pure Mathematics and Mathematical Statistics, the Department of Applied Mathematics and Theoretical Physics and the King Abdullah University of Science and Technology. I would like to thank the thesis examiners Dr. Anders Hansen and Professor Antonin Chambolle for carefully reading the manuscript and for the fruitful discussion during the examination. Above all I would like to thank my mother Emmy, my sister Nassia and all my family, without whom I would not have achieved all this!

Statement of Originality

I hereby declare that my dissertation entitled "Novel higher order regularisation methods for image reconstruction" is not substantially the same as any that I have submitted for a degree or diploma or other qualification at any other university. I further declare that it contains nothing which is the outcome of work done in collaboration with others, except where specifically indicated in the text.

Chapter 1 is a review of the existing literature on variational methods for image reconstruction. It places the dissertation into context and summarises its results. It is a result of my own personal work.

Chapter 2 introduces some necessary mathematical preliminaries for the thesis. It consists of a review of some of the known results of measure theory, functions of bounded variation and convex analysis. It is not original material.

Chapter 3 is dedicated to the introduction, analysis and implementation of the TV–TV2 method for image reconstruction and it is novel material. It is a result of collaboration and it is contained in [PS14, PSS13]. The proofs as well as the implementation of the method were written by me, apart from Theorem 3.4.7, which was proved by my supervisor Carola Schönlieb, from whom I also greatly benefited through several discussions. The C code implementation for TV–TV2 inpainting that was used in the IPOL online demonstration was written by my collaborator Bati Sengul, University of Cambridge.

Chapter 4 contains an analysis of the one dimensional total generalised variation denoising problem. It is novel material and it has been done in collaboration with Kristian Bredies, University of Graz, Austria, and follows [PB13]. All the proofs were written by me apart from Proposition 4.4.3, while I greatly benefited from my two visits to Graz.

Chapter 5 studies two forms of non-local Hessian functionals following [LPSS]. It is novel material and it has been done in collaboration with Jan Lellmann, Carola Schönlieb, University of Cambridge, and Daniel Spector, Technion, Israel.
All the proofs are the result of my own personal work while I benefited from discussions with all the three aforementioned collaborators. The numer- ical implementation in Section 5.7 is due to my collaborator Jan Lellmann. Contents List of Notation 13 1 Introduction 19 1.1 Digital images, image processing and their mathematics . . . . . . . . . . . 19 1.2 The variational approach for image reconstruction . . . . . . . . . . . . . . 20 1.3 Total variation based methods . . . . . . . . . . . . . . . . . . . . . . . . . 24 1.4 Higher order variational models . . . . . . . . . . . . . . . . . . . . . . . . . 28 1.5 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 1.5.1 The TV–TV2 approach for image reconstruction . . . . . . . . . . . 33 1.5.2 Exact solutions of TGV minimisation in one dimension . . . . . . . 35 1.5.3 Non-local Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 1.6 Organisation of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2 Mathematical preliminaries 39 2.1 Notation on function spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.2 Radon measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.3 Functions of bounded variation . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.3.1 Definitions and properties . . . . . . . . . . . . . . . . . . . . . . . . 43 2.3.2 Good representatives . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 2.4 Lower semicontinuous envelopes . . . . . . . . . . . . . . . . . . . . . . . . . 49 2.5 Convex analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.5.1 Basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.5.2 Fenchel–Rockafellar duality . . . . . . . . . . . . . . . . . . . . . . . 52 3 The combined TV–TV2 approach for image reconstruction 55 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.1.1 Organisation of the chapter . . . . . . . . . . . . . . . . . . . . . . . 57 3.2 Convex functions of measures . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.3 The space of functions of bounded Hessian BV2(Ω) . . . . . . . . . . . . . . 60 3.3.1 Definition and basic properties . . . . . . . . . . . . . . . . . . . . . 60 3.3.2 Weak∗ and strict convergence in BV2(Ω) . . . . . . . . . . . . . . . . 61 9 CONTENTS 3.4 Well-posedness of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.4.1 Existence and uniqueness . . . . . . . . . . . . . . . . . . . . . . . . 63 3.4.2 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 3.5 The numerical implementation . . . . . . . . . . . . . . . . . . . . . . . . . 73 3.5.1 Discretisation of the model . . . . . . . . . . . . . . . . . . . . . . . 74 3.5.2 Bregman iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 3.5.3 Numerical solution via the split Bregman algorithm . . . . . . . . . 79 3.6 Applications to image denoising and deblurring . . . . . . . . . . . . . . . . 83 3.7 Applications to image inpainting and online demo in IPOL . . . . . . . . . 93 3.7.1 Motivation and IPOL’s philosophy . . . . . . . . . . . . . . . . . . . 93 3.7.2 Split Bregman for TV–TV2 inpainting . . . . . . . . . . . . . . . . . 95 3.7.3 The colour image case . . . . . . . . . . . . . . . . . . . . . . . . . . 98 3.7.4 Stopping criteria and the selection of λ’s . . . . . . . . . . . . . . . . 100 3.7.5 Inpainting examples . . . . . . . . . . . . 
. . . . . . . . . . . . . . . 101 4 Exact solutions of the one dimensional TGV regularisation problem 107 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 4.1.1 Organisation of the chapter . . . . . . . . . . . . . . . . . . . . . . . 108 4.2 Basic properties of TGV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 4.3 Formulation of the problem and optimality conditions . . . . . . . . . . . . 110 4.4 Properties of solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 4.4.1 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 4.4.2 L2–linear regressions . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 4.4.3 Even and odd functions . . . . . . . . . . . . . . . . . . . . . . . . . 123 4.5 A note on the relationship of TV and TGV in dimension two . . . . . . . . 126 4.6 Computation of exact solutions . . . . . . . . . . . . . . . . . . . . . . . . . 130 4.6.1 Piecewise constant function with a single jump . . . . . . . . . . . . 130 4.6.2 Piecewise affine function with a single jump . . . . . . . . . . . . . . 145 4.6.3 Hat function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 4.6.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 153 5 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications 157 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 5.1.1 Organisation of the chapter . . . . . . . . . . . . . . . . . . . . . . . 160 5.2 Localisation – The smooth case . . . . . . . . . . . . . . . . . . . . . . . . . 160 5.3 Localisation – The W 2,p(Rd) case . . . . . . . . . . . . . . . . . . . . . . . . 165 5.4 Second order non-local integration by parts . . . . . . . . . . . . . . . . . . 169 5.5 Localisation – The BV2(Rd) case . . . . . . . . . . . . . . . . . . . . . . . . 171 5.6 Non-local characterisation of W 2,p(Rd) and BV2(Rd) . . . . . . . . . . . . . 173 10 CONTENTS 5.7 An alternative non-local Hessian approach and an application to image denoising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 6 Conclusions 181 A 183 A.1 Exclusion of the cases N-0-P2-J, N-0-P1-C, N-0-P2-C . . . . . . . . . . . . 183 A.2 Some useful Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 Bibliography 191 11 12 List of Notation Against each entry is the page at which the notation is introduced. Sets and Measures: R+ 41 The set of positive real numbers. R 49 The set of extended real numbers, R = R ∪ {+∞}. Sd×d 30 The set of d× d symmetric matrices. Ω 42 An open subset of Rd. B(X) 39 The Borel σ-algebra of X. |µ| 41 The total variation measure of the finite Radon measure µ. M(X) 41 The set of real valued finite Radon measures on X. Endowed with the norm ‖µ‖M(X) = |µ|(X). M(X,R`) 41 The set of R`–valued finite Radon measures on X. Endowed with the norm ‖µ‖M(X) = |µ|(X). M+(X) 41 The set of finite positive Radon measures on X. M+loc(X) 41 The set of positive Radon measures on X. ‖T‖M 42 The Radon norm of the distribution T on a set Ω: ‖T‖M = sup{〈T, v〉 : v ∈ C∞c (Ω), ‖v‖∞ ≤ 1}. If T is a finite Radon measure µ then ‖µ‖M = |µ|(Ω). L 39 The Lebesgue measure on R. Ld 39 The Lebesgue measure on Rd. H 26 The one dimensional Hausdorff measure. Hd 46 The d–dimensional Hausdorff measure. µ ν 42 The Radon–Nikody´m density of µ with respect to ν. sgn(µ) 42 Alternative notation for µ|µ| . 
µα 42 The absolutely continuous part of µ with respect to Lebesgue measure, i.e., µα = µLd . µs 42 The singular part of µ with respect to Lebesgue measure. Spaces of continuous and differentiable functions: 13 Cc(X) 40 The space of real valued continuous functions of compact support in X. Endowed with the supremum norm ‖u‖∞ = supx∈X |u(x)|. Cc(X,R`) 40 The space of R`–valued continuous functions of compact support in X. Endowed with the supremum norm ‖u‖∞ = supx∈X |u(x)|. C0(X) 40 The completion of Cc(X) under the supremum norm. C0(X,R`) 40 The completion of Cc(X,R`) under the supremum norm. Ckc (X) 40 The space of real valued, k–times continuously differentiable functions with compact support in X. Ckc (X,R`) 40 The space of R`–valued, k–times continuously differentiable functions with compact support in X. Ckc (X,Sd×d) 30 The space of Sd×d–valued, k–times continuously differen- tiable functions with compact support in X. Ck(X) 40 The space of k–times differentiable functions, with continu- ous derivatives up to the boundary of X. C∞c (X) 40 The space of real valued, infinitely many times differentiable functions with compact support in X. C∞c (X,R`) 40 The space of R`–valued, infinitely many times differentiable functions with compact support in X. Spaces of Lebesgue integrable functions: Lp(X,R`;µ) 39 Space of R`–valued, µ–measurable functions such that´ X |u|pdµ < ∞, where 1 ≤ p < ∞ and µ is a positive Radon measure. The corresponding norm is ‖u‖Lp(X,R`;µ) =(´ X |u|pdµ )1/p . Lp(X,R`) 39 Short version of Lp(X,R`;Ld). Lp(X) 39 Short version of Lp(X,R;Ld). L∞(Ω,R`;µ) 39 Space of R`–valued, µ–essentially bounded measurable func- tions, where µ is a positive Radon measure. The correspond- ing norm is ‖u‖L∞(Ω,R`;µ) = ess sup x∈Ω |u(x)|. L∞(Ω) 39 Short version of L∞(Ω,R;Ld), i.e., the space of real valued, essentially bounded Lebesgue measurable functions. Sobolev spaces: Du 24 The distributional derivative of the function u. 14 W k,p(Ω) 40 Sobolev space of functions u ∈ Lp(Ω), such that the dis- tributional derivatives up to order k are also Lp functions (weak derivatives). The corresponding norm is ‖u‖Wk,p(Ω) =(∑ |a|≤k ´ Ω |Dau|pdx )1/p , where Da denotes the a–th distri- butional derivative of u and |a| = a1 + · · ·+ ad is the order of the multiindex a = (a1, . . . , ad). Hk(Ω) 40 The Sobolev space W k,2(Ω). Hk0 (Ω) 40 The completion of C∞c (Ω) under the ‖ · ‖Hk(Ω) norm. Functions of bounded variation, bounded Hessian and bounded deformation: BV(Ω) 44 The space of functions of bounded variation on Ω, i.e., all the functions u ∈ L1(Ω), such that Du is a Rd–valued finite Radon measure. The corresponding norm is ‖u‖BV(Ω) = ‖u‖L1(Ω) + |Du|(Ω). BV(Ω,R`) 44 The space of R`–valued functions of bounded variation on Ω, i.e., all the functions u ∈ L1(Ω,R`), such that Du is a R`×d– valued finite Radon measure. The corresponding norm is ‖u‖BV(Ω,R`) = ‖u‖L1(Ω,R`) + |Du|(Ω). TV(u) 44 The total variation of u ∈ L1(Ω,R`), equal to |Du|(Ω) for a function u ∈ BV(Ω,R`). D2u 60 The second order distributional derivative of u. BV2(Ω) 29 The space of functions of bounded Hessian on Ω, i.e., all the functions u ∈ W 1,1(Ω) such that ∇u ∈ BV(Ω,Rd). The corresponding norm is ‖u‖BV2(Ω) = ‖u‖BV(Ω) + |D2u|(Ω). TV2(u) 29 The second order total variation of u, equal to |D2u|(Ω) for a function u ∈ BV2(Ω). Eu 30 The distributional symmetrised gradient of u. ICTVβ,α(u) 29 The TV–TV 2 infimal convolution of u with weights α and β, i.e., ICTVβ,α(u) = infv∈BV2(Ω) αTV(u− v) + β TV2(v). 
TGV2β,α(u) 30 The second order total generalised variation of u with weights α, β > 0. BD(Ω) 30 The space of functions of bounded deformation on Ω, i.e., all the functions u ∈ L1(Ω,Rd) such that Eu is a Rd×d–valued finite Radon measure. Dαu or ∇u 44 The absolutely continuous part of Du with respect to Lebesgue measure, i.e., shorter version of (Du)α = DuLd . Dsu 44 The singular part of Du with respect to Lebesgue measure, i.e., shorter version of (Du)s. u′ 44 Short version of ∇u, when u is an one dimensional BV func- tion. pV(u, (a, b)) 47 The pointwise variation of u in (a, b) ⊆ R. eV(u, (a, b)) 47 The essential variation of u in (a, b) ⊆ R. 15 Ju 48 The jump set of u. u(x−), u(x+) 48 Left and right limits of u at the point x. ul, ur 48 Left and right continuous representatives of u. uup, ulow 48 Good representatives of u, defined as u up(x) = max{ul(x), ur(x)} and ulow(x) = min{ul(x), ur(x)}. uΩ 46 The trace of u in ∂Ω for a function u ∈ BV(Ω). Miscellaneous notation: uΩ 46 The mean value of u in Ω, i.e., uΩ = 1 Ld(Ω) ´ Ω u dx. XA 21 The characteristic function of the set A, i.e., XA(x) = { 1 if x ∈ A, 0 if x /∈ A. IA 51 The indicator function of the set A, i.e., IA(x) = { 0 if x ∈ A, ∞ if x /∈ A. ϕ∞ 59 The recession function of ϕ : R` → R, defined for every x ∈ R`: ϕ∞(x) := limt→∞ ϕ(tx)t . scτF 50 The lower semicontinuous envelope (relaxed functional) of F : X → R, with respect to the topology τ . X∗ 50 The analytic dual of the space X. Its elements are denoted by x∗, y∗ . . .. F ∗ 51 The convex conjugate of a function F . If F : X → R, then F ∗ : X∗ → R is defined as F ∗(x∗) = supx∈X〈x∗, x〉 − F (x). Λ∗ 52 The adjoint operator Λ∗ : Y ∗ → X∗ of the bounded linear functional Λ : X → Y , where X, Y are Banach spaces. ∂F 51 The subdifferential of F . Sgn(µ) 114 The set valued sign of the finite Radon measure µ on Ω, defined as: Sgn(µ) = {v ∈ L∞(Ω) ∩ L∞(Ω, |µ|) : ‖v‖L∞(Ω) ≤ 1, v = sgn(µ), |µ| − a.e.}. f 120 The L1–linear regression of f , f := argmin v affine ‖f − v‖L1(Ω). f? 121 The L2–linear regression of f , f? := argmin v affine ‖f − v‖2L2(Ω). Gnu 37 Non-local gradient of u (explicit formulation). Hnu 37 Non-local Hessian of u (explicit formulation). Gσxu(x) 37 Non-local gradient of u at x with weighting function σx (im- plicit formulation). Hσxu(x) 37 Non-local Hessian of u at x with weighting function σx (im- plicit formulation). d 2u(x, y) 37 Second order finite difference scheme: d 2u(x, y) = u(y)− 2u(x) + u(x+ (x− y)). 16 Assumed notation: N - The set of natural numbers. R, Rd - The set of real numbers and its d–Cartesian product. In both cases the Euclidean norm is denoted by | · |. R`×d - The set of `×d real matrices, equipped with the Frobenious norm denoted by | · |. D b Ω - D ⊆ Ω with D being compact. µbA - The restriction of the measure µ to the set A. δx - The Dirac atomic measure concentrated on {x}. B(x, δ) - The open ball in Rd with center x and radius δ. a⊗ b - The `×d matrix with (ij)–th entry aibj , for a ∈ R`, b ∈ Rd. divu - Divergence of u : Ω→ Rd, i.e., divu = ∑di=1 ∂ui∂xi . div2u - Second order divergence of u : Ω→ Rd×d, i.e., ∑di,j=1 ∂2uij∂xi∂xj . ∆u - The Laplacian of u : Ω→ R, i.e., ∆u = ∑di=1 ∂2ui∂x2i . 17 18 Chapter 1 Introduction 1.1 Digital images, image processing and their mathematics This is “the earliest surviving camera photograph” [Hir08]: Figure 1.1: Joseph Nice´phore Nie´pce: Point de vue du Gras, 1826 or 1827. Source: http://en.wikipedia.org/wiki/History_of_photography. 
Even though undoubtedly a photograph like the above represents a real breakthrough for humanity, it would be considered an image of very low quality in the modern world. High quality digital cameras dominate the photography world of our time and people always look for the technically and visually perfect photo. Joseph Nicéphore Niépce had no hope of improving his photograph, and that was not only because he was lacking the proper – by today's standards – camera equipment. Even a few decades ago, when film cameras were the photographer's only choice, the post-processing techniques were only accessible to the selected few that possessed the secrets of the darkroom. Yet, those techniques were fairly limited, allowing only for some simple colour, brightness and contrast adjustment. However, the development of computers and their unavoidable use in our everyday life arrived soon and all the information started to be encoded in bits and bytes. Photographs were no exception and digital photography was born. An image is no longer seen as a chemical mark left on a film but as a collection of numbers. Even though admittedly this sounds less romantic, it provides a convenient framework for us mathematicians, to see digital images as functions: A typical digital image of resolution N × M, that is to say N·M pixels, can be modelled as a function u = (u_red, u_green, u_blue),

u : {1, . . . , N} × {1, . . . , M} → {0, . . . , 255}^3,    (1.1)

where each component measures the intensity of the corresponding channel (RGB images). This representation now gives a new meaning to image processing. For example, the sentence "Given a digital image which has some undesirable characteristics, like noise, blur etc., we would like to obtain a visually and aesthetically better version of it" can be interpreted as "Given a function that has certain characteristics and some local or global regularity properties, we would like to obtain another function, close to the initial one in some sense, with potentially different regularity properties and with some desirable features". Mathematicians use the term operator to denote an object that takes a function as an input and gives another function as an output. Thus, past complicated processing procedures in the mini lab, like contrast and colour adjustment, are just simple linear operations in the new language. However, more complicated image processing tasks like noise removal require more sophisticated "operators" or, in other words, mathematical methods and algorithms. If today we can perform these tasks it is not only due to the introduction of computers but mainly because of the development of an advanced mathematical technology both at a theoretical and an applied level. Therefore, there is an interplay between the increasingly challenging image processing problems and the progress of novel mathematical theories. While a new mathematical method is most of the time introduced to tackle a specific image processing task, it often turns out that this method leads to fruitful mathematical theories that are interesting in their own right. A mathematician working in mathematical imaging likes to prove things about a method, not only in order to provide rigorously justified insights about the behaviour of the method but also because he/she is intrigued by the mathematical beauty of the theory behind it. We do not claim that the present thesis is solely about image processing.
Even though we are motivated by complicated image processing problems and we introduce novel methods, most of the results can find their own autonomous place in the mathematical literature (especially Chapter 5).

1.2 The variational approach for image reconstruction

We have already mentioned that a digital image can be considered as a vector valued function on an N × M grid. However, in order to have a convenient and familiar mathematical framework, the analysis is usually done in the continuous setting (returning to the discrete setting for the numerical implementation). Thus, an RGB colour image u is regarded as a function

u : Ω → R^3,    (1.2)

where Ω ⊆ R^2 is typically a rectangular domain. Analogously, greyscale images are modelled as real valued functions on Ω, i.e., u : Ω → R. Suppose now that we have some image acquisition device. In an ideal world that device should provide us with a perfect image u. Of course, the word perfect is subjective but we ignore any artistic views on the matter. Thus for us, features like noise, blur, artificial artifacts are always undesirable. Quite often, due to imperfections and limitations in the acquisition process we obtain a corrupted version f of the image u. Corruptions could be noise or blur for instance. A typical model in imaging assumes that u has been transformed through a continuous and linear operator T with the additional presence of random noise. In particular we assume that the two images fulfil the model

f = Tu + η.    (1.3)

Here η denotes a random noise component that follows a certain distribution. Typically, noise is modelled by Gaussian, Poisson, uniform, "salt and pepper" (impulse) distributions or even a combination of these. The operator T is called the forward operator of the problem. Examples are the identity operator (in that case the only undesirable feature is noise), blurring operators (in which case Tu denotes the convolution with a blurring kernel) and pointwise multiplication with the characteristic function of Ω\D, X_{Ω\D}, where D ⊆ Ω is a subdomain of Ω (inpainting domain) where no data are known at all and in which information from the known part should be interpolated (image inpainting). In Figure 1.2 we give some examples of corrupted images for different forward operators T and noise distributions. Depending on the nature of the operator T the reconstruction process is given a specific name. The classical imaging tasks considered in this thesis are the following:

(i) Image denoising: T is the identity operator, i.e., only noise is present, see Figures 1.2(b) and 1.2(d).

(ii) Image deblurring: T represents a convolution (blurring) operator and noise might or might not be present, see Figure 1.2(c).

(iii) Image inpainting: In this case Tu = X_{Ω\D}u, where D ⊆ Ω is a domain of missing information and noise might or might not be present, see Figures 1.2(e) and 1.2(f).

Figure 1.2: Examples of corrupted images for different operators T and noise distributions. (a) The original image u. (b) Image f corrupted with Gaussian noise: f = u + η, where η follows a Gaussian distribution. (c) Image f corrupted by Gaussian blur and noise: f = Tu + η, where T denotes a convolution with a Gaussian kernel and η follows a Gaussian distribution. (d) Image f corrupted by impulse noise: f = u + η, where η follows a "salt and pepper" distribution (only a percentage of pixels are corrupted). (e) Image f with missing information (white letters): f = X_{Ω\D}u, where D denotes the area covered by the letters. (f) Image f with missing information (white letters) and corrupted by noise: f = X_{Ω\D}u + η, where D denotes the area covered by the letters and η follows a Gaussian distribution.

In order to reconstruct the initial image u one has to invert the operator T. However, this is not always possible. For example, in the case of inpainting the operator T is not invertible at all, while in the case of deblurring, even though the operator could be
invertible in theory, its inversion involves an ill-conditioned problem even in the absence of noise. The presence of noise further complicates the problem. In this case, it is a common procedure to add some a priori information to the model, which in general is given by a certain regularity assumption on the image u. This is the so-called regularisation process. In its most general setting one tries to recover a reconstructed version of f as a minimiser of a functional which has the following form:

J(u) = Φ(Tu, f) + Ψ(u).    (1.4)

The function Φ is called the data fidelity or data fitting term and measures the distance between the data f and the reconstruction u after the forward operator has acted on it. Small values of this term ensure that the transformed image Tu is close to the data f in a suitable sense. The function Ψ is called the regulariser or regularising term and it imposes extra regularity on u. It is expected that small values of Ψ will lead, up to a certain extent, to the elimination of the undesirable features in the corrupted data f. The two terms are typically balanced by one or more positive parameters within Ψ. The choice of regulariser also decides upon the space in which the minimisation of J is well-posed. It is normally chosen to be a Banach space X on which Ψ takes finite values.

As one can speculate, given the problem (1.3), a good or bad choice of the data fidelity term can lead to a reconstruction of high or poor quality respectively. It is well known in the mathematical imaging community that the correct choice of the fidelity term depends on the noise distribution. For example, if the noise follows a Gaussian distribution, the appropriate choice for Φ is the squared L^2 norm of the difference Tu − f, i.e., Φ(Tu, f) = ½ ∫_Ω (Tu − f)^2 dx. In the case of impulse noise, it has been shown in various contexts [Nik04, Nik02, CE05, DAG09, WGO07] that the L^1 norm of the difference Tu − f, ∫_Ω |Tu − f| dx, leads to more efficient restorations. In the case of Poisson noise, the Kullback–Leibler divergence of Tu and f, ∫_Ω (Tu − f log Tu) dx, is the most suitable choice [LCA07, BSW+13, SBJM09], while the supremum norm ‖Tu − f‖_{L^∞(Ω)} should be preferred in the case of uniform noise [Cla12]. Table 1.1 provides a summary of the above. Noise of mixed distributions can also be considered; in that case a combination of the corresponding fidelities is used [NWC13, DlRS12, CDlRS14, HL13].

Type of noise | Φ(Tu, f)
Gaussian      | ½ ∫_Ω (Tu − f)^2 dx
Impulse       | ∫_Ω |Tu − f| dx
Poisson       | ∫_Ω (Tu − f log Tu) dx
Uniform       | ‖Tu − f‖_{L^∞(Ω)}

Table 1.1: The correct choices of data fidelity terms for different noise distributions.
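To make the entries of Table 1.1 concrete, the following is a minimal numpy sketch of their discrete counterparts, where sums over pixels stand in for the integrals over Ω; the function name and the small stabilising constant eps are illustrative assumptions, not part of the thesis.

```python
import numpy as np

def fidelity(Tu, f, noise="gaussian", eps=1e-12):
    """Discrete counterparts of the data fidelity terms of Table 1.1.
    Tu and f are arrays of the same shape; pixel sums replace integrals."""
    r = Tu - f
    if noise == "gaussian":      # (1/2) * sum (Tu - f)^2
        return 0.5 * np.sum(r ** 2)
    if noise == "impulse":       # sum |Tu - f|
        return np.sum(np.abs(r))
    if noise == "poisson":       # sum (Tu - f * log Tu); eps avoids log(0)
        return np.sum(Tu - f * np.log(Tu + eps))
    if noise == "uniform":       # sup norm of the residual
        return np.max(np.abs(r))
    raise ValueError("unknown noise model: " + noise)
```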
While the statistics of the noise give rise to a particular choice of the fidelity term, the choice of the regulariser is determined by our a priori knowledge of the properties of the desired image u. Different regularisers impose different regularity and qualitative characteristics on the image u. Squared Hilbert space norms have a long tradition in inverse problems. The most prominent example is Tikhonov regularisation [Tik63, Whi23] with

Ψ(u) = α ∫_Ω |∇u|^2 dx,    (1.5)

where α > 0 and ∇u is a weak gradient of u, which – for Gaussian noise – leads to the following minimisation of (1.4):

min_{u∈H^1(Ω)} ½ ∫_Ω (Tu − f)^2 dx + α ∫_Ω |∇u|^2 dx.    (1.6)

The well-posedness of (1.6) follows from a standard application of the direct method of calculus of variations, exploiting the fact that H^1(Ω) is reflexive and thus bounded sets in that space are weakly pre-compact. In the denoising case, i.e., when T is the identity, the gradient flow of the corresponding Euler–Lagrange equation of (1.6) reads u_t = α∆u − u + f. The result of such a regularisation technique is a linearly, i.e., isotropically, smoothed image u for which the smoothing strength does not depend on f. Hence, while eliminating the disruptive noise in the given data, also desirable structures like edges in the reconstructed image are blurred, see Figure 1.3. This observation gave way to a new class of non-smooth norm regularisers, which aim to eliminate noise and smooth the image in homogeneous areas, while preserving the relevant structures such as object boundaries and edges.

Figure 1.3: Illustration of the isotropically smoothed reconstruction using Tikhonov regularisation, i.e., when Ψ(u) = α‖∇u‖^2_{L^2(Ω)}. (a) The original image u. (b) Image f corrupted with Gaussian noise. (c) Denoised image with Tikhonov regularisation.

1.3 Total variation based methods

In their pioneering work, Rudin, Osher and Fatemi [ROF92] proposed the use of the total variation of u, TV(u), as regulariser. Recall that for a function u ∈ L^1(Ω), the total variation of u is defined as

TV(u) = sup { ∫_Ω u div v dx : v ∈ C^1_c(Ω, R^2), ‖v‖_∞ ≤ 1 },    (1.7)

where C^1_c(Ω, R^2) is the space of R^2–valued continuously differentiable functions with compact support in Ω. Functions that have finite total variation are exactly those whose distributional derivative Du can be represented by a finite Radon measure (space of functions of bounded variation, BV(Ω)), see Chapter 2 for a complete account on this subject. Recall also that if u ∈ W^{1,1}(Ω), one can easily show that TV(u) = ∫_Ω |∇u| dx.
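In the discrete setting used for numerical implementations, TV(u) = ∫_Ω |∇u| dx is typically replaced by a sum of finite-difference gradient magnitudes. The following is a minimal numpy sketch of one standard (isotropic, forward-difference) discretisation; the particular discretisation and boundary handling are illustrative assumptions rather than the scheme analysed later in the thesis.

```python
import numpy as np

def total_variation(u):
    """Isotropic discrete total variation of a 2-D array u:
    sum over pixels of |grad u|, with forward differences and
    zero differences at the last row/column (Neumann-type boundary)."""
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[:, :-1] = u[:, 1:] - u[:, :-1]   # horizontal forward differences
    dy[:-1, :] = u[1:, :] - u[:-1, :]   # vertical forward differences
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))
```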
Since its introduction in image processing in 1992, total variation has found numer- ous applications in the field, as the edge preservation ability made it quite an appealing tool. These applications cover a wide area, from image denoising [ROF92, CL97, Nik04, LCA07, BC98, CKS01, DHKS10, Pas12a], image deblurring [DS96, CW98, WYZ07, BT09, Pas12b], image inpainting [SC02, CS05, Pas12c] and image zooming [ACHR06] to JPEG decompression [ADF05, BH12b] MRI and PET reconstruction [BGH+14, BMPS14, Mu¨l13] to name a few. Properties of total variation regularisation have been extensively studied at the theo- retical level as well. Acar and Vogel [AV94] studied the well-posedness of problem (1.8). Meyer [Mey01] showed that in the L2–TV denoising problem with f being the characteris- tic function of a disk, the solution is a characteristic function of the same disk with a loss of contrast, whose amount depends on the parameter α, see Figure 1.5. Alter, Caselles and Chambolle [ACC05] extended these results for characteristic functions of certain con- vex bounded sets with a bound on the mean curvature of their boundaries (calibrable 25 Introduction ∼ α f u Figure 1.5: Illustration of Meyer’s result: Total variation regularisation on a characteristic function of a disk in R2 results only to a loss of contrast which is proportional to the parameter α. sets). For the same problem, Strong and Chan [SC03] computed exact solutions for 1D and rotationally symmetric 2D data and similar results were obtained by Ring [Rin00]. In [Gra07] Grasmair proved the equivalence of one dimensional TV regularisation and the taut–string algorithm and an extension to higher dimensions was done by Hinterberger et al. [HHK+03]. Properties of the L1–TV model were studied in [Nik02, DAG09, CE05]. Finally, Caselles, Chambolle and Novaga [CCN07] showed that the discontinuity set of the solution u of the L2–TV denoising problem in Ω ⊆ Rd, is contained, up to a Hd−1–null set, in the discontinuity set of the data function f . Here Hd−1 denotes the (d−1)–dimensional Hausdorff measure in Rd. Further regularity properties of the solution u of total variation denoising were proved by Allard [All08a, All08b, All09]. As it is observed in all the aforementioned applications and rigorously justified analyti- cally, total variation regularisation promotes piecewise constant reconstructions leading to images with blocky like structures. This generally becomes apparent in images that do not only consist of flat regions and jumps but also possess slanted regions, i.e., piecewise affine parts, see Figure 1.6. This is called the staircasing effect. In one dimension, this roughly (a) Image corrupted with Gaus- sian noise (b) Total variation denoised im- age with blocky-like artifacts. (c) Detail of total variation de- noised image. Figure 1.6: Illustration of the staircasing effect in total variation denoising in two dimen- sions. 26 1.3. Total variation based methods means that the total variation regularisation of a noisy signal f , is a staircase u, whose L2 norm is close to f , see Figure 1.7. In fact it can be rigorously proved that if in an open in- Figure 1.7: The staircasing effect in total variation denoising in one dimension. terval I we have u > f (or u < f), then u is constant on I [CCN07, Rin00, BKV13]. Since u and f are BV functions defined a.e., the inequalities u > f and u < f are interpreted through inequalities of certain good representatives of them, see Chapter 2 for the relevant definitions. 
Analytical studies of the staircasing effect are also done in [DMFLM09, DS96]. Since it leads to unnatural-looking images, the staircasing effect is considered an undesirable artifact and many methods have been proposed in order to reduce or eliminate it, see Section 1.4 for a detailed discussion.

The ability of total variation regularisation to preserve and promote discontinuities in the image u has also been exploited in image inpainting [SC02, CS05, Pas12c]. There, the corresponding minimisation problem reads

min_{u∈BV(Ω)} ½ ∫_{Ω\D} (u − f)^2 dx + α TV(u),    (1.9)

where D is the inpainting domain, i.e., the missing part of the image. A desirable feature in image inpainting is the interpolation of the image along large gaps (connectivity principle). Unfortunately, it turns out that total variation is able to provide interpolation results with sharp edges only if the gap is small enough [CKS02, Sch09]. In fact, total variation minimisation can be interpreted as a penalisation of the length of the level lines, interpolating them via the shortest path. In Figure 1.8 for example, we see that interpolation of a broken stripe is achieved only if the gap length is smaller than the width of the stripe, while Tikhonov regularisation (harmonic inpainting) leads to blurry reconstructions as expected, with no connection occurring either. Total variation achieves perfect sharp reconstruction of the stripe for small gaps but fails completely for large gaps. Also, following its general characteristic, total variation promotes piecewise constant reconstructions inside the inpainting domain, resulting again in images with blocky, unnatural-looking features, see Figure 1.9. In order to overcome these issues, several inpainting methods based on higher order derivatives have been suggested, see Section 1.4.

Figure 1.8: Illustration of the dependence of the connection ability of total variation inpainting on the gap size and comparison with harmonic inpainting (Ψ(u) = α‖∇u‖^2_{L^2(Ω)}). The inpainting domain D is denoted with grey colour. (a) Small gap. (b) TV inpainting. (c) Harmonic inpainting. (d) Large gap. (e) TV inpainting. (f) Harmonic inpainting.

Figure 1.9: Total variation inpainting promotes piecewise constant reconstructions inside the inpainting domain, which is unnatural for real world images. (a) Destroyed image. (b) TV inpainted image. (c) Detail of TV inpainted image.

1.4 Higher order variational models

In recent years, higher order versions of non-smooth image reconstruction methods have been considered in order to reduce the staircasing effect and hence to improve the quality of the restored image, as well as to fulfil the connectivity principle in inpainting. Some incorporate a combination of first and higher order derivatives – mainly second order – and some others are purely higher order. Intuitively, by minimising second order derivative-based energies, one tries to recover a piecewise affine version of the data in contrast to a piecewise constant version obtained by total variation regularisation. This is expected, roughly because, due to the introduction of the second order derivatives in the regulariser, its kernel now contains affine functions and not only constants.

Already in the paper of Chambolle and Lions [CL97], the authors proposed a higher order method for denoising by means of an infimal convolution of two convex regularisers.
Here, a noisy image is split into u = u_1 + u_2 by solving

min_{u_1∈BV(Ω), u_2∈BV^2(Ω)} ½ ∫_Ω (u_1 + u_2 − f)^2 dx + α TV(u_1) + β TV^2(u_2),    (1.10)

where TV^2(u_2) := TV(∇u_2) and BV^2(Ω) = {u ∈ W^{1,1}(Ω) : ∇u ∈ BV(Ω)} is the space of functions of bounded Hessian [Dem85]. The idea is to decompose the image u into a piecewise constant part u_1 and a piecewise affine part u_2. An alternative formulation for the minimisation problem (1.10) is

min_{u∈BV(Ω)} ½ ∫_Ω (u − f)^2 dx + ICTV_{β,α}(u),    (1.11)

with

ICTV_{β,α}(u) := min_{v∈BV^2(Ω)} α TV(u − v) + β TV^2(v) = min_{w∈BV(Ω,R^2) ∩ R(∇,R^2)} α‖Du − w‖_M + β‖Dw‖_M.    (1.12)

Here BV(Ω,R^2) is the space of R^2–valued functions of bounded variation, R(∇,R^2) = {w ∈ L^1(Ω,R^2) : ∃ v ∈ W^{1,1}(Ω) such that w = ∇v} is the range of the operator ∇ and ‖T‖_M is the Radon norm of the distribution T in Ω, ‖T‖_M = sup{〈T, v〉 : v ∈ C^∞_c(Ω), ‖v‖_∞ ≤ 1}. Note that for a finite Radon measure µ on Ω, we have ‖µ‖_M = |µ|(Ω). Again, the well-posedness of (1.10) and hence (1.11) is shown using the direct method of calculus of variations. Note that while u_1 + u_2 is unique in (1.10), one can always add a constant to u_1 and subtract the same constant from u_2, thus constructing another solution. In that sense, u_1 and u_2 are not unique but their sum is.

Another attempt to combine first and second order regularisation originates from Chan, Marquina, and Mulet [CMM01], who considered total variation minimisation together with weighted versions of the Laplacian. More precisely, they considered a regularising term of the form

α ∫_Ω |∇u| dx + β ∫_Ω ϕ(|∇u|)(∆u)^2 dx,    (1.13)

where ϕ must be a function with certain growth conditions at infinity in order to allow jumps. The well-posedness of (1.13) in one dimension was rigorously analysed by Dal Maso et al. in [DMFLM09]. In [CEP07] the term ‖∆u_2‖^2_{L^2(Ω)} was used instead of TV^2(u_2) in (1.10), also in combination with the use of the H^{−1} norm in the fidelity term. Regularising with second order derivatives only, taking Ψ(u) = α TV^2(u), has also been considered by Lysaker et al. [LLT03], Scherzer [Sch98], Hinterberger and Scherzer [HS06a], Bergounioux and Piffet [BP10] and Lai et al. [LTC13]. In Lefkimmiatis et al. [LBU12], the use of the spectral norm of the Hessian matrix was proposed. Further, Pöschl and Scherzer [PS08] studied minimisers of functionals which are regularised by the total variation of the k-th derivative. Finally, a combination of total variation with a fourth order PDE was considered by Lysaker and Tai in [LT06], while a higher order nonlinear diffusion filtering denoising technique was studied by Didas et al. in [DWB09].

All the above methods manage to reduce the staircasing effect to some degree but not to eliminate it completely. For example, the decomposition obtained by the infimal convolution is not the desirable one, since the effect of the TV^2 term is not strong enough to make the reconstruction piecewise smooth. On the other hand, regularising only with TV^2 introduces a fair amount of blur in the image. Moreover, some of the problems are difficult to implement and optimise – problem (1.13) is not even convex – while others are only tuned for image denoising, lacking the wide applicability of total variation regularisation.

Total Generalised Variation

A high quality regulariser that eliminates the common staircasing artifacts is the total generalised variation (TGV) introduced recently by Bredies, Kunisch and Pock [BKP10].
The second order total generalised variation of a function u ∈ L^1(Ω) with positive parameters α and β is defined as

TGV^2_{β,α}(u) = sup { ∫_Ω u div^2 v dx : v ∈ C^2_c(Ω, S^{2×2}), ‖v‖_∞ ≤ β, ‖div v‖_∞ ≤ α },    (1.14)

where C^2_c(Ω, S^{2×2}) is the space of S^{2×2}–valued twice continuously differentiable functions with compact support in Ω, with S^{2×2} being the space of 2 × 2 symmetric matrices. Definition (1.14) is called the supremum or the predual definition of TGV. An equivalent formulation, the minimum or primal definition, was shown in [BV11]:

TGV^2_{β,α}(u) = min_{w∈BD(Ω)} α‖Du − w‖_M + β‖Ew‖_M,    (1.15)

where Ew is the distributional symmetrised gradient of w and BD(Ω) is the space of functions of bounded deformation, i.e., the functions whose symmetrised distributional gradient can be represented by a finite Radon measure, see [TS80, Tem85]. One should note the difference between ICTV_{β,α} and TGV^2_{β,α} by observing the two definitions (1.12) and (1.15). While the minimisation in the ICTV_{β,α} definition (1.12) is restricted to functions that are gradients, the minimisation in (1.15) is done over the wider space BD(Ω), leading eventually to a better decomposition into piecewise constant and piecewise affine parts [Mül13]. As a result, total generalised variation-based regularisation, i.e., minimisation of (1.4) with Ψ(u) = TGV^2_{β,α}(u), has the ability to adapt to the regularity of the data, and images restored with this method are typically piecewise smooth, that is to say, not only the discontinuities but also the affine structures are preserved, see Figure 1.10. Let us note here that a modified infimal convolution approach that has been recently proposed in the discrete setting by Setzer et al. in [SST11] gives results very similar to TGV and in fact in some of its variants the two methods are equivalent.

Figure 1.10: Illustration of the ability of total generalised variation to preserve edges and affine structures at the same time, thus eliminating the staircasing artifacts. (a) Noisy image. (b) Total variation denoising. (c) Total generalised variation denoising.

Some successful applications of total generalised variation in image restoration and related tasks have been done in image denoising [BKP10, Bre14, BDH13], image deblurring [BV11, Bre14], image zooming [Bre14, BH13b], MRI reconstruction [KBPS11, BGH+14], diffusion tensor imaging [VBK13] and JPEG decompression [BH12a, Hol13]. Moreover, there have already been a few contributions to the analysis of the model. Properties of the one dimensional L^1–TGV model were studied by Bredies, Kunisch and Valkonen in [BKV13], while in [BH13a] Bredies and Holler studied the well-posedness of the corresponding Tikhonov regularisation problem established in the BD space. A comparison among TV, TV^2, ICTV and TGV regularisation is done in [BBBM13] by Benning et al. in the context of eigenfunctions, see also Benning's Ph.D. thesis [Ben11]. In particular, the authors investigate the capability of these methods to recover certain one dimensional data exactly apart from a loss of contrast (almost exact recovery). They compute exact solutions of the corresponding minimisation problems for data that can be recovered almost exactly by one method but not the others. Additionally, in his Ph.D. thesis [Mül13], Müller constructed a 2D function that can be recovered almost exactly by TGV but not by ICTV, emphasising the differences between the two functionals in two dimensions.
Note that ICTV and TGV regularisations coincide in dimension one, since in that case every L^1 function can be written as a weak derivative of another function, compare again definitions (1.12) and (1.15).

The main disadvantage of the TGV regularisation problem is the increased computational time involved in its solution. The most popular numerical method that is used for this purpose is the primal-dual algorithm of Chambolle–Pock [CP11]. Even though TGV minimisation is solved fairly fast with this algorithm, it is much slower than TV minimisation solved with the state of the art methods that have been developed for it, e.g. the split Bregman algorithm [GO09].

Inpainting methods

Higher order inpainting methods perform in general better than first order ones, like total variation inpainting, because of the additional directional information used for the interpolation process. Euler's elastica is a popular higher order variational method of this kind which is able to achieve large gap connectivity, see Nitzberg et al. [NMS93], Masnou and Morel [MM98], Chan, Kang and Shen [CKS02] and also Tai, Hahn and Chung [THC11]. In Euler's elastica, the regularising term reads

∫_Ω ( α + β ( ∇·(∇u/|∇u|) )^2 ) |∇u| dx,    (1.16)

i.e., a combination of the total variation and the curvature of the level lines of u. However, one disadvantage of the Euler's elastica inpainting model, for both analytical and numerical purposes, is that it is a non-convex minimisation problem. Convex approximations to the elastica energy have been studied in [BPW13]. Other examples of higher order inpainting are the Cahn–Hilliard inpainting by Bertozzi, Esedoglu and Gillette [BEG07], TV–H^{−1} inpainting by Burger et al. [BHS09], Schönlieb et al. [SBBH09] and Hessian-based surface restoration by Lai et al. [LTC13]. Moreover, curvature driven diffusion inpainting has been considered by Chan and Shen [CS01] while Mumford–Shah based inpainting has been proposed by Esedoglu and Shen [ES02]. Finally, inpainting via transport (first order inpainting) has also been considered, see for instance Bertalmío et al. [BSCB00] and Bornemann and März [BM07].

All the aforementioned inpainting methods are local methods, that is to say the information that is needed to fill in the inpainting domain D is only taken from points neighbouring the boundary of D. Non-local or global methods take into account all the information from the known part of the image, usually weighted by its distance to the point that is to be filled in. This class of methods is very powerful, allowing to fill in structures and textures almost equally well. However, they still have some disadvantages, for instance high computational cost is involved in their solution. Moreover, local methods are sometimes more desirable, as they tend to preserve geometrical features better, especially in relatively small inpainting domains. For non-local inpainting methods the reader is referred to e.g. Cao et al. [CGMP09], Criminisi et al. [CPT03] and Arias, Caselles and Sapiro [ACS09].

1.5 Contribution

This thesis consists of three main distinguishable parts, as these also result from the associated papers that have been produced by the author as part of his Ph.D. [PS14, PSS13], [PB13], [LPSS]. In [PS14], we introduce, analyse and implement an efficient combined first
and second order regularisation method with applications in image denoising and deblurring, also providing an additional experimental analysis for image inpainting accompanied by an online demonstration in [PSS13]. We contribute to the analysis of TGV in [PB13] by analytically solving the one dimensional L^2–TGV denoising problem for simple data functions and studying further properties of the model. Finally in [LPSS], we introduce and study a non-local Hessian functional, proving its localisations to the classical Hessian in several regularity levels, a work that also leads to some novel characterisations of higher order Sobolev and BV spaces in the spirit of Bourgain, Brezis, Mironescu [BBM01]. We also introduce an alternative definition of the non-local Hessian that is able to produce high quality piecewise affine reconstructions in image denoising, as confirmed by numerical examples. In what follows, let us give an outline for each of these contributions, which will be presented in more detail in Chapters 3, 4 and 5.

1.5.1 The TV–TV^2 approach for image reconstruction

In Chapter 3, we introduce the following combined first and second order regulariser (TV–TV^2) for a function u ∈ BV^2(Ω):

α ∫_Ω ϕ_1(∇u) dx + β ϕ_2(D^2u)(Ω),    (1.17)

where ϕ_1 : R^2 → R^+ and ϕ_2 : R^4 → R^+ are two convex functions with at most linear growth at infinity. As usual, α, β are positive regularisation parameters that balance the two terms. The definition is set up in the context of convex functions of measures – the term D^2u is generally a finite Radon measure. This is done in order to define rigorously expressions like √(|D^2u|^2 + ε) (where ϕ_2(x) = √(|x|^2 + ε)), making them equal to √(|∇^2u|^2 + ε) in the special case when D^2u = ∇^2u L^2, i.e., when the measure D^2u is representable by a function ∇^2u. In the case when ϕ_2 is the Euclidean norm |·|, the total variation |D^2u|(Ω) of the measure D^2u is recovered. We study the corresponding variational problem

min_{u∈BV^2(Ω)} ½ ∫_Ω (Tu − f)^2 dx + α ∫_Ω ϕ_1(∇u) dx + β ϕ_2(D^2u)(Ω).    (1.18)

Under some mild additional assumptions, we prove the existence and uniqueness of solutions of (1.18), via the method of relaxation in the spirit of [Ves01], i.e., by identifying the minimising functional's lower semicontinuous envelope with respect to the naturally defined weak∗ and strict topologies in BV^2(Ω). The method is implemented with the split Bregman algorithm [GO09], to whose convergence theory we also contribute, and we present applications in image denoising and deblurring.

The idea of this combination of first and second order dynamics is to regularise with a fairly large weight α in the first order term, preserving the jumps as well as possible, and using a not too large weight β for the second order term, in a way that staircasing artifacts created by the first order regulariser are reduced without introducing any serious blur in the reconstructed image. We show that for image denoising and deblurring the model (1.18) offers solutions whose quality (assessed by the structural similarity index SSIM [WBSS04, WB09]) is not far off from the ones produced by TGV. Moreover, the computational effort needed for its numerical solution is not much more than the one needed for solving the standard total variation model. For comparison, the numerical solution of TGV regularisation is in general about ten times slower than this, see the corresponding table in Section 3.6.
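As a toy illustration of the combined regulariser in (1.17)–(1.18), the following is a hypothetical one dimensional sketch that minimises a discrete energy with the smoothed norms ϕ(x) = √(|x|^2 + ε) mentioned above, using plain gradient descent (Adam) via automatic differentiation. It is only an illustrative stand-in, not the split Bregman scheme developed in the thesis; the data, parameter values and grid scaling are assumptions.

```python
import torch

torch.manual_seed(0)
n = 256
x = torch.linspace(0, 1, n)
clean = torch.where(x < 0.5, 1.0 + x, 0.2 + 0.5 * x)  # piecewise affine signal with a jump
f = clean + 0.05 * torch.randn(n)                      # noisy data

alpha, beta, eps = 0.02, 0.02, 1e-6
u = f.clone().requires_grad_(True)
opt = torch.optim.Adam([u], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    du = u[1:] - u[:-1]                  # first order differences  (stand-in for the gradient)
    d2u = u[2:] - 2 * u[1:-1] + u[:-2]   # second order differences (stand-in for D^2 u)
    energy = (0.5 * torch.sum((u - f) ** 2)
              + alpha * torch.sum(torch.sqrt(du ** 2 + eps))
              + beta * torch.sum(torch.sqrt(d2u ** 2 + eps)))
    energy.backward()
    opt.step()
# u now holds an approximately piecewise affine reconstruction with the jump preserved
```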
We proceed by applying the TV–TV^2 method to image inpainting, where we observe that we can achieve connectivity over larger gaps than total variation. Thus, the method results in more natural-looking inpainted images, see Figure 1.11.

Figure 1.11: Illustration of the differences between pure TV and pure TV^2 inpainting reconstructions. (a) Destroyed image. (b) TV inpainted image. (c) TV^2 inpainted image. (d) Detail of destroyed image. (e) Detail of TV inpainted image. (f) Detail of TV^2 inpainted image.

The minimisation of the TV–TV^2 functional can be seen as a convex simplification of the Euler's elastica idea, where we have replaced the non-convex curvature by the convex total variation of the gradient of u. In an experimental analysis we indicate the differences between pure first and pure second order total variation inpainting, we study the influence of the inpainting domain on large gap connection and we propose a numerically justified rule for optimal tuning of the parameters within the split Bregman iteration. We thus end up with a fast implementation of TV–TV^2 inpainting. Finally, we provide the source code for the algorithm written in C and an online demonstration on the Image Processing Online (IPOL) platform, accessible on the article web page http://dx.doi.org/10.5201/ipol.2013.40.

1.5.2 Exact solutions of TGV minimisation in one dimension

In Chapter 4, we study the one dimensional L^2–TGV denoising model:

min_{u∈BV(Ω)} ½ ∫_Ω (u − f)^2 dx + TGV^2_{β,α}(u),    (1.19)

where Ω = (a, b) is an open interval of R. The motivation is to understand more deeply how this kind of regularisation behaves and to unravel the role of the parameters α and β. We formulate the predual problem of (1.19) and we derive the corresponding optimality conditions in the spirit of [Rin00, BKV13] using Fenchel–Rockafellar duality. We examine some of the basic properties of the solutions, e.g., behaviour near and away from the boundary, preservation of discontinuities and facts about the L^2–linear regression. We also show that, at least for even data functions, TGV and TV regularisations coincide for large enough ratio β/α, and we extend this result to dimension two. Moreover, we compute exact solutions of (1.19) for three different data functions f: a piecewise constant function with a single jump, a piecewise affine function with a single jump and a hat function, see for instance Figures 1.12 and 1.13. The length of the interval (a, b), the size of the jump and the slope of the hat function appear as parameters in the solution formulae. Emphasis is given to how the characteristic features of the solutions (discontinuities, piecewise affinity) are affected by the parameters α and β. The analysis is furnished with some numerical experiments that verify our computations.

Figure 1.12: All the possible types of solutions (red) of the L^2–TGV denoising problem for a single jump data function.

Figure 1.13: All the possible types of solutions (red) of the L^2–TGV denoising problem for a hat function.

Note: We should mention here that a paper by Pöschl and Scherzer [PS13] has been prepared at the same time, independently of our work, which also addresses exact solutions of L^2–TGV in dimension one. Their focus is on determining how to choose the parameters α and β such that the solutions do not coincide with the respective L^2–TV and L^2–TV^2 regularisations. There, the authors compute exact solutions for a characteristic function of an interval I ⊆ (a, b) and for a hat function, with a fixed slope.
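For a concrete feel of problem (1.19), the following is a small numerical illustration of a discretised one dimensional L^2–TGV^2 denoising problem, written as a convex program and solved with cvxpy. The forward-difference discretisation, the toy data and the parameter values are illustrative assumptions; Chapter 4 works with the continuous problem and derives exact solutions analytically.

```python
import numpy as np
import cvxpy as cp

n = 200
x = np.linspace(0, 1, n)
clean = np.where(x < 0.5, 1.0 + x, 0.2 + 0.5 * x)           # piecewise affine data with a jump
f = clean + 0.05 * np.random.default_rng(0).normal(size=n)

alpha, beta = 0.05, 0.1
u = cp.Variable(n)
w = cp.Variable(n - 1)                    # auxiliary variable, cf. the primal definition (1.15)
objective = (0.5 * cp.sum_squares(u - f)
             + alpha * cp.norm1(cp.diff(u) - w)   # alpha * ||Du - w||
             + beta * cp.norm1(cp.diff(w)))       # beta  * ||Dw||
cp.Problem(cp.Minimize(objective)).solve()
u_tgv = u.value   # typically piecewise affine with the jump preserved
```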
1.5.3 Non-local Hessian In a pioneering work, Bourgain, Brezis and Mironescu [BBM01] examined functionals of the type ˆ Ω ˆ Ω |u(x)− u(y)|p |x− y|p ρn(x− y)dxdy, (1.20) where ρn is a sequence of radial functions that concentrates at the origin as n tends to infinity and 1 ≤ p < ∞. They characterised the spaces W 1,p(Ω) for p > 1 and BV(Ω) by 35 Introduction f u (a) f u (b) f u (c) f u (d) Figure 1.12: All the possible types of solutions (red) of the L2–TGV denoising problem for a single jump data function. f u (a) f u (b) f u (c) f u (d) Figure 1.13: All the possible types of solutions (red) of the L2–TGV denoising problem for a hat function. showing that u ∈W 1,p(Ω) ⇐⇒ u ∈ Lp(Ω) and lim inf n→∞ ˆ Ω ˆ Ω |u(x)− u(y)|p |x− y|p ρn(x− y)dxdy <∞, u ∈ BV(Ω) ⇐⇒ u ∈ L1(Ω) and lim inf n→∞ ˆ Ω ˆ Ω |u(x)− u(y)| |x− y| ρn(x− y)dxdy <∞. The above characterisations were used by Aubert and Kornprobst [AK09] to solve some non-local variational problems in image denoising. In fact, there exists a fairly large amount of literature regarding non-local models for image reconstruction – we have already mentioned some non-local inpainting methods in Section 1.4. We mention as well the work by Gilboa and Osher [GO07, GO08], Buades, Bartomeu, Morel (non-local means) [BCM05] and also the Ph.D. thesis of Sawatzky [Saw11]. In [GM01], Gobbino and Mora examined the approximation of functionals depending on the gradient of u and on the behaviour of u near the discontinuity points by families of non-local functionals where the gradient is replaced by finite differences. Ponce [Pon04] derived similar characterisations with [BBM01] by studying functionals of the type ˆ Ω ˆ Ω ω ( |u(x)− u(y)| |x− y| ) ρ(x− y)dxdy, (1.21) where ω is a continuous function and ρ are not necessarily radial. Mengesha and Spector in [MS13] introduced the following non-local gradient operator: 36 1.5. Contribution Gnu(x) = N ˆ Ω u(x)− u(y) |x− y| x− y |x− y|ρn(x− y)dy, x ∈ Ω. (1.22) Functional (1.22) is defined rigorously as a distribution. The authors prove the localisation of the functionals (1.22) to their classical analogue ∇u, in various topologies and they obtain yet another characterisation of the spaces W 1,1(Ω) and BV(Ω): u ∈W 1,p(Ω) ⇐⇒ u ∈ Lp(Ω) and lim inf n→∞ ‖Gnu‖Lp(Ω) <∞, for p > 1, u ∈ BV(Ω) ⇐⇒ u ∈ L1(Ω) and lim inf n→∞ ‖Gnu‖L1(Ω) <∞, for p = 1. In Chapter 5 which constitutes the final part of the thesis, we introduce and study the following higher order non-local functional Hnu(x) = d(d+ 2) 2 ˆ RN d 2u(x, y) |x− y|2 ( (x− y))⊗ (x− y)− |x−y|2d+2 Id ) |x− y|2 ρn(x− y)dy, x ∈ R d, (1.23) where d 2u(x, y) := u(y) − 2u(x) + u(x + (x − y)) and the functions ρn are radial. We call (1.23) the explicit formulation of non-local Hessian. We show that the functional (1.23) localises as n tends to infinity to the continuous analogue ∇2u, in the topology that corresponds to the regularity of u. In particular we show that if u is smooth, this convergence is uniform, while if u ∈ W 2,p(Rd), 1 ≤ p < ∞ then we have Lp convergence. Finally, if u ∈ BV2(Rd) then we show that the sequence of measures HnuLd converges weakly∗ to D2u. Finally after introducing of a second order non-local integration by parts formula we are able to derive non-local characterisations of the spaces W 2,p(Ω) for 1 ≤ p <∞ and BV2(Rd). In particular, we prove the following u ∈W 2,p(Rd) ⇐⇒ u ∈ Lp(Rd) and lim inf n→∞ ‖Hnu‖Lp(Rd) <∞, p > 1, u ∈ BV2(Rd) ⇐⇒ u ∈ L1(Rd) and lim inf n→∞ ‖Hnu‖L1(Rd) <∞, p = 1. 
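A quick numerical sanity check of this localisation is easy to set up in dimension one, where the constant d(d+2)/2 together with the matrix factor in (1.23) reduces to 1. The snippet below evaluates the resulting integral by quadrature, with a normalised Gaussian used in place of ρn (an illustrative choice of concentrating weight), and shows the values approaching u''(x) as the width shrinks.

```python
import numpy as np

def nonlocal_hessian_1d(u, x, sigma):
    """One dimensional instance of the explicit non-local Hessian (1.23): for d = 1 the
    prefactor d(d+2)/2 times the matrix factor equals 1, leaving
        H u(x) = int [u(y) - 2u(x) + u(2x - y)] / (x - y)^2 * rho(x - y) dy,
    evaluated here by the trapezoidal rule, with rho a normalised Gaussian of width sigma
    standing in for the concentrating sequence rho_n."""
    y = x + np.linspace(-8.0 * sigma, 8.0 * sigma, 4001)
    y = y[np.abs(y - x) > 1e-8]                         # drop the removable singularity at y = x
    rho = np.exp(-(x - y) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    d2u = u(y) - 2.0 * u(x) + u(2.0 * x - y)
    return np.trapz(d2u / (x - y) ** 2 * rho, y)

# As sigma -> 0 the value approaches u''(x): here u = sin, so u''(0.7) = -sin(0.7).
for sigma in (0.5, 0.1, 0.02):
    print(sigma, nonlocal_hessian_1d(np.sin, 0.7, sigma), -np.sin(0.7))
```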
We then proceed to an alternative, implicit formulation of a non-local Hessian func- tional which is more suitable for variational problems in imaging. We define the non-local gradient and non-local Hessian of u at a point x with a positive weighting function σx, denoted by Gσxu(x) and Hσxu(x) respectively, as the minimisers of (Hσxu(x),Gσxu(x)) := argmin H′∈Rd×d,G′∈Rd 1 2 ˆ Ω−{x} Ru,G′,H′(x, z)2σx(z) dz, (1.24) where Ru,G′,H′(x, z) = u(x+ z)− u(x)−G′>z − 1 2 z>H ′z. (1.25) Here Ω ⊆ Rd is an open and bounded domain. The weighting function σx is determined 37 Introduction by solving a weighted Eikonal equation, thus making it aware of edges in the image u. We provide numerical examples in image denoising where we use the L1 norm of this implicit form of non-local Hessian as a regulariser, i.e., Ψ(u) = α ‖Hσu‖L1(Ω), for α > 0. Our results indicate that this model is able to preserve edges and promote piecewise affine reconstructions, achieving results of better quality than TGV does in that context. 1.6 Organisation of the thesis In Chapter 2 we provide some mathematical preliminaries mostly on Radon measures, functions of bounded variation, lower semicontinuous envelopes and some results from convex analysis, following mainly [AFP00] [DM93] and [ET76]. In Chapter 3 we discuss the TV–TV2 approach for image reconstruction following [PS14] and [PSS13]. The well-posedness of the method is shown and we discuss its appli- cations to image denoising, deblurring and inpainting by implementing it with the split Bregman algorithm. In Chapter 4 we study the one dimensional total generalised variation denoising prob- lem. We examine the basic properties of the model and we compute exact solutions for simple data functions. All the material is contained in [PB13] apart from Section 4.5 in where we state a result on TV–TGV relationship in dimension two that was obtained after the submission of [PB13]. In Chapter 5, we introduce and study the explicit form of non-local Hessian func- tional following [LPSS]. We prove its localisation to the classical Hessian and we provide some non-local characterisations of W 2,p(Rd) and BV2(Rd) spaces. We also introduce the implicit non-local Hessian formulation and apply it in image denoising. Some Lemmas that require simple but extensive computations were moved to Appendix A for the sake of the nice presentation of the thesis. 38 Chapter 2 Mathematical preliminaries In this chapter we summarise some mathematical tools and notions that are necessary for the rest of the thesis. In particular, after recalling some basic facts about Radon measures, we proceed to a review of the theory of functions of bounded variation which is central for our work. After that, we state some basic facts about lower semicontinuous envelopes or otherwise called relaxed functionals. Finally, we mention some results from convex analysis mainly from duality theory. We assume that the reader has a basic knowledge of real and functional analysis, measure theory and Sobolev space theory. For these subjects we refer the reader to [DiB02, Bol90, LL01, Eva10]. 2.1 Notation on function spaces We start with a few remarks regarding our notation on functions spaces. Let µ be a positive measure on a metric measure space (X,B(X)), where B(X) denotes the σ-algebra of X. We denote by Lp(X,R`;µ) the space of R`–valued, µ–measurable functions u : X → R`, such that ´ X |u|pdµ <∞, for 1 ≤ p <∞ and the space of µ–essentially bounded functions for p =∞. 
These spaces are Banach spaces with the norms ‖u‖Lp(X,R`;µ) = (ˆ X |u|pdµ )1/p , 1 ≤ p <∞, ‖u‖L∞(X,R`;µ) = ess sup x∈X |u(x)|, p =∞. When the measure µ is omitted then it is assumed to be the Lebesgue measure Ld in Rd (L in R) and of course in that case X is a subset of Rd. Thus, Lp(X,R`) is a short version of Lp(X,R`;Ld). When the range of the functions, R`, is omitted then it is assumed that ` = 1. Hence, Lp(X) is a short version of Lp(X,R;Ld). This notation rule applies in all the function spaces. 39 Mathematical preliminaries The space of R`–valued continuous functions with compact support in X is denoted by Cc(X,R`) and it is endowed with the supremum norm ‖u‖∞ = supx∈X |u(x)|. The completion of Cc(X,R`) under the supremum norm is denoted by C0(X,R`). Consistently, Cc(X) and C0(X) are short versions of Cc(X,R) and C0(X,R) respectively. The space of R`–valued, k–times continuously differentiable functions with compact support in Ω, where Ω is an open domain in Rd, is denoted by Ckc (Ω,R`). When k = ∞, we have the space of smooth functions of compact support in Ω. Again, Ckc (Ω) and C∞c (Ω) stand for Ckc (Ω,R) and C∞c (Ω,R) respectively. Similarly, the corresponding spaces of differentiable functions with not necessarily compact support are denoted by Ck(Ω,R`), C∞(Ω,R`), Ck(Ω) and C∞(Ω). As usual, W k,p(Ω) denotes the Sobolev space of functions whose distributional deriva- tives from order zero up to k are Lp functions. The corresponding norm is ‖u‖Wk,p(Ω) = ∑ |a|≤k ˆ Ω |Dau|pdx 1/p , where Dau denotes the a–th distributional derivative of u and |a| = a1 + · · · + ad is the order of the multiindex a = (a1, . . . , ad). Furthermore, H k(Ω) denotes the Hilbert space W k,2(Ω) and Hk0 (Ω) is the completion of C∞c (Ω) under the ‖ · ‖Hk(Ω) norm. As expected, the corresponding R`–valued Sobolev spaces are denoted by W k,p(Ω,R`), Hk(Ω,R`) and Hk0 (Ω,R`). We denote with Ck(Ω) the space of k–times differentiable functions, with continuous derivatives up to the boundary of Ω. Finally, matrix–valued functions are also considered. For instance C∞c (X,R`×d) denotes the space of R`×d matrix–valued, smooth functions of compact support in X and similar notations hold for other spaces. Note that the norm used for matrices will always be the Frobenious norm, unless stated otherwise. 2.2 Radon measures Since standard textbooks are mainly concerned with positive measures, we give here a condensed account of finite Radon measures. For this and the following section on func- tions of bounded variation, we are following [AFP00] but we also refer the reader to [EG92, Giu84, Rud87]. Definition 2.2.1 (Finite Radon measures). Let X be a locally compact, separable metric space and let B(X), denote the Borel σ-algebra of X. We say that µ : B(X) → R` is an R`–valued finite Radon measure if µ(∅) = 0 and for any sequence (An)n∈N of pairwise 40 2.2. Radon measures disjoint elements of B(X) we have µ ( ∞⋃ n=0 An ) = ∞∑ n=0 µ(An). The space of R`–valued finite Radon measures is denoted by M(X,R`), while in the case ` = 1 (real valued measures), we simply write M(X). Note that every µ ∈ M(X,R`) can be written as µ = (µ1, . . . , µ`) where µi ∈ M(X) for every i = 1, . . . , `. Definition 2.2.2 (Total variation measure). For a measure µ ∈ M(X,R`) we define the total variation measure of |µ| : B(X)→ R+, where R+ is the set of positive real numbers, as follows |µ|(A) = sup { ∞∑ n=1 |µ(An)| : An ∈ B(X) pairwise disjoint, A = ∞⋃ n=1 An } , A ∈ B(X). It can be shown that if µ ∈M(X,R`) then |µ| is a finite positive measure. 
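A small example may help fix ideas about the total variation measure (the example is ours, chosen purely for illustration):

```latex
% Example: the total variation measure of a signed measure with two atoms.
% For \mu = \delta_0 - \delta_1 \in \mathcal{M}(\mathbb{R}), any partition separating the
% two atoms attains the supremum in Definition 2.2.2, so that
\[
  |\mu|(A) = \delta_0(A) + \delta_1(A) \quad \text{for all } A \in \mathcal{B}(\mathbb{R}),
  \qquad |\mu|(\mathbb{R}) = 2,
\]
% while \mu(\mathbb{R}) = 0: the total variation measure records mass without cancellation.
```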
It is also useful to introduce the notation M+(X) for the set of finite positive Radon measures on X and M+loc(X) for the set of positive Radon measures on X, i.e., all the positive measures in (X,B(X)) that are finite on every compact subset of X. The spaceM(X,R`) is a Banach space under the norm ‖µ‖M(X,R`) = |µ|(X). In fact, as the next theorem shows, the space M(X,R`) can be regarded as the dual space of (C0(X,R`), ‖ · ‖∞). Theorem 2.2.3 (Riesz representation theorem). Let X be a locally compact, separable metric space and let T be a bounded linear functional on (C0(X,R`), ‖ · ‖∞). Then there exists a unique µ ∈M(X,R`) such that T (u) = ∑` i=1 ˆ X ui dµi, ∀u ∈ C0(X,R`), i.e., T can be represented by the measure µ. Moreover, we also have ‖T‖ = |µ|(X), where ‖ · ‖ denotes the operator norm in the dual space of C0(X,R`). From the definition of the operator norm in a dual space and the fact that C∞c (X,R`) is dense in C0(X,R`) under the supremum norm, we have that for every µ ∈M(X,R`) ‖µ‖M(Ω,R`) = sup { 〈µ, v〉 : v ∈ C∞c (X,R`), ‖v‖∞ ≤ 1 } , (2.1) 41 Mathematical preliminaries where here 〈µ, v〉 denotes the duality pairing 〈µ, v〉 = ∑`i=1 ´X ui dµi. Thus (2.1) gives an alternative definition for ‖µ‖M(X,R`) and provides a nice framework to extend its use in distributions: Given an R`–valued distribution T in an open domain Ω ⊆ Rd we define the Radon norm of T to be ‖T‖M(Ω,R`) = sup { 〈T, v〉 : v ∈ C∞c (Ω,R`), ‖v‖∞ ≤ 1 } . (2.2) In fact, as a consequence of Theorem 2.2.3 we have that‖T‖M(Ω,R`) < ∞ if and only if T is a finite Radon measure µ and in that case ‖T‖M(Ω,R`) = |µ|(Ω). When there is no ambiguity for the domain and the range of T we will simply write ‖T‖M. We will often consider the Lebesgue decomposition of a finite Radon measure with respect to a positive measure. We first recall the notion of absolute continuity and mutual singularity for measures. Definition 2.2.4. Let (X,B(X)) be a measure space. (i) Let µ ∈ M+loc(X) and ν ∈ M(X,R`) . We say that ν is absolutely continuous with respect to µ and we write ν  µ, if whenever µ(A) = 0 then |ν|(A) = 0. (ii) Let µ, ν ∈M(X,R`). We say that µ and ν are mutually singular and we write µ⊥ν if there exists an A ∈ B(X) such that |µ|(E) = 0 and |µ|(X \ E) = 0. We note here that if µ ∈ M+loc(X) and f ∈ L1(X,R`;µ) then fµ denotes the measure in M(X,R`) defined by fµ(A) = ˆ A f dµ, A ∈ B(X). Theorem 2.2.5 (Lebesgue decomposition). Let µ be a σ-finite measure in M+loc(X) and ν ∈M(X,R`). Then there exists a unique pair να, νs ∈M(X,R`) such that ν = να + νs, να  µ, νs⊥µ. Moreover there exists a unique f ∈ L1(X,R`;µ) such that να = fµ. The function f is called the Radon–Nikody´m density of να with respect to µ and is denoted by ν α µ . From now on, unless otherwise stated, all the Lebesgue decompositions will be consid- ered with respect to the Lebesgue measure Ld. Thus, notation-wise µα and µs denote the absolutely continuous and singular part of µ with respect to the Lebesgue measure. It is very easy to check that a finite Radon measure µ is always absolutely continuous with respect to its total variation measure |µ| and hence it has a density with respect to it. This leads to the following theorem. Theorem 2.2.6 (Polar decomposition). Let µ ∈ M(X,R`). Then there exists a unique function sgn(µ) ∈ L1(X,R`; |µ|), equal to 1 µ-almost everywhere such that µ = sgn(µ)|µ|, i.e, sgn(µ) = µ|µ| . 42 2.3. Functions of bounded variation We finish this section mentioning two important notions of convergence in M(X,R`), namely the weak∗ and the strict convergence. 
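Before turning to these notions of convergence, let us record a concrete instance of the Lebesgue and polar decompositions (again an illustration of ours):

```latex
% Example: Lebesgue and polar decompositions on X = \mathbb{R} with respect to \mu = \mathcal{L}^1.
% Let g \in L^1(\mathbb{R}) and \nu = g\,\mathcal{L}^1 + 3\,\delta_0. Since \delta_0 \perp \mathcal{L}^1,
\[
  \nu^{\alpha} = g\,\mathcal{L}^1, \qquad \nu^{s} = 3\,\delta_0, \qquad
  \frac{\nu^{\alpha}}{\mu} = g .
\]
% Moreover |\nu| = |g|\,\mathcal{L}^1 + 3\,\delta_0, and the polar decomposition \nu = \mathrm{sgn}(\nu)\,|\nu|
% holds with \mathrm{sgn}(\nu) = \mathrm{sgn}(g) \ |\nu|\text{-a.e. on } \mathbb{R}\setminus\{0\}
% and \mathrm{sgn}(\nu)(0) = 1.
```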
Definition 2.2.7 (Weak∗ convergence of measures). Let µ ∈ M(X,R`), (µn)n∈N ⊆ M(X,R`). We say that (µn)n∈N converges weakly∗ to µ if for every u ∈ C0(X) we have lim n→∞ ˆ X u dµn = ˆ X u dµ, where the above integrals are computed component-wise, i.e, ˆ X u dµ = (ˆ X u dµ1, . . . , ˆ X u dµ` ) . Observe that the above notion of weak∗ convergence corresponds exactly to the usual notion of weak∗ convergence in duals of Banach spaces since from the Riesz representation theorem M(X,R`) is the dual space of C0(X,R`). Definition 2.2.8 (Strict convergence of measures). Let µ ∈M(X,R`), (µn)n∈N ⊆M(X,R`). We say that (µn)n∈N converges strictly to µ if µn → µ, weakly∗ in measures and lim n→∞ ‖µn‖M(X,R`) = ‖µ‖M(X,R`). Recall that from the general Banach space theory, if µn → µ weakly∗ then ‖µ‖M(X,R`) ≤ lim infn→∞ ‖µn‖M(X,R`), i.e., “mass” can still “escape” at infinity. Thus, strict convergence is a stronger notion of convergence than weak∗. 2.3 Functions of bounded variation 2.3.1 Definitions and properties The space of functions of bounded variation is a central notion in mathematical imaging as functions that belong to that space are allowed to have jump discontinuities which are interpreted as sharp edges in images. Here, we provide a short summary of the theory of this space, giving emphasis to theorems that we are going to use. As usual Ω denotes an open domain in Rd. Definition 2.3.1 (Space of functions of bounded variation). A function u ∈ L1(Ω,R`) is said to be a function of bounded variation or else u ∈ BV(Ω,R`) if its distributional derivative Du can be represented by a R`×d–valued finite Radon measure (also denoted by 43 Mathematical preliminaries Du). That is to say, Du ∈M(Ω,R`×d) and if u = (u1, . . . , u`) then ˆ Ω ua ∂v ∂xi dx = − ˆ Ω v dDiu a, ∀v ∈ C∞c (Ω), i = 1, . . . , d, a = 1, . . . , `. Consistently with our notation, BV(Ω) denotes the space BV(Ω,R). According to the Lebesgue decomposition theorem Du can be decomposed to the abso- lutely continuous and singular part with respect to the Lebesgue measure Du = Dαu+Dsu, where Dαu and Dsu are short versions of (Du)α and (Du)s respectively. We will also use the traditional notation ∇u for the absolutely continuous part of Du, that is ∇u = Dαu. When d = ` = 1, ∇u will be also denoted by u′. It is immediate from the Definition 2.3.1 that W 1,1(Ω,R`) ⊆ BV(Ω,R`) since if u ∈W 1,1(Ω,R`) then Du = ∇uLd. For a function u ∈ L1(Ω,R`) the total variation of u, TV(u), is defined as TV(u) = sup {∑` a=1 ˆ Ω uadivvadx : v ∈ C1c (Ω,R`×d), ‖v‖∞ ≤ 1 } . It can be proven that TV(u) <∞ if and only if u ∈ BV(Ω,R`) and in that case TV(u) = |Du|(Ω). If u ∈ W 1,1(Ω,R`) then |Du|(Ω) is simply the L1 norm of ∇u, ´Ω |∇u| dx. It can be easily checked that the total variation is lower semicontinuous with respect to the strong L1 convergence. The space BV(Ω,R`) endowed with the norm ‖u‖BV(Ω,R`) = ‖u‖L1(Ω,R`) + |Du|(Ω), is a Banach space. However the topology induced by that norm is very strong, in fact smooth functions are not dense with respect to that topology. To that scope, two weaker notions of convergence are introduced, namely the weak∗ and the strict convergence. Definition 2.3.2 (Weak∗ convergence in BV). Let u ∈ BV(Ω,R`), (un)n∈N ⊆ BV(Ω,R`). We say that the sequence (un)n∈N converges to u weakly∗ in BV(Ω,R`) if (un)n∈N converges to u in L1(Ω,R`) and (Dun)n∈N converges to Du weakly∗ in M(Ω,R`×d), i.e., lim n→∞ ‖un − u‖L1(Ω,R`) = 0 and limn→∞ ˆ Ω v dDun = ˆ Ω v dDu, ∀v ∈ C0(Ω). Definition 2.3.3 (Strict convergence in BV). 
Let u ∈ BV(Ω,R`), (un)n∈N ⊆ BV(Ω,R`). We say that the sequence (un)n∈N converges to u strictly in BV(Ω,R`) if (un)n∈N converges 44 2.3. Functions of bounded variation to u in L1(Ω,R`) and the total variations (|Dun|(Ω))n∈N converge to |Du|(Ω), i.e., lim n→∞ ‖un − u‖L1(Ω,R`) = 0 and limn→∞ |Dun|(Ω) = |Du|(Ω). It can be easily verified that strict convergence is induced by the following metric d(u, v) = ‖u− v‖L1(Ω,R`) + ||Du|(Ω)| − |Dv|(Ω)| . Moreover it can be also shown that strict convergence implies weak∗ convergence while the opposite implication is not true in general. The usefulness of the introduction of these weaker notions of convergence can be seen in the following theorem. Theorem 2.3.4 (Strict approximation by smooth functions in BV). Let u ∈ L1(Ω,R`). Then u ∈ BV(Ω,R`) if and only if there exists a sequence (un)n∈N ∈ C∞(Ω,R`) ∩ W 1,1(Ω,R`) such that lim n→∞ ‖un − u‖L1(Ω,R`) = 0 and limn→∞ ˆ Ω |∇un| dx <∞ In fact, if u ∈ BV(Ω,R`) then the sequence (un)n∈N can be chosen such that lim n→∞ ˆ Ω |∇un| dx = |Du|(Ω), i.e., u can be approximated strictly in BV(Ω,R`) by a sequence of smooth functions. The following compactness theorem is a useful tool for proving the well-posedness of certain variational problems through the direct method of calculus of variations. Recall that the lack of reflexivity of the Sobolev space W 1,1(Ω) is translated to lack of weak compactness. Thus, as we also see in the next sections, minimisation problems originally posed in W 1,1(Ω) and are often embedded in a suitable way in BV(Ω) whose compactness properties guarantee a solution. Theorem 2.3.5 (Compactness in BV). Let Ω be an open bounded domain in Rd with Lipschitz boundary and let (un)n∈N be a sequence in BV(Ω,R`) such that sup n∈N ‖un‖BV(Ω,R`) <∞. Then there exists a subsequence (unk)k∈N that converges to some u ∈ BV(Ω,R`), weakly∗ in BV(Ω,R`). We now state a useful embedding theorem and the version of Poincare´ inequality in BV. We define 1∗ = d/(d − 1) when d > 1 and 1∗ = ∞ when d = 1. Note also that uΩ 45 Mathematical preliminaries denotes the mean value of u in Ω, i.e., uΩ := 1 Ld(Ω) ˆ Ω u dx. (2.3) Theorem 2.3.6. ( BV embedding theorem / Poincare´ inequality) Let Ω be an open bounded domain in Rd with Lipschitz boundary. Then BV(Ω) ⊆ L1∗(Ω) with continuous embedding. Moreover if Ω is connected, the following Poincare´ type inequality holds ‖u− uΩ‖Lp(Ω) ≤ C|Du|(Ω), ∀u ∈ BV(Ω), 1 ≤ p ≤ 1∗, for some constant C depending only on Ω. We finish this section mentioning some facts about traces in BV. Theorem 2.3.7 (Boundary trace theorem in BV). Let Ω be an open bounded domain in Rd with Lipschitz boundary and let u ∈ BV(Ω,R`). Then for Hd−1–almost every x ∈ ∂Ω, there exists uΩ(x) ∈ R` such that lim δ→0 1 δd ˆ Ω∩B(x,δ) |u(y)− uΩ(x)| dy = 0. Moreover, ‖uΩ‖L1(∂Ω,R`;Hd−1) ≤ C‖u‖BV(Ω,R`), (2.4) for some constant C that depends only on Ω. The function uΩ ∈ L1(∂Ω,R`;Hd−1) is called the trace of u on ∂Ω. Moreover the extension u of u to 0 out of Ω belongs to BV(Rd,R`) and viewing Du as a measure on the whole of Rd and concentrated on Ω, Du is given by Du = Du+ (uΩ ⊗ νΩ)Hd−1b∂Ω, where νΩ is the outward unit normal vector on ∂Ω and Hd−1b∂Ω is the (d−1)–dimensional Hausdorff measure restricted on ∂Ω. As a consequence of the boundary trace theorem, we also get the following results: Theorem 2.3.8 (Integration by parts). Let Ω be an open bounded domain in Rd with Lipschitz boundary and let u ∈ BV(Ω,R`). Then for every i = 1, . . . , d, a = 1, . . . 
, ` and every function v ∈ C1(Ω) we have ˆ Ω ua ∂v ∂xi dx = − ˆ Ω vdDiu a + ˆ ∂Ω (uΩ)a(νΩ)iv dHd−1 (2.5) Note that by summing (2.5) over i and by using an appropriate affine function v we 46 2.3. Functions of bounded variation get the following estimate∣∣∣∣ˆ Ω u dx ∣∣∣∣ ≤ K (|Du|(Ω) + ‖uΩ‖L1(∂Ω,R`;Hd−1)) , ∀u ∈ BV(Ω), (2.6) where the constant K > 0, depends only on d and Ω. If Ω is connected, we can combine further the above estimate with the Poincare´ inequality (2.3) and get ‖u‖L2(Ω) ≤ C ( |Du|(Ω) + ‖uΩ‖L1(∂Ω,R`;Hd−1) ) , ∀u ∈ BV(Ω), (2.7) where again, the constant C > 0, depends only on d and Ω. Proposition 2.3.9. Let Ω be an open bounded domain in Rd with Lipschitz boundary, u ∈ BV(Ω,R`) and v ∈ BV(Rd \ Ω,R`). Then the function w(x) = u(x) if x ∈ Ω,v(x) if x ∈ Rd \ Ω, belongs to BV(Rd,R`) and viewing Du (respectively Dv) as a measure on the whole of Rd and concentrated on Ω (respectively Rd \ Ω), we have Dw = Du+Dv + ( uΩ − vRd\Ω ) ⊗ νΩHd−1b∂Ω. 2.3.2 Good representatives For this section, Ω will be a bounded open interval (a, b) ⊆ R. Since every function u ∈ L1(a, b) is defined as an equivalence class of functions that are equal almost everywhere, it is not possible to make sense of pointwise values of u. In BV(a, b) this problem can be overcome through the introduction of good representatives. This is a notion that we are going to use often in Chapter 4. Definition 2.3.10 (Pointwise variation). For any function u : (a, b) → R, the pointwise variation pV(u, (a, b)) of u in (a, b) is defined as pV(u, (a, b)) = sup { n−1∑ i=1 |u(xi+1)− u(xi)| : n ≥ 2, a < x1 < · · · < xn < b } . As one can observe the pointwise variation is extremely sensitive on the choice of the representative of u. This motivates the following definition where the pointwise variation in minimised among elements in the equivalence class of u. Definition 2.3.11 (Essential variation). For any function u : (a, b) → R, the essential 47 Mathematical preliminaries variation eV(u, (a, b)) of u in (a, b) is defined as eV(u, (a, b)) = inf {pV(v, (a, b)) : v = u, L–a.e. in (a, b)} . It can be proved that for every function u ∈ L1(a, b) the infimum in the definition of essential variation is achieved and it is equal to the total variation of u. Thus, if u ∈ BV(a, b) then eV(u, (a, b)) = |Du|(a, b). Any function u˜ in the equivalence class of u with the property pV(u˜, (a, b)) = |Du|(a, b) is called good or precise representative of u. The next theorem shows that these representatives can be characterised in a more explicit way and they have some good continuity and differentiability properties. We define the jump set of u, Ju, to be the set of atoms of Du, i.e., Ju = {x ∈ (a, b) : Du({x}) 6= 0}. Theorem 2.3.12 (Good representatives). Let u ∈ BV(a, b). Then the following state- ments hold: (i) There exists a unique c ∈ R such that ul(x) := c+Du((a, x)), ur(x) := c+Du((a, x]), ∀x ∈ (a, b), are good representatives of u, the left continuous and the right continuous one. Any other function u˜ : (a, b)→ R is a good representative of u if and only if u˜(x) ∈ { λul(x) + (1− λ)ur(x) : λ ∈ [0, 1] } , ∀x ∈ (a, b). (ii) Any good representative u˜ is continuous in (a, b) \ Ju and has a jump discontinuity at any point of Ju: u˜(x−) = ul(x) = ur(x−), u˜(x+) = ul(x+) = ur(x), ∀x ∈ Ju, where x− and x+ denote left and right limits at x. (iii) Any good representative u˜ is differentiable at L–a.e. point of (a, b) and the derivative u˜′ is the density of Du with respect to L. 
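The one dimensional objects above are easy to probe numerically. In the following sketch (with sampling conventions of our own) the discrete sum of |u(x_{i+1}) − u(x_i)| plays the role of the pointwise variation of the sampled representative: for a sampled smooth function it approximates the integral of |u'|, for a step function it returns the jump height on any grid, and perturbing a single sample illustrates how sensitive the pointwise variation is to the choice of representative.

```python
import numpy as np

def discrete_variation(u):
    # Sum_i |u_{i+1} - u_i|: the pointwise variation of the sampled values,
    # and a consistent discretisation of |Du|((a, b)) for good representatives.
    return np.sum(np.abs(np.diff(u)))

x = np.linspace(0.0, 1.0, 1001)

# Smooth case: the discrete variation approximates the integral of |u'|.
u_smooth = np.sin(2.0 * np.pi * x)
print(discrete_variation(u_smooth))      # ~ int_0^1 |2*pi*cos(2*pi*t)| dt = 4

# Jump case: u = X_{(1/2, 1)} has Du = delta_{1/2}, so |Du|((0, 1)) = 1,
# and the discrete variation equals the jump height on any grid.
u_jump = (x > 0.5).astype(float)
print(discrete_variation(u_jump))        # = 1

# Changing a single sample (a "bad" representative in the discrete picture)
# increases the pointwise variation, cf. Definition 2.3.10.
u_bad = u_jump.copy(); u_bad[250] = 5.0
print(discrete_variation(u_bad))         # = 1 + 2*|5 - 0| = 11
```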
It is clear from the above that if u ∈ BV(a, b) then the functions uup and ulow are also good representatives of u where uup(x) := max{ul(x), ur(x)} and ulow(x) := min{ul(x), ur(x)}, ∀x ∈ (a, b). see Figure 2.1 for an illustration. 48 2.4. Lower semicontinuous envelopes (a) ur (b) ul (c) uup (d) ulow Figure 2.1: Good representatives ur, ul, uup and ulow of a BV function u. 2.4 Lower semicontinuous envelopes This section concerns lower semicontinuous envelopes of functionals or otherwise called relaxed functionals. We essentially follow [DM93] but [Bra02] is a good reference as well. We denote by R the set of extended real numbers, i.e., R = R ∪ {+∞}. Let X be a set endowed with some topology T . For every x ∈ X we denote by N (x) the set of all the open neighbourhoods of x in X. Definition 2.4.1 (Lower semicontinuity). A function F : X → R is called T –lower semicontinuous at a point x ∈ X if for every t ∈ R with t < F (x), there exists a U ∈ N (x) such that t < F (y) for every y ∈ U . For first countable topological spaces (hence for metric spaces as well) lower semicontinuity is translated to a more familiar expression: Proposition 2.4.2. Suppose that (X, T ) is a first countable topological space. Then the following are equivalent: (i) F is T –lower semicontinuous at x. (ii) For every sequence (xn)n∈N converging to x, we have F (x) ≤ lim inf n→∞ F (xn). Definition 2.4.3 (Lower semicontinuous envelope). Let F : X → R. The lower semicon- tinuous envelope or otherwise called the relaxed functional of F with respect to the topology 49 Mathematical preliminaries T , scT F : X → R is defined as scT F (x) = sup { G(x) : G : X → R, T –lower semicontinuous, G(y) ≤ F (y), ∀y ∈ X} , for every x ∈ R. It easy to check that scT F is the greatest T –lower semicontinuous functional which is smaller or equal than F . It can be also checked that scT F (x) = sup U∈N (x) inf y∈U F (y). Similarly to lower semicontinuity, if X is first countable then lower semicontinuous en- velopes can be easily characterised. Proposition 2.4.4. Suppose that (X, T ) is a first countable topological space. Then scT F is characterised by the following two properties: (i) For every sequence (xn)n∈N converging to x, we have scT F (x) ≤ lim inf n→∞ F (xn). (ii) There exists a sequence (xn)n∈N converging to x, such that scT F (x) ≥ lim sup n→∞ F (xn). The following proposition shows the usefulness of lower semicontinuous envelopes re- garding the existence of minimum points. Proposition 2.4.5. Let F : X → R and suppose that infx∈X F (x) ∈ R. Then if scT F has minimum point we have that min x∈X scT F (x) = inf x∈X F (x). 2.5 Convex analysis In this section we recall some useful tools from convex analysis. In the first part we introduce the basic concepts while in the second part we provide a short summary of the Fenchel–Rockafellar duality theory. For further study we refer the reader to [ET76] which we also follow here. 2.5.1 Basic concepts We start with a few definitions and remarks on notation. If (X, ‖ · ‖X) is a Banach space then X∗ denotes its dual space. The elements of X∗ are denoted with x∗, y∗, . . . etc.. The 50 2.5. Convex analysis indicator function of a set A ⊆ X is defined as IA(x) = 0 if x ∈ A,∞ if x /∈ A. For a function F : X → R we define the domain of F , dom(F ), to be the set of points where F takes finite values, i.e., dom(F ) = {x ∈ X : F (x) <∞}. A function F is called proper if dom(F ) 6= ∅. We now define the convex conjugate F ∗ of a function F : Definition 2.5.1 (Convex conjugate). Let F : X → R. 
Then the convex conjugate of F , F ∗ : X∗ → R is defined as F ∗(x∗) = sup x∈X 〈x∗, x〉 − F (x), ∀x∗ ∈ X∗. The following, easy to verify result, will be useful to us. Proposition 2.5.2. Let (X, ‖ · ‖X) be a Banach space 1 < p <∞ and F : X → R with F (x) = 1 p ‖x‖pX , ∀x ∈ X. Then the convex conjugate of F is F ∗(x∗) = 1 q ‖x∗‖qX∗ , x∗ ∈ X∗. where 1/p+ 1/q = 1. We finally recall the definition of the subdifferential of a function. Definition 2.5.3. Let (X, ‖·‖X) be a Banach space and F : X → R. We say that x∗ ∈ X∗ belongs to the subdifferential of F at x ∈ X and we write x∗ ∈ ∂F (x) if F (x) <∞ and 〈x∗, y − x〉+ F (x) ≤ F (y), ∀y ∈ X. If ∂F (x) 6= ∅ we say that F is subdifferentiable in x. The subdifferential of F at a given point can be also characterised as follows: Proposition 2.5.4. Let F : X → R. Then x∗ ∈ ∂F (x) if and only if F (x) + F ∗(x∗) = 〈x∗, x〉. 51 Mathematical preliminaries 2.5.2 Fenchel–Rockafellar duality Fenchel–Rockafellar duality theory provides a convenient framework for characterising solutions of variational problems. In this section, we present a basic review for variational problems that have a particular form, i.e., inf x∈X F1(x) + F2(Λ(x)), (2.8) where X, Y are Banach spaces, Λ : X → Y is a bounded linear operator and F1 : X → R, F2 : Y → R are proper, convex, lower semicontinuous functions. The minimisation problem (2.8) is called the primal problem and it is denoted by Pprimal. The dual problem of (2.8) is denoted by Pdual and is defined as sup y∗∈Y ∗ −F ∗1 (Λ∗(y∗))− F ∗2 (−y∗), (2.9) where Λ∗ : Y ∗ → X∗ is the adjoint operator of Λ. We denote with inf Pprimal and supPdual the infimum of (2.8) and the supremum of (2.9) respectively. It can be shown that it always holds supPdual ≤ inf Pprimal. If supPdual = inf Pprimal we say that no duality gap occurs. The following theorem shows how the solutions of Pprimal and Pdual are related when no duality gap occurs. Theorem 2.5.5 (Optimality conditions). Suppose that both problems Pprimal and Pdual have solutions and that supPdual = inf Pprimal, with that number being finite. Then all the solutions x of Pprimal and y∗ of Pdual are related through the following optimality conditions: Λ∗(y∗) ∈ ∂F1(x), (2.10) −y∗ ∈ ∂F2(Λ(x)). (2.11) Conversely, if x ∈ X, y∗ ∈ Y ∗ satisfy (2.10)–(2.11) then x is a solution of Pprimal, y∗ is a solution of Pdual and supPdual = inf Pprimal with that number being finite. We note here that using Proposition 2.5.4 and having in mind that F ∗∗ = F for convex lower semicontinuous functions, we can easily check that the optimality conditions (2.10)- (2.11) are equivalent to x ∈ ∂F ∗1 (Λ∗(y∗)), (2.12) Λ(x) ∈ ∂F ∗2 (−y∗). (2.13) There have been quite a few results regarding necessary conditions for the absence of 52 2.5. Convex analysis duality gap. We mention here a condition due to Attouch and Brezis [AB86], which is useful for our purposes. Theorem 2.5.6 (Attouch–Brezis). Suppose that X,Y are Banach spaces, Λ : X → Y is a bounded linear operator and F1 : X → R, F2 : Y → R are proper, convex, lower semicontinuous functionals. If the set⋃ λ≥0 λ(dom(F2)− Λ(dom(F1))), is a closed subspace of Y , then inf x∈X F1(x) + F2(Λ(x)) = max y∗∈Y ∗ −F ∗1 (Λ∗(y∗))− F ∗2 (−y∗), i.e., the dual problem Pdual has a solution and no duality gap occurs. 
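To see the duality machinery of this section at work, it is instructive to spell out a standard finite dimensional example (included here only for orientation):

```latex
% A standard finite dimensional instance of the primal-dual pair (2.8)-(2.9).
% Take X = \mathbb{R}^n, Y = \mathbb{R}^m, \Lambda \in \mathbb{R}^{m \times n},
% F_1(x) = \tfrac12\|x - f\|_2^2 and F_2(y) = \alpha\|y\|_1. Direct computation gives
\[
  F_1^*(x^*) = \tfrac12\|x^*\|_2^2 + \langle x^*, f\rangle, \qquad
  F_2^*(y^*) = I_{\{\|\cdot\|_\infty \le \alpha\}}(y^*),
\]
% so the dual problem (2.9) becomes the constrained quadratic problem
\[
  \sup_{\|y^*\|_\infty \le \alpha} \; -\tfrac12\|\Lambda^* y^*\|_2^2 - \langle \Lambda^* y^*, f\rangle .
\]
% For \Lambda a discrete gradient operator this is, up to an additive constant, the well known
% dual of the discrete ROF problem, and the optimality conditions (2.10)-(2.11) characterise
% the primal solution through the dual variable y^*.
```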
53 Mathematical preliminaries 54 Chapter 3 The combined TV–TV2 approach for image reconstruction 3.1 Introduction As we have already discussed in the introduction, one way to reduce the staircasing effect in total variation-based image reconstruction methods, is the incorporation of higher order derivatives in the regularisation process. Infimal convolution type of methods have been proven to be very successful towards this task with total generalised variation being the current state of the art. However, one of the disadvantages of higher order regularisation methods is the increased computational cost involved in their implementation. TGV regularisation is no exception even though sophisticated algorithms can be used for its solution. Currently, TGV minimisation problems are commonly solved with the primal dual hybrid method of Chambolle–Pock [CP11], see [Bre14]. We will show that the method we propose here can be solved even faster with the split Bregman algorithm [GO09] while still reducing the staircasing effect in image denoising and deblurring. Moreover, unlike TGV, our method can be also used for image inpainting, having the advantage of being able to interpolate images along large gaps in the inpainting domain, which is an improvement to first order total variation inpainting. In summary the model we are studying here is the following: min u∈BV2(Ω) 1 s ˆ Ω |Tu− f |sdx+ α ˆ Ω ϕ1(∇u) dx+ β ϕ2(D2u)(Ω), s = 1, 2, (3.1) where α, β are positive parameters, ϕ1, ϕ2 are two convex functions ϕ1 : R2 → R+, ϕ2 : R4 → R+ with at most linear growth at infinity and T is suitable bounded linear operator. The minimisation (3.1) is done over BV2(Ω), the space of functions of bounded Hessian, see Section 3.3 for details. For the time being, we note that for a function u ∈ BV2(Ω), the second order distributional derivative D2u, is a finite Radon measure. 55 The combined TV–TV2 approach for image reconstruction This means that problem (3.1) is posed in the framework of convex functions of measures. The motivation for that is to make sense of expressions like ´ Ω √|∇2u|2 + , where  1 and ∇2u is not necessarily an L1 function but it can also be a measure. This concept has already been considered in the first order case, e.g., in the study of the following “smoothed” total variation denoising problem [Ves01] min u∈BV(Ω) 1 2 ˆ Ω (u− f)2dx+ ˆ Ω √ |∇u|2 +  dx, (3.2) where the term ´ Ω √|∇u|2 +  dx has to be defined appropriately in order to make sense for measures. Such “rounding offs” of non-smooth regularisers like the total variation become necessary for the numerical implementation by means of time-stepping [Ves01], multigrid methods [FSHW04, Vog95, VW97] or semi-smooth Newton methods [HS06b] for instance. Well-posedness of (3.1) is proved by identifying the minimising functional as a lower semicontinuous envelope of another naturally defined functional. The underlying topology under which the lower semicontinuous envelope is considered is a suitably defined topology in BV2(Ω) which is analogous to the strict topology in BV(Ω). Uniqueness and stability results are also shown. We proceed to the numerical implementation of our method using the split Bregman algorithm. To do so, we leave the general framework of convex functions of measures by choosing ϕ1 = | · | in R2 and ϕ2 = | · | in R4. As a result, the minimisation we treat numerically is min u∈BV2(Ω) 1 s ˆ Ω |Tu− f |sdx+ αTV(u) + β TV2(u), s = 1, 2, (3.3) where TV2(u) denotes the second order total variation of u, |D2u|(Ω). 
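Although the split Bregman treatment of (3.3) is developed in Section 3.5, the elementary operation at the heart of such splittings can already be recorded: once the derivative terms are decoupled through auxiliary variables, each L1-type subproblem is solved in closed form by a vectorial soft shrinkage, the proximal map of the Euclidean norm [GO09]. The sketch below illustrates this generic building block only; it is not the algorithm of Section 3.5.

```python
import numpy as np

def vector_shrinkage(v, lam):
    """Proximal map of lam * |.|_2 applied pixel-wise:
       argmin_d  lam * |d|_2 + 0.5 * |d - v|_2^2  =  max(|v|_2 - lam, 0) * v / |v|_2.
    Here v has shape (..., k), k being the number of components per pixel
    (k = 2 for gradient fields, k = 4 for Hessian fields in the TV-TV2 setting)."""
    norm = np.sqrt(np.sum(v ** 2, axis=-1, keepdims=True))
    scale = np.maximum(norm - lam, 0.0) / np.maximum(norm, 1e-12)  # avoid division by zero
    return scale * v

# Usage: shrink a field of 2-vectors (e.g. a gradient field) with threshold 0.3.
rng = np.random.default_rng(1)
g = rng.standard_normal((4, 4, 2))
print(vector_shrinkage(g, 0.3))
```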
Note that the case β = 0, is the classical ROF model while the case α = 0, i.e., pure second order regularisation, has been studied in [BP10]. We show that by choosing α 6= 0, β 6= 0 one can achieve better results (as these are assessed by image quality measures) by both these special cases. The idea in our approach is to weight TV with a fairly large α and in the same time keeping β small enough such that no serious blur is introduced in the image while staircasing is reduced. Even though the introduction of TV2 does not allow the preservation of edges like TGV, by keeping the value of β small, the quality of the results is still comparable with the TGV ones. Moreover, since our method do not decompose the image a´ la infimal convolution, it can solved very efficiently with the split Bregman algorithm. Thus, we end up with a very fast method which is ideal as a pre-processing step and provides a solid basis for some simple and fast further improvements, e.g., contrast adjustment and sharpening. Let us stress finally one more time that the image model we are based on is f = Tu+η, where T is a bounded, linear operator and η is random noise. This model is well suited for a 56 3.1. Introduction variety of image reconstruction tasks e.g. denoising, deconvolution, inpainting, zooming, to name a few. Even though this model and its corresponding variational problem (3.1) is not suitable for some advanced medical imaging tasks like MRI and tomography, it introduces a new regulariser, namely the weighted sum of first and second order total variation which can be exploited in future research focusing on these tasks. In that case extension of our results for complex valued images will be necessary as for instance the images that one typically obtains from an MRI machine are of that kind, as a result of the use of Fourier transform. However, as the present thesis deals with more “traditional” image processing tasks we prefer to focus on the real valued case. We only use the fast Fourier transform in Section 3.5.3 as a tool for the efficient solution of a certain minimisation problem but the data obtained in the end are real valued. 3.1.1 Organisation of the chapter In Section 3.2 we state some preliminary facts concerning convex functions of measures. The definition of a convex function of a measure is given, along with a lower semicontinuity result due to Buttazzo and Freddi. In Section 3.3 we introduce the space of functions of bounded Hessian BV2(Ω). More specifically, the definition of BV2(Ω) together with its basic properties is stated in Section 3.3.1. In Section 3.3.2 we introduce two naturally defined topologies on BV2(Ω), namely the weak∗ and the strict topology. The well-posedness of the TV–TV2 model is shown in Section 3.4. There, the min- imising functional in (3.1) is identified as the lower semicontinuous envelope of a naturally defined functional (defined in the context of convex functions instead of measures) with respect to the aforementioned topologies of BV2(Ω). Existence, uniqueness as well as stability results are shown. Section 3.5 discusses the numerical realisation of the model using the split Bregman iteration. In particular, Section 3.5.1 is concerned with the discretisation of the model. The corresponding finite dimensional norms along with the discrete differential operators are introduced. In Section 3.5.2 we recall the Bregman iteration and its use for solving constrained optimisation problems and we contribute to its convergence theory. 
In Section 3.5.3, we describe how the Bregman iteration can be adapted to our model and we propose a splitting strategy for it (split Bregman) as a numerical method for the solution of our problem. Applications in denoising and deblurring of the TV–TV2 approach are presented in Section 3.6. We present numerical experiments for images that have been corrupted by Gaussian noise, comparing the TV–TV2 method with TV, TV2, TGV and infimal convo- lution denoising and we do the same for image deblurring. The application of the TV–TV2 method in image inpainting is presented in detail in Section 3.7. In Section 3.7.1, we provide a motivation for the use of our method for 57 The combined TV–TV2 approach for image reconstruction inpainting tasks (large gap connectivity) and we present the concept and philosophy of the online journal Image Processing Online in whose platform we provide an online, user friendly, demonstration of TV–TV2 inpainting. The numerical realisation using the split Bregman algorithm is described in Section 3.7.2, while the extension of the algorithm to colour images is done in Section 3.7.3. Stopping criteria and optimal selection of the split Bregman parameters are proposed and numerically justified in Section 3.7.4. Finally, in Section 3.7.5 we provide some inpainting examples for both real life and synthetic images. We stress the visual improvement of the TV–TV2 method over pure TV inpainting and we discuss experimentally the role of the parameters α, β as well as the role of the geometry of the inpainting domain, as far as connectivity along large gaps is concerned. 3.2 Convex functions of measures In this section we introduce the notion of a convex function of a measure, for functions with at most linear growth, following essentially [DT84]. Let ϕ : R` → R be a continuous function, positively one homogeneous, i.e., for every x ∈ R` ϕ(tx) = tϕ(x), ∀t ≥ 0. Given a measure in µ ∈M(Ω,R`), where Ω is an open subset of Rd, we define the measure ϕ(µ) ∈M(Ω) as follows: ϕ(µ) := ϕ ( µ |µ| ) |µ|. (3.4) Notice that ϕ(µ) is well defined as µ/|µ| = 1, |µ|–a.e., and ϕ is bounded on S`−1 := {x ∈ R` : |x| = 1}. Moreover the following proposition, that was mentioned in [DT84] and we give here a proof, holds: Proposition 3.2.1. Suppose that ϕ : R` → R is a continuous function, positively one homogeneous and µ ∈ M(Ω,R`). Then for every measure ν ∈ M+(Ω) such that µ is absolutely continuous with respect to ν, we have ϕ(µ) = ϕ ( µ |ν| ) |ν|. Moreover, if ϕ is a convex function from R` to R, then ϕ :M(Ω,R`)→M(Ω) is a convex function is the space of measures as well. Proof. Since µ  ν, we also have that |µ|  ν. Using the fact that ϕ is positively one homogeneous and the fact that |µ|/ν is a positive function ν–a.e., we get ϕ(µ) = ϕ ( µ |µ| ) |µ| = ϕ ( µ |µ| ) |µ| ν ν = ϕ ( µ |µ| |µ| ν ) ν = ϕ (µ ν ) ν. 58 3.2. Convex functions of measures Assuming now that ϕ is convex and using the first part of the proposition we get for 0 ≤ λ ≤ 1 and µ, ν ∈M(Ω,R`): ϕ(λµ+ (1− λ)ν) = ϕ ( λµ+ (1− λ)ν |λµ+ (1− λ)ν| ) |λµ+ (1− λ)ν| = ϕ ( λµ+ (1− λ)ν |µ|+ |ν| ) (|µ|+ |ν|) = ϕ ( λ µ |µ|+ |ν| + (1− λ) ν |µ|+ |ν| ) (|µ|+ |ν|) ≤ λϕ ( µ |µ|+ |ν| ) (|µ|+ |ν|) + (1− λ)ϕ ( ν |µ|+ |ν| ) (|µ|+ |ν|) = λϕ ( µ |µ| ) |µ|+ (1− λ)ϕ ( ν |ν| ) |ν| = λϕ(µ) + (1− λ)ϕ(ν). Suppose now that ϕ is not necessarily positively one homogeneous but a convex func- tion ϕ : R` → R which has at most linear growth at infinity, i.e., there exists a positive constant K such that ϕ(x) ≤ K(1 + |x|), ∀x ∈ R`. 
In that case the recession function ϕ∞ of ϕ is well defined everywhere, where ϕ∞(x) := lim t→∞ ϕ(tx) t , ∀x ∈ R`. It can be proved in this case, see for instance [AFP00], that ϕ∞ is a convex, positively one homogeneous function. We are now ready to define convex functions of measures for functions of at most linear growth. Definition 3.2.2 (Convex function of a measure). Let ϕ : R` → R be a convex function of at most linear growth at infinity and let µ ∈ M(Ω,R`). Moreover, consider the Lebesgue decomposition of µ with respect to the Lebesgue measure Ld in Ω: µ = ( µ Ld ) Ld + µs. Then we define the measure ϕ(µ) ∈M(Ω) as follows: ϕ(µ) = ϕ ( µ Ld ) Ld + ϕ∞ ( µs |µs| ) |µs|. (3.5) It can be proved in a similar way with Proposition 3.2.1 that ϕ :M(Ω,R`)→M(Ω,R) is 59 The combined TV–TV2 approach for image reconstruction also a convex function. The following theorem, proved in [BF91] and can also be found in [AFP00] establishes the lower semicontinuity of ϕ with respect to the weak∗ convergence in M(Ω,R`). Theorem 3.2.3 (Buttazzo–Freddi, 1991). Let Ω be an open subset of Rd, ν ∈M(Ω,R`), (νn)n∈N ⊆M(Ω,R`) and let µ ∈M+(Ω), (µn)n∈N ⊆M+(Ω). Let ϕ : R` → R be a convex function with at most linear growth at infinity and suppose that νn → ν and µn → µ weakly∗ in measures. If ν = (ν/µ)µ + νs and νn = (νn/µn)µn + νsn are the Lebesgue decompositions of ν and νn with respect to µ and µn respectively, then ˆ Ω ϕ ( ν µ ) dµ+ ˆ Ω ϕ∞ ( νs |νs| ) d|νs| ≤ lim inf n→∞ ˆ Ω ϕ ( νn µn ) dµn + ˆ Ω ϕ∞ ( νsn |νsn| ) d|νsn|. In particular, if µ = µn = Ld for all n ∈ N then the above inequality can be written as ϕ(ν)(Ω) ≤ lim inf n→∞ ϕ(νn)(Ω). 3.3 The space of functions of bounded Hessian BV2(Ω) 3.3.1 Definition and basic properties In this section we recall the definition and the basic properties of the space of functions of bounded Hessian. It was firstly introduced and studied by Demengel in [Dem85]. Definition 3.3.1 (Space of functions of bounded Hessian). Let Ω ⊆ Rd be open. A function u ∈ W 1,1(Ω) is said to be a function of bounded Hessian or else u ∈ BV2(Ω) if its second order distributional derivative D2u can be represented by a Rd×d–valued finite Radon measure which is still denoted by D2u. In other words BV2(Ω) = { u ∈W 1,1(Ω) : ∇u ∈ BV(Ω,Rd) } . It can proved, in a similar way with the BV case that a function u ∈ L1(Ω) belongs to BV2(Ω) if and only if its second order total variation TV2(u) is finite, where TV2(u) = sup {ˆ Ω udiv2v dx : v ∈ C2c (Ω,Rd×d), ‖v‖∞ ≤ 1 } , and in that case we have TV2(u) = |D2u|(Ω). The space BV2(Ω) is equipped with the norm ‖u‖BV2(Ω) := ‖u‖BV(Ω) + |D2u|(Ω), which makes it a Banach space. If Ω is connected and has a Lipschitz boundary then it can be shown [Dem85] that there exist positive constants C1, C2 that depend only on Ω 60 3.3. The space of functions of bounded Hessian BV2(Ω) such that ‖∇u‖L1(Ω,Rd) ≤ C1|D2u|(Ω) + C2‖u‖L1(Ω), ∀u ∈ BV2(Ω). (3.6) It is obvious that in that case the norm ‖u‖ = ‖u‖L1(Ω) + |D2u|(Ω) is equivalent with ‖ · ‖BV2(Ω). It is immediate that W 2,1(Ω) ⊆ BV2(Ω) since if u ∈ W 2,1(Ω) then D2u = ∇2uLd, i.e., the second order distributional derivative is a function in L1(Ω,Rd×d). Generally, for a function u ∈ BV2(Ω), we denote by ∇2u, the absolutely continuous part of D2u with respect to Ld. Finally, it can also be proved, see also [Dem85] that the embedding of BV2(Ω) into W 1,1(Ω) is compact. 3.3.2 Weak∗ and strict convergence in BV2(Ω) Similarly with BV(Ω) we can define some weaker notions of convergence in BV2(Ω) namely the weak∗ and the strict convergence. 
We also prove a weak∗ compactness theorem in BV2(Ω) which is analogous to Theorem 2.3.5. Definition 3.3.2 (Weak∗ convergence in BV2). Let u ∈ BV2(Ω), (un)n∈N ⊆ BV2(Ω). We say that the sequence (un)n∈N converges to u weakly∗ in BV2(Ω) if (un)n∈N converges to u in L1(Ω,Rd) and (∇un)n∈N converges to ∇u weakly∗ in BV(Ω,Rd), i.e., lim n→∞ ‖un − u‖L1(Ω) = 0, limn→∞ ‖∇un −∇u‖L1(Ω,Rd) = 0, and lim n→∞ ˆ Ω v dD2un = ˆ Ω v dD2u, ∀v ∈ C0(Ω). It is not hard to check that weak∗ convergence is induced by a topology which has a basis consisting of the following sets: U(u0, F, ) = { u ∈ BV2(Ω) : ‖u0 − u‖L1(Ω) + ‖∇u0 −∇u‖L1(Ω,Rd) + ∣∣∣∣ˆ Ω vi dD 2u0 − ˆ Ω vi dD 2u ∣∣∣∣ < , i ∈ F} , where u0 ∈ BV2(Ω), F ⊆ N finite,  > 0 and vi ∈ C0(Ω) for every i ∈ F . Using now Theorem 2.3.5 we can easily prove the following compactness result: Theorem 3.3.3 (Compactness in BV2). Let Ω be an open bounded domain in Rd with Lipschitz boundary and let (un)n∈N be a sequence in BV2(Ω) such that sup n∈N ‖un‖BV2(Ω) <∞. Then there exists a subsequence (unk)k∈N that converges to some u, weakly ∗ in BV2(Ω). 61 The combined TV–TV2 approach for image reconstruction Proof. From the compact embedding of BV2(Ω) into W 1,1(Ω) and the fact that the sequence (∇un)n∈N is bounded in BV(Ω,Rd) we have that there exists a subsequence (unk)k∈N, a function u ∈ W 1,1(Ω) and a function v ∈ BV(Ω,Rd) such that (unk)k∈N con- verges to u strongly in W 1,1(Ω) and (∇unk)k∈N converges to v weakly∗ in BV(Ω,Rd). Then, it is obvious that ∇u = v a.e., u ∈ BV2(Ω) and (unk)k∈N converges to u weakly∗ in BV2(Ω). Analogously with BV(Ω) we define the strict convergence in BV2(Ω). Definition 3.3.4 (Strict convergence in BV2). Let u ∈ BV2(Ω), (un)n∈N ⊆ BV2(Ω). We say that the sequence (un)n∈N converges to u strictly in BV2(Ω) if (un)n∈N converges to u in L1(Ω) and the sequence (|D2un|(Ω))n∈N converges to |D2u|(Ω), i.e., lim n→∞ ‖un − u‖L1(Ω = 0 and limn→∞ |D 2un|(Ω) = |D2u|(Ω). It can be easily verified that the strict convergence in BV2(Ω) is induced by the fol- lowing metric: d(u, v) = ‖u− v‖L1(Ω) + ∣∣|D2u|(Ω)| − |D2v|(Ω)∣∣ , u, v ∈ BV2(Ω). The following lemma can be used to compare the weak∗ and the strict convergence in BV2(Ω). Lemma 3.3.5. Suppose that Ω ⊆ Rd is an open, bounded, connected set with Lipschitz boundary. Let u ∈ BV2(Ω), (un)n∈N ⊆ BV2(Ω) and suppose that (un)n∈N converges to u strictly in BV2(Ω). Then (un)n∈N converges to u strongly in W 1,1(Ω), i.e., lim n→∞ ‖un − u‖W 1,1(Ω) = 0. Proof. Since the sequence (un)n∈N is strictly convergent in BV2(Ω), we have that the se- quences (‖un‖L1(Ω))n∈N and (|D2un|(Ω))n∈N are bounded. Using the estimate (3.6) we deduce that the sequence (‖∇un‖L1(Ω,Rd))n∈N is bounded as well, which implies that the sequence (un)n∈N is bounded in BV2(Ω). From the compact embedding of BV2(Ω) into W 1,1(Ω), we get that there exists a subsequence (unk)k∈N and a function v ∈ W 1,1(Ω) such that (unk)k∈N converges to v strongly in W 1,1(Ω). In particular (unk)k∈N converges to v in L1(Ω), thus v = u a.e. and hence, (unk)k∈N converges to u strongly in W 1,1(Ω). However, since every subsequence (un)n∈N is bounded in BV2(Ω) we can repeat the same argument and deduce that for every subsequence (unk)k∈N there exists a further subse- quence (unkh )h∈N that converges to u strongly in W 1,1(Ω). This proves that the initial sequence (un)n∈N converges to u strongly in W 1,1(Ω). Corollary 3.3.6. Strict convergence implies weak∗ convergence in BV2(Ω). 62 3.4. Well-posedness of the model Proof. 
The proof is straightforward using Lemma (3.3.5) and the fact that strict conver- gence implies weak∗ convergence in BV(Ω,Rd). As in the BV case, functions in BV2(Ω) can also be approximated strictly by smooth functions. In fact a stronger result was proved in [DT84] were in addition to strict approx- imation by a sequence of smooth functions (un)n∈N, also convergence of convex functions of the second order total variations (ϕ(D2un))n∈N to ϕ(D2u) was shown. Theorem 3.3.7 (Demengel–Temam, 1984). Suppose that Ω ⊆ Rd is an open set with Lipschitz boundary and let ϕ : Rd×d → R be a convex function with at most linear growth at infinity. Then for every u ∈ BV2(Ω) there exists a sequence (un)n∈N ∈ C∞(Ω)∩W 2,1(Ω) such that (un)n∈N converges to u strictly in BV2(Ω) and lim n→∞ϕ(D 2un)(Ω) = ϕ(D 2u)(Ω). Note that for a function u ∈W 2,1(Ω) the term ϕ(D2u)(Ω) is equal to ´Ω ϕ(∇2u) dx. 3.4 Well-posedness of the model 3.4.1 Existence and uniqueness In this section we prove existence and uniqueness for the solutions of the minimisation problem (3.1). We suppose that Ω is an open, bounded, connected subset of R2 with Lipschitz boundary so that Theorems 2.3.5, 2.3.6, 3.3.3, 3.3.7 and Lemma 3.3.5 hold. We remark that the analysis is done in R2, i.e., d = 2, not only because we are planning to implement the model in two dimensional images but also because we are going to use the Poincare´ inequality ‖u− uΩ‖L2(Ω) ≤ C|Du|(Ω), ∀u ∈ BV(Ω), which is not true for d > 2. Also for convenience, we take s = 2 in (3.1) as the case s = 1 is treated similarly. We assume that T : L2(Ω)→ L2(Ω) is a bounded linear operator and f ∈ L2(Ω). We also assume that ϕ1 : R2 → R+ and ϕ2 : R4 → R+ are convex functions with at most linear growth at infinity and they both satisfy a coercivity condition, i.e., there exist constants K1, K2 > 0 such that K1|x| ≤ ϕ1(x) ≤ K2(1 + |x|), ∀x ∈ R2, (3.7) K1|x| ≤ ϕ2(x) ≤ K2(1 + |x|), ∀x ∈ R4. (3.8) 63 The combined TV–TV2 approach for image reconstruction Thus, we are interested in the following minimisation problem inf u∈BV2(Ω) H(u), (3.9) where H(u) := 1 2 ˆ Ω (Tu− f)2dx+ α ˆ Ω ϕ1(∇u) dx+ β ϕ2(D2u)(Ω). (3.10) Note that according to (3.5), H can also be written as H(u) = 1 2 ˆ Ω (Tu− f)2dx+ α ˆ Ω ϕ1(∇u) dx+ β ˆ Ω ϕ2(∇2u) (3.11) + β ˆ Ω (ϕ2)∞ ( Ds∇u |Ds∇u| ) d|Ds∇u|. Observe that BV2(Ω) is the right space in which one can try to prove existence of solutions for (3.9). Indeed, if one tries to pose the problem is a Sobolev space setting, he/she would have to minimise the following functional: F (u) = 1 2 ˆ Ω (Tu− f)2dx+ α ˆ Ω ϕ1(∇u) dx+ β ˆ Ω ϕ2(∇2u) dx, (3.12) where now ∇2u is a function in L1(Ω,R2). The natural space for the functional F to be defined in, is W 2,1(Ω). However, since this space is not reflexive we cannot prove existence of solutions for the minimisation of F using the direct method of calculus of variations. However, we will show that there is a connection between the F and H. We can naturally extend the functional F in BV2(Ω) (also denoted by F ) as follows: F (u) =  1 2 ´ Ω(Tu− f)2dx+ α ´ Ω ϕ1(∇u) dx +β ´ Ω ϕ2(∇2u) dx, if u ∈W 2,1(Ω) +∞, if u ∈ BV2(Ω) \W 2,1(Ω). (3.13) Even though F is now defined in BV2(Ω), which, as we have seen, has the useful weak∗ compactness property, one can still not prove existence of solutions of the minimisation of F using the direct method as the next proposition shows: Proposition 3.4.1. Consider the functional F defined in (3.13). 
Then F is not lower semicontinuous with respect to the strict topology in BV2(Ω) and hence it is neither with respect to the weak∗ topology in BV2(Ω). Proof. Note first that we can find a function u ∈ BV2(Ω) \ W 2,1(Ω), for instance, see [Dem85] for such an example. Hence, from the definition of F , we have F (u) = ∞. However, according to the Theorem 3.3.7 we can find a sequence (un)n∈N in W 2,1(Ω) that converges to u strictly in BV2(Ω). It follows that the sequences (‖un‖L1(Ω))n∈N, 64 3.4. Well-posedness of the model (‖∇2un‖L1(Ω,R4))n∈N, as well as (‖∇un‖L1(Ω,R2))n∈N (using estimate (3.6)) are bounded. Using the Poincare´ inequality we can easily check that (un)n∈N is also bounded in L2(Ω). From the fact that T is a bounded, linear operator and from conditions (3.7)–(3.8) on ϕ1 and ϕ2 we deduce that the sequence (F (un))n∈N is bounded as well. Hence, we get F (u) > lim inf n→∞ F (un), which proves that F is not lower semicontinuous with respect to the strict topology in BV2(Ω). We are going to prove that H is the lower semicontinuous envelope of F with respect to the weak∗ topology in BV2(Ω), i.e., H = sc(BV2(Ω),w∗)(F ). The proof is done in two steps. Firstly, we show that H is indeed lower semicontinuous with respect to the weak∗ topology. Theorem 3.4.2. The functional H is lower semicontinuous with respect to the weak∗ topology in BV2(Ω). Proof. Let u ∈ BV2(Ω), (un)n∈N ⊆ BV2(Ω) such that (un)n∈N converges to u weakly∗ in BV2(Ω). We have to show that H(u) ≤ lim inf n→∞ H(un). (3.14) From the weak∗ convergence we have that (un)n∈N converges to u in W 1,1(Ω). Using the standard Sobolev inequality, see for instance [Eva10], ‖v‖L2(Ω) ≤ C‖v‖W 1,1(Ω), ∀v ∈W 1,1(Ω), we deduce that (un)n∈N converges to u in L2(Ω). Since T : L2(Ω)→ L2(Ω) is continuous we get that the map u 7→ 12 ´ Ω(Tu− f)2dx is continuous and hence we have 1 2 ˆ Ω (Tun − f)2dx→ 1 2 ˆ Ω (Tu− f)2dx, as n→∞. (3.15) It is not hard to check that since ϕ1 is convex with at most linear growth at infinity, then is Lipschitz, say with constant L > 0. Using that fact we get∣∣∣∣ˆ Ω ϕ1(∇un) dx− ˆ Ω ϕ1(∇u) dx ∣∣∣∣ ≤ ˆ Ω |ϕ1(∇un)− ϕ1(∇u)| dx ≤ L ˆ Ω |∇un −∇u| dx. (3.16) 65 The combined TV–TV2 approach for image reconstruction From the estimate (3.16) and the fact that (∇un)n∈N converges to ∇u in L1(Ω,R2) we eventually get that ˆ Ω ϕ1(∇un) dx→ ˆ Ω ϕ1(∇u) dx, as n→∞. (3.17) Finally from the weak∗ convergence we have that (D2un)n∈N converges to D2u weakly∗ in M(Ω,R2×2). We can then apply Theorem 3.2.3 for µn = µ = L2, ν = D2u, νn = D2un and get ϕ2(D 2u)(Ω) ≤ lim inf n→∞ ϕ2(D 2un)(Ω). (3.18) Combining (3.15), (3.17) and (3.18) we derive (3.14). Theorem 3.4.3. The functional H is the lower semicontinuous envelope of F with respect to the weak∗ topology in BV2(Ω). Proof. We are going to use the characterisation of lower semicontinuous envelopes as that was given in Proposition 2.4.4. Condition (i) of Proposition 2.4.4 follows from the lower semicontinuity of H and the fact that H ≤ F . Thus it suffices to prove condition (ii), i.e., to prove that for every u ∈ BV2(Ω), there exists a sequence (un)n∈N ∈ BV2(Ω) converging to u weakly∗ and H(u) = lim n→∞F (un). (3.19) From Theorem 3.3.7 we get the existence of a sequence (un)n∈N ∈ C∞(Ω) ∩W 2,1(Ω) such that (un)n∈N converges strictly, and thus also weakly∗ in BV2(Ω). Moreover, from the same theorem we also get that lim n→∞ϕ2(D 2un)(Ω) = ϕ2(D 2u)(Ω). 
(3.20) We also have that (un)n∈N converges to u in W 1,1(Ω) which, according to the proof of Theorem 3.4.2 implies that 1 2 ˆ Ω (Tu− f)2dx+α ˆ Ω ϕ1(∇un) dx→ 1 2 ˆ Ω (Tu− f)2dx+α ˆ Ω ϕ1(∇un) dx, as n→∞. (3.21) Combining (3.20), (3.21) and that fact that H = F on C∞(Ω)∩W 2,1(Ω) we get (3.19). Observe that since strict convergence induces weak∗ convergence, we get that H is also the lower semicontinuous envelope of F with respect to the strict convergence in BV2(Ω), i.e., H = sc(BV2(Ω),w∗)(F ) = sc(BV2(Ω),strict)(F ). Let us note here that Theorem 3.4.3 can also be proved by incorporating a general relaxation result in [AC94], where the authors identify lower semicontinuous envelopes of functionals of the type ´ Ω ϕ(∇ku) dx in higher order BV spaces. 66 3.4. Well-posedness of the model We are now ready to prove existence of solutions for the minimisation problem (3.9), where we essentially use the lower semicontinuity of H, combined with the weak∗ compact- ness in BV2(Ω). The proof of the following theorem follows the proof of the corresponding theorem in [Ves01] which deals with the minimisation of the analogue first order functional (β = 0). Theorem 3.4.4. In addition to the already mentioned conditions on Ω, T , ϕ1 and ϕ2, assume also that α > 0, β > 0 and T (XΩ) 6= 0. Then the minimisation problem minu∈BV2(Ω)H(u), i.e., min u∈BV2(Ω) 1 2 ˆ Ω (Tu− f)2dx+ α ˆ Ω ϕ1(∇u) dx+ β ϕ2(D2u)(Ω), (3.22) has a solution u? ∈ BV2(Ω). Proof. Let (un)n∈N be a minimising sequence for (3.22), i.e., limn→∞H(un) = inf H and let M > 0 be an upper bound for (H(un))n∈N. We have that sup n∈N ϕ2(D 2un)(Ω) < M, sup n∈N ˆ Ω ϕ1(∇un) dx < M and sup n∈N 1 2 ˆ Ω (Tun − f)2dx < M. (3.23) The estimates (3.23) together with the coercivity assumptions (3.7)–(3.8) give that sup n∈N |Dun|(Ω) = sup n∈N ˆ Ω |∇un| dx < M K1 , (3.24) sup n∈N |D2un|(Ω) < M K1 . (3.25) We now show that the sequence (un)n∈N is bounded in L2(Ω), following essentially [Ves01]. By the Poincare´ inequality, there exists a constant C > 0 such that for every n ∈ N ‖un‖L2(Ω) = ∥∥∥∥un −XΩ 1L2(Ω) ˆ Ω un dx+ XΩ 1L2(Ω) ˆ Ω un dx ∥∥∥∥ L2(Ω) ≤ C|Dun|(Ω) + 1L2(Ω) ∣∣∣∣ˆ Ω un dx ∣∣∣∣ ≤ CM K1 + 1 L2(Ω) ∣∣∣∣ˆ Ω un dx ∣∣∣∣ . Thus in order to bound (un)n∈N in L2(Ω) it suffices to bound | ´ Ω un dx| uniformly in n. We have for every n ∈ N∥∥∥∥T (XΩ 1L2(Ω) ˆ Ω un dx )∥∥∥∥ L2(Ω) ≤ ∥∥∥∥T (XΩ 1L2(Ω) ˆ Ω un dx ) − Tun ∥∥∥∥ L2(Ω) + ‖Tun − f‖L2(Ω) + ‖f‖L2(Ω) 67 The combined TV–TV2 approach for image reconstruction ≤ ‖T‖ ∥∥∥∥un −XΩ 1L2(Ω) ˆ Ω un dx ∥∥∥∥ L2(Ω) + ‖Tun − f‖L2(Ω) + ‖f‖L2(Ω) ≤ C‖T‖|Dun|(Ω) + √ 2M + ‖f‖L2(Ω) ≤ CM K1 ‖T‖+ √ 2M + ‖f‖L2(Ω). (3.26) Thus, setting K = CMK1 ‖T‖ + √ 2M + ‖f‖L2(Ω) and using the fact that T (XΩ) 6= 0, it follows from (3.26) ∣∣∣∣ˆ Ω un dx ∣∣∣∣ ‖T (XΩ)‖L2(Ω) ≤ KL2(Ω), and since T (XΩ) 6= 0 we get ∣∣∣∣ˆ Ω un dx ∣∣∣∣ ≤ KL2(Ω)‖T (XΩ)‖L2(Ω) . Since the sequence (un)n∈N is bounded in L2(Ω) and Ω is a bounded domain, we have that it is also bounded in L1(Ω), and from estimates (3.24)–(3.25) we deduce that it is bounded in BV2(Ω) as well. From Theorem 3.3.3 we obtain the existence of a subsequence (uk)k∈N that converges weakly∗ to a function u? ∈ BV2(Ω). Since the functional H is weakly∗ lower semicontinuous we have that inf F ≤ H(u?) ≤ lim inf n→∞ H(un) = inf F, which implies that H(u?) = min u∈BV2(Ω) H(u). Let us note here that in the above proof we used the fact that α > 0 in order to obtain an L1(Ω) bound in the gradient of the minimising sequence (un)n∈N that was further used to bound (un)n∈N in L2(Ω) (for the case β = 0 see [Ves01]). 
However even in the case α = 0, we can also bound (un)n∈N in L2(Ω), and hence in BV2(Ω) using (3.6), if T satisfies a condition of the type M‖u‖L2(Ω) ≤ ‖Tu‖L2(Ω), ∀u ∈ L2(Ω), (3.27) for a constant M > 0. For example (3.27) is true if T has a bounded inverse by setting M = 1/‖T−1‖. If T does not satisfy a condition of the type (3.27) then it is not clear how to get existence. However, we are still able to prove existence when α = 0, for operators T that correspond to image inpainting. i.e., projections on subsets D of Ω. The proof is more involved than the one of Theorem 3.4.4 and makes use of theorems on BV traces. 68 3.4. Well-posedness of the model Ω \D D Figure 3.1: Inpainting domain D. Theorem 3.4.5. If β > 0, then the minimisation problem inf u∈BV2(Ω) ˆ Ω\D (u− f)2dx+ β ϕ2(D2u), (3.28) has a solution, where D b Ω is an open, connected set with Lipschitz boundary, see also Figure 3.1. Proof. Consider a minimising sequence (un)n∈N for (3.28). From the coercivity assump- tions (3.8) we get that sup n∈N |D2un|(Ω) <∞. (3.29) In order to prove the theorem it suffices to bound (un)n∈N and (∇un)n∈N in L1(Ω) and L1(Ω,R2) respectively. Then the proof can be finished using a straightforward application of the direct method as was done in the proof of Theorem 3.4.4. Since (un)n∈N is a minimising sequence for (3.28) and Ω is bounded we have that sup n∈N ‖un‖L1(Ω\D) <∞. (3.30) Estimates (3.29), (3.30) in combination with (3.6) give that sup n∈N |Dun|(Ω \D) <∞. (3.31) Denote with u (1) n and u (2) n the restrictions of un in Ω\D and D respectively. Then according to Theorems 2.3.7 and 2.3.9 we have for every n ∈ N |Dun|(Ω) = |Dun|(Ω \D) + |Dun|(D) + ‖(u(1)n )Ω\D − (u(2)n )D‖L1(∂D;H). (3.32) Since un ∈W 1,1(Ω) we have for every n ∈ N |Dun|(Ω) = ˆ Ω |∇u| dx = ˆ Ω\D |∇u| dx+ ˆ D |∇u| dx = |Dun|(Ω \D) + |Dun|(D). (3.33) From (3.32) and (3.33) we deduce that ‖(u(1)n )Ω\D‖L1(∂D;H) = ‖(u(2)n )D‖L1(∂D;H). (3.34) 69 The combined TV–TV2 approach for image reconstruction From the estimate (2.4) of Theorem 2.3.7 and from (3.30), (3.31) and (3.34) we get that sup n∈N ‖(u(2)n )D‖L1(∂D;H) <∞. (3.35) Suppose now that supn∈N ‖un‖L1(D) =∞. Then from the estimate (2.7) and (3.35), that would mean that sup n∈N |Dun|(D) =∞, (3.36) i.e, there exists an i0 ∈ {1, . . . , d} such that sup n∈N ∥∥∥∥ ∂un∂xi0 ∥∥∥∥ L1(D) =∞, (3.37) but this is a contradiction. In order to see that, observe first that again from Theorems 2.3.7 and 2.3.9 we get for every n ∈ N |D2un|(Ω) = |D2un|(Ω \D) + |D2un|(D) + ‖(∇u(1)n )Ω\D − (∇u(2)n )D‖L1(∂D,Rd;H). (3.38) From (3.29), (3.31) and the estimate (2.4) we get that sup n∈N ‖(∇u(1)n )Ω\D‖L1(∂D,Rd;H) <∞, (3.39) which in combination with (3.29) and (3.38) gives sup n∈N ‖(∇u(2)n )D‖L1(∂D,Rd;H) <∞⇒ sup n∈N ∥∥∥∥∥∥ ( ∂u (2) n ∂xi0 )D∥∥∥∥∥∥ L1(∂D;H) <∞. (3.40) But again estimate (2.7) gives ∥∥∥∥ ∂un∂xi0 ∥∥∥∥ L1(D) ≤ C ∣∣∣∣D( ∂un∂xi0 )∣∣∣∣ (D) + ∥∥∥∥∥∥ ( ∂u (2) n ∂xi0 )D∥∥∥∥∥∥ L1(∂D;H)  . (3.41) From (3.37), (3.40) and (3.41) we deduce sup n∈N ∣∣∣∣D( ∂un∂xi0 )∣∣∣∣ (D) =∞, which cannot happen due to (3.29). Observe that the above proof works also in the case where the inpainting domain consists of finitely many open connected sets (Di)i=1,...,N b Ω with Lipschitz boundaries such that dist(Di, Dj) > 0 for every i 6= j. 70 3.4. Well-posedness of the model We now prove uniqueness of solutions for (3.9). The following proof also follows the proof of the corresponding theorem for the first order analogue in [Ves01]. Theorem 3.4.6. 
If, in addition to T (XΩ), T is injective or if ϕ1 is strictly convex, then the solution of the minimisation problem (3.9) is unique. Proof. Suppose that u1 and u2 are two minimisers for (3.9). Using Proposition 3.2.1, it can be easily checked that H is convex. If Tu1 6= Tu2 then from the strict convexity of the fidelity term in H we have H ( 1 2 u1 + 1 2 u2 ) < 1 2 H(u1) + 1 2 H(u2) = inf H, which is a contradiction. Thus we must have Tu1 = Tu2. If T is injective, we have u1 = u2. If T is not injective but ϕ1 is strictly convex, we must have ∇u1 = ∇u2 otherwise we get the same contradiction as before. In that case, since Ω is connected, there exists a constant c such that u1 = u2 + cXΩ and since T (XΩ) 6= 0, we get c = 0. We note that in general we cannot obtain uniqueness results when the L1 norm is used in the fidelity term, due to the lack of strict convexity. 3.4.2 Stability In order to complete the well-posedness picture for the problem min u∈BV2(Ω) 1 2 ˆ Ω (Tu− f)2dx+ α ˆ Ω ϕ1(∇u) dx+ β ϕ2(D2u)(Ω), (3.42) it remains to analyse its stability. More precisely, we want to know which effect deviations in the data f have on a corresponding minimiser of (3.42). Ideally the deviation in the minimisers for different input data should be bounded by the deviation of the data. Let Ψ be the regularising functional in (3.42), i.e., Ψ(u) = α ˆ Ω ϕ1(∇u) dx+ β ϕ2(D2u)(Ω). It has been demonstrated by many authors [BO04, BRH07, Po¨s08, BB11] that Bregman distances related to the regularisation functional Ψ are natural error measures for vari- ational regularisation methods with Ψ convex. In particular Po¨schl [Po¨s08] has derived estimates for variational regularisation methods for power of metrics, which apply to the functional we consider here. However, for demonstration issues and in order to make the constants in the estimates more explicit we state and prove here the result for a special case of (3.42). For what we are going to do we assume that one of the regularisers is differentiable. Without use of generality, we assume that the convex function φ1 is differentiable. Analo- 71 The combined TV–TV2 approach for image reconstruction gous analysis can be done if φ2 is differentiable or even under weaker continuity solutions, see [BL11]. Let u˜ be the original image and f˜ the exact data (without noise), i.e., f˜ is a solution of T u˜ = f˜ . We assume that the noisy data deviate from the exact data by ‖f˜ − f‖L2(Ω) ≤ δ, for a small δ > 0. For the original image u˜ we assume that the following condition, called source condition, holds There exists a ξ ∈ ∂Ψ(u˜) such that ξ = T ∗q for a source element q ∈ L2(Ω), (SC) where T ∗ is the adjoint of T . As it was shown in [BO04] for the general setting, the elements that satisfy SC are exactly the minimisers of (3.42) for arbitrary f , α and β. Since φ1 is differentiable and both φ1 and φ2 are convex, the subdifferential of Ψ can be written as a sum of subdifferentials [ET76] ∂Ψ(u) = α∂ (ˆ Ω φ1(∇(·))dx ) (u) + β ∂ ( φ2(D 2(·))(Ω)) (u) = −α div(φ′1(∇u)) + β ∂ ( φ2(D 2(·))(Ω)) (u). We also define the symmetric Bregman distance for the regularising functional Ψ as DsymmΨ (u1, u2) := 〈p1 − p2, u1 − u2〉, p1 ∈ ∂Ψ(u1), p2 ∈ ∂Ψ(u2). Theorem 3.4.7. Let u˜ be the original image with source condition SC satisfying T u˜ = f˜ . Let f ∈ L2(Ω) be the noisy data with ‖f˜ − f‖L2(Ω) ≤ δ. Then a minimiser u of (3.42) satisfies αDsymm´ Ω φ1(∇(·)) (u, u˜) + βDsymm φ2(D2(·))(Ω)(u, u˜) + 1 2 ‖Tu− f˜‖L2(Ω) ≤ ‖q‖2L2(Ω) + δ2, (3.43) where q is the source element in SC. 
Moreover if u1 and u2 are minimisers for (3.42) with data f1 and f2 then we have the following estimate αDsymm´ Ω φ1(∇·) (u1, u2)+βD symm φ2(D2(·))(Ω)(u1, u2)+ 1 2 ‖Tu1−Tu2‖L2(Ω) ≤ 1 2 ‖f1−f2‖2L2(Ω). (3.44) Proof. The optimality condition for (3.42) reads αp1 + βp2 + T ∗(Tu− f) = 0, where p1 = −div(φ′1(∇u)), p2 ∈ ∂ ( φ2(D 2(·))(Ω)) (u) (3.45) Adding the element ξ = T ∗q from SC and T ∗f˜ to (3.45), we get αp1 + βp2 − ξ + T ∗(Tu− f˜) = T ∗((f − f˜)− q). (3.46) Note now that ξ = αξ1 + βξ2 for some ξ1 and ξ2 that belong to the subdifferential of´ Ω φ1(∇·) and φ2(D2(·))(Ω) respectively. Using that fact and taking the duality product 72 3.5. The numerical implementation of (3.46) with u− u˜ we get α〈p1 − x1, u− u˜〉+ β〈p2 − ξ2, u− u˜〉+ ‖Tu− f˜‖L2 = 〈(f − f˜)− q, Tu− f˜〉. (3.47) where we used T u˜ = f˜ . By Young’s inequality, we eventually get αDsymm´ Ω φ1(∇(·)) (u, u˜)+βDsymm φ2(D2(·))(Ω)(u, u˜)+ 1 2 ‖Tu−f˜‖L2(Ω) ≤ ‖q‖2L2(Ω)+‖f˜−f‖2L2(Ω), (3.48) and in view of ‖f˜ − f‖L2(Ω) ≤ δ we get (3.43). Similarly, if u1 and u2 are minimisers for (3.42) with data f1 and f2 we can derive estimate (3.44) by subtracting the optimality conditions for u1 and u2 and applying Young’s inequality once again. 3.5 The numerical implementation It is a common practice in mathematical imaging and inverse problems in general, to introduce a model in the continuous set up and after analysing it, to discretise functions and operators in order to implement it in a computer. Here we make no exceptions. In order to be completely rigorous, the link between the continuous and the discretised model should be provided. Such a link is normally given in terms of Γ-convergence [Bra02, DM93] of the discrete functional to the continuous one as the resolution of the image tends to infinity. However, as this is not our purpose here we leave this issue for future research. We use standard discretisations for the norms, differential operators and the operator T . The discretisation of the latter is the obvious one as we only consider T to be the identity function (denoising), a convolution operator (deblurring) or a projection operator (inpainting). However, let us just mention here that in some advanced medical imaging tasks e.g. MRI imaging, the way one discretises the problem is not trivial and it is crucial for the quality of the reconstruction results. In this section we work with the discretised version of the problem (3.1) and we discuss its numerical realisation by the so-called split Bregman technique [GO09]. We start by defining the discrete versions of L1 and L2 norms and we also introduce the appropriate discrete differential operators. We proceed with an introduction to the Bregman iteration which is used to solve constrained optimisation problems, an idea originated in [OBG+]. In [GO09] the Bregman iteration and an operator splitting technique (split Bregman) was used to solve the total variation minimisation problem after reformulating the problem as a constrained one. In the latter paper it was also also proved that the iterates of the Bregman iteration converge to the solution of the constrained problem assuming that the iterates satisfy the constraint in a finite number of iterations. Here, we give a more general convergence result where we do not use that assumption, see Theorem 3.5.1. We proceed with describing how our problem can be solved with the Bregman iteration, using the splitting procedure mentioned above. 
We finish by applying our method to image denoising 73 The combined TV–TV2 approach for image reconstruction and deblurring comparing it with TGV, TV2 and infimal convolution minimisation, with respect both to reconstruction quality and computational time. An extensive study of the application of our method to image inpainting is presented in Section 3.7. 3.5.1 Discretisation of the model In this section we study the discretisation of the problem (3.1). In our numerical examples we set ϕ1 = | · | and ϕ2 = | · |, the Euclidean norms in R2 and R4 respectively. That is to say we discretise the following problem min u∈BV2(Ω) 1 s ˆ Ω |Tu− f |sdx+ αTV(u) + β TV2(u), s = 1, 2. (3.49) In order to do so, we specify the corresponding discrete operators and norms that appear in the continuous functional. In the discrete setting u and f are elements of RN×M where N × M is the resolution of the image and T : RN×M → RN×M , is a linear operator. Here, we consider only greyscale images and we describe an extension to colour images for the application of (3.49) to image inpainting in Section 3.7.3. We define the finite dimensional L1 and L2 norms of a discrete vector field U : {1, . . . , N}×{1, . . . ,M} → RK with U = (U1, . . . , UK) as follows: ‖U‖2 =  N∑ i=1 M∑ j=1 K∑ k=1 Uk(i, j) 2 1/2 , ‖U‖1 = N∑ i=1 M∑ j=1 ( K∑ k=1 Uk(i, j) 2 )1/2 . We also define the discrete first and second order differential operators Dx, Dy, Dxx, Dyy and Dxy which are all operators from RN×M to RN×M . We note here that we assume periodic boundary conditions for u. By choosing periodic boundary conditions, the action of each of the discrete differential operators can be regarded as a circular convolution of u and allows the use of Fast Fourier Transform (FFT), see also [WT10]. Thus, even though the use of periodic boundary conditions is not necessary, it has been chosen in order to speed up the computational performance of our implementation, especially as far as deblurring and inpainting are concerned. In our discrete implementation, the coordinates x and y are oriented along columns and rows respectively. We define the discrete gradient ∇u = (Dxu,Dyu) as a forward finite difference operator Dxu(i, j) = u(i, j + 1)− u(i, j) if 1 ≤ i ≤ N, 1 ≤ j < M,u(i, 1)− u(i, j) if 1 ≤ i ≤ N, j = M, 74 3.5. The numerical implementation Dyu(i, j) = u(i+ 1, j)− u(i, j) if 1 ≤ i < N, 1 ≤ j ≤M,u(1, j)− u(i, j) if i = N, 1 ≤ j ≤M. We also need the discrete divergence operator div : ( RN×M )2 → RN×M that has the adjointness property −divp · u = p · ∇u, ∀u ∈ RN×M , p ∈ (RN×M)2 . This property is essentially the discrete analogue of integration by parts. For a p = (p1, p2) ∈ ( RN×M )2 we define (divp)(i, j) = ←− Dxp1(i, j) + ←− Dyp2(i, j), where ←− Dx and ←− Dy are the following backward finite difference operators: ←− Dxu(i, j) = u(i, j)− u(i,M) if 1 ≤ i ≤ N, j = 1,u(i, j)− u(i, j − 1) if 1 ≤ i ≤ N, 1 < j ≤M, ←− Dyu(i, j) = u(i, j)− u(N, j) if i = 1, 1 ≤ j ≤M,u(i, j)− u(i− 1, j) if 1 < i ≤ N, 1 ≤ j ≤M. Analogously we define the discrete Hessian ∇2u = (Dxxu,Dyyu,Dxyu,Dyxu). 
This consists of the corresponding compositions of the first order operators, Dxxu(i, j) =  u(i,m)− 2u(i, j) + u(i, j + 1) if 1 ≤ i ≤ N, j = 1, u(i, j − 1)− 2u(i, j) + u(i, j + 1) if 1 ≤ i ≤ N, 1 < j < M, u(i, j − 1)− 2u(i, j) + u(i, 1) if 1 ≤ i ≤ n, j = M, Dyyu(i, j) =  u(n, j)− 2u(i, j) + u(i+ 1, j) if i = 1, 1 ≤ j ≤M, u(i− 1, j)− 2u(i, j) + u(i+ 1, j) if 1 < i < N, 1 ≤ j ≤M, u(i− 1, j)− 2u(i, j) + u(1, i) if i = N, 1 ≤ j ≤M, Dxyu(i, j) =  u(i, j)− u(i+ 1, j)− u(i, j + 1) + u(i+ 1, j + 1) if 1 ≤ i < N, 1 ≤ j < M, u(i, j)− u(1, j)− u(i, j + 1) + u(1, j + 1) if i = N, 1 ≤ j < M, u(i, j)− u(i+ 1, j)− u(i, 1) + u(i+ 1, 1) if 1 ≤ i < N, j = M, u(i, j)− u(1, j)− u(i, 1) + u(1, 1) if i = N, j = M. 75 The combined TV–TV2 approach for image reconstruction One can easily check that Dxy = Dx(Dy) = Dy(Dx) = Dyx. As before, we also need the discrete second order divergence operator div2 : ( RN×M )4 → RN×M with the adjointness property div2q · u = q · ∇2u, ∀u ∈ Rn×m, q ∈ (RN×M)4 . For a q = (q11, q22, q12, q21) ∈ ( RN×M )4 we define (div2q)(i, j) = ←−− Dxxq11(i, j) + ←−− Dyyq22(i, j) + ←−− Dxyq12(i, j) + ←−− Dyxq21(i, j), where ←−− Dxx = Dxx, ←−− Dyy = Dyy, ←−− Dxy = ←−− Dyx with ←−− Dxyu(i, j) =  u(i, j)− u(i,M)− u(N, i) + u(N,M) if i = 1, j = 1, u(i, j)− u(i, j − 1)− u(N, j) + u(N, j − 1) if i = 1, 1 < j ≤M, u(i, j)− u(i− 1, j)− u(i,M) + u(i− 1,M) if 1 < i ≤M j = 1, u(i, j)− u(i, j − 1)− u(i− 1, j) + u(i− 1, j − 1) if 1 < i ≤ N, 1 < j ≤M. Figure 3.2 describes schematically the action of all the above discrete differential operators. According to all the definitions above the discretised version of problem (3.49) can be − + Dx i, j + 1i, j − + i, ji, j − 1 ←− Dx −2+ + i, ji, j − 1 i, j + 1 Dxx = ←−− Dxx + − + −i, j i+ 1, j i, j i− 1, j Dy ←− Dy Dyy = ←−− Dyy −2 + + i, j i− 1, j i+ 1, j + + − − + + − − i, j i, j + 1 i− 1, j − 1 i− 1, j i+ 1, j + 1i+ 1, j i, ji, j − 1 Dxy ←−− Dxy Figure 3.2: Illustration of the discrete derivative approximations. written as min u∈RN×M 1 s ‖Tu− f‖ss + α‖∇u‖1 + β‖∇2u‖1, s = 1, 2. (3.50) 76 3.5. The numerical implementation 3.5.2 Bregman iteration We now introduce the Bregman and split Bregman iterations as suitable numerical meth- ods for the solution of (3.50). We would like to recall some basic aspects of the general theory of Bregman iteration before finishing this discussion with a novel convergence result in Theorem 3.5.1. Suppose we want to solve the following constrained minimisation problem min u∈RL E(u), such that Au = b, (3.51) where E is a convex function and A : RL → RD a linear map. We can transform the con- strained minimisation problem (3.51) into an unconstrained one, introducing a parameter λ: sup λ min u∈RL E(u) + λ 2 ‖Au− b‖22, (3.52) where in order to satisfy the constraint Au = b, one has to let λ increase to infinity. Instead of doing that we perform the Bregman iteration as it was proposed in [OBG+] and [YOGD08]: Bregman Iteration un+1 = argmin u∈RL E(u) + λ 2 ‖Au− bn‖22, (3.53) bn+1 = bn + b−Aun+1. (3.54) In [OBG+], assuming that (3.53) has a unique solution, the authors derived among others, the following facts about the iterates un: (i) The constraint is satisfied at infinity: ‖Aun − b‖22 < C1 n− 1 , C1 > 0, ∀n ≥ 2. (3.55) (ii) Monotonic decrease of the residuals: ‖Aun+1 − b‖22 ≤ ‖Aun − b‖22, ∀n ∈ N. (3.56) (iii) Summability of the residuals: ∞∑ n=0 ‖Aun − b‖22 <∞, (3.57) (iv) Uniform bound on E(un): E(un) < C2, C2 > 0, ∀n ∈ N. 
(3.58) Using these results, we are able to prove convergence of the Bregman iterates to a solution of the original constrained problem (3.51). This result was proved in [GO09] in the case when the constraint is satisfied in a finite number of iterations, i.e., Aun0 = b for some iterate un0 . Here we do not use that assumption, something that makes our result a 77 The combined TV–TV2 approach for image reconstruction genuine contribution to the convergence theory of Bregman iteration. Theorem 3.5.1. Suppose that the constrained minimisation problem (3.51) has a unique solution u?. Moreover suppose that the convex function E is coercive and that (3.53) has a unique solution for every n ∈ N. Then the sequence of the iterates (un)n∈N of Bregman iteration converges to u?. Proof. Since (3.53) has a unique solution for every n ∈ N we have that the statements (3.55)–(3.58) hold. Moreover we have for every n ≥ 1 bn = b0 + n∑ `=1 (b−Au`)⇒ ‖bn‖2 ≤ ‖b0‖2 + n∑ `=1 ‖b−Au`‖2. (3.59) From (3.58) and the coercivity of E we have that the sequence (‖un‖2)n∈N is bounded, say by a constant C > 0. Thus, it suffices to show that every accumulation point of (un)n∈N is equal to u?. Using the fact that Au? = b, together with (3.55), we have for every increasing sequence of naturals (nk)k∈N, k > 2, λ 2 ‖Au? − bnk−1‖22 ≤ λ 2 (‖Au? −Aunk‖2 + ‖Aunk − bnk−1‖2)2 = λ 2 (‖Aunk − b‖2 + ‖Aunk − bnk−1‖2)2 ≤ λC1 2(nk − 1) + λ‖Aunk − b‖2‖Aunk − bnk−1‖2 + λ 2 ‖Aunk − bnk−1‖22. (3.60) Since unk is a solution to (3.53) we have E(unk) + λ 2 ‖Aunk − bnk−1‖22 ≤ E(u?) + λ 2 ‖Au? − bnk−1‖22, ∀k ≥ 1. (3.61) Combining (3.60) and (3.61) with (3.55) and (3.59) we get for all k ≥ 2, E(unk) ≤ E(u?) + λC1 2(nk − 1) + λ‖Aunk − b‖2‖Aunk − bnk−1‖2 ≤ E(u?) + λC1 2(nk − 1) + λ‖Aunk − b‖2‖Aunk‖2 + λ‖Aunk − b‖2‖bnk−1‖2 ≤ E(u?) + λC1 2(nk − 1) + λ √ C1‖A‖‖unk‖2√ nk − 1 + λ‖Aunk − b‖2‖bnk−1‖2 ≤ E(u?) + λC1 2(nk − 1) + λ √ C1‖A‖C√ nk − 1 + λ‖Aunk − b‖2 ( ‖b0‖2 + nk∑ `=1 ‖Au` − b‖2 ) ≤ E(u?) + λC1 2(nk − 1) + λ √ C1(‖A‖C + ‖b0‖2)√ nk − 1 + λ‖Aunk − b‖2 nk∑ `=1 ‖Au` − b‖2. (3.62) 78 3.5. The numerical implementation Suppose now that (unk)k∈N converges to some u˜ as ` goes to infinity. Taking (3.55) into account, we have that ‖Au˜− b‖2 = 0, i.e., u˜ satisfies the constraint Au˜ = b. On the other hand, taking limits in (3.62) and using Kronecker’s Lemma, see Lemma A.2.1 in Appendix A, we have that the limit in the right hand side of (3.62) is E(u?). Thus we have E(u˜) ≤ E(u?), which means that u˜ is a solution of the constrained problem (3.51) and since the solution is unique then u˜ = u?. Since every accumulation point of the bounded sequence (un)n∈N is equal to u?, we conclude that the whole sequence converges to u?. 3.5.3 Numerical solution via the split Bregman algorithm In this section, we explain how the Bregman iteration (3.53)–(3.54) together with a split- ting technique can be used to implement numerically the minimisation of the functional (3.50). The idea originates from [GO09], where such a procedure was applied to total variation minimisation and was given the name split Bregman algorithm. This iterative technique is equivalent to certain instances of combinations of the augmented Lagrangian method with classical operator splitting such as Douglas–Rachford, see [Set09]. Recall that we want to solve the following unconstrained minimisation problem: min u∈RN×M 1 2 ‖Tu− f‖22 + α‖∇u‖1 + β‖∇2u‖1, (3.63) where here for simplicity, we consider only the L2 fidelity case. 
The derivation of the split Bregman algorithm for solving (3.63) starts with the observation that the above minimisation problem is equivalent to the following constrained minimisation problem min u∈RN×M v∈(RN×M)2 w∈(RN×M)4 1 2 ‖Tu− f‖22 + α‖v‖1 + β‖w‖1, such that v = ∇u, w = ∇2u. (3.64) It is clear that since the discrete gradient and Hessian are linear operations, the minimi- sation problem (3.64) can be reformulated into the more general problem min z∈RL E(z), such that Az = b, where E : RL → R+ is convex, A is a L × L matrix and b is a vector of length L, where L = 7NM . It is also easy to see that the iterative scheme of the type (3.53)–(3.54) that corresponds to the constrained minimisation problem (3.64) is: (un+1, vn+1, wn+1) = argmin u,v,w 1 2 ‖Tu− f‖22 + α‖v‖1 + β‖w‖1 + λ 2 ‖bn1 +∇u− v‖22 (3.65) 79 The combined TV–TV2 approach for image reconstruction + λ 2 ‖bn2 +∇2u− w‖22, bn+11 = b n 1 +∇un+1 − vn+1, (3.66) bn+12 = b n 2 +∇2un+1 − wn+1, (3.67) where bn+11 = (b n+1 1,1 , b n+1 1,2 ) ∈ ( RN×M )2 and bn+12 = (b n+1 2,11 , b n+1 2,22 , b n+1 2,12 , b n+1 2,21 ) ∈ ( RN×M )4 . Notice that at least in the case where T is injective (e.g. denoising, deblurring), the minimisation problem (3.65) has a unique solution. Moreover, in that case, the functional E is coercive and the constrained minimisation problem (3.64) has a unique solution, thus, Theorem 3.5.1 holds. We also note that one can consider having two parameters λ1 and λ2 for the first and second order term in (3.65) respectively, as it is easily checked that this does not affect the convergence of the Bregman iteration. Our next concern is the efficient numerical solution of the minimisation problem (3.65). We follow [GO09] and minimise with respect to u, v and w alternatingly: Split Bregman for L2 −TV −TV2 denoising, deblurring Subproblem 1: un+1 = argmin u∈RN×M 1 2 ‖Tu− f‖22 + λ1 2 ‖bn1 +∇u− vn‖22 (3.68) + λ2 2 ‖bn2 +∇2u− wn‖22, Subproblem 2: vn+1 = argmin v∈(RN×M )2 α‖v‖1 + λ1 2 ‖bn1 +∇un+1 − v‖22, (3.69) Subproblem 3: wn+1 = argmin w∈(RN×M )4 β‖w‖1 + λ2 2 ‖bn2 +∇2un+1 − w‖22, (3.70) Update on b1: b n+1 1 = b n 1 +∇un+1 − vn+1, (3.71) Update on b2: b n+1 2 = b n 2 +∇2un+1 − wn+1. (3.72) We note that the above splitting strategy is tuned for image denoising and deblurring. As far as image inpainting is concerned an alternative splitting is required for efficient implementation. We discuss that in Section 3.7.2. The above alternating minimisation scheme, make up the split Bregman iteration that is proposed in [GO09] to solve the total variation minimisation problem as well as problems related to compress sensing. For convergence properties of the split Bregman iteration and also other splitting techniques we refer the reader to [CW06, EZC10, Set09]. In [WT10] and [YOGD08], it is noted that the Bregman iteration coincides with the augmented Lagrangian method. Minimising 80 3.5. The numerical implementation alternatingly with respect to the variables in the augmented Lagrangian method results to the alternating direction method of multipliers (ADMM), see [Gab83]. Thus, split Bregman is equivalent to ADMM. In [Eck89] and [Gab83] it is shown that ADMM is equivalent to the Douglas–Rachford splitting algorithm whose convergence is guaranteed. We refer the reader to [Set09] for an interesting study in this subject. We now discuss how we solve each of the minimisation problems (3.68)–(3.70) for the case of denoising and deblurring. Thus, we assume that T is either the identity or represents a circular convolution. 
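As a concrete illustration of this assumption (a minimal sketch with our own naming, using the kernel size and sigma of the deblurring experiments of Section 3.6 purely as an example), a circular convolution T and its adjoint can be applied through the FFT as follows:

% Applying a circular convolution T and its adjoint T* with the FFT, under the
% periodic boundary conditions used throughout. The kernel is zero-padded to the
% image size and centred at the (1,1) pixel, so that fft2 of it is the transfer
% function of T. Kernel size and sigma below follow the deblurring experiments.
N = 200; M = 300;                                    % example image size
[x, y] = meshgrid(-5:5, -5:5);                       % 11 x 11 grid
k = exp(-(x.^2 + y.^2) / (2*2^2));  k = k / sum(k(:));   % discrete Gaussian, sigma = 2
kpad = zeros(N, M);  kpad(1:11, 1:11) = k;
kpad = circshift(kpad, [-5 -5]);                     % move the kernel centre to pixel (1,1)
Khat = fft2(kpad);                                   % Fourier transfer function of T
T    = @(u) real(ifft2(Khat .* fft2(u)));            % u -> T u
Tadj = @(u) real(ifft2(conj(Khat) .* fft2(u)));      % u -> T* u

Since the Gaussian kernel is symmetric, conj(Khat) coincides with Khat up to rounding and T* = T, as also noted below.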
We denote by F the two dimensional discrete Fourier transform. Subproblem 1: We want to solve (3.68), i.e., un+1 = argmin u∈RN×M 1 2 ‖Tu− f‖22 + λ1 2 ‖bn1 +∇u− vn‖22 + λ2 2 ‖bn2 +∇2u− wn‖22. This is solved through its optimality condition. In the continuous setting that results in a fourth order linear PDE. In the discrete setting the optimality condition reads as follows: T ∗Tun+1 − λ1 (←− Dx(Dx(u n+1)) + ←− Dy(Dy(u n+1)) ) + λ2 (←−− Dxx(Dxx(u n+1)) + ←−− Dyy(Dyy(u n+1)) + 2 ←−− Dxy(Dxy(u n+1)) ) = f + λ1 (←− Dx(b n 1,1 − vn1 ) + ←− Dy(b n 1,2 − vn2 ) ) − λ2 (←−− Dxx(b n 2,11 − wn11) + ←−− Dyy(b n 2,22 − wn22) + 2 ←−− Dxy(b n 2,12 − wn12) ) , (3.73) where here T ∗ is the adjoint operator of T . Note that for the special cases we are consid- ering here we have in fact T = T ∗. Due to the periodic boundary conditions, the discrete differential operators act like circular convolution. The same holds for the operator T from our assumptions. Thus taking Fourier transforms in (3.73) gives FD · F(un+1) = F(Rn), where FD := F(T ∗) · F(T )− λ1 ( F(←−Dx(Dx)) + F(←−Dy(Dy))) ) + λ2 ( F(←−−Dxx(Dxx)) + F(←−−Dyy(Dyy)) + F(←−−Dxy(Dxy)) ) , and Rn is the righthand side of (3.73). Here “·” denotes pointwise multiplication of matrices. Thus we have un+1 = F−1 (F(Rn)/FD) , where “/” denotes pointwise division. Note that the term FD needs to be calculated only 81 The combined TV–TV2 approach for image reconstruction once at the beginning of the algorithm. Subproblem 2: Problem (3.68) has a closed form solution too. We have vn+1 = argmin v∈(RN×M )2 α‖v‖1 + λ1 2 ‖bn1 +∇un+1 − v‖22 = argmin v∈(RN×M )2 ∑ i,j α √ v1(i, j)2 + v2(i, j)2 + λ1 2 ( (bn1,1(i, j) +Dxu n+1(i, j)− v1(i, j))2 +(bn1,2(i, j) +Dyu n+1(i, j)− v2(i, j))2 )⇒ vn+1(i, j) = argmin (z1,z2)∈R2 ∑ i,j α √ z21 + z 2 2 + λ1 2 ( (bn1,1(i, j) +Dxu n+1(i, j)− z1)2 +(bn1,2(i, j) +Dyu n+1(i, j)− z2)2 ) . Setting sn(i, j) = (sn1 (i, j), s n 2 (i, j)) = ( bn1,1(i, j) +Dxu n+1(i, j), bn1,2(i, j) +Dyu n+1(i, j) ) ∈ R2 it is easy to check that vn+11 (i, j) = max ( |sn(i, j)| − α λ1 , 0 ) sn1 (i, j) |sk(i, j)| , vn+11 (i, j) = max ( |sn(i, j)| − α λ1 , 0 ) sn2 (i, j) |sk(i, j)| , with the convention that 0/0 = 0. The above operation on v is known as shrinkage. Subproblem 3: We have wn+1 = argmin w∈(RN×M )4 β‖w‖1 + λ2 2 ‖bn2 +∇2un+1 − w‖22. Working in the same way as in subproblem 2 and setting tn(i, j) = (tn1 (i, j), t n 2 (i, j), t n 3 (i, j), t n 3 (i, j)) = ( bn2,11(i, j) +Dxxu n+1(i, j), bn2,22(i, j) +Dyyu n+1(i, j), bn2,12(i, j) +D12u n+1(i, j), bn2,12(i, j) +Dxyu n+1(i, j) ) ∈ R4, we obtain wn+111 (i, j) = max ( |tn(i, j)| − β λ2 , 0 ) tn1 (i, j) |tn(i, j)| , wn+122 (i, j) = max ( |tn(i, j)| − β λ2 , 0 ) tn2 (i, j) |tn(i, j)| , wn+112 (i, j) = max ( |tn(i, j)| − β λ2 , 0 ) tn3 (i, j) |tn(i, j)| . 82 3.6. Applications to image denoising and deblurring We remark here that all three subproblems (3.68)–(3.70) are solved exactly. This is in contrast with several previous works on total variation minimisation using split Bregman [GO09, Pas12a, Pas12c, Pas12b] where the subproblem of the type (3.68) is solved ap- proximately with one iteration of Gauss–Seidel to the linear system that results from the optimality condition. There, the optimality condition leads to a second order linear PDE instead of fourth order. Even though, convergence is achieved using both methods, the FFT approach leads to a faster and a more robust convergence, see also the corresponding experimental discussion in Section 3.7.2. 
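To make the structure of one iteration concrete, the following is a minimal MATLAB sketch of (3.68)–(3.72) for the denoising case T = identity, combining the FFT solution of the first subproblem with the two shrinkage steps. All function and variable names are ours and the sketch assumes the periodic boundary conditions described above; it is an illustration, not the code used for the experiments that follow.

function u = tv_tv2_denoise_sb(f, alpha, beta, lambda1, lambda2, niter)
% Minimal split Bregman sketch for the discrete L2-TV-TV2 problem (3.63) with
% T = identity (denoising), following the subproblems (3.68)-(3.72).
% For deblurring with a circular convolution T one would replace the leading 1
% in FD by abs(Khat).^2 and f in the right hand side by T*f.
[N, M] = size(f);

% First and second order differences with periodic boundary conditions.
Dx   = @(u) circshift(u, [0 -1]) - u;          % forward difference along a row
Dy   = @(u) circshift(u, [-1 0]) - u;          % forward difference along a column
DxB  = @(u) u - circshift(u, [0 1]);           % backward difference (the operator <-Dx)
DyB  = @(u) u - circshift(u, [1 0]);
Dxx  = @(u) DxB(Dx(u));                        % centred second differences
Dyy  = @(u) DyB(Dy(u));
Dxy  = @(u) Dy(Dx(u));                         % mixed forward-forward difference
DxyB = @(u) DyB(DxB(u));                       % its adjoint (backward-backward)

% Fourier symbol of a circular (shift invariant) operator: apply it to an impulse.
e = zeros(N, M); e(1,1) = 1;
symb = @(op) real(fft2(op(e)));
FD = 1 - lambda1*(symb(@(z) DxB(Dx(z))) + symb(@(z) DyB(Dy(z)))) ...
       + lambda2*(symb(@(z) Dxx(Dxx(z))) + symb(@(z) Dyy(Dyy(z))) ...
                  + 2*symb(@(z) DxyB(Dxy(z))));   % symbol of the operator in (3.73)

u   = f;
v1  = zeros(N, M); v2  = v1;                   % v  ~ gradient variable
w11 = v1; w22 = v1; w12 = v1;                  % w  ~ Hessian variable (w12 = w21)
b11 = v1; b12 = v1;                            % Bregman variable b1
c11 = v1; c22 = v1; c12 = v1;                  % Bregman variable b2

for n = 1:niter
    % Subproblem 1, solved exactly with the FFT via the optimality condition (3.73).
    rhs = f + lambda1*(DxB(b11 - v1) + DyB(b12 - v2)) ...
            - lambda2*(Dxx(c11 - w11) + Dyy(c22 - w22) + 2*DxyB(c12 - w12));
    u = real(ifft2(fft2(rhs) ./ FD));

    % Subproblem 2: shrinkage of s = b1 + grad u.
    s1 = b11 + Dx(u);  s2 = b12 + Dy(u);
    mag = sqrt(s1.^2 + s2.^2);
    shr = max(mag - alpha/lambda1, 0) ./ max(mag, eps);   % convention 0/0 = 0
    v1 = shr .* s1;  v2 = shr .* s2;

    % Subproblem 3: shrinkage of t = b2 + Hessian u (mixed entry counted twice).
    t1 = c11 + Dxx(u);  t2 = c22 + Dyy(u);  t3 = c12 + Dxy(u);
    mag = sqrt(t1.^2 + t2.^2 + 2*t3.^2);
    shr = max(mag - beta/lambda2, 0) ./ max(mag, eps);
    w11 = shr .* t1;  w22 = shr .* t2;  w12 = shr .* t3;

    % Bregman updates (3.71)-(3.72).
    b11 = b11 + Dx(u)  - v1;   b12 = b12 + Dy(u)  - v2;
    c11 = c11 + Dxx(u) - w11;  c22 = c22 + Dyy(u) - w22;  c12 = c12 + Dxy(u) - w12;
end
end

A typical call for the synthetic denoising example below would be u = tv_tv2_denoise_sb(f, 0.06, 0.03, 1, 1, 300), using the empirical choice lambda1 = lambda2 = 1 discussed next.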
Note finally that different values of λ1 and λ2 can result in different convergence speeds. Even though it is not obvious a priori how to choose λ1 and λ2 in order to get optimal performance, this choice has to be made only once and a potential user does not have to worry about it. For the cases of denoising and deblurring we have found empirically that λ1 = λ2 = 1 give good results. For the case of inpainting, see the corresponding discussion in Section 3.7.4.

3.6 Applications to image denoising and deblurring

Image denoising

In this section we discuss the application of the TV–TV2 approach (3.50) to image denoising, i.e., when the operator T is the identity. We have performed experiments on images that have been corrupted by Gaussian noise, thus the L2 norm in the fidelity term is the most suitable. We compare our method with infimal convolution [CL97], also solved with a split Bregman scheme, and with total generalised variation [BKP10], solved with the primal-dual method of Chambolle and Pock [CP11] as described in [Bre14]. Among others, we present examples for α = 0, β > 0 and for α > 0, β = 0. Recall that for β = 0 our model corresponds to the classical ROF denoising model, while for α = 0 it corresponds to the pure TV2 restoration [BP10].

Our main assessment tool for the quality of the reconstructions is the structural similarity index SSIM [WBSS04, WB09]. The reason for this choice is that, in contrast to traditional quality measures like the peak signal-to-noise ratio PSNR, the SSIM index also assesses the conservation of the structural information of the reconstructed image. A justification for the choice of SSIM as a good fidelity measure instead of the traditional PSNR can be seen in Figure 3.3. Note that a perfect reconstruction has SSIM=1. Figures 3.3(c) and 3.3(d) show the denoising results of pure TV minimisation (β = 0). Figure 3.3(c) is the one with the highest SSIM value (0.6595) while the SSIM value of Figure 3.3(d) is significantly lower (0.4955). This assessment agrees with the human point of view since, even though this is subjective, one would consider Figure 3.3(c) to be the better reconstruction. On the other hand, Figure 3.3(c) has a slightly smaller PSNR value (14.02) than Figure 3.3(d) (14.63), which is the one with the highest PSNR value. Similar results are obtained for β > 0.

(a) Clean image. (b) Noisy image. (c) Best SSIM reconstruction. (d) Best PSNR reconstruction. Figure 3.3: Justification for the use of SSIM as an appropriate image quality assessment tool. The initial image (a) is contaminated with Gaussian noise of variance 0.5, shown in (b). We show the best SSIM-valued reconstruction (c) (SSIM=0.6595, PSNR=14.02) and the best PSNR-valued reconstruction (d) (SSIM=0.4955, PSNR=14.63) among images reconstructed with pure TV denoising (β = 0). The SSIM assessment agrees better with human perception.

For this section, we use a predefined number of iterations, 300 in most examples, as a stopping criterion for our algorithm. In fact, we observe that after 80–100 iterations the relative residual of successive iterates is of the order of 10−3 or lower (see also Table 3.1) and hence no noticeable change in the iterates is observed after that. Figure 3.4 depicts one denoising example, where the original image has been corrupted by Gaussian noise of variance 0.005. For better visualisation, we include the middle row slices of all reconstructions in Figure 3.5.
The highest SSIM value for TV denoising is achieved for α = 0.12 (SSIM=0.8979) while the highest one for TV–TV2 is achieved for α = 0.06, β = 0.03 (SSIM=0.9081). This is slightly better than infimal convolution (SSIM=0.9053). Note, however, that this optimal combination of α and β in terms of SSIM does not always correspond to the best visual result. In general, the latter corresponds to a slightly larger β than the one chosen by SSIM see Figure 3.4(h). Still, for proof of concept, we prefer to stick with an objective quality measure and SSIM, in our opinion, is the most reliable choice for that matter. In the image of Figure 3.4(h) the staircasing effect has almost disappeared and the image is still pleasant to the human eye despite being slightly more blurry. This slight blur, which is the price that the method pays for the removal of the staircasing effect, can be easily and efficiently removed in post processing using simple sharpening filters, e.g. in GIMP. We did that in Figure 3.4(i), also adjusting the constrast, achieving a very good result both visually and SSIM-wise (0.9463). The overall highest SSIM value is achieved by TGV (0.9249), producing a reconstruction of very good quality. However, TGV converges slowly to the true solution. In order to check that, we compute the ground truth solution (denoted by GT) for the parameters of the TGV problem that correspond to Figure 3.4(d), by taking a large amount of iterations (2000). We check the 84 3.6. Applications to image denoising and deblurring (a) Clean image, SSIM=1 (b) Noisy image, Gaussian noise, variance=0.005, SSIM=0.3261 (c) TV denoising, α=0.12, SSIM= 0.8979 (d) TGV denoising, SSIM=0.9249 (e) Infimal convolution denoising, SSIM=0.9053 (f) TV–TV2 denoising, α=0.06, β=0.03, SSIM=0.9081 (g) TV2 denoising, β=0.07, SSIM=0.8988 (h) TV–TV2 denoising, α=0.06, β=0.06, SSIM=0.8989 (i) TV–TV2 denoising, α=0.06, β=0.06, SSIM=0.9463, Post- processing: GIMP sharpening & contrast Figure 3.4: Denoising of a synthetic image that has been corrupted by Gaussian noise of variance 0.005. GPU time that is needed for the iterates to have a relative residual ‖un −GT‖2 ‖GT‖2 ≤ 10 −3, and we do the same for the TV–TV2 example of Figure 3.4(f). For TGV, it takes 1297 iterations (primal-dual method [CP11]) and 36.25 seconds while for TV–TV2 it takes 86 split Bregman iterations and 4.05 seconds, see Table 3.1. That makes our method more suitable for cases where fast but not necessarily optimal results are needed, e.g. video processing. In order to examine the quality of the reconstructions that are produced from each 85 The combined TV–TV2 approach for image reconstruction (a) Clean image, SSIM=1 (b) Noisy image, Gaussian noise, variance=0.005, SSIM=0.3261 (c) TV denoising, α=0.12, SSIM= 0.8979 (d) TGV denoising, SSIM=0.9249 (e) Infimal convolution denoising, SSIM=0.9053 (f) TV–TV2 denoising, α=0.06, β=0.03, SSIM=0.9081 (g) TV2 denoising, β=0.07, SSIM=0.8988 (h) TV–TV2 denoising, α=0.06, β=0.06, SSIM=0.8989 (i) TV–TV2 denoising, α=0.06, β=0.06, SSIM=0.9463, Post-processing: GIMP sharp- ening & contrast Figure 3.5: Corresponding middle row slices of images in Figure 3.4. method as the number of iteration increases, we have plotted the evolution of the SSIM values in Figure 3.6. In the horizontal axis, instead of the number of iterations, we put the absolute CPU time calculated by the product of the number of iterations times the CPU time per iteration as it is seen in Table 3.1. 
We observe that for TGV the SSIM value increases gradually with time, while for the methods solved with split Bregman the image quality peaks very quickly and then remains almost constant, except for TV, where the staircasing appears in the later iterations, leading eventually to a decrease of the SSIM value.

Figure 3.6: Evolution of the SSIM index with absolute CPU time (in seconds) for the examples of Figure 3.4. For TV denoising the SSIM value peaks after 0.17 seconds (0.9130) and then drops when the staircasing appears, see the corresponding comments in [GO09]. For TV–TV2 the peak appears after 1.08 seconds (0.9103) and remains essentially constant. The TGV iteration starts to outperform the other methods after 1.89 seconds. This shows the potential of split Bregman to produce visually satisfactory results before convergence has occurred, in contrast with the primal-dual method.

We also examine how the SSIM and PSNR values of the restored images behave as a function of the weighting parameters α and β. In Figure 3.7 we plot these values for α = 0, 0.02, 0.04, . . . , 0.3 and β = 0, 0.005, 0.01, . . . , 0.1. The plot suggests that both quality measures behave in a continuous way and have a global maximum. However, PSNR tends to rate higher those images that have been processed with a small value of β or even with β = 0, which is not the case for SSIM. An explanation for this is that higher values of β result in a further loss of contrast, see Figure 3.8, something that is penalised by the PSNR. Note, however, that the contrast can be recovered easily in a post-processing stage, while it is not an easy task to reduce the staircasing effect using conventional processing methods.

(a) SSIM values. (b) PSNR values. Figure 3.7: Plot of the SSIM and PSNR values of the restored image as functions of α and β, for the example of Figure 3.4. For display convenience all the values under 0.85 (SSIM) and 26 (PSNR) are coloured dark blue. The dotted cells correspond to the highest SSIM (0.9081) and PSNR (32.39) values, achieved for α = 0.06, β = 0.03 and α = 0.06, β = 0.005 respectively. Note that the first column in both plots corresponds to pure TV denoising (β = 0).

Figure 3.8: Middle row slices of reconstructed images (left) with α = 0.12, β = 0 (blue colour) and α = 0.12, β = 0.06 (red colour). Slices of the original image are plotted with black colour. A detail of the plot is provided in the right image. Even though the TV–TV2 method eliminates the staircasing effect, it also results in a further slight loss of contrast.

Finally, in Figure 3.9 we perform denoising of a natural image which has been corrupted by Gaussian noise of variance 0.005. The staircasing of TV denoising (SSIM=0.8168) is obvious in Figures 3.9(c) and (i).
The overall best performance of the TV–TV2 method (SSIM=0.8319) is achieved by choosing α = β = 0.017, Figure 3.9(d). However, one can get satisfactory results by choosing α = β = 0.023, eliminating further the stair- casing without blurring the image too much, compare for example the details in Figures 3.9(j) and 3.9(k). Image deblurring In our deblurring implementation T denotes a circular convolution with a discrete Gaussian kernel (σ = 2, size: 11 × 11 pixels). The blurred image is also corrupted by additive Gaussian noise of variance 10−4. Deblurring results are shown in Figure 3.10 and the corresponding middle row slices in Figure 3.11. As in the denoising case the introduction of the second order term with a small weight β decreases noticeably the staircasing effect, compare Figures 3.10(c) and 3.10(f). Moreover, we can achieve better visual results if we increase further the value of β without blurring the image significantly, Figure 3.10(g). Infimal convolution, does not give a satisfactory 88 3.6. Applications to image denoising and deblurring (a) Clean image, SSIM=1 (b) Noisy image, Gaussian noise, variance=0.005, SSIM=0.4436 (c) TV denoising, α=0.05, SSIM=0.8168 (d) TV–TV2 denoising, α=0.017, β=0.017, SSIM=0.8319 (e) TV–TV2 denoising, α=0.023, β=0.023, SSIM=0.8185 (f) TV2 denoising, β=0.05, SSIM=0.8171 (g) Clean image, SSIM=1 (h) Noisy image, Gaussian noise, variance=0.005, SSIM=0.4436 (i) TV denoising, α=0.05, SSIM=0.8168 (j) TV–TV2 denoising, α=0.017, β=0.017, SSIM=0.8319 (k) TV–TV2 denoising, α=0.023, β=0.023, SSIM=0.8185 (l) TV2 denoising, β=0.05, SSIM=0.8171 Figure 3.9: Denoising of a natural image that has been corrupted by Gaussian noise of variance 0.005. 89 The combined TV–TV2 approach for image reconstruction (a) Clean image , SSIM=1 (b) Blurred and noisy image, SSIM=0.8003 (c) TV deblurring, α=0.006, SSIM=0.9680 (d) TGV deblurring, SSIM=0.9806 (e) Infimal convolution deblurring, SSIM=0.9466 (f) TV–TV2 deblurring, α=0.004, β=0.0001, SSIM=0.9739 (g) TV–TV2 deblurring, α=0.004, β=0.0002, SSIM=0.9710 (h) TV2 deblurring, β=0.0012, SSIM=0.9199 Figure 3.10: Deblurring of a blurred (Gaussian kernel of variance σ = 2) and noisy (addi- tive Gaussian noise, variance 10−4) synthetic image. 90 3.6. Applications to image denoising and deblurring (a) Clean image , SSIM=1 (b) Blurred and noisy image, SSIM=0.8003 (c) TV deblurring, α=0.006, SSIM=0.9680 (d) TGV deblurring, SSIM=0.9806 (e) Infimal convolution deblurring, SSIM=0.9466 (f) TV–TV2 deblurring, α=0.004, β=0.0001, SSIM=0.9739 (g) TV–TV2 deblurring, α=0.004, β=0.0002, SSIM=0.9710 (h) TV2 deblurring, β=0.0012, SSIM=0.9199 Figure 3.11: Corresponding middle row slices of images in Figure 3.10. 91 The combined TV–TV2 approach for image reconstruction (a) Clean image, SSIM=1 (b) Blurred and noisy image, SSIM=0.7149 (c) TV deblurring, α=0.0007, SSIM=0.8293 (d) TV–TV2 deblurring, α=0.0005, β=0.0001, SSIM=0.8361 (e) TV–TV2 deblurring, α=0.0005, β=0.0003, SSIM=0.8307 (f) TV–TV2 deblurring, α=0.0005, β=0.0003, SSIM=0.8330, Post-processing: GIMP sharpening (g) Clean image, SSIM=1 (h) Blurred and noisy im- age, SSIM=0.7149 (i) TV deblurring, α=0.0007, SSIM=0.8293 (j) TV–TV2 deblurring, α=0.0005, β=0.0001, SSIM=0.8361 (k) TV–TV2 deblurring, α=0.0005, β=0.0003, SSIM=0.8307 (l) TV–TV2 deblurring, α=0.0005, β=0.0003, SSIM=0.8330, Post-processing: GIMP sharpening Figure 3.12: Deblurring of a blurred (Gaussian kernel of variance σ = 2) and noisy (addi- tive Gaussian noise, variance 10−4) natural image. 92 3.7. 
Applications to image inpainting and online demo in IPOL result here, Figure 3.10(e). TGV gives again the best qualitative result, Figure 3.10(d), but the computation takes about 10 minutes. Even though the time comparison is not completely fair here (the implementation described in [Bre14] does not use FFT) it takes a few thousands of iterations for TGV to deblur the image satisfactorily, in comparison with a few hundreds for the TV–TV2 method. In Figure 3.12 we show the performance of TV–TV2 method for deblurring a natural image. The best result for TV–TV2 (SSIM=0.8361) is achieved with α = 0.0005 and β = 0.0001, Figure 3.12(d). As in the case of denoising, one can increase the value of β slightly, reducing further the staircasing effect, Figures 3.12(e) and 3.12(k). The additional blur which is a result of the larger β can be controlled using a sharpening filter, Figures 3.12(f) and 3.12(l). Discussion In the case of denoising and deblurring we compared our method with TGV, which is con- sidered a state of the art method in the area of higher order image reconstruction methods in the variational context. Indeed, in both image reconstruction tasks, TGV gives better qualitative results, in terms of the SSIM index. However, the computational time that was needed to obtain the TGV result solved with the primal-dual method is significantly more than the one needed to compute the TV–TV2 method using split Bregman, see Table 3.1. We also show that with simple and fast post-processing techniques we can obtain results comparable with TGV. For these reasons, we think that the TV–TV2 approach is partic- ularly interesting for applications in which the speed of computation matters. Regarding the comparison with infimal convolution, our method is slightly faster and results in better reconstructions in deblurring while in denoising the results are comparable. 3.7 Applications to image inpainting and online demo in IPOL 3.7.1 Motivation and IPOL’s philosophy In this section we present examples of the application of our TV–TV2 approach to image inpainting. Recall that in image inpainting, the goal is to reconstruct an image inside a missing part D ⊆ Ω, the inpainting domain, using the information from the intact part. The corresponding minimisation problem in the discrete setting reads as follows min u∈RN×M 1 2 ‖XΩ\D(u− f)‖22 + α‖∇u‖1 + β‖∇2u‖1. (3.74) We note that in the inpainting task one wants to keep both values of α and β small, such that more weight is put on the fidelity term so it remains close to zero and hence 93 The combined TV–TV2 approach for image reconstruction Denoising (B&W) – Image size: 200×300 No of iterations for GT No of iterations for ‖uk −GT‖2/‖GT‖2 ≤ 10−3 CPU time (secs) time per iteration (secs) TV 2000 136 2.86 0.0210 TV2 2000 107 3.62 0.0338 TV–TV2 2000 86 4.05 0.0471 Inf.-Conv. 2000 58 3.33 0.0574 TGV 2000 1297 36.25 0.0279 Deblurring (B&W) – Image size: 200×300 No of iterations for GT No of iterations for ‖uk −GT‖2/‖GT‖2 ≤ 10−2 CPU time (secs) time per iteration (secs) TV 1000 478 10.72 0.0257 TV2 1000 108 3.64 0.0337 TV–TV2 1000 517 25.47 0.0493 Inf.-Conv. 1000 108 7.47 0.0692 TGV CPU time more than 10 minutes 1.22 Table 3.1: Computational times for the examples of Figures 3.4 and 3.10. We computed a ground true solution (GT) for every method by taking a large number of iterations and record the number of iterations and CPU time it takes for the relative residual between the iterates and the ground true solution to fall below a certain threshold. 
The TGV examples were computed using σ = τ = 0.25 in the primal-dual method described in [Bre14]. The implementation was done using MATLAB in a Macbook 10.7.3, 2.4 GHz Intel Core 2 Duo and 2 GB of memory. it essentially holds u = f in Ω \ D. As we are going to see later on, connectivity along large gaps in the inpainting domain is essentially achieved only with pure TV2 inpainting (α = 0). However, we keep on using this TV–TV2 combination in order to keep the method flexible, regarded as a superset of pure TV and pure TV2 inpainting. In this way, we can study the effect that each term has on the inpainting image. As we have already mentioned, our motivation here is that the kind of regularisation given in (3.74) has the ability to connect large gaps in the inpainting domain with the price of some blur, see Figure 3.13. There, the task is to inpaint a large gap in a black stripe. (a) Inpainting domain: grey area (b) Harmonic inpaint- ing (c) TV inpainting (d) TV2 inpainting Figure 3.13: Illustration of the connectivity property of TV2 inpainting. Harmonic inpainting, i.e., regularising with ‖∇u‖22, achieves no connectivity, producing a 94 3.7. Applications to image inpainting and online demo in IPOL rather smooth result, Figure 3.13(b). Pure TV inpainting is not able to make a connection at all, Figure 3.13(c) but pure TV2 inpainting is able to connect the large gap with the price of some blur. As we will see in Section 3.7.5, this connectivity depends also on the size and geometry of the inpainting domain. We will next present an algorithm in order to solve (3.74). We also provide a source code for that algorithm written in C and an online demonstration in Image Processing Online (IPOL) accessible on the online version of [PSS13], http://dx.doi.org/10.5201/ ipol.2013.40. IPOL is a peer reviewed online research journal specialising on image processing algorithms. Its philosophy is that each article consists of a standard text describing an image processing algorithm and also its source code, along with an online demonstration in a provided platform. Everyone has access to the code and the online demonstration and can test by themselves what each method does (open access software). In Figure 3.14 we provide two screenshots from the IPOL website that show the structure (a) Setting D and the parameters α, β. (b) Showing the inpainting result. Figure 3.14: IPOL’s online demonstration platform. of the online demonstration. The user can choose the parameters α and β as well as the inpainting domain D. There is also an opportunity for the users to upload their own image. Every test of the method is stored in the “Archive” section of the platform. Each algorithm should be able to compute the result in at most 30 seconds for images of resolution up to 500× 500 pixels. Thus, we need a fast and efficient algorithm in order to solve (3.74) which we present in the next section. 3.7.2 Split Bregman for TV–TV2 inpainting The algorithm for TV–TV2 inpainting is also based on a splitting technique but it is slightly different than the one used for denoising and deblurring. As in the latter cases, we first transform the unconstrained minimisation problem (3.74) into a constrained one, formulate the corresponding Bregman iteration algorithm and then we minimise alternatingly with 95 The combined TV–TV2 approach for image reconstruction respect to all the different variables. 
Note that (3.74) is equivalent to min u∈RN×M , u˜∈RN×M v∈(RN×M )2, w∈(RN×M )4 ‖XΩ\D(u−f)‖22+α‖v‖1+β‖w‖1, such that u = u˜, v = ∇u˜, w = ∇2u˜. (3.75) The introduction of the auxiliary variable u˜ is crucial since it allows the use of fast Fourier transform, leading to a fast implementation see Remark in page 97. The Bregman iteration that corresponds to (3.75) is (un+1, u˜n+1, vn+1, wn+1) = argmin u,u˜,v,w ‖XΩ\D(u− f)‖22 + α‖v‖1 + β‖w‖1 + λ0 2 ‖bn0 + u˜− u‖22 + λ1 2 ‖bn1 +∇u˜− v‖22 + λ2 2 ‖bn2 +∇2u˜− w‖22, (3.76) bn+10 = b n 0 + u˜ n+1 − un+1, bn+11 = b n 1 +∇u˜n+1 − vn+1, bn+12 = b n 2 +∇2u˜n+1 − wn+1. Solving (3.76) approximately by minimising alternatingly with respect to u, u˜, v and w, leads to the split Bregman algorithm for TV–TV2 inpainting. We also refer the reader to the works by Getreuer in IPOL [Pas12a, Pas12b, Pas12c] which use the split Bregman method for TV denoising, deblurring and inpainting respectively. In our case the split Bregman formulation reads as follows: Split Bregman for L2 −TV −TV2 inpainting – Greyscale images Subproblem 1: un+1 = argmin u∈RN×M 1 2 ‖XΩ\D(u− f)‖22 + λ0 2 ‖bn0 + u˜n − u‖22 (3.77) Subproblem 2: u˜n+1 = argmin u˜∈RN×M λ0 2 ‖bn0 + u˜− un+1‖22 + λ1 2 ‖bn1 +∇u˜− vn‖22 (3.78) + λ2 2 ‖bn2 +∇2u˜− wn‖22, Subproblem 3: vn+1 = argmin v∈(RN×M )2 α‖v‖1 + λ1 2 ‖bn1 +∇u˜n+1 − v‖22, (3.79) Subproblem 4: wn+1 = argmin w∈(RN×M )4 β‖w‖1 + λ2 2 ‖bn2 +∇2u˜n+1 − w‖22, (3.80) Update on b0: b n+1 0 = b n 0 + u˜ n+1 − un+1, (3.81) Update on b1: b n+1 1 = b n 1 +∇u˜n+1 − vn+1, (3.82) 96 3.7. Applications to image inpainting and online demo in IPOL Update on b2: b n+1 2 = b n 2 +∇2u˜n+1 − wn+1. (3.83) Note that problem (3.78) is solved through its optimality condition, using FFT, exactly like the corresponding problem (3.68). Similarly, problems (3.79) and (3.80) are solved via shrinkage, like problems (3.69) and (3.70). Lastly, it is easy to show that problem (3.77) has a closed form solution, so it can be solved very fast as well. Indeed, we have, un+1 = argmin u∈RN×M 1 2 ‖XΩ\D(u− f)‖22 + λ0 2 ‖bn0 + u˜n − u‖22 = argmin u∈RN×M ∑ i,j XΩ\D(u(i, j)− f(i, j))2 + λ0 2 (bn0 (i, j) + u˜ n(i, j)− u(i, j))2 ⇒ un+1(i, j) = argmin z∈R XΩ\D(z − f(i, j))2 + λ0 2 (bn0 (i, j) + u˜ n(i, j)− z)2 . It is trivial to calculate the solution of the last minimisation problem: un+1(i, j) = ( 2XΩ\D(i, j) λ0 + 2 ) f(i, j)+ ( λ0 + 2(1−XΩ\D(i, j)) λ0 + 2 ) (bn0 (i, j)+ u˜ n(i, j)). (3.84) Remark: One can notice here the utility of the auxiliary variable u˜. Without this variable, the minimisation problem (3.77) would be absent and the problem (3.78) would have the form un+1 = argmin u∈RN×M ‖XΩ\D(u− f)‖22 + λ1 2 ‖bn1 +∇u− vn‖22 + λ2 2 ‖bn2 +∇2u− wn‖22. (3.85) However, we cannot take advantage of the fast Fourier transform in order to solve the optimality condition of (3.85) since in general F(XΩ\D) 6= XΩ\DF(u). An alternative approach would be to solve (3.85) approximately, using one or more iterations of Gauss– Seidel to the linear system that corresponds to its optimality condition as it is done in [GO09, Pas12a, Pas12c, Pas12b]. That would reduce the number of subproblems to be solved but it would have the disadvantage that one of them is not solved exactly. However, we prefer to introduce this auxiliary variable u˜ increasing the number of subproblems to four but being able to solve them exactly. In order to justify that, see Figure 3.15 for a quantitative comparison of the two approaches. 
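Before turning to that comparison, and purely as an illustration (with our own naming; this is not the IPOL C code), the pointwise update (3.84) can be written in vectorised form as follows, where chi is the characteristic function of Ω \ D stored as an image of ones and zeros:

% Closed-form solution (3.84) of the data subproblem, computed pointwise.
% chi is 1 on the intact region Omega\D and 0 on the inpainting domain D, so on D
% the update reduces to u = b0 + utilde and the data term has no influence there.
update_u = @(f, utilde, b0, chi, lambda0) ...
    (2*chi / (lambda0 + 2)) .* f ...
  + ((lambda0 + 2*(1 - chi)) / (lambda0 + 2)) .* (b0 + utilde);

The remaining subproblems (3.78)–(3.80) are handled exactly as in the denoising case, with the FFT and shrinkage respectively. Returning now to the comparison of the two splitting strategies in Figure 3.15: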
There, we perform TV–TV2 inpainting on the image of Figure 3.15(a) using both algorithms i.e., (3.68)–(3.72) using one iteration of Gauss-Seidel for (3.68) and (3.77)–(3.83) using FFT in (3.77). For each approach, we plot the relative error ‖un −GT‖2 ‖GT‖2 , where GT denotes a ground solution which is taken to be an iterate un for a large enough 97 The combined TV–TV2 approach for image reconstruction (a) Original image. (b) Inpainting domain (c) Inpainted image with α = β = 0.001. 0 200 400 600 800 1000 1200 1400 1600 1800 2000 10−10 10−8 10−6 10−4 10−2 100 No of iterations R e la t iv e e r r o r b e tw e e n it e r a t e s -s o lu t io n FFT GS, λ 1 = λ 2 = 0.01 GS, λ 1 = λ 2 = 0.001 GS, λ 1 = λ 2 = 0.005 GS, λ 1 = λ 2 = 0.05 (d) Plot of the relative error ‖uk −GT‖2/‖GT‖2 versus the number of iterations. Here sol denotes a ground solution which is taken to be the solution un for a large enough n (3000). Figure 3.15: Comparison of the efficiency of the use of fast Fourier transform for the solution of (3.78) over one iteration of Gauss–Seidel for the solution of (3.85). n (3000). For the FFT approach, the choice of λ’s is done according to the rule (3.105) presented in Section 3.7.4, while for the Gauss–Seidel the selection of λ’s is shown in Figure 3.15(d). As Figure 3.15(d) suggests (and also observed empirically) solving the linear subproblem with FFT leads to a faster convergence. Using more than one iterations of Gauss-Seidel does not improve the situation as it increases further the computational time, see also the corresponding discussion in [GO09]. 3.7.3 The colour image case So far we have only considered greyscale images, i.e., u ∈ Rn×m. However it is easy to derive a version of the algorithm (3.77)–(3.83) for colour images, that is to say when u ∈ (R3)N×M , where for simplicity we only consider RGB images. Starting from the continuous setting, if u = (u1, u2, u3) = (u, u, u) ∈ BV(Ω,R3), then recall that the total variation of u is defined as TV(u) = sup { 3∑ a=1 ˆ Ω uadivva dx : v ∈ C1c (Ω,R3×2), ‖v‖∞ ≤ 1 } . (3.86) 98 3.7. Applications to image inpainting and online demo in IPOL As in the scalar case, if u ∈W 1,1(Ω,R3), then TV(u) = ˆ Ω |∇u| dx. (3.87) Analogously we define TV2(u). In the discrete setting the first and second order total variation have the form TV(u) = ‖(∇u,∇u,∇u)‖1, (3.88) TV2(u) = ‖(∇2u,∇2u,∇2u)‖1. (3.89) The minimisation problem corresponding to (3.74) now reads min u∈(R3)N×M ‖XΩ\D(u− f)‖22 + α‖∇u‖1 + β‖∇2u‖1. (3.90) It is not hard to check that the only difference to the minimisation algorithm of the greyscale case lies in the subproblems (3.79) and (3.80). This is the only point where the colours are “mixed”. Subproblems (3.77), (3.78) and the updates (3.81)–(3.83) are done separately for each channel. This favours parallelisation of the algorithm. Thus the split Bregman algorithm for colour image inpainting reads as follows: Split Bregman for L2 −TV −TV2 inpainting – colour images Subproblem 1: Solve (3.77) separately for un+1, un+1, un+1, (3.91) Subproblem 2: Solve (3.77) separately for u˜n+1, u˜n+1, u˜n+1, (3.92) Subproblem 3: vn+1 = argmin v∈((RN×M )2)3 α‖v‖1 + λ1 2 ‖bn1 +∇u˜n+1 − v‖22, (3.93) Subproblem 4: wn+1 = argmin w∈((RN×M )4)3 β‖w‖1 + λ2 2 ‖bn2 +∇2u˜n+1 −w‖22, (3.94) Update on b0: Do update (3.81) separately for b n+1 0 , b n+1 0 , b n+1 0 , (3.95) Update on b1: Do update (3.82) separately for b n+1 1 , b n+1 1 , b n+1 1 , (3.96) Update on b2: Do update (3.83) separately for b n+1 2 , b n+1 2 , b n+1 2 . 
(3.97) 99 The combined TV–TV2 approach for image reconstruction As far as the solution of (3.93) is concerned, similarly as before we set sn(i, j) = (sn1 (i, j), s n 2 (i, j), s n 1 (i, j), s n 2 (i, j), s n 1 (i, j), s n 2 (i, j)) = ( bn1,1(i, j) +D1u˜ n+1(i, j), bn1,2(i, j) +D2u˜ n+1(i, j), bn1,1(i, j) +D1u˜ n+1(i, j), bn1,2(i, j) +D2u˜ k+1(i, j), bn1,1(i, j) +D1u˜ n+1(i, j), bn1,2(i, j) +D2u˜ n+1(i, j) ) , and we can easily compute vn+1x (i, j) = max ( |sn(i, j)| − α λ1 , 0 ) sn1 (i, j) |sn(i, j)| , (3.98) vn+1y (i, j) = max ( |sn(i, j)| − α λ1 , 0 ) sn2 (i, j) |sn(i, j)| , (3.99) vn+1x (i, j) = max ( |sn(i, j)| − α λ1 , 0 ) sn1 (i, j) |sn(i, j)| , (3.100) vn+1y (i, j) = max ( |sn(i, j)| − α λ1 , 0 ) sn2 (i, j) |sn(i, j)| , (3.101) vn+1x (i, j) = max ( |sn(i, j)| − α λ1 , 0 ) sn1 (i, j) |sn(i, j)| , (3.102) vn+1y (i, j) = max ( |sn(i, j)| − α λ1 , 0 ) sn2 (i, j) |sn(i, j)| . (3.103) Analogously, we compute the solution of (3.94). We finally note that an analogous gen- eralisation to the colour case can be done for the denoising and deblurring algorithm (3.68)–(3.72). 3.7.4 Stopping criteria and the selection of λ’s Establishing stopping criteria for our algorithm is necessary for the online implementation. After extensive experimentation, we have found empirically that a stopping criterion de- pending on the relative residue of the iterates un is an appropriate one for the termination of the algorithm. Thus, the iterations are terminated when max { |uk − uk−1| |uk−1| , |uk − uk−1| |uk−1| , |uk − uk−1| |uk−1| } ≤ 8 · 10−5. (3.104) The values of the parameters λ0, λ1 and λ2 have to be chosen carefully. This selection is done automatically in the online demo. All the combinations eventually lead to con- vergence but the number of iterations that are needed for convergence differ dramatically for different combinations. Note that in the case of denoising and deblurring, only two parameters have to be specified, λ1 and λ2 instead of three here. This makes the choice a more difficult task as they have to be chosen such that they balance each other in an 100 3.7. Applications to image inpainting and online demo in IPOL optimal way within the problems (3.77)–(3.80). Empirically we have found that λ1 and λ2 have to be one or two orders of magnitude larger than α and β respectively. The value of λ0 should not be much larger than α and β as that leads to a slow convergence. Here we choose the following values of these parameters, a combination that has shown to ensure fast convergence: λ1 = 10α, λ2 = 10β, λ0 = max{α, β}. (3.105) Table 3.2 and Figure 3.16 justify this empirical choice. In Figure 3.16, we perform pure TV2 inpainting to the image of Figure 3.15 for different choices of λ0 and λ2 and fixed β = 0.001. In each case we plot the relative error between the iterates and the solution. The solution is taken to be a large iterate (n = 4000) that is obtained with the choice (3.105). What we generally observe can be seen in Table 3.2. The choice (3.105) leads to a fast convergence with small oscillatory behaviour in the iterates, see black line in Figure 3.16. Choice of λ’s Convergence to solution Qualitative behaviour λ0 = β, λ2 = 10β Fast Small oscillations λ0 = β, λ2  10β Fairly fast Smooth transition λ0 = β, λ2  10β Very slow Wild oscillations λ0  β, λ2 = 10β Very slow Smooth transition λ0  β, λ2 = 10β Slow Smooth transition Table 3.2: Different behaviours of the iterates for different choices of the λ parameters. 
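For completeness, here is a minimal sketch of the stopping test (3.104) as a small helper function; the per-channel Frobenius norm is our choice, since the text leaves the norm in (3.104) implicit, and the λ's themselves are set once before iterating via rule (3.105).

function done = tvtv2_converged(u, uprev, tol)
% Stopping test in the spirit of (3.104): the largest relative change of the
% iterates over the colour channels (a single channel for greyscale images).
% The per-channel Frobenius norm is our choice; (3.104) leaves the norm implicit.
% The auxiliary parameters are set once, before iterating, via rule (3.105):
%   lambda1 = 10*alpha;  lambda2 = 10*beta;  lambda0 = max(alpha, beta);
if nargin < 3, tol = 8e-5; end
done = true;
for c = 1:size(u, 3)
    num = norm(u(:,:,c) - uprev(:,:,c), 'fro');
    den = max(norm(uprev(:,:,c), 'fro'), eps);
    if num / den > tol
        done = false;
    end
end
end

The iteration is then terminated the first time this test returns true.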
We refer the reader to the Appendix of [PSS13] for some further experimental dis- cussion on the effect of different choices of λ’s. We should note here that the stopping criterion (3.104) works well only in combination with the choices (3.105). Some choices of λ’s lead to such a slow convergence that even though criterion (3.104) might be satisfied, the gap in the inpainting domain may not be filled in yet. 3.7.5 Inpainting examples In this section we give some inpainting examples of the TV–TV2 method. Mainly we want to emphasise the differences in practice between pure first order (β = 0) and pure second order (α = 0) total variation inpainting. In Figure 3.17 we see a first example where we try to remove a large font text from a natural image. The TV2 result seems more pleasant to the human eye (subjective) as TV inpainting produces a piecewise constant reconstruction in the inpainting domain. This is confirmed quantitatively by the PSNR values inside the inpainting domain, as these are 101 The combined TV–TV2 approach for image reconstruction 0 200 400 600 800 1000 1200 1400 1600 1800 2000 10−10 10−8 10−6 10−4 10−2 100 No of iterations R e la t iv e e r r o r b e t w e e n it e r a t e s - s o lu t io n λ 2 = 10 −2 , λ 0 = 10 −3 λ 2 = 10 −1 , λ 0 = 10 −3 λ 2 = 10 −4 , λ 0 = 10 −3 λ 2 = 10 −2 , λ 0 = 10 −1 λ 2 = 10 −2 , λ 0 = 10 −5 Figure 3.16: Plot of the relative error of the iterates and the solution (pure TV2 inpainting of the image in Figure 3.15) for different choices of the parameters λ0 and λ2. The value of β is 0.001 and the solution is taken to be the 4000–th iterate obtained with λ0 = 0.001, λ2 = 0.01, see black line. We observe that other choices of λ’s generally lead to a slower convergence. (a) Original image with text (b) Pure TV inpainting, α=0.001 (c) Pure TV2 inpainting, β=0.001 (d) Detail of original image (e) Detail of TV inpainting (f) Detail of TV2 inpainting Figure 3.17: Removing large font text from an image. Even though this is subjective, we believe that the pure TV2 result looks more realistic since the piecewise constant TV re- construction is not desirable for natural images, see for example the inpainting of the letter “T”. The PSNR inside the inpainting domain for the three channels is (34.19, 34.97, 36.99) for the TV inpainting and (34.74, 35.34, 37.20) for the TV2 inpainting. higher for the TV2 inpainting, see the caption of Figure 3.17. In Figure 3.18 we see a second example with colour stripes. Here we include also 102 3.7. Applications to image inpainting and online demo in IPOL (a) Original image. (b) Harmonic inpainting, α=0.001 (c) Pure TV inpainting, α=0.001 (d) Pure TV2 inpaint- ing, β=0.001 (e) Detail of TV inpainting (f) Detail of TV2 inpainting (g) Detail of harmonic inpainting (h) Detail of TV2 inpainting Figure 3.18: colour stripes inpainting. Observe the difference between TV and TV2 in- painting in the light blue stripe at the left, Figures (e) and (f), where in the TV2 case the blue colour is propagated inside the inpainting domain. a harmonic inpainting example. As expected, in the case of TV inpainting the image is piecewise constant inside the inpainting domain, see Figure 3.18(c). This produces a desirable result in stripes whose width is larger than the size of the gap, connecting the gap and preserving the edges there, while TV2 inpainting, Figure 3.18(d), adds some additional blur at the edge of the stripe that belongs to the inpainting domain. 
However, TV inpainting fails to connect the thin stripes while TV2 propagates the right colour more reliably, see Figures 3.18(e) and 3.18(f). Finally, notice the difference between harmonic and TV2 inpainting in terms of connectivity, as the example of the yellow stripe of Figures 3.18(g) and 3.18(h) indicates. Let us note here that Euler’s elastica is also able of achieving connection along large gaps but even its fast implementation in [THC11] is more than ten times slower than our method, see the corresponding tables in that paper. We also want to examine how the inpainting result behaves while we are moving from pure TV to pure TV2 inpainting in a continuous way. We can see this in Figure 3.19, where, keeping the geometry of the inpainting domain fixed, we see how connectivity is achieved while we are making the transition from pure TV to pure TV2. The parameters α and β vary as follows: α : 0.01→ 0.008→ 0.006→ 0.005→ 0.004→ 0.002→ 0, 103 The combined TV–TV2 approach for image reconstruction (a) Inpainting domain (b) α=0, β=0.01 (c) α=0.008, β=0.002 (d) α=0.006, β=0.004 (e) α=0.005, β=0.005 (f) α=0.004, β=0.006 (g) α=0.002, β=0.008 (h) α=0, β=0.01 Figure 3.19: Inpainting of a stripe with a large gap. Transition from pure TV to pure TV2. Connectivity is achieved only for large ratio β/α, see Figures (g) and (h). No connectivity is obtained when the weights α and β are equal, see Figure (e). β : 0 → 0.002→ 0.004→ 0.005→ 0.006→ 0.008→ 0.01. As we see in Figure 3.19, the large gap in the stripe is connected only for large values of β and small values of α. In the case where both weights α and β are equal we observe no connection. Finally we want to show the influence that the inpainting domain has on large gap con- nectivity. As we saw, in Figures 3.13(d) and 3.19(h), pure TV2 inpainting has the ability to connect large gaps along the inpainting domain. However, as numerical experiments have shown, the quality of the connection depends on the geometry of the inpainting domain D, see Figures 3.20 and 3.21. In particular, for the broken stripe example of Figures 3.20 and 3.21, the domain must extend above, underneath, left and right of the gap between the two parts of the stripe. For example in Figure 3.20(e), where the inpainting domain is just the area between the two parts, we do not have connection, see Figure 3.20(j), nor we have when we extend the domain only in the vertical direction, Figures 3.21(e) and (j). In conclusion, TV–TV2 inpainting, especially its pure TV2 version, is a fast inpainting method that is capable of producing visually pleasing results. Its computational cost is not much higher than the one needed for TV inpainting and its connectivity properties make it in our opinion an appealing method to use, as far as local inpainting is concerned. 104 3.7. Applications to image inpainting and online demo in IPOL (a) Domain 1 (b) Domain 2 (c) Domain 3 (d) Domain 4 (e) Domain 5 (f) Domain 1, TV2 inpainting (g) Domain 2, TV2 inpainting (h) Domain 3, TV2 inpainting (i) Domain 4, TV2 inpainting (j) Domain 5, TV2 inpainting Figure 3.20: Different pure TV2 inpainting results for different inpainting domains of decreasing height. In all computations we set β = 0.001. (a) Domain 1 (b) Domain 6 (c) Domain 7 (d) Domain 8 (e) Domain 9 (f) Domain 1, TV2 inpainting (g) Domain 6, TV2 inpainting (h) Domain 7, TV2 inpainting (i) Domain 8, TV2 inpainting (j) Domain 9, TV2 inpainting Figure 3.21: Different pure TV2 inpainting results for different inpainting domains of decreasing width. 
In all computations we set β = 0.001.

Chapter 4

Exact solutions of the one dimensional TGV regularisation problem

4.1 Introduction

As we have already seen in the introduction of the thesis, the total generalised variation of second order is a high quality regulariser that incorporates second order derivatives. In its most general form it incorporates derivatives up to k-th order, see [BKP10]. Given a function u ∈ L^1(Ω) with Ω ⊆ R^d open and bounded, its total generalised variation of order k is defined as
\[
\mathrm{TGV}^{k}_{\alpha}(u) = \sup\left\{ \int_{\Omega} u\, \mathrm{div}^{k} v\, dx \,:\, v \in C^{k}_{c}(\Omega, \mathrm{Sym}^{k}(\mathbb{R}^{d})),\ \|\mathrm{div}^{i} v\|_{\infty} \le \alpha_{i},\ i = 0,\ldots,k-1 \right\}, \quad (4.1)
\]
where α = (α_0, …, α_{k−1}) is a multi-index with strictly positive entries and Sym^k(R^d) denotes the space of symmetric tensors of order k with arguments in R^d. However, regarding its use in regularisation problems of the type
\[
\min_{u}\ \Phi(Tu, f) + \mathrm{TGV}^{k}_{\alpha}(u),
\]
only the case k = 2 is typically considered. This is because of the increased computational cost of minimising TGV^k_α(u) for k > 2, and also because no improvement of the reconstructed image significant enough to justify this additional cost has been observed. Note also that, as follows easily from the definition, TGV^1_α(u) = αTV(u).

We thus focus on the second order total generalised variation, which can be written in a more familiar way as
\[
\mathrm{TGV}^{2}_{\beta,\alpha}(u) = \sup\left\{ \int_{\Omega} u\, \mathrm{div}^{2} v\, dx \,:\, v \in C^{2}_{c}(\Omega, S^{d\times d}),\ \|v\|_{\infty} \le \beta,\ \|\mathrm{div}\, v\|_{\infty} \le \alpha \right\}, \quad (4.2)
\]
where α, β > 0 and S^{d×d} is the space of d × d symmetric matrices.

As we mentioned in the introduction, the use of TGV in variational image reconstruction problems is becoming increasingly popular, due to the high quality of the restoration results. Like TV, it has the ability to preserve edges in the reconstructed image and at the same time it avoids the creation of the staircasing effect. Our motivation for this chapter is to gain more understanding of these properties of TGV by computing exact solutions of the one dimensional TGV denoising problem with L2 fidelity term,
\[
\min_{u}\ \frac{1}{2}\int_{\Omega} (u-f)^{2}\, dx + \mathrm{TGV}^{2}_{\beta,\alpha}(u), \quad (4.3)
\]
for simple data functions f. We examine under which conditions discontinuities are preserved and what the role of the parameters α and β is. The data functions we consider are a piecewise constant function f_{p.c.}, a piecewise affine function f_{p.a.} and a hat function f_{hat}, see Figure 4.1. The rationale is that by choosing as data the functions f_{p.c.} and f_{p.a.} we can study the preservation of jumps and piecewise affinity in the solutions, while by choosing f_{hat} we examine the effect of TGV on local extrema.

[Figure 4.1: Data functions for which we compute exact solutions of the L2–TGV^2_{β,α} denoising problem: (a) the piecewise constant function f_{p.c.} with jump of size h, (b) the piecewise affine function f_{p.a.} with slope λ and a jump of size h, (c) the hat function f_{hat} with slopes ±λ and height λL.]

4.1.1 Organisation of the chapter

We start by describing some basic properties of the TGV functional in Section 4.2, including a useful equivalent definition that we are going to use for the rest of the chapter.

In Section 4.3 we formulate the one dimensional L2–TGV^2_{β,α} denoising problem. We identify its predual problem, for which we prove well-posedness, and derive the corresponding optimality conditions using Fenchel–Rockafellar duality theory, as described in Section 2.5.2.

Section 4.4 deals with a description of the properties of the solutions.
More specifically, in Section 4.4.1, the structure of solutions is examined with emphasis on piecewise affinity, the preservation of discontinuities, behaviour near and away of the boundary of Ω and near jump points. In Section 4.4.2 we prove some interesting facts about the L2–linear regression of the data. Among other results, we provide certain thresholds for the param- eters α and β, above which, the resulting solution is the L2–linear regression of the data f . Section 4.4.3 starts with some remarks on the inheritance of the data’s symmetry to the solution. We proceed with proving the equivalence of TGV and TV regularisation, at least for even data, under some assumptions on α and β. In Section 4.5 we extend the latter result to dimension two. In particular, we prove that for symmetric enough data functions f , the two dimensional TGV and TV regularisations coincide for large enough ratio β/α and we also provide some numerical examples that justify this result. The computation of exact solutions for simple data functions f is done in Section 4.6. In Section 4.6.1 exact solutions are computed, taken as data a piecewise constant function with a single jump. We show that only four types of solutions are possible and we show for which combinations of the parameters α and β we have each kind of solution. Having these results as a basis, we determine straightforwardly the solutions for a piecewise affine function with a single jump, in Section 4.6.2. In Section 4.6.3 we compute exact solutions for a hat (absolute value–type of) function and also provide the corresponding combinations for α and β that lead to each type of solutions. As in the previous case, only four different types of solutions can occur. Finally, in Section 4.6.4 we provide some numerical experiments. We show that the exact solutions coincide with the numerical ones in the absence of noise, as expected. We also numerically compute some solutions with noisy data, in which we observe only a slight deviation from the corresponding solutions with clean data. 4.2 Basic properties of TGV In this section we recall some of the basic properties of the TGV functional as these were proved in [BKP10, BV11, BKV13]. Proposition 4.2.1. Let Ω ⊆ Rd be open and bounded. Then the following facts hold: (i) TGV2β,α is a seminorm on the Banach space BGV2β,α(Ω) = { u ∈ L1(Ω) : TGV2β,α(u) <∞ } , where ‖u‖BGV2β,α(Ω) := ‖u‖L1(Ω) + TGV 2 β,α(u). (ii) TGV2β,α(u) = 0 if and only of u is a polynomial of degree less than 2. 109 Exact solutions of the one dimensional TGV regularisation problem (iii) TGV2β,α is rotationally invariant. (iv) TGV2β,α is a proper, convex, lower semicontinuous functional on every L p(Ω), 1 ≤ p <∞. (v) If Ω has Lipschitz boundary then there exist constants 0 < c < C that depend only on Ω, such that for every u with TGV2β,α(u) <∞ we have c‖u‖BV(Ω) ≤ ‖u‖L1(Ω) + TGV2β,α(u) ≤ C‖u‖BV(Ω). The statement (v) of Proposition 4.2.1 actually says that BGV2β,α(Ω) and BV(Ω) are topo- logically equivalent Banach spaces. Thus, the minimisation of the L2–TGV2β,α problem is done over BV(Ω). We now present an alternative definition of TGV2β,α that was proved in [BKV13]. Proposition 4.2.2. Let u ∈ L1(Ω). Then TGV2β,α(u) = min w∈BD(Ω) α‖Du− w‖M + β‖Eu‖M, (4.4) where BD(Ω) is the space of functions of bounded deformation, i.e., the space of all func- tions in L1(Ω,Rd) such that the symmetrised gradient Eu can be represented by a finite Radon measure. 
In one dimension we have BD(Ω) = BV(Ω) and E(u) = Du, thus (4.4) can be written as TGV2β,α(u) = min w∈BV(Ω) α‖Du− w‖M + β‖Du‖M. (4.5) As we have mentioned in the introduction, (4.4) is called the minimum or primal definition of TGV2β,α and it is the one that we are going to use for our purposes, while (4.2) is called the supremum or predual definition. 4.3 Formulation of the problem and optimality conditions We are interested in studying the one dimensional second order TGV denoising problem with L2 fidelity term, i.e., min u∈BV(Ω) 1 2 ˆ Ω (u− f)2dx+ TGV2β,α(u), (4.6) where Ω = (a, b) is an open interval of R and f ∈ L2(Ω). Note that existence of solution for (4.6) follows from a simple application of the direct method of calculus of variations using the weak∗ compactness in BV(Ω) and the lower semicontinuity of TGV while uniqueness follows from the strict convexity of (4.6). Using the formulation (4.5), the problem (4.6) 110 4.3. Formulation of the problem and optimality conditions is equivalent to min u∈BV(Ω) w∈BV(Ω) 1 2 ˆ Ω (u− f)2dx+ α‖Du− w‖M + β‖Dw‖M. (P) Let us note here that while a solution (w, u) for P exists and uniqueness is guaranteed for u, the same is not true for w. In fact w is a solution to an L1–TV problem, which is not strictly convex: w ∈ argmin w∈BV(Ω) α‖Du− w‖M + β‖Dw‖M ⇐⇒ w ∈ argmin w∈BV(Ω) ‖Dau+Dsu− w‖M + β α |Dw|(Ω) ⇐⇒ w ∈ argmin w∈BV(Ω) ˆ Ω |u′ − w| dx+ β α TV(w). In order to study exact solutions of the problem P, we essentially follow [BKV13] where the corresponding L1 fidelity term case is studied. We identify the predual problem of P and derive the optimality conditions using Fenchel–Rockafellar duality. Consider the following problem sup {ˆ Ω fv′′dx− 1 2 ˆ Ω (v′′)2dx : v ∈ H20 (Ω), ‖v‖∞ ≤ β, ‖v′‖∞ ≤ α } . (P ′) We shall prove that P ′ is the predual problem of P. Firstly, we show that P ′ has a solution indeed. Proposition 4.3.1. The problem P ′ admits a unique solution in H20 (Ω). Proof. Because of the estimate ‖v‖∞ + ‖v′‖∞ ≤ C‖v‖H20 (Ω), ∀v ∈ H 2 0 (Ω), see [Eva10], we have that the set K = { v ∈ H20 (Ω) : ‖v‖∞ ≤ β, ‖v′‖∞ ≤ α } , is norm-closed and since it is convex, it is also weakly closed from Mazur’s theorem. Since, according to Proposition 2.5.2, (12‖ · ‖2L2(Ω))∗ = 12‖ · ‖2L2(Ω), we have that sup v∈L2(Ω) ˆ Ω fv dx− 1 2 ˆ Ω v2 dx = (1 2 ‖ · ‖2L2(Ω) )∗ (f) = 1 2 ‖f‖2L2(Ω). Thus the supremum in P ′ is finite and we denote it with supP ′. Consider now a maximising 111 Exact solutions of the one dimensional TGV regularisation problem sequence (vn)n∈N such that ˆ Ω fv′′ndx− 1 2 ˆ Ω (v′′n) 2dx→ supP ′, as n→∞. Then there exists a positive constant M such that∣∣∣∣ˆ Ω fv′′n dx− 1 2 ˆ Ω (v′′n) 2 dx ∣∣∣∣ ≤M, ∀n ∈ N. (4.7) We will show that the sequence (‖v′′n‖L2(Ω))n∈N is bounded as well. Suppose not, then lim supn→∞ ´ Ω(v ′′ n) 2 dx =∞. Thus, passing to a subsequence if necessary we have ˆ Ω fv′′n dx− 1 2 ˆ Ω (v′′n) 2 dx ≤ ‖f‖L2(Ω)‖v′′n‖L2(Ω) − 1 2 ‖v′′n‖2L2(Ω) → −∞, as n→∞. which is a contradiction from (4.7). Hence, the sequence (vn)n∈N is bounded in H20 (Ω) and since that space is reflexive we get the existence of a subsequence (vnk)k∈N converging weakly to a function v ∈ H20 (Ω). Since K is weakly closed we have that v ∈ K. Moreover the maximising functional is weakly upper semicontinuous since ´ Ω f(·)′′ dx and−12‖·‖2L2(Ω) are weakly continuous and weakly upper semicontinuous in H20 (Ω) respectively. Thus, we have supP ′ ≥ ˆ Ω fv′′ dx− 1 2 ˆ Ω (v′′)2dx ≥ lim sup k→∞ ˆ Ω fv′′nkdx− 1 2 ˆ Ω (v′′nk) 2dx = supP ′, which means that v is a solution to P ′. 
This solution is unique as the maximising functional is strictly concave, defined on a convex domain. We define now the following spaces and operators: • X = H20 (Ω)×H10 (Ω), Y = H10 (Ω)× L2(Ω), • Λ : X → Y, linear, bounded, • F1 : X → R, F2 : Y → R, with Λ(v, ω) = (ω + v′, ω′), F1(v, ω) = I{‖·‖∞≤β}(v) + I{‖·‖∞≤α}(ω), F2(φ, ψ) = I{0}(φ) + ˆ Ω fψ dx+ 1 2 ˆ Ω ψ2 dx. It is easy to check that under these definitions the problem P ′ is a equivalent to − min (v,ω)∈X F1(v, ω) + F2(Λ(v, ω)). (4.8) 112 4.3. Formulation of the problem and optimality conditions The dual problem of (4.8), see Section 2.5.2, is min (w,u)∈Y ∗ F ∗1 (−Λ∗(w, u)) + F ∗2 (w, u). (4.9) Moreover, X, Y are Banach spaces, F1, F2 are proper, lower semicontinuous functions and the condition Y = ⋃ λ≥0 λ(dom(F2)− Λ(dom(F1))), (4.10) holds. Thus we have that Theorem 2.5.6 holds, i.e., no duality gap occurs:( min (v,ω)∈X F1(v, ω) + F2(Λ(v, ω)) ) + ( min (w,u)∈Y ∗ F ∗1 (−Λ∗(w, u)) + F ∗2 (w, u) ) = 0. The fact that condition (4.10) holds follows from [BKV13, Proposition 3.6], where the same condition is proved for the same X, Y , F1, Λ and for a F2 with a smaller domain. Our next step is to identify the dual problem (4.9) with the problem P. Proposition 4.3.2. The problem min (w,u)∈Y ∗ F ∗1 (−Λ∗(w, u)) + F ∗2 (w, u), (4.11) is equivalent to P in the sense that (w, u) solve (4.11) if and only if (w, u) ∈ BV(Ω) × BV(Ω) and solve P. Proof. The proof follows closely the proof of the corresponding theorem in [BKV13]. Firstly, we compute F ∗1 . For a pair (σ, τ) ∈ H20 (Ω)∗ ×H10 (Ω)∗ we have that F ∗1 (σ, τ) = sup (v,ω)∈H20 (Ω)×H10 (Ω) ‖v‖∞≤β ‖ω‖∞≤α 〈σ, v〉+ 〈τ, ω〉 = β sup v∈H20 (Ω) ‖v‖∞≤1 〈σ, v〉+ α sup ω∈H10 (Ω) ‖ω‖∞≤1 〈τ, ω〉. Using density arguments, it can be checked that F ∗1 (σ, τ) = β sup v∈C∞c (Ω) ‖v‖∞≤1 〈σ, v〉+ α sup ω∈C∞c (Ω) ‖ω‖∞≤1 〈τ, ω〉. (4.12) We also have for every (w, u) ∈ Y ∗ and (v, ω) ∈ X, −〈Λ∗(w, u︸︷︷︸ ∈Y ∗ ) ︸ ︷︷ ︸ ∈X∗ , (v, ω︸︷︷︸ ∈X )〉 = −〈(w, u),Λ(v, ω)〉 = −〈w, v′ + ω〉 − 〈u, ω′〉. (4.13) 113 Exact solutions of the one dimensional TGV regularisation problem Combining (4.12) and (4.13) we have that for every (w, u) ∈ Y ∗ F ∗1 (−(Λ∗(w, u))) = β sup v∈C∞c (Ω) ‖v‖∞≤1 −〈w, v′〉+ α sup ω∈C∞c (Ω) ‖ω‖∞≤1 −〈w,ω〉 − 〈u, ω′〉. (4.14) Since w ∈ H10 (Ω)∗ and u ∈ L2(Ω), they can be regarded as distributions and thus (4.14) can be written as F ∗1 (−(Λ∗(w, u))) = β sup v∈C∞c (Ω) ‖v‖∞≤1 〈Dw, v〉+ α sup ω∈C∞c (Ω) ‖ω‖∞≤1 〈Du− w,ω〉. (4.15) Since we consider the minimisation problem (4.11), both terms in (4.15) are finite. Thus, see Theorem 2.2.3 and the discussion after that, we have that Dw is a Radon measure, w is an L1 function (thus a BV function) and Du is a Radon measure, i.e., u is a BV function as well. Thus, if (w, u) is a pair of minimisers for (4.11) we have that (w, u) ∈ BV(Ω)×BV(Ω) and F ∗1 (−(Λ∗(w, u))) = β‖Dw‖M + α‖Du− w‖M. We now compute F ∗2 . We have F ∗2 (w, u) = sup (φ,ψ)∈Y φ=0 〈w, φ〉+ ˆ Ω uψ dx− ˆ Ω fψ dx− 1 2 ˆ Ω ψ2 dx = sup ψ∈L2(Ω) ˆ Ω (u− f)ψ dx− 1 2 ˆ Ω ψ2dx = ( 1 2 ‖ · ‖2L2(Ω) )∗ (u− f) = 1 2 ˆ Ω (u− f)2dx, and the proof is complete. We are now ready to derive the optimality conditions that link the solutions of the problems P and P ′. We first need the following definition. Definition 4.3.3. Let µ ∈M(Ω). We define the set-valued sign, Sgn(µ) as Sgn(µ) = {v ∈ L∞(Ω) ∩ L∞(Ω, |µ|) : ‖v‖L∞(Ω) ≤ 1, v = sgn(µ), |µ| − a.e.} In other words Sgn(µ) is the set of all functions v that are equal to the Radon– Nikody´m density µ/|µ|, µ–almost everywhere with the additional property that |v(x)| ≤ 1, Lebesgue–almost everywhere. 
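Before turning to the optimality conditions, it may help to see the primal problem P in a finite-dimensional form. The following is a minimal sketch of a discretised version of P using the convex-optimisation package cvxpy (a tool of our choosing for illustration, not used in the thesis experiments, which rely on primal-dual methods); the forward-difference discretisation and the treatment of the grid spacing are our own simplifying assumptions, so this is only a finite-dimensional surrogate of the continuum problem.

```python
import numpy as np
import cvxpy as cp

def tgv2_denoise_1d(f, alpha, beta, dx):
    """Discrete surrogate of problem P: 1D L2-TGV^2 denoising.

    f: sampled datum on a uniform grid with spacing dx.
    Returns (u, w), discrete analogues of the minimising pair of P.
    """
    n = f.size
    u = cp.Variable(n)
    w = cp.Variable(n - 1)
    fidelity = 0.5 * dx * cp.sum_squares(u - f)
    # ||Du - w||_M  ~  sum_i |u_{i+1} - u_i - dx * w_i|
    first_order = cp.norm1(cp.diff(u) - dx * w)
    # ||Dw||_M  ~  sum_i |w_{i+1} - w_i|
    second_order = cp.norm1(cp.diff(w))
    cp.Problem(cp.Minimize(fidelity + alpha * first_order
                           + beta * second_order)).solve()
    return u.value, w.value

# Example: denoise a noisy hat-like signal on (0, 1).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
f = np.where(x < 0.5, x, 1.0 - x) + 0.05 * rng.standard_normal(x.size)
u, w = tgv2_denoise_1d(f, alpha=0.02, beta=0.02, dx=x[1] - x[0])
```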
The following lemma was proved in [BKV13]. 114 4.3. Formulation of the problem and optimality conditions Lemma 4.3.4. If µ ∈M(Ω) then Sgn(µ) ∩ C0(Ω) = ∂‖ · ‖M(µ) ∩ C0(Ω). The following proposition characterises the minimisers of the problem P. Proposition 4.3.5 (Optimality conditions for L2–TGV2β,α). A pair (w, u) ∈ BV(Ω) × BV(Ω) is a minimiser for P if and only if there exists a function v ∈ H20 (Ω), such that v′′ = f − u, (Cf ) −v′ ∈ α Sgn(Du− w), (Cα) v ∈ β Sgn(Dw). (Cβ) Proof. Since there is no duality gap, from Theorem 2.5.5 we get that the solutions (v, ω) and (w, u) of P ′ and P respectively (or equivalently (4.8) and (4.9)) are linked through the optimality conditions (v, ω) ∈ ∂F ∗1 (−Λ∗(w, u)), (4.16) Λ(v, ω) ∈ ∂F ∗2 (w, u). (4.17) Conversely, (w, u) is a solution for P if there exists a v ∈ H20 (Ω) ⊆ C0(Ω) such that (4.16)– (4.17) hold with ω = −v′. We have that the first optimality condition (4.16) is equivalent to F ∗1 (−Λ∗(w, u)) + 〈(v, ω), (σ, τ) + Λ∗(w, u)〉 ≤ F ∗1 (σ, τ), ∀(σ, τ) ∈ X∗ ⇔ α‖Du− w‖M + β‖Dw‖M + 〈(v, ω), (σ −Dw, τ − (Du− w))〉 ≤ α‖τ‖M + β‖σ‖M, ∀(σ, τ) ∈ X∗ ⇔ α‖Du− w‖M + β‖Dw‖M + 〈v, σ −Dw〉+ 〈ω, τ − (Du− w)〉 ≤ α‖τ‖M + β‖σ‖M, ∀(σ, τ) ∈ X∗ ⇔ α‖Du− w‖M + 〈ω, τ − (Du− w)〉 ≤ α‖τ‖M, ∀τ ∈ H10 (Ω)∗β‖Dw‖M + 〈v, σ −Dw〉 ≤ β‖σ‖M, ∀σ ∈ H20 (Ω)∗ ⇔ α‖Du− w‖M + 〈ω, ψ − (Du− w)〉 ≤ α‖τ‖M, ∀τ ∈M(Ω)β‖Dw‖M + 〈v, σ −Dw〉 ≤ β‖σ‖M, ∀τ ∈M(Ω) ⇔ ω ∈ α∂‖ · ‖M(Du− w)v ∈ β ∂‖ · ‖M(Dw) (4.18) 115 Exact solutions of the one dimensional TGV regularisation problem Using Lemma 4.3.4, the relations (4.18) are equivalent to ω ∈ α Sgn(Du− w), (4.19) v ∈ β Sgn(Dw). (4.20) Now the second optimality condition (4.17) is equivalent to F ∗2 (w, u) + 〈Λ(v, ω), (wˆ, uˆ)− (w, u)〉 ≤ F ∗2 (wˆ, uˆ), ∀(wˆ, uˆ) ∈ Y ∗ ⇔ 1 2 ˆ Ω (u− f)2dx+ 〈ω + v′, wˆ − w〉+ 〈ω′, uˆ− u〉 ≤ 1 2 ˆ Ω (uˆ− f)2dx, ∀(wˆ, uˆ) ∈ Y ∗ ⇔ 0 + 〈ω + v′, wˆ − w〉 ≤ 0, ∀wˆ ∈ H10 (Ω)∗1 2 ´ Ω(u− f)2dx+ 〈ω′, uˆ− u〉 ≤ 12 ´ Ω(uˆ− f)2dx, ∀uˆ ∈ L2(Ω)∗ ⇔ ω = −v′ω′ ∈ ∂ (12‖ · −f‖2L2(Ω)) (u) ⇔ ω = −v′ω′ = u− f (4.21) Combining (4.21) with (4.19) and (4.20) we deduce (Cf )–(Cβ). We note here that analogous optimality conditions hold for the L1–TGV2β,α problem as well as for the corresponding L1–TV, L2–TV problems, see [BKV13] and [Rin00]. We provide a summary of these in Table 4.1. Optimality conditions – Dimension one L1–αTV L2–αTV L1–TGV2β,α L 2–TGV2β,α u ∈ BV(Ω), v ∈ H10(Ω) u ∈ BV(Ω), v ∈ H10(Ω) u,w ∈ BV(Ω), v ∈ H20(Ω) u,w ∈ BV(Ω), v ∈ H20(Ω) v′ ∈ Sgn(f − u), −v ∈ α Sgn(Du) v′ = f − u, −v ∈ α Sgn(Du) v′′ ∈ Sgn(f − u), −v′ ∈ α Sgn(Du− w), v ∈ β Sgn(Dw) v′′ = f − u, −v′ ∈ α Sgn(Du− w), v ∈ β Sgn(Dw) Table 4.1: Summary of the optimality conditions for the one dimensional TV and TGV denoising problems with L1 and L2 fidelity terms. 116 4.4. Properties of solutions 4.4 Properties of solutions 4.4.1 Basic properties Before we proceed to our computations of exact solutions for simple data functions, we firstly show some properties of the solutions of the one dimensional L2–TGV2β,α regular- isation. The following two propositions were proved in [BKV13] for the one dimensional L1–TGV2β,α model and they hold in the L 2 fidelity case as well. Their proofs are minor adjustments of the corresponding proofs for the L1 fidelity case and we omit them. Recall also the definitions of the good representatives uup and ulow of a function u ∈ BV(Ω), defined after Theorem 2.3.12. Proposition 4.4.1. Let f ∈ BV(Ω) and suppose that u,w ∈ BV(Ω) solve P. Suppose that uup < flow on an open interval I ⊆ Ω. 
Then the following hold: (i) (Du− w)bI = 0, that is to say u′ = w on I and |Dsu|(I) = 0. (ii) w′ = 0 on I and 0 ≤ −DwbI  δx for some x ∈ I. (iii) The function w = u′ is non-increasing on I. Similarly, if ulow > f up on I then we have (i) (Du− w)bI = 0, that is to say u′ = w on I and |Dsu|(I) = 0. (ii) w′ = 0 on I and 0 ≤ DwbI  δx for some x ∈ I. (iii) The function w = u′ is non-decreasing on I. The following proposition states that the jump set of the solution u is contained in the jump set of the solution data f . Proposition 4.4.2. Let f ∈ BV(Ω) and let Gf := {(x, t) ∈ Ω× R : x ∈ Jf , t ∈ [flow(x), fup(x)]} . If u solves the L2–TGV2β,α minimisation problem with data function f , then Gu ⊆ Gf . (4.22) In particular Ju ⊆ Jf . Let us note that in higher dimensions (Ω ⊆ Rd), an analogue inclusion holds (up to an Hd−1–null set) for the jump sets of the data and the solution of the L2–TV problem, see [CCN07] and also [Jal12, Jal14]. The corresponding problem for the L2–TGV2β,α model in higher dimensions still remains open, however, significant advances have been made recently by Valkonen [Val14a, Val14b]. The following proposition states that at least away from the boundary of Ω, the solution u can be bounded pointwise by the data f . 117 Exact solutions of the one dimensional TGV regularisation problem Proposition 4.4.3 (Behaviour away from the boundary). Let u be the solution to the problem P with data f and let (c, d) ⊆ {x ∈ (a, b) : fup(x) < ulow(x)} be a maximal interval with that property and a < c < d < b. Then ulow(c) ∈ [flow(c), fup(c)], (4.23) uup(d) ∈ [flow(d), fup(d)], (4.24) and inf x∈(c,d) fup(x) ≤ min x∈[c,d] ulow(x) ≤ max x∈[c,d] uup(x) ≤ max{fup(c), fup(d)}. (4.25) Analogue results hold for the uup < flow case. Proof. Let us note first that the set {x ∈ (a, b) : fup(x) < ulow(x)} is open, see [BKV13]. We prove here (4.23) while (4.24) can be shown similarly. Notice that in the case where ulow(c) < u up(c), i.e., c ∈ Ju, (4.23) follows directly from (4.22), since in that case we have [ulow(c), u up(c)] ⊆ [flow(c), fup(c)]. Thus, we suppose that ulow(c) = u up(c). We have two cases now, either flow(c) = f up(c) or flow(c) < f up(c). Consider the first case, flow(c) = f up(c), and suppose towards contradiction that fup(c) < ulow(c). In that case we claim that there exists a δ > 0 such that for every x ∈ (c − δ, c] we have fup(x) < ulow(x) which is a contradiction by the maximality of (c, d). Indeed, if the claim is not true, there exists a sequence (xn)n∈N with xn < c for every n ∈ N such that xn → c and ulow(xn) ≤ fup(xn) for every n ∈ N. Since both ulow and fup are continuous at c that would mean that ulow(c) ≤ fup(c). If fup(c) > ulow(c) we arrive again in contradiction since from the continuity of fup and ulow at c there must be points in (c, d) where fup > ulow which contradicts our assumption. We work similarly for the case fup(c) < ulow(c). In order to prove (4.25), notice first that Proposition 4.4.1 implies that u is continuous and convex in (c, d), thus uup = ulow there. From the convexity of u we have for every 0 ≤ λ ≤ 1 uup(λc+ (1− λ)d) ≤ λuup(c) + (1− λ)uup(d) ≤ λmax{fup(c), fup(d)}+ (1− λ) max{fup(c), fup(d)} = max{fup(c), fup(d)}. (4.26) Finally, we have that fup(x) < ulow(x) for all x ∈ (c, d). From the existence of the side limits we have that ulow can be extended continuously to [c, d] and thus inf x∈(c,d) fup(x) ≤ min x∈[c,d] ulow(x). (4.27) 118 4.4. Properties of solutions Combining (4.26) and (4.27) we get (4.25). 
We note that the above pointwise estimates hold only away from the boundary of Ω. Indeed as we will see in the next sections, there are examples where u < inf f and u > sup f near the boundary (in the sense of good representatives). This is in fact in contrast to TV regularisation, where one can show that inf f < ulow < u up < sup f in the whole domain Ω. In the following proposition we investigate further the structure of the solution u close to the boundary of Ω. Proposition 4.4.4 (Behaviour near the boundary). The following two statements hold: (i) If (a, c) is a maximal interval subset of {x ∈ (a, b) : uup < flow}, then uup is an affine function there. (ii) Suppose that u = f a.e. in a maximal set of a form (a, c) and suppose that uup < flow in a set of the form (c, d). Then u is affine in (a, d). Analogue results hold for the case ulow > f up and also near b. Proof. (i) We know from from Proposition 4.4.1 that u is continuous and concave in (a, c). We claim that Dw = 0, i.e., w is a constant function there, and thus from the fact that u′ = w we get that u is affine. Indeed, we have that v is strictly convex on (a, c) with v(a) = v′(a+) = 0. This means that v is strictly increasing in (a, c). If Dwb(a, c) = −δx for some x ∈ (a, c) we would get from (Cβ) that v(x) = −β and v(x′) < −β for every a < x′ < x which is a contradition. (ii) Because of the fact that u = f a.e. in (a, c) we have from (Cf ) that v is an affine function there and since v ∈ H20 (a, b) we have that v = 0 in (a, c). Condition (Cβ) forces w to be constant and condition (Cα) forces Du = w and thus u is an affine function in (a, c) say with derivative equal to λ1. Since u up < flow in (c, d) we have that u is also an affine function there, say with derivative equal to λ2. Moreover, u must be continuous in c because otherwise from condition (Cα) we would have that |v′(c)| = α and then v′ would be discontinuous at c. Also, we have λ1 = λ2. This is because of the fact that wb(a, c) = λ1, wb(c, d) = λ2 and if λ1 6= λ2, from condition (Cβ) we would have that |v(c)| = β and thus v would be discontinuous at c. Note finally, that u cannot have a gradient change in (c, d) because then v would have a local minimum in (c, d) which is again impossible. The second part of Proposition 4.4.4 tells us that the case u = f near the boundary can possibly happen only if f is affine there. In the next proposition we point out that the derivative of u cannot change across a jump point. Proposition 4.4.5 (Behaviour at a jump point). Let u be a solution to the problem P with data f . Suppose that u has a jump discontinuity at a point x0 and that there exists 119 Exact solutions of the one dimensional TGV regularisation problem an  > 0 such that uup < flow or ulow > f up in (x0− , x0 + ) where also f is continuous. Then there exists 0 < δ <  such that u′ is constant on (x0− δ, x0 + δ), i.e., the derivative of u does not jump on x0. Proof. Since u has a jump discontinuity at x0, from condition (Cα) we have that |v′(x0)| = α. (4.28) Since uup < flow or ulow > f up in (x0 − , x0 + ) we have from Proposition 4.4.1 that Du = w in (x0 − , x0) and also in (x0, x0 + ). Moreover, there exists δ ≤  such that Du is constant in each one of the intervals (x0−δ, x0) and (x0, x0 +δ). Thus if Du has a jump discontinuity then the same is true for w and thus from condition (Cβ) we have that |v(x0)| = β. (4.29) However its is easy to check that (4.28) and (4.29) cannot hold simultaneously, since v has an extremum in x0 and hence its derivative must be zero there. 
Thus we have a contradiction. In Figure 4.2 we provide an illustration about how the possible solutions can look like. For example, in Figure 4.2(a) solution of the type u1 cannot occur as according to Proposition 4.4.1 u must be piecewise affine in (c, d). Similarly, u3 violates the fact that u′ must be non-increasing in (c, d) and also contradicts Proposition 4.4.5 since its derivative changes at the jump point d. Solution u5 comes in contrast to Proposition 4.4.2, as it introduces a new jump. In Figure 4.2(b), solution u1 cannot occur since f is not affine near the boundary, see Proposition 4.4.4, while u3 violates the (i) part of the same proposition. Finally, in Figures 4.2(c) and 4.2(d) we describe two examples where the solution can possibly coincide with the data near the boundary. Note that the solution u1 in Figure 4.2(a) cannot occur as again the (ii) part of Proposition 4.4.4 is violated. 4.4.2 L2–linear regressions We continue with some results concerning the L2–linear regression of the data f . In [BKV13], it was proved that in the L1 fidelity case, there exist some thresholds α∗, β∗ > 0 such that if α > α∗, β > β∗ then the solution u of the one dimensional L1–TGV2β,α problem is the L1–linear regression f of f , where f ∈ argmin v affine ‖f − v‖L1(Ω). There, the values of α∗ and β∗ are independent of the function f and depend only on the domain Ω. Here we show that these thresholds exist for the L2 fidelity case as well but they depend on the data f . For a function f ∈ BV(Ω), we define f? to be the L2–linear 120 4.4. Properties of solutions u2 u3 u4 u5 c d f u1 (a) Away from the boundary a u1 u2 u3 f (b) Near the boundary a b u1 f u2 (c) Solution u equal to the data f near the boundary a b f u (d) Solution u equal to the data f near the boundary Figure 4.2: Possible (blue) and not possible (red) solutions of the L2–TGV2β,α problem as these are dictated by Propositions 4.4.1–4.4.5. These are not actual computed solutions but they illustrate their qualitative properties. regression of f , i.e., f? := argmin v affine ‖f − v‖2L2(Ω). We derive the following result: Proposition 4.4.6 (Thresholds for L2–linear regression). Let u be the solution to P with data function f . Then u is an affine function if and only if u = f?. Moreover, if α ≥ b− a 2 ‖f − f?‖∞, (4.30) β ≥ (b− a) 2 4 ‖f − f?‖∞, (4.31) then u is equal to f?. Proof. We firstly show that if u is an affine function and solves P, then u = f?. Since u is affine we have that D2u = 0, thus the optimum w in P is w = Du. Then, we obviously have u = argmin u˜ affine 1 2 ‖f − u˜‖2L2(Ω), 121 Exact solutions of the one dimensional TGV regularisation problem i.e., u = f?. For the second part, since Df? −w = 0 and Dw = 0, from (Cf )–(Cβ) we get that in order for the solution to be equal to f? it suffices to find a function v ∈ H20 (Ω), such that v′′ = f − f?, ‖v‖∞ ≤ β and ‖v′‖∞ ≤ α. (4.32) It is easy to see that for every function v ∈ W 1,∞(Ω) with v(a) = v(b) = 0 (recall that v is necessarily continuous) we have the estimate ‖v‖∞ ≤ b− a 2 ‖v′‖∞. Hence, a function v ∈ H20 (Ω) that satisfies v′′ = f − f?, it satisfies ‖v′‖∞ ≤ b− a 2 ‖f − f?‖∞, (4.33) ‖v‖∞ ≤ (b− a) 2 4 ‖f − f?‖∞. (4.34) Thus, in order to have this kind of solution, it suffices to choose α and β as in (4.30)–(4.31) and to find a function v ∈ H20 (Ω) that satisfies v′′ = f − f?. It can be easily checked that the function v(x) = ˆ x a ˆ t a (f − f∗) dsdt, satisfies these conditions, baring also in mind that the conditions ˆ b a (f − f?)′ dx = 0 and ˆ b a x(f − f?) 
dx = 0, hold for the L2–linear regression. We note that in Sections 4.6.1 and 4.6.3, we provide some sharper L2–linear regression thresholds for the data functions for which we are computing exact solutions. The next proposition states that the solution u to the L2–TGV2β,α problem, has the same L 2–linear regression with the data f . Proposition 4.4.7. Let uf,β,α be the solution to L 2–TGV2β,α minimisation problem with data f . Then f? = u?f,β,α, i.e., argmin v affine ‖f − v‖2L2(Ω) = argmin v affine ‖uf,β,α − v‖2L2(Ω). Proof. Since the L2–linear regression is a geometric notion, we can assume without loss of generality that Ω is an interval of the form (0, b). From (Cf ), we have that v ′′ = f −uf,β,α with v ∈ H20 (Ω). Thus, by integrating by parts, we have that ´ Ω v ′′ dx = 0 and ´ Ω xv ′′ dx = 122 4.4. Properties of solutions 0 which implies ˆ Ω f dx = ˆ Ω uf,β,α dx and ˆ Ω xf dx = ˆ Ω xuf,β,α dx. Then the proof is finished by simply observing that for a function g we have argmin v affine ‖g − v‖2L2(Ω) = argmin v=λx+µ ‖g − λx− µ‖2L2(Ω) = argmin v=λx+µ 1 2 λ2b2 + λb3 + bµ2 − 2λ ˆ Ω xg dx− 2µ ˆ Ω g dx, which means that g? depends only on the quantities ´ Ω g dx and ´ Ω xg dx. 4.4.3 Even and odd functions Before we proceed with the computation of exact solutions, we point out some facts concerning odd and even data functions. In particular, at the end of this section we prove that in some symmetric cases, TV and TGV regularisations coincide, given that α and β satisfy a very simple condition. It is convenient for this section to assume that Ω is an interval of the type (−L,L). We first prove the following two propositions, which essentially state that the symmetry of the data f is inherited to the solutions u and v of the problems P and P ′ respectively. Proposition 4.4.8. Let f ∈ BV(−L,L) be an odd (even) function. Then the solution v ∈ H20 (−L,L) to the problem P ′ is also odd (even). Proof. We prove the odd case, the even case is proved analogously. Suppose that f(x) = −f(−x) a.e.. We will show that v(x) = −v(−x). Let fˆ(x) = −f(−x) and vˆ to be the solution to P ′ with data function fˆ and set v˜(x) = −v(−x). Obviously ‖v‖∞ ≤ β, ‖v′‖∞ ≤ α ⇐⇒ ‖v˜‖∞ ≤ β, ‖v˜′‖∞ ≤ α. Since v˜(x) = −v(−x), we have v˜′′(x) = −v′′(−x) a.e. and thus ˆ L −L fˆ(x)v˜′′(x)dx− 1 2 ˆ L −L (v˜′′(x))2dx ≤ ˆ L −L fˆ(x)vˆ′′(x)dx− 1 2 ˆ L −L (vˆ′′(x))2dx =⇒ ˆ L −L f(x)v′′(x)dx− 1 2 ˆ L −L v′′(x)2dx ≤ ˆ L −L f(x)(−vˆ′′(−x))dx− 1 2 ˆ L −L ((−vˆ(−x))′′)2dx, thus v(x) = −vˆ(−x) but since fˆ = f we have that vˆ = v. Proposition 4.4.9. Let f ∈ BV(−L,L) be an odd (even) function. Then the solution u to the problem P is also odd (even). 123 Exact solutions of the one dimensional TGV regularisation problem Proof. Again we only prove the odd case as the even case can be proved in a similar fashion. Set fˆ(x) = −f(−x), uˆ the solution of P with data function fˆ and set u˜(x) = −u(−x). One can easily notice from the rotational invariance of TGV that TGV2β,α(u) = TGV 2 β.α(u˜). We have 1 2 ˆ L −L (uˆ(x)− fˆ(x))2dx+ TGV2β,α(uˆ) ≤ 1 2 ˆ L −L (u˜(x)− fˆ(x))2dx+ TGV2β,α(u˜) =⇒ 1 2 ˆ L −L (−uˆ(−x)− f(x))2dx+ TGV2β,α(−uˆ(−·)) ≤ 1 2 ˆ L −L (u(x)− f(x))2dx+ TGV2β,α(u). Thus we have u(x) = −uˆ(−x). Since fˆ = f we have uˆ = u and thus u(x) = −u(−x). We are going to prove that at least for even data functions f , if the ratio β/α of the TGV2β,α parameters is large enough, then TGV regularisation is equivalent to TV regularisation with weight α. We firstly need the following lemma concerning the total variation of even functions. 
It states that the total variation of an even function cannot be decreased by adding an affine function to it. Lemma 4.4.10. Let u ∈ BV(−L,L) be an even function. Then for every c ∈ R we have ‖Du‖M ≤ ‖Du+ c‖M. (4.35) Proof. We can assume without loss of generality that c > 0. In order to show (4.35), it suffices to show that ˆ (−L,L) |u′| dx ≤ ˆ (−L,L) |u′ + c| dx. (4.36) Since u is even we have that u′ is odd and thus up to null sets we have {u′ > 0} ∩ (−L, 0) = −{u′ < 0} ∩ (0, L), {u′ < 0} ∩ (−L, 0) = −{u′ > 0} ∩ (0, L). Then we estimate ˆ (−L,L) |u′| dx = ˆ {u′>0}∩(−L,0) u′ dx − ˆ {u′<0}∩(0,L) u′ dx + ˆ {u′>0}∩(0,L) u′ dx − ˆ {u′<0}∩(−L,0) u′ dx = ˆ {u′>0}∩(−L,0) c+ u′ dx − ˆ {u′<0}∩(0,L) c+ u′ dx + ˆ {u′>0}∩(0,L) c+ u′ dx − ˆ {u′<0}∩(−L,0) c+ u′ dx ≤ ˆ {u′>0}∩(−L,0) c+ u′ dx + ˆ {u′<0}∩(0,L) |c+ u′| dx + ˆ {u′>0}∩(0,L) c+ u′ dx + ˆ {u′<0}∩(−L,0) |c+ u′| dx 124 4.4. Properties of solutions + ˆ {u′=0}∩(−L,L) |c| dx = ˆ (−L,L) |u′ + c| dx. Theorem 4.4.11. Let f ∈ BV(−L,L) be an even function. If β α ≥ L, (4.37) then the solution of the L2–TGV2β,α regularisation problem is the same with the solution of the following TV regularisation problem: min u∈BV(−L,L) 1 2 ˆ L L (u− f)2dx+ αTV(u). Proof. Recall that for every u ∈W 1,1(−L,L) the following Poincare´ inequality holds, ‖u− uΩ‖L1(−L,L) ≤ L‖∇u‖L1(−L,L), i.e., the Poincare´ constant is equal to L, see for example [AFP00]. Using density arguments, we can prove that the same inequality holds for the space BV(−L,L) with the same constant, i.e., for every u ∈ BV(−L,L) ‖u− uΩ‖L1(−L,L) ≤ L‖Du‖M. (4.38) Since f is even, from Proposition 4.4.9 we have that the solution of min u∈BV(−L,L) 1 2 ˆ Ω (u− f)2dx+ TGV2β,α(u), (4.39) is an even function as well. Thus, the problem (4.39) is equivalent to min u∈BV(−L,L) u even 1 2 ˆ L −L (u− f)2dx+ TGV2β,α(u). (4.40) Similarly we can prove that the outcome of TV regularisation for even data, is even as well. Thus, it suffices to prove that for an even function u, we have TGV2β,α(u) = αTV(u) := α‖Du‖M, provided (4.37) holds. We calculate successively: TGV2β,α(u) = min w∈BV(Ω) α‖Du− w‖M + β‖Dw‖M 125 Exact solutions of the one dimensional TGV regularisation problem ≤ α‖Du‖M (choosing w = 0), ≤ α‖Du− wΩ‖M ∀w ∈ BV(Ω), (from Lemma 4.4.10), ≤ α‖Du− w‖M + α‖w − wΩ‖L1(Ω) ≤ α‖Du− w‖M + βαL β ‖Dw‖M (Poincare´ inequality), ≤ max { 1, aL β } (α‖Du− w‖M + β‖Dw‖M) = α‖Du− w‖M + β‖Dw‖M (since (4.37) holds). Thus, we have proved above that α‖Du‖M ≤ α‖Du− w‖M + β‖Dw‖M ∀w ∈ BV(Ω) which implies that TGV2β,α(u) ≤ α‖Du‖M ≤ min w∈BV(Ω) α‖Du− w‖M + β‖Dw‖M = TGV2β,α(u), and the proof is complete. Let us emphasise the fact that the condition (4.37) is independent of the function f . Moreover, as we will see in Section 4.6.1 there exist simple examples of odd data functions (function fp.c.) where the TGV and TV regularisations do not coincide for any choice of α and β. In Figure 4.3, we verify numerically the above result. Both TGV and TV minimisation problems are solved with the Chambolle–Pock primal dual method [CP11] as it is described in [Bre14]. 4.5 A note on the relationship of TV and TGV in dimension two In this section we want to point out that a version of Theorem 4.4.11 holds in dimension two as well, i.e., if the ratio β/α is large enough and the data function f is symmetric enough then TV and TGV regularisations coincide. To do so, we first recall some results from the theory of functions of bounded deformation, see [TS80, Tem85]. We state them in dimension two but higher dimensional analogues also hold. Proposition 4.5.1. 
Let Ω ⊆ R2 be an open, bounded set with Lipschitz boundary. Then 126 4.5. A note on the relationship of TV and TGV in dimension two 0 50 100 150 200 250 300 350 400 450 500 −2 −1 0 1 2 3 4 f TGV: α = 10, β = 200 TGV: α = 10, β = 3000 TV : α = 10 Figure 4.3: Numerical verification of Theorem 4.4.11. Using the even function f shown above as a data function and setting β/α ≥ L, the solutions of the TGV2β,α and αTV regularisation problems coincide. If we decrease the ratio β/α violating condition (4.37), the two regularisations do not coincide any more with the TGV solution (red) being different than the TV one (blue). for every function w ∈ BD(Ω) there exists an element rw ∈ KerE such that ‖w − rw‖L1(Ω,R2) ≤ C‖Ew‖M, (4.41) where the constant C depends only on the domain Ω. Note moreover that KerE = {r ∈ L1(Ω,R2) : r(x) = Ax+B, B ∈ R2, A ∈ R2×2 is a skew symmetric matrix}. Recall that in dimension one the variable w is a solution to an L1–TV problem. In dimension two, w is a solution to the following L1–E problem: w ∈ argmin w˜∈BD(Ω) ˆ Ω |Dαu− w˜| dx+ β α ‖Ew˜‖M. (4.42) We are thus turning our attention to the more generic L1–E problem min w∈BD(Ω) ‖f − w‖L1(Ω,R2) + λ‖Ew‖M, f ∈ L1(Ω,R2), λ > 0. (4.43) Notice that (4.43) does not have necessarily unique solutions as it is not strictly convex. The next theorem states that that if the parameter λ is large enough then a solution w of (4.43) belongs to KerE . This is analogous to the L1–TV problem, where for large enough value of the parameter λ we have that the solution is constant, i.e., it belongs to the kernel 127 Exact solutions of the one dimensional TGV regularisation problem of TV. Proposition 4.5.2. Let Ω ⊆ R2 be an open, bounded set with Lipschitz boundary and let f ∈ L1(Ω,R2). Then there exists a λ∗ > 0 such that for every λ > λ∗, if wλ is a solution of (4.43) with parameter λ then wλ ∈ argmin w∈KerE ‖f − w‖L1(Ω,R2). Proof. Since wλ is a solution of (4.43) we have that ‖f − wλ‖L1(Ω,R2) + λ‖Ewλ‖M ≤ ‖f‖L1(Ω,R2). (4.44) From inequality (4.41) we get ‖f − wλ‖L1(Ω,R2) + λ C ‖wλ − rwλ‖L1(Ω,R2) ≤ ‖f‖L1(Ω,R2). (4.45) It is now easy to see that Wλ := wλ − rwλ solves the following problem: min w∈BD(Ω) ‖(f − rwλ)− w‖L1(Ω,R2) + λ‖Ew‖M. (4.46) Indeed, one has to check that ‖(f − rwλ)−Wλ‖L1(Ω,R2) + λ‖EWλ‖M ≤ ‖(f − rwλ)− w‖L1(Ω,R2) + λ‖Ew‖M, ∀w ∈ BD(Ω) ⇐⇒ ‖(f − wλ‖L1(Ω,R2) + λ‖Ewλ‖M ≤ ‖f − (w + rwλ)‖L1(Ω,R2) + λ‖E(w + rwλ)‖M, ∀w ∈ BD(Ω) ⇐⇒ ‖(f − wλ‖L1(Ω,R2) + λ‖Ewλ‖M ≤ ‖f − w‖L1(Ω,R2) + λ‖Ew‖M, ∀w ∈ BD(Ω), with the last inequality being true since wλ is a solution of (4.43). Setting Fλ := f − rwλ , using the fact that Wλ solves (4.46) and the inequality ‖Wλ‖L1(Ω,R2) ≤ C‖EWλ‖M we get ‖Fλ −Wλ‖L1(Ω,R2) + λ C ‖Wλ‖L1(Ω,R2) ≤ ‖Fλ‖L1(Ω,R2). (4.47) A simple application of triangle inequality in (4.47) yields that if λ > C, then we must have Wλ = 0, i.e., wλ = rwλ . We thus set λ ∗ = C and it is obvious that if λ > λ∗ then wλ ∈ argmin w∈KerE ‖f − w‖L1(Ω,R2). We are now ready to prove that under some symmetry assumptions for Ω and f and for a large enough ratio β/α, TV and TGV regularisations coincide in dimension two as 128 4.5. A note on the relationship of TV and TGV in dimension two well. For simplicity we will assume that Ω is a square but other symmetric domains can be considered as well. Theorem 4.5.3. Suppose that Ω ⊆ R2 is a bounded square, centred at the origin. Let f ∈ BV(Ω) satisfy the following symmetry properties (i) f is symmetric with respect to both x and y axis, i.e., f(x1, x2) = f(−x1, x2), f(x1, x2) = f(x1,−x2), for a.e. (x1, x2) ∈ Ω. 
(4.48) (ii) f is invariant under pi/2 rotations, i.e., f(Opi/2x) = f(x), where Opi/2 denotes counterclockwise rotation by pi/2 degrees. (4.49) Then if β/α > C, where C is the constant appearing in (4.41), the solution of the L2– TGV2β,α regularisation problem with data f is the same with the solution of the following TV regularisation problem: min u∈BV(Ω) 1 2 ˆ Ω (u− f)2dx+ αTV(u). Proof. Recall that the L2–TGV2β,α reads min u∈BV(Ω) w∈BD(Ω) 1 2 ˆ Ω (u− f)2dx+ α‖Du− w‖M + β‖Ew‖M, (4.50) where as we have mentioned the variable w satisfies w ∈ argmin w∈BD(Ω) ˆ Ω |Dαu− w| dx+ β α ‖Ew‖M. Since β/α > C is satisfied from Proposition 4.5.2 we have that w is of the form w(x) = Ax+B for some B ∈ R2 and a skew symmetric matrix A. Since f satisfies the symmetry properties (4.48)–(4.49) from the rotational invariance of TGV we have the same holds for the solution u. That means that Dαu has the following properties Dαu(x) = −Dαu(−x), Dα1 u(x1, x2) = Dα1 u(x1,−x2), Dα2 u(x1, x2) = Dα2 u(−x1, x2) (4.51) D1u(Opi/2x) = D2u(x), (4.52) for almost every x = (x1, x2) ∈ Ω. Since w satisfies w ∈ argmin w˜∈KerE ‖Dαu− w˜‖L1(Ω,R2), and w is of the form w(x) = Ax+B it is easy to check (see Lemma A.2.3 in Appendix A) 129 Exact solutions of the one dimensional TGV regularisation problem that w is zero. In Figure 4.5 we provide two numerical examples that verify Theorem 4.5.3. There, we apply TV and TGV denoising to a characteristic function of a circle centred both at the origin, Figure 4.5.3(a), and away from it, Figure 4.5.3(f). Note that the symmetry properties (4.48)–(4.49) are satisfied for the first case. There, we observe that by choosing the ratio β/α large enough TGV2β,α and αTV regularisations produce the same results, Figures 4.5.3(b) and 4.5.3(c). However, TV and TGV do not coincide for small ratio β/α, Figure 4.5.3(d). Finally note that when the symmetry is broken, Figure 4.5.3(f), the TV and TGV solutions do not coincide even for large ratio β/α, Figures 4.5.3(g) and 4.5.3(h). 4.6 Computation of exact solutions In this section we compute exact solutions for the one dimensional L2–TGV2β,α regulari- sation problem for several data functions f . In particular we calculate exact solutions for simple piecewise constant, piecewise affine and hat functions. We focus on the structure of the solutions and their relationship with the parameters α and β. 4.6.1 Piecewise constant function with a single jump For convenience in the calculations, for this sections, Ω will be an interval of the form (0, 2L). We define fp.c. to be a piecewise constant function with a single jump, i.e., fp.c.(x) = 0 if x ∈ (0, L),h if x ∈ [L, 2L), (4.53) for some h > 0, see Figure 4.4. h L L Figure 4.4: Piecewise constant function fp.c. with a jump discontinuity at x = L. In order to compute exact solutions for fp.c. our strategy is as follows: Firstly, we investigate what are the possible types of solutions, describing then in a qualitative way and then we calculate them explicitly. 130 4.6. 
Computation of exact solutions (a) Original image, 200× 200 pixels (b) TV denoising, α = 10 (c) TGV denoising, α = 10, β = 106 (d) TGV denoising, α = 10, β = 200 0 50 100 150 200 250 0 0.2 0.4 0.6 0.8 1 f TGV: α = 10, β = 106 TGV: α = 10, β = 200 TV : α = 10 (e) Corresponding middle row slices (f) Original image, 200× 200 pixels (g) TV denoising, α = 10 (h) TGV denoising, α = 10, β = 106 0 50 100 150 200 250 0 0.2 0.4 0.6 0.8 1 f TGV: α = 10, β = 106 TV : α = 10 (i) Corresponding middle row slices Figure 4.5: Illustration of the two dimensional TV–TGV equivalence for symmetric data when β/α is large enough. Notice that the equivalence does not hold once the symmetry is broken, Figures (f)–(i). The red lines to the original images indicate the slices in Figures (d) and (f). 131 Exact solutions of the one dimensional TGV regularisation problem Since after a simple translation fp.c. is an odd function, from Proposition 4.4.9 we have that the solution u is symmetric with respect to the point (L, h/2). That is to say u(x) = −u(2L− x) + h, ∀x ∈ (L, 2L). Thus, we only need to describe the solution in (0, L). The following theorem states which kinds of solutions are allowed to occur. Theorem 4.6.1. Let u be the solution to the L2–TGV2β,α regularisation problem with data fp.c.. Then u can only be of the following form on (0, L): • There exists 0 < x1 < L such that u is strictly negative and affine in (0, x1) with u(x1) = 0. • There exists potentially 0 < x1 ≤ x˜1 < L such that u = 0 in [x1, x˜1]. • u > 0 in (x˜1, L) consisting of at most two affine parts of increasing gradient in (x˜1, x2) and (x2, L) respectively with x˜1 < x2 < L. Moreover in the case x˜1 = x1, u has the same gradient in (0, x1) and (x1, x2). Note: In fact, as we will see later, x˜1 must be equal to x1, i.e., the allowed solutions are strictly negative in (0, x1) and strictly positive in (x1, L). However this is not obvious by simply interpreting the optimality conditions (Cf ), (Cα) and (Cβ) (which is how Theorem 4.6.1 is proved) but it is a result of subsequent calculations, see Proposition 4.6.2 and Propositions A.1.1–A.1.3 in Appendix A. Proof of Theorem 4.6.1. Let us note first that from Propositions 4.4.2 and 4.4.3, we have that u is continuous in (0, L) with u(L−) ≥ 0. Also, from Proposition 4.4.4 we have that if u < 0 on a set of the form (0, x1) then u must be affine there. Moreover, Proposition 4.4.1 implies that there is no interval (c, d) ⊆ (0, L) such that u < 0 (u > 0) in (c, d) with u(c+) = u(d−) = 0. Indeed, for the u < 0 case, there would be a point x ∈ (c, d) such that u is affine on (c, x) and (x, d) with strictly negative and strictly positive gradient respectively on these intervals. But then u′ would be increasing in an interval where uup < (fp.c.)low, which is a contradiction. We work similarly for the u > 0 case. We conclude that u can be strictly negative only in an interval of the type (0, c), 0 ≤ c ≤ L and strictly positive only in an interval of the type (d, L), 0 ≤ d ≤ L, in the latter possibly consisting of two affine parts of increasing gradient. In particular, u is increasing in (0, L). The proof will be complete if we prove that the following situations cannot happen: (i) u > 0 in (0, L). (ii) u = 0 in (0, x1), 0 < x1 < L and u > 0 in (x1, L). (iii) u = 0 in (0, L). (iv) u < 0 in (0, x1), 0 < x1 < L and u = 0 in (x1, L). 132 4.6. Computation of exact solutions (v) u < 0 in (0, L). For (i) suppose that u > 0 in (0, L). By symmetry we have that u < h in (L, 2L). 
This means from (Cf ) that v is strictly convex in (0, L), strictly concave in (L, 2L) with v(0+) = v ′(0+) = v(2L−) = v′(2l−) = 0 something that cannon happen. For (ii) suppose that there exists a point 0 < x1 < L such that u = 0 in (0, x1) and u > 0 in (x1, L). Then from condition (Cf ) v will be an affine function in (0, x1) but since v ∈ H20 (0, 2L), we have v(0+) = v′(0+) = 0 and thus v = 0 in (0, x1). Moreover Du = 0 in (0, x1) forcing w = 0 a.e. there as well, because otherwise from condition (Cα) we would have that |v′| = α on a set of positive measure in (0, x1). Now, since u > 0 in (x1, L) (assume without loss of generality that u is affine there) from Proposition 4.4.1 we have that w = Du = c > 0 there, something that makes w to have jump discontinuity at x1 and thus from (Cβ) we get that v(x1) = β. However, this contradicts to the continuity of v. For (iii) suppose that u = 0 in (0, L). Then by symmetry we have that u = h in (L, 2L). Thus, from condition (Cf ) again, we have that v will be affine in those intervals and from its boundary conditions and continuity we get that v = 0 in (0, L). But since u has a jump discontinuity at x = L, according to condition (Cα), we must have v ′(L) = −α, a contradiction. For (iv) suppose that there exists a point 0 < x1 < L such that u < 0 in (0, x1) and u = 0 in (x1, L). Then, from (Cf ) v will be strictly convex and thus have a strictly increasing derivative in (0, x1). Since v ′(0+) = 0 we have that v′(x1) > 0. Moreover again from (Cf ), v will be affine in (x1, L) something that forces v(L) > 0. However, since after a simple translation fp.c. is an odd function and according to Proposition 4.4.8 the same is true for the continuous function v, we must have v(L) = 0. The case (v) is excluded in a similar way. Finally, for the last statement of the proposition, suppose that u < 0 in (0, x1), u > 0 in (x1, L) and suppose towards contradiction that the derivative of u has a jump discontinuity at x1. Since Du = w in (0, x1) and (x1, L) this means that w will have a jump discontinuity at x1 and condition (Cβ) imposes that |v(x1)| = β. The case v(x1) = −β is impossible since v is strictly increasing in (0, x1), and also if v(x1) = β from the fact that v ′(x1) > 0 and v′ is continuous at x1, we have that v is increasing in (x1, x1 + δ) for some δ > 0, thus v(x1 + δ) > β, contradicting (Cβ). Notice how we take advantage of the symmetries of u and v in order to prove Theorem 4.6.1. It remains to rule out the possibility that x1 < x˜1. In particular, we have to show 133 Exact solutions of the one dimensional TGV regularisation problem that the four following situations cannot occur 1: (i) u < 0 in (0, x1), u = 0 in (x1, x˜1), u > 0 with one affine part in (N-0-P1-J) (x˜1, L), jump discontinuity at x = L, (ii) u < 0 in (0, x1), u = 0 in (x1, x˜1), u > 0 with two affine parts in (N-0-P2-J) (x˜1, L), jump discontinuity at x = L, (iii) u < 0 in (0, x1), u = 0 in (x1, x˜1), u > 0 with one affine part in (N-0-P1-C) (x˜1, L), continuous at x = L, (iv) u < 0 in (0, x1), u = 0 in (x1, x˜1), u > 0 with two affine parts in (N-0-P2-C) (x˜1, L), continuous at x = L, where 0 < x1 < x˜1 < L, see also Figure 4.6. f u (a) N-0-P1-J f u (b) N-0-P2-J f u (c) N-0-P1-C f u (d) N-0-P2-C Figure 4.6: Not allowed type of solutions for the data function fp.c.. In the following we show that N-0-P1-J cannot occur. The exclusions of the cases N-0- P2-J, N-0-P1-C, N-0-P2-C are done in a similar way and they can be found in Appendix A. Proposition 4.6.2. 
The case N-0-P1-J cannot occur. Proof. Suppose that u < 0 in (0, x1), u = 0 in (x1, x˜1) and u > 0 and affine in (x˜1, L) with 0 < x1 < x˜1 < L. We claim that in that case, u ′ is constant, say equal to c > 0, in (0, x1) ∪ (x˜1, L), w = c in (0, L) and v is affine in (x1, x˜1) with gradient equal to α there. Indeed, let c be the value of the derivative of u in (0, x1). We know already from Proposition 4.4.1 that w = u′ there. Now, if w 6= c on a set of positive measure in (x1, x˜1), that will mean that Dw 6= 0 somewhere in (x1, x˜1) and thus from (Cβ), v must be ±β somewhere in (x1, x˜1) and also |v| ≤ β there. But this is not possible since v is strictly convex in (0, x1) with v ′(0+) = 0 and it is affine in (x1, x˜1) and thus v is strictly increasing in (0, x˜1). Moreover, w = v ′ = c in (x˜1, L) because otherwise w would have a jump discontinuity at x˜1, something that forces v(x˜1) = β but this cannot happen since 1N: negative, 0: zero, P1: positive one affine part, P2: positive two affine parts, J: jump discontinuity, C: continuous. 134 4.6. Computation of exact solutions v′(x˜1) > 0. Finally, since Du− w = −c < 0 in (x1, x˜1), from condition (Cα) we have that v′ = α there. Also note that since u has a jump discontinuity at x = L, condition (Cα) forces v′(L) = −α and from the symmetry of v we must have v(L) = 0. For convenience we set `1 = x1 > 0, `2 = x˜1 − x1 > 0, `3 = L− x˜1 > 0 with `1 + `2 + `3 = L. and v1 = vb(0, x1), v2 = vb(x1, x˜1), v3 = vb(x˜1, L). Since u is an affine function on (0, x1) we have that v1 is a cubic function v1(x) = a1x 3 + b1x 2 + c1x+ d1, x ∈ (0, x1), that satisfies the conditions v1(0) = 0, v ′ 1(0) = 0, v ′ 1(`1) = α, v ′′ 1(`1) = 0. (4.54) Thus v1 will have the form v1(x) = − α 3`21 x3 + α `1 x2, x ∈ (0, x1). (4.55) Since v2 is an affine function in (x1, x2) with gradient α and v is continuous at x1, we have that v2 will be of the form v2(x) = α(x− `1) + 2α`1 3 , x ∈ (x1, x˜1). (4.56) Since u is a strictly positive affine function on (x˜1, L) (with the same gradient as in (0, x1)), we have that v3 must be a cubic function v3(x) = a3x 3 + b3x 2 + c3x+ d3, x ∈ (x˜1, L), that satisfies the following conditions at x˜1 = `1 + `2: v3(`1+`2) = α`2+ 2α`1 3 , v′3(`1+`2) = α, v ′′ 3(`1+`2) = 0, v ′′′ 3 (`1+`2) = − 2α `21 , (4.57) and also the following conditions at L = `1 + `2 + `3: v3(`1 + `2 + `3) = 0, (4.58) v′3(`1 + `2 + `3) = −α. (4.59) 135 Exact solutions of the one dimensional TGV regularisation problem Conditions (4.57) give v3(x) = − α 3`21 (x− `1 − `2)3 + α(x− `1 − `2) + α`2 + 2α`1 3 , x ∈ (x˜1, L). (4.60) What is left is to find the relationship among `1, `2 and `3. Condition (4.59) gives `3 = √ 2`1, (4.61) and the condition (4.58) together with (4.61) gives 2 + √ 2 3 `1 + `2 = 0, (4.62) which is a contradiction since both `1 and `2 are strictly positive. Thus, this kind of solution cannot occur. From all the previous results, it follows that only the following situations can occur: (i) u < 0 in (0, x1), u > 0 with one affine part in (x1, L), jump (N-P1-J) discontinuity at x = L, (ii) u < 0 in (0, x1), u > 0 with two affine parts in (x1, L), jump (N-P2-J) discontinuity at x = L, (iii) u < 0 in (0, x1), u > 0 with one affine part in (x1, L), continuous (N-P1-C) at x = L, (iv) u < 0 in (0, x1), u > 0 with two affine parts in (x1, L), continuous (N-P2-C) at x = L, where 0 < x1 < L. In Figure 4.7 we provide a qualitative description of how these four allowed solutions look like, along with a description of the corresponding variables w and v. 
Notice that from Theorem 4.4.6, the solution of the type N-P1-C is in fact the L2–linear regression of fp.c.. Note finally that it can easily be checked that the direction of the jump of u at x = L is the same as that of the data fp.c.. Otherwise, from (Cα) we would have v'(L) = α and that would lead again to contradictions.

Our next step is to identify the combinations of the parameters α and β that lead to each type of solution.

Figure 4.7: All the possible types of solutions u to the minimisation problem P for the data function fp.c., together with the forms of the corresponding variables w and v: (a) solution of the type N-P1-J, (b) solution of the type N-P2-J, (c) solution of the type N-P1-C, (d) solution of the type N-P2-C.

Proposition 4.6.3 (N-P1-J). The solution u of the problem P with data function fp.c. is of the type N-P1-J, see Figure 4.7(a), if and only if α and β satisfy the conditions

β/α ≥ 4L/27 and α < hL/8. (4.63)

More specifically, in that case

u(x) = (6α/L²) x − 2α/L, if x ∈ (0, L),
u(x) = (6α/L²)(x − L) + h − 4α/L, if x ∈ (L, 2L), (4.64)

i.e., u has a jump discontinuity of size h − 8α/L at x = L.

Proof. It is easy to check that the solution u will be of the type N-P1-J if and only if we can find a function v in [0, L] that satisfies the following conditions:

v is a cubic polynomial, (from condition (Cf), v'' = −u is an affine function), (4.65)
v(0) = 0, v'(0) = 0, (boundary conditions for v), (4.66)
v(L) = 0, (v is an odd function), (4.67)
v'(L) = −α, (u jumps at L, condition (Cα)), (4.68)
0 < −v''(L) < h/2, (u jumps at L & symmetry), (4.69)
|v'| ≤ α, (condition (Cα)), (4.70)
|v| ≤ β, (condition (Cβ)). (4.71)

It is now easy to check that the function

v(x) = −(α/L²) x³ + (α/L) x², x ∈ [0, L],

satisfies conditions (4.65)–(4.68) and also condition (4.70). Observe that v attains its maximum in [0, L] at x = 2L/3, with v(2L/3) = 4αL/27, thus (4.71) can be written as 4αL/27 ≤ β. Finally, condition (4.69) can be written as 0 < 4α/L < h/2. The last two inequalities form the two conditions in (4.63). Using the fact that u = −v'' in (0, L) and taking advantage of the symmetry of u, we compute (4.64).

We note here that the computation of the above type of exact solution for fp.c. (for a fixed jump h = 2) is also done in [Ben11], in order to indicate that fp.c. cannot be recovered by TGV regularisation up to a loss of contrast, as it is in TV regularisation.

We now proceed to the study of the case N-P2-J. As we will see, it is computationally inaccessible to find the exact conditions on α and β that result in this kind of solution, but we are able to provide some sufficient conditions that are not far from being necessary as well.

Proposition 4.6.4 (N-P2-J). If α and β satisfy the conditions

β/α < 4L/27 and β > 4α²/(3h), (4.72)

then the solution u of the problem P with data function fp.c. is of the type N-P2-J, see Figure 4.7(b). More specifically, there exist points 0 < x1 < x2 < L with x1 = x2/2 and

L − 3β/α < x2 < L − 9β/(4α),

such that u < 0 and u > 0 in (0, x1) and (x1, L) respectively, with u' having a positive jump at x2.

Proof.
Again, it is easily verified that u will be a solution of the type N-P2-J if and only if we can find `1 > 0, `2 > 0 with `1 + `2 = L and two functions v1, v2 defined on [0, `1] and [`1, `1 + `2] respectively, such that the following conditions are satisfied: v1, v2 are cubic polynomials, (from condition (Cf )), (4.73) v1(0) = 0, v ′ 1(0) = 0, (boundary conditions for v), (4.74) v1(`1) = β, v ′ 1(`1) = 0, (v has a max at `1, condition (Cα)), (4.75) v2(`1) = β, v ′ 2(`1) = 0, v ′′ 2(`1) = v ′′ 1(`1), (continuity of v, v ′ and u), (4.76) v′′′2 (`1) < v ′′′ 1 (`1), (u ′ has a positive jump at `1) (4.77) v2(L) = 0, (v is an odd function), (4.78) 138 4.6. Computation of exact solutions v′2(L) = −α, (u jumps at L, condition (Cα)), (4.79) 0 < −v′′2(L) < h 2 , (u jumps at L & symmetry), (4.80) |v′i| ≤ α, i = 1, 2, (condition (Cα)), (4.81) |vi| ≤ β, i = 1, 2, (condition (Cβ)). (4.82) In that case `1 = x2 and `2 = L− `1. From conditions (4.73)–(4.75), we get that v1(x) = −2β `31 x3 + 3β `21 x2, x ∈ [0, `1]. (4.83) One can easily check from the fact that u = −v′′1 in (0, `1), that x1 = `1/2. We can also check that condition (4.81) for v2 is equivalent to 3β 2α ≤ `1, (4.84) while condition (4.82) is also satisfied in that case. After some computations, we have that conditions (4.76), (4.78), (4.79) give that v2 will be of the form v2(x) = ( 2β `21`2 − α 3`22 ) (x− `1)3 − 3β `21 (x− `1)2 + β. (4.85) We also get the following relationship between `1 and `2: γ`22 + `2` 2 1 − γ`21 = 0, with γ := 3β α , (4.86) where one can check that `2 < γ independently of the value of `1. Notice now, that if the condition (4.77) is satisfied, then v2 will be decreasing, with decreasing derivative as well something that would imply that conditions (4.81)–(4.82) hold for v2. With the help of (4.85)–(4.86) and condition `1 + `2 = L, we get that condition (4.77) is equivalent to `21 − 2L`1 + 2γL < 0. (4.87) The inequality (4.87) is true if and only if γ < L 2 and L− √ L2 − 2γL < `1 ⇐⇒ `2 < √ L2 − 2γL. (4.88) From the fact that `1 + `2 = L and (4.86) we get that `2 is the unique solution of φ(`2) := √ γ`22 γ − `2 + `2 − L = 0. (4.89) In view of (4.89) we have that `2 = √ L2 − 2γL if and only if γ = 4L/9. Since φ is strictly 139 Exact solutions of the one dimensional TGV regularisation problem increasing, one can check that (4.88) is equivalent to the very simple expression β α < 4L 27 . (4.90) Notice that in that case (4.84) is also satisfied. Finally, the last condition that has to be satisfied is (4.80). Using (4.85) and (4.86) it can be checked that it is equivalent to 4`2 − 2γ `22 < h 2α . (4.91) Ideally, one would like to obtain an explicit expression for `2 from (4.89) and obtain an inequality involving α and β using (4.91). However, this is practically impossible as one would have to solve a cubic equation for `2. This is why we are giving some estimates instead. One can check that from (4.89) and (4.90) we can derive that 3 4 γ < `2 < γ. (4.92) We have that the expression (4`2 − 2γ)/`22 in (4.91) is a strictly increasing function of `2 provided that `2 < γ which is true in our case. Thus, in order to satisfy (4.91) a sufficient (but not necessary) condition is 4γ − 2γ γ2 < h 2α ⇐⇒ β > 4α 2 3h . Remark: Let us also point out the following fact: Suppose that α∗ and β∗ are such, so that conditions (4.72) hold, i.e., the solution is of the type N-P2-J. Then there exists a 0 < c < β∗ such that for every β ≤ c the condition (4.91) is violated, so we do not have this type of solution. 
Indeed from (4.91) and (4.92) we have that 16 9γ < 4`2 − 2γ `22 < h 2α . Thus keeping α fixed and choosing β such that 16 9γ > h 2α ⇐⇒ β < 32α 2 27h , (4.93) we cannot have the solution of the type N-P2-J any more. We now turn our attention to the solution of the type N-P1-C. As we mentioned earlier, in that case u is an affine function and thus also the L2–linear regression of fp.c., see Theorem 4.4.6. In the same theorem we gave some thresholds for α and β in the general case but we can be more explicit in this specific example. 140 4.6. Computation of exact solutions Proposition 4.6.5 (N-P1-C). The solution u of the problem P with data function fp.c. is of the type N-P1-C, see Figure 4.7(c), if and only if α and β satisfy the conditions α ≥ hL 8 and β ≥ hL 2 54 . (4.94) Moreover, in that case u will be the L2–linear regression of fp.c. and equal to u(x) = 3h 4L x− h 4 , x ∈ (0, 2L). (4.95) Proof. Again we can check that the solution will be of the type N-P1-C if and only if we can find a function v defined on [0, L] such that the following conditions hold: v is a cubic polynomial, (4.96) v(0) = 0, v′(0) = 0, (4.97) v(L) = 0, (4.98) −v′′(L) = h 2 , (condition (Cf ), continuity at L & symmetry), (4.99) |v′| ≤ α, (4.100) |v| ≤ β. (4.101) We can easily check that conditions (4.96)–(4.99) give v(x) = − h 8L x3 + h 8 x2, x ∈ [0, L]. (4.102) Moreover, the maximum values of |v| and |v′| are hL254 and hL8 respectively. Thus, conditions (4.100)–(4.101) will be satisfied if and only if (4.94) holds. Finally, we investigate under which combinations of α and β we have solutions of the type N-P2-C. As in the case N-P2-J, it is computationally inaccessible to provide explicit conditions, thus we provide again some sufficient but not necessary conditions. Proposition 4.6.6 (N-P2-C). If α and β satisfy the conditions β < hL2 54 and β ≤ 4Lα 9 − L 2 27 , (4.103) then the solution u of the problem P with data function fp.c. is of the type N-P2-C, see Figure 4.7(d), and u′ has a positive jump at a point x2 where 2L3 < x2 < L. Proof. As in the N-P2-J case, u will be a solution of the type N-P2-C if and only if we can find `1 > 0, `2 > 0 with `1 + `2 = L and two functions v1, v2 defined on [0, `1] and 141 Exact solutions of the one dimensional TGV regularisation problem [`1, `1 + `2] respectively, such that the following conditions are satisfied: v1, v2 are cubic polynomials, (4.104) v1(0) = 0, v ′ 1(0) = 0, (4.105) v1(`1) = β, v ′ 1(`1) = 0, (4.106) v2(`1) = β, v ′ 2(`1) = 0, v ′′ 2(`1) = v ′′ 1(`1), (4.107) v′′′2 (`1) < v ′′′ 1 (`1), (4.108) v2(L) = 0, (4.109) −v′′(L) = h 2 , ((Cf ), continuity at L & symmetry), (4.110) |v′i| ≤ α, i = 1, 2, (4.111) |vi| ≤ β, i = 1, 2. (4.112) Conditions (4.104)–(4.106) together with (4.111)–(4.112) yield v1(x) = −2β `31 x3 + 3β `21 x2, x ∈ [0, `1], (4.113) with 3β 2α ≤ `1. (4.114) Supposing that v2 is of the form v2(x) = a3(x− `1)3 + a2(x− `1)2 + a1(x− `1) + a0, x ∈ [`1, `1 + `2], conditions (4.107), (4.109), (4.110) give v2(x) = a3(x− `1)3 − 3β `21 (x− `1)2 + β, x ∈ [`1, `1 + `2], (4.115) and a3` 3 2 − 3β`22 `21 + β = 0, (4.116) −6a3`2 + 6β `21 = h 2 . (4.117) One can check that conditions (4.107) and (4.110) impose v′2 to be decreasing , so in order to satisfy (4.111) and (4.112) it suffices to impose |v′2(`1 + `2)| ≤ α, which, with the help of (4.115) and (4.117), is equivalent to 3β`2 `21 + h`2 4 ≤ α. (4.118) 142 4.6. 
Computation of exact solutions Moreover combining (4.116), (4.117) and the fact that `1+`2 = L, we can get a relationship between `1 and `2 and an equation for `1: `2 = √√√√ 12β h+ 24β `21 , (4.119) φ(`1) := `1 + √√√√ 12β h+ 24β `21 − L = 0. (4.120) It is easy to see that the equation (4.120) has a unique solution in (0, L) but one has to solve a quartic in order to express it explicitly. Thus, as in the case N-P2-J we give some estimates. Observe firstly that using (4.113), (4.115), (4.117) and (4.119) condition (4.108) is satisfied if and only if 18β`1 `2 − 6β` 3 1 `32 < −12β, (4.121) which is satisfied if and only if `2 < `1 2 . However, from (4.119), this is true if and only if `1 > √ 24β h . From the fact that the function φ is strictly increasing, the last inequality is true if and only if β < hL2 54 . (4.122) It remains to identify some sufficient conditions for (4.118). Observe that under (4.122) we have φ(2L/3) < 0 and φ(L > 0), which means that 2L 3 < `1 < L and 0 < `2 < L 3 . (4.123) Using (4.123), we find that a sufficient (but not necessary) condition for (4.118) is 3β (2L3 ) 2 L 3 + hL 12 < α ⇐⇒ β ≤ 4Lα 9 − L 2h 27 . (4.124) We can easily verify that under (4.124) the condition (4.114) is satisfied as well. We now summarise how the solutions of the problem P with data function fp.c. are affected by the different choices of the parameters α and β. In Figure 4.8 we have par- titioned the set {α > 0, β > 0} into different areas that correspond to the four different possible solutions. These areas are: (1) Blue colour: { α < hL8 , β ≥ 4L27α } , (necessary and sufficient condition), N-P1-J, see Figure 4.7(a). (2) Purple colour: { β < 4L27α, β ≥ 3627hα2 } , (sufficient condition), N-P2-J, see Figure 143 Exact solutions of the one dimensional TGV regularisation problem hL 8 αhL 12 (0, 0) hL2 54 β = 4L27α (4) (3) (1) (1) (2) (3) (4) β hL 9 (2) β = 3227hα 2 β = 3627hα 2 β = 4L9 α− hL 2 27 Figure 4.8: Illustration of the four different types of solutions of P for the piecewise constant function fp.c. that result for different combinations of the parameters α and β. 4.7(b). (3) Yellow colour: { α ≥ hL8 , β ≥ hL 2 54 } , (necessary and sufficient condition), N-P1-C, see Figure 4.7(c). (4) Orange colour: { β ≤ 3227hα2, β < hL 2 54 } , (sufficient condition), N-P2-C, see Figure 4.7(d). Notice that according to Proposition 4.6.6 our initial sufficient conditions for the solu- tion of the type N-P2-C to happen were β < hL 2 54 and β ≤ 4L9 α− L 2h 27 . However, according to the remark after Proposition 4.6.4, page 140, when condition β ≤ 3227hα2 holds then the solution of the type N-P2-J cannot happen and since the conditions for N-P1-C and N-P2-C are necessary and sufficient we conclude that when β ≤ 3227hα2 holds (together with α ≤ hL8 ) then the solution of the type N-P2-C occur. Notice that{ α > 0, β > 0 : β ≤ 4L 9 α− L 2h 27 , α ≤ hL 8 } ⊆ { α > 0, β > 0 : β ≤ 32 27h α2, α ≤ hL 8 } , see Figure 4.8. As we have mentioned, due to computational issues it is difficult to derive necessary 144 4.6. Computation of exact solutions and sufficient conditions for the solutions of the type N-P2-J and N-P2-C. However, the estimates we have provided are not far away from being sharp as the unknown grey area in Figure 4.8, {β > 3227hα2, β < 3627hα2, β < 4L27α} is relatively small. 
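Although closed-form necessary and sufficient conditions for N-P2-J and N-P2-C are out of reach, the classification of Figure 4.8 is easy to reproduce numerically: equation (4.89) for ℓ2 can be solved by bisection (φ is strictly increasing) and the exact jump condition (4.91) tested directly, which also resolves the grey area. The sketch below does this. It assumes, as established above, that the four types are exhaustive; the values h = 100, L = 250 used in the example are our own guess at the scale of the experiments in Section 4.6.4 (they are consistent with the α, β reported in Figures 4.15–4.17), and all names are ours.

```python
# Illustrative classifier for the parameter regions of Figure 4.8 (datum f_{p.c.}).
import numpy as np

def solve_l2(gamma, L):
    """Unique root of phi(t) = sqrt(gamma*t^2/(gamma - t)) + t - L on (0, min(gamma, L)), cf. (4.89)."""
    phi = lambda t: np.sqrt(gamma * t**2 / (gamma - t)) + t - L
    lo, hi = 1e-12, min(gamma, L) - 1e-12
    for _ in range(200):                                   # bisection; phi is strictly increasing
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def tgv_solution_type_pc(alpha, beta, h, L):
    if alpha >= h * L / 8 and beta >= h * L**2 / 54:
        return "N-P1-C"                                    # region (3), conditions (4.94)
    if alpha < h * L / 8 and beta >= 4 * L * alpha / 27:
        return "N-P1-J"                                    # region (1), conditions (4.63)
    if beta / alpha < 4 * L / 27:                          # two affine parts with u' jumping, cf. (4.90)
        gamma = 3.0 * beta / alpha
        l2 = solve_l2(gamma, L)
        jump_at_L = (4 * l2 - 2 * gamma) / l2**2 < h / (2 * alpha)   # exact condition (4.91)
        return "N-P2-J" if jump_at_L else "N-P2-C"
    return "N-P2-C"

# example with h = 100, L = 250 (illustrative values consistent with Figures 4.15-4.17):
for a, b in [(1250.0, 50000.0), (1250.0, 30000.0), (4000.0, 120000.0)]:
    print(a, b, tgv_solution_type_pc(a, b, h=100.0, L=250.0))   # N-P1-J, N-P2-J, N-P1-C
```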
4.6.2 Piecewise affine function with a single jump

In this section we choose the data function to be a simple piecewise affine function fp.a., defined as

fp.a.(x) = λ(x − L), if x ∈ (0, L),
fp.a.(x) = λ(x − L) + h, if x ∈ [L, 2L), (4.125)

for some h > 0 and λ ∈ R, see also Figure 4.9.

Figure 4.9: Piecewise affine function fp.a. with a jump discontinuity at x = L.

Note that fp.c. is a special case of fp.a. where λ = 0. However, we decided to treat the fp.c. case separately since, as the next proposition shows, the solutions of P with data fp.a. can be obtained from the ones that correspond to fp.c.. The latter are easier to compute since fp.c. has a simpler form. The following proposition essentially states that if an affine function is added to the data f, then the solution u of P is also shifted by the same affine function.

Proposition 4.6.7. The function u_{fp.c.} is a solution to the minimisation problem P with data function fp.c. if and only if u_{fp.a.} is a solution to P with data function fp.a., where u_{fp.a.}(x) = u_{fp.c.}(x) + λ(x − L).

Proof. Suppose that u_{fp.c.} is a solution of P with data fp.c., for some combination of the parameters α and β, and let v_{fp.c.}, w_{fp.c.} be the corresponding v and w variables. We will show that u_{fp.a.}(x) = u_{fp.c.}(x) + λ(x − L) is a solution for data fp.a. for the same combination of α and β, and vice versa. The optimality conditions (Cf), (Cα), (Cβ) read:

v''_{fp.c.} = fp.c. − u_{fp.c.},
−v'_{fp.c.} ∈ α Sgn(Du_{fp.c.} − w_{fp.c.}),
v_{fp.c.} ∈ β Sgn(Dw_{fp.c.}).

Observing that fp.a.(x) = fp.c.(x) + λ(x − L), we set v_{fp.a.} = v_{fp.c.} and w_{fp.a.}(x) = w_{fp.c.}(x) + λ. Then we have

v''_{fp.a.} = v''_{fp.c.} = fp.c. − u_{fp.c.} = fp.a. − u_{fp.a.},

and

−v'_{fp.a.} ∈ α Sgn(Du_{fp.a.} − w_{fp.a.}) ⟺ −v'_{fp.a.} ∈ α Sgn(Du_{fp.c.} + λ − w_{fp.c.} − λ) ⟺ −v'_{fp.c.} ∈ α Sgn(Du_{fp.c.} − w_{fp.c.}).

Finally,

v_{fp.a.} ∈ β Sgn(Dw_{fp.a.}) ⟺ v_{fp.a.} ∈ β Sgn(Dw_{fp.c.}) ⟺ v_{fp.c.} ∈ β Sgn(Dw_{fp.c.}),

thus (Cf), (Cα), (Cβ) hold for u_{fp.a.}, v_{fp.a.} and w_{fp.a.}. We can similarly show that if u_{fp.a.} is a solution for data fp.a. then u_{fp.c.}(x) = u_{fp.a.}(x) − λ(x − L) is a solution for data fp.c..

Figure 4.10: All the possible types of solutions u to the minimisation problem P for the data function fp.a.: (a) solution of the type N-P1-J, (b) solution of the type N-P2-J, (c) solution of the type N-P1-C, (d) solution of the type N-P2-C.

In Figure 4.10 we show all the possible types of solutions for the data fp.a.. These solutions correspond to the same combinations of α and β that are shown in Figure 4.8. One can observe here the capability of TGV to preserve piecewise affine structures. This is in contrast to TV regularisation, which promotes piecewise constant reconstructions. In fact, one can easily check using the optimality conditions of the L2–TV problem in Table 4.1 that all TV solutions have the form shown in Figure 4.11.

Figure 4.11: Type of solutions for the L2–TV problem with data fp.a..

4.6.3 Hat function

In this section we compute exact solutions for the hat function

fhat(x) = λ|x − L| − λL,

where λ > 0, see Figure 4.12. The study of this case gives an insight into how TGV affects local extrema.

Figure 4.12: The hat function fhat.

Note that since fhat is an even function, the solutions will be even as well.
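Before analysing fhat in detail, the shift property of Proposition 4.6.7 is worth a quick numerical illustration: starting from the exact N-P1-J solution (4.64) for fp.c., the corresponding solution for fp.a. is obtained by adding λ(x − L), so the affine trend of the data passes through TGV regularisation unchanged and the jump size h − 8α/L is unaffected. This is a minimal sketch with illustrative parameter values; the function names are ours.

```python
# Illustration of Proposition 4.6.7 for the exact N-P1-J solution (4.64).
import numpy as np

def u_pc_np1j(x, alpha, h, L):
    """Exact N-P1-J solution (4.64) for the piecewise constant datum f_{p.c.}."""
    return np.where(x < L,
                    6 * alpha / L**2 * x - 2 * alpha / L,
                    6 * alpha / L**2 * (x - L) + h - 4 * alpha / L)

def u_pa_np1j(x, alpha, h, L, lam):
    """Solution for f_{p.a.} = f_{p.c.} + lam*(x - L), obtained via Proposition 4.6.7."""
    return u_pc_np1j(x, alpha, h, L) + lam * (x - L)

# beta does not enter (4.64); it only determines that we are in the N-P1-J regime (4.63)
L, h, lam, alpha = 250.0, 100.0, 0.1, 1250.0
x = np.linspace(0.0, 2 * L, 2001)
u = u_pa_np1j(x, alpha, h, L, lam)

slopes = np.diff(u) / np.diff(x)
print(slopes[10], lam + 6 * alpha / L**2)            # slope lam + 6*alpha/L^2 on each affine piece
print(u_pa_np1j(L + 1e-9, alpha, h, L, lam)
      - u_pa_np1j(L - 1e-9, alpha, h, L, lam))       # jump ~ h - 8*alpha/L, independent of lam
```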
Working similarly as we did for the function fp.c., we conclude that the only types of solutions u are the following2: (i) u is constant in (0, x1), equal to fhat in (x1, x2) and constant in (x1, L), (C-E-C) (ii) u is affine in (0, x1), equal to fhat in (x1, x2) and affine in (x1, L), (A-E-A) (iii) u is constant in (0, L), (C) (iv) u is affine in (0, L), (A) where 0 < x1 < x2 < L. In Figure 4.13 we show how these four types of solutions look like, along with the corresponding w and v variables. Notice that the solution of the type C is the L2–linear regression of fhat. Note also, that the solutions of the type C-E-C correspond to the TV solutions predicted by Theorem 4.4.11. We finally note that the 2C: constant, E: equal to fhat, A: affine. 147 Exact solutions of the one dimensional TGV regularisation problem solution of the type A is also computed in [Ben11, Mu¨l13, BBBM13] in order to indicate that fhat can be recovered by TGV regularisation up to a loss of contrast. t t x Du w β v x x v′ = α v′ = −α u (a) Solution of the type C-E-C. t t x Du w β v x x v′ = −αv′ = α u (b) Solution of the type A-E-A. L 2 λL 2 x Du w β v x x u (c) Solution of the type C. t t L 2 Du w β v x x x u (d) Solution of the type A. Figure 4.13: All the possible types of solution u to the minimisation problem P for the data function fhat. We also show the forms of the corresponding variables w and v. In the following, we investigate which combinations of the parameters α and β corre- spond to each type of solution. Unlike the case with the function fp.c., here we are able to give necessary and sufficient conditions for all the types of solutions. We summarise our results in the following proposition. Proposition 4.6.8. Consider the problem P with data function fhat. Then we have the following four cases: (i) The solution u is of the type C-E-C, Figure 4.13(a), if and only if α < λL2 8 and β ≥ αL− 2α 3 √ 2α λ . (4.126) In that case u(x) =  −√2αλ if x ∈ ( 0, √ 2α λ ) , −λx if x ∈ (√ 2α λ , L− √ 2α λ ) , −λL+√2αλ if x ∈ ( L− √ 2α λ ), L ) . (ii) The solution u is of the type A-E-A, Figure 4.13(b), if and only if β > 2Lα 3 and β < αL− 2α 3 √ 2α λ . (4.127) 148 4.6. Computation of exact solutions In that case u(x) =  −µx−√2α(λ− µ) if x ∈ (0, 32 (L− βα)) , −λx if x ∈ ( 3 2 ( L− βα ) , L− 32 ( L− βα )) , −µx− (λ− µ)L+ 2√2α(λ− µ) if x ∈ (L− 32 (L− βα) , L) , with µ = λ− 8α 9 ( L− βα )2 . (iii) The solution u is of the type C, Figure 4.13(c), if and only if β ≥ λL 3 12 and α ≥ λL 2 8 . (4.128) In that case u(x) = λL 2 , x ∈ (0, 2L). (iv) The solution u is of the type A, Figure 4.13(d), if and only if β < λL3 12 and β ≤ 2Lα 3 . (4.129) In that case u(x) = µ|x− L| − (λL− t), x ∈ (0, 2L) with µ = λ− 12β L3 and t = 6β L2 . Proof. (i) Since we are looking for solutions of the type C-E-C, we must find 0 < x1 < x2 < L such that u = t for a constant t in (0, x1), u = fhat in (x1, x2) and u = −t−λ(x2−x1) in (x2, L). As far as the variable w is concerned, we have that w = Du = 0 in (0, x1)∪(x2, L). However since u = fhat in (x1, x2), we have that v is affine in that interval, thus it cannot have an extremum there. Thus, condition (Cβ) forces w = 0 in (x1, x2) as well and (Cα) forces v′ = α there. 
We conclude that the solution u will be of the type C-E-C if and only if we can find 0 < x1 < x2 < L and a function v with the following properties: v and v′ are continuous, (v ∈ H20 ), (4.130) v is a cubic polynomial on (0, x1) and (x2, L), (condition (Cf )), (4.131) v(0) = 0, v′(0) = 0, (boundary conditions for v), (4.132) v is affine and v′ = α on (x1, x2), (as explained above), (4.133) v′(L) = 0, (v is an even function), (4.134) 149 Exact solutions of the one dimensional TGV regularisation problem |v| ≤ β, (condition (Cβ)), (4.135) |v′| ≤ α, (condition (Cα)). (4.136) After some computations we find that x1 = L− x2 = √ 2α λ , t = √ 2αλ, v(x) =  −λ6x3 + √ αλ 2 x 2 if x ∈ ( 0, √ 2α λ ) , α(x− x1) + λx 2 1 3 if x ∈ (√ 2α λ , L− √ 2α λ ) , −λ6 (x+ x1 − L)3 + α(x+ x1 − L) +α(L− 2x1) + λx 3 1 3 if x ∈ ( L− √ 2α λ , L ) . From the equation v′′ = fhat − u we can get an expression for u. Moreover x1 < x2 holds if and only if α < λL2 8 . Finally one can check that (4.136) holds and in order to satisfy (4.135), since v is increasing in (0, L), it suffices to have v(L) ≤ β something that translates to β ≥ αL− 2α 3 √ 2α λ . (ii) The proof follows essentially the proof of (i). Here we are looking for solutions of the type u = −µx− t instead of u = t on (0, x1). In addition to the conditions (4.130)–(4.136) for v here we also have v(L) = β, (condition (Cβ)), (4.137) because w makes a positive jump at x = L, see also Figure 4.13(b). Again after some computations we find x1 = L− x2 = √ 2α λ− µ, t = √ 2α(λ− µ), v(x) =  −λ−µ6 x3 + √ α(λ−µ) 2 x 2 if x ∈ ( 0, √ 2α λ−µ ) , α(x− x1) + (λ−µ)x 2 1 3 if x ∈ (√ 2α λ−µ , L− √ 2α λ−µ ) , −λ−µ6 (x+ x1 − L)3 + α(x+ x1 − L) +α(L− 2x1) + (λ−µ)x 3 1 3 if x ∈ ( L− √ 2α λ−µ , L ) . 150 4.6. Computation of exact solutions Again one can check that |v′| < α. Moreover the condition v(L) = β gives αL− 2α 3 √ 2α λ−−µ = β ⇐⇒ µ = λ− 8α 9 ( L− βα )2 . (4.138) Since we are looking into cases where 0 < µ < λ we must have √ 2α λ−µ > √ 2α λ and using (4.138) this translates to β < αL− 2α 3 √ 2α λ . (4.139) Finally, it is easily checked that in order to impose x1 < x2, we must have β > 2Lα 3 . (4.140) (iii) In this case, we are looking for solutions of the type u = −t, t > 0. The function v will be a cubic polynomial that satisfy the conditions: v(0) = 0, v′(0) = 0, v′(L) = 0, v′′ = h+ t, |v| ≤ β, |v′| ≤ α. We easily compute t = λL 2 , v(x) = −λ 6 x3 + 1 2 tx2, x ∈ (0, L). We can also check that the conditions |v| ≤ β, |v′| ≤ α are equivalent to β ≥ λL 3 12 , α ≥ λL 2 8 , respectively. (iv) The proof is similar to (iii). We are looking for a solution of the type u(x) = −µx− t, x ∈ (0, L), 0 < µ < λ and t > 0. As before, the function v will be a cubic polynomial satisfying the conditions: v(0) = 0, v′(0) = 0, v(L) = β, v′(L) = 0, v′′ = h+ t, |v| ≤ β, |v′| ≤ α. We get that v(x) = −λ− µ 6 x3 + t 2 x2, x ∈ (0, L), with t = 6β L2 , µ = λ− 12β L3 . Thus 0 < µ < λ is equivalent to β < λL3 12 . 151 Exact solutions of the one dimensional TGV regularisation problem Finally, we check easily that |v| ≤ β holds and |v′| ≤ α is equivalent to β ≤ 2Lα 3 . α β λL2 8 λL3 12 β = 2Lα3 (3) (4) (2) (1) (2) (3) β = αL− 2α3 √ 2α λ (1) (4) (0, 0) Figure 4.14: Illustration of the four different types of solutions of P for the hat function fhat that result for different combinations of the parameters α and β. As we did for the function fp.c. we summarise how the solutions of the problem P with data fhat are determined by α and β. 
In Figure 4.14 we have partitioned again the set {α > 0, β > 0} into different areas that correspond to the four different possible solutions. We note again that in contrast to the cases of the functions fp.c. and fp.a., here we provide both necessary and sufficient conditions for all the four different types of solutions. The corresponding areas are: (1) Blue colour: { α < λL 2 8 , β ≥ αL− 2α3 √ 2α λ } , C-E-C, see Figure 4.13(a). 152 4.6. Computation of exact solutions (2) Purple colour: { β > 2Lα3 , β < αL− 2α3 √ 2α λ } , A-E-A, see Figure 4.13(b). (3) Yellow colour: { α ≥ λL28 , β ≥ λL 3 12 } , C, see Figure 4.13(c). (4) Orange colour: { β < λL 3 12 , β ≤ 2Lα3 } , A, see Figure 4.13(d). 4.6.4 Numerical experiments In this final section, we compare our theoretical results with numerical ones obtained by solving the discrete version of P with the primal-dual algorithm of Chambolle-Pock, [CP11]. A description of the algorithm for TGV minimisation can be found in [Bre14]. We also compute some numerical results with the presence of Gaussian noise that show that TGV regularisation is quite robust for noisy data. Let us note here that even though some sensitivity analysis can be done [BV11], it is not an easy task to prove that the cor- responding solutions of P with clean and corrupted data have the same structure provided the noise is sufficiently small. Some relevant work has been done in [BB13] in terms of ground states of regularisation functionals. In Figure 4.15, we plot the exact and the numerical solutions of P using the function fp.c. as a data function. We choose the parameters α and β so that we have the solution of the type N-P1-J. We observe that for clean data, the exact and the numerical solutions coincide, see Figure 4.15(a). Moreover, even under the presence of noise, the numerical solution is not far away from the corresponding solution without the noise, Figure 4.15(b). We observe similar results in Figures 4.16 and 4.17 where we choose the parameters so that we have solutions of the type N-P2-J and N-P1-C respectively. In Figure 4.18 we use the function fhat as a data function. Again, without the presence of noise, the exact solution agrees with the numerical one, Figure 4.18(a). However, when noise is added, even though the numerical solution is close to the one that corresponds to the clean data, some staircasing is observed, Figure 4.18(b). This is not surprising as with these combinations of α and β, TGV behaves like TV as it was shown in Theorem 4.4.11. In Figure 4.19 the parameters are chosen so that the solution of the type A-E-A occurs. In the clean data case we have agreement between the exact and the numerical solution, Figure 4.19(a), but in the noisy case a kind of “affine” staircasing effect appears in the area where the exact solution equals with the data, see detail of Figure 4.19(b). Finally, in Figure 4.20, the parameters are chosen so the solution of the type A occurs. Again, the numerical solution agrees with the exact one and deviates from it slightly in the presence of noise, while no staircasing of any kind appears this time. 153 Exact solutions of the one dimensional TGV regularisation problem 0 50 100 150 200 250 300 350 400 450 500 −40 −20 0 20 40 60 80 100 120 140 f TGV–numerical: α = 1250, β = 50000 TGV–exact (a) Clean data, the numerical solution agrees with the exact one. 
(b) Noisy data (Gaussian noise, σ = 10; TGV–numerical with α = 1250, β = 50000): the numerical solution deviates slightly from the corresponding exact solution with clean data.
Figure 4.15: Piecewise constant function fp.c.. The parameters α and β satisfy the conditions (4.63), thus the solution is of the type N-P1-J.

(a) Clean data (TGV–numerical with α = 1250, β = 30000): the numerical solution agrees with the exact one. (b) Noisy data (Gaussian noise, σ = 10): the numerical solution deviates slightly from the corresponding exact solution with clean data.
Figure 4.16: Piecewise constant function fp.c.. The parameters α and β satisfy the conditions (4.72), thus the solution is of the type N-P2-J.

(a) Clean data (TGV–numerical with α = 4000, β = 120000): the numerical solution agrees with the exact one. (b) Noisy data (Gaussian noise, σ = 10): the numerical solution deviates slightly from the corresponding exact solution with clean data.
Figure 4.17: Piecewise constant function fp.c.. The parameters α and β satisfy the conditions (4.94), thus the solution is of the type N-P1-C (L2–linear regression).

(a) Clean data (TGV–numerical with α = 400, β = 200000): the numerical solution agrees with the exact one. (b) Noisy data (Gaussian noise, σ = 1): appearance of the staircasing effect.
Figure 4.18: Hat function fhat. The parameters α and β satisfy the conditions (4.126), thus the solution is of the type C-E-C.

(a) Clean data (TGV–numerical with α = 400, β = 120000): the numerical solution agrees with the exact one. (b) Noisy data (Gaussian noise, σ = 1): appearance of an "affine" staircasing effect.
Figure 4.19: Hat function fhat. The parameters α and β satisfy the conditions (4.127), thus the solution is of the type A-E-A.

(a) Clean data (TGV–numerical with α = 400, β = 100000): the numerical solution agrees with the exact one. (b) Noisy data (Gaussian noise, σ = 1): the numerical solution deviates slightly from the corresponding exact solution with clean data.
Figure 4.20: Hat function fhat. The parameters α and β satisfy the conditions (4.129), thus the solution is of the type A.
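For completeness, one possible discrete set-up is sketched below: the primal-dual iteration for min_{u,w} ½‖u − f‖² + α‖Du − w‖₁ + β‖Dw‖₁ in one dimension, with forward differences and a Neumann boundary, in the spirit of [CP11, Bre14]. The step sizes, boundary handling, iteration count and grid are our own illustrative choices and not necessarily those used to produce the figures above.

```python
# Minimal 1D TGV-L2 denoising sketch via a Chambolle-Pock primal-dual iteration.
import numpy as np

def D(v):                                   # forward difference, (Dv)_{N-1} = 0 (Neumann)
    d = np.zeros_like(v)
    d[:-1] = v[1:] - v[:-1]
    return d

def DT(p):                                  # adjoint of D
    out = np.zeros_like(p)
    out[0] = -p[0]
    out[1:-1] = p[:-2] - p[1:-1]
    out[-1] = p[-2]
    return out

def tgv_denoise_1d(f, alpha, beta, iters=20000):
    u, w = f.copy(), np.zeros_like(f)
    p, q = np.zeros_like(f), np.zeros_like(f)      # duals, constrained to |p| <= alpha, |q| <= beta
    ubar, wbar = u.copy(), w.copy()
    tau = sigma = 1.0 / 3.0                        # tau*sigma*||K||^2 <= 8/9 < 1 for K(u,w) = (Du - w, Dw)
    for _ in range(iters):
        p = np.clip(p + sigma * (D(ubar) - wbar), -alpha, alpha)
        q = np.clip(q + sigma * D(wbar), -beta, beta)
        u_old, w_old = u, w
        u = (u - tau * DT(p) + tau * f) / (1.0 + tau)   # prox of 0.5*||u - f||^2
        w = w + tau * (p - DT(q))
        ubar, wbar = 2 * u - u_old, 2 * w - w_old
    return u

# piecewise constant datum, grid spacing 1, so L = 250 and h = 100 (illustrative; cf. Figure 4.15)
n, h = 500, 100.0
f = np.concatenate([np.zeros(n // 2), h * np.ones(n // 2)])
u = tgv_denoise_1d(f, alpha=1250.0, beta=50000.0)
# u should be close to the exact N-P1-J solution (4.64) with L = 250
```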
156 Chapter 5 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications 5.1 Introduction In this chapter we are concerned with two different forms of non-local Hessian functionals. For the first formulation (explicit formulation) we prove its localisation to the classical Hessian as the non-locality vanishes and we use it to characterise some higher order Sobolev and BV spaces. The second formulation (implicit formulation) that we introduce, is better tuned for use in variational problems in imaging. We provide some numerical examples for image denoising where we show that the model is capable of outperforming TGV as far as the restoration of piecewise affine data is concerned. Explicit formulation We are initially interested in studying the following non-local Hessian functional (explicit formulation): Hnu(x) = d(d+ 2) 2 ˆ RN d 2u(x, y) |x− y|2 ( (x− y)⊗ (x− y)− |x−y|2d+2 Id ) |x− y|2 ρn(x− y)dy, x ∈ R d, (5.1) with d 2u(x, y) := u(y)− 2u(x) + u(x+ (x− y)), 157 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications see Figure 5.1, where here Id is the d× d identity matrix and ρn is a sequence of L1(Rd) radial functions, i.e., ρn(x) = ρn(|x|), that satisfy the properties ρn ≥ 0, ˆ Rd ρndx = 1, lim n→∞ ˆ |x|>δ ρndx = 0, ∀δ > 0. (5.2) This means that eventually all the mass of ρn is concentrating to the origin, leading finally to the localisation of (5.1). x y1 y2 y3 y4 x+ (x− y1) x+ (x− y2) x+ (x− y3) x+ (x− y4) Figure 5.1: Configuration of the second order finite difference scheme d 2u(x, y) = u(y)− 2u(x) + u(x+ (x− y)). As we have already mentioned in the introduction, analogous first order non-local func- tionals have been introduced and studied in the literature. More particularly, Bourgain, Brezis and Mironescu [BBM01] examined functionals of the type ˆ Ω ˆ Ω |u(x)− u(y)|p |x− y|p ρn(x− y)dxdy, (5.3) where Ω is a smooth bounded domain in Rd and 1 ≤ p <∞. They characterised W 1,p(Ω) and BV(Ω) by proving that, u ∈W 1,p(Ω) ⇐⇒ u ∈ Lp(Ω) and lim inf n→∞ ˆ Ω ˆ Ω |u(x)− u(y)|p |x− y|p ρn(x− y)dxdy <∞, p > 1, u ∈ BV(Ω) ⇐⇒ u ∈ L1(Ω) and lim inf n→∞ ˆ Ω ˆ Ω |u(x)− u(y)| |x− y| ρn(x− y)dxdy <∞, p = 1. Ponce in [Pon04] derived similar characterisations by studying functionals of the type ˆ Ω ˆ Ω ω ( |u(x)− u(y)| |x− y| ) ρ(x− y)dxdy, (5.4) 158 5.1. Introduction where ω is a continuous function and ρ are not necessarily radial. Analogous were the results by Gobbino and Mora in [GM01]. Mengesha and Spector in [MS13] introduced the following non-local gradient operator: Gnu(x) = d ˆ Ω u(x)− u(y) |x− y| x− y |x− y|ρn(x− y)dy, x ∈ Ω. (5.5) Functional (5.5) is defined rigorously as a distribution. The authors proved the localisation of the functionals (5.5) to their classical analogue, ∇u, in the topology that corresponds to the regularity of the function u and they obtained yet another characterisation of the spaces W 1,p(Ω) and BV(Ω): u ∈W 1,p(Ω) ⇐⇒ u ∈ Lp(Ω) and lim inf n→∞ ‖Gnu‖Lp(Ω) <∞, p > 1 u ∈ BV(Ω) ⇐⇒ u ∈ L1(Ω) and lim inf n→∞ ‖Gnu‖L1(Ω) <∞, p = 1. In this chapter, we prove the localisation of functional (5.1) to the classical Hessian∇2u in various topologies and we obtain some novel characterisations of the spaces W 2,p(Rd) and BV2(Rd). In summary, we prove the following localisations: if u ∈ C2c (Rd) =⇒ Hnu→ ∇2u uniformly, if u ∈W 2,p(Rd) =⇒ Hnu→ ∇2u in Lp(Rd,Rd×d), 1 ≤ p <∞, if u ∈ BV2(Rd) =⇒ HnuLd → D2u weakly∗ in measures. 
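As a quick numerical sanity check of the first of these localisations, note that in one dimension (5.1) reduces to a weighted second difference (cf. (5.44) below). The sketch takes ρn to be normalised Gaussians concentrating at the origin and compares Hnu with u'' at a single point for a smooth u; the quadrature, the test function and the choice of ρn are illustrative assumptions of ours.

```python
# 1D illustration of the localisation H_n u -> u'' as rho_n concentrates at the origin.
import numpy as np

def nonlocal_hessian_1d(u, x, n, z_max=5.0, m=4001):
    """H_n u(x) = int (u(x+z) - 2u(x) + u(x-z)) / z^2 * rho_n(z) dz, by the trapezoidal rule."""
    z = np.linspace(-z_max, z_max, m)
    z = z[np.abs(z) > 1e-8]                                   # drop the (removable) singularity at 0
    rho = n / np.sqrt(2 * np.pi) * np.exp(-0.5 * (n * z)**2)  # radial, unit mass, width 1/n
    integrand = (u(x + z) - 2 * u(x) + u(x - z)) / z**2 * rho
    return np.trapz(integrand, z)

u  = lambda t: np.sin(t) + 0.1 * t**3
d2 = lambda t: -np.sin(t) + 0.6 * t
x0 = 0.7
for n in [1, 2, 4, 8, 16]:
    print(n, nonlocal_hessian_1d(u, x0, n), d2(x0))           # approaches u''(x0) as n grows
```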
We also prove the following derivative-free characterisations of W 2,p(Rd) and BV2(Rd) in the spirit of [BBM01] and [MS13]: u ∈W 2,p(Rd) ⇐⇒ u ∈ Lp(Rd) and lim inf n→∞ ‖Hnu‖Lp(Rd,Rd×d) <∞, p > 1, (5.6) u ∈ BV2(Rd) ⇐⇒ u ∈ L1(Rd) and lim inf n→∞ ‖Hnu‖L1(Rd,Rd×d) <∞, p = 1. (5.7) We must note here that the results in [MS13] hold for the vector value case as well, that is to say one can define a form of non-local Hessian as Gn(∇u)(x) = d ˆ Ω ∇u(x)−∇u(y) |x− y| ⊗ x− y |x− y|ρn(x− y)dy, (5.8) and obtain a straitforward characterisation of W 2,p(Rd) and BV2(Rd): u ∈W 2,p(Rd) ⇐⇒ u ∈W 1,p(Rd) and lim inf n→∞ ‖Gn∇u‖Lp(Rd,Rd×d) <∞, p > 1 (5.9) u ∈ BV2(Rd) ⇐⇒ u ∈W 1,1(Rd) and lim inf n→∞ ‖Gn∇u‖L1(Rd,Rd×d) <∞, p = 1. (5.10) However the advantage of the characterisations (5.6)–(5.7) over (5.9)–(5.10) is that they 159 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications are derivative-free and require the function u to belong to an Lp space instead of W 1,p. Implicit formulation We then proceed to the definition of the implicit formulation of a non-local Hessian func- tional, see Definition 5.7.1. The non-local Hessian Hσxu(x) of a function u at a point x, is defined implicitly through a minimisation problem. We note that in this formulation the weighting function σx is not necessarily radial and depends heavily on the point x. It is constructed in such a way (by solving a weighted Eikonal equation) so that when x lies near an edge, points across the edge are weighted with small or zero weight something that eventually leads to an edge preservation scheme. We solve the corresponding regularisation problem for the case of image denoising, where we observe that the model is able to recover to a very large degree piecewise affine data, outperforming TGV. 5.1.1 Organisation of the chapter In Section 5.2 we prove the localisation of the sequence (Hnu)n∈N to the classical Hessian ∇2u for smooth, compactly supported functions u. This localisation occurs in the uniform topology. In Section 5.3, after proving that Hnu is well defined for functions u ∈ W 2,p(Rd), we show that (Hnu)n∈N localises to the second order weak derivative ∇2u. A second order non-local integration by parts formula is shown in Section 5.4 where also a second order non-local divergence is introduced. This formula is an essential tool for proving the localisation of (Hnu)n∈N for the BV2 case as well as for the non-local characterisation of BV2(Rd). In Section 5.6 we derive the non-local characterisations of W 2,p(Rd) and BV2(Rd), after redefining the non-local Hessian functional as a distribution. Finally, in Section 5.7 we introduce the implicit formulation of non-local Hessian and we provide some numerical examples in image denoising. 5.2 Localisation – The smooth case Our target for this section is to show that at least for sufficiently smooth functions u, the sequence (Hnu)n∈N converges to the classical Hessian ∇2u uniformly. We first need the following lemma: Lemma 5.2.1. Let ci,jn (x) be the following quantities for i, j = 1, . . . , d: ci,jn (x) = (A(d))i,j ˆ Rd (xi − yi)2(xj − yj)2 |x− y|4 ρn(x− y)dy, 160 5.2. Localisation – The smooth case where (A(d))i,j = d2 + 2d if i 6= j,d2+2d 3 if i = j. Then ci,jn = 1 for every n ∈ N and i, j = 1, . . . , d. Proof. The proof follows directly from Lemma A.2.2 in Appendix A. We are now ready to prove the localisation of Hnu to ∇2u for sufficiently smooth functions: Theorem 5.2.2. Suppose that u ∈ C2c (Rd). 
Then Hnu→ ∇2u, uniformly as n→∞. Proof. Note first that by using the mean value theorem for one dimensional restrictions of u we have d 2u(x, y) = (x− y)T (ˆ 1 0 ˆ 1 0 ∇2u(x+ (t+ s− 1)(y − x))dsdt ) (x− y). (5.11) Define now the following quantity for δ > 0: Qδu(x) = ∣∣∣∣∣ ˆ B(x,δ) d 2u(x, y)− (x− y)T∇2u(x)(x− y) |x− y|2 (x− y)⊗ (x− y) |x− y|2 ρn(x− y)dy ∣∣∣∣∣ . (5.12) We claim that we can make Qδu(x) as small as possible, independently of x, by choosing sufficiently small δ > 0. Indeed, given  > 0, using the uniform continuity of the partial derivatives of u, we can choose δ > 0 such that for every i, j = 1, . . . , d we have∣∣∣∣ ∂u∂xi∂xj (x)− ∂u∂xi∂xj (y) ∣∣∣∣ < , whenever |x− y| < δ. Using (5.11) we can estimate Qδu(x) = ∣∣∣∣∣ ˆ B(x,δ) d 2u(x, y)− (x− y)T∇2u(x)(x− y) |x− y|2 (x− y)⊗ (x− y) |x− y|2 ρn(x− y)dy ∣∣∣∣∣ = ∣∣∣∣∣∣ ˆ B(0,δ) zT (´ 1 0 ´ 1 0 ∇2u(x+ (1− s− t)z)−∇2u(x)dsdt ) z |z|2 z ⊗ z |z|2 ρn(z)dz ∣∣∣∣∣∣ ≤ d ˆ B(0,δ) |x− y||z| |z|2 |z||z| |z|2 ρn(z)dz ≤ d. (5.13) 161 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications On the other hand we have that Qδu(x) is also equal to Qδu(x) = ∣∣∣∣∣ ˆ B(x,δ) d 2u(x, y) |x− y|2 (x− y)⊗ (x− y) |x− y|2 ρn(x− y)dy − ˆ B(x,δ) (x− y)T∇2u(x)(x− y) |x− y|2 (x− y)⊗ (x− y) |x− y|2 ρn(x− y)dy︸ ︷︷ ︸ C ∣∣∣∣∣∣∣∣∣ , (5.14) where C is an d× d matrix whose (i0, j0)–th element is equal to C(i0, j0) = d∑ i,j=1 ∂u ∂xi∂xj (x) ˆ B(0,δ) zizjzi0zj0 |z|4 ρn(z)dz. (5.15) However due to radial symmetry we have that the following integrals vanish: i0 = j0 : ˆ B(0,δ) ziz 3 j0 |z|4 ρn(z)dz = 0, for i 6= j0, i0 = j0 : ˆ B(0,δ) zizjz 2 j0 |z|4 ρn(z)dz = 0, for i 6= j0, j 6= j0, i 6= j, i0 6= j0 : ˆ B(0,δ) zizjzi0zj0 |z|4 ρn(z)dz = 0, for i 6= i0, j 6= j0. This means that when i0 6= j0, the only integrals that are not zero in (5.15) correspond to the cases i = i0, j = j0 or i = j0, j = i0. Thus, using also the symmetry of the mixed derivatives, we have that C(i0, j0) = 2 ∂u ∂xi0∂xj0 (x) ˆ B(0,δ) z2i0z 2 j0 |z|4 ρn(z)dz, i0 6= j0. (5.16) Now, when i0 = j0 we have that the only integrals that are not zero in (5.15) correspond to the case i = j. This means that C(i0, j0) = d∑ i=1 ∂2u ∂x2i (x) ˆ B(0,δ) z2i z 2 i0 |z|4 ρn(z)dz, i0 = j0. (5.17) We now proceed to the final steps of the proof. Since ci,jn = 1, in order to prove the theorem it suffices to prove that ∣∣Hnu− cn ×∇2u∣∣→ 0, uniformly as n→∞. where here “×” denotes pointwise multiplication between matrices. We prove that the above quantity converges to zero uniformly, component-wise, i.e., ∣∣∣(Hnu− cn ×∇2u)(i0,j0)∣∣∣→ 162 5.2. Localisation – The smooth case 0 by considering two cases i0 6= j0 and i0 = j0. Case i0 6= j0 : Combining, (5.13), (5.14) and (5.16) we have that for every  > 0 there exists a δ > 0 such that∣∣∣∣∣ ˆ B(0,δ) d 2u(x, x+ z) |z|2 zi0zj0 |z|2 ρn(z)dz − 2 ∂u ∂xi0∂xj0 (x) ˆ B(0,δ) z2i0z 2 j0 |z|4 ρn(z)dz ∣∣∣∣∣ ≤ . (5.18) Using (5.18), we have for large enough n ∣∣∣(Hnu− cn ×∇2u)(i0,j0)∣∣∣ = ∣∣∣∣d(d+ 2)2 ˆ Rd d 2u(x, y) |x− y|2 (xi0 − yi0)(xj0 − yj0) |x− y|2 ρn(x− y)dy − ∂u ∂xi0∂xj0 (x)d(d+ 2) ˆ Rd z2i0z 2 j0 |z|4 ρn(z)dz ∣∣∣∣∣ ≤ d(d+ 2) 2 + d(d+ 2) 2 ∣∣∣∣∣ ˆ |z|≥δ d 2u(x, x+ z) |z|2 zi0zj0 |z|2 ρn(z)dz ∣∣∣∣∣ + d(d+ 2) 2 ∣∣∣∣ ∂u∂xi∂xj (x) ∣∣∣∣ ∣∣∣∣∣ ˆ |z|≥δ z2i0z 2 j0 |z|4 ρn(z)dz ∣∣∣∣∣ ≤ d(d+ 2) 2 + d(d+ 2) 2 4‖u‖∞ δ2 ˆ |z|≥δ ρn(z)dz + d(d+ 2) 2 ‖∇2u‖∞ ˆ |z|≥δ ρn(z)dz < d(d+ 2). Case i0 = j0 : This case is technically more difficult. 
Note that we cannot follow the corresponding proof for the i0 6= j0 case as all the pure second derivatives appear in (5.17). However we can take advantage of the fact that ∂ 2u ∂x2i0 (x) appears with different weight than ∂ 2u ∂x2i (x), for i 6= i0. Note that this is the reason for the correction |x−y| 2 d+2 in the diagonal elements of the non-local Hessian Hnu. As before, combining, (5.13), (5.14) and (5.17) we have that for every  > 0 there exists a δ > 0 such that∣∣∣∣∣ ˆ B(0,δ) d 2u(x, x+ z) |z|2 z2i0 |z|2 ρn(z)dz − d∑ i=1 ∂2u ∂x2i (x) ˆ B(0,δ) z2i z 2 i0 |z|4 ρn(z)dz ∣∣∣∣∣ ≤ . (5.19) By considering the inequality (5.19) for every i 6= i0 and summing over all i 6= i0 we get∣∣∣∣∣∣∣ d∑ i=1 i 6=i0 ˆ B(0,δ) d 2u(x, x+ z) |z|2 z2i |z|2 ρn(z)dz − d∑ i=1 i 6=i0 d∑ j=1 ∂2u ∂x2j (x) ˆ B(0,δ) z2j z 2 i |z|4 ρn(z)dz ∣∣∣∣∣∣∣ ≤ (d− 1). (5.20) 163 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications Observe now that using Lemma 5.2.1, the second term in the lefthand side of (5.20) can be written as follows: d∑ i=1 i 6=i0 d∑ j=1 ∂2u ∂x2j (x) ˆ B(0,δ) z2j z 2 i |z|4 ρn(z)dz = (d− 1) ∂2u ∂x2i0 (x) ˆ B(0,δ) z2i z 2 i0 |z|4︸ ︷︷ ︸ i 6=i0 ρn(z)dz + d∑ i=1 i 6=i0 ∂2u ∂x2i (x) ˆ B(0,δ) z4i |z|4 ρn(z)dz + d∑ i=1 i 6=i0 (d− 2)∂ 2u ∂x2i (x) ˆ B(0,δ) z2j z 2 i |z|4︸ ︷︷ ︸ j 6=i ρn(z)dz = d− 1 3 ∂2u ∂x2i0 (x) ˆ B(0,δ) z4i0 |z|4 ρn(z)dx + d∑ i=1 i 6=i0 (d+ 1) ∂2u ∂x2i (x) ˆ B(0,δ) z2j z 2 i |z|4︸ ︷︷ ︸ j 6=i ρn(z)dz. (5.21) Now, (5.20) combined with (5.21) give∣∣∣∣∣∣∣ 1 d+ 1 d∑ i=1 i 6=i0 ˆ B(0,δ) d 2u(x, x+ z) |z|2 z2i |z|2 ρn(z)dz (5.22) − d− 1 3(d+ 1) ∂2u ∂x2i0 (x) ˆ B(0,δ) z4i0 |z|4 ρn(z)dx− d∑ i=1 i 6=i0 ∂2u ∂x2i (x) ˆ B(0,δ) z2i0z 2 i |z|4 ρn(z)dz ∣∣∣∣∣∣∣ ≤ d− 1 d+ 1 . Using (5.19), (5.22) and the triangle inequality we can now vanish the terms ∂ 2u ∂x2i (x), for i 6= i0: ∣∣∣∣∣∣∣ ˆ B(0,δ) d 2u(x, x+ z) |z|2 z2i0 − 1d+1 ∑d i=1 i 6=i0 z2i |z|2 ρn(z)dy − 2d+ 4 3(d+ 1) ∂2u ∂x2i0 (x) ˆ B(z,δ) z4i0 |z|4 ρn(z)dz ∣∣∣∣∣ ≤ 2dd+ 1. (5.23) which is equivalent to∣∣∣∣∣d(d+ 2)2 ˆ B(0,δ) d 2u(x, x+ z) |z|2 z2i0 − 1d+2 |z|2 |z|2 ρn(z)dz 164 5.3. Localisation – The W 2,p(Rd) case − ∂ 2u ∂x2i0 (x)(A(d))i0,j0 ˆ B(0,δ) z4i0 |z|4 ρn(z)dz ∣∣∣∣∣ ≤ d2. (5.24) Notice that the last inequality (5.24) corresponds to inequality (5.18) for the i0 6= j0 case. Using (5.24) we can now finish the proof in a similar fashion. For large enough n we have ∣∣(Hnu− cn ×∇2u)(i0,i0)∣∣ = ∣∣∣∣∣d(d+ 2)2 ˆ Rd d 2u(x, y) |y − x|2 (yi0 − xi0)2 − 1d+2 |y − x|2 |y − x|2 ρn(x− y)dy − ∂ 2u ∂x2i0 (x)(A(d))i0,j0 ˆ Rd z4i0 |z|4 ρn(z)dz ∣∣∣∣∣ ≤ d2+ d(d+ 2) 2 ˆ |z|≥δ ∣∣d 2u(x, x+ z)∣∣ |z|2 ∣∣∣∣∣z2i0 − 1d+2 |z|2|z|2 ∣∣∣∣∣︸ ︷︷ ︸ ≤1 ρn(z)dz + ∣∣∣∣∣ ∂2u∂x2i0 (x) ∣∣∣∣∣ (A(d))i0,j0 ˆ |z|≥δ z4i0 |z|4︸︷︷︸ ≤1 ρn(z)dz ≤ d2+ d(d+ 2) 2 4‖u‖∞ δ2 ˆ |z|≥δ ρn(z)dz + ‖∇2u‖∞(A(d))i0,j0 ˆ |z|≥δ ρn(z)dz ≤ 2d2. This completes the proof of the Theorem. 5.3 Localisation – The W 2,p(Rd) case The objective of this section is to show that if u ∈W 2,p(Rd), 1 ≤ p <∞ then the non-local Hessian Hnu converges to ∇2u in Lp. The first step is to show that in that case, Hnu is indeed an Lp function. This follows from the estimate in the following lemma. Lemma 5.3.1. Suppose that u ∈W 2,p(Rd), where 1 ≤ p <∞. Then Hnu ∈ Lp(Rd,Rd×d) with ˆ Rd |Hnu(x)|pdx ≤M‖∇2u‖pLp(Rd,Rd×d), (5.25) where the constant M depends only on d and p. Proof. Recall, see [Bre83], that there exists a constant M = M(p, d) such that for every v ∈W 1,p(Rd) ˆ Rd |v(x− y)− v(x)|pdx ≤M |y|p‖∇v‖p Lp(Rd,Rd). 
(5.26) We first prove the result for functions v ∈ C∞(Rd) ∩W 2,p(Rd) and then we complete the 165 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications proof using a density argument. Using (5.26), Ho¨lder’s and Jensen’s inequality, equation (5.11) as well as Fubini’s theorem we have the following successive estimates (the constant is always denoted with M and everytime it depends only on d and p): ˆ Rd |Hnv(x)|pRd×ddx = ( d(d+ 2) 2 )p ˆ Rd ∣∣∣∣∣∣ ˆ Rd d 2v(x, y) |x− y|2 ( (x− y)⊗ (x− y)− |x−y|2d+2 Id ) |x− y|2 ρn(x− y)dy ∣∣∣∣∣∣ p dx ≤M ˆ Rd (ˆ Rd |d 2v(x, y)| |x− y|2 ρn(x− y)dy )p dx ≤M ˆ Rd (ˆ Rd |d 2v(x, y)|p |x− y|2p ρn(x− y)dy )ˆ Rd ρ(x− y)dy︸ ︷︷ ︸ =1  p/p′ dx (5.27) ≤M ˆ Rd (ˆ Rd | ´ 10 ∇v(x+ t(y − x))−∇v(x+ (t− 1)(y − x))dt|p|x− y|p |x− y|2p ρn(x− y)dy ) dx ≤M ˆ Rd (ˆ Rd ´ 1 0 |∇v(x+ t(y − x))−∇v(x+ (t− 1)(y − x))|pdt |x− y|p ρn(x− y)dy ) dx ≤M ˆ Rd (ˆ Rd (ˆ 1 0 ˆ 1 0 ∣∣∇2v(x+ (t+ s− 1)(y − x))∣∣p dsdt) ρn(x− y)dy) dx = M ˆ 1 0 ˆ 1 0 ˆ Rd (ˆ Rd (∣∣∇2v(x+ (t+ s− 1)(y − x))∣∣p) ρn(x− y)dy) dxdsdt = M ˆ 1 0 ˆ 1 0 ˆ Rd (ˆ Rd (∣∣∇2v(x+ (t+ s− 1)ξ)∣∣p) ρn(ξ)dξ) dxdsdt, = M ˆ 1 0 ˆ 1 0 ˆ Rd ( ρn(ξ) ˆ Rd (∣∣∇2v(x+ (t+ s− 1)ξ)∣∣p) dx) dξdsdt, = M ˆ 1 0 ˆ 1 0 ˆ Rd ρn(ξ)‖∇2v‖pLp(Rd,Rd×d)dξdsdt = M‖∇2v‖p Lp(Rd,Rd×d). Consider now a sequence (vk)k∈N in C∞(Rd)∩W 2,p(Rd) approximating u in W 2,p(Rd). We already have from above that ˆ Rd (ˆ Rd |d 2vk(x, y)|p |x− y|2p ρn(x− y)dy ) dx ≤M‖∇2vk‖Lp(Rd,Rd×d), ∀k ∈ N, (5.28) or ˆ Rd (ˆ Rd |d 2vk(x, x+ z)|p |z|2p ρn(z)dz ) dx ≤M‖∇2vk‖Lp(Rd,Rd×d), ∀k ∈ N. (5.29) Since vk converges to u in L p(Rd) we have that there exists a subsequence (vk`)`∈N con- 166 5.3. Localisation – The W 2,p(Rd) case verging to u almost everywhere. From an application of Fatou’s lemma we get that for every x ∈ Rd ˆ Rd |d 2u(x, x+ z))|p |z|2p ρn(z)dz︸ ︷︷ ︸ F (x) ≤M lim inf `→∞ ˆ Rd |d 2vk`(x, x+ z)|p |z|2p ρn(z)dz︸ ︷︷ ︸ Fk` (x) . Applying one more time Fatou’s Lemma to (Fk`)`∈N and F we get that ˆ Rd ˆ Rd |d 2u(x, x+ z))|p |z|2p ρn(z)dzdx ≤M lim inf`→∞ ˆ Rd ˆ Rd |d 2vk`(x, x+ z))|p |z|2p ρn(z)dzdx ≤M lim inf `→∞ ‖∇2vk`‖Lp(Rd,Rd×d) = M‖∇2u‖Lp(Rd,Rd×d). From the last inequality and the fact that (5.27) holds for W 2,p functions as well, the proof is complete. We now have the necessary tools to prove the localisation for W 2,p functions. Theorem 5.3.2. Let 1 ≤ p <∞. Then for every u ∈W 2,p(Rd) we have that Hnu→ ∇2u in Lp(Rd,Rd×d) as n→∞. Proof. We notice first that the result holds for functions v ∈ C2c (Rd). In order to see that observe that it suffices to show that for every  > 0, there exists a constant L > 1 such that sup n∈N ˆ B(0,Lr)c |Hnv(x)|pdx ≤ , (5.30) where B(0, r) is a ball containing the support of v. In that case we can estimate as follows ˆ Rd |Hnv(x)−∇2v(x)|pdx ≤ ˆ B(0,Lr) |Hnv(x)−∇2v(x)|pdx+ , (5.31) and using Theorem 5.2.2 we derive lim sup n→∞ ˆ Rd |Hnv(x)−∇2v(x)|pdx ≤ , from where the result follows since  is arbitrary. In order to get (5.30) we apply Jensen’s inequality with respect to the measure ρnLd ˆ B(0,Lr) |Hnv(x)|pdx ≤ ˆ B(0,Lr)c ˆ Rd |v(y)− 2v(x) + v(x+ (x− y))|p |x− y|2p ρn(x− y)dydx 167 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications = ˆ B(0,Lr)c ˆ B(0,r) |v(y)|p |x− y|2p ρn(x− y)dydx + ˆ B(0,Lr)c ˆ {y:x+(x+y)∈B(0,r)} |v(x+ (x− y))|p |x− y|2p ρn(x− y)dydx. 
Letting z = x+ (x− y), we obtain ˆ B(0,Lr)c ˆ {y:x+(x+y)∈B(0,r)} |v(x+ (x− y))|p |x− y|p ρn(x− y)dydx = ˆ B(0,Lr)c ˆ B(0,r) |v(z)|p |z − x|2p ρn(z − x)dzdx, and therefore by the symmetry of ρn we have ˆ B(0,Lr)c |Hnv(x)|pdx ≤ 2 ˆ B(0,Lr)c ˆ B(0,r) |v(y)|p |x− y|2p ρn(x− y)dydx ≤ 2 (L− 1)2p ˆ B(0,Lr)c ˆ B(0,r) |v(y)|pρn(x− y)dydx ≤ 2 (L− 1)2p ‖ρn‖L1(Rd)‖v‖ p Lp(Rd). Thus (5.30) follows by choosing L to be large enough. We use now the fact that that C∞c (Rd) and hence C2c (Rd) is dense in W 2,p(Rd), see for example [Bre83]. Let  > 0, then from density we have that there exists a function v ∈ C2c (Rd) such that ‖∇2u−∇2v‖Lp(Rd,Rd×d) ≤ . Thus using also Lemma 5.3.1 we have ‖Hnu−∇2u‖Lp(Rd,Rd×d) ≤ ‖Hnu−Hnv‖Lp(Rd,Rd×d) + ‖Hnv −∇2v‖Lp(Rd,Rd×d) + ‖∇2v −∇2u‖Lp(Rd,Rd×d) ≤ C+ ‖Hnv −∇2v‖Lp(Rd,Rd×d) + , Taking limits as n→∞ we get lim sup n→∞ ‖Hnu−∇2u‖Lp(Rd,Rd×d) ≤ (C + 1), and thus we conclude that lim n→∞ ‖Hnu−∇ 2u‖Lp(Rd,Rd×d) = 0. 168 5.4. Second order non-local integration by parts 5.4 Second order non-local integration by parts Before we proceed to the localisation of Hn for the BV2 case we first need to introduce a second order non-local integration by parts formula which is an essential tool for the proofs of the next section. We define the second order non-local divergence of a function φ = (φij) d i,j=1 as D2nφ(x) = d(d+ 2) 2 ˆ Rd d 2φ(x, y) |x− y|2 · ( (x− y)⊗ (x− y)− |x−y|2d+2 Id ) |x− y|2 ρn(x− y)dy. (5.32) where here A · B = ∑di,j=1AijBij for two d × d matrices A and B. Notice that (5.32) is well defined for φ ∈ C2c (Rd,Rd×d). Theorem 5.4.1 (Second order non-local integration by parts formula). Suppose that u ∈ Lp(Rd), 1 ≤ p <∞, |d 2u(x,y)||x−y|2 ρn(x− y) ∈ L1(Rd×Rd) and let φ ∈ C2c (Rd,Rd×d). Then ˆ Rd Hρu(x) · φ(x)dx = ˆ Rd u(x)D2ρφ(x)dx. (5.33) Proof. The proof is very similar with the corresponding proof in [MS13]. Using Fubini’s and dominated convergence theorem we have ˆ RN Hρu(x)φ(x)dx = d(d+ 2) 2 lim →0 ˆ Rd ˆ Rd\B(x,) d 2u(x, y) |x− y|2 ( (x− y)⊗ (x− y)− |x−y|2d+2 Id ) |x− y|2 ρ(x− y) · φ(x)dydx = d(d+ 2) 2 lim →0 ˆ Dd d 2u(x, y) |x− y|2 ( (x− y)⊗ (x− y)− |x−y|2d+2 Id ) |x− y|2 ρ(x− y) · φ(x)d(L d)2(x, y), where Dd := Rd × Rd \ {|x− y| < }. Similarly we have ˆ Rd u(x)D2ρφ(x)dx = d(d+ 2) 2 lim →0 ˆ Rd ˆ Rd\B(x,) u(x) d∑ i,j=1 d 2φij(x, y) |x− y|2 (xi − yi)(xj − yj)− δij |x−y| 2 d+2 |x− y|2 ρ(x− y)dydx = d(d+ 2) 2 lim →0 ˆ Dd u(x) d∑ i,j=1 d 2φij(x, y) |x− y|2 (xi − yi)(xj − yj)− δij |x−y| 2 d+2 |x− y|2 ρ(x− y)d(L d)2(x, y), where, for notational convenience, we used the standard convention δij = 1 if i = j,0 if i 6= j. 169 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications Thus, it suffices to show that for every i, j and  > 0 we have ˆ Dd d 2u(x, y) |x− y|2 (xi − yi)(xj − yj)− δij |x−y| 2 d+2 |x− y|2 ρ(x− y)φij(x)d(L d)2(x, y) = ˆ Dd u(x) d 2φij(x, y) |x− y|2 (xi − yi)(xj − yj)− δij |x−y| 2 d+2 |x− y|2 ρ(x− y)d(L d)2(x, y). (5.34) In order to show (5.34), it suffices to prove ˆ Dd u(y)φij(x) |x− y|2 (xi − yi)(xj − yj)− δij |x−y| 2 d+2 |x− y|2 ρ(x− y)d(L d)2(x, y) = ˆ Dd u(x)φij(y) |x− y|2 (xi − yi)(xj − yj)− δij |x−y| 2 d+2 |x− y|2 ρ(x− y)d(L d)2(x, y), (5.35) and ˆ Dd u(x+ (x− y))φij(x) |x− y|2 (xi − yi)(xj − yj)− δij |x−y| 2 d+2 |x− y|2 ρ(x− y)d(L d)2(x, y) = ˆ Dd u(x)φij(x+ (x− y)) |x− y|2 (xi − yi)(xj − yj)− δij |x−y| 2 N+2 |x− y|2 ρ(x− y)d(L d)2(x, y). (5.36) Equation (5.35) can be easily showed by alternating x and y and using the symmetry of the domain. 
Finally equation (5.36) can be proved by employing the substitution u = 2x− y, v = 3x− 2y, noting that x− y = v − u and that the determinant of the Jacobian of this substitution is −1. The following lemma shows the convergence of the second order non-local divergence to the continuous analogue div2φ, where φ ∈ C2c (Rd,Rd×d) and div2φ := d∑ i,j=1 ∂φij ∂xi∂xj . Lemma 5.4.2. Let φ ∈ C2c (Rd,Rd×d). Then for every 1 ≤ p ≤ ∞ we have lim n→∞ ‖D 2 nφ− div2φ‖Lp(Rd) = 0. (5.37) Proof. The convergence for p = ∞, follows immediately from Theorem 5.2.2 and it also holds for 1 ≤ p <∞ from Theorem 5.3.2. 170 5.5. Localisation – The BV2(Rd) case 5.5 Localisation – The BV2(Rd) case In this section we prove that for functions u ∈ BV2(Rd) the sequence Hn localises to D2u weakly∗ in the sense of measures as n tends to infinity. In order to show that, we first need to prove that the non-local Hessian is well defined for BV2 functions. We do that in the following lemma. Lemma 5.5.1. Suppose that u ∈ BV2(Rd). Then Hnu ∈ L1(Rd,Rd×d) with ˆ RN |Hnu(x)|dx ≤M |D2u|(Rd), (5.38) where the constant M depends only on d. Proof. Let (uk)k∈N be a sequence of functions in C∞(Rd) that converges strictly to u in BV2(Rd). From the same calculations as in the proof of Lemma 5.3.1 we have for every k ∈ N ˆ Rd |Hnuk(x)|dx ≤M‖∇2uk‖L1(Rd,Rd×d). Using Fatou’s Lemma in a similar way as in Lemma 5.3.1 we get ˆ Rd |Hnu(x)|dx ≤M lim inf k→∞ |D2uk|(Rd) = |D2u|(Rd). We can now proceed to the proof of the localisation result for BV2 functions. We define (µn)n∈N to be the sequence of Rd×d–valued finite Radon measures with µn := HnuLd. Theorem 5.5.2. Let u ∈ BV2(Rd). Then µn → D2u, weakly∗ in measures, i.e., for every φ ∈ C0(Rd,Rd×d) lim n→∞ ˆ Rd Hnu(x) · φ(x)dx = ˆ Rd φ(x) dD2u. (5.39) Proof. Since C∞c (Rd,Rd×d) is dense in C0(Rd,Rd×d) it suffices to prove (5.39) for every ψ ∈ C∞c (Rd,Rd×d). Indeed, suppose we have done this, let  > 0 and let φ ∈ C0(Rd,Rd×d) and ψ ∈ C∞c (Rd,Rd×d) such that ‖φ− ψ‖∞ < . Then, using also the estimate (5.38), we 171 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications have∣∣∣∣ˆ Rd Hn(x) · φ(x)dx− ˆ Rd φ(x) dD2u ∣∣∣∣ ≤ ∣∣∣∣ˆ Rd Hn(x) · (φ(x)− ψ(x))dx ∣∣∣∣ + ∣∣∣∣ˆ Rd Hn(x) · ψ(x)dx− ˆ Rd ψ(x) dD2u ∣∣∣∣ + ∣∣∣∣ˆ Rd (φ(x)− ψ(x)) dD2u ∣∣∣∣ ≤  ˆ R |Hnu(x)|dx + ∣∣∣∣ˆ Rd Hn(x) · ψ(x)dx− ˆ Rd ψ(x) dD2u ∣∣∣∣ + |D2u|(Rd) ≤M|D2u|(Rd) + ∣∣∣∣ˆ Rd Hn(x) · ψ(x)dx− ˆ Rd ψ(x) dD2u ∣∣∣∣ + |D2u|(Rd). Taking the limit n→∞ from both sides of the above inequality we get lim sup n→∞ ∣∣∣∣ˆ Rd Hn(x) · φ(x)dx− ˆ Rd φ(x) dD2u ∣∣∣∣ ≤ M˜, for a constant M˜ and since  is arbitrary we have (5.39). We thus proceed to prove (5.39) for compactly supported smooth functions. From the estimate (5.38) we have that (|µn|)n∈N is bounded, thus there exists a subsequence (µnk)k∈N and a Rd×d–valued Radon measure µ such that µnk converges to µ weakly ∗. This means that for every ψ ∈ C∞c (Rd,Rd×d) we have lim k→∞ ˆ Rd Hnk(x) · ψ(x)dx = ˆ Rd ψ(x) · dµ. On the other hand from the integration by parts formula (5.33) and Lemma 5.4.2 we get lim k→∞ ˆ Rd Hnk(x) · ψ(x)dx = lim k→∞ ˆ Rd u(x)D2nkψ(x)dx = ˆ Rd u(x)div2ψ(x)dx = ˆ Rd ψ(x) · dD2u. This means that µ = D2u. Observe now that since we actually deduce that every sub- sequence of (µn)n∈N has a further subsequence that converges to D2u weakly∗, then the initial sequence (µn)n∈N converges to D2u weakly∗. Let us note here that in the case d = 1, we can also prove strict convergence of the 172 5.6. 
Non-local characterisation of W 2,p(Rd) and BV2(Rd) measures µn to D 2u, that is, in addition to (5.39) we also have |µn|(R)→ |D2u|(R). Theorem 5.5.3. Let d = 1. Then the sequence (µn)n∈N converges to D2u strictly as measures, i.e, µn → D2u, weakly∗ in measures, (5.40) and |µn|(R)→ |D2u|(R). (5.41) Proof. The weak∗ convergence was proved in Theorem 5.5.2. Since the total variation norm is lower semicontinuous with respect to the weak∗ convergence, we also have |D2u|(R) ≤ lim inf n→∞ |µn|(R). (5.42) Thus it suffices to show that lim sup n→∞ |µn|(R) ≤ |D2u|(R). (5.43) Note that in dimension one, the non-local Hessian formula is written as Hnu(x) = ˆ R u(y)− 2u(x) + u(x+ (x− y)) |x− y|2 ρn(x− y)dy. (5.44) Following the proof of Lemma 5.3.1, we can easily verify that for v ∈ C∞(R)∩BV2(R) we have ˆ R |Hnv(x)|dx ≤ ‖∇2v‖L1(R), i.e., the constant M that appears in the estimate (5.25) is equal to one. Using Fatou’s Lemma and the BV2 strict approximation of u by smooth functions we get that |µn|(R) = ˆ R |Hnu(x)|dx ≤ |D2u|(R), from where (5.43) follows straightforwardly. 5.6 Non-local characterisation of W 2,p(Rd) and BV2(Rd) Characterisation of Sobolev and BV spaces in terms of non-local, derivative-free energies has been done so far only in the first order case, see [BBM01, Pon04, Men12, MS13]. Here we characterise the spaces W 2,p(RN ), 1 < p < ∞ and BV2(RN ) using our definition of non-local Hessian. For that we need to define the non-local Hessian as a distribution. Note that so far we have used the non-local Hessian operator Hn for regular enough functions so 173 Non-local Hessian: Localisation results, characterisation of higher order Sobolev and BV spaces and applications that formula (5.1) is well-defined. Indeed, the estimates (5.25) and (5.38) show that (5.1) is well defined for W 2,p(Rd), 1 ≤ p < ∞, and BV2(Rd) functions. One can easily check that it is also well defined for W 2,∞(Rd) functions. Moreover, it is very easy to check that if u ∈ C2c (Rd) then the following estimate hold |Hnu(x)| ≤ d 2(d+ 2) 2 ‖∇2u‖L∞(Rd), ∀x ∈ Rd. (5.45) In fact as the proof of Theorem 5.4.1 shows, a sufficient condition for the well-posedness of (5.1) is |d 2u(x, y)| |x− y|2 ρn(x− y) ∈ L 1(Rd × Rd), (5.46) and in that case we have that Hnu ∈ L1(Rd). However, we define Hn as a distribution, extending thus the definition for arbitrary Lp functions 1 ≤ p <∞. Definition 5.6.1 (Distributional definition of non-local Hessian). Let u ∈ Lp(Rd) for some p ∈ [1,∞). We define the distributional non-local Hessian Hnu as 〈Hnu, φ〉 = ˆ Rd u(x)D2nφ(x)dx, ∀φ ∈ C2c (Ω,Rd×d). (5.47) Observe that since φ ∈ W 2,p(Rd) for every p ∈ [1,∞], estimates (5.25) and (5.45) imply that D2nφ ∈ Lp(Rd) for every p ∈ [1,∞] and that (5.47) is well defined from Ho¨lder’s inequality. Note also that Hnu is indeed a distribution. In order to see that observe that if u ∈ L1(Rd) then estimate (5.45) gives 〈Hnu, φ〉 ≤M‖u‖L1(Rd)‖∇2φ‖∞, (5.48) where M depends only on d. If u ∈ Lp(Rd) with 1 < p <∞ then from the estimate 5.3.1 and the fact that ∇2φ is of compact support we have 〈Hnu, φ〉 ≤M‖u‖Lp(Rd)‖∇2φ‖Lq(Rd,Rd×d) ≤M‖u‖Lp(Rd)‖∇2φ‖∞, (5.49) where 1/p+1/q = 1 and the constant M depends only on p and d. Finally, the integration by parts formula (5.33) shows that if (5.46) holds (for example in the case of W 2,p(Rd), BV2(Rd) and C2c (Ω,Rd×d) functions) then the distribution Hnu can be represented by the function Hnu. We are now ready to prove our characterisations. Theorem 5.6.2. Let u ∈ Lp(Rd) for some 1 < p <∞. 
Then
\[
u \in W^{2,p}(\mathbb{R}^d) \iff H_n u \in L^p(\mathbb{R}^d,\mathbb{R}^{d\times d})\ \ \forall n \in \mathbb{N} \quad \text{and} \quad \liminf_{n\to\infty} \int_{\mathbb{R}^d} |H_n u(x)|^p\,dx < \infty. \tag{5.50}
\]
Let $u \in L^1(\mathbb{R}^d)$. Then
\[
u \in \mathrm{BV}^2(\mathbb{R}^d) \iff H_n u \in L^1(\mathbb{R}^d,\mathbb{R}^{d\times d})\ \ \forall n \in \mathbb{N} \quad \text{and} \quad \liminf_{n\to\infty} \int_{\mathbb{R}^d} |H_n u(x)|\,dx < \infty. \tag{5.51}
\]

Proof. Firstly, we prove (5.50). Suppose that $u \in W^{2,p}(\mathbb{R}^d)$. Then Lemma 5.3.1 gives
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^d} |H_n u(x)|^p\,dx \le M \|\nabla^2 u\|^p_{L^p(\mathbb{R}^d)} < \infty.
\]
Suppose now conversely that
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^d} |H_n u(x)|^p\,dx < \infty.
\]
This means that, up to a subsequence, $H_n u$ is bounded in $L^p(\mathbb{R}^d,\mathbb{R}^{d\times d})$; thus there exist a subsequence $(H_{n_k} u)_{k\in\mathbb{N}}$ and $v \in L^p(\mathbb{R}^d,\mathbb{R}^{d\times d})$ such that $H_{n_k} u \to v$ weakly in $L^p(\mathbb{R}^d,\mathbb{R}^{d\times d})$. Thus, using the definition of weak convergence in $L^p$ together with the non-local integration by parts formula and Lemma 5.4.2, we have for every $\psi \in C_c^\infty(\mathbb{R}^d,\mathbb{R}^{d\times d})$,
\[
\int_{\mathbb{R}^d} v(x)\cdot\psi(x)\,dx = \lim_{k\to\infty}\int_{\mathbb{R}^d} H_{n_k}u(x)\cdot\psi(x)\,dx = \lim_{k\to\infty}\int_{\mathbb{R}^d} u(x)\,D_{n_k}^2\psi(x)\,dx = \int_{\mathbb{R}^d} u(x)\,\mathrm{div}^2\psi(x)\,dx,
\]
which shows that $v = \nabla^2 u \in L^p(\mathbb{R}^d,\mathbb{R}^{d\times d})$ is the second order weak derivative of $u$. Now, since the second order distributional derivative is a function, the same is true for the first order distributional derivatives; they belong to $L^p$ and thus $u \in W^{2,p}(\mathbb{R}^d)$. This is a consequence of the Gagliardo–Nirenberg interpolation inequality [Nir59]
\[
\|\nabla u\|_{L^p(\mathbb{R}^d,\mathbb{R}^d)} \le C \|\nabla^2 u\|_{L^p(\mathbb{R}^d,\mathbb{R}^{d\times d})}^{1/2} \|u\|_{L^p(\mathbb{R}^d)}^{1/2}, \tag{5.52}
\]
where the constant $C$ depends only on $d$.

We now proceed to the proof of (5.51). Supposing again that $u \in \mathrm{BV}^2(\mathbb{R}^d)$, Lemma 5.5.1 gives us
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^d} |H_n u(x)|\,dx \le C\, |D^2u|(\mathbb{R}^d).
\]
Suppose now that
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^d} |H_n u(x)|\,dx < \infty.
\]
Considering again the measures $\mu_n = H_n u\,\mathcal{L}^d$, we have that there exist a subsequence $(\mu_{n_k})_{k\in\mathbb{N}}$ and a finite Radon measure $\mu$ such that $\mu_{n_k} \to \mu$ weakly$^\ast$. Then for every $\psi \in C_c^\infty(\mathbb{R}^d,\mathbb{R}^{d\times d})$ we have, similarly as before,
\[
\int_{\mathbb{R}^d} \psi\cdot d\mu = \lim_{k\to\infty}\int_{\mathbb{R}^d} H_{n_k}u(x)\cdot\psi(x)\,dx = \lim_{k\to\infty}\int_{\mathbb{R}^d} u(x)\,D_{n_k}^2\psi(x)\,dx = \int_{\mathbb{R}^d} u(x)\,\mathrm{div}^2\psi(x)\,dx,
\]
which shows that $\mu = D^2u$. Thus $u \in \mathrm{BV}^2(\mathbb{R}^d)$, since if the second order distributional derivative is a finite Radon measure then the first order one is an $L^1$ function. This is again due to a special form of the Gagliardo–Nirenberg interpolation inequality, this time for $W^{2,1}$ functions. In order to see that, notice first that from [TS80] we have that $Du \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$. Fix an $\Omega \Subset \mathbb{R}^d$; then we have that $u|_\Omega \in \mathrm{BV}^2(\Omega)$. From Theorem 3.3.7 we can find a sequence $(u_n)_{n\in\mathbb{N}}$ in $W^{2,1}(\Omega)$ that converges to $u|_\Omega$ strictly in $\mathrm{BV}^2(\Omega)$. From the definition of strict convergence, together with the lower semicontinuity of the total variation with respect to strong $L^1$ convergence, we have
\begin{align*}
\|\nabla (u|_\Omega)\|_{L^1(\Omega)} &\le \liminf_{n\to\infty} \|\nabla u_n\|_{L^1(\Omega)} \le \liminf_{n\to\infty} C \|\nabla^2 u_n\|_{L^1(\Omega,\mathbb{R}^{d\times d})}^{1/2} \|u_n\|_{L^1(\Omega)}^{1/2} \\
&= C \|\nabla^2 (u|_\Omega)\|_{L^1(\Omega,\mathbb{R}^{d\times d})}^{1/2} \|u|_\Omega\|_{L^1(\Omega)}^{1/2} \le C \|\nabla^2 u\|_{L^1(\mathbb{R}^d,\mathbb{R}^{d\times d})}^{1/2} \|u\|_{L^1(\mathbb{R}^d)}^{1/2},
\end{align*}
from which we deduce that $\nabla u \in L^1(\mathbb{R}^d)$ with
\[
\|\nabla u\|_{L^1(\mathbb{R}^d)} \le C \|\nabla^2 u\|_{L^1(\mathbb{R}^d,\mathbb{R}^{d\times d})}^{1/2} \|u\|_{L^1(\mathbb{R}^d)}^{1/2},
\]
and hence $u \in \mathrm{BV}^2(\mathbb{R}^d)$.

5.7 An alternative non-local Hessian approach and an application to image denoising

In the previous sections we saw that the explicit formulation of the non-local Hessian defined in (5.1) has good analytical properties, i.e., the sequence $(H_n u)_{n\in\mathbb{N}}$ converges to the local analogue $\nabla^2 u$ as the non-locality vanishes. It also led to novel non-local characterisations of the spaces $W^{2,p}(\mathbb{R}^d)$ and $\mathrm{BV}^2(\mathbb{R}^d)$. However, the formulation (5.1) is not so convenient for implementation in variational problems in imaging. One reason is that it cannot be defined straightforwardly for functions $u : \Omega \to \mathbb{R}$, where $\Omega$ is a bounded domain, as the point $x + (x-y)$ does not necessarily belong to $\Omega$ for $x, y \in \Omega$.
Moreover, the radial symmetry of the weighting functions $\rho_n$ is in practice not desirable for imaging purposes. For example, given that a point $x$ of an image $u$ lies next to an edge, we would like to have the freedom to define a form of non-local Hessian of $u$ at $x$ using a weighting function that puts more weight on points that lie on the same side of the edge as $x$, see Figure 5.2. That would eventually lead to an edge-preserving regularisation model.

Figure 5.2: Radially symmetric weighting function versus a weighting function aware of discontinuities. (a) A point near an edge; (b) weighting of the neighbouring pixels corresponding to a radially symmetric weighting function; (c) weighting of the neighbouring pixels corresponding to a weighting function aware of discontinuities.

To that end we define an implicit formulation of a non-local Hessian $H_{\sigma_x}u(x)$ and a non-local gradient $G_{\sigma_x}u(x)$ of a function $u$ at a point $x$ with a weighting function $\sigma_x$ as follows:

Definition 5.7.1 (Implicit definition of non-local Hessian). Let $\Omega \subseteq \mathbb{R}^d$ be an open bounded domain and $u \in L^2(\Omega)$. We define $G_{\sigma_x}u(x)$ and $H_{\sigma_x}u(x)$ to be the non-local gradient and Hessian, respectively, of $u$ at a point $x \in \Omega$ with a positive weight function $\sigma_x \in L^\infty(\Omega - \{x\})$ as the minimisers of
\[
(H_{\sigma_x}u(x), G_{\sigma_x}u(x)) := \operatorname*{argmin}_{H' \in \mathbb{R}^{d\times d},\, G' \in \mathbb{R}^d} \frac{1}{2} \int_{\Omega - \{x\}} R_{u,G',H'}(x,z)^2\, \sigma_x(z)\,dz, \tag{5.53}
\]
where
\[
R_{u,G',H'}(x,z) = u(x+z) - u(x) - G'^\top z - \tfrac{1}{2} z^\top H' z. \tag{5.54}
\]

The idea behind Definition 5.7.1 is that at a given point $x$ we choose $H_{\sigma_x}u(x)$ and $G_{\sigma_x}u(x)$ so that the remaining values of $u$ best fit a quadratic model with gradient $G_{\sigma_x}u(x)$ and Hessian $H_{\sigma_x}u(x)$ located at $x$.

In this section we work with the discretised version of (5.53)–(5.54) in dimension 2, i.e., the integral in (5.53) is interpreted as a sum and $\Omega$ as an $N \times M$ grid. We can now define the corresponding non-local Hessian regularised minimisation problems as follows:

$L^s$–H regularisation
\[
\min_{u \in \mathbb{R}^{N\times M}} \frac{1}{s}\|u - f\|_s^s + \alpha \|H_\sigma u\|_1, \quad \alpha > 0,\ s = 1, 2, \tag{5.55}
\]
subject to
\[
(H_{\sigma_x}u(x), G_{\sigma_x}u(x)) = \operatorname*{argmin}_{H' \in \mathbb{R}^{2\times 2},\, G' \in \mathbb{R}^2} \frac{1}{2} \int_{\Omega - \{x\}} R_{u,G',H'}(x,z)^2\, \sigma_x(z)\,dz, \tag{5.56}
\]
\[
R_{u,G',H'}(x,z) = u(x+z) - u(x) - G'^\top z - \tfrac{1}{2} z^\top H' z. \tag{5.57}
\]

Note that the optimality conditions of the quadratic minimisation problem (5.56)–(5.57) are linear. Thus, essentially, problem (5.55)–(5.57) is an optimisation problem with a convex objective and linear constraints. We can solve this problem, for instance, using CVX in MATLAB, a package for specifying and solving convex programs [GB14, GB08].

Our next concern is how to choose the weighting function $\sigma_x$ at a point $x$. As we have discussed previously, ideally this function has to be chosen in such a way that points that lie across an edge have low or zero weights. We do that by first defining a positive cost function $c_x : \Omega \to \mathbb{R}$, where $c_x(y)$ must be high when we cannot move from $x$ to $y$ without passing through points with large gradient (an edge). Such a function can be defined by solving a weighted Eikonal equation
\[
\|\nabla c_x\|_2 = \phi(\nabla f), \quad c_x(x) = 0, \tag{5.58}
\]
where $\phi(\nabla f) = \|\nabla f\|_2^2 + \epsilon$, for a small $\epsilon > 0$. Equation (5.58) can be solved efficiently using a fast marching method [Set99, OF03]. Defining $c_x$ as above, we have that $c_x(y)$ is high if $x$ and $y$ are separated by a strong edge.
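Since the residual (5.54) is linear in the entries of $G'$ and $H'$, the inner minimisation (5.53) is, for fixed $u$ and fixed weights $\sigma_x$, a small weighted linear least squares problem at each pixel. The following Python sketch illustrates this fit at a single pixel of a discrete image. It is only an illustrative sketch, not the MATLAB/CVX implementation referred to above: the function name, the restriction to a symmetric $H'$, and the decision to simply discard offsets that leave the image grid are all assumptions made here for the sake of a compact example.

\begin{verbatim}
import numpy as np

def nonlocal_gradient_hessian(u, i, j, offsets, weights):
    """Weighted least-squares fit of a local quadratic model at pixel (i, j),
    i.e. a discrete version of the minimisation in (5.53)-(5.54).

    offsets: iterable of integer offsets z = (dz1, dz2)
    weights: the corresponding values sigma_x(z) >= 0
    Returns (H, G) with H a symmetric 2x2 array and G a length-2 array.
    """
    rows, rhs, w = [], [], []
    for (dz1, dz2), sigma in zip(offsets, weights):
        y1, y2 = i + dz1, j + dz2
        if sigma <= 0 or not (0 <= y1 < u.shape[0] and 0 <= y2 < u.shape[1]):
            continue  # zero-weight offsets and offsets leaving the grid are skipped
        # Unknowns: [G1, G2, H11, H12, H22] (H symmetric), so that
        # G'^T z + 0.5 * z^T H' z is linear in the unknowns.
        rows.append([dz1, dz2, 0.5 * dz1 ** 2, dz1 * dz2, 0.5 * dz2 ** 2])
        rhs.append(u[y1, y2] - u[i, j])
        w.append(np.sqrt(sigma))
    A = np.asarray(rows, dtype=float) * np.asarray(w)[:, None]  # weighted design matrix
    b = np.asarray(rhs, dtype=float) * np.asarray(w)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    G = coef[:2]
    H = np.array([[coef[2], coef[3]], [coef[3], coef[4]]])
    return H, G
\end{verbatim}

Because the optimality conditions of this per-pixel fit are linear in $u$, imposing it as the constraint (5.56) keeps the overall problem (5.55)–(5.57) a convex program with linear constraints, which is what allows the use of a generic package such as CVX.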
For efficiency in the numerical implementation we do not use all the points $y \in \Omega$ for the formulation of $H_{\sigma_x}u(x)$, but only the $K$ ones with the shortest distance $c_x(y)$ to $x$. That is to say, we sort the neighbours $y_1, y_2, \ldots$ of $x$ such that $c_x(y_1) \le c_x(y_2) \le \cdots$ and we define the weighting function $\sigma_x$ as follows:
\[
\sigma_x(y_i) =
\begin{cases}
\dfrac{1}{(c_x(y_i))^2} & \text{if } i \le K,\\[4pt]
0 & \text{if } i > K.
\end{cases} \tag{5.59}
\]
The number $K$ can be predefined; for our numerical implementations we use $K = 12$, while we also set $\epsilon = 0.01$.
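The weight construction (5.59) is straightforward to express in code. The sketch below, again in Python and again only a hedged illustration of the procedure just described rather than the implementation used for the experiments, assumes the cost map $c_x$ has already been computed, for instance by a fast marching solver applied to (5.58); the function name and the choice to identify the reference pixel through its zero cost are assumptions made here for illustration.

\begin{verbatim}
import numpy as np

def edge_aware_weights(c_x, K=12):
    """Weights sigma_x from a cost map c_x, following (5.59).

    c_x: 2D array with c_x = 0 at the reference pixel x and larger values at
    pixels separated from x by strong edges.  Only the K neighbours with the
    smallest positive cost receive the weight 1 / c_x(y)^2; all other weights
    are set to zero.
    """
    costs = c_x.ravel()
    sigma = np.zeros_like(costs, dtype=float)
    order = np.argsort(costs)               # pixels sorted by increasing cost
    order = order[costs[order] > 0][:K]     # drop the reference pixel, keep the K cheapest
    sigma[order] = 1.0 / costs[order] ** 2
    return sigma.reshape(c_x.shape)
\end{verbatim}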
Figure 5.3: Denoising example for Gaussian noise of variance 0.005. (a) Clean image; (b) noisy image, SSIM = 0.3261, PSNR = 22.82; (c) TV denoised image, SSIM = 0.8979, PSNR = 31.82; (d) TGV denoised image, SSIM = 0.9249, PSNR = 33.29; (e) $L^2$–H denoised image, $\alpha = 0.2$, SSIM = 0.9262, PSNR = 34.39; (f) $L^1$–H denoised image, $\alpha = 1.2$, SSIM = 0.9766, PSNR = 35.97.

Figure 5.4: Corresponding middle row slices for the results of Figure 5.3.

Figure 5.3 depicts a denoising example for the same synthetic image we used in Section 3.6 (Gaussian noise of variance 0.005). In Figure 5.4 we show the corresponding middle row slices. We observe that the $L^2$ fidelity based non-local Hessian denoised image is of higher quality than the TGV denoised one, as confirmed by both the PSNR and SSIM quality measures, see Figure 5.3(e). Notice, however, that we can achieve a far better result using the $L^1$ norm in the fidelity term, Figure 5.3(f), despite the fact that the noise is Gaussian. This is because this special selection of the weights $\sigma_x$ allows the use of a quite high regularisation parameter ($\alpha = 1.2$ in this example) while still preserving the edges. At the same time, the $L^1$ fidelity term is capable of preserving the contrast better than the $L^2$ one; compare the slices in Figures 5.4(e) and 5.4(f).

Chapter 6
Conclusions

Higher order variational regularisation methods have been an active field of research in the mathematical imaging community. These kinds of methods have been proposed to tackle both classical and modern image processing tasks. The high quality of the reconstructions makes them appealing for a variety of real world applications. At the same time, novel mathematical techniques are being developed, aimed at implementing these methods as well as analysing their properties. We provide here a short summary of how the present thesis has contributed to this field and we give some directions for future research.

In Chapter 3 we introduced a combined first and second order method for image restoration. After a rigorous analysis of the well-posedness of the model, we applied the method to image denoising, deblurring and inpainting, also providing an online user-friendly implementation for the latter task on the IPOL website. The simplicity of the model combined with the split Bregman algorithm leads to a very efficient implementation, indicating the suitability of the method as a fast pre-processing technique that approaches the state of the art results as far as restoration quality is concerned. We think that a full automatisation of the method, i.e., automatic selection of the optimal parameters $\alpha$ and $\beta$, is a useful research direction that will further enhance the applicability of the method. Among other techniques, this optimal parameter learning can be achieved, for instance, through bilevel optimisation approaches [DlRS12, KP13, CDlRS14].

In Chapter 4 we contributed to the analysis of the second order total generalised variation model, a state of the art higher order regulariser, focusing mainly on the one dimensional case. We exhibited results concerning the structure of solutions of the $L^2$–TGV problem as well as the relationship of TV and TGV in dimensions one and two. We computed exact solutions for simple data functions, providing further insights into the regularising mechanism of TGV. A future research direction involves the extension of these results to higher dimensions and the investigation of the analytical properties of the model for other imaging tasks, e.g., deblurring. Identifying ground states for TGV in the spirit of [BB13] is another possibility.

Finally, in Chapter 5 we introduced and analysed a non-local Hessian functional. We proved the convergence, as the non-locality vanishes, of the non-local Hessian of a function $u$ to the continuous analogue with respect to the natural topology that corresponds to the regularity of $u$. This analysis led to novel characterisations of some higher order Sobolev and BV spaces, namely $W^{2,p}(\mathbb{R}^d)$ and $\mathrm{BV}^2(\mathbb{R}^d)$. Defining a non-local Hessian in an alternative, implicit way resulted in a model that can be easily handled numerically for image processing purposes. We gave a flavour of this application in image denoising, showing that the model is capable of obtaining true piecewise affine reconstructions, being, in that sense, superior to TGV. Future work involves the investigation of the application of the method to other imaging tasks, as well as a study of the relationship between the explicit and the implicit formulation of the non-local Hessian functional.

Appendix A

A.1 Exclusion of the cases N-0-P2-J, N-0-P1-C, N-0-P2-C

In Section 4.6.1 we mentioned that solutions of the types N-0-P1-J, N-0-P2-J, N-0-P1-C and N-0-P2-C cannot occur when we consider the one dimensional $L^2$–$\mathrm{TGV}^2_{\beta,\alpha}$ minimisation problem with data $f_{\mathrm{p.c.}}$. In Proposition 4.6.2 we proved that fact for solutions of the type N-0-P1-J. Here we do the same for the cases N-0-P2-J, N-0-P1-C and N-0-P2-C in Propositions A.1.1, A.1.2 and A.1.3 respectively.

Proposition A.1.1. The case N-0-P2-J cannot occur.

Proof. Suppose that there exist points $0 < x_1 < \tilde{x}_1 < x_2 < L$ such that $u < 0$ in $(0, x_1)$, $u = 0$ in $(x_1, \tilde{x}_1)$ and $u > 0$ in $(\tilde{x}_1, L)$. Suppose further that $u'$ has a positive jump at $x_2$ and $u$ has a jump discontinuity at $x = L$. Similarly to the proof of Proposition 4.6.2 we can show that the gradient of $u$ in $(0, x_1)$ and in $(\tilde{x}_1, x_2)$ is the same, say equal to $c$, and that $w = c$ in $(0, x_2)$. As in Proposition 4.6.2 we set $\ell_1 = x_1$, $\ell_2 = \tilde{x}_1 - x_1$, $\ell_3 = x_2 - \tilde{x}_1$, $\ell_4 = L - x_2$ with $\ell_1 + \ell_2 + \ell_3 + \ell_4 = L$, and $v_1 = v|_{(0,x_1)}$, $v_2 = v|_{(x_1,\tilde{x}_1)}$, $v_3 = v|_{(\tilde{x}_1,x_2)}$, $v_4 = v|_{(x_2,L)}$. As in Proposition 4.6.2 we have
\[
v_1(x) = -\frac{\alpha}{3\ell_1^2}x^3 + \frac{\alpha}{\ell_1}x^2, \quad x \in (0, x_1), \tag{A.1}
\]
\[
v_2(x) = \alpha(x - \ell_1) + \frac{2\alpha\ell_1}{3}, \quad x \in (x_1, \tilde{x}_1), \tag{A.2}
\]
\[
v_3(x) = -\frac{\alpha}{3\ell_1^2}(x - \ell_1 - \ell_2)^3 + \alpha(x - \ell_1 - \ell_2) + \alpha\ell_2 + \frac{2\alpha\ell_1}{3}, \quad x \in (\tilde{x}_1, x_2). \tag{A.3}
\]
Now the extra conditions for $v_3$ are
\[
v_3'(\ell_1 + \ell_2 + \ell_3) = 0, \tag{A.4}
\]
and
\[
v_3(\ell_1 + \ell_2 + \ell_3) = \beta. \tag{A.5}
\]
Condition (A.4) gives
\[
\ell_3 = \ell_1, \tag{A.6}
\]
while condition (A.5) in combination with (A.6) gives
\[
\frac{4}{3}\ell_1 + \ell_2 = \frac{\beta}{\alpha}. \tag{A.7}
\]
The function $v_4$ will also be a cubic function, $v_4(x) = a_4 x^3 + b_4 x^2 + c_4 x + d_4$, $x \in (x_2, L)$. It must satisfy the following conditions at $x = \ell_1 + \ell_2 + \ell_3$:
\[
v_4(\ell_1+\ell_2+\ell_3) = \beta, \quad v_4'(\ell_1+\ell_2+\ell_3) = 0, \quad v_4''(\ell_1+\ell_2+\ell_3) = v_3''(\ell_1+\ell_2+\ell_3) = -\frac{2\alpha}{\ell_1}, \tag{A.8}
\]
from which we get that
\[
v_4(x) = a_4(x - \ell_1 - \ell_2 - \ell_3)^3 - \frac{\alpha}{\ell_1}(x - \ell_1 - \ell_2 - \ell_3)^2 + \beta, \tag{A.9}
\]
plus the following conditions at $x = \ell_1 + \ell_2 + \ell_3 + \ell_4$:
\[
v_4(\ell_1+\ell_2+\ell_3+\ell_4) = 0, \quad v_4'(\ell_1+\ell_2+\ell_3+\ell_4) = -\alpha, \tag{A.10}
\]
from which, in combination with (A.9), we get
\[
a_4\ell_4^3 - \frac{\alpha}{\ell_1}\ell_4^2 + \beta = 0, \tag{A.11}
\]
\[
3a_4\ell_4^2 - \frac{2\alpha}{\ell_1}\ell_4 = -\alpha. \tag{A.12}
\]
Multiplying (A.12) by $\ell_4$ and using (A.11) we end up with
\[
\ell_4^2 + \ell_1\ell_4 - \gamma\ell_1 = 0, \tag{A.13}
\]
where we define $\gamma := 3\beta/\alpha$. Solving (A.13) with respect to $\ell_4$ we get the positive solution
\[
\ell_4 = \frac{\sqrt{\ell_1^2 + 4\gamma\ell_1} - \ell_1}{2}. \tag{A.14}
\]
We now use the fact that $\ell_1 + \ell_2 + \ell_3 + \ell_4 = L$ in combination with (A.6), (A.7) and (A.14) and we get
\[
\frac{\sqrt{\ell_1^2 + 4\gamma\ell_1}}{2} = L - \frac{\gamma}{3} - \frac{\ell_1}{6}. \tag{A.15}
\]
Squaring both sides of (A.15), after a bit of algebra we end up with the following quadratic equation for $\ell_1$:
\[
2\ell_1^2 + \ell_1(8\gamma + 3L) - (3L - \gamma)^2 = 0, \tag{A.16}
\]
whose only possible positive root is
\[
\ell_1 = \frac{3\sqrt{9L^2 + 8\gamma^2} - (8\gamma + 3L)}{4}. \tag{A.17}
\]
However, we must make sure that $\ell_2 > 0$; thus from (A.7) we must impose
\[
0 < \ell_1 < \frac{\gamma}{4}. \tag{A.18}
\]
We claim now that the condition $|v_4'| \le \alpha$ is violated. Notice firstly that $a_4 > 0$. Indeed, from (A.12) we have that $a_4 > 0$ if and only if $\ell_4 > \ell_1/2$, and using (A.14) this holds if and only if $\ell_1 < \frac{4}{3}\gamma$, which is true from (A.18). Using that, together with the facts that $v_4'(\ell_1+\ell_2+\ell_3) = 0$ and $v_4'(L) = -\alpha$, if $v_4''(x_0) = 0$ for some $x_0 \in (\ell_1 + \ell_2 + \ell_3, L)$ then $v_4'(x_0) < v_4'(L) = -\alpha$ and we are done. Indeed, $v_4''$ vanishes at $x_0 = \ell_1 + \ell_2 + \ell_3 + \frac{\alpha}{3a_4\ell_1}$, and we have that $x_0 < L$ if and only if $\ell_4 > \ell_1$, which again from (A.14) holds if and only if $\ell_1 < \frac{\gamma}{2}$, which is true from (A.18). We conclude that this type of solution is not possible either.

Proposition A.1.2. The case N-0-P1-C cannot occur.

Proof. Suppose that there exist points $0 < x_1 < \tilde{x}_1 < L$ such that $u < 0$ in $(0, x_1)$, $u = 0$ in $(x_1, \tilde{x}_1)$ and $u > 0$ in $(\tilde{x}_1, L)$. Suppose further that $u'$ is constant in $(\tilde{x}_1, L)$ and $u$ is continuous at $x = L$. Similarly to the proofs of Propositions 4.6.2 and A.1.1 we can show that the gradient of $u$ in $(0, x_1)$ and in $(\tilde{x}_1, L)$ is the same, say equal to $c$, and that $w = c$ in $(0, L)$. Note that due to symmetry the condition $u(L) = h/2$ holds. As in Propositions 4.6.2 and A.1.1 we have, setting $\ell_1 = x_1$, $\ell_2 = \tilde{x}_1 - x_1$, $\ell_3 = L - \tilde{x}_1$,
\[
v_1(x) = -\frac{\alpha}{3\ell_1^2}x^3 + \frac{\alpha}{\ell_1}x^2, \quad x \in (0, x_1), \tag{A.19}
\]
\[
v_2(x) = \alpha(x - \ell_1) + \frac{2\alpha\ell_1}{3}, \quad x \in (x_1, \tilde{x}_1), \tag{A.20}
\]
\[
v_3(x) = -\frac{\alpha}{3\ell_1^2}(x - \ell_1 - \ell_2)^3 + \alpha(x - \ell_1 - \ell_2) + \alpha\ell_2 + \frac{2\alpha\ell_1}{3}, \quad x \in (\tilde{x}_1, L), \tag{A.21}
\]
and $v_3$ satisfies the following conditions at $x = L = \ell_1 + \ell_2 + \ell_3$:
\[
v_3(\ell_1+\ell_2+\ell_3) = 0 \iff -\frac{\ell_3^3}{3\ell_1^2} + \ell_3 + \ell_2 + \frac{2\ell_1}{3} = 0, \tag{A.22}
\]
\[
|v_3'(\ell_1+\ell_2+\ell_3)| \le \alpha \iff 0 \le \frac{\ell_3^2}{\ell_1^2} \le 2, \tag{A.23}
\]
\[
-v_3''(\ell_1+\ell_2+\ell_3) = \frac{h}{2} \iff \ell_1^2 = \frac{4\alpha\ell_3}{h}. \tag{A.24}
\]
Combining (A.23) and (A.24) we get
\[
0 \le \ell_3 \le \frac{8\alpha}{h}. \tag{A.25}
\]
Moreover, combining (A.22) and (A.24) we get
\[
-\frac{h\ell_3^2}{12\alpha} + \ell_3 + \ell_2 + \frac{2\ell_1}{3} = 0. \tag{A.26}
\]
However, one can easily check that the term $-\frac{h\ell_3^2}{12\alpha} + \ell_3$ is positive if and only if $0 \le \ell_3 \le \frac{12\alpha}{h}$, which is the case from condition (A.25); thus (A.26) gives a contradiction. We conclude that this type of solution cannot occur either.

Proposition A.1.3. The case N-0-P2-C cannot occur.

Proof.
Suppose that there exist points $0 < x_1 < \tilde{x}_1 < x_2 < L$ such that $u < 0$ in $(0, x_1)$, $u = 0$ in $(x_1, \tilde{x}_1)$ and $u > 0$ in $(\tilde{x}_1, L)$. Suppose further that $u'$ has a positive jump at $x_2$ and $u$ is continuous at $x = L$. As in the previous cases we have
\[
v_1(x) = -\frac{\alpha}{3\ell_1^2}x^3 + \frac{\alpha}{\ell_1}x^2, \quad x \in (0, x_1), \tag{A.27}
\]
\[
v_2(x) = \alpha(x - \ell_1) + \frac{2\alpha\ell_1}{3}, \quad x \in (x_1, \tilde{x}_1), \tag{A.28}
\]
\[
v_3(x) = -\frac{\alpha}{3\ell_1^2}(x - \ell_1 - \ell_2)^3 + \alpha(x - \ell_1 - \ell_2) + \alpha\ell_2 + \frac{2\alpha\ell_1}{3}, \quad x \in (\tilde{x}_1, x_2), \tag{A.29}
\]
with $v_3$ satisfying the following conditions:
\[
v_3'(\ell_1+\ell_2+\ell_3) = 0 \iff \ell_3 = \ell_1, \tag{A.30}
\]
\[
v_3(\ell_1+\ell_2+\ell_3) = \beta \iff \frac{4}{3}\ell_1 + \ell_2 = \frac{\beta}{\alpha}. \tag{A.31}
\]
Finally, as before, $v_4$ will be of the form
\[
v_4(x) = a_4(x - \ell_1 - \ell_2 - \ell_3)^3 - \frac{\alpha}{\ell_1}(x - \ell_1 - \ell_2 - \ell_3)^2 + \beta, \quad x \in (x_2, L), \tag{A.32}
\]
with the following conditions at $x = L = \ell_1 + \ell_2 + \ell_3 + \ell_4$:
\[
v_4(\ell_1+\ell_2+\ell_3+\ell_4) = 0 \iff a_4\ell_4^3 - \frac{\alpha\ell_4^2}{\ell_1} + \beta = 0, \tag{A.33}
\]
\[
|v_4'(\ell_1+\ell_2+\ell_3+\ell_4)| \le \alpha \iff \left| 3a_4\ell_4^2 - \frac{2\alpha\ell_4}{\ell_1} \right| \le \alpha, \tag{A.34}
\]
\[
-v_4''(\ell_1+\ell_2+\ell_3+\ell_4) = \frac{h}{2} \iff -6a_4\ell_4 + \frac{2\alpha}{\ell_1} = \frac{h}{2}. \tag{A.35}
\]
Combining (A.34) and (A.35) we end up with
\[
\frac{\ell_4}{\ell_1} + \frac{h\ell_4}{4\alpha} \le 1, \tag{A.36}
\]
which in particular implies that
\[
\ell_4 < \ell_1. \tag{A.37}
\]
Observe now that, because $\ell_2 > 0$, from (A.31) we get that $\ell_1 < \gamma/4$. But now, since $v_4(\ell_1+\ell_2+\ell_3) = \beta$ and $v_4(\ell_1+\ell_2+\ell_3+\ell_4) = 0$, from the mean value theorem and the fact that the derivative of $v_4$ is decreasing we have that
\[
v_4'(\ell_1+\ell_2+\ell_3+\ell_4) < -\frac{\beta}{\ell_4}. \tag{A.38}
\]
Moreover,
\[
\ell_4 < \ell_1 < \frac{\gamma}{4} \;\Rightarrow\; -\frac{\beta}{\ell_4} < -\frac{4\alpha}{3} < -\alpha,
\]
and thus from (A.38) we have that $v_4'(L) < -\alpha$, which is a contradiction. Thus this kind of solution is not possible either.

A.2 Some useful Lemmas

The following lemma (pointed out by Luca Calatroni) was used in the proof of Theorem 3.5.1.

Lemma A.2.1 (Kronecker's lemma). Suppose that $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ are two sequences of real numbers such that $\sum_{n=1}^{\infty} a_n < \infty$ and $0 < b_1 \le b_2 \le \cdots$ with $b_n \to \infty$. Then
\[
\frac{1}{b_n}\sum_{k=1}^{n} b_k a_k \to 0, \quad \text{as } n \to \infty.
\]
In particular, if $(c_n)_{n\in\mathbb{N}}$ is a decreasing positive real sequence such that $\sum_{n=1}^{\infty} c_n^2 < \infty$, then
\[
c_n \sum_{k=1}^{n} c_k \to 0, \quad \text{as } n \to \infty.
\]
(The second statement follows from the first by choosing $a_n = c_n^2$ and $b_n = 1/c_n$; note that $c_n \to 0$ since $\sum_{n=1}^{\infty} c_n^2 < \infty$, so that $b_n \to \infty$.)

The following lemma was used in the proof of Lemma 5.2.1.

Lemma A.2.2. Let $d \ge 2$. Then for every $i, j = 1, \ldots, d$ with $i \ne j$ and for every $R > 0$ we have
\[
\int_{|z|<R}
\]

Proof. We consider separately the cases $d = 2$ and $d \ge 3$.

$d = 2$: Employing the transformation $z_1 = r\cos\theta$, $z_2 = r\sin\theta$, we have that
\[
\int_{|z|<R}
\]

$d \ge 3$: For this case we employ the hyperspherical coordinate transformation
\begin{align*}
z_1 &= r\cos\theta_1, \\
z_2 &= r\sin\theta_1\cos\theta_2, \\
z_3 &= r\sin\theta_1\sin\theta_2\cos\theta_3, \\
&\;\;\vdots \\
z_{d-1} &= r\sin\theta_1\cdots\sin\theta_{d-2}\cos\theta_{d-1}, \\
z_d &= r\sin\theta_1\cdots\sin\theta_{d-2}\sin\theta_{d-1}.
\end{align*}
Here $\theta_1, \theta_2, \ldots, \theta_{d-2}$ range over $[0, \pi)$ while $\theta_{d-1}$ ranges over $[0, 2\pi)$. The determinant of the Jacobian of this transformation is $r^{d-1}\sin^{d-2}\theta_1 \sin^{d-3}\theta_2 \cdots \sin\theta_{d-2}$, thus we can calculate
\[
\int_{|z|<R}
\]