This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

# Introduction to Functional Analysis

### December 6, 2021

Abstract: These are lecture notes for several courses on Functional Analysis at the School of Mathematics of the University of Leeds. They are based on the notes of Dr. Matt Daws, Prof. Jonathan R. Partington and Dr. David Salinger used in previous years. Some sections are borrowed from the textbooks which I used since being a student myself. However, all misprints, omissions, and errors are solely my responsibility. I am very grateful to Filipa Soares de Almeida, Eric Borgnet, and Pasc Gavruta for pointing out some of them. Please let me know if you find more.

The notes are also available for download in PDF.

The suggested textbooks are [, , , ]. Other nice books with many interesting problems are [, ].

Exercises with stars are not part of the mandatory material but are nevertheless worth hearing about. They are not necessarily difficult: try to solve them!

## Notations and Assumptions

ℤ+, ℝ+ denote the non-negative integers and reals.
x, y, z, … denote vectors.
λ, µ, ν, … denote scalars.
ℜz, ℑz stand for the real and imaginary parts of a complex number z.

### Integrability conditions

In this course, the functions we consider will be real or complex valued functions defined on the real line which are locally Riemann integrable. This means that they are Riemann integrable on any finite closed interval [a,b]. (A complex valued function is Riemann integrable iff its real and imaginary parts are Riemann-integrable.) In practice, we shall be dealing mainly with bounded functions that have only a finite number of points of discontinuity in any finite interval. We can relax the boundedness condition to allow improper Riemann integrals, but we then require the integral of the absolute value of the function to converge.

We mention this right at the start to get it out of the way. There are many fascinating subtleties connected with Fourier analysis, but those connected with technical aspects of integration theory are beyond the scope of the course. It turns out that one needs a “better” integral than the Riemann integral: the Lebesgue integral. I commend the module Linear Analysis 1, available to MM students, which includes an introduction to that topic (or you could look it up in Real and Complex Analysis by Walter Rudin). Once one has the Lebesgue integral, one can start thinking about the different classes of functions to which Fourier analysis applies: the modern theory (not available to Fourier himself) can even go beyond functions and deal with generalized functions (distributions) such as the Dirac delta function, which may be familiar to some of you from quantum theory.

From now on, when we say “function”, we shall assume the conditions of the first paragraph, unless anything is stated to the contrary.

## 1 Motivating Example: Fourier Series

### 1.1  Fourier series: basic notions

Before proceeding with the abstract theory we consider a motivating example: Fourier series.

#### 1.1.1 2π-periodic functions

In this part of the course we deal with functions (as above) that are periodic.

We say a function f: ℝ→ℂ is periodic with period T>0 if f(x+T) = f(x) for all x∈ℝ. For example, sin x, cos x, e^{ix} (= cos x + i sin x) are periodic with period 2π. For k∈ℝ∖{0}, sin kx, cos kx, and e^{ikx} are periodic with period 2π/|k|. Constant functions are periodic with period T for any T>0. We shall specialise to periodic functions with period 2π: we call them 2π-periodic functions, for short. Note that cos nx, sin nx and e^{inx} are 2π-periodic for n∈ℤ. (Of course these are also 2π/|n|-periodic.)

Any half-open interval of length T is a fundamental domain of a periodic function f of period T. Once you know the values of f on the fundamental domain, you know them everywhere, because any point x in ℝ can be written uniquely as x = w + nT where n∈ℤ and w is in the fundamental domain. Thus f(x) = f(w+nT) = f(w+(n−1)T+T) = ⋯ = f(w+T) = f(w).

For 2π-periodic functions, we shall usually take the fundamental domain to be ]−π, π]. By abuse of language, we shall sometimes refer to [−π, π] as the fundamental domain. We then have to be aware that f(π)=f(−π).

#### 1.1.2 Integrating the complex exponential function

We shall need to calculate ∫_a^b e^{ikx} dx for k∈ℝ. Note first that when k=0 the integrand is the constant function 1, so the result is b−a. For non-zero k,

$$ \int_a^b e^{ikx}\,dx = \int_a^b (\cos kx + i\sin kx)\,dx = \frac{1}{k}\bigl[\sin kx - i\cos kx\bigr]_a^b = \frac{1}{ik}\bigl[\cos kx + i\sin kx\bigr]_a^b = \frac{1}{ik}\bigl[e^{ikx}\bigr]_a^b = \frac{1}{ik}\bigl(e^{ikb}-e^{ika}\bigr). $$

Note that this is exactly the result you would have got by treating i as a real constant and using the usual formula for integrating e^{ax}. Note also that the cases k=0 and k≠0 have to be treated separately: this is typical.
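The closed form is easy to sanity-check numerically. The following Python sketch (the helper name `integral_exp` is ours, used only for illustration) compares a midpoint-rule approximation of ∫_a^b e^{ikx} dx with (1/ik)(e^{ikb} − e^{ika}):

```python
import cmath

def integral_exp(k, a, b, steps=50_000):
    """Midpoint-rule approximation of the integral of e^{ikx} over [a, b]."""
    h = (b - a) / steps
    return h * sum(cmath.exp(1j * k * (a + (j + 0.5) * h)) for j in range(steps))

k, a, b = 3.0, -1.0, 2.0
closed_form = (cmath.exp(1j * k * b) - cmath.exp(1j * k * a)) / (1j * k)
assert abs(integral_exp(k, a, b) - closed_form) < 1e-6
# k = 0 must be handled separately: the integral is just b - a.
assert abs(integral_exp(0.0, a, b) - (b - a)) < 1e-12
```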

Definition 1 Let f: ℝ→ℂ be a 2π-periodic function which is Riemann integrable on [−π, π]. For each n∈ℤ we define the Fourier coefficient f̂(n) by

$$ \hat f(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{-inx}\,dx. $$
Remark 2
1. f̂(n) is a complex number whose modulus is the amplitude and whose argument is the phase (of that component of the original function).
2. If f and g are Riemann integrable on an interval, then so is their product, so the integral is well-defined.
3. The constant 1/(2π) before the integral divides by the length of the interval of integration.
4. We could replace the range of integration by any interval of length 2π without altering the result, since the integrand is 2π-periodic.
5. Note the minus sign in the exponent of the exponential. The reason for this will soon become clear.
Example 3
1. If f(x) = c then f̂(0) = c and f̂(n) = 0 when n≠0.
2. If f(x) = e^{ikx}, where k is an integer, then f̂(n) = δ_{nk}.
3. Let f be 2π-periodic with f(x) = x on ]−π, π]. (Diagram) Then f̂(0) = 0 and, for n≠0, integration by parts gives

$$ \hat f(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} x\,e^{-inx}\,dx = \Bigl[\frac{-x\,e^{-inx}}{2\pi in}\Bigr]_{-\pi}^{\pi} + \frac{1}{in}\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-inx}\,dx = \frac{(-1)^n i}{n}. $$
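The value f̂(n) = (−1)ⁿ i/n for the sawtooth can be verified numerically. A short Python sketch (the helper `fourier_coefficient` is ours, used only for illustration):

```python
import cmath
import math

def fourier_coefficient(f, n, steps=50_000):
    """f-hat(n) = (1/2π) ∫_{-π}^{π} f(x) e^{-inx} dx, via the midpoint rule."""
    h = 2 * math.pi / steps
    total = 0j
    for j in range(steps):
        x = -math.pi + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)

# Sawtooth f(x) = x on ]-π, π]: expect f-hat(0) = 0 and f-hat(n) = (-1)^n i / n.
assert abs(fourier_coefficient(lambda x: x, 0)) < 1e-9
for n in (1, 2, 3, -4):
    assert abs(fourier_coefficient(lambda x: x, n) - (-1) ** n * 1j / n) < 1e-4
```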
Proposition 4 (Linearity) If f and g are 2π-periodic functions and c and d are complex constants, then, for all n∈ℤ,

$$ \widehat{(cf+dg)}(n) = c\hat f(n) + d\hat g(n). $$
Corollary 5 If p(x) is a trigonometric polynomial, p(x) = ∑_{n=−k}^{k} c_n e^{inx}, then p̂(n) = c_n for |n| ≤ k and p̂(n) = 0 for |n| > k. In other words,

$$ p(x) = \sum_{n\in\mathbb{Z}} \hat p(n)\,e^{inx}. $$

This follows immediately from Example 3(2) and Proposition 4.

Remark 6
1. This corollary explains why the minus sign is natural in the definition of the Fourier coefficients.
2. The first part of the course will be devoted to the question of how far this result can be extended to other 2π-periodic functions, that is: for which functions, and for which interpretations of infinite sums, is it true that

$$ f(x) = \sum_{n\in\mathbb{Z}} \hat f(n)\,e^{inx}. \qquad (1) $$

Definition 7 ∑_{n∈ℤ} f̂(n)e^{inx} is called the Fourier series of the 2π-periodic function f.

For real-valued functions, the introduction of complex exponentials seems artificial: indeed they can be avoided as follows. We work with (1) in the case of a finite sum: then we can rearrange the sum as

$$ \hat f(0) + \sum_{n>0}\bigl(\hat f(n)e^{inx} + \hat f(-n)e^{-inx}\bigr) = \hat f(0) + \sum_{n>0}\bigl[(\hat f(n)+\hat f(-n))\cos nx + i(\hat f(n)-\hat f(-n))\sin nx\bigr] = \frac{a_0}{2} + \sum_{n>0}\bigl(a_n\cos nx + b_n\sin nx\bigr). $$

Here

$$ a_n = \hat f(n)+\hat f(-n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\bigl(e^{-inx}+e^{inx}\bigr)\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx $$

for n>0 and

$$ b_n = i\bigl(\hat f(n)-\hat f(-n)\bigr) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx $$

for n>0. Also a_0 = (1/π)∫_{−π}^{π} f(x) dx, the constant being chosen for consistency.

The an and bn are also called Fourier coefficients: if it is necessary to distinguish them, we may call them Fourier cosine and sine coefficients, respectively.

We note that if f is real-valued, then the a_n and b_n are real numbers and so ℜf̂(n) = ℜf̂(−n), ℑf̂(−n) = −ℑf̂(n): thus f̂(−n) is the complex conjugate of f̂(n). Further, if f is an even function then all the sine coefficients are 0, and if f is an odd function, all the cosine coefficients are zero. We note further that the sine and cosine coefficients of the functions cos kx and sin kx themselves have a particularly simple form: a_k = 1 in the first case and b_k = 1 in the second. All the rest are zero.

For example, we should expect the 2π-periodic function whose value on ]−π,π] is x to have just sine coefficients: indeed this is the case: a_n = 0 and b_n = i(f̂(n)−f̂(−n)) = (−1)^{n+1}·2/n for n>0.

The above question can then be reformulated as: to what extent is f(x) represented by the Fourier series a_0/2 + ∑_{n>0}(a_n cos nx + b_n sin nx)? For instance, how well does ∑(−1)^{n+1}(2/n) sin nx represent the 2π-periodic sawtooth function f whose value on ]−π, π] is given by f(x) = x? The easy points are x=0 and x=π, where the terms are identically zero. This gives the ‘wrong’ value for x=π, but, if we look at the periodic function near π, we see that it jumps from π to −π, so perhaps the mean of those values isn’t a bad value for the series to converge to. We could conclude that we had defined the function incorrectly to begin with and that its value at the points (2n+1)π should have been zero anyway. In fact one can show (ref. ) that the Fourier series converges at all other points to the given values of f, but I shan’t include the proof in this course. The convergence is not at all uniform (it can’t be, because the partial sums are continuous functions, but the limit is discontinuous). In particular, taking x = π/2, we get the expansion

$$ \frac{\pi}{2} = 2\Bigl(1 - \frac{1}{3} + \frac{1}{5} - \cdots\Bigr) $$
which can also be deduced from the Taylor series for tan⁻¹.
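One can watch this convergence numerically: evaluating partial sums of the sawtooth series at x = π/2 reproduces the expansion above (the helper name below is ours):

```python
import math

def sawtooth_partial_sum(x, N):
    """N-th partial sum of the sawtooth series: sum of (-1)^{n+1} (2/n) sin(nx)."""
    return sum((-1) ** (n + 1) * (2 / n) * math.sin(n * x) for n in range(1, N + 1))

# At x = π/2 only odd n contribute, giving 2(1 - 1/3 + 1/5 - ...) -> π/2.
approx = sawtooth_partial_sum(math.pi / 2, 100_000)
assert abs(approx - math.pi / 2) < 1e-3
```

The slow 1/N decay of the error reflects the non-uniform convergence discussed above.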

### 1.2 The vibrating string

In this subsection we shall discuss the formal solutions of the wave equation in a special case which Fourier dealt with in his work.

We discuss the wave equation

$$ \frac{\partial^2 y}{\partial x^2} = \frac{1}{K^2}\,\frac{\partial^2 y}{\partial t^2}, \qquad (2) $$

subject to the boundary conditions

 y(0, t) = y(π, t) = 0, (3)

for all t≥0, and the initial conditions

 y(x,0) = F(x),  y_t(x,0) = 0.

This is a mathematical model of a string on a musical instrument (guitar, harp, violin) which is of length π and is plucked, i.e. held in the shape F(x) and released at time t=0. The constant K depends on the length, density and tension of the string. We shall derive the formal solution (that is, a solution which assumes existence and ignores questions of convergence or of domain of definition).

#### 1.2.1 Separation of variables

We first look (as Fourier and others before him did) for solutions of the form y(x,t) = f(x)g(t). Feeding this into the wave equation (2) we get

$$ f''(x)\,g(t) = \frac{1}{K^2}\,f(x)\,g''(t) $$

and so, dividing by f(x)g(t), we have

$$ \frac{f''(x)}{f(x)} = \frac{1}{K^2}\,\frac{g''(t)}{g(t)}. \qquad (4) $$

The left-hand side is an expression in x alone, the right-hand side in t alone. The conclusion must be that they are both identically equal to the same constant C, say.

We have f′′(x) − Cf(x) = 0 subject to the condition f(0) = f(π) = 0. Working through the method of solving linear second-order differential equations tells you that the only non-trivial solutions occur when C = −n² for some positive integer n, and the corresponding solutions, up to constant multiples, are f(x) = sin nx.

Returning to equation (4) gives the equation g′′(t) + K²n²g(t) = 0, which has the general solution g(t) = a_n cos Knt + b_n sin Knt. Thus the solutions we get through separation of variables, using the boundary conditions but ignoring the initial conditions, are

$$ y_n(x,t) = \sin nx\,(a_n \cos Knt + b_n \sin Knt), $$

for n ≥ 1.
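We can check by finite differences that each yₙ satisfies (2) and the boundary conditions (3). A Python sketch, with arbitrary sample values for K, n, aₙ, bₙ (these particular numbers are our assumptions, chosen only for the check):

```python
import math

K, n = 2.0, 3            # assumed sample values for the constants
a_n, b_n = 0.7, -0.4     # arbitrary amplitudes

def y(x, t):
    """Separated solution y_n(x,t) = sin(nx) (a_n cos(Knt) + b_n sin(Knt))."""
    return math.sin(n * x) * (a_n * math.cos(K * n * t) + b_n * math.sin(K * n * t))

def second_diff(g, s, h=1e-4):
    """Central second difference approximating g''(s)."""
    return (g(s + h) - 2 * g(s) + g(s - h)) / h ** 2

x0, t0 = 0.9, 1.3
lhs = second_diff(lambda x: y(x, t0), x0)           # d²y/dx²
rhs = second_diff(lambda t: y(x0, t), t0) / K ** 2  # (1/K²) d²y/dt²
assert abs(lhs - rhs) < 1e-4       # the wave equation (2) holds
assert abs(y(0.0, t0)) < 1e-12     # boundary condition at x = 0
assert abs(y(math.pi, t0)) < 1e-9  # boundary condition at x = π
```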

#### 1.2.2 Principle of Superposition

To get the general solution we just add together all the solutions we have obtained so far, thus

$$ y(x,t) = \sum_{n=1}^{\infty} \sin nx\,(a_n \cos Knt + b_n \sin Knt), \qquad (5) $$

ignoring questions of convergence. (We can do this for a finite sum without difficulty because we are dealing with a linear differential equation; the iffy bit is the extension to an infinite sum.)

We now apply the initial condition y(x,0) = F(x) (note F has F(0) = F(π) = 0). This gives

$$ F(x) = \sum_{n=1}^{\infty} a_n \sin nx. $$

We apply the reflection trick: the right-hand side is a series of odd functions, so we extend F to a function G by reflection in the origin, giving

$$ G(x) := \begin{cases} F(x), & \text{if } 0\le x\le\pi;\\ -F(-x), & \text{if } -\pi < x < 0, \end{cases} $$

and we have

$$ G(x) = \sum_{n=1}^{\infty} a_n \sin nx, $$

for −π ≤ x ≤ π.

If we multiply through by sin rx and integrate term by term, we get

$$ a_r = \frac{1}{\pi}\int_{-\pi}^{\pi} G(x)\sin rx\,dx, $$

so, assuming that this operation is valid, we find that the a_n are precisely the sine coefficients of G. (Those of you who took Real Analysis 2 last year may remember that a sufficient condition for integrating term-by-term is that the series which is integrated is itself uniformly convergent.)
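The term-by-term extraction of a_r rests on the orthogonality relations ∫_{−π}^{π} sin rx sin nx dx = π δ_{rn}, which a quick numerical check confirms (Python; the helper name is ours):

```python
import math

def sine_inner(r, n, steps=20_000):
    """Midpoint-rule value of the integral of sin(rx) sin(nx) over [-π, π]."""
    h = 2 * math.pi / steps
    return h * sum(math.sin(r * (-math.pi + (j + 0.5) * h)) *
                   math.sin(n * (-math.pi + (j + 0.5) * h)) for j in range(steps))

assert abs(sine_inner(3, 3) - math.pi) < 1e-6  # r = n: the integral is π
assert abs(sine_inner(3, 5)) < 1e-6            # r ≠ n: orthogonality
```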

If we now assume, further, that the right-hand side of (5) can be differentiated term by term, we differentiate with respect to t and set t=0, to get

$$ 0 = y_t(x,0) = \sum_{n=1}^{\infty} b_n Kn \sin nx. \qquad (6) $$

This equation is solved by the choice b_n = 0 for all n, so we have the following result.

Proposition 8 (Formal) Assuming that the formal manipulations are valid, a solution of the differential equation (2) with the given boundary and initial conditions is

$$ y(x,t) = \sum_{n=1}^{\infty} a_n \sin nx \cos Knt, $$

where the coefficients a_n are the Fourier sine coefficients

$$ a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} G(x)\sin nx\,dx $$

of the 2π-periodic function G, defined on ]−π, π] by reflecting the graph of F in the origin.
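For a concrete plucked string one can compute the aₙ numerically and confirm that the series reproduces the initial shape. In this Python sketch the triangular pluck F, peaked at the middle of the string, is our assumed example, not one taken from the notes:

```python
import math

def F(x):
    """Assumed pluck shape: triangle of height π/2, peaked at x = π/2."""
    return x if x <= math.pi / 2 else math.pi - x

def a(n, steps=5000):
    """Sine coefficient a_n = (2/π) ∫_0^π F(x) sin(nx) dx (odd extension of F)."""
    h = math.pi / steps
    return (2 / math.pi) * h * sum(
        F((j + 0.5) * h) * math.sin(n * (j + 0.5) * h) for j in range(steps))

# Known values for this triangle wave: a_1 = 4/π and even coefficients vanish.
assert abs(a(1) - 4 / math.pi) < 1e-5
assert abs(a(2)) < 1e-6
# The sine series reproduces the initial shape y(x, 0) = F(x).
x0 = 1.0
series = sum(a(n) * math.sin(n * x0) for n in range(1, 101))
assert abs(series - F(x0)) < 2e-2
```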
Remark 9 This leaves us with the following questions.
1. For which F are the manipulations valid?
2. Is this the only solution of the differential equation? (A question I’m not going to try to answer.)
3. Is b_n = 0 for all n the only solution of (6)? This is a special case of the uniqueness problem for trigonometric series.

### 1.3 Historic: Joseph Fourier

Joseph Fourier, Civil Servant, Egyptologist, and mathematician, was born in 1768 in Auxerre, France, son of a tailor. Debarred by birth from a career in the artillery, he was preparing to become a Benedictine monk (in order to be a teacher) when the French Revolution violently altered the course of history and Fourier’s life. He became president of the local revolutionary committee, was arrested during the Terror, but released at the fall of Robespierre.

Fourier then became a pupil at the Ecole Normale (the teachers’ academy) in Paris, studying under such great French mathematicians as Laplace and Lagrange. He became a teacher at the Ecole Polytechnique (the military academy).

He was ordered to serve as a scientist under Napoleon in Egypt. In 1801, Fourier returned to France to become Prefect of the Grenoble region. Among his most notable achievements in that office were the draining of some 20 thousand acres of swamps and the building of a new road across the Alps.

During that time he wrote an important survey of Egyptian history (“a masterpiece and a turning point in the subject”).

In 1804 Fourier started the study of the theory of heat conduction, in the course of which he systematically used the sine-and-cosine series which are named after him. At the end of 1807, he submitted a memoir on this work to the Academy of Science. The memoir proved controversial both in terms of his use of Fourier series and of his derivation of the heat equation and was not accepted at that stage. He was able to resubmit a revised version in 1811: this had several important new features, including the introduction of the Fourier transform. With this version of his memoir, he won the Academy’s prize in mathematics. In 1817, Fourier was finally elected to the Academy of Sciences and in 1822 his 1811 memoir was published as “Théorie de la Chaleur”.

For more details see Fourier Analysis by T.W. Körner, 475-480 and for even more, see the biography by J. Herivel Joseph Fourier: the man and the physicist.

What is Fourier analysis? The idea is to analyse functions (into sines and cosines or, equivalently, complex exponentials) to find the underlying frequencies, their strengths (and phases) and, where possible, to see if they can be recombined (synthesis) into the original function. The answers will depend on the original properties of the functions, which often come from physics (heat, electronic or sound waves). This course will give a basically mathematical treatment and so will be interested in mathematical classes of functions (continuity, differentiability properties).

## 2 Basics of Linear Spaces

A person is solely the concentration of an infinite set of interrelations with another and others, and to separate a person from these relations means to take away any real meaning of life.

Vl. Soloviev

The space around us can be described as a three-dimensional Euclidean space. To single out a point of that space we need a fixed frame of reference and three real numbers, which are the coordinates of the point. Similarly, to describe a pair of points from our space we could use six coordinates; for three points, nine, and so on. This makes it reasonable to consider Euclidean (linear) spaces of arbitrary finite dimension, which are studied in courses of linear algebra.

The basic properties of Euclidean spaces are determined by their linear and metric structures. The linear space (or vector space) structure allows us to add and subtract vectors associated to points, as well as to multiply vectors by real or complex numbers (scalars).

The metric space structure assigns a distance (a non-negative real number) to a pair of points or, equivalently, defines the length of the vector determined by that pair. A metric (or, more generally, a topology) is essential for defining the core analytical notions like limit or continuity. The importance of linear and metric (topological) structures in analysis is sometimes encoded in the formula:

 Analysis = Algebra + Geometry. (7)

On the other hand, we may observe that many sets admit a sort of linear and metric structure, linked to each other. Just a few among many examples are:

• The set of convergent sequences;
• The set of continuous functions on [0,1].

It is a very mathematical way of thinking to declare such sets to be spaces and call their elements points.

But shall we lose all information about a particular element (e.g. the sequence {1/n}) if we represent it by a shapeless and sizeless “point” without any inner configuration? Surprisingly not: all properties of an element can now be retrieved not from its inner configuration but from its interactions with other elements through the linear and metric structures. Such a “sociological” approach to all kinds of mathematical objects was codified in abstract category theory.

Another surprise is that, starting from our three-dimensional Euclidean space and walking far away along the road of abstraction to infinite-dimensional Hilbert spaces, we arrive at yet another picture of the surrounding space, this time in the language of quantum mechanics.

The distance from Manchester to Liverpool is 35 miles—just about the mileage in the opposite direction!

A tourist guide to England

### 2.1 Banach spaces (basic definitions only)

The following definition generalises the notion of distance known from everyday life.

Definition 1 A metric (or distance function) d on a set M is a function d: M×M → ℝ+ from the set of pairs to the non-negative real numbers such that:
1. d(x,y) ≥ 0 for all x, y ∈ M, and d(x,y) = 0 implies x = y.
2. d(x,y) = d(y,x) for all x and y in M.
3. d(x,y) + d(y,z) ≥ d(x,z) for all x, y, and z in M (triangle inequality).
Exercise 2 Let M be the set of UK cities. Are the following functions metrics on M?
1. d(A,B) is the price of a 2nd-class railway ticket from A to B.
2. d(A,B) is the off-peak driving time from A to B.

The following notion is a useful specialisation of a metric, adapted to the linear structure.

Definition 3 Let V be a (real or complex) vector space. A norm on V is a real-valued function, written ||x||, such that
1. ||x|| ≥ 0 for all x ∈ V, and ||x|| = 0 implies x = 0.
2. ||λx|| = |λ| ||x|| for all scalars λ and vectors x.
3. ||x+y|| ≤ ||x|| + ||y|| (triangle inequality).
A vector space with a norm is called a normed space.

The connection between norm and metric is as follows:

Proposition 4 If ||·|| is a norm on V, then it gives a metric on V by d(x,y) = ||x−y||.

Proof. This is a simple exercise: derive items 1–3 of Definition 1 from the corresponding items of Definition 3. For example, see Figure 1 to derive the triangle inequality.

Important notions known from real analysis are limit and convergence. In particular, we usually wish to have enough limit points for all “reasonable” sequences.

Definition ‍5 A sequence {xk} in a metric space (M,d) is a Cauchy sequence, if for every є>0, there exists an integer n such that k,l>n implies that d(xk,xl)<є.
Definition ‍6 (M,d) is a complete metric space if every Cauchy sequence in M converges to a limit in M.

For example, the set of integers ℤ and reals ℝ with the natural distance functions are complete spaces, but the set of rationals ℚ is not. The complete normed spaces deserve a special name.

Definition ‍7 A Banach space is a complete normed space.
Exercise* 8 A convenient way to define a norm in a Banach space is as follows. The unit ball U in a normed space B is the set of x such that ||x|| ≤ 1. Prove that:
1. U is a convex set, i.e. for x, y ∈ U and λ ∈ [0,1] the point λx + (1−λ)y is also in U.
2. ||x|| = inf{λ ∈ ℝ+ ∣ λ⁻¹x ∈ U}.
3. U is closed if and only if the space is Banach.
Example 9 Here are some examples of normed spaces.
1. l₂ⁿ is either ℝⁿ or ℂⁿ with norm defined by

$$ \|(x_1,\dots,x_n)\|_2 = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}. $$

2. l₁ⁿ is either ℝⁿ or ℂⁿ with norm defined by

$$ \|(x_1,\dots,x_n)\|_1 = |x_1| + |x_2| + \cdots + |x_n|. $$

3. l∞ⁿ is either ℝⁿ or ℂⁿ with norm defined by

$$ \|(x_1,\dots,x_n)\|_\infty = \max\bigl(|x_1|, |x_2|, \dots, |x_n|\bigr). $$

4. Let X be a topological space; then C_b(X) is the space of continuous bounded functions f: X→ℂ with norm ||f|| = sup_{x∈X} |f(x)|.
5. Let X be any set; then l∞(X) is the space of all bounded (not necessarily continuous) functions f: X→ℂ with norm ||f|| = sup_{x∈X} |f(x)|.
All these normed spaces are also complete and thus are Banach spaces. Some more examples of both complete and incomplete spaces will appear later.
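For the finite-dimensional norms above, a few lines of Python make the definitions concrete (the helper names are ours):

```python
import math

# The l2, l1 and l∞ norms on Rⁿ or Cⁿ from Example 9.
def norm2(x): return math.sqrt(sum(abs(v) ** 2 for v in x))
def norm1(x): return sum(abs(v) for v in x)
def norm_inf(x): return max(abs(v) for v in x)

x = [3.0, -4.0, 0.0]
assert norm2(x) == 5.0
assert norm1(x) == 7.0
assert norm_inf(x) == 4.0
assert norm_inf(x) <= norm2(x) <= norm1(x)  # the three norms are comparable
```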

—We need an extra space to accommodate this product!

A manager to a shop assistant

### 2.2 Hilbert spaces

Although metric and norm capture important geometric information about linear spaces, they are not sensitive enough to represent such geometric characterisations as angles (particularly orthogonality). To this end we need a further refinement.

From courses of linear algebra it is known that the scalar product ⟨x,y⟩ = x₁y₁ + ⋯ + xₙyₙ is important in the space ℝⁿ and defines a norm by ||x||² = ⟨x,x⟩. Here is a suitable generalisation:

Definition 10 A scalar product (or inner product) on a real or complex vector space V is a mapping V×V → ℂ, written ⟨x,y⟩, that satisfies:
1. ⟨x,x⟩ ≥ 0, and ⟨x,x⟩ = 0 implies x = 0.
2. ⟨x,y⟩ is the complex conjugate of ⟨y,x⟩ in complex spaces, and ⟨x,y⟩ = ⟨y,x⟩ in real ones, for all x, y ∈ V.
3. ⟨λx,y⟩ = λ⟨x,y⟩ for all x, y ∈ V and scalars λ. (What is ⟨x,λy⟩?)
4. ⟨x+y,z⟩ = ⟨x,z⟩ + ⟨y,z⟩ for all x, y, and z ∈ V. (What is ⟨x,y+z⟩?)

The last two properties of the scalar product are often encoded in the phrase: “it is linear in the first variable if we fix the second, and anti-linear in the second if we fix the first”.

Definition ‍11 An inner product space V is a real or complex vector space with a scalar product on it.
Example 12 Here are some examples of inner product spaces which demonstrate that the expression ||x|| = √⟨x,x⟩ defines a norm.
1. The inner product for ℝⁿ was defined at the beginning of this section. The inner product for ℂⁿ is given by ⟨x,y⟩ = ∑₁ⁿ xⱼȳⱼ. The norm ||x|| = √(∑₁ⁿ |xⱼ|²) makes it the space l₂ⁿ from Example 9(1).
2. The extension to infinite vectors: let l₂ be

$$ l_2 = \Bigl\{\,\text{sequences } \{x_j\}_1^{\infty} \;\Big|\; \sum_{1}^{\infty} |x_j|^2 < \infty \Bigr\}. \qquad (8) $$

Let us equip this set with the operations of term-wise addition and multiplication by scalars; then l₂ is closed under them. Indeed, this follows from the triangle inequality and properties of absolutely convergent series. From the standard Cauchy–Bunyakovskii–Schwarz inequality it follows that the series ∑₁^∞ xⱼȳⱼ converges absolutely, and its sum is defined to be ⟨x,y⟩.
3. Let C_b[a,b] be the space of continuous functions on the interval [a,b] ⊂ ℝ. As we learned in Example 9(4), it is a normed space with the norm ||f|| = sup_{[a,b]} |f(x)|. We could also define an inner product:

$$ \langle f,g\rangle = \int_a^b f(x)\,\overline{g(x)}\,dx \quad\text{and}\quad \|f\|_2 = \Bigl(\int_a^b |f(x)|^2\,dx\Bigr)^{1/2}. \qquad (9) $$

Now we state what is probably the most important inequality in analysis.

Theorem 13 (Cauchy–Schwarz–Bunyakovskii inequality) For vectors x and y in an inner product space V let us define ||x|| = √⟨x,x⟩ and ||y|| = √⟨y,y⟩; then we have

$$ |\langle x,y\rangle| \le \|x\|\,\|y\|, \qquad (10) $$

with equality if and only if x and y are scalar multiples of each other.

Proof. For any x, y ∈ V and any t ∈ ℝ we have:

$$ 0 \le \langle x+ty,\,x+ty\rangle = \langle x,x\rangle + 2t\,\Re\langle y,x\rangle + t^2\langle y,y\rangle. $$

Thus the discriminant of this quadratic expression in t is non-positive: (ℜ⟨y,x⟩)² − ||x||²||y||² ≤ 0, that is |ℜ⟨x,y⟩| ≤ ||x|| ||y||. Replacing y by e^{iα}y for an arbitrary α ∈ [−π,π] we get |ℜ(e^{−iα}⟨x,y⟩)| ≤ ||x|| ||y||; choosing α = arg⟨x,y⟩ makes the left-hand side equal to |⟨x,y⟩|, which implies the desired inequality.
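A numerical illustration of the theorem in ℂⁿ (Python; the helpers `ip` and `norm` are ours):

```python
import math

def ip(x, y):
    """Inner product sum of x_j * conj(y_j) on Cⁿ."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(ip(x, x).real)

x = [1 + 2j, 0.5 - 1j, 3 + 0j]
y = [-2 + 1j, 4 + 4j, 1 - 1j]
assert abs(ip(x, y)) <= norm(x) * norm(y)             # the inequality (10)
z = [(2 - 1j) * v for v in x]                         # z is a scalar multiple of x
assert abs(abs(ip(x, z)) - norm(x) * norm(z)) < 1e-9  # equality case
```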

Corollary 14 Any inner product space is a normed space with norm ||x|| = √⟨x,x⟩ (hence also a metric space, Prop. 4).

Proof. One just checks items 1–3 of Definition 3.

Again, complete inner product spaces deserve a special name.

Definition 15 A complete inner product space is a Hilbert space.

The relations between the spaces introduced so far are as follows:

 Hilbert spaces ⇒ Banach spaces ⇒ complete metric spaces
 ⇓ ⇓ ⇓
 inner product spaces ⇒ normed spaces ⇒ metric spaces.

How can we tell if a given norm comes from an inner product?

Theorem 16 (Parallelogram identity) In an inner product space H we have, for all x and y ∈ H (see Figure 3):

$$ \|x+y\|^2 + \|x-y\|^2 = 2\|x\|^2 + 2\|y\|^2. \qquad (11) $$

Proof. Just by linearity of the inner product:

$$ \langle x+y,\,x+y\rangle + \langle x-y,\,x-y\rangle = 2\langle x,x\rangle + 2\langle y,y\rangle, $$

because the cross terms cancel out.
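The identity (11) also gives a practical test for whether a given norm is an inner product norm: it holds for ||·||₂ but fails, for example, for ||·||∞. A Python sketch (helper names are ours):

```python
import math

def norm2(x): return math.sqrt(sum(v * v for v in x))
def norm_inf(x): return max(abs(v) for v in x)

def defect(norm, x, y):
    """||x+y||² + ||x-y||² - 2||x||² - 2||y||²; zero when (11) holds for x, y."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

x, y = [1.0, 0.0], [0.0, 1.0]
assert abs(defect(norm2, x, y)) < 1e-12    # l2 norm: the identity holds
assert abs(defect(norm_inf, x, y)) == 2.0  # l∞ norm: the identity fails
```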

Exercise 17 Show that (11) is also a sufficient condition for a norm to arise from an inner product. Namely, for a norm on a complex Banach space satisfying (11), the formula

$$ \langle x,y\rangle = \frac{1}{4}\Bigl(\|x+y\|^2 - \|x-y\|^2 + i\|x+iy\|^2 - i\|x-iy\|^2\Bigr) = \frac{1}{4}\sum_{k=0}^{3} i^k \|x + i^k y\|^2 \qquad (12) $$

defines an inner product. What is a suitable formula for a real Banach space?
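Formula (12) can be checked against the standard inner product on ℂⁿ (Python sketch; helper names are ours, and the powers of i are tabulated exactly to avoid rounding in `1j ** k`):

```python
def ip(x, y):
    """Reference inner product: sum of x_j * conj(y_j) on Cⁿ."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return ip(x, x).real

def polarization(x, y):
    """¼ Σ_{k=0}^{3} i^k ||x + i^k y||², i.e. formula (12)."""
    i_powers = [1, 1j, -1, -1j]  # exact values of i^k for k = 0..3
    return sum(ik * norm_sq([a + ik * b for a, b in zip(x, y)])
               for ik in i_powers) / 4

x = [1 + 2j, -0.5 + 1j]
y = [2 - 1j, 3 + 0.5j]
assert abs(polarization(x, y) - ip(x, y)) < 1e-12
```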

Divide and rule!

Old but still much used recipe

### 2.3 Subspaces

To study Hilbert spaces we may use the traditional mathematical technique of analysis and synthesis: we split the initial Hilbert space into smaller and probably simpler subsets, investigate them separately, and then reconstruct the entire picture from these parts.

As is known from linear algebra, a linear subspace of a linear space is a subset which inherits the linear structure, i.e. the possibility to add vectors and multiply them by scalars. In this course we also need subspaces to inherit the topological structure (coming either from a norm or an inner product).

Definition 18 By a subspace of a normed space (or inner product space) we mean a linear subspace with the same norm (inner product, respectively). We write X ⊂ Y.
Example 19
1. C_b(X) ⊂ l∞(X), where X is a metric space.
2. Any linear subspace of ℝⁿ or ℂⁿ with any norm given in Example 9.
3. Let c₀₀ be the space of finite sequences, i.e. all sequences (xₙ) such that there exists N with xₙ = 0 for n > N. This is a subspace of l₂ since ∑₁^∞ |xⱼ|² is a finite sum, hence finite.

We also wish that both inherited structures (linear and topological) be in agreement, i.e. that the subspace be complete. Such inheritance is linked to the property of being closed.

A subspace need not be closed. For example, the sequence

$$ x = \bigl(1, \tfrac12, \tfrac13, \tfrac14, \dots\bigr) \in l_2 \quad\text{because}\quad \sum 1/k^2 < \infty, $$

and xₙ = (1, 1/2, …, 1/n, 0, 0, …) ∈ c₀₀ converges to x; thus x lies in the closure of c₀₀ in l₂ but not in c₀₀ itself.
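The failure of closedness is easy to see numerically: the l₂-distance from x = (1/k) to its truncations tends to 0. A Python sketch (the infinite tail sum is cut off at a large finite index, which is our shortcut and only perturbs the value negligibly):

```python
import math

def tail_norm(N, terms=200_000):
    """l2-distance ||x - x_N||, where x = (1/k) and x_N truncates after N entries.

    The tail sum is cut off at `terms`; the neglected part is below 5e-6,
    negligible compared with the distances computed here."""
    return math.sqrt(sum(1 / k ** 2 for k in range(N + 1, terms)))

# Each truncation x_N lies in c00, yet the distances to x shrink towards 0,
# so x is a limit point of c00 that is not itself in c00.
dists = [tail_norm(N) for N in (10, 100, 1000)]
assert dists[0] > dists[1] > dists[2]
assert dists[2] < 0.05
```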

Proposition 20
1. Any closed subspace of a Banach/Hilbert space is complete, hence also a Banach/Hilbert space.
2. Any complete subspace is closed.
3. The closure of a subspace is again a subspace.

Proof.

1. This is true in any metric space X: any Cauchy sequence from Y has a limit x ∈ X belonging to Ȳ, but if Y is closed then x ∈ Y.
2. Let Y be complete and x ∈ Ȳ; then there is a sequence xₙ → x in Y, and it is a Cauchy sequence. The completeness of Y then implies x ∈ Y.
3. If x, y ∈ Ȳ then there are xₙ and yₙ in Y such that xₙ → x and yₙ → y. From the triangle inequality:

$$ \|(x_n + y_n) - (x+y)\| \le \|x_n - x\| + \|y_n - y\| \to 0, $$

so xₙ + yₙ → x + y and x + y ∈ Ȳ. Similarly, x ∈ Ȳ implies λx ∈ Ȳ for any λ.

Hence c₀₀ is an incomplete inner product space, with inner product ⟨x,y⟩ = ∑₁^∞ xₖȳₖ (this is, in fact, a finite sum!), as it is not closed in l₂.

Similarly, C[0,1] with the inner product norm ||f|| = (∫₀¹ |f(t)|² dt)^{1/2} is incomplete. Take the larger space X of functions continuous on [0,1] except for a possible jump at 1/2 (i.e. left and right limits exist but may be unequal, and f(1/2) = lim_{t→1/2+} f(t)). Then the sequence of functions defined in Figure 4(a) has the limit shown in Figure 4(b), since:

$$ \|f - f_n\|^2 = \int_{1/2 - 1/n}^{1/2 + 1/n} |f - f_n|^2\,dt < \frac{2}{n} \to 0. $$

Obviously f ∈ X ∖ C[0,1].

Exercise 21 Show alternatively that the sequence of functions fₙ from Figure 4(a) is a Cauchy sequence in C[0,1] but has no continuous limit.

Similarly, the space C[a,b] is incomplete for any a < b if equipped with the inner product and the corresponding norm:

$$ \langle f,g\rangle = \int_a^b f(t)\,\overline{g(t)}\,dt, \qquad (13) $$

$$ \|f\|_2 = \Bigl(\int_a^b |f(t)|^2\,dt\Bigr)^{1/2}. \qquad (14) $$
Definition 22 Define the Hilbert space L₂[a,b] to be the smallest complete inner product space containing C[a,b] with the inner product given by the restriction of (13).

It is practical to realise L₂[a,b] as a certain space of “functions” with the inner product defined via an integral. There are several ways to do that, and we mention just three:

1. Elements of L₂[a,b] are equivalence classes of Cauchy sequences f^{(n)} of functions from C[a,b].
2. Let integration be extended from the Riemann definition to the wider Lebesgue integration (see Section 13). Let L be the set of functions on [a,b] which are square integrable in the Lebesgue sense with a finite norm (14). Then L₂[a,b] is the quotient space of L with respect to the equivalence relation f ∼ g ⇔ ||f−g||₂ = 0.

Example 23 Let the Dirichlet function on [0,1] be defined as follows:

$$ f(t) = \begin{cases} 1, & t\in\mathbb{Q};\\ 0, & t\in\mathbb{R}\setminus\mathbb{Q}. \end{cases} $$

This function is not integrable in the Riemann sense but does have a Lebesgue integral. The latter, however, is equal to 0, and as an L₂-function the Dirichlet function is equivalent to the function identically equal to 0.

3. The third possibility is to map L₂(ℝ) onto a space of “true” functions but with an additional structure. For example, in quantum mechanics it is useful to work with the Segal–Bargmann space of analytic functions on ℂ with the inner product [, , ]:

$$ \langle f_1, f_2\rangle = \int_{\mathbb{C}} f_1(z)\,\overline{f_2(z)}\,e^{-|z|^2}\,dz. $$
Theorem 24 The sequence space l₂ is complete, hence a Hilbert space.

Proof. Take a Cauchy sequence x^{(n)} ∈ l₂, where x^{(n)} = (x₁^{(n)}, x₂^{(n)}, x₃^{(n)}, …). Our proof will have three steps: identify the limit x; show it is in l₂; show x^{(n)} → x.

1. If x^{(n)} is a Cauchy sequence in l₂, then x_k^{(n)} is also a Cauchy sequence of numbers for any fixed k:

$$ \bigl|x_k^{(n)} - x_k^{(m)}\bigr| \le \Bigl(\sum_{k=1}^{\infty} \bigl|x_k^{(n)} - x_k^{(m)}\bigr|^2\Bigr)^{1/2} = \|x^{(n)} - x^{(m)}\| \to 0. $$

Let x_k be the limit of x_k^{(n)}.

2. For a given є > 0 find n₀ such that ||x^{(n)} − x^{(m)}|| < є for all n, m > n₀. For any K and m:

$$ \sum_{k=1}^{K} \bigl|x_k^{(n)} - x_k^{(m)}\bigr|^2 \le \|x^{(n)} - x^{(m)}\|^2 \le \epsilon^2. $$

Let m → ∞; then ∑_{k=1}^{K} |x_k^{(n)} − x_k|² ≤ є². Let K → ∞; then ∑_{k=1}^{∞} |x_k^{(n)} − x_k|² ≤ є². Thus x^{(n)} − x ∈ l₂, and because l₂ is a linear space, x = x^{(n)} − (x^{(n)} − x) is also in l₂.

3. We saw above that for any є > 0 there is n₀ such that ||x^{(n)} − x|| < є for all n > n₀. Thus x^{(n)} → x.

Consequently l₂ is complete.

All good things are covered by a thick layer of chocolate (well, if something is not yet–it certainly will)

### 2.4 Linear spans

As was explained in the introduction to Section 2, we describe the “internal” properties of a vector through its relations to other vectors. For a detailed description we need sufficiently many external reference points.

Let A be a subset (finite or infinite) of a normed space V. We may wish to upgrade it to a linear subspace in order to make it subject to our theory.

Definition 25 The linear span of A, written Lin(A), is the intersection of all linear subspaces of V containing A, i.e. the smallest subspace containing A; equivalently, it is the set of all finite linear combinations of elements of A. The closed linear span of A, written CLin(A), is the intersection of all closed linear subspaces of V containing A, i.e. the smallest closed subspace containing A.
Exercise* ‍26
1. Show that if A is a subset of a finite dimensional space then Lin(A)=CLin(A).
2. Show that for an infinite A the spaces Lin(A) and CLin(A) could be different. (Hint: use Example ‍3.)
Proposition ‍27 Lin(A)‾=CLin(A), where Lin(A)‾ denotes the closure of Lin(A).

Proof. Clearly Lin(A)‾ is a closed subspace containing A, thus it contains CLin(A). Also Lin(A)⊂ CLin(A), hence Lin(A)‾⊂ CLin(A)‾=CLin(A) since CLin(A) is closed. Therefore Lin(A)‾=CLin(A).

Consequently CLin(A) is the set of all limit points of finite linear combinations of elements of A.

Example ‍28 Let V=C[a,b] with the sup norm ||·||. Then:
Lin{1,x,x2,…}={all polynomials}
CLin{1,x,x2,…}=C[a,b] by the Weierstrass approximation theorem proved later.
Remark ‍29 Note, that the relation PCLin(Q) between two sets P and Q is transitive: if PCLin(Q) and QCLin(R) then PCLin(R). This observation is often used in the following way. To show that PCLin(R) we introduce some intermediate sets Q1, …, Qn such that PCLin(Q1), QjCLin(Qj+1) and QnCLin(R), see the proof of Weierstrass Approximation Thm. ‍16 or § 14.2 for an illustration.

The following simple result will be used later many times without comments.

Lemma ‍30 ‍(about Inner Product Limit) Suppose H is an inner product space and sequences xn and yn have limits x and y correspondingly. Then ⟨ xn,yn ⟩→⟨ x,y ⟩, or equivalently:

limn→∞ ⟨ xn,yn ⟩ = ⟨ limn→∞ xn, limn→∞ yn ⟩.

Proof. Obviously by the Cauchy–Schwarz inequality:

| ⟨ xn,yn ⟩−⟨ x,y ⟩ | = | ⟨ xn−x,yn ⟩+⟨ x,yn−y ⟩ | ≤ | ⟨ xn−x,yn ⟩ | + | ⟨ x,yn−y ⟩ | ≤ ||xn−x||·||yn|| + ||x||·||yn−y|| → 0,

since ||xn−x||→ 0, ||yn−y||→ 0, and ||yn|| is bounded.

## 3 Orthogonality

Pythagoras is forever!

The catchphrase from TV commercial of Hilbert Spaces course

As was mentioned in the introduction, a Hilbert space is an analog of our 3D Euclidean space and the theory of Hilbert spaces is similar to plane or space geometry. One of the primary results of Euclidean geometry, which still survives in the high school curriculum despite its continuous nasty de-geometrisation, is Pythagoras’ theorem based on the notion of orthogonality1.

So far we were concerned only with distances between points. Now we would like to study angles between vectors, and notably right angles. Pythagoras’ theorem states that if the angle C in a triangle is right then c2=a2+b2, see Figure ‍5.

It is a very mathematical way of thinking to turn this property of right angles into their definition, which will work even in infinite dimensional Hilbert spaces.

Look for a triangle, or even for a right triangle

A universal advice in solving problems from elementary geometry.

### 3.1 Orthogonal System in Hilbert Space

In inner product spaces it is even more convenient to give a definition of orthogonality not from Pythagoras’ theorem but from an equivalent property of inner product.

Definition ‍1 Two vectors x and y in an inner product space are orthogonal if x,y ⟩=0, written xy.

An orthogonal sequence (or orthogonal system) en (finite or infinite) is one in which enem whenever nm.

An orthonormal sequence (or orthonormal system) en is an orthogonal sequence with ||en||=1 for all n.

Exercise ‍2
1. Show that if x⊥x then x=0 and consequently x⊥y for any y∈H.
2. Show that if all vectors of an orthogonal system are non-zero then they are linearly independent.
Example ‍3 These are orthonormal sequences:
1. Basis vectors (1,0,0), (0,1,0), (0,0,1) in ℝ3 or ℂ3.
2. Vectors en=(0,…,0,1,0,…) (with the only 1 on the nth place) in l2. (Could you see a similarity with the previous example?)
3. Functions en(t)=(2π)−1/2 eint, n∈ℤ, in C[0,2π]:

⟨ en,em ⟩ = ∫02π (1/2π) eint e−imt dt = { 1, n=m; 0, n≠ m. (15)
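The orthonormality relation ‍(15) is easy to verify numerically; below is a minimal sketch using trapezoidal quadrature (the helper names `trap`, `inner` and `e`, and the grid size, are my own choices, not from the text):

```python
import numpy as np

def trap(y, x):
    # trapezoidal rule for the integral of y over the grid x
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

t = np.linspace(0, 2 * np.pi, 20001)

def inner(f, g):
    # the L^2[0, 2pi] inner product <f, g> = integral of f(t) conj(g(t))
    return trap(f(t) * np.conj(g(t)), t)

def e(n):
    # e_n(t) = (2 pi)^{-1/2} e^{int}
    return lambda s: np.exp(1j * n * s) / np.sqrt(2 * np.pi)

print(abs(inner(e(2), e(2))))  # equal indices: ~1
print(abs(inner(e(2), e(3))))  # different indices: ~0
```

The quadrature is essentially exact here because the integrands are smooth and periodic over the full period.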
Exercise ‍4 Let A be a subset of an inner product space V and xy for any yA. Prove that xz for all zCLin(A).
Theorem ‍5 ‍(Pythagoras’) If x⊥y then ||x+y||2=||x||2+||y||2. Also if e1, …, en is orthonormal then

|| ∑1n akek ||2 = ⟨ ∑1n akek, ∑1n akek ⟩ = ∑1n | ak |2.

Proof. A one-line calculation.

The following theorem provides an important property of Hilbert spaces which will be used many times. Recall, that a subset K of a linear space V is convex if for all x, yK and λ∈ [0,1] the point λ x +(1−λ)y is also in K. Particularly any subspace is convex and any unit ball as well (see Exercise ‍1).

Theorem ‍6 ‍(about the Nearest Point) Let K be a non-empty convex closed subset of a Hilbert space H. For any point x∈H there is a unique point y∈K nearest to x.

Proof. Let d=infy∈K d(x,y), where d(x,y) is the distance coming from the norm ||x||=√⟨ x,x ⟩, and let yn be a sequence of points in K such that limn→ ∞ d(x,yn)=d. Then yn is a Cauchy sequence. Indeed from the parallelogram identity for the parallelogram generated by vectors x−yn and x−ym we have:

||yn−ym||2 = 2||x−yn||2 + 2||x−ym||2 − ||2x−yn−ym||2.

Note that ||2x−yn−ym||2 = 4||x−(yn+ym)/2||2 ≥ 4d2 since (yn+ym)/2 ∈ K by convexity. For sufficiently large m and n we get ||x−ym||2 ≤ d2+є and ||x−yn||2 ≤ d2+є, thus ||yn−ym||2 ≤ 4(d2+є)−4d2=4є, i.e. yn is a Cauchy sequence.

Let y be the limit of yn, which exists by the completeness of H; then y∈K since K is closed. Then d(x,y)=limn→ ∞ d(x,yn)=d. This shows the existence of the nearest point. Let y′ be another point in K such that d(x,y′)=d, then the parallelogram identity implies:

||y−y′||2 = 2||x−y||2 + 2||x−y′||2 − ||2x−y−y′||2 ≤ 4d2−4d2=0.

This shows the uniqueness of the nearest point.

Exercise* ‍7 The essential rôle of the parallelogram identity in the above proof indicates that the theorem does not hold in a general Banach space.
1. Show that in ℝ2 with either norm ||·||1 or ||·||∞ from Example ‍9 the nearest point could be non-unique;
2. Could you construct an example (in a Banach space) where the nearest point does not exist?

Liberte, Egalite, Fraternite!

A longstanding ideal approximated in the real life by something completely different

### 3.2 Bessel’s inequality

For the case when the convex subset is a subspace we can characterise the nearest point in terms of orthogonality.

Theorem ‍8 ‍(on Perpendicular) Let M be a subspace of a Hilbert space H and a point xH be fixed. Then zM is the nearest point to x if and only if xz is orthogonal to any vector in M.

Proof. Let z be the nearest point to x, which exists by the previous Theorem. We claim that x−z is orthogonal to any vector in M; otherwise there exists y∈M such that ⟨ x−z,y ⟩≠ 0. Then

||x−z−є y||2 = ||x−z||2 − 2є ℜ⟨ x−z,y ⟩ + є2 ||y||2 < ||x−z||2,

if є is chosen to be small enough and such that є ℜ⟨ x−z,y ⟩ is positive, see Figure ‍6(i). Therefore we get a contradiction with the statement that z is the closest point to x.

On the other hand, if x−z is orthogonal to all vectors in M then in particular (x−z)⊥ (z−y) for all y∈M, see Figure ‍6(ii). Since x−y=(x−z)+(z−y) we get by Pythagoras’ theorem:

||x−y||2 = ||x−z||2 + ||z−y||2.

So ||x−y||2 ≥ ||x−z||2 and they are equal if and only if z=y.

Exercise ‍9 The above proof does not work if ⟨ x−z,y ⟩ is purely imaginary; what to do in this case?

Consider now a basic case of approximation: let x∈H be fixed and e1, …, en be orthonormal, and denote H1=Lin{e1,…,en}. We could try to approximate x by a vector y=λ1 e1+⋯ +λn en ∈ H1.

Corollary ‍10 The minimal value of ||x−y|| for y∈H1 is achieved when y=∑1n ⟨ x,ei ⟩ei.

Proof. Let z=∑1n ⟨ x,ei ⟩ei, then ⟨ x−z,ei ⟩=⟨ x,ei ⟩−⟨ z,ei ⟩=0. By the previous Theorem z is the nearest point to x.

Example ‍11
1. In ℝ3 find the best approximation to (1,0,0) from the plane V: {x1+x2+x3=0}. We take an orthonormal basis e1=(2−1/2, −2−1/2, 0), e2=(6−1/2, 6−1/2, −2· 6−1/2) of V (Check this!). Then:

z=⟨ x,e1 ⟩e1+⟨ x,e2 ⟩e2 = ( 1/2, −1/2, 0 ) + ( 1/6, 1/6, −1/3 ) = ( 2/3, −1/3, −1/3 ).
2. In C[0,2π] what is the best approximation to f(t)=t by functions a+beit+ce−it? Let

e0=1/√2π,    e1=(1/√2π) eit,    e−1=(1/√2π) e−it.

We find:

⟨ f,e0 ⟩ = ∫02π (t/√2π) dt = [ t2/2 ]02π (1/√2π) = √2 π3/2;
⟨ f,e1 ⟩ = ∫02π (t e−it/√2π) dt = i √2π (Check this!);
⟨ f,e−1 ⟩ = ∫02π (t eit/√2π) dt = −i √2π (Why don’t we need to check this one?).

Then the best approximation is (see Figure ‍7):

f0(t)=⟨ f,e0 ⟩e0+⟨ f,e1 ⟩e1+⟨ f,e−1 ⟩e−1 = √2 π3/2/√2π + i eit − i e−it = π−2sin t.
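Both items of this Example can be checked numerically; the sketch below (grid sizes and the helper name `trap` are my own choices) projects (1,0,0) onto the plane and compares the trigonometric approximation with π−2sin t:

```python
import numpy as np

def trap(y, x):
    # trapezoidal rule for the integral of y over the grid x
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

# Item 1: project x = (1,0,0) onto the plane x1+x2+x3 = 0 using e1, e2
x = np.array([1.0, 0.0, 0.0])
e1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
e2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
z = x.dot(e1) * e1 + x.dot(e2) * e2
print(z)  # (2/3, -1/3, -1/3)

# Item 2: best approximation of f(t) = t by a + b e^{it} + c e^{-it}
t = np.linspace(0, 2 * np.pi, 10001)
f = t
es = {n: np.exp(1j * n * t) / np.sqrt(2 * np.pi) for n in (-1, 0, 1)}
f0 = sum(trap(f * np.conj(es[n]), t) * es[n] for n in (-1, 0, 1))
print(np.max(np.abs(f0 - (np.pi - 2 * np.sin(t)))))  # ~0: agrees with pi - 2 sin t
```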
Corollary ‍12 ‍(Bessel’s inequality) If (ei) is orthonormal then

||x||2 ≥ ∑i=1n | ⟨ x,ei ⟩ |2.

Proof. Let z= ∑1n ⟨ x,ei ⟩ei, then ⟨ x−z,ei ⟩=0 for all i, therefore by Exercise ‍4 x−z ⊥ z. Hence:

||x||2 = ||z||2 + ||x−z||2 ≥ ||z||2 = ∑i=1n | ⟨ x,ei ⟩ |2.

—Did you say “rice and fish for them”?

A student question

### 3.3 The Riesz–Fischer theorem

When (ei) is orthonormal we call ⟨ x,en ⟩ the nth Fourier coefficient of x (with respect to (ei), naturally).

Theorem ‍13 ‍(Riesz–Fischer) Let (en)1∞ be an orthonormal sequence in a Hilbert space H. Then ∑1∞ λn en converges in H if and only if ∑1∞ | λn |2 < ∞. In this case ||∑1∞ λn en||2 = ∑1∞ | λn |2.

Proof. Necessity: Let xk=∑1k λn en and x=limk→ ∞ xk. Then ⟨ x,en ⟩=limk→ ∞ ⟨ xk,en ⟩=λn for all n. By Bessel’s inequality, for all k

||x||2 ≥ ∑1k | ⟨ x,en ⟩ |2 = ∑1k | λn |2,

hence ∑1∞ | λn |2 converges and the sum is at most ||x||2.

Sufficiency: Consider ||xk−xm||=||∑m+1k λn en||=(∑m+1k | λn |2)1/2 for k>m. Since ∑1∞ | λn |2 converges, xk is a Cauchy sequence in H and thus has a limit x. By Pythagoras’ theorem ||xk||2=∑1k | λn |2, thus for k→ ∞ we get ||x||2=∑1∞ | λn |2 by the Lemma about the inner product limit.

Observation: the closed linear span of an orthonormal sequence in any Hilbert space looks like l2, i.e. l2 is a universal model for a Hilbert space.

By Bessel’s inequality and the Riesz–Fischer theorem we know that the series ∑1∞ ⟨ x,ei ⟩ei converges for any x∈H. What is its limit?

Let y=x− ∑1∞ ⟨ x,ei ⟩ei, then

⟨ y,ek ⟩=⟨ x,ek ⟩− ∑1∞ ⟨ x,ei ⟩ ⟨ ei,ek ⟩=⟨ x,ek ⟩−⟨ x,ek ⟩ =0   for all k. (16)
Definition ‍14 An orthonormal sequence (ei) in a Hilbert space H is complete if the identities y,ek ⟩=0 for all k imply y=0.

A complete orthonormal sequence is also called orthonormal basis in H.

Theorem ‍15 ‍(on Orthonormal Basis) Let (ei) be an orthonormal basis in a Hilbert space H. Then for any x∈H we have

x= ∑n=1∞ ⟨ x,en ⟩en    and    ||x||2 = ∑n=1∞ | ⟨ x,en ⟩ |2.

Proof. By the Riesz–Fischer theorem, equation ‍(16) and the definition of orthonormal basis.

There are constructive existence theorems in mathematics.

An example of pure existence statement

### 3.4 Construction of Orthonormal Sequences

Natural questions are: Do orthonormal sequences always exist? Could we construct them?

Theorem ‍16 ‍(Gram–Schmidt) Let (xi) be a sequence of linearly independent vectors in an inner product space V. Then there exists an orthonormal sequence (ei) such that

Lin{x1,x2,…,xn}=Lin{e1,e2,…,en},    for all n.

Proof. We give an explicit algorithm working by induction. The base of induction: the first vector is e1=x1/||x1||. The step of induction: let e1, e2, …, en be already constructed as required. Let yn+1=xn+1−∑i=1n ⟨ xn+1,ei ⟩ei. Then by ‍(16) yn+1⊥ei for i=1,…,n. We may put en+1=yn+1/||yn+1|| because yn+1≠ 0 due to the linear independence of the xk’s. Also

Lin{e1,e2,…,en+1} = Lin{e1,e2,…,yn+1} = Lin{e1,e2,…,xn+1} = Lin{x1,x2,…,xn+1}.

So (ei) is an orthonormal sequence.
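The inductive step above translates directly into a short routine; the following sketch (the function name `gram_schmidt` is my own) orthonormalises a family with respect to an arbitrary inner product and checks the result on a small example in ℝ3:

```python
import numpy as np

def gram_schmidt(xs, inner):
    # orthonormalise the linearly independent family xs with respect to the
    # given inner product, exactly as in the inductive step of the proof:
    # y_{n+1} = x_{n+1} - sum_i <x_{n+1}, e_i> e_i,  e_{n+1} = y_{n+1}/||y_{n+1}||
    es = []
    for x in xs:
        y = x - sum(inner(x, e) * e for e in es)
        es.append(y / np.sqrt(inner(y, y)))
    return es

xs = [np.array(v, dtype=float) for v in [(1, 1, 0), (1, 0, 1), (0, 1, 1)]]
es = gram_schmidt(xs, np.dot)
G = np.array([[np.dot(a, b) for b in es] for a in es])
print(np.round(G, 10))  # the identity matrix: (e_i) is orthonormal
```

Passing the inner product as a parameter lets the same routine produce the orthogonal polynomials of Example ‍18 when fed monomials and a weighted integral.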

Example ‍17 Consider C[0,1] with the usual inner product ‍(13) and apply orthogonalisation to the sequence 1, x, x2, …. Because ||1||=1 we have e1(x)=1. The continuation could be presented by the table:

e1(x)=1;
y2(x)=x−⟨ x,1 ⟩1=x−1/2,    ||y2||2= ∫01 (x−1/2)2dx= 1/12,    e2(x)=√12 (x−1/2);
y3(x)=x2−⟨ x2,1 ⟩1−⟨ x2,x−1/2 ⟩(x−1/2)· 12,   …,   e3=y3/||y3||;
…
Example ‍18 Many famous sequences of orthogonal polynomials, e.g. Chebyshev, Legendre, Laguerre, Hermite, can be obtained by orthogonalisation of 1, x, x2, … with various inner products.

1. Legendre polynomials in C[−1,1] with inner product
⟨ f,g ⟩ = ∫−11 f(t) ḡ(t) dt. (17)
2. Chebyshev polynomials in C[−1,1] with inner product
⟨ f,g ⟩ = ∫−11 f(t) ḡ(t) dt/√(1−t2). (18)
3. Laguerre polynomials in the space of polynomials P[0,∞) with inner product
⟨ f,g ⟩ = ∫0∞ f(t) ḡ(t) e−t dt.

See Figure ‍8 for the first five Legendre and Chebyshev polynomials. Observe the difference caused by the different inner products ‍(17) and ‍(18). On the other hand note the similarity in oscillating behaviour with different “frequencies”.
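As a numerical illustration (assuming NumPy's `numpy.polynomial.legendre` module for evaluating the classical Legendre polynomials; the quadrature helper is my own), one can check that these polynomials are indeed pairwise orthogonal with respect to the inner product ‍(17):

```python
import numpy as np
from numpy.polynomial import legendre

def trap(y, x):
    # trapezoidal rule for the integral of y over the grid x
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

t = np.linspace(-1, 1, 20001)

def ip(f, g):
    # the inner product (17) on C[-1, 1]
    return trap(f * g, t)

# evaluate the first four Legendre polynomials on the grid
P = [legendre.Legendre.basis(n)(t) for n in range(4)]
G = np.array([[ip(P[m], P[n]) for n in range(4)] for m in range(4)])
print(np.round(G, 6))  # diagonal matrix: the P_n are pairwise orthogonal
```

The diagonal entries come out as 2/(2n+1), the standard normalisation of Legendre polynomials (they are orthogonal but not orthonormal).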

Another natural question is: When is an orthonormal sequence complete?

Proposition ‍19 Let (en) be an orthonormal sequence in a Hilbert space H. The following are equivalent:
1. (en) is an orthonormal basis.
2. CLin((en))=H.
3. ||x||2=∑1∞ | ⟨ x,en ⟩ |2 for all x∈H.

Proof. Clearly 1 implies 2 because x=∑1∞ ⟨ x,en ⟩en belongs to CLin((en)) by Theorem ‍15. The same theorem tells that 1 implies 3.

If (en) is not complete then there exists x∈H such that x≠ 0 and ⟨ x,ek ⟩=0 for all k, so 3 fails; consequently 3 implies 1.

Finally, if ⟨ x,ek ⟩=0 for all k for some non-zero x, then ⟨ x,y ⟩=0 for all y∈Lin((en)) and moreover for all y∈CLin((en)), by the Lemma about the inner product limit. But then x∉CLin((en)), because ⟨ x,x ⟩=0 is not possible, and 2 fails. Thus 2 implies 1.

Corollary ‍20 A separable Hilbert space (i.e. one with a countable dense set) can be identified with either l2n or l2, in other words it has an orthonormal basis (en) (finite or infinite) such that
x= ∑n=1∞ ⟨ x,en ⟩en    and    ||x||2 = ∑n=1∞ | ⟨ x,en ⟩ |2.

Proof. Take a countable dense set (xk), then H=CLin((xk)); delete all vectors which are linear combinations of the preceding ones, orthonormalise the remaining set by Gram–Schmidt and apply the previous proposition.

Most pleasant compliments are usually orthogonal to our real qualities.

Advice based on observations

### 3.5 Orthogonal complements

Orthogonality allows us to split a Hilbert space into subspaces which will be “independent from each other” as much as possible.

Definition ‍21 Let M be a subspace of an inner product space V. The orthogonal complement, written M, of M is
 M⊥={x∈ V: ⟨ x,m  ⟩=0 ∀  m∈ M}.
Theorem ‍22 If M is a closed subspace of a Hilbert space H then M is a closed subspace too (hence a Hilbert space too).

Proof. Clearly M⊥ is a subspace of H because x, y∈M⊥ implies ax+by∈M⊥:

⟨ ax+by,m ⟩= a⟨ x,m ⟩+ b⟨ y,m ⟩=0.

Also if all xn∈M⊥ and xn→x then x∈M⊥ due to the Lemma about the inner product limit.

Theorem ‍23 Let M be a closed subspace of a Hilbert space H. Then for any x∈H there exists a unique decomposition x=m+n with m∈M, n∈M⊥ and ||x||2=||m||2+||n||2. Thus H=M⊕M⊥ and (M⊥)⊥=M.

Proof. For a given x there exists a unique closest point m in M by the Theorem on the nearest point, and by the Theorem on perpendicular (x−m)⊥ y for all y∈M.

So x= m + (x−m)= m+n with m∈M and n∈M⊥. The identity ||x||2=||m||2+||n||2 is just Pythagoras’ theorem, and M⋂M⊥={0} because the null vector is the only vector orthogonal to itself.

Finally (M⊥)⊥=M: we have H=M⊕M⊥=(M⊥)⊥⊕M⊥; for any x∈(M⊥)⊥ there is a decomposition x=m+n with m∈M and n∈M⊥, but then n is orthogonal to itself and therefore is zero.

## 4 Duality of Linear Spaces

Everything has another side

An orthonormal basis allows us to reduce any question on a Hilbert space to a question on a sequence of numbers. This is a powerful but sometimes heavy technique. Sometimes we need a smaller and faster tool to study questions which are represented by a single number; for example, to demonstrate that two vectors are different it is enough to show that some single coordinate has unequal values. In such cases linear functionals are just what we need.

–Is it functional?
–Yes, it works!

### 4.1 Dual space of a normed space

Definition ‍1 A linear functional on a vector space V is a linear mapping α: V→ ℂ (or α: V→ ℝ in the real case), i.e.
 α(ax+by)=aα(x)+bα(y),     for all   x,y∈ V  and   a,b∈ℂ.
Exercise ‍2 Show that α(0) is necessarily 0.

We will not consider any functionals but linear ones; thus below “functional” always means a linear functional.

Example ‍3
1. Let V=ℂn and ck, k=1,…,n be complex numbers. Then α((x1,…,xn))=c1x1+⋯+cnxn is a linear functional.
2. On C[0,1] a functional is given by α(f)=∫01 f(t) dt.
3. On a Hilbert space H, for any x∈H a functional αx is given by αx(y)=⟨ y,x ⟩.
Theorem ‍4 Let V be a normed space and α is a linear functional. The following are equivalent:
1. α is continuous (at any point of V).
2. α is continuous at point 0.
3. sup{| α(x) |: ||x||≤ 1}< ∞, i.e. α is a bounded linear functional.

Proof. Implication 1⇒2 is trivial.

Show 2⇒3. By the definition of continuity: for any є>0 there exists δ>0 such that ||v||<δ implies | α(v)−α(0) |<є. Take є=1, then | α(δ x) |<1 for all x with norm less than 1 because ||δ x||< δ. But from linearity of α the inequality | α(δ x) |<1 implies | α(x) |<1/δ<∞ for all ||x||≤ 1.

3⇒1. Let the mentioned supremum be M. For any x, y∈V such that x≠y the vector (x−y)/||x−y|| has norm 1. Thus | α ((x−y)/||x−y||) |≤M. By the linearity of α this implies that | α(x)−α(y) |≤M||x−y||. Thus α is continuous.

Definition ‍5 The dual space X* of a normed space X is the set of continuous linear functionals on X. Define a norm on it by

||α|| = sup||x||=1 | α(x) |. (19)
Exercise ‍6
1. Show the chain of inequalities:

||α|| ≤ sup||x||≤1 | α(x) | ≤ supx≠0 | α(x) |/||x|| ≤ ||α||.

Deduce that any of the mentioned supremums delivers the norm of α. Which of them would you prefer if you need to show the boundedness of α? Which of them is better to use if the boundedness of α is given?
2. Show that | α(x) |≤ ||α||·||x|| for all x∈X, α ∈ X*.

The important observation is that linear functionals form a normed space, as follows:

Exercise ‍7
1. Show that X* is a linear space with natural (point-wise) operations.
2. Show that ‍(19) defines a norm on X*.

Furthermore, X* is always complete, regardless of the properties of X!

Theorem ‍8 X* is a Banach space with the defined norm (even if X was incomplete).

Proof. Due to Exercise ‍7 we only need to show that X* is complete. Let (αn) be a Cauchy sequence in X*, then for any xX scalars αn(x) form a Cauchy sequence, since | αm(x)−αn(x) |≤||αm−αn||·||x||. Thus the sequence has a limit and we define α by α(x)=limn→∞αn(x). Clearly α is a linear functional on X. We should show that it is bounded and αn→ α. Given є>0 there exists N such that ||αn−αm||<є for all n, mN. If ||x||≤ 1 then | αn(x)−αm(x) |≤ є, let m→∞ then | αn(x)−α(x) |≤ є, so

| α(x) | ≤ | αn(x) | + є ≤ ||αn|| + є,

i.e. ||α|| is finite and ||αn−α||≤ є, thus αn→α.

Definition ‍9 The kernel of a linear functional α, written kerα, is the set of all vectors x∈X such that α(x)=0.
Exercise ‍10 Show that
1. kerα is a subspace of X.
2. If α≢0 then obviously kerα ≠ X. Furthermore, if X has at least two linearly independent vectors then kerα ≠ {0}, thus kerα is a proper subspace of X.
3. If α is continuous then kerα is closed.

Study one and get any other for free!

Hilbert spaces sale

### 4.2 Self-duality of Hilbert space

Lemma ‍11 ‍(Riesz–Fréchet) Let H be a Hilbert space and α a continuous linear functional on H, then there exists a unique y∈H such that α(x)=⟨ x,y ⟩ for all x∈H. Also ||α||H*=||y||H.

Proof. Uniqueness: if ⟨ x,y ⟩=⟨ x,y′ ⟩, i.e. ⟨ x,y−y′ ⟩=0, for all x∈H then y−y′ is self-orthogonal and thus is zero (Exercise ‍1).

Existence: we may assume that α≢0 (otherwise take y=0), then M=kerα is a closed proper subspace of H. Since H=M⊕M⊥, there exists a non-zero z∈M⊥; by scaling we could get α(z)=1. Then for any x∈H:

x=(x−α(x)z)+α(x)z,     with x−α(x)z∈ M, α(x)z∈ M⊥.

Because ⟨ x,z ⟩=α(x)⟨ z,z ⟩=α(x)||z||2 for any x∈H, we set y=z/||z||2.

Equality of the norms ||α||H*=||y||H follows from the Cauchy–Bunyakovskii–Schwarz inequality in the form | α(x) | ≤ ||x||·||y|| and the identity α(y/||y||)=||y||.

Example ‍12 On L2[0,1] let α(f)=⟨ f,t2 ⟩=∫01 f(t)t2dt. Then

||α|| = ||t2|| = ( ∫01 (t2)2dt )1/2 = 1/√5.
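A quick numerical check of this example (grid size and helper names are my own choices): the norm of the functional equals ||t2||, and the bound of Exercise ‍6.2 holds for a sample function.

```python
import numpy as np

def trap(y, x):
    # trapezoidal rule for the integral of y over the grid x
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

t = np.linspace(0, 1, 10001)

def norm(f):
    return np.sqrt(trap(f * f, t))

def alpha(f):
    # alpha(f) = <f, t^2> on L^2[0, 1]
    return trap(f * t ** 2, t)

print(norm(t ** 2))  # ~0.4472 = 1/sqrt(5)
f = np.sin(3 * t)
print(abs(alpha(f)) <= norm(t ** 2) * norm(f))  # the bound |alpha(f)| <= ||alpha|| ||f||
```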

## 5 Fourier Analysis

All bases are equal, but some are more equal than others.

As we saw already, any separable Hilbert space possesses an orthonormal basis (infinitely many of them, indeed). Are they equally good? This depends on our purposes. For the solution of differential equations which arose in mathematical physics (wave, heat, Laplace equations, etc.) there is a preferred choice. The fundamental formula d/dx eax=aeax reduces the derivative to a multiplication by a. We could benefit from this observation if the orthonormal basis is constructed out of exponents. This helps to solve differential equations, as was demonstrated in Subsection ‍1.2.

7.40pm Fourier series: Episode II

Today’s TV listing

### 5.1 Fourier series

Now we wish to address questions stated in Remark ‍9. Let us consider the space L2[−π,π]. As we saw in Example ‍3 there is an orthonormal sequence en(t)=(2π)−1/2eint in L2[−π,π]. We will show that it is an orthonormal basis, i.e.

f(t)∈ L2[−π,π]  ⇔   f(t)= ∑k=−∞∞ ⟨ f,ek ⟩ek(t),

with convergence in the L2 norm. To do this we show that CLin{ek: k∈ℤ}=L2[−π,π].

Let CP[−π,π] denote the continuous functions f on [−π,π] such that f(π)=f(−π). We also define f outside of the interval [−π,π] by periodicity.

Lemma ‍1 The space CP[−π,π] is dense in L2[−π,π].

Proof. Let f∈L2[−π,π]. Given є>0 there exists g∈C[−π,π] such that ||f−g||<є/2. From the continuity of g on a compact set it follows that there is M such that | g(t) |<M for all t∈[−π,π].

We can now replace g by a periodic g′, which coincides with g on [−π,π−δ] for an arbitrary δ>0 and has the same bound: | g′(t) |<M, see Figure ‍9. Then

||g−g′||22 = ∫π−δπ | g(t)−g′(t) |2dt ≤ (2M)2δ.

So if δ<є2/(4M)2 then ||g−g′||<є/2 and ||f−g′||<є.

Now if we could show that CLin{ek: k ∈ ℤ} includes CP[−π,π] then it also includes L2[−π,π].

Notation ‍2 Let f∈CP[−π,π]; write

fn= ∑k=−nn ⟨ f,ek ⟩ ek,   for n=0,1,2,…, (20)

the partial sum of the Fourier series for f.

We want to show that ||f−fn||2→ 0. To this end we define the nth Fejér sum by the formula

Fn = (f0+f1+⋯+fn)/(n+1), (21)

and show that

||Fn−f||∞ → 0.

Then we conclude

||Fn−f||2 = ( ∫−ππ | Fn(t)−f(t) |2 dt )1/2 ≤ (2π)1/2 ||Fn−f||∞ → 0.

Since Fn∈Lin((en)), then f∈CLin((en)) and hence f=∑−∞∞ ⟨ f,ek ⟩ek.

Remark ‍3 It is not always true that ||fnf||→ 0 even for fCP[−π,π].
Exercise ‍4 Find an example illustrating the above Remark.

The summation method used in ‍(21) is useful not only in the context of Fourier series but for many other cases as well. In such a wider framework the method is known as Cesàro summation.

It took 19 years of his life to prove this theorem

### 5.2 Fejér’s theorem

Proposition ‍5 ‍(Fejér, age 19) Let f∈CP[−π,π]. Then

Fn(x)= (1/2π) ∫−ππ f(t) Kn(x−t) dt,     where (22)

Kn(t)= (1/(n+1)) ∑k=0n ∑m=−kk eimt, (23)

is the Fejér kernel.

Proof. From notation ‍(20):

fk(x)= ∑m=−kk ⟨ f,em ⟩ em(x) = ∑m=−kk ( ∫−ππ f(t) (e−imt/√2π) dt ) (eimx/√2π) = (1/2π) ∫−ππ f(t) ∑m=−kk eim(x−t) dt.

Then from ‍(21):

Fn(x)= (1/(n+1)) ∑k=0n fk(x) = (1/(n+1)) (1/2π) ∑k=0n ∫−ππ f(t) ∑m=−kk eim(x−t) dt = (1/2π) ∫−ππ f(t) (1/(n+1)) ∑k=0n ∑m=−kk eim(x−t) dt,

which finishes the proof.

Lemma ‍6 The Fejér kernel is 2π-periodic, Kn(0)=n+1 and:

Kn(t)= (1/(n+1)) · sin2((n+1)t/2) / sin2(t/2),    for t∉2πℤ. (24)
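The agreement of the double-sum definition ‍(23) with the closed form ‍(24) can be checked numerically; in the sketch below (function names are my own) the two expressions coincide to machine precision away from t∈2πℤ, and the double sum gives Kn(0)=n+1:

```python
import numpy as np

def fejer_double_sum(n, t):
    # K_n(t) directly from the definition (23)
    return sum(np.exp(1j * m * t)
               for k in range(n + 1) for m in range(-k, k + 1)).real / (n + 1)

def fejer_closed(n, t):
    # the closed form (24), valid for t not a multiple of 2*pi
    return (np.sin((n + 1) * t / 2) / np.sin(t / 2)) ** 2 / (n + 1)

t = np.linspace(0.1, np.pi, 50)
print(np.max(np.abs(fejer_double_sum(5, t) - fejer_closed(5, t))))  # ~0
print(fejer_double_sum(5, np.array([0.0]))[0])  # K_5(0) = 6 = n + 1
```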

1
z−1 1 z
z−2 z−1 1 z z2
⋮
Table 1: Counting powers in rows and columns

Proof. Let z=eit, then:

Kn(t)= (1/(n+1)) ∑k=0n (z−k+⋯+1+z+⋯+zk) = (1/(n+1)) ∑j=−nn (n+1−| j |) zj,

by a switch from counting in rows to counting in columns in Table ‍1. Let w=eit/2, i.e. z=w2, then

Kn(t)= (1/(n+1)) (w−2n+2w−2n+2+⋯+n w−2+(n+1)+n w2+⋯+w2n)
= (1/(n+1)) (w−n+w−n+2+⋯+wn−2+wn)2 (25)
= (1/(n+1)) ( (w−n−1−wn+1)/(w−1−w) )2     Could you sum a geometric progression?
= (1/(n+1)) ( 2i sin((n+1)t/2) / (2i sin(t/2)) )2,

if w≠ ± 1. For the value of Kn(0) we substitute w=1 into ‍(25).

The first eleven Fejér kernels are shown in Figure ‍10; we can observe that:

Lemma ‍7 Fejér’s kernel has the following properties:
1. Kn(t)≥0 for all t∈ ℝ and n∈ℕ.
2. ∫−ππ Kn(t) dt=2π.
3. For any δ∈ (0,π)

( ∫−π−δ + ∫δπ ) Kn(t) dt → 0    as n→ ∞.

Proof. The first property immediately follows from the explicit formula ‍(24). In contrast, the second property is easier to deduce from the expression with the double sum ‍(23):

∫−ππ Kn(t) dt = ∫−ππ (1/(n+1)) ∑k=0n ∑m=−kk eimt dt = (1/(n+1)) ∑k=0n ∑m=−kk ∫−ππ eimt dt = (1/(n+1)) ∑k=0n 2π = 2π,

by formula ‍(15).

Finally, if | t |>δ then sin2(t/2)≥ sin2(δ/2)>0 by the monotonicity of sine on [0,π/2], so:

0≤ Kn(t) ≤ 1/((n+1) sin2(δ/2)),

implying:

0≤ ∫δ≤| t |≤π Kn(t) dt ≤ 2(π−δ)/((n+1) sin2(δ/2)) → 0   as n→ ∞.

Therefore the third property follows from the squeeze rule.

Theorem ‍8 ‍(Fejér Theorem) Let f∈CP[−π,π]. Then its Fejér sums Fn ‍(21) converge in the supremum norm to f on [−π,π] and hence in the L2 norm as well.

Proof. Idea of the proof: in the formula ‍(22)

Fn(x)= (1/2π) ∫−ππ f(t) Kn(x−t) dt,

if t is a long way from x, Kn is small (see Lemma ‍7 and Figure ‍10); for t near x, Kn is big with total “weight” 2π, so the weighted average of f(t) is near f(x).

Here are the details. Using property ‍2 and the periodicity of f and Kn we can trivially express

f(x)= f(x) · (1/2π) ∫x−πx+π Kn(x−t) dt = (1/2π) ∫x−πx+π f(x) Kn(x−t) dt.

Similarly we rewrite ‍(22) as

Fn(x)= (1/2π) ∫x−πx+π f(t) Kn(x−t) dt,

then

| f(x)−Fn(x) | = (1/2π) | ∫x−πx+π (f(x)−f(t)) Kn(x−t) dt | ≤ (1/2π) ∫x−πx+π | f(x)−f(t) | Kn(x−t) dt.

Given є>0, split [x−π,x+π] into three intervals: I1=[x−π,x−δ], I2=[x−δ,x+δ], I3=[x+δ,x+π], where δ is chosen such that | f(t)−f(x) |<є/2 for t∈I2, which is possible by the continuity of f. So

(1/2π) ∫I2 | f(x)−f(t) | Kn(x−t) dt ≤ (є/2) (1/2π) ∫I2 Kn(x−t) dt < є/2.

And

(1/2π) ∫I1⋃ I3 | f(x)−f(t) | Kn(x−t) dt ≤ 2||f||∞ (1/2π) ∫I1⋃ I3 Kn(x−t) dt = (||f||∞/π) ∫δ<| u |<π Kn(u) du < є/2,

if n is sufficiently large, due to property ‍3 of Kn. Hence | f(x)−Fn(x) |<є for large n, independently of x.
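The theorem can be illustrated numerically. The sketch below (the choice f(t)=| t |, the grid sizes and the helper names are mine) builds Fn from the Fourier coefficients via the equivalent triangular-weight form Fn=∑|m|≤n (1−|m|/(n+1)) cm eimt, which follows from averaging the partial sums ‍(20) in ‍(21):

```python
import numpy as np

def trap(y, x):
    # trapezoidal rule for the integral of y over the grid x
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

t = np.linspace(-np.pi, np.pi, 4001)
f = np.abs(t)  # a function in CP[-pi, pi]

def c(m):
    # m-th Fourier coefficient (1/2pi) * integral of f(t) e^{-imt}
    return trap(f * np.exp(-1j * m * t), t) / (2 * np.pi)

def fejer_sum(n):
    # averaging the partial sums f_0, ..., f_n yields the weights 1 - |m|/(n+1)
    return sum((1 - abs(m) / (n + 1)) * c(m) * np.exp(1j * m * t)
               for m in range(-n, n + 1)).real

errors = [np.max(np.abs(fejer_sum(n) - f)) for n in (4, 16, 64)]
print(errors)  # the uniform errors decrease towards 0
```

The decay is slow (roughly logarithmic over n for this f), which is consistent with Remark ‍3: Fejér sums converge uniformly where plain partial sums need not.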

Remark ‍9 The above properties ‍13 and their usage in the last proof can be generalised to the concept of approximation of the identity. See § ‍15.4 for a further example.

We have almost finished the demonstration that en(t)=(2π)−1/2eint is an orthonormal basis of L2[−π,π]:

Corollary ‍10 ‍(Fourier series) Let f∈L2[−π,π], with Fourier series

∑n=−∞∞ ⟨ f,en ⟩en = ∑n=−∞∞ cn eint,   where cn= ⟨ f,en ⟩/√2π = (1/2π) ∫−ππ f(t)e−int dt.
Then the series ∑−∞∞ ⟨ f,en ⟩en=∑−∞∞ cn eint converges in L2[−π,π] to f, i.e.

limk→ ∞ || f− ∑n=−kk cn eint ||2 = 0.

Proof. This follows from the previous Theorem, Lemma 1 about density of CP in L2, and Theorem ‍15 on orthonormal basis.

### 5.3 Parseval’s formula

The following result first appeared in the framework of L2[−π,π] and only later was understood to be a general property of inner product spaces.

Theorem ‍11 ‍(Parseval’s formula) If f, g∈L2[−π,π] have Fourier series f=∑n=−∞∞ cn eint and g=∑n=−∞∞ dn eint, then

⟨ f,g ⟩= ∫−ππ f(t) ḡ(t) dt = 2π ∑−∞∞ cn d̄n. (26)

More generally, if f and g are two vectors of a Hilbert space H with an orthonormal basis (en)−∞∞, then

⟨ f,g ⟩= ∑n=−∞∞ cn d̄n,    where cn=⟨ f,en ⟩, dn=⟨ g,en ⟩,

are the Fourier coefficients of f and g.

Proof. In fact we could just prove the second, more general, statement; the first one is its particular realisation. Let fn=∑k=−nn ckek and gn=∑k=−nn dkek be the partial sums of the corresponding Fourier series. Then from the orthonormality of (en) and the linearity of the inner product:

⟨ fn,gn ⟩= ⟨ ∑k=−nn ckek, ∑k=−nn dkek ⟩ = ∑k=−nn ck d̄k.

This formula, together with the facts that fn→f and gn→g (following from Corollary ‍10) and the Lemma about continuity of the inner product, implies the assertion.

Corollary ‍12 An integrable function f belongs to L2[−π,π] if and only if its Fourier series is convergent, and then ||f||2=2π∑−∞∞ | ck |2.

Proof. The necessity, i.e. the implication f∈L2 ⇒ ⟨ f,f ⟩=||f||2=2π∑| ck |2, follows from the previous Theorem. The sufficiency follows by the Riesz–Fischer Theorem.

Remark ‍13 The actual rôle of Parseval’s formula is shadowed by the orthonormality and is rarely recognised until we meet wavelets or coherent states. Indeed the equality ‍(26) should be read as follows:
Theorem ‍14 ‍(Modified Parseval) The map W: Hl2 given by the formula [Wf](n)=⟨ f,en ⟩ is an isometry for any orthonormal basis (en).
We could find many other systems of vectors (ex), x∈X (very different from orthonormal bases) such that the map W: H→L2(X) given by the simple universal formula
 [Wf](x)=⟨ f,ex  ⟩ (27)
will be an isometry of Hilbert spaces. The map ‍(27) is often called a wavelet transform and the most famous example is the Cauchy integral formula in complex analysis. The majority of wavelet transforms are linked with group representations, see our postgraduate course Wavelets in Applied and Pure Maths.

Heat and noise but not a fire?


### 5.4 Some Application of Fourier Series

We are now going to provide a few examples which demonstrate the importance of Fourier series in many questions. The first two (Example ‍15 and Theorem ‍16) belong to pure mathematics and the last two are of a more applied nature.

Example ‍15 Let f(t)=t on [−π,π]. Then

⟨ f,en ⟩= ∫−ππ t (e−int/√2π) dt = { (−1)n √2π i/n, n≠ 0; 0, n=0 (check!),

so f(t)∼ ∑n≠0 (−1)n (i/n) eint. By a direct integration:

||f||22= ∫−ππ t2 dt= 2π3/3.

On the other hand, by the previous Corollary:

||f||22= 2π ∑n≠ 0 | (−1)n i/n |2 = 4π ∑n=1∞ 1/n2.

Thus we get a beautiful formula

∑1∞ 1/n2 = π2/6.
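The formula is easy to test numerically; the tail of the series beyond N terms is of order 1/N, so the partial sum below agrees with π2/6 to about five digits (the choice of N is mine):

```python
import numpy as np

# partial sum of the Basel series: the tail beyond N lies between 1/(N+1) and 1/N
N = 100000
s = np.sum(1.0 / np.arange(1, N + 1) ** 2)
print(s, np.pi ** 2 / 6)  # agree to about five decimal places
```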

Here is another important result.

Theorem ‍16 ‍(Weierstrass Approximation Theorem) For any function f∈C[a,b] and any є>0 there exists a polynomial p such that ||f−p||∞<є.

Proof. Change the variable: t=2π(x−(a+b)/2)/(b−a); this maps x∈[a,b] onto t∈[−π,π]. Let P denote the subspace of polynomials in C[−π,π] and P‾ its closure in the supremum norm. Then eint∈ P‾ for any n∈ℤ since its Taylor series converges uniformly on [−π,π]. Consequently P‾ contains the closed linear span (in the supremum norm) of the eint, n∈ℤ, which is CP[−π,π] by the Fejér theorem. Thus P‾⊇ CP[−π,π] and we extend that to non-periodic functions as follows (why could we not make use of Lemma ‍1 here, by the way?).

For any f∈C[−π,π] let λ=(f(π)−f(−π))/(2π); then f1(t)=f(t)−λ t∈CP[−π,π] and could be approximated by a polynomial p1(t) from the above discussion. Then f(t) is approximated by the polynomial p(t)=p1(t)+λ t.

It is easy to see that the rôle of the exponents eint in the above proof is rather modest: they can be replaced by any functions which have a Taylor expansion. The real glory of Fourier analysis is demonstrated in the two following examples.

Example ‍17 The modern history of Fourier analysis starts from the works of Fourier on the heat equation. As was mentioned in the introduction to this part, the exceptional rôle of Fourier coefficients for differential equations is explained by the simple formula ∂x einx= in einx. We shortly review a solution of the heat equation to illustrate this.

Let us have a rod of length 2π. The temperature at its point x∈[−π,π] and a moment t∈[0,∞) is described by a function u(t,x) on [0,∞)×[−π,π]. The mathematical equation describing the dynamics of the temperature distribution is:

∂ u(t,x)/∂ t = ∂2 u(t,x)/∂ x2     or, equivalently,     (∂t−∂x2) u(t,x)=0. (28)

For any fixed moment t0 the function u(t0,x) depends only on x∈[−π,π] and according to Corollary ‍10 could be represented by its Fourier series:

u(t0,x)= ∑n=−∞∞ ⟨ u,en ⟩en = ∑n=−∞∞ cn(t0) einx,

where

cn(t0)= ⟨ u,en ⟩/√2π = (1/2π) ∫−ππ u(t0,x)e−inx dx,

with the Fourier coefficients cn(t0) depending on t0. We substitute that decomposition into the heat equation ‍(28) to get:

(∂t−∂x2) u(t,x) = (∂t−∂x2) ∑n=−∞∞ cn(t) einx = ∑n=−∞∞ (∂t−∂x2) cn(t) einx = ∑n=−∞∞ (c′n(t)+n2cn(t)) einx = 0. (29)

Since the functions einx form a basis, the last equation ‍(29) holds if and only if

c′n(t)+n2cn(t)=0   for all n and t. (30)

Equations from the system ‍(30) have general solutions of the form:

cn(t)=cn(0)e−n2t    for all t∈[0,∞), (31)

producing a general solution of the heat equation ‍(28) in the form:

u(t,x)= ∑n=−∞∞ cn(0)e−n2t einx = ∑n=−∞∞ cn(0)e−n2t+inx, (32)

where the constants cn(0) could be defined from the boundary condition. For example, if it is known that the initial distribution of temperature was u(0,x)=g(x) for a function g(x)∈L2[−π,π], then cn(0) is the nth Fourier coefficient of g(x).

The general solution ‍(32) supports both the analytical study of the heat equation ‍(28) and its numerical simulation. For example, it follows immediately from ‍(32) that

• the temperature relaxes rapidly toward the thermal equilibrium given by c_0(0), yet never reaches it within any finite time;
• the “higher frequencies” (bigger thermal gradients) relax faster; etc.

A numerical simulation of the initial value problem with g(x)=2cos(2x)+1.5sin(x) clearly illustrates the above conclusions.
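Such a simulation can be sketched in a few lines of Python (not part of the original notes; the grid size and times are illustrative choices): the FFT supplies the coefficients c_n(0) of g, each one is damped by e^{−n²t} as in ‍(32), and the inverse FFT recovers u(t,x).

```python
import numpy as np

# Solve u_t = u_xx on [-pi, pi] with u(0, x) = g(x) = 2cos(2x) + 1.5sin(x)
# via formula (32): damp the n-th Fourier coefficient by exp(-n^2 t).
N = 256
x = -np.pi + 2 * np.pi * np.arange(N) / N        # periodic grid on [-pi, pi)
g = 2 * np.cos(2 * x) + 1.5 * np.sin(x)          # initial temperature u(0, x)

def heat(t):
    """u(t, x) on the grid, via the Fourier decay formula (32)."""
    c = np.fft.fft(g)                            # coefficients c_n(0), up to scaling
    n = np.fft.fftfreq(N, d=1.0 / N)             # integer frequencies n
    return np.real(np.fft.ifft(c * np.exp(-n**2 * t)))

t = 0.5
u = heat(t)
# Each harmonic decays independently, so the exact solution here is elementary:
u_exact = 2 * np.exp(-4 * t) * np.cos(2 * x) + 1.5 * np.exp(-t) * np.sin(x)
err = np.max(np.abs(u - u_exact))
```

Since g contains only the harmonics n=±1, ±2, the spectral solution agrees with the closed form to machine precision, and the rapid relaxation of the higher frequency is visible by comparing u at different times.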

Example ‍18 Among the oldest periodic functions in human culture are the acoustic waves of musical tones. The mathematical theory of music (including rudiments of Fourier analysis!) is as old as mathematics itself and was highly respected already in Pythagoras’ school more than 2500 years ago.

The earliest observations are that

1. Musical sounds are made of pure harmonics (see the blue and green graphs on Figure ‍12); in our language, the cos and sin functions form a basis;
2. Not every two pure harmonics are compatible: to be so, their frequencies should form a simple ratio. Otherwise dissonance (red graph on Figure ‍12) appears.

A musical tone, say G5, performed on different instruments clearly has something in common as well as something different, see Figure ‍13 for comparisons. The decomposition into pure harmonics, i.e. finding the Fourier coefficients of the signal, provides a complete characterisation, see Figure ‍14.

The Fourier analysis tells us that:

1. All the sounds have the same base (i.e. lowest) frequency, which corresponds to the G5 tone, i.e. 784 ‍Hz.
2. The higher frequencies, which are necessarily multiples of 784 ‍Hz to avoid dissonance, appear with different weights for different instruments.

Fourier analysis is very useful in signal processing and is indeed a fundamental tool. However, it is not universal and has serious limitations. Consider the simple case of the signals plotted on Figure ‍15(a) and ‍(b). They are both made out of the same two pure harmonics:

1. In the first signal the two harmonics (drawn in blue and green) follow one after another in time, Figure ‍15(a);
2. In the second they are blended in equal proportions over the whole interval, Figure ‍15(b).

These appear to be two very different signals. However, the Fourier transforms performed over the whole interval do not seem to be very different, see Figure ‍15(c). Both transforms (drawn in blue-green and pink) have two major peaks corresponding to the pure frequencies. It is not easy to extract the differences between the signals from their Fourier transforms (yet this should be possible according to our study).

A better picture can be obtained if we use the windowed Fourier transform, namely a sliding “window” of constant width instead of the entire interval for the Fourier transform. A yet better analysis can be obtained by means of wavelets, already mentioned in Remark ‍13 in connection with Plancherel’s formula. Roughly, wavelets correspond to a sliding window of a variable size: narrow for high frequencies and wide for low ones.
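The windowed transform just described can be sketched numerically (an illustrative assumption throughout: the signals, frequencies and window parameters below are not data from the notes). Taking the FFT over a sliding window of constant width distinguishes the two signals of Figure ‍15 which the global transform nearly confuses.

```python
import numpy as np

# A minimal windowed Fourier transform: FFT over a sliding window of
# constant width. Two test signals made of the same two pure harmonics.
fs = 1000
t = np.arange(fs) / fs                       # one second sampled at fs Hz
f1, f2 = 50, 120                             # two pure harmonics (Hz)

sig_a = np.where(t < 0.5, np.sin(2*np.pi*f1*t), np.sin(2*np.pi*f2*t))  # in sequence
sig_b = 0.5 * (np.sin(2*np.pi*f1*t) + np.sin(2*np.pi*f2*t))            # blended

def wft(sig, width=128, step=64):
    """Magnitude spectra over a sliding window of constant width."""
    frames = [sig[i:i + width] for i in range(0, len(sig) - width + 1, step)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

A, B = wft(sig_a), wft(sig_b)
bin1, bin2 = round(f1 * 128 / fs), round(f2 * 128 / fs)   # nearest frequency bins
# (a): the dominant frequency changes between the first and the last window...
change_a = A[0, bin1] > A[0, bin2] and A[-1, bin2] > A[-1, bin1]
# ...while in (b) both harmonics are present in every window.
both_b = min(B[0, bin1], B[0, bin2]) > 0.3 * B[0].max()
```

The per-window spectra play the role of the spectrogram: the global Fourier transforms of sig_a and sig_b both show the same two peaks, but only the windowed transform reveals that in (a) the harmonics are separated in time.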

## 6 Operators

All the space’s a stage,
and all functionals and operators merely players!

All our previous considerations were only a preparation of the stage, and now the main actors come forward to perform a play. Vector spaces are not so interesting while we consider them statically; what really makes them exciting is their transformations. The natural first step is to consider transformations which respect both the linear structure and the norm.

### 6.1 Linear operators

Definition ‍1 A linear operator T between two normed spaces X and Y is a mapping T: X→Y such that T(λv + µu) = λT(v) + µT(u). The kernel kerT and the image Im T of a linear operator are defined by
 kerT = {x∈X: Tx=0},    Im T = {y∈Y: y=Tx for some x∈X}.
Exercise ‍2 Show that kernel of T is a linear subspace of X and image of T is a linear subspace of Y.

As usual we are interested also in connections with the second (topological) structure:

Definition ‍3 The norm of a linear operator is defined by:
 ||T|| = sup{||Tx||_Y : ||x||_X ≤ 1}.  (33)

T is a bounded linear operator if ||T|| = sup{||Tx||: ||x||≤1} < ∞.

Exercise ‍4 Show that ||Tx|| ≤ ||T||·||x|| for all x∈X.
Example ‍5 Consider the following examples and determine kernel and images of the mentioned operators.
1. On a normed space X define the zero operator to a space Y by Z: x↦0 for all x∈X. Its norm is 0.
2. On a normed space X define the identity operator by I_X: x↦x for all x∈X. Its norm is 1.
3. On a normed space X any linear functional defines a linear operator from X to the scalar field; its norm as an operator is the same as its norm as a functional.
4. The set of operators from ℂⁿ to ℂᵐ is given by m×n matrices, which act on vectors by matrix multiplication. All linear operators on finite-dimensional spaces are bounded.
5. On l₂, let S(x₁,x₂,…)=(0,x₁,x₂,…) be the right shift operator. Clearly ||Sx||=||x|| for all x, so ||S||=1.
6. On L₂[a,b], let w(t)∈C[a,b] and define the multiplication operator M_w by (M_w f)(t)=w(t)f(t). Now:

 ||M_w f||² = ∫_a^b |w(t)|² |f(t)|² dt ≤ K² ∫_a^b |f(t)|² dt,   where   K = ||w||_∞ = sup_{[a,b]} |w(t)|,

so ||M_w|| ≤ K.
Exercise ‍6 Show that for the multiplication operator there is in fact equality of norms: ||M_w|| = ||w||_∞ = sup_{[a,b]} |w(t)|.
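A small numerical sketch (not part of the notes; the weight w is an arbitrary illustrative choice) makes the equality plausible: sampling on a grid turns M_w into a diagonal matrix, whose operator norm is the largest sampled |w(t_i)|, and a function concentrated where |w| peaks nearly attains it.

```python
import numpy as np

# Discretised multiplication operator M_w on L2[0,1]: in the "sample" basis
# it is the diagonal matrix diag(w(t_i)), with operator norm max |w(t_i)|.
t = np.linspace(0.0, 1.0, 201)
w = 1.0 + t**2                          # continuous weight, sup |w| = 2 at t = 1
Mw = np.diag(w)

op_norm = np.linalg.norm(Mw, 2)         # largest singular value = max |w(t_i)|
# A "function" concentrated at the maximiser of |w| attains the norm:
f = np.zeros_like(t)
f[-1] = 1.0
ratio = np.linalg.norm(Mw @ f) / np.linalg.norm(f)
```

In the continuum the supremum is approached by functions supported on ever smaller neighbourhoods of the maximiser of |w|, which is exactly what the concentrated vector f imitates.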
Theorem ‍7 Let T: XY be a linear operator. The following conditions are equivalent:
1. T is continuous on X;
2. T is continuous at the point 0.
3. T is a bounded linear operator.

Proof. The proof essentially follows that of the similar Theorem ‍4.

### 6.2 Orthoprojections

Here we will use the orthogonal complement, see § ‍3.5, to introduce a class of linear operators: orthogonal projections. Despite (or rather due to) their extreme simplicity these operators are among the most frequently used tools in the theory of Hilbert spaces.

Corollary ‍8 ‍(of Thm. ‍23, about Orthoprojection) Let M be a closed linear subspace of a Hilbert space H. There is a linear map P_M from H onto M (the orthogonal projection or orthoprojection) such that
 P_M² = P_M,    ker P_M = M^⊥,    P_{M^⊥} = I − P_M.  (34)

Proof. Let us define P_M(x)=m, where x=m+n is the decomposition from the previous theorem. The linearity of this operator follows from the fact that both M and M^⊥ are linear subspaces. Also P_M(m)=m for all m∈M and the image of P_M is M, thus P_M²=P_M. Also, if P_M(x)=0 then x∈M^⊥, i.e. ker P_M = M^⊥. Similarly P_{M^⊥}(x)=n, where x=m+n, so P_M + P_{M^⊥} = I.

Example ‍9 Let (e_n) be an orthonormal basis in a Hilbert space and let S⊂ℕ be fixed. Let M = CLin{e_n: n∈S} and M^⊥ = CLin{e_n: n∈ℕ∖S}. Then

 ∑_{k=1}^{∞} a_k e_k = ∑_{k∈S} a_k e_k + ∑_{k∉S} a_k e_k.
Remark ‍10 In fact there is a one-to-one correspondence between closed linear subspaces of a Hilbert space H and orthogonal projections defined by the identities ‍(34).

### 6.3 B(H) as a Banach space (and even algebra)

Theorem ‍11 Let B(X,Y) be the space of bounded linear operators from X to Y with the norm defined above. If Y is complete, then B(X,Y) is a Banach space.

Proof. The proof repeats that of Theorem ‍8, which is a particular case of the present theorem for Y=ℂ, see Example ‍3.

Theorem ‍12 Let T∈B(X,Y) and S∈B(Y,Z), where X, Y, and Z are normed spaces. Then ST∈B(X,Z) and ||ST|| ≤ ||S||·||T||.

Proof. Clearly (ST)x=S(Tx)∈ Z, and

 ||STx|| ≤ ||S||·||Tx|| ≤ ||S||·||T||·||x||,

which implies the norm estimate after taking the supremum over ||x||≤1.

Corollary ‍13 Let T∈B(X,X)=B(X), where X is a normed space. Then for any n≥1, Tⁿ∈B(X) and ||Tⁿ|| ≤ ||T||ⁿ.

Proof. Induction on n, with the trivial base n=1 and the step following from the previous theorem.

Remark ‍14 Some texts use the notations L(X,Y) and L(X) instead of our B(X,Y) and B(X).
Definition ‍15 Let T∈B(X,Y). We say T is an invertible operator if there exists S∈B(Y,X) such that
 ST= IX   and    TS=IY.
Such an S is called the inverse operator of T.
Exercise ‍16 Show that
1. for an invertible operator T: X→Y we have ker T={0} and Im T=Y.
2. the inverse operator is unique (if it exists at all). (Assume the existence of S and S′, then consider the operator STS′.)
Example ‍17 We consider inverses to the operators from Example ‍5.
1. The zero operator is never invertible, except in the pathological case X=Y={0}.
2. The identity operator IX is the inverse of itself.
3. A linear functional is not invertible unless it is non-zero and X is one dimensional.
4. An operator ℂⁿ→ℂᵐ is invertible if and only if m=n and the corresponding square matrix is non-singular, i.e. has non-zero determinant.
5. The right shift S is not invertible on l₂ (it is one-to-one but is not onto). But the left shift operator T(x₁,x₂,…)=(x₂,x₃,…) is its left inverse, i.e. TS=I but ST≠I since ST(1,0,0,…)=(0,0,…). T is not invertible either (it is onto but not one-to-one), however S is its right inverse.
6. The multiplication operator M_w is invertible if and only if w^{−1}∈C[a,b], and the inverse is M_{w^{−1}}. For example, M_{1+t} is invertible on L₂[0,1] and M_t is not.
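The one-sided invertibility of the shifts in item 5 can be checked on a finite section of l₂ (a sketch: the true operators act on infinite sequences, and the truncation is only faithful on vectors vanishing in the last coordinate).

```python
import numpy as np

# Truncated right and left shifts on the first N coordinates of l2.
N = 6
S = np.eye(N, k=-1)          # right shift: (x1, x2, ...) -> (0, x1, x2, ...)
T = np.eye(N, k=1)           # left shift:  (x1, x2, ...) -> (x2, x3, ...)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 0.0])         # last coordinate 0
ts_ok = np.allclose(T @ (S @ x), x)                   # TS = I on such vectors
e1 = np.eye(N)[0]
st_kills = np.allclose(S @ (T @ e1), np.zeros(N))     # ST e1 = 0, so ST != I
isometry = np.isclose(np.linalg.norm(S @ x), np.linalg.norm(x))  # ||Sx|| = ||x||
```

The left shift T undoes the right shift S, but ST annihilates the first coordinate, which is exactly why S has a left inverse without being invertible.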

### 6.4 Adjoints

Theorem ‍18 Let H and K be Hilbert spaces and T∈B(H,K). Then there exists an operator T*∈B(K,H) such that
 ⟨ Th,k  ⟩K=⟨ h,T*k  ⟩H    for all  h∈ H, k∈ K.
Such T* is called the adjoint operator of T. Also T**=T and ||T*||=||T||.

Proof. For any fixed k∈K the expression h ↦ ⟨Th,k⟩_K defines a bounded linear functional on H. By the Riesz–Fréchet lemma there is a unique y∈H such that ⟨Th,k⟩_K = ⟨h,y⟩_H for all h∈H. Define T*k = y; then T* is linear:

 ⟨h, T*(λ₁k₁+λ₂k₂)⟩_H = ⟨Th, λ₁k₁+λ₂k₂⟩_K = λ̄₁⟨Th,k₁⟩_K + λ̄₂⟨Th,k₂⟩_K = λ̄₁⟨h,T*k₁⟩_H + λ̄₂⟨h,T*k₂⟩_H = ⟨h, λ₁T*k₁+λ₂T*k₂⟩_H.

So T*(λ₁k₁+λ₂k₂) = λ₁T*k₁ + λ₂T*k₂. T** is defined by ⟨k,T**h⟩=⟨T*k,h⟩, and the identity ⟨T**h,k⟩=⟨h,T*k⟩=⟨Th,k⟩ for all h and k shows T**=T. Also:

 ||T*k||² = ⟨T*k,T*k⟩ = ⟨k,TT*k⟩ ≤ ||k||·||TT*k|| ≤ ||k||·||T||·||T*k||,

which implies ||T*k||≤||T||·||k||, consequently ||T*||≤||T||. The opposite inequality follows from the identity ||T||=||T**||.

Exercise ‍19
1. For operators T1 and T2 show that
 (T₁T₂)* = T₂*T₁*,   (T₁+T₂)* = T₁*+T₂*,   (λT)* = λ̄T*.
2. If A is an operator on a Hilbert space H then (ker A)^⊥ is the closure of Im A*.
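On ℂⁿ the adjoint is just the conjugate transpose of the matrix (cf. Example ‍21 below); the defining identity ⟨Th,k⟩=⟨h,T*k⟩ can be verified numerically with a random complex matrix (the matrix and vectors below are arbitrary illustrative choices).

```python
import numpy as np

# Adjoint on C^n: T* is the conjugate transpose; check <Th, k> = <h, T*k>
# for the inner product <a, b> = sum a_i conj(b_i), linear in the first slot.
rng = np.random.default_rng(0)
n = 5
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)
k = rng.standard_normal(n) + 1j * rng.standard_normal(n)

Tstar = T.conj().T
inner = lambda a, b: np.sum(a * b.conj())
adjoint_ok = np.isclose(inner(T @ h, k), inner(h, Tstar @ k))
norms_equal = np.isclose(np.linalg.norm(T, 2), np.linalg.norm(Tstar, 2))  # ||T*|| = ||T||
```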

### 6.5 Hermitian, unitary and normal operators

Definition ‍20 An operator T: H→H is a Hermitian operator or self-adjoint operator if T=T*, i.e. ⟨Tx,y⟩ = ⟨x,Ty⟩ for all x, y∈H.
Example ‍21
1. On l2 the adjoint S* to the right shift operator S is given by the left shift S*=T, indeed:
 ⟨Sx,y⟩ = ⟨(0,x₁,x₂,…),(y₁,y₂,…)⟩ = x₁ȳ₂ + x₂ȳ₃ + ⋯ = ⟨(x₁,x₂,…),(y₂,y₃,…)⟩ = ⟨x,Ty⟩.
Thus S is not Hermitian.
2. Let D be the diagonal operator on l₂ given by
 D(x₁,x₂,…) = (λ₁x₁, λ₂x₂, …),
where (λ_k) is any bounded complex sequence. It is easy to check that ||D|| = ||(λ_n)||_∞ = sup_k |λ_k| and
 D*(x₁,x₂,…) = (λ̄₁x₁, λ̄₂x₂, …),
thus D is Hermitian if and only if λ_k∈ℝ for all k.
3. If T: ℂⁿ→ℂⁿ is represented by multiplication of a column vector by a matrix A, then T* is multiplication by the matrix A*, the transposed conjugate of A.
Exercise ‍22 Show that for any bounded operator T the operators T_r = ½(T+T*), T_i = 1/(2i)·(T−T*), T*T and TT* are Hermitian. Note that any operator is a linear combination of two Hermitian operators: T = T_r + iT_i (cf. z = ℜz + iℑz for z∈ℂ).

To appreciate the next Theorem the following exercise is useful:

Exercise ‍23 Let H be a Hilbert space. Show that
1. For x∈H we have ||x|| = sup{|⟨x,y⟩| : y∈H, ||y||=1}.
2. For TB(H) we have
 ||T|| = sup{|⟨Tx,y⟩| : x,y∈H, ||x||=||y||=1}.  (35)

The next theorem says that for a Hermitian operator T the supremum in ‍(35) may be taken over the “diagonal” x=y only.

Theorem ‍24 Let T be a Hermitian operator on a Hilbert space. Then
 ||T|| = sup_{||x||=1} |⟨Tx,x⟩|.

Proof. If Tx=0 for all xH, both sides of the identity are 0. So we suppose that ∃ xH for which Tx≠ 0.

We see that |⟨Tx,x⟩| ≤ ||Tx||·||x|| ≤ ||T||·||x||², so sup_{||x||=1} |⟨Tx,x⟩| ≤ ||T||. To get the inequality the other way around, we first write s := sup_{||x||=1} |⟨Tx,x⟩|. Then for any x∈H we have |⟨Tx,x⟩| ≤ s||x||².

We now consider

 ⟨ T(x+y),x+y  ⟩ =⟨ Tx,x  ⟩ +⟨ Tx,y  ⟩+⟨ Ty,x  ⟩ +⟨ Ty,y  ⟩ =  ⟨ Tx,x  ⟩ +2ℜ ⟨ Tx,y  ⟩ +⟨ Ty,y  ⟩

(because T being Hermitian gives ⟨Ty,x⟩ = ⟨y,Tx⟩, which is the complex conjugate of ⟨Tx,y⟩) and, similarly,

 ⟨ T(x−y),x−y  ⟩ = ⟨ Tx,x  ⟩ −2ℜ ⟨ Tx,y  ⟩ +⟨ Ty,y  ⟩.

Subtracting gives

 4ℜ⟨Tx,y⟩ = ⟨T(x+y),x+y⟩ − ⟨T(x−y),x−y⟩ ≤ s(||x+y||² + ||x−y||²) = 2s(||x||² + ||y||²),

by the parallelogram identity.

Now, for x∈H such that Tx≠0, we put y = ||Tx||^{−1}·||x||·Tx. Then ||y||=||x||, and when we substitute into the previous inequality, we get

 4||Tx||·||x|| = 4ℜ⟨Tx,y⟩ ≤ 4s||x||²,

so ||Tx|| ≤ s||x|| and it follows that ||T|| ≤ s, as required.
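Theorem ‍24 can be observed numerically for Hermitian matrices, where ||T|| is the largest |eigenvalue| (the random matrix below is an illustrative assumption): the quadratic form never exceeds the norm on unit vectors, and an extreme eigenvector attains it.

```python
import numpy as np

# For a Hermitian matrix, sup over unit x of |<Tx, x>| equals ||T||.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = (A + A.conj().T) / 2                  # Hermitian part of A

op_norm = np.linalg.norm(T, 2)            # largest singular value
eigs, vecs = np.linalg.eigh(T)
# |<Tx, x>| <= ||T|| for random unit vectors (np.vdot conjugates its first arg):
diag_bounded = all(
    abs(np.vdot(x, T @ x)) <= op_norm + 1e-10
    for x in (v / np.linalg.norm(v) for v in rng.standard_normal((100, 4)))
)
# An eigenvector of the extreme eigenvalue attains the supremum:
v = vecs[:, np.argmax(np.abs(eigs))]
attained = np.isclose(abs(np.vdot(v, T @ v)), op_norm)
```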

Definition ‍25 We say that U:HH is a unitary operator on a Hilbert space H if U*=U−1, i.e. U*U=UU*=I.
Example ‍26
1. If D: l₂→l₂ is a diagonal operator such that D e_k = λ_k e_k, then D* e_k = λ̄_k e_k and D is unitary if and only if |λ_k|=1 for all k.
2. The ‍shift operator S satisfies S*S=I but SS*≠I, thus S is not unitary.
Theorem ‍27 For an operator U on a complex Hilbert space H the following are equivalent:
1. U is unitary;
2. U is a surjection and an isometry, i.e. ||Ux||=||x|| for all x∈H;
3. U is a surjection and preserves the inner product, i.e. ⟨Ux,Uy⟩=⟨x,y⟩ for all x, y∈H.

Proof. 1⇒2. Clearly unitarity of an operator implies its invertibility and hence surjectivity. Also

 ||Ux||² = ⟨Ux,Ux⟩ = ⟨x,U*Ux⟩ = ⟨x,x⟩ = ||x||².

2⇒3. Using the polarisation identity (cf. polarisation in equation ‍(12)):

 4⟨Tx,y⟩ = ⟨T(x+y),x+y⟩ + i⟨T(x+iy),x+iy⟩ − ⟨T(x−y),x−y⟩ − i⟨T(x−iy),x−iy⟩ = ∑_{k=0}^{3} i^k ⟨T(x+i^k y), x+i^k y⟩.

Take T=U*U and T=I, then

 4⟨U*Ux,y⟩ = ∑_{k=0}^{3} i^k ⟨U*U(x+i^k y), x+i^k y⟩ = ∑_{k=0}^{3} i^k ⟨U(x+i^k y), U(x+i^k y)⟩ = ∑_{k=0}^{3} i^k ⟨x+i^k y, x+i^k y⟩ = 4⟨x,y⟩.

3⇒1. Indeed ⟨U*Ux,y⟩=⟨x,y⟩ implies ⟨(U*U−I)x,y⟩=0 for all x,y∈H, hence U*U=I. Since U is surjective, for any y∈H there is x∈H such that y=Ux. Then, using the already established fact U*U=I, we get

 UU* y = UU*(Ux) =  U(U*U)x = Ux= y.

Thus we have UU*=I as well and U is unitary.

Definition ‍28 A normal operator T is one for which T*T=TT*.
Example ‍29
1. Any self-adjoint operator T is normal, since T*=T.
2. Any unitary operator U is normal, since U*U=I=UU*.
3. Any diagonal operator D is normal, since D e_k = λ_k e_k, D* e_k = λ̄_k e_k, and DD* e_k = D*D e_k = |λ_k|² e_k.
4. The shift operator S is not normal.
5. A finite matrix is normal (as an operator on l₂ⁿ) if and only if there is an orthonormal basis in which it is diagonal.
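These examples can be checked numerically on small matrices (the particular matrices below are illustrative choices, not from the notes).

```python
import numpy as np

# Normality T T* = T* T: a diagonal matrix is normal, the truncated shift
# is not, and a normal (here Hermitian) matrix diagonalises in an
# orthonormal basis.
D = np.diag([1j, 2.0, -1.0])
diag_normal = np.allclose(D @ D.conj().T, D.conj().T @ D)

S = np.eye(4, k=-1)                           # truncated right shift
shift_normal = np.allclose(S @ S.T, S.T @ S)  # False: S S* != S* S

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (A + A.conj().T) / 2                      # Hermitian, hence normal
w, U = np.linalg.eigh(H)                      # U has orthonormal columns
diagonalised = np.allclose(U.conj().T @ H @ U, np.diag(w))
```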
Remark ‍30 Theorems ‍24 and ‍27 draw a similarity between these types of operators and multiplication by complex numbers. Indeed, Theorem ‍24 says that an operator which significantly changes the direction of vectors (“rotates” them) cannot be Hermitian, just as multiplication by a real number scales but does not rotate. On the other hand, Theorem ‍27 says that a unitary operator only rotates vectors but does not scale them, like multiplication by a unimodular complex number. We will see further such connections in Theorem ‍17.

## 7 Spectral Theory

Beware of ghosts in this area!

As we saw, operators can be added and multiplied by each other; in some sense they behave like numbers, but are much more complicated. In this lecture we will associate to each operator a set of complex numbers which reflects certain (unfortunately not all) properties of this operator.

The analogy between operators and numbers becomes even deeper once we construct functions of operators (a functional calculus) in the way we build numeric functions. The most important function of this sort is the resolvent (see Definition ‍5). The methods of analytic functions are very powerful in operator theory, and students may wish to refresh their knowledge of complex analysis before this part.

### 7.1 The spectrum of an operator on a Hilbert space

An eigenvalue of an operator T∈B(H) is a complex number λ such that there exists a nonzero x∈H, called an eigenvector, with the property Tx=λx; in other words, x∈ker(T−λI).

In finite dimensions T−λI is invertible if and only if λ is not an eigenvalue. In infinite dimensions this is no longer so: the ‍right shift operator S is not invertible, but 0 is not its eigenvalue because Sx=0 implies x=0 (check!).

Definition ‍1 The resolvent set ρ(T) of an operator T is the set
 ρ (T)={λ∈ℂ: T−λ I  is invertible}.
The spectrum of operator TB(H), denoted σ(T), is the complement of the resolvent set ρ(T):
 σ(T)={λ∈ℂ: T−λ I  is not invertible}.
Example ‍2 If H is finite dimensional then it follows from the previous discussion that σ(T) is the set of eigenvalues of T, for any T.

Even this example demonstrates that the spectrum does not provide a complete description of an operator, even in the finite-dimensional case. For example, both operators on ℂ² given by the matrices

 ( 0 0 ; 0 0 )    and    ( 0 0 ; 1 0 )

have the single-point spectrum {0}, yet they are rather different. The situation becomes even worse in infinite dimensional spaces.

Theorem ‍3 The spectrum σ(T) of a bounded operator T is a nonempty compact (i.e. closed and bounded) subset of ℂ.

For the proof we will need several Lemmas.

Lemma ‍4 Let A∈B(H). If ||A||<1 then I−A is invertible in B(H) and the inverse is given by the Neumann series (C. ‍Neumann, 1877):
 (I−A)^{−1} = I + A + A² + A³ + … = ∑_{k=0}^{∞} A^k.  (36)

Proof. Define the sequence of operators B_n = I + A + ⋯ + Aⁿ, the partial sums of the infinite series ‍(36). It is a Cauchy sequence; indeed:

 ||B_n − B_m|| = ||A^{m+1} + A^{m+2} + ⋯ + Aⁿ||    (if n>m)
  ≤ ||A^{m+1}|| + ||A^{m+2}|| + ⋯ + ||Aⁿ||
  ≤ ||A||^{m+1} + ||A||^{m+2} + ⋯ + ||A||ⁿ
  ≤ ||A||^{m+1} / (1 − ||A||) < ε

for large m. By the completeness of B(H) there is a limit, say B, of the sequence B_n. It is simple algebra to check that (I−A)B_n = B_n(I−A) = I − A^{n+1}; passing to the limit in the norm topology, where A^{n+1}→0 and B_n→B, we get:

 (I−A)B=B(I−A)=I  ⇔  B=(I−A)−1.
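In a matrix setting the Neumann series ‍(36) can be tested directly (the random contraction below is an illustrative choice): the partial sums converge geometrically to the true inverse once ||A||<1.

```python
import numpy as np

# Neumann series: for ||A|| < 1 the partial sums I + A + ... + A^n
# converge to (I - A)^{-1}.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
A *= 0.5 / np.linalg.norm(A, 2)             # rescale so that ||A|| = 0.5 < 1

B = np.eye(4)
term = np.eye(4)
for _ in range(60):                          # partial sums B_n of the series
    term = term @ A
    B += term

series_ok = np.allclose(B, np.linalg.inv(np.eye(4) - A))
inverse_ok = np.allclose((np.eye(4) - A) @ B, np.eye(4))
```

The remainder after n terms is bounded by ||A||^{n+1}/(1−||A||), the same geometric tail used in the proof above.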

Definition ‍5 The resolvent of an operator T is the operator-valued function defined on the resolvent set by the formula:
 R(λ,T)=(T−λ I)−1.         (37)
Corollary ‍6
1. If | λ |>||T|| then λ∈ ρ(T), hence the spectrum is bounded.
2. The resolvent set ρ(T) is open, i.e. for any λ∈ρ(T) there exists ε>0 such that all µ with |λ−µ|<ε are also in ρ(T); hence the spectrum is closed.
Both statements together imply that the spectrum is compact.

Proof.

1. If |λ| > ||T|| then ||λ^{−1}T|| < 1, so the operator T−λI = −λ(I−λ^{−1}T) has the inverse

 R(λ,T) = (T−λI)^{−1} = −∑_{k=0}^{∞} λ^{−k−1} T^k  (38)

by the previous Lemma.
2. Indeed:
 T−µ I = T−λ I + (λ−µ)I = (T−λ I)(I+(λ−µ)(T−λ I)−1).
The last line is an invertible operator because T−λI is invertible by assumption and I+(λ−µ)(T−λI)^{−1} is invertible by the previous Lemma, since ||(λ−µ)(T−λI)^{−1}|| < 1 provided ε < ||(T−λI)^{−1}||^{−1}.

Exercise ‍7
1. Prove the first resolvent identity:
 R(λ,T)−R(µ,T)=(λ−µ)R(λ,T)R(µ,T) (39)
2. Use the identity ‍(39) to show that (T−µ I)−1→ (T−λ I)−1 as µ→ λ.
3. Use the identity ‍(39) to show that for z∈ρ(T) the complex derivative d/dz R(z,T) of the resolvent is well defined, i.e. the resolvent is an analytic operator-valued function of z.
Lemma ‍8 The spectrum is non-empty.

Proof. Let us assume the opposite: σ(T)=∅; then the resolvent function R(λ,T) is well defined for all λ∈ℂ. As can be seen from the Neumann series ‍(38), ||R(λ,T)||→0 as λ→∞. Thus for any vectors x, y∈H the function f(λ)=⟨R(λ,T)x,y⟩ is an analytic function (see Exercise ‍7(3)) tending to zero at infinity. Then by Liouville's theorem from complex analysis f is identically zero; since x and y were arbitrary, R(λ,T)=0, which is impossible. Thus the spectrum is not empty.

Proof.[Proof of Theorem ‍3] Spectrum is nonempty by Lemma ‍8 and compact by Corollary ‍6.

Remark ‍9 Theorem ‍3 gives the maximal possible description of the spectrum, indeed any non-empty compact set could be a spectrum for some bounded operator, see Problem ‍23.

### 7.2 The spectral radius formula

The following definition is of interest.

Definition ‍10 The spectral radius of T is
 r(T)=sup{ ⎪ ⎪ λ ⎪ ⎪ : λ∈ σ(T)}.

From Corollary ‍6(1) it immediately follows that r(T) ≤ ||T||. A more accurate estimate is given by the following theorem.

Theorem ‍11 For a bounded operator T we have
 r(T) = lim_{n→∞} ||Tⁿ||^{1/n}.  (40)

We start from the following general lemma:

Lemma ‍12 Let a sequence (a_n) of positive real numbers satisfy the inequalities 0 ≤ a_{m+n} ≤ a_m + a_n for all m and n. Then the limit lim_{n→∞}(a_n/n) exists and is equal to inf_n(a_n/n).

Proof. The statement follows from the observation that for any n and m = nk + l with 0 ≤ l ≤ n we have a_m ≤ k·a_n + l·a_1; thus, for big m, we get a_m/m ≤ a_n/n + l·a_1/m ≤ a_n/n + ε.

Proof.[Proof of Theorem ‍11] The existence of the limit lim_{n→∞}||Tⁿ||^{1/n} in ‍(40) follows from the previous Lemma, since ||T^{n+m}|| ≤ ||Tⁿ||·||Tᵐ|| implies log||T^{n+m}|| ≤ log||Tⁿ|| + log||Tᵐ||. Now we use some results from complex analysis. The Laurent series for the resolvent R(λ,T) in a neighbourhood of infinity is given by the Neumann series ‍(38). The radius of its convergence (which is equal, obviously, to r(T)) is, by the Hadamard theorem, exactly lim_{n→∞}||Tⁿ||^{1/n}.
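For matrices the formula ‍(40) is easy to watch in action (the non-normal matrix below is an illustrative choice): r(T) is the largest |eigenvalue|, ||T|| itself overestimates it, and ||Tⁿ||^{1/n} approaches r(T) as n grows.

```python
import numpy as np

# Spectral radius formula r(T) = lim ||T^n||^{1/n} for a non-normal matrix,
# where ||T|| > r(T) strictly.
T = np.array([[0.5, 1.0],
              [0.0, 0.3]])
r = np.max(np.abs(np.linalg.eigvals(T)))                 # r(T) = 0.5

powers = [np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1.0 / n)
          for n in (1, 5, 20, 80)]
overestimate = powers[0] > r                             # ||T|| > r(T) here
converged = abs(powers[-1] - r) < 0.05                   # ||T^n||^{1/n} -> r(T)
```

For normal operators no such gap occurs: ||T||=r(T), as Theorem ‍1 of the last section shows.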

Corollary ‍13 There exists λ∈σ(T) such that | λ |=r(T).

Proof. Indeed, as is known from complex analysis, the boundary of the disc of convergence of a Laurent (or Taylor) series contains a singular point; a singular point of the resolvent obviously belongs to the spectrum.

Example ‍14 Let us consider the left shift operator S*. For any λ∈ℂ such that |λ|<1 the vector (1,λ,λ²,λ³,…) is in l₂ and is an eigenvector of S* with eigenvalue λ, so the open unit disk |λ|<1 belongs to σ(S*). On the other hand, the spectrum of S* belongs to the closed unit disk |λ|≤1, since r(S*) ≤ ||S*|| = 1. Because the spectrum is closed, it has to coincide with the closed unit disk, since the open unit disk is dense in it. In particular 1∈σ(S*), but it is easy to see that 1 is not an eigenvalue of S*.
Proposition ‍15 For any T∈B(H) the spectrum of the adjoint operator is σ(T*) = {λ̄ : λ∈σ(T)}.

Proof. If (T−λI)V = V(T−λI) = I then, by taking adjoints, V*(T*−λ̄I) = (T*−λ̄I)V* = I. So λ∈ρ(T) implies λ̄∈ρ(T*); using the property T**=T we can invert the implication and get the statement of the proposition.

Example ‍16 In continuation of Example ‍14 using the previous Proposition we conclude that σ(S) is also the closed unit disk, but S does not have eigenvalues at all!

### 7.3 Spectrum of Special Operators

Theorem ‍17
1. If U is a unitary operator then σ(U) ⊆ {z: |z|=1}.
2. If T is Hermitian then σ(T)⊆ ℝ.

Proof.

1. If |λ|>1 then ||λ^{−1}U||<1 and then λI−U = λ(I−λ^{−1}U) is invertible, thus λ∉σ(U). If |λ|<1 then ||λU*||<1 and then λI−U = U(λU*−I) is invertible, thus λ∉σ(U). The remaining set is exactly {z: |z|=1}.
2. Without loss of generality we may assume that ||T||<1; otherwise we could multiply T by a small positive real scalar. Let us consider the Cayley transform, which maps the real axis to the unit circle:
 U=(T−iI)(T+iI)−1.
Straightforward calculations show that U is unitary if T is Hermitian. Let us take λ∉ℝ and λ≠−i (this case can be checked directly by Lemma ‍4). Then the Cayley transform µ = (λ−i)(λ+i)^{−1} of λ is not on the unit circle and thus the operator
 U−µ I=(T−iI)(T+iI)−1−(λ−i)(λ+i)−1I= 2i(λ+i)−1(T−λ I)(T+iI)−1,
is invertible, which implies the invertibility of T−λI. Thus λ∉σ(T), i.e. σ(T)⊆ℝ.

The above reduction of a self-adjoint operator to a unitary one (it can be done in the opposite direction as well!) is an important tool which can be applied to other questions as well, e.g. the following exercise.

Exercise ‍18
1. Show that the operator U: f(t) ↦ e^{it}f(t) on L₂[0,2π] is unitary and has the entire unit circle {|z|=1} as its spectrum.
2. Find a self-adjoint operator T with the entire real line as its spectrum.

## 8 Compactness

It is not easy to study linear operators “in general”, and there are many questions about operators in Hilbert spaces, raised many decades ago, which are still unanswered. Therefore it is reasonable to single out classes of operators which have (relatively) simple properties. Such a class of operators, the closest to finite-dimensional ones, will be studied here.

These operators are so compact that we can even fit them into our course.

### 8.1 Compact operators

Let us recall some topological definitions and results.

Definition ‍1 A compact set in a metric space is defined by the property that any covering of it by a family of open sets contains a subcovering by a finite subfamily.

In the finite dimensional vector spaces ℝⁿ or ℂⁿ there is the following equivalent description of compactness (the equivalence of ‍1 and ‍2 is known as the Heine–Borel theorem):

Theorem ‍2 If a set E in ℝⁿ or ℂⁿ has any of the following properties then it has the other two as well:
1. E is bounded and closed;
2. E is compact;
3. Any infinite subset of E has a limiting point belonging to E.
Exercise* ‍3 Which of the above equivalences are no longer true in infinite dimensional spaces?
Definition ‍4 Let X and Y be normed spaces, T∈B(X,Y). T is a finite rank operator if Im T is a finite dimensional subspace of Y. T is a compact operator if, whenever (x_i)₁^∞ is a bounded sequence in X, its image (Tx_i)₁^∞ has a convergent subsequence in Y.

The set of finite rank operators is denoted by F(X,Y) and the set of compact operators by K(X,Y).

Exercise ‍5 Show that both F(X,Y) and K(X,Y) are linear subspaces of B(X,Y).

We intend to show that F(X,Y)⊂K(X,Y).

Lemma ‍6 Let Z be a finite-dimensional normed space. Then there is a number N and a mapping S: l2NZ which is invertible and such that S and S−1 are bounded.

Proof. The proof is given by an explicit construction. Let N=dimZ and z1, z2, …, zN be a basis in Z. Let us define

 S: l₂^N → Z    by    S(a₁,a₂,…,a_N) = ∑_{k=1}^{N} a_k z_k,

then we have an estimation of norm:

 ||Sa|| = ||∑_{k=1}^{N} a_k z_k|| ≤ ∑_{k=1}^{N} |a_k|·||z_k|| ≤ (∑_{k=1}^{N} |a_k|²)^{1/2} (∑_{k=1}^{N} ||z_k||²)^{1/2},

by the Cauchy–Schwarz inequality.

So ||S||≤ (∑1N ||zk||2)1/2 and S is continuous.

Clearly S has trivial kernel; in particular ||Sa||>0 if ||a||=1. By the Heine–Borel theorem the unit sphere in l₂^N is compact, consequently the continuous function a ↦ ||∑₁^N a_k z_k|| attains its lower bound, which has to be positive. This means there exists δ>0 such that ||a||=1 implies ||Sa||>δ, or, equivalently, ||z||<δ implies ||S^{−1}z||<1. The latter means that ||S^{−1}|| ≤ δ^{−1}, so S^{−1} is bounded.

Corollary ‍7 For any two normed spaces X and Y we have F(X,Y) ⊂ K(X,Y).

Proof. Let T∈F(X,Y). If (x_n)₁^∞ is a bounded sequence in X then the sequence (Tx_n)₁^∞ ⊂ Z = Im T is also bounded. Let S: l₂^N → Z be the map constructed in the above Lemma. The sequence (S^{−1}Tx_n)₁^∞ is bounded in l₂^N and thus has a limiting point, say a₀. Then Sa₀ is a limiting point of (Tx_n)₁^∞.

There is a simple condition which allows us to determine which diagonal operators are compact (in particular, the identity operator I_X is not compact if dim X = ∞):

Proposition ‍8 Let T be a diagonal operator, given by the identities T e_n = λ_n e_n for all n in a basis (e_n). Then T is compact if and only if λ_n → 0.

Proof. If λ_n ↛ 0 then there exist a subsequence (λ_{n_k}) and δ>0 such that |λ_{n_k}|>δ for all k. Now the sequence (e_{n_k}) is bounded but its image T e_{n_k} = λ_{n_k} e_{n_k} has no convergent subsequence, because for any k≠l:

 ||λ_{n_k} e_{n_k} − λ_{n_l} e_{n_l}|| = (|λ_{n_k}|² + |λ_{n_l}|²)^{1/2} ≥ √2·δ,

i.e. (T e_{n_k}) is not a Cauchy sequence, see Figure ‍16. For the converse, note that if λ_n → 0 then we can define finite rank operators T_m, m≥1, the m-“truncations” of T, by:

 T_m e_n = λ_n e_n for 1≤n≤m,   T_m e_n = 0 for n>m.  (41)

Then obviously

 (T−T_m) e_n = 0 for 1≤n≤m,   (T−T_m) e_n = λ_n e_n for n>m,
and ||T−T_m|| = sup_{n>m}|λ_n| → 0 as m→∞. All T_m are finite rank operators (hence compact) and T is also compact as their limit, by the next Theorem.
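The truncation argument is easy to observe numerically inside a large finite section of l₂ (an illustrative simplification; the choice λ_n = 1/n is an example of a sequence tending to 0).

```python
import numpy as np

# Diagonal operator T e_n = e_n / n and its m-truncations (41) inside an
# N x N section: the truncation gap ||T - T_m|| equals sup_{n>m} |lambda_n|.
N = 200
lam = 1.0 / np.arange(1, N + 1)
T = np.diag(lam)

gaps_ok = True
for m in (5, 20, 100):
    Tm = np.diag(np.where(np.arange(1, N + 1) <= m, lam, 0.0))
    gap = np.linalg.norm(T - Tm, 2)          # = 1/(m+1) within the section
    gaps_ok = gaps_ok and np.isclose(gap, 1.0 / (m + 1))
```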

Theorem ‍9 Let Tm be a sequence of compact operators convergent to an operator T in the norm topology (i.e. ||TTm||→ 0) then T is compact itself. Equivalently K(X,Y) is a closed subspace of B(X,Y).

 T₁x₁⁽¹⁾ T₁x₂⁽¹⁾ T₁x₃⁽¹⁾ … T₁x_n⁽¹⁾ … → a₁
 T₂x₁⁽²⁾ T₂x₂⁽²⁾ T₂x₃⁽²⁾ … T₂x_n⁽²⁾ … → a₂
 T₃x₁⁽³⁾ T₃x₂⁽³⁾ T₃x₃⁽³⁾ … T₃x_n⁽³⁾ … → a₃
 … … … … … …
 T_n x₁⁽ⁿ⁾ T_n x₂⁽ⁿ⁾ T_n x₃⁽ⁿ⁾ … T_n x_n⁽ⁿ⁾ … → a_n
 … … … … … …

 Table 2: The “diagonal argument”.

Proof. Take a bounded sequence (xn)1. From compactness

 of T₁ ⇒ ∃ subsequence (x_n⁽¹⁾)₁^∞ of (x_n)₁^∞ s.t. (T₁x_n⁽¹⁾)₁^∞ is convergent;
 of T₂ ⇒ ∃ subsequence (x_n⁽²⁾)₁^∞ of (x_n⁽¹⁾)₁^∞ s.t. (T₂x_n⁽²⁾)₁^∞ is convergent;
 of T₃ ⇒ ∃ subsequence (x_n⁽³⁾)₁^∞ of (x_n⁽²⁾)₁^∞ s.t. (T₃x_n⁽³⁾)₁^∞ is convergent;
 …

Could we find a subsequence which converges for all T_m simultaneously? The first guess, “take the intersection of all the above subsequences (x_n⁽ᵏ⁾)₁^∞”, does not work because the intersection could be empty. The way out is provided by the diagonal argument (see Table ‍2): the subsequence (T_m x_k⁽ᵏ⁾)₁^∞ is convergent for every m, because after the term x_m⁽ᵐ⁾ at the latest it is a subsequence of (x_k⁽ᵐ⁾)₁^∞.

We claim that the subsequence (Tx_k⁽ᵏ⁾)₁^∞ of (Tx_n)₁^∞ is convergent as well. We use here the ε/3 argument (see Figure ‍17): for a given ε>0 choose p∈ℕ such that ||T−T_p||<ε/3. Because (T_p x_k⁽ᵏ⁾) converges, it is a Cauchy sequence; thus there exists n₀>p such that ||T_p x_k⁽ᵏ⁾ − T_p x_l⁽ˡ⁾|| < ε/3 for all k, l > n₀. Then:

 ||Tx_k⁽ᵏ⁾ − Tx_l⁽ˡ⁾|| = ||(Tx_k⁽ᵏ⁾ − T_p x_k⁽ᵏ⁾) + (T_p x_k⁽ᵏ⁾ − T_p x_l⁽ˡ⁾) + (T_p x_l⁽ˡ⁾ − Tx_l⁽ˡ⁾)||
  ≤ ||Tx_k⁽ᵏ⁾ − T_p x_k⁽ᵏ⁾|| + ||T_p x_k⁽ᵏ⁾ − T_p x_l⁽ˡ⁾|| + ||T_p x_l⁽ˡ⁾ − Tx_l⁽ˡ⁾|| < ε.

Thus T is compact.

### 8.2 Hilbert–Schmidt operators

Definition ‍10 Let T: H→K be a bounded linear map between two Hilbert spaces. Then T is said to be a Hilbert–Schmidt operator if there exists an orthonormal basis (e_k) in H such that the series ∑_{k=1}^{∞} ||Te_k||² is convergent.
Example ‍11
1. Let T: l₂→l₂ be the diagonal operator defined by Te_n = e_n/n for all n≥1. Then ∑||Te_n||² = ∑ n^{−2} = π²/6 (see Example ‍15) is finite.
2. The identity operator IH is not a Hilbert–Schmidt operator, unless H is finite dimensional.

A relation to compact operator is as follows.

Theorem ‍12 All Hilbert–Schmidt operators are compact. (The opposite inclusion is false; give a counterexample!)

Proof. Let T∈B(H,K) have a convergent series ∑||Te_n||² in an orthonormal basis (e_n)₁^∞ of H. We again (cf. ‍(41)) define the m-truncation of T by the formula

 T_m e_n = Te_n for 1≤n≤m,   T_m e_n = 0 for n>m.  (42)

Then T_m(∑₁^∞ a_k e_k) = ∑₁^m a_k Te_k, and each T_m is a finite rank operator because its image is spanned by the finite set of vectors Te₁, …, Te_m. We claim that ||T−T_m||→0. Indeed, by linearity and the definition of T_m:

 (T−T_m)(∑_{n=1}^{∞} a_n e_n) = ∑_{n=m+1}^{∞} a_n (Te_n).

Thus:

 ||(T−T_m)(∑_{n=1}^{∞} a_n e_n)|| = ||∑_{n=m+1}^{∞} a_n (Te_n)||  (43)
  ≤ ∑_{n=m+1}^{∞} |a_n|·||Te_n||
  ≤ (∑_{n=m+1}^{∞} |a_n|²)^{1/2} (∑_{n=m+1}^{∞} ||Te_n||²)^{1/2}
  ≤ ||∑_{n=1}^{∞} a_n e_n|| · (∑_{n=m+1}^{∞} ||Te_n||²)^{1/2},  (44)

so ||T−T_m|| ≤ (∑_{n=m+1}^{∞} ||Te_n||²)^{1/2} → 0, and by the previous Theorem T is compact as a limit of compact operators.

Corollary ‍13 ‍(from the above proof) For a Hilbert–Schmidt operator

 ||T|| ≤ (∑_{n=1}^{∞} ||Te_n||²)^{1/2}.

Proof. Just consider the difference of T and T₀=0 in ‍(43)–(44) with m=0.

Example ‍14 An integral operator T on L₂[0,1] is defined by the formula:

 (Tf)(x) = ∫₀¹ K(x,y) f(y) dy,   f(y)∈L₂[0,1],  (45)

where the function K, continuous on [0,1]×[0,1], is called the kernel of the integral operator.
Theorem ‍15 The integral operator ‍(45) is Hilbert–Schmidt.

Proof. Let (e_n)_{−∞}^{∞} be an orthonormal basis of L₂[0,1], e.g. (e^{2πint})_{n∈ℤ}. Let us consider the kernel K_x(y) = K(x,y) as a function of the argument y depending on the parameter x. Then:

 (Te_n)(x) = ∫₀¹ K(x,y) e_n(y) dy = ∫₀¹ K_x(y) e_n(y) dy = ⟨K_x, ē_n⟩.

So ||Te_n||² = ∫₀¹ |⟨K_x, ē_n⟩|² dx. Consequently:

 ∑_{n=−∞}^{∞} ||Te_n||² = ∑_{n=−∞}^{∞} ∫₀¹ |⟨K_x, ē_n⟩|² dx
  = ∫₀¹ ∑_{n=−∞}^{∞} |⟨K_x, ē_n⟩|² dx  (46)
  = ∫₀¹ ||K_x||² dx
  = ∫₀¹ ∫₀¹ |K(x,y)|² dx dy < ∞,

since (ē_n) is also an orthonormal basis and Parseval's identity applies to K_x.
Exercise ‍16 Justify the exchange of summation and integration in ‍(46).

Remark ‍17 The definition in Example ‍14 and Theorem ‍15 also work for any T: L₂[a,b] → L₂[c,d] with a continuous kernel K(x,y) on [c,d]×[a,b].
Definition ‍18 Define the Hilbert–Schmidt norm of a Hilbert–Schmidt operator A by ||A||²_HS = ∑_{n=1}^{∞} ||Ae_n||² (it is independent of the choice of the orthonormal basis (e_n)₁^∞, see Question ‍27).
Exercise* ‍19 Show that the set of Hilbert–Schmidt operators with the above norm is a Hilbert space, and find an expression for the inner product.
Example ‍20 Let K(x,y)=x−y; then

$$(Tf)(x) = \int_0^1 (x-y)f(y)\,dy = x\int_0^1 f(y)\,dy - \int_0^1 y\,f(y)\,dy$$

is a rank 2 operator. Furthermore:

$$\|T\|_{HS}^2 = \int_0^1\!\!\int_0^1 (x-y)^2\,dx\,dy = \int_0^1 \left[\frac{(x-y)^3}{3}\right]_{x=0}^{1} dy = \int_0^1\Big(\frac{(1-y)^3}{3}+\frac{y^3}{3}\Big)\,dy = \left[-\frac{(1-y)^4}{12}+\frac{y^4}{12}\right]_0^1 = \frac{1}{6}.$$

On the other hand there is an orthonormal basis such that

$$Tf = \frac{1}{\sqrt{12}}\langle f,e_1\rangle e_1 - \frac{1}{\sqrt{12}}\langle f,e_2\rangle e_2,$$

so ||T||=1/√12 and ∑_{k=1}^2 ||Tek||²=1/6, and we get ||T|| ≤ ||T||HS in agreement with Corollary ‍13.
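The value ||T||²_HS = 1/6 can be spot-checked by a crude midpoint Riemann sum (purely illustrative numerics, grid size arbitrary):

```python
# Midpoint Riemann sum for ||T||_HS^2 = int_0^1 int_0^1 (x-y)^2 dx dy,
# computed exactly above as 1/6.
N = 400
h = 1.0 / N
total = sum((((i + 0.5) * h - (j + 0.5) * h) ** 2) * h * h
            for i in range(N) for j in range(N))
print(total)  # close to 1/6, up to discretisation error
```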

## 9 Compact normal operators

Recall from Section ‍6.5 that an operator T is normal if TT*=T*T; Hermitian (T*=T) and unitary (T*=T^{−1}) operators are normal.

### 9.1 Spectrum of normal operators

Theorem ‍1 Let T∈B(H) be a normal operator. Then:
1. ker T = ker T*, so ker(T−λI) = ker(T*−λ̄I) for all λ∈ℂ.
2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
3. ||T||=r(T).

Proof.

1. Obviously:

$$x\in\ker T \Leftrightarrow \langle Tx,Tx\rangle=0 \Leftrightarrow \langle T^*Tx,x\rangle=0 \Leftrightarrow \langle TT^*x,x\rangle=0 \Leftrightarrow \langle T^*x,T^*x\rangle=0 \Leftrightarrow x\in\ker T^*.$$

The second part holds because T−λI is normal whenever T is, and (T−λI)* = T*−λ̄I.
2. If Tx=λx and Ty=µy, then by the previous statement T*y = µ̄y. If λ≠µ then the identity

$$\lambda\langle x,y\rangle = \langle Tx,y\rangle = \langle x,T^*y\rangle = \mu\langle x,y\rangle$$

implies ⟨x,y⟩=0.
3. Let S=T*T; then S is Hermitian (check!). Consequently, the inequality

$$\|Sx\|^2 = \langle Sx,Sx\rangle = \langle S^2x,x\rangle \le \|S^2\|\,\|x\|^2$$

implies ||S||² ≤ ||S²||. The opposite inequality follows from Theorem ‍12, thus we have the equality ||S²||=||S||², and more generally by induction ||S^{2^m}|| = ||S||^{2^m} for all m.

Now we claim ||S||=||T||². From Theorems ‍12 and ‍18 we get ||S||=||T*T|| ≤ ||T||². On the other hand, if ||x||=1 then

$$\|T^*T\| \ge |\langle T^*Tx,x\rangle| = \langle Tx,Tx\rangle = \|Tx\|^2$$

implies the opposite inequality ||S|| ≥ ||T||². Only now we use the normality of T to obtain (T^{2^m})* T^{2^m} = (T*T)^{2^m} and get the equality

$$\|T^{2^m}\|^2 = \|(T^*T)^{2^m}\| = \|T^*T\|^{2^m} = \|T\|^{2^{m+1}}.$$

Thus:

$$r(T) = \lim_{m\to\infty} \|T^{2^m}\|^{1/2^m} = \lim_{m\to\infty} \|T\|^{2^{m+1}/2^{m+1}} = \|T\|,$$

by the spectral radius formula ‍(40).

Example ‍2 It is easy to see that normality is important in statement ‍3: the non-normal operator T given by the matrix

$$\begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix}$$

in ℂ² has one-point spectrum {0}, consequently r(T)=0, but ||T||=1.
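The example can be verified directly (a small pure-Python check; the operator norm is estimated by sampling unit vectors (cos a, sin a)):

```python
import math

# The non-normal matrix T = [[0,1],[0,0]] is nilpotent: T^2 = 0, hence
# sigma(T) = {0} and r(T) = 0, while ||T|| = 1.
T = [[0.0, 1.0],
     [0.0, 0.0]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

T2 = matmul(T, T)
print(T2)  # [[0.0, 0.0], [0.0, 0.0]]

# ||T|| as a sup over unit vectors (cos a, sin a): T x = (sin a, 0),
# so ||Tx|| = |sin a| and the supremum over a in [0, 2*pi] is 1.
norm = max(abs(math.sin(0.01 * a)) for a in range(629))
print(norm)  # approximately 1.0
```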
Lemma ‍3 Let T be a compact normal operator. Then:
1. The set of eigenvalues of T is either finite or a countable sequence tending to zero.
2. All the eigenspaces ker(T−λI) with λ≠0 are finite-dimensional.
Remark ‍4 This Lemma is true for any compact operator, but we will not use that in our course.

Proof.

1. Let H0 be the closed linear span of eigenvectors of T. Then T restricted to H0 is a diagonal compact operator with the same set of eigenvalues λn as in H. Then λn→ 0 by Proposition ‍8.
Exercise ‍5 Use the proof of Proposition ‍8 to give a direct demonstration.

Proof.[Solution] Or, straightforwardly, assume the opposite: there exist δ>0 and infinitely many eigenvalues λn such that | λn |>δ. By the previous Theorem there is an orthonormal sequence (vn) of corresponding eigenvectors, T vn=λn vn. Now the sequence (vn) is bounded, but its image T vn=λn vn has no convergent subsequence, because for any k≠l:

$$\|\lambda_k v_k - \lambda_l v_l\| = \big(|\lambda_k|^2 + |\lambda_l|^2\big)^{1/2} \ge \sqrt{2}\,\delta,$$

i.e. (T vn) contains no Cauchy subsequence, see Figure ‍16.

2. Similarly, if H0=ker(T−λI) with λ≠0 is infinite-dimensional, then the restriction of T to H0 is λI, which is non-compact by Proposition ‍8. Alternatively, consider the infinite orthonormal sequence (vn), Tvn=λvn, as in Exercise ‍5.

Lemma ‍6 Let T be a compact normal operator. Then all non-zero points λ∈σ(T) are eigenvalues, and there exists an eigenvalue of modulus ||T||.

Proof. Assume without loss of generality that T≠ 0. Let λ∈σ(T) be non-zero; without loss of generality (multiplying by a scalar) λ=1.

We claim that if 1 is not an eigenvalue then there exists δ>0 such that

$$\|(I-T)x\| \ge \delta\,\|x\|. \qquad(47)$$

Otherwise there exists a sequence of vectors (xn) of unit norm such that (I−T)xn→ 0. Then by the compactness of T, for a subsequence (x_{n_k}) there is y∈H such that Tx_{n_k}→y; then x_{n_k}→y, implying Ty=y and y≠ 0, i.e. y is an eigenvector with eigenvalue 1.

Now we claim that Im(I−T) is closed, i.e. y in the closure of Im(I−T) implies y∈Im(I−T). Indeed, if (I−T)xn→ y, then (xn) is bounded by ‍(47), so there is a subsequence (x_{n_k}) such that Tx_{n_k}→z, implying x_{n_k}→y+z; then (I−T)(z+y)=y by the continuity of I−T.

Finally, I−T is injective, i.e. ker(I−T)={0}, by ‍(47). By property ‍1, ker(I−T*)={0} as well. But because always ker(I−T*) = Im(I−T)^⊥ (by ‍2), the image is dense, and together with its closedness we get surjectivity of I−T: Im(I−T)=H. Thus (I−T)^{−1} exists and is bounded, because ‍(47) implies ||y|| ≥ δ ||(I−T)^{−1}y||. Thus 1∉σ(T).

The existence of eigenvalue λ such that | λ |=||T|| follows from combination of Lemma ‍13 and Theorem ‍3.

### 9.2 Compact normal operators

Theorem ‍7 ‍(The spectral theorem for compact normal operators) Let T be a compact normal operator on a Hilbert space H. Then there exists an orthonormal sequence (en) of eigenvectors of T with corresponding eigenvalues (λn) such that:

$$Tx = \sum_n \lambda_n \langle x,e_n\rangle\, e_n, \qquad \text{for all } x\in H. \qquad(48)$$

If (λn) is an infinite sequence, it tends to zero.

Conversely, if T is given by a formula ‍(48) then it is compact and normal.

Proof. Suppose T≠ 0. Then by the previous Theorem there exists an eigenvalue λ1 such that | λ1 |=||T||, with a corresponding eigenvector e1 of unit norm. Let H1=Lin(e1)^⊥. If x∈H1 then

$$\langle Tx,e_1\rangle = \langle x,T^*e_1\rangle = \langle x,\bar\lambda_1 e_1\rangle = \lambda_1\langle x,e_1\rangle = 0, \qquad(49)$$

thus Tx∈H1 and similarly T*x∈H1. Write T1=T|_{H1}, which is again a compact normal operator with norm not exceeding ||T||. We can inductively repeat this procedure for T1, obtaining a sequence of eigenvalues λ2, λ3, … with eigenvectors e2, e3, …. If Tn=0 for a finite n then the theorem is already proved. Otherwise we have an infinite sequence λn→ 0. Let

$$x = \sum_{k=1}^n \langle x,e_k\rangle e_k + y_n \quad\Rightarrow\quad \|x\|^2 = \sum_{k=1}^n |\langle x,e_k\rangle|^2 + \|y_n\|^2, \qquad y_n\in H_n,$$

from Pythagoras's theorem. Then ||yn|| ≤ ||x|| and ||T yn|| ≤ ||Tn|| ||yn|| ≤ | λn | ||x|| → 0 by Lemma ‍3. Thus

$$Tx = \lim_{n\to\infty}\Big(\sum_{k=1}^n \langle x,e_k\rangle\, Te_k + Ty_n\Big) = \sum_{n=1}^{\infty} \lambda_n\langle x,e_n\rangle\, e_n.$$

Conversely, if Tx = ∑_{n=1}^∞ λn⟨x,en⟩en then

$$\langle Tx,y\rangle = \sum_{n=1}^{\infty} \lambda_n\langle x,e_n\rangle \langle e_n,y\rangle = \sum_{n=1}^{\infty} \langle x,e_n\rangle\, \lambda_n\, \overline{\langle y,e_n\rangle},$$

thus T*y = ∑_{n=1}^∞ λ̄n⟨y,en⟩en. Then we get the normality of T: T*Tx = TT*x = ∑_{n=1}^∞ |λn|²⟨x,en⟩en. Also T is compact because it is a uniform limit of the finite rank operators Tm x = ∑_{n=1}^m λn⟨x,en⟩en.

Corollary ‍8 Let T be a compact normal operator on a separable Hilbert space H. Then there exists an orthonormal basis (gk) such that

$$Tx = \sum_{n=1}^{\infty} \lambda_n \langle x,g_n\rangle\, g_n,$$

where the λn are the eigenvalues of T, including zeros.

Proof. Let (en) be the orthonormal sequence constructed in the proof of the previous Theorem. Then x is perpendicular to all en if and only if it is in the kernel of T. Let (fn) be any orthonormal basis of ker T. Then the union of (en) and (fn) is the orthonormal basis (gn) we were looking for.

Exercise ‍9 Finish all details in the above proof.
Corollary ‍10 ‍(Singular value decomposition) If T is any compact operator on a separable Hilbert space, then there exist orthonormal sequences (ek) and (fk) such that Tx=∑k µk⟨x,ek⟩fk, where (µk) is a sequence of positive numbers such that µk→ 0 if it is an infinite sequence.

Proof. The operator T*T is compact and Hermitian (hence normal). By the previous Corollary there is an orthonormal basis (ek) such that T*Tx = ∑n λn⟨x,en⟩en for the non-negative λn = ||Ten||². Let µn = ||Ten|| and fn = Ten/µn (for µn ≠ 0). Then (fn) is an orthonormal sequence (check!) and

$$Tx = \sum_n \langle x,e_n\rangle\, Te_n = \sum_n \mu_n \langle x,e_n\rangle\, f_n.$$

Corollary ‍11 A bounded operator on a Hilbert space is compact if and only if it is a uniform limit of finite rank operators.

Proof. Sufficiency follows from ‍9.
Necessity: by the previous Corollary Tx = ∑n µn⟨x,en⟩fn, thus T is a uniform limit of the operators Tm x = ∑_{n=1}^m µn⟨x,en⟩fn, which are of finite rank.

## 10 Integral equations

In this lecture we will study the Fredholm equations defined as follows. Let the integral operator with kernel K(x,y) defined on [a,b]×[a,b] be defined as before:

$$(T\varphi)(x) = \int_a^b K(x,y)\,\varphi(y)\,dy. \qquad(50)$$

The Fredholm equations of the first and second kinds are, correspondingly:

$$T\varphi = f \qquad\text{and}\qquad \varphi - \lambda T\varphi = f, \qquad(51)$$

for a function f on [a,b]. A special case is given by the Volterra equation, defined by an integral operator ‍(50) with a kernel satisfying K(x,y)=0 for all y>x, which can be written as:

$$(T\varphi)(x) = \int_a^x K(x,y)\,\varphi(y)\,dy. \qquad(52)$$

We will consider integral operators with kernels K such that ∫_a^b ∫_a^b |K(x,y)|² dx dy < ∞; then by Theorem ‍15, T is a Hilbert–Schmidt operator and in particular bounded.

As a reason to study Fredholm equations we mention that the solution of differential equations of mathematical physics (notably the heat and wave equations) requires a decomposition of a function f as a linear combination of the functions K(x,y) with “coefficients” φ. This is a continuous analog of a discrete decomposition into Fourier series.

Using ideas from the proof of Lemma ‍4, we define the Neumann series for the resolvent:

$$(I-\lambda T)^{-1} = I + \lambda T + \lambda^2 T^2 + \cdots, \qquad(53)$$

which is valid for all |λ| < ||T||^{−1}.

Example ‍1 Solve the Volterra equation

$$\varphi(x) - \lambda\int_0^x y\,\varphi(y)\,dy = x^2, \qquad\text{on } L^2[0,1].$$

In this case (I−λT)φ = f, with f(x)=x² and:

$$K(x,y) = \begin{cases} y, & 0\le y\le x;\\ 0, & x< y\le 1.\end{cases}$$

Straightforward calculation shows:

$$(Tf)(x) = \int_0^x y\cdot y^2\,dy = \frac{x^4}{4}, \qquad (T^2f)(x) = \int_0^x y\,\frac{y^4}{4}\,dy = \frac{x^6}{24}, \quad\ldots$$

and generally by induction:

$$(T^nf)(x) = \int_0^x y\,\frac{y^{2n}}{2^{n-1}\,n!}\,dy = \frac{x^{2n+2}}{2^n\,(n+1)!}.$$

Hence:

$$\varphi(x) = \sum_{n=0}^{\infty} \lambda^n T^n f = \sum_{n=0}^{\infty} \frac{\lambda^n x^{2n+2}}{2^n(n+1)!} = \frac{2}{\lambda}\sum_{n=0}^{\infty} \frac{\lambda^{n+1} x^{2n+2}}{2^{n+1}(n+1)!} = \frac{2}{\lambda}\big(e^{\lambda x^2/2} - 1\big) \qquad\text{for all } \lambda\in\mathbb{C}\setminus\{0\},$$

because in this case r(T)=0, so the series converges for every λ. For Fredholm equations this is not always the case, see Tutorial problem ‍29.
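The Neumann-series solution above can be checked numerically; the sketch below iterates T by the trapezoid rule and compares the partial sum with the closed form (2/λ)(e^{λx²/2}−1), here with λ=1 (grid size and iteration count are arbitrary choices):

```python
import math

# Numerical check of the Neumann series for the Volterra equation
# phi(x) - lam * int_0^x y*phi(y) dy = x^2, lam = 1.
N = 1000
h = 1.0 / N
xs = [i * h for i in range(N + 1)]
lam = 1.0

def T(g):
    """(Tg)(x) = int_0^x y*g(y) dy, by the trapezoid rule on the grid."""
    out = [0.0]
    for i in range(1, N + 1):
        out.append(out[-1] + 0.5 * h * (xs[i-1] * g[i-1] + xs[i] * g[i]))
    return out

f = [x * x for x in xs]
phi = f[:]          # n = 0 term of the series
term = f[:]
for _ in range(30):  # partial sum sum_{n=0}^{30} lam^n T^n f
    term = [lam * v for v in T(term)]
    phi = [p + t for p, t in zip(phi, term)]

exact = [(2.0 / lam) * (math.exp(lam * x * x / 2.0) - 1.0) for x in xs]
err = max(abs(p - e) for p, e in zip(phi, exact))
print(err)  # small: only quadrature and truncation error remain
```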

Among other integral operators there is an important subclass with separable kernel, namely a kernel of the form:

$$K(x,y) = \sum_{j=1}^{n} g_j(x)\,h_j(y). \qquad(54)$$

In such a case:

$$(T\varphi)(x) = \int_a^b \sum_{j=1}^{n} g_j(x)h_j(y)\,\varphi(y)\,dy = \sum_{j=1}^{n} g_j(x)\int_a^b h_j(y)\,\varphi(y)\,dy,$$

i.e. the image of T is spanned by g1(x), …, gn(x) and is finite-dimensional; consequently the solution of such an equation reduces to linear algebra.
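The reduction to linear algebra can be sketched for a hypothetical rank-1 kernel K(x,y) = xy on [0,1] (not one of the examples in the text): writing φ = f + λ c g with c = ∫ h φ, the single coefficient c solves a 1×1 linear system.

```python
# Second-kind equation phi - lam*T*phi = f with separable K(x,y) = g(x)h(y):
# substituting phi = f + lam*c*g into c = int h*phi gives c = b/(1 - lam*m),
# where b = int h*f and m = int h*g.
def integral(func, a=0.0, b=1.0, n=2000):
    h = (b - a) / n
    return sum(0.5 * h * (func(a + i * h) + func(a + (i + 1) * h))
               for i in range(n))

g = lambda x: x       # hypothetical kernel factors: K(x,y) = g(x)*hker(y)
hker = lambda y: y
f = lambda x: x       # right-hand side
lam = 1.0

b = integral(lambda y: hker(y) * f(y))   # = 1/3
m = integral(lambda y: hker(y) * g(y))   # = 1/3
c = b / (1.0 - lam * m)                  # = 1/2
phi = lambda x: f(x) + lam * c * g(x)    # here phi(x) = 1.5*x

# verify: phi(x) - lam * int_0^1 x*y*phi(y) dy should equal f(x)
resid = phi(0.7) - lam * integral(lambda y: 0.7 * y * phi(y)) - f(0.7)
print(abs(resid))  # ~0 up to quadrature error
```

For a rank-n kernel the same substitution yields an n×n system (I − λM)c = b with M_ij = ∫ h_i g_j.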

Example ‍2 Solve the Fredholm equation (actually, find the eigenvectors of T):

$$\varphi(x) = \lambda\int_0^{2\pi} \cos(x+y)\,\varphi(y)\,dy = \lambda\int_0^{2\pi} (\cos x\cos y - \sin x\sin y)\,\varphi(y)\,dy.$$

Clearly φ(x) should be a linear combination φ(x)=A cos x+B sin x, with the coefficients A and B satisfying:

$$A = \lambda\int_0^{2\pi} \cos y\,(A\cos y + B\sin y)\,dy, \qquad B = -\lambda\int_0^{2\pi} \sin y\,(A\cos y + B\sin y)\,dy.$$

Basic calculus implies A=λπA and B=−λπB, and the only nonzero solutions are:

λ=π^{−1} with A ≠ 0, B = 0;  λ=−π^{−1} with A = 0, B ≠ 0.

We develop some Hilbert–Schmidt theory for integral operators.

Theorem ‍3 Suppose that K(x,y) is a continuous function on [a,b]×[a,b] with K(x,y) = \overline{K(y,x)}, and the operator T is defined by ‍(50). Then:
1. T is a self-adjoint Hilbert–Schmidt operator.
2. All eigenvalues of T are real and satisfy ∑n λn² < ∞.
3. The eigenvectors vn of T can be chosen as an orthonormal basis of L2[a,b], are continuous for nonzero λn, and

$$T\varphi = \sum_{n=1}^{\infty} \lambda_n \langle\varphi,v_n\rangle v_n \qquad\text{where}\qquad \varphi = \sum_{n=1}^{\infty} \langle\varphi,v_n\rangle v_n.$$

Proof.

1. The condition K(x,y) = \overline{K(y,x)} implies the Hermitian property of T:

$$\langle T\varphi,\psi\rangle = \int_a^b\Big(\int_a^b K(x,y)\varphi(y)\,dy\Big)\overline{\psi(x)}\,dx = \int_a^b\!\!\int_a^b K(x,y)\,\varphi(y)\,\overline{\psi(x)}\,dx\,dy = \int_a^b \varphi(y)\,\overline{\Big(\int_a^b K(y,x)\psi(x)\,dx\Big)}\,dy = \langle\varphi,T\psi\rangle.$$

The Hilbert–Schmidt property (and hence compactness) was proved in Theorem ‍15.
2. The spectrum of T is real, as for any Hermitian operator, see Theorem ‍2, and the finiteness of ∑n λn² follows from the Hilbert–Schmidt property.
3. The existence of an orthonormal basis consisting of eigenvectors (vn) of T was proved in Corollary ‍8. If λn ≠ 0 then:

$$v_n(x_1) - v_n(x_2) = \lambda_n^{-1}\big((Tv_n)(x_1) - (Tv_n)(x_2)\big) = \frac{1}{\lambda_n}\int_a^b \big(K(x_1,y) - K(x_2,y)\big)\,v_n(y)\,dy,$$

and by the Cauchy–Schwarz–Bunyakovskii inequality:

$$|v_n(x_1) - v_n(x_2)| \le \frac{1}{|\lambda_n|}\,\|v_n\|\,\Big(\int_a^b |K(x_1,y) - K(x_2,y)|^2\,dy\Big)^{1/2},$$

which tends to 0 as x1 → x2 due to the (uniform) continuity of K(x,y).

Theorem ‍4 Let T be as in the previous Theorem. If λ≠ 0 and λ^{−1}∉σ(T), the unique solution φ of the Fredholm equation of the second kind φ−λTφ=f is

$$\varphi = \sum_{n=1}^{\infty} \frac{\langle f,v_n\rangle}{1-\lambda\lambda_n}\,v_n. \qquad(55)$$

Proof. Let φ = ∑_{n=1}^∞ an vn, where an=⟨φ,vn⟩; then

$$\varphi - \lambda T\varphi = \sum_{n=1}^{\infty} a_n(1-\lambda\lambda_n)\,v_n = f = \sum_{n=1}^{\infty} \langle f,v_n\rangle v_n$$

if and only if an = ⟨f,vn⟩/(1−λλn) for all n. Note that 1−λλn ≠ 0 since λ^{−1}∉σ(T).

Because λn→ 0, the factors (1−λλn)^{−1} are bounded, so ∑_1^∞ |an|² < ∞ by comparison with ∑_1^∞ |⟨f,vn⟩|² = ||f||²; thus the solution exists and is unique by the Riesz–Fischer Theorem.

See Exercise ‍30 for an example.

Theorem ‍5 ‍(Fredholm alternative) Let T∈K(H) be compact and normal, and let λ∈ℂ∖{0}. Consider the equations:

$$\varphi - \lambda T\varphi = 0 \qquad(56)$$
$$\varphi - \lambda T\varphi = f. \qquad(57)$$

Then either
1. the only solution to ‍(56) is φ=0, and ‍(57) has a unique solution for any f∈H; or
2. there exists a nonzero solution to ‍(56), and ‍(57) can be solved if and only if f is orthogonal to all solutions of ‍(56).

Proof.

1. If φ=0 is the only solution of ‍(56), then λ^{−1} is not an eigenvalue of T and then, by Lemma ‍6, is not in the spectrum of T either. Thus I−λT is invertible and the unique solution of ‍(57) is given by φ=(I−λT)^{−1}f.
2. A nonzero solution to ‍(56) means that λ^{−1}∈σ(T). Let (vn) be an orthonormal basis of eigenvectors of T with eigenvalues (λn). By Lemma ‍2 only a finite number of the λn are equal to λ^{−1}, say λ1, …, λN; then

$$(I-\lambda T)\varphi = \sum_{n=1}^{\infty} (1-\lambda\lambda_n)\langle\varphi,v_n\rangle v_n = \sum_{n=N+1}^{\infty} (1-\lambda\lambda_n)\langle\varphi,v_n\rangle v_n.$$

If f=∑_1^∞ ⟨f,vn⟩vn, then the identity (I−λT)φ=f is only possible if ⟨f,vn⟩=0 for 1≤ n≤ N. Conversely, from that condition we can give a solution

$$\varphi = \sum_{n=N+1}^{\infty} \frac{\langle f,v_n\rangle}{1-\lambda\lambda_n}\,v_n + \varphi_0, \qquad\text{for any } \varphi_0\in \mathrm{Lin}(v_1,\ldots,v_N),$$

which is again in H because f∈H and λn→ 0.

Example ‍6 Let us consider

$$(T\varphi)(x) = \int_0^1 (2xy - x - y + 1)\,\varphi(y)\,dy.$$

Because the kernel of T is real and symmetric, T=T*; the kernel is also separable:

$$(T\varphi)(x) = x\int_0^1 (2y-1)\varphi(y)\,dy + \int_0^1 (-y+1)\varphi(y)\,dy,$$

so T has rank 2 with image spanned by 1 and x. By direct calculation:

$$T:\ 1 \mapsto \tfrac12, \qquad T:\ x \mapsto \tfrac16 x + \tfrac16,$$

i.e. on the span of {1, x} the operator T is given by the matrix

$$\begin{pmatrix} \tfrac12 & \tfrac16\\[2pt] 0 & \tfrac16 \end{pmatrix}.$$

According to linear algebra, the decomposition over eigenvectors is:

$$\lambda_1 = \tfrac12 \text{ with vector } \begin{pmatrix}1\\0\end{pmatrix}, \qquad \lambda_2 = \tfrac16 \text{ with vector } \begin{pmatrix}-\tfrac12\\1\end{pmatrix},$$

with normalisation v1(y)=1, v2(y)=√12 (y−1/2), and we complete this pair to an orthonormal basis (vn) of L2[0,1]. Then:
• If λ≠ 2 and λ≠ 6 then (I−λT)φ = f has a unique solution (cf. equation ‍(55)):

$$\varphi = \sum_{n=1}^{2} \frac{\langle f,v_n\rangle}{1-\lambda\lambda_n}\,v_n + \sum_{n=3}^{\infty} \langle f,v_n\rangle v_n = \sum_{n=1}^{2} \frac{\langle f,v_n\rangle}{1-\lambda\lambda_n}\,v_n + \Big(f - \sum_{n=1}^{2}\langle f,v_n\rangle v_n\Big) = f + \sum_{n=1}^{2} \frac{\lambda\lambda_n}{1-\lambda\lambda_n}\langle f,v_n\rangle v_n.$$

• If λ=2 then solutions exist provided ⟨f,v1⟩=0 and are:

$$\varphi = f + \frac{\lambda\lambda_2}{1-\lambda\lambda_2}\langle f,v_2\rangle v_2 + Cv_1 = f + \tfrac12\langle f,v_2\rangle v_2 + Cv_1, \qquad C\in\mathbb{C}.$$

• If λ=6 then solutions exist provided ⟨f,v2⟩=0 and are:

$$\varphi = f + \frac{\lambda\lambda_1}{1-\lambda\lambda_1}\langle f,v_1\rangle v_1 + Cv_2 = f - \tfrac32\langle f,v_1\rangle v_1 + Cv_2, \qquad C\in\mathbb{C}.$$
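The eigenvalue computation for the 2×2 matrix in this example can be checked exactly with rational arithmetic:

```python
from fractions import Fraction as F

# Matrix of T on the basis {1, x}: T(1) = 1/2, T(x) = x/6 + 1/6.
M = [[F(1, 2), F(1, 6)],
     [F(0), F(1, 6)]]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

v1 = [F(1), F(0)]        # claimed eigenvector for lambda_1 = 1/2
v2 = [F(-1, 2), F(1)]    # claimed eigenvector for lambda_2 = 1/6
print(apply(M, v1))  # [Fraction(1, 2), Fraction(0, 1)] = (1/2) * v1
print(apply(M, v2))  # [Fraction(-1, 12), Fraction(1, 6)] = (1/6) * v2
```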

## 11 Banach and Normed Spaces

We will work with either the field of real numbers ℝ or the complex numbers ℂ. To avoid repetition, we use K to denote either ℝ or ℂ.

### 11.1 Normed spaces

Recall, see Defn. ‍3, a norm on a vector space V is a map ||·||:V→[0,∞) such that

1. ||u||=0 only when u=0;
2. ||λ u|| = | λ | ||u|| for λ∈K and uV;
3. ||u+v|| ≤ ||u|| + ||v|| for u,vV.

Note, that the second and third conditions imply that linear operations—multiplication by a scalar and addition of vectors respectively—are continuous in the topology defined by the norm.

A norm induces a metric, see Defn. ‍1, on V by setting d(u,v)=||uv||. When V is complete, see Defn. ‍6, for this metric, we say that V is a Banach space.

Theorem ‍1 Every finite-dimensional normed vector space is a Banach space.

We will use the following simple inequality:

Lemma ‍2 ‍(Young’s inequality) Let two real numbers 1<p,q<∞ be related through 1/p+1/q=1. Then

$$|ab| \le \frac{|a|^p}{p} + \frac{|b|^q}{q}, \qquad(58)$$

for any complex a and b.

Proof.[First proof: analytic] Obviously, it is enough to prove the inequality for positive reals a=| a | and b=| b |. If p>1 then 0<1/p<1. Consider the function φ(t)=t^m−mt for some 0<m<1. From its derivative φ′(t)=m(t^{m−1}−1) we find the only critical point t=1 on [0,∞), which is a maximum since 0<m<1. Thus, writing the inequality φ(t)≤ φ(1) for t=a^p/b^q and m=1/p, after a transformation we get a·b^{−q/p}−1 ≤ (1/p)(a^p b^{−q}−1), and multiplication by b^q with rearrangement (note b^{q−q/p}=b) leads to the desired result.

Proof.[Second proof: geometric] Consider the plane with coordinates (x,y) and take the curve y=x^{p−1}, which is the same as x=y^{q−1}. Comparing areas on the figure,

we see that S1+S2 ≥ ab for any positive reals a and b. Elementary integration shows:

$$S_1 = \int_0^a x^{p-1}\,dx = \frac{a^p}{p}, \qquad S_2 = \int_0^b y^{q-1}\,dy = \frac{b^q}{q}.$$

This finishes the demonstration.
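A quick numerical spot-check of ‍(58) for one pair of conjugate exponents (the choice p = 3 is arbitrary):

```python
import random

# Spot-check Young's inequality |ab| <= |a|^p/p + |b|^q/q
# for conjugate exponents 1/p + 1/q = 1.
random.seed(0)
p = 3.0
q = p / (p - 1.0)  # conjugate exponent, here 1.5
ok = all(
    abs(a * b) <= abs(a) ** p / p + abs(b) ** q / q + 1e-12
    for a, b in ((random.uniform(-5, 5), random.uniform(-5, 5))
                 for _ in range(10000))
)
print(ok)  # True
```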

Remark ‍3 You may notice, that the both proofs introduced some specific auxiliary functions related to xp/p. It is a fruitful generalisation to conduct the proofs for more functions and derive respective forms of Young’s inequality.
Proposition ‍4 ‍(Hölder’s Inequality) For 1<p<∞, let q∈(1,∞) be such that 1/p + 1/q = 1. For n≥1 and u,v∈K^n, we have that

$$\sum_{j=1}^{n} |u_jv_j| \le \Big(\sum_{j=1}^{n} |u_j|^p\Big)^{1/p}\Big(\sum_{j=1}^{n} |v_j|^q\Big)^{1/q}.$$

Proof. For reasons that will become clear soon, we use the notation ||u||p = (∑_{j=1}^n |uj|^p)^{1/p} and ||v||q = (∑_{j=1}^n |vj|^q)^{1/q}, and define, for 1≤ i≤ n:

$$a_i = \frac{u_i}{\|u\|_p} \qquad\text{and}\qquad b_i = \frac{v_i}{\|v\|_q}.$$

Summing up over 1≤ i≤ n all the inequalities obtained from ‍(58):

$$|a_ib_i| \le \frac{|a_i|^p}{p} + \frac{|b_i|^q}{q},$$

we get ∑i |ai bi| ≤ 1/p + 1/q = 1, which is the result after multiplying through by ||u||p ||v||q.

Using Hölder inequality we can derive the following one:

Proposition ‍5 ‍(Minkowski’s Inequality) For 1<p<∞ and n≥ 1, let u,v∈K^n. Then

$$\Big(\sum_{j=1}^{n} |u_j+v_j|^p\Big)^{1/p} \le \Big(\sum_{j=1}^{n} |u_j|^p\Big)^{1/p} + \Big(\sum_{j=1}^{n} |v_j|^p\Big)^{1/p}.$$

Proof. For p>1, since |uk+vk| ≤ |uk|+|vk|, we have:

$$\sum_{k=1}^{n} |u_k+v_k|^p \le \sum_{k=1}^{n} |u_k|\,|u_k+v_k|^{p-1} + \sum_{k=1}^{n} |v_k|\,|u_k+v_k|^{p-1}. \qquad(59)$$

By Hölder’s inequality:

$$\sum_{k=1}^{n} |u_k|\,|u_k+v_k|^{p-1} \le \Big(\sum_{k=1}^{n} |u_k|^p\Big)^{1/p}\Big(\sum_{k=1}^{n} |u_k+v_k|^{q(p-1)}\Big)^{1/q}.$$

Adding the similar inequality for the second term on the right-hand side of ‍(59) and dividing by (∑_1^n |uk+vk|^{q(p−1)})^{1/q} (note that q(p−1)=p) yields the result.
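Both inequalities are easy to spot-check numerically on random vectors (the choice p = 3 is arbitrary):

```python
import random

# Spot-check Hölder's and Minkowski's inequalities in R^n for p = 3.
random.seed(1)
p = 3.0
q = p / (p - 1.0)
n = 50
u = [random.uniform(-1, 1) for _ in range(n)]
v = [random.uniform(-1, 1) for _ in range(n)]

def norm(w, r):
    return sum(abs(x) ** r for x in w) ** (1.0 / r)

holder = sum(abs(a * b) for a, b in zip(u, v)) <= norm(u, p) * norm(v, q) + 1e-12
minkowski = norm([a + b for a, b in zip(u, v)], p) <= norm(u, p) + norm(v, p) + 1e-12
print(holder, minkowski)  # True True
```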

Minkowski’s inequality shows that for 1≤ p<∞ (the case p=1 is easy) we can define a norm ||·||p on K^n by

$$\|u\|_p = \Big(\sum_{j=1}^{n} |u_j|^p\Big)^{1/p} \qquad \big(u=(u_1,\ldots,u_n)\in K^n\big).$$

See Figure ‍2 for an illustration of various norms of this type defined on ℝ².

We can define an infinite analogue of this. For 1≤ p<∞, let lp be the space of all scalar sequences (xn) with ∑n | xn |^p < ∞. A careful use of Minkowski’s inequality shows that lp is a vector space; then lp becomes a normed space for the ||·||p norm. Note also that l2 is the Hilbert space introduced before in Example ‍2.

Recall that a Cauchy sequence, see Defn. ‍5, in a normed space is bounded: if (xn) is Cauchy then we can find N with ||xnxm||<1 for all n,mN. Then ||xn|| ≤ ||xnxN|| + ||xN|| < ||xN||+1 for nN, so in particular, ||xn|| ≤ max( ||x1||,||x2||,⋯,||xN−1||,||xN||+1).

Theorem ‍6 For 1≤ p<∞, the space lp is a Banach space.
Remark ‍7 Most completeness proofs (in particular, all completeness proofs in this course) are similar to the next one; see also Thm. ‍24. The general scheme of those proofs has three steps:
1. For a general Cauchy sequence, build a “limit” in some point-wise sense.
2. At this stage it is not clear whether the constructed “limit” belongs to our space at all; that is shown in the second step.
3. It does not follow from the construction that the “limit” is really the limit in the topology of our space; that is the third step of the proof.

Proof. We repeat the proof of Thm. ‍24 changing 2 to p. Let (x(n)) be a Cauchy-sequence in lp; we wish to show this converges to some vector in lp.

For each n, x(n)lp so is a sequence of scalars, say (xk(n))k=1. As (x(n)) is Cauchy, for each є>0 there exists Nє so that ||x(n)x(m)||p ≤ є for n,mNє.

For k fixed,

$$|x_k^{(n)} - x_k^{(m)}| \le \Big(\sum_j |x_j^{(n)} - x_j^{(m)}|^p\Big)^{1/p} = \|x^{(n)} - x^{(m)}\|_p \le \epsilon,$$

when n,m≥Nє. Thus the scalar sequence (xk^{(n)})_{n=1}^∞ is Cauchy in K and hence converges, to xk say. Let x=(xk), so that x is a candidate for the limit of (x^{(n)}).

Firstly, we check that x−x^{(n)}∈lp for sufficiently large n. Indeed, for a given є>0 find n0 such that ||x^{(n)}−x^{(m)}||<є for all n,m>n0. For any K and any n,m>n0:

$$\sum_{k=1}^{K} |x_k^{(n)} - x_k^{(m)}|^p \le \|x^{(n)} - x^{(m)}\|_p^p \le \epsilon^p.$$

Let m→ ∞; then ∑_{k=1}^K |xk^{(n)}−xk|^p ≤ є^p.
Let K→ ∞; then ∑_{k=1}^∞ |xk^{(n)}−xk|^p ≤ є^p. Thus x^{(n)}−x∈lp, and because lp is a linear space, x = x^{(n)}−(x^{(n)}−x) is also in lp.

Finally, we saw above that for any є >0 there is n0 such that ||x(n)x||<є for all n>n0. Thus x(n)x.

For p=∞, there are two analogues of the lp spaces. First, we define l^∞ to be the vector space of all bounded scalar sequences, with the sup-norm (||·||∞-norm):

$$\|(x_n)\|_\infty = \sup_{n\in\mathbb{N}} |x_n| \qquad \big((x_n)\in l^\infty\big). \qquad(60)$$

Second, we define c0 to be the space of all scalar sequences (xn) which converge to 0. We equip c0 with the sup norm ‍(60); this is well-defined, as if xn→0 then (xn) is bounded. Hence c0 is a subspace of l^∞, and we can check (exercise!) that c0 is closed.

Theorem ‍8 The spaces c0 and l are Banach spaces.

Proof. This is another variant of the previous proof of Thm. ‍6. We do the l case. Again, let (x(n)) be a Cauchy sequence in l, and for each n, let x(n)=(xk(n))k=1. For є>0 we can find N such that ||x(n)x(m)|| < є for n,mN. Thus, for any k, we see that | xk(n)xk(m) | < є when n,mN. So (xk(n))n=1 is Cauchy, and hence converges, say to xk∈K. Let x=(xk).

Let m≥ N, so that for any k we have

$$|x_k - x_k^{(m)}| = \lim_{n\to\infty} |x_k^{(n)} - x_k^{(m)}| \le \epsilon.$$

As k was arbitrary, we see that supk | xkxk(m) | ≤ є. So, firstly, this shows that (xx(m))∈l, and so also x = (xx(m)) + x(m)l. Secondly, we have shown that ||xx(m)|| ≤ є when mN, so x(m)x in norm.

Example ‍9 We can also consider the Banach space of functions Lp[a,b] with the norm

$$\|f\|_p = \Big(\int_a^b |f(t)|^p\,dt\Big)^{1/p}.$$

See the discussion after Defn. ‍22 for a realisation of such spaces.

### 11.2 Bounded linear operators

Recall what a linear map is, see Defn. ‍1. A linear map is often called an operator. A linear map T:EF between normed spaces is bounded if there exists M>0 such that ||T(x)|| ≤ M ||x|| for xE, see Defn. ‍3. We write B(E,F) for the set of operators from E to F. For the natural operations, B(E,F) is a vector space. We norm B(E,F) by setting

$$\|T\| = \sup\Big\{\frac{\|T(x)\|}{\|x\|} : x\in E,\ x\neq 0\Big\}. \qquad(61)$$

Exercise ‍10 Show that
1. The expression ‍(61) is a norm in the sense of Defn. ‍3.
2. We equivalently have

$$\|T\| = \sup\{\|T(x)\| : x\in E,\ \|x\|\le 1\} = \sup\{\|T(x)\| : x\in E,\ \|x\|=1\}.$$
Proposition ‍11 For a linear map T:EF between normed spaces, the following are equivalent:
1. T is continuous (for the metrics induced by the norms on E and F);
2. T is continuous at 0;
3. T is bounded.

Proof. The proof essentially follows that of the similar Theorem ‍4; see also the discussion there about the usefulness of this theorem.

Theorem ‍12 Let E be a normed space, and let F be a Banach space. Then B(E,F) is a Banach space.

Proof. In the essence, we follows the same three-step procedure as in Thms. ‍24, 6 and 8. Let (Tn) be a Cauchy sequence in B(E,F). For xE, check that (Tn(x)) is Cauchy in F, and hence converges to, say, T(x), as F is complete. Then check that T:EF is linear, bounded, and that ||TnT||→ 0.

We write B(E) for B(E,E). For normed spaces E, F and G, and for T∈B(E,F) and S∈B(F,G), we have that ST = S∘T ∈ B(E,G) with ||ST|| ≤ ||S|| ||T||.

For TB(E,F), if there exists SB(F,E) with ST=IE, the identity of E, and TS=IF, then T is said to be invertible, and write T=S−1. In this case, we say that E and F are isomorphic spaces, and that T is an isomorphism.

If ||T(x)||=||x|| for each xE, we say that T is an isometry. If additionally T is an isomorphism, then T is an isometric isomorphism, and we say that E and F are isometrically isomorphic.

### 11.3 Dual Spaces

Let E be a normed vector space, and let E* (also written E′) be B(E,K), the space of bounded linear maps from E to K, which we call functionals, or more correctly, bounded linear functionals, see Defn. ‍1. Notice that as K is complete, the above theorem shows that E* is always a Banach space.

Theorem ‍13 Let 1<p<∞, and again let q be such that 1/p+1/q=1. Then the map lq→(lp)*: u↦φu is an isometric isomorphism, where φu is defined, for u=(uj)∈lq, by

$$\varphi_u(x) = \sum_{j=1}^{\infty} u_jx_j \qquad \big(x=(x_j)\in l^p\big).$$

Proof. By Hölder’s inequality, we see that

$$|\varphi_u(x)| \le \sum_{j=1}^{\infty} |u_j|\,|x_j| \le \Big(\sum_{j=1}^{\infty} |u_j|^q\Big)^{1/q}\Big(\sum_{j=1}^{\infty} |x_j|^p\Big)^{1/p} = \|u\|_q\,\|x\|_p.$$

So the sum converges, and hence φu is defined. Clearly φu is linear, and the above estimate also shows that ||φu|| ≤ ||u||q. The map u↦ φu is also clearly linear, and we’ve just shown that it is norm-decreasing.

Now let φ∈(lp)*. For each n, let en = (0,⋯,0,1,0,⋯) with the 1 in the nth position. Then, for x=(xn)∈lp,

$$\Big\|x - \sum_{k=1}^{n} x_ke_k\Big\|_p = \Big(\sum_{k=n+1}^{\infty} |x_k|^p\Big)^{1/p} \to 0,$$

as n→∞. As φ is continuous, we see that

$$\varphi(x) = \lim_{n\to\infty}\sum_{k=1}^{n} \varphi(x_ke_k) = \sum_{k=1}^{\infty} x_k\,\varphi(e_k).$$

Let uk=φ(ek) for each k. If we can show that u=(uk)∈lq, then we would have that φ=φu.

Let us fix N∈ℕ, and define

$$x_k = \begin{cases} 0, & \text{if } u_k=0 \text{ or } k>N;\\ \overline{u_k}\,|u_k|^{q-2}, & \text{if } u_k\neq 0 \text{ and } k\le N.\end{cases}$$

Then we see that

$$\sum_{k=1}^{\infty} |x_k|^p = \sum_{k=1}^{N} |u_k|^{p(q-1)} = \sum_{k=1}^{N} |u_k|^q,$$

as p(q−1) = q. Then, by the previous paragraph,

$$\varphi(x) = \sum_{k=1}^{\infty} x_ku_k = \sum_{k=1}^{N} |u_k|^q.$$

Hence

$$\|\varphi\| \ge \frac{|\varphi(x)|}{\|x\|_p} = \Big(\sum_{k=1}^{N} |u_k|^q\Big)^{1-1/p} = \Big(\sum_{k=1}^{N} |u_k|^q\Big)^{1/q}.$$

By letting N→∞, it follows that ulq with ||u||q ≤ ||φ||. So φ=φu and ||φ|| = ||φu|| ≤ ||u||q. Hence every element of (lp)* arises as φu for some u, and also ||φu|| = ||u||q.

Loosely speaking, we say that lq = (lp)*, although we should always be careful to keep in mind the exact map which gives this.
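The norming vector used in the proof can be illustrated for a finite real sequence (a sketch with arbitrarily chosen u, p, q):

```python
# For real u with x_k = u_k * |u_k|^(q-2), one gets
# phi_u(x) / ||x||_p = ||u||_q, since x_k*u_k = |u_k|^q and p(q-1) = q.
p = 1.5
q = 3.0  # conjugate: 1/1.5 + 1/3 = 1
u = [2.0, -1.0, 0.5, 0.0]

x = [uk * abs(uk) ** (q - 2) if uk != 0 else 0.0 for uk in u]
phi = sum(xk * uk for xk, uk in zip(x, u))                 # = sum |u_k|^q
norm_x_p = sum(abs(xk) ** p for xk in x) ** (1.0 / p)
norm_u_q = sum(abs(uk) ** q for uk in u) ** (1.0 / q)
print(abs(phi / norm_x_p - norm_u_q) < 1e-9)  # True
```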

Corollary ‍14 ‍(Riesz–Fréchet self-duality, Lemma ‍11) l2 is self-dual: (l2)* = l2.

Similarly, we can show that (c0)* = l1 and that (l1)* = l^∞ (the implementing isometric isomorphism is given by the same summation formula).

### 11.4 Hahn–Banach Theorem

Mathematical induction is a well-known method to prove statements depending on a natural number. It is based on the following property of the natural numbers: any non-empty subset of ℕ has a least element. This observation can be generalised to the transfinite induction described as follows.

A poset is a set X with a relation ≼ such that aa for all aX, if ab and ba then a=b, and if ab and bc, then ac. We say that (X,≼) is total if for every a,bX, either ab or ba. For a subset SX, an element aX is an upper bound for S if sa for every sS. An element aX is maximal if whenever bX is such that ab, then also ba.

Then Zorn’s Lemma tells us that if X is a non-empty poset such that every total subset has an upper bound, then X has a maximal element. This is really an axiom which we have to assume, in addition to the usual axioms of set theory. Zorn’s Lemma is equivalent to the axiom of choice and to Zermelo’s theorem.

Theorem ‍15 ‍(Hahn–Banach Theorem) Let E be a normed vector space, and let FE be a subspace. Let φ∈ F*. Then there exists ψ∈ E* with ||ψ||≤||φ|| and ψ(x)=φ(x) for each xF.

Proof. We do the real case. An “extension” of φ is a bounded linear map φG:G→ℝ such that FGE, φG(x)=φ(x) for xF, and ||φG||≤||φ||. We introduce a partial order on the pairs (G, φG) of subspaces and functionals as follows: (G1, φG1)≼ (G2, φG2) if and only if G1G2 and φG1(x)=φG2(x) for all xG1. A Zorn’s Lemma argument shows that a maximal extension φG:G→ℝ exists. We shall show that if GE, then we can extend φG, a contradiction.

Let x∉G, so an extension φ1 of φG to the linear span of G and x must have the form

$$\varphi_1(x' + ax) = \varphi_G(x') + a\alpha \qquad (x'\in G,\ a\in\mathbb{R}),$$

for some α∈ℝ. Under this definition φ1 is linear and extends φG, but we also need to ensure that ||φ1||≤||φ||. That is, we need

$$|\varphi_G(x') + a\alpha| \le \|\varphi\|\,\|x'+ax\| \qquad (x'\in G,\ a\in\mathbb{R}). \qquad(62)$$

This is straightforward for a=0; otherwise, to simplify the proof, put x′=−ay in ‍(62) and divide both sides by | a |. Thus we need to show that there exists α such that

$$|\alpha - \varphi_G(y)| \le \|\varphi\|\,\|x-y\| \qquad\text{for all } y\in G,$$

or

$$\varphi_G(y) - \|\varphi\|\,\|x-y\| \le \alpha \le \varphi_G(y) + \|\varphi\|\,\|x-y\|.$$

For any y1 and y2 in G we have:

$$\varphi_G(y_1) - \varphi_G(y_2) \le \|\varphi\|\,\|y_1-y_2\| \le \|\varphi\|\,\big(\|x-y_2\| + \|x-y_1\|\big).$$

Thus

$$\varphi_G(y_1) - \|\varphi\|\,\|x-y_1\| \le \varphi_G(y_2) + \|\varphi\|\,\|x-y_2\|.$$

As y1 and y2 were arbitrary,

$$\sup_{y\in G}\big(\varphi_G(y) - \|\varphi\|\,\|x-y\|\big) \le \inf_{y\in G}\big(\varphi_G(y) + \|\varphi\|\,\|x-y\|\big).$$

Hence we can choose α between the inf and the sup.

The complex case follows by “complexification”.

The Hahn-Banach theorem tells us that a functional from a subspace can be extended to the whole space without increasing the norm. In particular, extending a functional on a one-dimensional subspace yields the following.

Corollary ‍16 Let E be a normed vector space, and let xE. Then there exists φ∈ E* with ||φ||=1 and φ(x)=||x||.

Another useful result which can be proved by Hahn-Banach is the following.

Corollary ‍17 Let E be a normed vector space, and let F be a subspace of E. For xE, the following are equivalent:
1. xF the closure of F;
2. for each φ∈ E* with φ(y)=0 for each yF, we have that φ(x)=0.

Proof. 1⇒2 follows because we can find a sequence (yn) in F with yn→x; then it is immediate that φ(x)=0, because φ is continuous. Conversely, we show that if ‍1 does not hold then ‍2 does not hold (that is, the contrapositive to ‍2⇒1).

So, suppose x∉F̄. Define ψ: Lin(F∪{x})→K by

$$\psi(y + tx) = t \qquad (y\in F,\ t\in K).$$

This is well-defined: for y, y′∈ F, if y+tx=y′+t′x then either t=t′, or otherwise x = (t−t′)^{−1}(y′−y) ∈ F, which is a contradiction. The map ψ is obviously linear, so we need to show that it is bounded. Towards a contradiction, suppose that ψ is not bounded, so we can find a sequence (yn+tnx) with ||yn+tnx||≤1 for each n and yet | ψ(yn+tnx) |=| tn |→∞. Then || tn^{−1} yn + x || ≤ 1/| tn | → 0, so that the sequence (−tn^{−1}yn), which is in F, converges to x. So x is in the closure of F, a contradiction; hence ψ is bounded. By the Hahn–Banach theorem, we can find some φ∈ E* extending ψ. For y∈ F, we have φ(y)=ψ(y)=0, while φ(x)=ψ(x)=1, so ‍2 does not hold, as required.

We define E** = (E*)* to be the bidual of E, and define J:EE** as follows. For xE, J(x) should be in E**, that is, a map E*→K. We define this to be the map φ↦φ(x) for φ∈ E*. We write this as

 J(x)(φ) = φ(x)    (x∈ E, φ∈ E*).

The Corollary ‍16 shows that J is an isometry; when J is surjective (that is, when J is an isomorphism), we say that E is reflexive. For example, lp is reflexive for 1<p<∞. On the other hand c0 is not reflexive.

### 11.5 C(X) Spaces

This section is not examinable. Standard facts about topology will be used in later sections of the course.

All our topological spaces are assumed Hausdorff. Let X be a compact space, and let CK(X) be the space of continuous functions from X to K, with pointwise operations, so that CK(X) is a vector space. We norm CK(X) by setting

$$\|f\|_\infty = \sup_{x\in X} |f(x)| \qquad (f\in C_K(X)).$$
Theorem ‍18 Let X be a compact space. Then CK(X) is a Banach space.

Let E be a vector space, and let ||·||(1) and ||·||(2) be norms on E. These norms are equivalent if there exists m>0 with

 m−1 ⎪⎪ ⎪⎪ x ⎪⎪ ⎪⎪ (2) ≤ ⎪⎪ ⎪⎪ x ⎪⎪ ⎪⎪ (1) ≤ m ⎪⎪ ⎪⎪ x ⎪⎪ ⎪⎪ (2)    (x∈ E).
Theorem ‍19 Let E be a finite-dimensional vector space with basis {e1,…,en}, so we can identify E with Kn as vector spaces, and hence talk about the norm ||·||2 on E. If ||·|| is any norm on E, then ||·|| and ||·||2 are equivalent.
Corollary ‍20 Let E be a finite-dimensional normed space. Then a subset XE is compact if and only if it is closed and bounded.
Lemma ‍21 Let E be a normed vector space, and let F be a closed subspace of E with EF. For 0<θ<1, we can find x0E with ||x0||≤1 and ||x0y||>θ for yF.
Theorem ‍22 Let E be an infinite-dimensional normed vector space. Then the closed unit ball of E, the set {xE : ||x||≤ 1}, is not compact.

Proof. Use the above lemma to construct a sequence (xn) in the closed unit ball of E with, say, ||xnxm||≥1/2 for each nm. Then (xn) can have no convergent subsequence, and so the closed unit ball cannot be compact.

## 12 Measure Theory

The presentation in this section is close to ‍[, , ].

### 12.1 Basic Measure Theory

Definition ‍1 Let X be a set. A σ-algebra R on X is a collection of subsets of X, written R⊆ 2^X, such that
1. X∈R;
2. if A,B∈R, then A∖B∈R;
3. if (An) is any sequence in R, then ∪n An∈R.

Note that in the third condition we admit arbitrary countable unions; the usage of “σ” in the names σ-algebra and σ-ring is a reference to this. If we replace the third condition by

3′. if (An)_{n=1}^m is any finite family in R, then ∪_{n=1}^m An∈R;

then we obtain the definition of an algebra.

For a σ-algebra R and A,B∈R, we have

 A ⋂ B = X∖ ( X∖(A⋂ B) ) = X ∖ ( (X∖ A)⋃(X∖ B) ) ∈ R.

Similarly, R is closed under taking (countably) infinite intersections.

If we drop the first condition from the definition of a (σ-)algebra (but keep the above conclusion from it!) we get a (σ-)ring, that is, a (σ-)ring is closed under (countable) unions, (countable) intersections and subtractions of sets.

Exercise ‍2 Show that the empty set belongs to any non-empty ring.

Sets Ak are pairwise disjoint if An⋂Am=∅ for n≠m. We denote the union of pairwise disjoint sets by ⊔, e.g. A⊔ B⊔ C.

It is easy to work with a vector space through its basis. For a ring of sets the following notion works as a helpful “basis”.

Definition ‍3 A semiring S of sets is a collection such that
1. it is closed under intersection;
2. for A, B∈S we have A∖B=C1⊔ … ⊔ CN with Ck∈S.

Again, any non-empty semiring contains the empty set.

Example ‍4 The following are semirings but not rings:
1. The collection of intervals [a,b) on the real line;
2. The collection of all rectangles {(x,y): a≤ x < b, c≤ y <d } on the plane.
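The first of these examples can be checked mechanically. The sketch below (plain Python; the helpers `intersect` and `difference` are our own, not from the notes) verifies on random integer-endpoint intervals [a,b) both semiring axioms: an intersection is again such an interval, and a set difference splits into at most two disjoint intervals of the same type.

```python
import random

# Half-open intervals [a, b) modelled as pairs (a, b); (a, a) is empty.
def intersect(i, j):
    a, b = max(i[0], j[0]), min(i[1], j[1])
    return (a, max(a, b))                  # empty result normalised to (a, a)

def difference(i, j):
    """[a,b) \\ [c,d) as a disjoint union of at most two half-open intervals."""
    (a, b), (c, d) = i, j
    parts = [(a, min(b, max(a, c))), (max(a, min(b, d)), b)]
    return [(p, q) for p, q in parts if p < q]

random.seed(1)
for _ in range(1000):
    i = tuple(sorted(random.sample(range(20), 2)))
    j = tuple(sorted(random.sample(range(20), 2)))
    # Axiom 1: closed under intersection (checked pointwise on integers).
    k = intersect(i, j)
    assert set(range(*k)) == set(range(*i)) & set(range(*j))
    # Axiom 2: the difference is a finite disjoint union of intervals.
    covered = set()
    for p, q in difference(i, j):
        piece = set(range(p, q))
        assert piece and not (piece & covered)
        covered |= piece
    assert covered == set(range(*i)) - set(range(*j))
```

The family is not a ring: [0,1)∪[2,3) is not an interval, which is why the generated ring R(S) of Exercise ‍5 is needed.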

As the intersection of a family of σ-algebras is again a σ-algebra, and the power set 2X is a σ-algebra, it follows that given any collection D⊆ 2X, there is a smallest σ-algebra R such that D⊆R: if S is any other σ-algebra with D⊆S, then R⊆S. We call R the σ-algebra generated by D.

Exercise ‍5 Let S be a semiring. Show that
1. The collection of all finite disjoint unions ⊔k=1n Ak, where Ak∈S, is a ring. We call it the ring R(S) generated by the semiring S.
2. Any ring containing S contains R(S) as well.
3. The collection of all finite (not necessarily disjoint!) unions ∪k=1n Ak, where Ak∈S, coincides with R(S).

We introduce the symbols +∞, −∞, and treat these as “extended real numbers”, so −∞ < t < ∞ for t∈ℝ. We define t+∞ = ∞, t·∞ = ∞ if t>0, and so forth. We do not (and cannot, in a consistent manner) define ∞ − ∞ or 0·∞.

Definition ‍6 A measure is a map µ:R→[0,∞] defined on a (semi-)ring (or σ-algebra) R, such that if A=⊔n An for A∈R and a finite family (An) of pairwise disjoint sets in R, then µ (A) = ∑n µ(An). This property is called additivity of a measure.
Exercise ‍7 Show that the following two conditions are equivalent:
1. µ(∅)=0.
2. There is a set A∈R such that µ(A)<∞.
The first condition often (but not always) is included in the definition of a measure.

In analysis we are interested in infinities and limits, thus the following extension of additivity is very important.

Definition ‍8 In terms of the previous definition we say that µ is countably additive (or σ-additive) if for any countably infinite family (An) of pairwise disjoint sets from R such that A=⊔n An∈R we have µ(A) = ∑n µ(An). If the sum diverges, then, as it is a sum of non-negative numbers, we can without problem define it to be +∞.
Example ‍9
1. Fix a point a∈ℝ and define a measure µ by the condition µ(A)=1 if a∈A and µ(A)=0 otherwise.
2. For the ring obtained in Exercise ‍5 from the semiring S in Example ‍1, define µ([a,b))=b−a on S. This is a measure, and we will show its σ-additivity.
3. For the ring obtained in Exercise ‍5 from the semiring in Example ‍2, define µ(V)=(b−a)(d−c) for the rectangle V={(x,y): a≤ x < b, c≤ y <d }∈S. It will again be a σ-additive measure.
4. Let X=ℕ and R=2X; we define µ(A)=0 if A is a finite subset of X=ℕ and µ(A)=+∞ otherwise. Let An={n}, then µ(An)=0 and µ(⊔n An)=µ(ℕ)=+∞≠ ∑n µ(An)=0. Thus, this measure is not σ-additive.
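The point mass of Example ‍9(1) is simple enough to test in code. The following sketch (our own illustration, with a hypothetical finite universe) checks additivity over random disjoint splittings.

```python
import random

# The point mass of Example 9(1): mu(A) = 1 if a is in A, else 0.
a = 5
def mu(A):
    return 1 if a in A else 0

random.seed(2)
universe = set(range(10))
for _ in range(200):
    part1 = {x for x in universe if random.random() < 0.5}
    part2 = universe - part1              # disjoint by construction
    # Exactly one of the two parts contains a, so the sums agree.
    assert mu(part1 | part2) == mu(part1) + mu(part2)
```

The same check run over countable splittings would confirm σ-additivity: the point a lands in exactly one piece of any disjoint decomposition.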

We will see further examples of measures which are not σ-additive in Section ‍12.4.

Definition ‍10 A measure µ is finite if µ(A)<∞ for all A∈R.

A measure µ is σ-finite if X is a union of a countable number of sets Xk, such that for any A∈R and any k∈ ℕ the intersection A⋂Xk is in R and µ(A⋂Xk)<∞.

Exercise ‍11 Modify the example ‍1 to obtain
1. a measure which is not finite, but is σ-finite. (Hint: let the measure count the number of integer points in a set).
2. a measure which is not σ-finite. (Hint: assign µ(A)=+∞ if a∈A.)
Proposition ‍12 Let µ be a σ-additive measure on a σ-algebra R. Then:
1. If A,B∈R with A⊆B, then µ(A)≤µ(B) [we call this property “monotonicity of a measure”];
2. If A,B∈R with A⊆B and µ(B)<∞, then µ(B∖A) = µ(B) − µ(A);
3. If (An) is a sequence in R with A1⊆A2⊆A3 ⊆⋯, then

 limn→ ∞ µ(An) = µ( ⋃n An ).
4. If (An) is a sequence in R with A1⊇A2⊇A3 ⊇⋯ and µ(Am)<∞ for some m, then

 limn→ ∞ µ(An) = µ( ⋂n An ).

Proof. The first two properties are easy to see. For the third statement, define A=∪n An, B1=A1 and Bn=An∖An−1 for n>1. Then An=⊔k=1n Bk and A=⊔k=1∞ Bk. Using the σ-additivity of the measure, µ(A)=∑k=1∞µ(Bk) and µ(An)=∑k=1n µ(Bk). From the theorem in real analysis that any monotonic sequence of real numbers converges (recall that we admit +∞ as a limit value) we have µ(A)=∑k=1∞µ(Bk)=limn→ ∞k=1n µ(Bk) = limn→ ∞ µ(An). The last statement can be shown similarly.
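Items 3 and 4 of Proposition ‍12 ("continuity of measure") can be seen numerically for the length measure µ([a,b))=b−a; the nested intervals below are a hypothetical illustration, not taken from the notes.

```python
# Continuity of the length measure mu([a,b)) = b - a along nested intervals.
def mu(a, b):
    return b - a

# Increasing: A_n = [0, 1 - 1/n), union = [0, 1), so mu(A_n) -> 1.
inc = [mu(0, 1 - 1/n) for n in range(1, 10001)]
assert abs(inc[-1] - 1) < 1e-3

# Decreasing: A_n = [0, 1/n), intersection = {0}, so mu(A_n) -> 0.
dec = [mu(0, 1/n) for n in range(1, 10001)]
assert abs(dec[-1]) < 1e-3
```

The finiteness assumption in item 4 matters: for the counting-type measure of Example ‍9(4) the decreasing sets An={n,n+1,…} all have infinite measure while their intersection is empty.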

### 12.2 Extension of Measures

From now on we consider only finite measures, an extension to σ-finite measures will be done later.

Proposition ‍13 Any measure µ′ on a semiring S is uniquely extended to a measure µ on the generated ring R(S), see Ex. ‍5. If the initial measure was σ-additive, then the extension is σ-additive as well.

Proof. If an extension exists it must satisfy µ(A)=∑k=1n µ′(Ak), where Ak∈S. We need to check two things for this definition:

1. Consistency, i.e. independence of the value from the presentation of A∈R(S) as A=⊔k=1n Ak, where Ak∈S. For two different presentations A=⊔j=1n Aj and A=⊔k=1m Bk define Cjk=Aj⋂Bk, which are pairwise disjoint. By the additivity of µ′ we have µ′(Aj)=∑kµ′(Cjk) and µ′(Bk)=∑jµ′(Cjk). Then

 ∑j µ′(Aj) = ∑jk µ′(Cjk) = ∑kj µ′(Cjk) = ∑k µ′(Bk).
2. Additivity. For A=⊔k=1n Ak, where Ak∈R(S), we can present Ak=⊔j=1n(k) Cjk, Cjk∈S. Thus A=⊔k=1nj=1n(k) Cjk and:

 µ(A) = ∑k=1nj=1n(k) µ′(Cjk) = ∑k=1n µ(Ak).

Finally, we show the σ-additivity. For a set A=⊔k=1∞ Ak, where A and Ak∈R(S), find presentations A=⊔j=1n Bj, Bj∈S, and Ak=⊔l=1m(k) Blk, Blk∈S. Define Cjlk=Bj⋂Blk∈S, then Bj=⊔k=1∞l=1m(k) Cjlk and Ak= ⊔j=1nl=1m(k) Cjlk. Then, from the σ-additivity of µ′:

 µ(A) = ∑j=1n µ′(Bj) = ∑j=1nk=1∞l=1m(k) µ′(Cjlk) = ∑k=1∞j=1nl=1m(k) µ′(Cjlk) = ∑k=1∞ µ(Ak),

where we changed the summation order in series with non-negative terms.

In a similar way we can extend a measure from a semiring to the corresponding σ-ring; however, it can be done even for a larger family. The procedure recalls the famous story of Baron Munchausen, who saved himself from drowning in a swamp by pulling on his own hair. Indeed, initially we know the measure on elements of the semiring S and on their finite disjoint unions from R(S). To an arbitrary set A we may assign a measure from an element of R(S) which “approximates” A. But how do we measure such an approximation? Well, to this end we use the measure on R(S) again (pulling on our own hair)!

Coming back to exact definitions, we introduce the following notion.

Definition ‍14 Let S be a semi-ring of subsets in X, and µ be a measure defined on S. An outer measure µ* on X is a map µ*:2X→[0,∞] defined by:
 µ*(A) = inf { ∑k µ(Ak) :  A⊆ ⋃k Ak,   Ak∈ S }.
Proposition ‍15 An outer measure has the following properties:
1. µ*(∅)=0;
2. if AB then µ*(A)≤µ*(B), this is called monotonicity of the outer measure;
3. if (An) is any sequence in 2X, then µ*(∪n An) ≤ ∑n µ*(An).

The final condition says that an outer measure is countably sub-additive. Note that an outer measure need not be a measure in the sense of Defn. ‍6, due to a lack of additivity.

Example ‍16 The Lebesgue outer measure on ℝ is defined out of the measure from Example ‍2, that is, for A⊆ℝ, as

 µ*(A) = inf { ∑j=1∞ (bj−aj) :  A⊆ ⋃j=1∞ [aj,bj) }.
We make this definition, as intuitively, the “length”, or measure, of the interval [a,b) is (b−a).

For example, for the Lebesgue outer measure we have µ*(A)=0 for any countable set, which follows, as clearly µ*({x})=0 for any x∈ℝ.

Lemma ‍17 Let a<b. Then µ*([a,b])=b−a.

Proof. For є>0, as [a,b] ⊆ [a,b+є), we have that µ*([a,b])≤ (b−a)+є. As є>0 was arbitrary, µ*([a,b]) ≤ b−a.

To show the opposite inequality we observe that [a,b)⊂[a,b] and µ*([a,b)) =b−a (because [a,b) is in the semi-ring), so µ*([a,b])≥ b−a by ‍2.

Our next aim is to construct measures from outer measures. We use the notation AB=(AB)∖ (AB) for symmetric difference of sets.

Definition ‍18 Given an outer measure µ* defined by a measure µ on a semiring S, we define A⊆X to be Lebesgue measurable if for any ε >0 there is a finite union B of elements in S (in other words: B∈R(S) by Lem. ‍3), such that µ*(A▵B)<ε .

Obviously all elements of S are measurable. An alternative definition of a measurable set is due to Carathéodory.

Definition ‍19 Given an outer measure µ*, we define E⊆X to be Carathéodory measurable if
 µ*(A) = µ*(A⋂ E) + µ*(A∖ E),
for any AX.

As µ* is sub-additive, this is equivalent to

 µ*(A) ≥ µ*(A⋂ E) + µ*(A∖ E)    (A⊆ X),

as the other inequality is automatic.

Exercise* ‍20 Show that measurability by Lebesgue and Carathéodory are equivalent.

Suppose now that the ring R(S) is an algebra (i.e., contains the maximal element X). Then, the outer measure of any set is finite, and the following theorem holds:

Theorem ‍21 ‍(Lebesgue) Let µ* be an outer measure on X defined by a semiring S, and let L be the collection of all Lebesgue measurable sets for µ*. Then L is a σ-algebra, and if µ′ is the restriction of µ* to L, then µ′ is a measure. Furthermore, µ′ is σ-additive on L if µ is σ-additive on S.

Proof.[Sketch of proof] Clearly, R(S)⊂ L. Now we show that µ*(A)=µ(A) for a set A∈R(S). If A⊂ ∪k Ak for Ak∈S, then µ(A)≤ ∑k µ(Ak); taking the infimum we get µ(A)≤µ*(A). For the opposite inequality, any A∈R(S) has a disjoint representation A=⊔k Ak, Ak∈S, thus µ*(A)≤ ∑k µ(Ak)=µ(A).

Now we will show that R(S) is an incomplete metric space, with the measure µ being a uniformly continuous function on it. Measurable sets form the completion of R(S), and the measure extends from R(S) to the completion by continuity.

Define a distance between elements A, B∈L as the outer measure of the symmetric difference of A and B: d(A,B)=µ*(A▵B). Introduce the equivalence relation A∼B if d(A,B)=0 and use the following inclusion for the triangle inequality:

 A▵ B ⊆ (A▵ C) ⋃ (C▵ B)

Then, by the definition, Lebesgue measurable sets make the closure of R(S) with respect to this distance.

We can check that measurable sets form an algebra. To this end we need estimations, say, of µ*((A1∪A2)▵ (B1∪B2)) in terms of µ*(Ai▵Bi). A demonstration for any finite number of sets is performed through mathematical induction; the above two-set case provides both the base and the step of the induction.

Now, we show that L is a σ-algebra. Let Ak∈L and A=∪k Ak. Then for any ε>0 there exists Bk∈R(S), such that µ*(Ak▵Bk)<ε/2k. Define B=∪k Bk. Then

 ( ⋃k Ak ) ▵ ( ⋃k Bk ) ⊂ ⋃k ( Ak ▵ Bk )   implies  µ*(A▵ B)<ε.

We cannot stop at this point since B=∪k Bk may not be in R(S). Thus, define B′1=B1 and B′k=Bk∖ ∪i=1k−1 Bi, so that the B′k are pairwise disjoint. Then B=⊔k B′k and B′k∈R(S). From the convergence of the series ∑kµ(B′k) there is N such that ∑k=N∞µ(B′k)<ε . Let B′=∪k=1N B′k, which is in R(S). Then µ*(B▵B′)≤ ε and, thus, µ*(A▵B′)≤ 2ε.

To check that µ* is a measure on L we use the following

Lemma ‍22  | µ*(A)−µ*(B) |≤ µ*(A▵B), that is, µ* is uniformly continuous with respect to the metric d(A,B).

Proof.[Proof of the Lemma] Use the inclusions A⊆B∪(A▵B) and B⊆A∪(A▵B).

To show additivity take A1,2∈L with A=A1⊔A2, and B1,2∈R(S) with µ*(Ai▵Bi)<ε. Then µ*(A▵(B1∪B2))<2ε and | µ*(A) − µ*(B1∪B2) |<2ε. Thus µ*(B1∪B2)=µ(B1∪B2)=µ (B1) +µ (B2)−µ (B1⋂B2), but µ (B1⋂B2)=d(B1⋂B2,∅)=d(B1⋂B2,A1⋂A2)<2ε. Therefore

 | µ*(B1⋃ B2)−µ (B1) −µ (B2) | <2ε.

Combining everything together we get:

 | µ*(A)−µ*(A1)−µ*(A2) | <6ε.

Thus µ* is additive.

Check the countable additivity for A=⊔k Ak. The inequality µ*(A)≤ ∑kµ*(Ak) follows from countable sub-additivity. The opposite inequality is the limiting case of the finite inequality µ*(A)≥ µ*(⊔k=1N Ak)=∑k=1Nµ*(Ak), which follows from additivity and monotonicity of µ*.

Corollary ‍23 Let E⊆ℝ be open or closed. Then E is Lebesgue measurable.

Proof. This is a common trick, using the density and the countability of the rationals. As σ-algebras are closed under taking complements, we need only show that open sets are Lebesgue measurable.

Intervals (a,b) are Lebesgue measurable by the very definition. Now let U⊆ℝ be open. For each x∈U, there exist ax<bx with x∈(ax,bx)⊆ U. By making ax slightly larger, and bx slightly smaller, we can ensure that ax,bx∈ℚ. Thus U = ∪x (ax, bx). Each interval is measurable, and there are at most a countable number of them (the endpoints form a countable set), thus U is the countable (or finite) union of Lebesgue measurable sets, and hence U is Lebesgue measurable itself.

We now extend a finite measure to a σ-finite one. Let µ be a σ-additive and σ-finite measure defined on a semiring in X=⊔k Xk, such that the restriction of µ to every Xk is finite. Consider the Lebesgue extension µk of µ defined within Xk. A set A⊆X is measurable if every intersection A⋂Xk is µk-measurable. For such a measurable set A we define its measure by the identity:

 µ(A) = ∑k µk(A⋂ Xk).

We call a measure µ defined on L complete if whenever E⊆X is such that there exists F∈L with µ(F)=0 and E⊆F, we have that E∈L. Measures constructed from outer measures by the above theorem are always complete. On the example sheet, we saw how to form a complete measure from a given measure. We call sets like E null sets: complete measures are useful, because it is helpful to be able to say that null sets are in our σ-algebra. Null sets can be quite complicated. For the Lebesgue measure, all countable subsets of ℝ are null, but then so is the Cantor set, which is uncountable.

Definition ‍24 If we have a property P(x) which is true except possibly for x∈A with µ(A)=0, we say P(x) holds almost everywhere, or a.e..

### 12.3 Complex-Valued Measures and Charges

We start from the following observation.

Exercise ‍25 Let µ1 and µ2 be measures on the same σ-algebra. Define µ12 and λµ1, λ>0, by (µ12)(A)=µ1(A)+µ2(A) and (λµ1)(A)=λ(µ1(A)). Then µ12 and λµ1 are measures on the same σ-algebra as well.

In view of this, it will be helpful to extend the notion of a measure to obtain a linear space.

Definition ‍26 Let X be a set, and R be a σ-ring. A real- (complex-) valued function ν on R is called a charge (or signed measure) if it is countably additive as follows: for any Ak∈R the identity A=⊔k Ak implies that the series ∑k ν(Ak) is absolutely convergent and has the sum ν(A).

In the following “charge” means “real charge”.

Example ‍27 Any linear combination of σ-additive measures with real (complex) coefficients is a real (complex) charge.

The opposite statement is also true:

Theorem ‍28 Any real (complex) charge ν has a representation ν=µ1−µ2 (ν=µ1−µ2+iµ3iµ4), where µk are σ-additive measures.

To prove the theorem we need the following definition.

Definition ‍29 The variation of a charge on a set A is | ν |(A)=sup ∑k| ν(Ak) | over all disjoint splittings A=⊔k Ak.
Example ‍30 If ν=µ1−µ2, then | ν |(A)≤ µ1(A)+µ2(A). The inequality becomes an identity for disjunctive measures on A (that is, there is a partition A=A1⊔A2 such that µ2(A1)=µ1(A2)=0).

The relation of variation to charge is as follows:

Theorem ‍31 For any charge ν the function | ν | is a σ-additive measure.

Finally to prove the Thm. ‍28 we use the following

Proposition ‍32 For any charge ν the function | ν |−ν is a σ-additive measure as well.

From the Thm. ‍28 we can deduce

Corollary ‍33 The collection of all charges on a σ-algebra R is a linear space which is complete with respect to the distance:

 d(ν12) = supA∈R | ν1(A)−ν2(A) |.

The following result is also important:

Theorem ‍34 ‍(Hahn Decomposition) Let ν be a charge. There exist A,B∈L, called a Hahn decomposition of (X,ν), with A⋂B=∅, A⋃B= X and such that for any E∈L,
 ν (A⋂ E) ≥ 0,   ν(B⋂ E)≤ 0.
This need not be unique.

Proof.[Sketch of proof] We only sketch this. We say that A∈L is positive if

 ν(E⋂ A)≥0    (E∈L),

and similarly define what it means for a measurable set to be negative. Suppose that ν never takes the value −∞ (the other case follows by considering the charge −ν).

Let β = inf ν(B0), where we take the infimum over all negative sets B0. If β=−∞ then for each n, we can find a negative Bn with ν(Bn)≤ −n. But then B=∪n Bn would be negative with ν(B)≤ −n for any n, so that ν(B)=−∞, a contradiction.

So β>−∞, and for each n we can find a negative Bn with ν(Bn) < β+1/n. Then we can show that B = ∪n Bn is negative, and argue that ν(B) ≤ β. As B is negative, actually ν(B) = β.

There then follows a very tedious argument, by contradiction, to show that A=X∖B is a positive set. Then (A,B) is the required decomposition.

### 12.4 Constructing Measures, Products

Consider the semiring S of intervals [a,b). There is a simple description of all measures on it. For a measure µ define

 Fµ(t) =  µ([0,t))  if t>0;   0  if t=0;   −µ([t,0))  if t<0.    (63)

Fµ is monotonic and any monotonic function F defines a measure µ on S by µ([a,b))=F(b)−F(a). The correspondence is one-to-one with the additional assumption F(0)=0.

Theorem ‍35 The above measure µ is σ-additive on S if and only if F is continuous from the left: F(t−0)=F(t) for all t∈ℝ.

Proof. Necessity: F(t)−F(t−0)=limε→ 0µ([t−ε,t))=µ(limε→ 0[t−ε,t))=µ(∅)=0, by the continuity of a σ-additive measure, see ‍4.

For sufficiency assume [a,b)=⊔k [ak,bk). The inequality µ([a,b))≥ ∑k µ([ak,bk)) follows from additivity and monotonicity. For the opposite inequality take δ and δk s.t. F(b)−F(b−δ)<ε and F(ak)−F(ak−δk)<ε/2k (using the left continuity of F). Then the interval [a,b−δ] is covered by the (ak−δk,bk); due to compactness of [a,b−δ] there is a finite subcovering. Thus µ([a,b−δ ))≤∑j=1N µ([akj−δkj,bkj)) and µ([a,b))≤∑j=1N µ([akj,bkj))+2ε .

Exercise ‍36
1. Give an example of a function discontinuous from the left at 1 and show that the resulting measure is additive but not σ-additive.
2. Check that if a function F is continuous at a point a then µ({a})=0.
Example ‍37
1. Take F(t)=t, then the corresponding measure is the Lebesgue measure on ℝ.
2. Take F(t) to be the integer part of t, then µ counts the number of integers within the set.
3. Define the Cantor function as follows: α(x)=1/2 on (1/3,2/3); α(x)=1/4 on (1/9,2/9); α(x)=3/4 on (7/9,8/9), and so forth. This function is monotonic and can be continued to [0,1] by continuity; it is known as the Cantor ladder. The resulting measure has the following properties:
• The measure of the entire interval is 1.
• Measure of every point is zero.
• The measure of the Cantor set is 1, while its Lebesgue measure is 0.
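The three bullet points above can be checked directly from the self-similar structure of the ladder. The sketch below (our own illustration; the finite recursion depth stands in for the continuity argument) computes α(x) and the induced measure µ([a,b))=α(b)−α(a).

```python
# A sketch of the Cantor ladder alpha(x) and the measure it induces
# via mu([a,b)) = alpha(b) - alpha(a).
def alpha(x, depth=40):
    if x <= 0: return 0.0
    if x >= 1: return 1.0
    if depth == 0: return 0.5
    if x < 1/3: return alpha(3*x, depth - 1) / 2           # left copy, scaled
    if x > 2/3: return 0.5 + alpha(3*x - 2, depth - 1) / 2 # right copy, shifted
    return 0.5                                             # flat middle third

def mu(a, b):
    return alpha(b) - alpha(a)

assert mu(0, 1) == 1                  # measure of the whole interval is 1
assert mu(1/3, 2/3) == 0              # a removed middle third is null
assert abs(mu(0, 1/3) - 0.5) < 1e-9   # half the mass sits in [0, 1/3)
```

Since every interval removed in the construction of the Cantor set gets measure 0 while µ([0,1))=1, all the mass of this measure is concentrated on the Cantor set, whose Lebesgue measure is 0.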

Another possibility to build measures is their product. In particular, it allows one to extend various measures defined through ‍(63) on the real line to ℝn.

Definition ‍38 Let X and Y be spaces, and let S and T be semirings on X and Y respectively. Then S× T is the semiring consisting of { A× B : A∈S, B∈T } (“generalised rectangles”). Let µ and ν be measures on S and T respectively. Define the product measure µ×ν on S× T by the rule (µ× ν)(A× B)=µ(A) ν(B).
Example ‍39 The measure from Example ‍3 is the product of two copies of pre-Lebesgue measures from Example ‍2.

## 13 Integration

We now come to the main use of measure theory: to define a general theory of integration.

### 13.1 Measurable functions

From now on, by a measure space we shall mean a triple (X,L,µ), where X is a set, L is a σ-algebra on X, and µ is a σ-additive measure defined on L. We say that the members of L are measurable, or L-measurable, if necessary to avoid confusion.

Definition ‍1 A function f:X→ℝ is measurable if
 Ec(f)={x∈ X: f(x) < c}
is in L for any c∈ℝ.

A complex-valued function is measurable if its real and imaginary parts are measurable.

Lemma ‍2 The following are equivalent:
1. A function f is measurable;
2. For any a<b the set f−1((a,b)) is measurable;
3. For any open set U⊂ ℝ the set f−1(U) is measurable.

Proof. Use that any open set U⊂ ℝ is a countable union of intervals (a,b), cf. the proof of Cor. ‍23.

Corollary ‍3 Let f: X → ℝ be measurable and g: ℝ → ℝ be continuous, then the composition g(f(x)) is measurable.

Proof. The preimage of the open set (−∞,c) under a continuous g is an open set, say U. The preimage of U under f is measurable by Lem. ‍2. Thus, the preimage of (−∞,c) under the composition g∘f is measurable, and therefore g∘f is a measurable function.

Theorem ‍4 Let f,g:X→ℝ be measurable. Then af (a∈ℝ), f+g, fg, max(f,g) and min(f,g) are all measurable. That is, measurable functions form an algebra, and this algebra is closed under convergence a.e.

Proof. Use Cor. ‍3 to show measurability of λ f, | f | and f2.

Next use the following identities:

 Ec(f1+f2) = ⋃r∈ℚ (Er(f1)⋂ Ec−r(f2)),
 f1f2 = ( (f1+f2)2−(f1−f2)2 )/4,
 max(f1,f2) = ( (f1+f2)+ | f1−f2 | )/2.

If (fn) is a non-increasing sequence of measurable functions converging to f, then Ec(f)=∪n Ec(fn).

Moreover any limit can be replaced by two monotonic limits:

 limn→ ∞ fn(x) = limn→ ∞ limk→ ∞ max (fn(x), fn+1(x),…,fn+k(x)).    (64)

Finally if f1 is measurable and f2=f1 almost everywhere, then f2 is measurable as well.

We can define several types of convergence for measurable functions.

Definition ‍5 We say that a sequence (fn) of functions converges
1. uniformly to f (notated fn ⇉ f) if
 supx∈ X | fn(x)−f(x) | → 0;
2. almost everywhere to f (notated fna.e. f) if
 fn(x)→ f(x)    for all  x∈ X∖ A,  µ(A)=0;
3. in measure µ to f (notated fnµ f) if for all ε>0
 µ({x∈ X: | fn(x)−f(x) | >ε }) → 0.

Clearly uniform convergence implies both convergence a.e. and convergence in measure.

Theorem ‍6 On a finite measure space, convergence a.e. implies convergence in measure.

Proof. Define An(ε)={x∈X: | fn(x)−f(x) |≥ ε}. Let Bn(ε)=∪k≥ n Ak(ε). Clearly Bn(ε)⊃ Bn+1(ε); let B(ε)=∩1∞Bn(ε). If x∈B(ε) then fn(x)↛f(x). Thus µ(B(ε))=0, but µ(B(ε))=limn→ ∞µ(Bn(ε)). Since An(ε)⊂ Bn(ε) we see that µ(An(ε))→ 0.
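As a concrete illustration of Theorem ‍6 (a hypothetical example, not from the notes): fn(x)=xn on [0,1] with Lebesgue measure converges to 0 a.e. (everywhere except x=1) but not uniformly; the measure of the set where fn stays above ε can be computed in closed form and indeed vanishes.

```python
# f_n(x) = x**n on [0,1] with Lebesgue measure: f_n -> 0 a.e. but not
# uniformly; convergence in measure holds because
# mu({x : x**n > eps}) = mu((eps**(1/n), 1]) = 1 - eps**(1/n) -> 0.
def measure_of_large_set(n, eps):
    return 1 - eps ** (1 / n)

eps = 0.1
vals = [measure_of_large_set(n, eps) for n in (1, 10, 100, 1000)]
assert all(v > w for v, w in zip(vals, vals[1:]))   # monotonically shrinking
assert vals[-1] < 0.01                              # tends to 0
```

Note the sets An(ε) from the proof are exactly these intervals (ε1/n, 1], shrinking to the single point {1}, a null set.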

Note, that the construction of sets Bn(ε) is just another implementation of the “two monotonic limits” trick ‍(64) for sets.

Exercise ‍7 Present examples of sequences (fn) and functions f such that:
1. fnµ f but not fna.e. f.
2. fna.e. f but not fn ⇉ f.

However we can slightly “fix” either the set or the sequence to “upgrade” the convergence as shown in the following two theorems.

Theorem ‍8 ‍(Egorov) If fna.e. f on a finite measure set X, then for any σ>0 there is Eσ⊆X with µ(Eσ)<σ and fn ⇉ f on X∖Eσ.

Proof. We use An(ε) and Bn(ε) from the proof of Thm. ‍6. For every ε>0 we have seen µ(Bn(ε))→ 0, thus for each k there is N(k) such that µ(BN(k)(1/k))<σ/2k. Put Eσ=∪k BN(k)(1/k).

Theorem ‍9 If fnµ f then there is a subsequence (nk) such that fnka.e. f as k→ ∞.

Proof. In the notations of the two previous proofs: for every natural k take nk such that µ(Ank(1/k))< 1/2k. Define Cm=∪k=m∞ Ank(1/k) and C=∩m Cm. Then µ(Cm)≤1/2m−1 and, thus, µ(C)=0. If x∉C then there is an N such that x∉Ank(1/k) for all k>N. That means that | fnk(x)−f(x) |<1/k for all such k, i.e. fnk(x)→ f(x).

It is worth noting that we can use the last two theorems successively and upgrade convergence in measure to uniform convergence of a subsequence on a subset.

Exercise ‍10 For your counterexamples from Exercise ‍7, find
1. a subsequence fnk of the sequence from ‍1 which converges to f a.e.;
2. a subset off which the sequence from ‍2 converges uniformly.
Exercise ‍11 Read about Luzin’s C-property.

### 13.2 Lebesgue Integral

First we define a sort of “basis” for the space of integral functions.

Definition ‍12 For A⊆X, we define χA to be the indicator function of A, by

 χA(x) = 1 if x∈ A, and χA(x) = 0 if x∉A.

Then, if χA is measurable, then χA−1( (1/2,3/2) ) = A∈L; conversely, if A∈L, then X∖A∈L, and we see that for any U⊆ℝ open, χA−1(U) is either ∅, A, X∖A, or X, all of which are in L. So χA is measurable if and only if A∈L.

Definition ‍13 A measurable function f:X→ℝ is simple if it attains only a countable number of values.
Lemma ‍14 A function f:X→ℝ is simple if and only if

 f = ∑k=1∞ tk χAk    (65)

for some (tk)k=1∞ ⊆ℝ and Ak∈L. That is, simple functions are linear combinations of indicator functions of measurable sets.

Moreover, in the above representation the sets Ak can be chosen pairwise disjoint and all tk≠ 0 pairwise different. In this case the representation is unique.

Notice that it is now obvious that

Corollary ‍15 The collection of simple functions forms a vector space: this wasn’t clear from the original definition.
Definition ‍16 A simple function in the form ‍(65) with disjoint Ak is called summable if the series

 ∑k=1∞ | tk | µ(Ak)    (66)

converges, where f = ∑k=1∞ tk χAk is the above unique representation.

It is another combinatorial exercise to show that this definition is independent of the way we write f.

Definition ‍17 We define the integral of a simple function f:X→ ℝ over a measurable set A by setting

 ∫A f dµ = ∑k=1∞ tk µ(Ak⋂ A).

Clearly the series converges for any simple summable function f. Moreover

Lemma ‍18 The value of the integral of a simple summable function is independent of its representation by the sum of indicators ‍(65). In particular, we can evaluate the integral using the canonical representation over pairwise disjoint sets having pairwise different values.

Proof. This is another slightly tedious combinatorial exercise. You need to prove that the integral of a simple function is well-defined, in the sense that it is independent of the way we choose to write the simple function.
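Definition ‍17 and Lemma ‍18 can be illustrated on a toy measure space. In the hypothetical sketch below the space is a finite set with counting measure, so µ(Ak⋂A) is just a cardinality, and two different representations of the same simple function give the same integral.

```python
from fractions import Fraction as F

# Integral of a simple function f = sum_k t_k * chi_{A_k} over a set A:
#   int_A f dmu = sum_k t_k * mu(A_k & A),
# on a finite set with counting measure, so mu = cardinality.
def integral(terms, A):
    """terms: list of (t_k, A_k) pairs with A_k a frozenset."""
    return sum(t * len(Ak & A) for t, Ak in terms)

# f = 2*chi_{1,2} + 3*chi_{3}, written two ways.
canonical = [(F(2), frozenset({1, 2})), (F(3), frozenset({3}))]
other     = [(F(2), frozenset({1})), (F(2), frozenset({2})), (F(3), frozenset({3}))]

A = frozenset({2, 3})
assert integral(canonical, A) == integral(other, A) == 5
```

This is exactly the combinatorial bookkeeping of the lemma: regrouping the indicator terms never changes the weighted sum.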

Exercise ‍19 Let f be the function on [0,1] which takes the value 1 at all rational points and 0 everywhere else. Find the value of the Lebesgue integral ∫[0,1] f dµ with respect to the Lebesgue measure on [0,1]. Show that the Riemann upper and lower sums for f converge to different values, so f is not Riemann-integrable.
Remark ‍20 The previous exercise shows that the Lebesgue integral does not have the problems of the Riemann integral related to discontinuities. Indeed, most functions which are not Riemann-integrable are integrable in the sense of Lebesgue. The only reason why a measurable function may fail to be Lebesgue-integrable is divergence of the series ‍(66). Therefore, we prefer to say that the function is summable rather than integrable. However, these terms are used interchangeably in the mathematical literature.

We will denote by S(X) the collection of all simple summable functions on X.

Proposition ‍21 Let f, g:X→ ℝ be in S(X) (that is, simple summable), let a, b∈ ℝ and let A be a measurable set. Then:
1. ∫A (af+bg) dµ = a∫A f dµ + b∫A g dµ, that is, S(X) is a linear space;
2. The correspondence f→ ∫A f dµ is a linear functional on S(X);
3. The correspondence A → ∫A f dµ is a charge;
4. The function

 d1(f,g) = ∫X | f(x)−g(x) | dµ(x)    (67)

has all the properties of a distance on S(X), except possibly separation.
5. For the above function d1:

 | ∫A f(x) dµ(x)− ∫A g(x) dµ(x) | ≤ d1(f,g).
6. If f≤g then ∫X f dµ ≤ ∫X g dµ, that is, the integral is monotonic;
7. For f≥ 0 we have ∫X f dµ=0 if and only if µ( { x∈X : f(x)≠0 } ) = 0.

Proof. The proof is almost obvious, for example the Property ‍1 easily follows from Lem. ‍18.

We will outline ‍3 only. Let f be an indicator function of a set B; then A→ ∫A f dµ=µ(A⋂B) is a σ-additive measure (and thus a charge). By Cor. ‍33 the same is true for finite linear combinations of indicator functions and their limits in the sense of the distance d1.

We can identify functions which have the same values a.e. Then S(X) becomes a metric space with the distance d1 ‍(67). The space may be incomplete and we may wish to look for its completion. However, if we simply try to assign a limiting point to every Cauchy sequence in S(X), then the resulting space becomes so huge that it is impossible to realise it as a space of functions on X. To reduce the number of Cauchy sequences in S(X) eligible to have a limit, we impose an additional condition. A convenient reduction to functions on X appears if we require both convergence in the d1 metric and pointwise convergence on X a.e.

Definition ‍22 A function f is summable by a measure µ if there is a sequence (fn)⊂S(X) such that
1. the sequence (fn) is a Cauchy sequence in S(X);
2. fna.e. f.

Clearly, if a function is summable, then any equivalent function is summable as well. The set of equivalence classes will be denoted by L1(X).

Lemma ‍23 If the measure µ is finite then any bounded measurable function is summable.

Proof. Define Ekn(f)={x∈X: k/n≤ f(x)< (k+1)/n} and fn=∑k k/n χEkn (note that the sum is finite due to the boundedness of f).

Since | fn(x)−f(x) |<1/n we have uniform convergence (thus convergence a.e.) and (fn) is a Cauchy sequence: d1(fn,fm)=∫X| fn−fm | dµ≤ (1/n+1/m)µ(X).
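The approximants in this proof are easy to exhibit. In the sketch below (the bounded function sin2 x is a hypothetical stand-in) fn(x)=⌊n f(x)⌋/n takes the value k/n exactly on Ekn(f), and the uniform bound 0 ≤ f − fn < 1/n is checked on a grid of sample points.

```python
import math

# The simple approximants from the proof of Lemma 23:
# f_n(x) = floor(n*f(x))/n equals k/n on E_k^n(f) = {k/n <= f < (k+1)/n},
# and satisfies 0 <= f(x) - f_n(x) < 1/n uniformly in x.
def f(x):
    return math.sin(x) ** 2            # a bounded measurable stand-in

def f_n(x, n):
    return math.floor(n * f(x)) / n

n = 1000
errors = [f(x / 100) - f_n(x / 100, n) for x in range(628)]
assert all(0 <= e < 1 / n for e in errors)
```

The uniform bound is what makes (fn) a d1-Cauchy sequence once µ(X)<∞, which is where the finiteness of the measure enters.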

Remark ‍24 This Lemma can be extended to the space of essentially bounded functions L∞(X); in other words, L∞(X)⊂L1(X) for finite measures.

Another simple result, which is useful on many occasions is as follows.

Lemma ‍25 If the measure µ is finite and fn ⇉ f then d1(fn,f)→ 0.
Corollary ‍26 For a convergent sequence fna.e. f which admits the uniform bound | fn(x) |<M for all n and x, we have d1(fn,f)→ 0.

Proof. For any ε>0, by Egorov’s theorem ‍8 we can find E such that

1. µ(E)< ε/2M; and
2. by the uniform convergence on X∖E, there exists N such that for any n>N we have | f(x)−fn(x) |<ε /2µ(X).

Combining these we find that for n>N, d1(fn,f)< M ε/2M + µ(X) ε /2µ(X) = ε .

Exercise ‍27 Convergence in the metric d1 and convergence a.e. do not imply each other:
1. Give an example of fna.e. f such that d1(fn ,f)↛0.
2. Give an example of a sequence (fn) and function f in L1(X) such that d1(fn ,f)→ 0 but fn does not converge to f a.e.

To build the integral we need the following

Lemma ‍28 Let (fn) and (gn) be two Cauchy sequences in S(X) with the same limit a.e., then d1(fn,gn)→ 0.

Proof. Let φn=fn−gn; then this is a Cauchy sequence with zero limit a.e. Assume, contrary to the statement, that there exist δ>0 and a sequence (nk) such that ∫X| φnk | dµ>δ. Rescaling and renumbering we can obtain ∫X| φn | dµ>1.

Take a rapidly convergent subsequence using the Cauchy property:

 d1(φnk,φnk+1)≤ 1/2k+2.

Renumbering again, assume d1(φk,φk+1)≤ 1/2k+2.

Since φ1 is simple, φ1=∑k tk χAk and ∑k | tk | µ(Ak)=∫X | φ1 | dµ≥ 1. Thus there exists N such that ∑k=1N | tk | µ(Ak)≥ 3/4. Put A=⊔k=1N Ak and C=max1≤ k≤ N| tk |=maxx∈A| φ1(x) |.

By Egorov’s Theorem ‍8 there is E⊆A such that µ(E)<1/(4C) and φn⇉ 0 on B=A∖E. Then

 ∫B | φ1 | dµ = ∫A | φ1 | dµ − ∫E | φ1 | dµ ≥ 3/4 − (1/(4C))· C = 1/2.

Since

 | ∫B | φn | dµ − ∫B | φn+1 | dµ | <