Introduction to Functional Analysis
Vladimir V. Kisil
Abstract: These are lecture notes for several courses on Functional Analysis at the School of Mathematics of the University of Leeds. They are based on the notes of Dr. Matt Daws, Prof. Jonathan R. Partington and Dr. David Salinger used in previous years. Some sections are borrowed from textbooks which I have used since I was a student myself. However, all misprints, omissions, and errors are my responsibility alone. I am very grateful to Filipa Soares de Almeida, Eric Borgnet, and Pasc Gavruta for pointing out some of them. Please let me know if you find more. The notes are also available for download in PDF.
The suggested textbooks are [, , , ]. The other nice books with many interesting problems are [, ].
Exercises with stars are not part of the mandatory material but are nevertheless worth hearing about. They are not necessarily difficult, so try to solve them!
ℤ_{+}, ℝ_{+} denote the nonnegative integers and reals.
x,y,z,… denote vectors.
λ,µ,ν,… denote scalars.
ℜ z, ℑ z stand for the real and imaginary parts of a complex number z.
In this course, the functions we consider will be real or complex valued functions defined on the real line which are locally Riemann integrable. This means that they are Riemann integrable on any finite closed interval [a,b]. (A complex valued function is Riemann integrable iff its real and imaginary parts are Riemann integrable.) In practice, we shall be dealing mainly with bounded functions that have only a finite number of points of discontinuity in any finite interval. We can relax the boundedness condition to allow improper Riemann integrals, but we then require the integral of the absolute value of the function to converge.
We mention this right at the start to get it out of the way. There are many fascinating subtleties connected with Fourier analysis, but those connected with technical aspects of integration theory are beyond the scope of the course. It turns out that one needs a “better” integral than the Riemann integral: the Lebesgue integral, and I commend the module, Linear Analysis 1, which includes an introduction to that topic which is available to MM students (or you could look it up in Real and Complex Analysis by Walter Rudin). Once one has the Lebesgue integral, one can start thinking about the different classes of functions to which Fourier analysis applies: the modern theory (not available to Fourier himself) can even go beyond functions and deal with generalized functions (distributions) such as the Dirac delta function which may be familiar to some of you from quantum theory.
From now on, when we say “function”, we shall assume the conditions of the first paragraph, unless anything is stated to the contrary.
Before proceeding with the abstract theory we consider a motivating example: Fourier series.
In this part of the course we deal with functions (as above) that are periodic.
We say a function f:ℝ→ℂ is periodic with period T>0 if f(x+T)= f(x) for all x∈ ℝ. For example, sin x, cos x, e^{ix} (= cos x + i sin x) are periodic with period 2π. For k∈ ℝ∖{0}, sin kx, cos kx, and e^{ikx} are periodic with period 2π/|k|. Constant functions are periodic with period T, for any T>0. We shall specialize to periodic functions with period 2π: we call them 2π-periodic functions, for short. Note that cos nx, sin nx and e^{inx} are 2π-periodic for n∈ℤ. (Of course, for n≠0 these are also 2π/|n|-periodic.)
Any half-open interval of length T is a fundamental domain of a periodic function f of period T. Once you know the values of f on the fundamental domain, you know them everywhere, because any point x in ℝ can be written uniquely as x=w+nT where n∈ ℤ and w is in the fundamental domain. Thus f(x) = f(w+(n−1)T +T)=⋯ =f(w+T) =f(w).
For 2π-periodic functions, we shall usually take the fundamental domain to be ]−π, π]. By abuse of language, we shall sometimes refer to [−π, π] as the fundamental domain. We then have to be aware that f(π)=f(−π).
We shall need to calculate ∫_{a}^{b} e^{ikx} dx, for k∈ℝ. Note first that when k=0, the integrand is the constant function 1, so the result is b−a. For nonzero k, ∫_{a}^{b} e^{ikx} dx = ∫_{a}^{b} (cos kx + i sin kx) dx = (1/k)[sin kx − i cos kx]_{a}^{b} = (1/ik)[cos kx + i sin kx]_{a}^{b} = (1/ik)[e^{ikx}]_{a}^{b} = (1/ik)(e^{ikb}−e^{ika}). Note that this is exactly the result you would have got by treating i as a real constant and using the usual formula for integrating e^{ax}. Note also that the cases k=0 and k≠0 have to be treated separately: this is typical.
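The closed form above can be checked numerically. The following is a minimal sketch, assuming a plain midpoint rule as the reference; the function names `integral_exp` and `midpoint_rule` are illustrative and not part of the notes.

```python
import cmath


def integral_exp(k: float, a: float, b: float) -> complex:
    """Closed form of the integral of e^{ikx} over [a, b]."""
    if k == 0:
        # The integrand is the constant function 1.
        return complex(b - a)
    return (cmath.exp(1j * k * b) - cmath.exp(1j * k * a)) / (1j * k)


def midpoint_rule(k: float, a: float, b: float, steps: int = 20000) -> complex:
    """Midpoint-rule approximation of the same integral (illustrative helper)."""
    h = (b - a) / steps
    return h * sum(cmath.exp(1j * k * (a + (j + 0.5) * h)) for j in range(steps))


if __name__ == "__main__":
    print(integral_exp(3.0, -1.0, 2.0))
    print(midpoint_rule(3.0, -1.0, 2.0))
```

Note how the case k=0 has to be handled by a separate branch, exactly as in the text.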
f̂(n) = (1/2π) ∫_{−π}^{π} f(x) e^{−inx} dx .
(cf + dg)ˆ(n) = c f̂(n) + d ĝ(n) .
p(x) = ∑_{n∈ℤ} p̂(n) e^{inx} .
This follows immediately from Ex. 2 and Prop.4.
f(x) = ∑_{n∈ℤ} f̂(n) e^{inx} . (1)
For real-valued functions, the introduction of complex exponentials seems artificial: indeed they can be avoided as follows. We work with (1) in the case of a finite sum: then we can rearrange the sum as
f(x) = a_{0}/2 + ∑_{n>0} (a_{n} cos nx + b_{n} sin nx) .
Here
a_{n} = f̂(n) + f̂(−n) = (1/π) ∫_{−π}^{π} f(x) cos nx dx
for n>0 and
b_{n} = i(f̂(n) − f̂(−n)) = (1/π) ∫_{−π}^{π} f(x) sin nx dx
for n>0. a_{0} = (1/π) ∫_{−π}^{π} f(x) dx, the constant chosen for consistency.
The a_{n} and b_{n} are also called Fourier coefficients: if it is necessary to distinguish them, we may call them Fourier cosine and sine coefficients, respectively.
We note that if f is real-valued, then the a_{n} and b_{n} are real numbers and so ℜ f̂(n) = ℜ f̂(−n), ℑ f̂(−n) = −ℑ f̂(n): thus f̂(−n) is the complex conjugate of f̂(n). Further, if f is an even function then all the sine coefficients are 0 and if f is an odd function, all the cosine coefficients are zero. We note further that the sine and cosine coefficients of the functions cos kx and sin kx themselves have a particularly simple form: a_{k}=1 in the first case and b_{k}=1 in the second. All the rest are zero.
For example, we should expect the 2π-periodic function whose value on ]−π,π] is x to have just sine coefficients: indeed this is the case: a_{n}=0 and b_{n}=i(f̂(n)−f̂(−n)) = (−1)^{n+1}2/n for n>0.
The above question can then be reformulated as “to what extent is f(x) represented by the Fourier series a_{0}/2 + ∑_{n>0}(a_{n} cos nx + b_{n} sin nx)?” For instance, how well does ∑(−1)^{n+1}(2/n) sin nx represent the 2π-periodic sawtooth function f whose value on ]−π, π] is given by f(x) = x? The easy points are x=0, x=π, where the terms are identically zero. This gives the ‘wrong’ value for x=π, but, if we look at the periodic function near π, we see that it jumps from π to −π, so perhaps the mean of those values isn’t a bad value for the series to converge to. We could conclude that we had defined the function incorrectly to begin with and that its value at the points (2n+1)π should have been zero anyway. In fact one can show (ref. ) that the Fourier series converges at all other points to the given values of f, but I shan’t include the proof in this course. The convergence is not at all uniform (it can’t be, because the partial sums are continuous functions, but the limit is discontinuous). In particular, at x=π/2 we get the expansion
π/2 = 2(1 − 1/3 + 1/5 − ⋯)
which can also be deduced from the Taylor series for tan^{−1}.
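The behaviour of the sawtooth series can be explored numerically. The sketch below, with the illustrative name `sawtooth_partial_sum`, evaluates partial sums of ∑(−1)^{n+1}(2/n) sin nx; it only suggests convergence at sample points and is no substitute for the proof.

```python
import math


def sawtooth_partial_sum(x: float, terms: int) -> float:
    """Partial sum of the Fourier series of the 2π-periodic sawtooth f(x) = x on ]−π, π]."""
    return sum((-1) ** (n + 1) * (2.0 / n) * math.sin(n * x)
               for n in range(1, terms + 1))


if __name__ == "__main__":
    for x in (0.0, 1.0, math.pi / 2, math.pi):
        # At x = π every term vanishes, so the sum is (numerically) zero, not π.
        print(x, sawtooth_partial_sum(x, 2000))
```

Evaluating close to x=π with a fixed number of terms shows the slow, non-uniform convergence mentioned above.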
In this subsection we shall discuss the formal solutions of the wave equation in a special case which Fourier dealt with in his work.
We discuss the wave equation
∂^{2}y/∂x^{2} = (1/K^{2}) ∂^{2}y/∂t^{2} , (2)
subject to the boundary conditions
y(0, t) = y(π, t) = 0, (3) 
for all t≥0, and the initial conditions
y(x, 0) = F(x), y_{t}(x, 0) = 0.
This is a mathematical model of a string on a musical instrument (guitar, harp, violin) which is of length π and is plucked, i.e. held in the shape F(x) and released at time t=0. The constant K depends on the length, density and tension of the string. We shall derive the formal solution (that is, a solution which assumes existence and ignores questions of convergence or of domain of definition).
We first look (as Fourier and others before him did) for solutions of the form y(x,t) = f(x)g(t). Feeding this into the wave equation (2) we get
f^{′′}(x) g(t) = (1/K^{2}) f(x) g^{′′}(t)
and so, dividing by f(x)g(t), we have
f^{′′}(x)/f(x) = (1/K^{2}) g^{′′}(t)/g(t) . (4)
The left-hand side is an expression in x alone, the right-hand side in t alone. The conclusion must be that they are both identically equal to the same constant C, say.
We have f^{′′}(x) −Cf(x) =0 subject to the condition f(0) = f(π) =0. Working through the method of solving linear second order differential equations tells you that the only nonzero solutions occur when C = −n^{2} for some positive integer n and the corresponding solutions, up to constant multiples, are f(x) = sin nx.
Returning to equation (4) gives the equation g^{′′}(t)+K^{2}n^{2}g(t) =0 which has the general solution g(t) = a_{n} cos Knt + b_{n} sin Knt. Thus the solutions we get through separation of variables, using the boundary conditions but ignoring the initial conditions, are
y_{n}(x,t) = sin nx (a_{n} cos Knt + b_{n} sin Knt) ,
for n≥ 1.
To get the general solution we just add together all the solutions we have got so far, thus
y(x,t) = ∑_{n=1}^{∞} sin nx (a_{n} cos Knt + b_{n} sin Knt) (5)
ignoring questions of convergence. (We can do this for a finite sum without difficulty because we are dealing with a linear differential equation: the iffy bit is to extend to an infinite sum.)
We now apply the initial condition y(x,0) = F(x) (note F has F(0) =F(π) =0). This gives
F(x) = ∑_{n=1}^{∞} a_{n} sin nx .
We apply the reflection trick: the right-hand side is a series of odd functions so if we extend F to a function G by reflection in the origin, giving
G(x) := F(x) if 0 ≤ x ≤ π, and G(x) := −F(−x) if −π < x < 0,
we have
G(x) = ∑_{n=1}^{∞} a_{n} sin nx ,
for −π≤ x ≤ π.
If we multiply through by sin rx and integrate term by term, we get
a_{r} = (1/π) ∫_{−π}^{π} G(x) sin rx dx
so, assuming that this operation is valid, we find that the a_{n} are precisely the sine coefficients of G. (Those of you who took Real Analysis 2 last year may remember that a sufficient condition for integrating term by term is that the series which is integrated is itself uniformly convergent.)
If we now assume, further, that the righthand side of (5) is differentiable (term by term) we differentiate with respect to t, and set t=0, to get
0 = y_{t}(x,0) = ∑_{n=1}^{∞} b_{n} K n sin nx . (6)
This equation is solved by the choice b_{n}=0 for all n, so we have the following result
y(x,t) = ∑_{n=1}^{∞} a_{n} sin nx cos Knt ,
a_{n} = (1/π) ∫_{−π}^{π} G(x) sin nx dx .
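The formal solution can be evaluated numerically for a concrete initial shape. The following is a sketch under stated assumptions: the triangular shape `pluck` is a hypothetical choice of F, the series is truncated to finitely many terms, and the coefficients are computed by a midpoint rule on [0,π], using that the integrand G(x) sin nx is even, so (1/π)∫_{−π}^{π} G(x) sin nx dx = (2/π)∫_{0}^{π} F(x) sin nx dx.

```python
import math
from typing import Callable, List


def sine_coefficients(F: Callable[[float], float], count: int,
                      steps: int = 2000) -> List[float]:
    """a_n = (2/π) ∫_0^π F(x) sin(nx) dx for n = 1..count, by the midpoint rule."""
    h = math.pi / steps
    xs = [(j + 0.5) * h for j in range(steps)]
    return [(2.0 / math.pi) * h * sum(F(x) * math.sin(n * x) for x in xs)
            for n in range(1, count + 1)]


def string_shape(a: List[float], K: float, x: float, t: float) -> float:
    """Truncated formal solution y(x,t) = Σ a_n sin(nx) cos(Knt)."""
    return sum(a_n * math.sin(n * x) * math.cos(K * n * t)
               for n, a_n in enumerate(a, start=1))


def pluck(x: float) -> float:
    """Hypothetical initial shape: a string pulled up at its midpoint."""
    return min(x, math.pi - x)


if __name__ == "__main__":
    a = sine_coefficients(pluck, 50)
    print(string_shape(a, 1.0, 1.0, 0.0), pluck(1.0))  # initial condition
    print(string_shape(a, 1.0, 0.0, 0.3))              # fixed end of the string
```

Since every term is periodic in t with period 2π/K, so is the truncated solution, which matches the physical picture of a vibrating string.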
Joseph Fourier, Civil Servant, Egyptologist, and mathematician, was born in 1768 in Auxerre, France, son of a tailor. Debarred by birth from a career in the artillery, he was preparing to become a Benedictine monk (in order to be a teacher) when the French Revolution violently altered the course of history and Fourier’s life. He became president of the local revolutionary committee, was arrested during the Terror, but released at the fall of Robespierre.
Fourier then became a pupil at the Ecole Normale (the teachers’ academy) in Paris, studying under such great French mathematicians as Laplace and Lagrange. He became a teacher at the Ecole Polytechnique (the military academy).
He was ordered to serve as a scientist under Napoleon in Egypt. In 1801, Fourier returned to France to become Prefect of the Grenoble region. Among his most notable achievements in that office were the draining of some 20 thousand acres of swamps and the building of a new road across the Alps.
During that time he wrote an important survey of Egyptian history (“a masterpiece and a turning point in the subject”).
In 1804 Fourier started the study of the theory of heat conduction, in the course of which he systematically used the sine-and-cosine series which are named after him. At the end of 1807, he submitted a memoir on this work to the Academy of Sciences. The memoir proved controversial both in terms of his use of Fourier series and of his derivation of the heat equation and was not accepted at that stage. He was able to resubmit a revised version in 1811: this had several important new features, including the introduction of the Fourier transform. With this version of his memoir, he won the Academy’s prize in mathematics. In 1817, Fourier was finally elected to the Academy of Sciences and in 1822 his 1811 memoir was published as “Théorie de la Chaleur”.
For more details see Fourier Analysis by T.W. Körner, pp. 475–480, and for even more, see the biography by J. Herivel, Joseph Fourier: the man and the physicist.
What is Fourier analysis? The idea is to analyse functions (into sines and cosines or, equivalently, complex exponentials) to find the underlying frequencies, their strengths (and phases) and, where possible, to see if they can be recombined (synthesis) into the original function. The answers will depend on the original properties of the functions, which often come from physics (heat, electronic or sound waves). This course will give a basically mathematical treatment and so will be interested in mathematical classes of functions (continuity, differentiability properties).
A person is solely the concentration of an infinite set of interrelations with another and others, and to separate a person from these relations means to take away any real meaning of the life.
Vl. Soloviev
The space around us could be described as a three dimensional Euclidean space. To single out a point of that space we need a fixed frame of reference and three real numbers, which are the coordinates of the point. Similarly, to describe a pair of points from our space we could use six coordinates; for three points, nine; and so on. This makes it reasonable to consider Euclidean (linear) spaces of an arbitrary finite dimension, which are studied in the courses of linear algebra.
The basic properties of Euclidean spaces are determined by their linear and metric structures. The linear space (or vector space) structure allows us to add and subtract vectors associated to points as well as to multiply vectors by real or complex numbers (scalars).
The metric space structure assigns a distance (a nonnegative real number) to a pair of points or, equivalently, defines the length of the vector defined by that pair. A metric (or, more generally, a topology) is essential for the definition of the core analytical notions like limit or continuity. The importance of linear and metric (topological) structure in analysis is sometimes encoded in the formula:
Analysis = Algebra + Geometry . (7) 
On the other hand we could observe that many sets admit a sort of linear and metric structures which are linked to each other. Just a few among many other examples are:
It is a very mathematical way of thinking to declare such sets to be spaces and call their elements points.
But shall we lose all information on a particular element (e.g. a sequence {1/n}) if we represent it by a shapeless and sizeless “point” without any inner configuration? Surprisingly not: all properties of an element could now be retrieved not from its inner configuration but from interactions with other elements through linear and metric structures. Such a “sociological” approach to all kinds of mathematical objects was codified in abstract category theory.
Another surprise is that starting from our three dimensional Euclidean space and walking far away by a road of abstraction to infinite dimensional Hilbert spaces we arrive at yet another picture of the surrounding space, this time in the language of quantum mechanics.
The distance from Manchester to Liverpool is 35 miles—just about the mileage in the opposite direction!
A tourist guide to England
The following definition generalises the notion of distance known from everyday life.
The following notion is a useful specialisation of metric adapted to the linear structure.
The connection between norm and metric is as follows:
Proof.
It is a simple exercise to derive items 1–3 of Definition 1 from the corresponding items of Definition 3. For example, see Figure 1 to derive the triangle inequality.
Important notions known from real analysis are limit and convergence. In particular, we usually wish to have enough limiting points for all “reasonable” sequences.
For example, the set of integers ℤ and reals ℝ with the natural distance functions are complete spaces, but the set of rationals ℚ is not. The complete normed spaces deserve a special name.
‖(x_{1},…,x_{n})‖_{2} = √(|x_{1}|^{2} + ⋯ + |x_{n}|^{2}) .
‖(x_{1},…,x_{n})‖_{1} = |x_{1}| + ⋯ + |x_{n}| .
‖(x_{1},…,x_{n})‖_{∞} = max(|x_{1}|, …, |x_{n}|).
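For concreteness, the three norms above can be computed directly for a vector given as a sequence of real or complex numbers. This is a minimal sketch; the function names are illustrative.

```python
import math
from typing import Sequence, Union

Scalar = Union[int, float, complex]


def norm_1(x: Sequence[Scalar]) -> float:
    """Sum of the absolute values of the coordinates."""
    return sum(abs(c) for c in x)


def norm_2(x: Sequence[Scalar]) -> float:
    """Euclidean norm: square root of the sum of squared absolute values."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))


def norm_inf(x: Sequence[Scalar]) -> float:
    """Largest absolute value among the coordinates."""
    return max(abs(c) for c in x)


if __name__ == "__main__":
    x = (3, -4)
    print(norm_1(x), norm_2(x), norm_inf(x))
```

For every vector the three values are ordered as ‖x‖_{∞} ≤ ‖x‖_{2} ≤ ‖x‖_{1}, which is easy to confirm on examples.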
—We need an extra space to accommodate this product!
A manager to a shop assistant
Although metric and norm capture important geometric information about linear spaces, they are not sensitive enough to represent such geometric characterisations as angles (particularly orthogonality). To this end we need a further refinement.
It is known from courses of linear algebra that the scalar product ⟨ x,y ⟩= x_{1} y_{1} + ⋯ + x_{n} y_{n} is important in the space ℝ^{n} and defines a norm by ‖x‖^{2}=⟨ x,x ⟩. Here is a suitable generalisation:
The last two properties of the scalar product are often encoded in the phrase: “it is linear in the first variable if we fix the second and antilinear in the second if we fix the first”.
l_{2} = { sequences {x_{j}}_{1}^{∞} ∣ ∑_{j=1}^{∞} |x_{j}|^{2} < ∞ }. (8)
⟨ f,g ⟩ = ∫_{a}^{b} f(x)ḡ(x) dx and ‖f‖_{2} = (∫_{a}^{b} |f(x)|^{2} dx)^{1/2}. (9)
Now we state, probably, the most important inequality in analysis.
|⟨ x,y ⟩| ≤ ‖x‖ ‖y‖ , (10)
Proof. For any x, y∈ V and any t∈ℝ we have:
0 ≤ ⟨ x+t y,x+t y ⟩= ⟨ x,x ⟩+2t ℜ ⟨ y,x ⟩+t^{2}⟨ y,y ⟩,
Thus the discriminant of this quadratic expression in t is nonpositive: (ℜ ⟨ y,x ⟩)^{2}−‖x‖^{2}‖y‖^{2}≤ 0, that is |ℜ ⟨ x,y ⟩| ≤ ‖x‖‖y‖. Replacing y by e^{iα}y for an arbitrary α∈[−π,π] we get |ℜ (e^{−iα}⟨ x,y ⟩)| ≤ ‖x‖‖y‖; choosing α such that e^{−iα}⟨ x,y ⟩ = |⟨ x,y ⟩| gives the desired inequality.
Proof. Just check items 1–3 from Definition 3.
Again, complete inner product spaces deserve a special name.
The relations between spaces introduced so far are as follows:
Hilbert spaces  ⇒  Banach spaces  ⇒  Complete metric spaces 
⇓  ⇓  ⇓  
inner product spaces  ⇒  normed spaces  ⇒  metric spaces. 
How can we tell if a given norm comes from an inner product?
‖x+y‖^{2} + ‖x−y‖^{2} = 2‖x‖^{2} + 2‖y‖^{2}. (11)
Proof. Just by linearity of inner product:
⟨ x+y,x+y ⟩+⟨ x−y,x−y ⟩=2⟨ x,x ⟩+2⟨ y,y ⟩, 
because the cross terms cancel out.
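The parallelogram identity gives a practical test: a norm violating it for a single pair of vectors cannot come from an inner product. The sketch below (names are illustrative) measures the defect of the identity for the norms ‖·‖_{2} and ‖·‖_{1} on ℝ^{2}.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def norm_1(x: Vector) -> float:
    return sum(abs(c) for c in x)


def norm_2(x: Vector) -> float:
    return math.sqrt(sum(abs(c) ** 2 for c in x))


def parallelogram_defect(norm: Callable[[Vector], float],
                         x: Vector, y: Vector) -> float:
    """‖x+y‖² + ‖x−y‖² − 2‖x‖² − 2‖y‖²; zero for any inner product norm."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2


if __name__ == "__main__":
    x, y = (1.0, 0.0), (0.0, 1.0)
    print(parallelogram_defect(norm_2, x, y))  # zero up to rounding
    print(parallelogram_defect(norm_1, x, y))  # nonzero: no inner product behind ‖·‖_1
```

A zero defect on a few samples proves nothing, of course; only a nonzero defect is conclusive.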

Divide and rule!
Old but still much used recipe
To study Hilbert spaces we may use the traditional mathematical technique of analysis and synthesis: we split the initial Hilbert space into smaller and probably simpler subsets, investigate them separately, and then reconstruct the entire picture from these parts.
As is known from linear algebra, a linear subspace is a subset of a linear space which inherits the linear structure, i.e. the possibility to add vectors and multiply them by scalars. In this course we also need subspaces to inherit the topological structure (coming either from a norm or an inner product).
We also wish that both inherited structures (linear and topological) should be in agreement, i.e. the subspace should be complete. Such inheritance is linked to the property of being closed.
A subspace need not be closed. For example, the sequence
x=(1, 1/2, 1/3, 1/4, …) belongs to l_{2} because ∑1/k^{2} < ∞
and x_{n}=(1, 1/2,…, 1/n, 0, 0,…)∈ c_{00} converges to x, thus x∈ \overline{c_{00}} ⊂ l_{2}, although x∉ c_{00}.
Proof.
‖(x_{n}+y_{n})−(x+y)‖ ≤ ‖x_{n}−x‖ + ‖y_{n}−y‖ → 0,
Hence c_{00} is an incomplete inner product space, with inner product ⟨ x,y ⟩=∑_{1}^{∞}x_{k} ȳ_{k} (this is a finite sum!) as it is not closed in l_{2}.
Similarly C[0,1] with the inner product norm ‖f‖=(∫_{0}^{1} |f(t)|^{2} dt)^{1/2} is incomplete. Take the larger space X of functions continuous on [0,1] except for a possible jump at 1/2 (i.e. left and right limits exist but may be unequal, and f(1/2)=lim_{t→1/2+} f(t)). Then the sequence of functions defined on Figure 4(a) has the limit shown on Figure 4(b) since:
⎪⎪ ⎪⎪  f−f_{n}  ⎪⎪ ⎪⎪  = 
 ⎪ ⎪  f−f_{n}  ⎪ ⎪  ^{2} dt < 
 → 0. 
Obviously f∈\overline{C[0,1]}∖C[0,1].
Similarly the space C[a,b] is incomplete for any a<b if equipped with the inner product and the corresponding norm:

It is practical to realise L_{2}[a,b] as a certain space of “functions” with the inner product defined via an integral. There are several ways to do that and we mention just two:
f(t)=  ⎧ ⎨ ⎩ 

⟨ f_{1},f_{2} ⟩=  ∫ 
 f_{1}(z) f_{2}(z)e 
 dz. 
Proof. Take a Cauchy sequence x^{(n)}∈l_{2}, where x^{(n)}=(x_{1}^{(n)}, x_{2}^{(n)}, x_{3}^{(n)}, … ). Our proof will have three steps: identify the limit x; show it is in l_{2}; show x^{(n)}→ x.
|x_{k}^{(n)}−x_{k}^{(m)}| ≤ (∑_{k=1}^{∞} |x_{k}^{(n)}−x_{k}^{(m)}|^{2})^{1/2} = ‖x^{(n)}−x^{(m)}‖ → 0.
∑_{k=1}^{N} |x_{k}^{(n)}−x_{k}^{(m)}|^{2} ≤ ‖x^{(n)}−x^{(m)}‖^{2} < є^{2}.
Consequently l_{2} is complete.
All good things are covered by a thick layer of chocolate (well, if something is not yet–it certainly will)
As was explained in the introduction (Section 2), we describe “internal” properties of a vector through its relations to other vectors. For a detailed description we need sufficiently many external reference points.
Let A be a subset (finite or infinite) of a normed space V. We may wish to upgrade it to a linear subspace in order to make it subject to our theory.
Proof. Clearly \overline{Lin(A)} is a closed subspace containing A, thus it should contain CLin(A). Also Lin(A)⊂ CLin(A), thus \overline{Lin(A)}⊂ \overline{CLin(A)}=CLin(A). Therefore \overline{Lin(A)}= CLin(A).
Consequently CLin(A) is the set of all limiting points of finite linear combinations of elements of A.
The following simple result will be used later many times without comments.
lim_{n→∞} ⟨ x_{n},y_{n} ⟩ = ⟨ lim_{n→∞} x_{n}, lim_{n→∞} y_{n} ⟩.
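Proof. Let x and y denote the limits of x_{n} and y_{n}. Obviously by the Cauchy–Schwarz inequality:
|⟨ x_{n},y_{n} ⟩−⟨ x,y ⟩| = |⟨ x_{n}−x,y_{n} ⟩+⟨ x,y_{n}−y ⟩| ≤ ‖x_{n}−x‖ ‖y_{n}‖ + ‖x‖ ‖y_{n}−y‖ → 0,
since ‖x_{n}−x‖→ 0, ‖y_{n}−y‖→ 0, and ‖y_{n}‖ is bounded.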
Pythagoras is forever!
The catchphrase from TV commercial of Hilbert Spaces course
As was mentioned in the introduction, Hilbert spaces are an analog of our 3D Euclidean space, and the theory of Hilbert spaces is similar to plane or space geometry. One of the primary results of Euclidean geometry, which still survives in the high school curriculum despite its continuous nasty de-geometrisation, is Pythagoras’ theorem, based on the notion of orthogonality^{1}.
So far we were concerned only with distances between points. Now we would like to study angles between vectors and notably right angles. Pythagoras’ theorem states that if the angle C in a triangle is right then c^{2}=a^{2}+b^{2}, see Figure 5.
It is a very mathematical way of thinking to turn this property of right angles into their definition, which will work even in infinite dimensional Hilbert spaces.
Look for a triangle, or even for a right triangle
A universal advice in solving problems from elementary geometry.
In inner product spaces it is even more convenient to give a definition of orthogonality not from Pythagoras’ theorem but from an equivalent property of inner product.
An orthogonal sequence (or orthogonal system) e_{n} (finite or infinite) is one in which e_{n} ⊥ e_{m} whenever n≠ m.
An orthonormal sequence (or orthonormal system) e_{n} is an orthogonal sequence with ‖e_{n}‖=1 for all n.
⟨ e_{n},e_{m} ⟩ = (1/2π) ∫_{−π}^{π} e^{int}e^{−imt} dt = 1 if n=m, and 0 if n≠m. (15)
‖∑_{k=1}^{n} a_{k} e_{k}‖^{2} = ⟨ ∑_{k=1}^{n} a_{k} e_{k}, ∑_{k=1}^{n} a_{k} e_{k} ⟩ = ∑_{k=1}^{n} |a_{k}|^{2}.
Proof. A one-line calculation.
The following theorem provides an important property of Hilbert spaces
which will be used many times. Recall, that a subset K of a linear
space V is convex if for all x,
y∈ K and λ∈ [0,1] the point λ x
+(1−λ)y is also in K. Particularly any subspace is convex
and any unit ball as well (see Exercise 1).
Proof. Let d=inf_{y∈ K} d(x,y), where d(x,y) is the distance coming from the norm ‖x‖=√⟨ x,x ⟩, and let y_{n} be a sequence of points in K such that lim_{n→ ∞}d(x,y_{n})=d. Then y_{n} is a Cauchy sequence. Indeed, from the parallelogram identity for the parallelogram generated by vectors x−y_{n} and x−y_{m} we have:
‖y_{n}−y_{m}‖^{2} = 2‖x−y_{n}‖^{2} + 2‖x−y_{m}‖^{2} − ‖2x−y_{n}−y_{m}‖^{2}.
Note that ‖2x−y_{n}−y_{m}‖^{2}=4‖x−(y_{n}+y_{m})/2‖^{2}≥ 4d^{2} since (y_{n}+y_{m})/2∈ K by its convexity. For sufficiently large m and n we get ‖x−y_{m}‖^{2}≤ d^{2}+є and ‖x−y_{n}‖^{2}≤ d^{2}+є, thus ‖y_{n}−y_{m}‖^{2}≤ 4(d^{2}+є)−4d^{2}=4є, i.e. y_{n} is a Cauchy sequence.
Let y be the limit of y_{n}, which exists by the completeness of H, then y∈ K since K is closed. Then d(x,y)=lim_{n→ ∞}d(x,y_{n})=d. This shows the existence of the nearest point. Let y′ be another point in K such that d(x,y′)=d, then the parallelogram identity implies:
‖y−y′‖^{2} = 2‖x−y‖^{2} + 2‖x−y′‖^{2} − ‖2x−y−y′‖^{2} ≤ 4d^{2}−4d^{2}=0.
This shows the uniqueness of the nearest point.
Liberte, Egalite, Fraternite!
A longstanding ideal approximated in the real life by something completely different
For the case when the convex subset is a subspace we could characterise the nearest point in terms of orthogonality.
Proof. Let z be the nearest point to x, which exists by the previous Theorem. We claim that x−z is orthogonal to any vector in M; otherwise there exists y∈ M such that ⟨ x−z,y ⟩≠ 0 (replacing y by iy if necessary, we may assume ℜ⟨ x−z,y ⟩≠ 0). Then
‖x−z−єy‖^{2} = ‖x−z‖^{2} − 2є ℜ⟨ x−z,y ⟩ + є^{2}‖y‖^{2} < ‖x−z‖^{2},
if є is chosen to be small enough and such that є ℜ⟨ x−z,y ⟩ is positive, see Figure 6(i). Since z+єy∈ M, we get a contradiction with the statement that z is the closest point to x.
On the other hand, if x−z is orthogonal to all vectors in M then in particular (x−z)⊥ (z−y) for all y∈ M, see Figure 6(ii). Since x−y=(x−z)+(z−y) we get by Pythagoras’ theorem:
‖x−y‖^{2} = ‖x−z‖^{2} + ‖z−y‖^{2}.
So ‖x−y‖^{2}≥ ‖x−z‖^{2} and they are equal if and only if z=y.
Consider now a basic case of approximation: let x∈ H be fixed and e_{1}, …, e_{n} be orthonormal and denote H_{1}=Lin{e_{1},…,e_{n}}. We could try to approximate x by a vector y=λ_{1} e_{1}+⋯ +λ_{n} e_{n} ∈ H_{1}.
Proof. Let z=∑_{1}^{n}⟨ x,e_{i} ⟩ e_{i}, then ⟨ x−z,e_{i} ⟩=⟨ x,e_{i} ⟩−⟨ z,e_{i} ⟩=0. By the previous Theorem z is the nearest point to x.
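The formula z=∑⟨ x,e_{i} ⟩e_{i} is easy to try out in ℂ^{n}. The sketch below assumes vectors are given as sequences of numbers and that the supplied family is orthonormal; `inner` and `project` are illustrative names.

```python
from typing import List, Sequence

Vector = Sequence[complex]


def inner(x: Vector, y: Vector) -> complex:
    """Inner product, linear in the first argument and antilinear in the second."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def project(x: Vector, basis: Sequence[Vector]) -> List[complex]:
    """Nearest point to x in the span of an orthonormal family: Σ ⟨x,e_i⟩ e_i."""
    z: List[complex] = [0j] * len(x)
    for e in basis:
        c = inner(x, e)  # Fourier coefficient of x with respect to e
        z = [zi + c * ei for zi, ei in zip(z, e)]
    return z


if __name__ == "__main__":
    e1, e2 = (1, 0, 0), (0, 1, 0)
    x = (1, 2, 3)
    z = project(x, [e1, e2])
    residual = [a - b for a, b in zip(x, z)]
    print(z, residual)
    print(inner(residual, e1), inner(residual, e2))  # both vanish
```

The residual x−z is orthogonal to every e_{i}, which is exactly the characterisation of the nearest point used in the proof.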
z=⟨ x,e_{1} ⟩e_{1}+⟨ x,e_{2} ⟩e_{2}=  ⎛ ⎜ ⎜ ⎝ 
 ,− 
 ,0  ⎞ ⎟ ⎟ ⎠  +  ⎛ ⎜ ⎜ ⎝ 
 , 
 ,− 
 ⎞ ⎟ ⎟ ⎠  =  ⎛ ⎜ ⎜ ⎝ 
 ,− 
 ,− 
 ⎞ ⎟ ⎟ ⎠  . 
e_{0} = 1/√(2π), e_{1} = (1/√(2π)) e^{it}, e_{−1} = (1/√(2π)) e^{−it}.


‖x‖^{2} ≥ ∑_{i=1}^{n} |⟨ x,e_{i} ⟩|^{2}.
Proof. Let z= ∑_{1}^{n}⟨ x,e_{i} ⟩e_{i}, then x−z⊥ e_{i} for all i, therefore by Exercise 4 x−z⊥ z. Hence:
‖x‖^{2} = ‖z‖^{2} + ‖x−z‖^{2} ≥ ‖z‖^{2} = ∑_{i=1}^{n} |⟨ x,e_{i} ⟩|^{2}.
—Did you say “rice and fish for them”?
A student question
When (e_{i}) is orthonormal we call ⟨ x,e_{n} ⟩ the nth Fourier coefficient of x (with respect to (e_{i}), naturally).
Proof. Necessity: Let x_{k}=∑_{1}^{k} λ_{n} e_{n} and x=lim_{k→ ∞} x_{k}. So ⟨ x,e_{n} ⟩=lim_{k→ ∞}⟨ x_{k},e_{n} ⟩=λ_{n} for all n. By Bessel’s inequality, for all k
‖x‖^{2} ≥ ∑_{n=1}^{k} |⟨ x,e_{n} ⟩|^{2} = ∑_{n=1}^{k} |λ_{n}|^{2},
hence ∑_{1}^{∞} |λ_{n}|^{2} converges and the sum is at most ‖x‖^{2}.
Sufficiency: Consider ‖x_{k}−x_{m}‖=‖∑_{m+1}^{k} λ_{n} e_{n}‖=(∑_{m+1}^{k} |λ_{n}|^{2})^{1/2} for k>m. Since ∑_{1}^{∞} |λ_{n}|^{2} converges, x_{k} is a Cauchy sequence in H and thus has a limit x. By Pythagoras’ theorem ‖x_{k}‖^{2}=∑_{1}^{k} |λ_{n}|^{2}, thus for k→ ∞ we get ‖x‖^{2}=∑_{1}^{∞} |λ_{n}|^{2} by the Lemma about inner product limit.
Observation: the closed linear span
of an orthonormal sequence in any Hilbert space looks like
l_{2}, i.e. l_{2} is a universal model for a
Hilbert space.
By Bessel’s inequality and the Riesz–Fischer theorem we know that the series ∑_{1}^{∞}⟨ x,e_{i} ⟩ e_{i} converges for any x∈ H. What is its limit?
Let y=x− ∑_{1}^{∞}⟨ x,e_{i} ⟩ e_{i}, then
⟨ y,e_{k} ⟩ = ⟨ x,e_{k} ⟩ − ∑_{i=1}^{∞} ⟨ x,e_{i} ⟩ ⟨ e_{i},e_{k} ⟩ = ⟨ x,e_{k} ⟩−⟨ x,e_{k} ⟩ = 0 for all k. (16)
A complete orthonormal sequence is also called orthonormal basis in H.
x = ∑_{n=1}^{∞} ⟨ x,e_{n} ⟩e_{n} and ‖x‖^{2} = ∑_{n=1}^{∞} |⟨ x,e_{n} ⟩|^{2}.
Proof. By the Riesz–Fischer theorem, equation (16) and the definition of orthonormal basis.
There are constructive existence theorems in mathematics.
An example of pure existence statement
Natural questions are: Do orthonormal sequences always exist? Could we construct them?
Lin{x_{1},x_{2},…,x_{n}}=Lin{e_{1},e_{2},…,e_{n}}, for all n. 
Proof. We give an explicit algorithm working by induction. The base of induction: the first vector is e_{1}=x_{1}/‖x_{1}‖. The step of induction: let e_{1}, e_{2}, …, e_{n} be already constructed as required. Let y_{n+1}=x_{n+1}−∑_{i=1}^{n}⟨ x_{n+1},e_{i} ⟩e_{i}. Then by (16) y_{n+1} ⊥ e_{i} for i=1,…,n. We may put e_{n+1}=y_{n+1}/‖y_{n+1}‖ because y_{n+1}≠ 0 due to linear independence of the x_{k}’s. Also
Lin{e_{1},…,e_{n+1}} = Lin{e_{1},…,e_{n},y_{n+1}} = Lin{e_{1},…,e_{n},x_{n+1}} = Lin{x_{1},…,x_{n},x_{n+1}}.
So (e_{i}) is an orthonormal sequence.
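The inductive construction in the proof translates almost literally into code. The following is a sketch for vectors in ℂ^{n} given as sequences of numbers; the tolerance `tol` used to detect linear dependence is an assumption of this sketch, not part of the theorem.

```python
import math
from typing import List, Sequence

Vector = Sequence[complex]


def inner(x: Vector, y: Vector) -> complex:
    """Inner product, linear in the first argument and antilinear in the second."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def gram_schmidt(vectors: Sequence[Vector], tol: float = 1e-12) -> List[List[complex]]:
    """Orthonormalise linearly independent vectors x_1, x_2, ... in the given order."""
    basis: List[List[complex]] = []
    for x in vectors:
        # y_{n+1} = x_{n+1} − Σ ⟨x_{n+1}, e_i⟩ e_i
        y = [complex(c) for c in x]
        for e in basis:
            c = inner(x, e)
            y = [yi - c * ei for yi, ei in zip(y, e)]
        norm = math.sqrt(inner(y, y).real)
        if norm < tol:
            raise ValueError("vectors are (numerically) linearly dependent")
        basis.append([yi / norm for yi in y])  # e_{n+1} = y_{n+1}/‖y_{n+1}‖
    return basis


if __name__ == "__main__":
    for e in gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)]):
        print(e)
```

In floating point arithmetic this classical form may lose orthogonality for nearly dependent inputs; subtracting projections of the running vector y instead of x is a common, more stable variant.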

⟨ f,g ⟩ = ∫_{−1}^{1} f(t) ḡ(t) dt. (17)
⟨ f,g ⟩ = ∫_{−1}^{1} f(t) ḡ(t) dt/√(1−t^{2}). (18)
⟨ f,g ⟩ = ∫_{0}^{∞} f(t) ḡ(t) e^{−t} dt.
See Figure 8 for the five first Legendre and Chebyshev polynomials. Observe the difference caused by the different inner products (17) and (18). On the other hand note the similarity in oscillating behaviour with different “frequencies”.
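Orthogonal polynomials can be generated by running Gram–Schmidt on the monomials 1, t, t^{2}, …. The sketch below does this exactly, in rational arithmetic, for the inner product ∫_{−1}^{1} f(t)g(t) dt on real polynomials (the unweighted case associated with Legendre polynomials). It skips the final normalisation and returns monic polynomials, which are proportional to the Legendre polynomials; the representation of a polynomial as a coefficient list is an assumption of the sketch.

```python
from fractions import Fraction
from typing import List

Poly = List[Fraction]  # coefficients, lowest degree first


def inner(p: Poly, q: Poly) -> Fraction:
    """∫_{−1}^{1} p(t) q(t) dt for real polynomials, computed exactly."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers of t integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total


def monic_orthogonal(count: int) -> List[Poly]:
    """Gram–Schmidt on 1, t, t², ... without the normalisation step."""
    result: List[Poly] = []
    for n in range(count):
        p: Poly = [Fraction(0)] * n + [Fraction(1)]  # the monomial t^n
        for q in result:
            c = inner(p, q) / inner(q, q)
            p = [a - c * (q[i] if i < len(q) else 0) for i, a in enumerate(p)]
        result.append(p)
    return result


if __name__ == "__main__":
    for p in monic_orthogonal(4):
        print([str(c) for c in p])
```

The third and fourth outputs are t^{2}−1/3 and t^{3}−(3/5)t, which are multiples of the Legendre polynomials (3t^{2}−1)/2 and (5t^{3}−3t)/2.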
Another natural question is: When is an orthonormal sequence complete?
Proof. Clearly 1 implies 2 because x=∑_{1}^{∞}⟨ x,e_{n} ⟩e_{n} is in CLin((e_{n})) and ‖x‖^{2}=∑_{1}^{∞}|⟨ x,e_{n} ⟩|^{2} by Theorem 15.
If (e_{n}) is not complete then there exists x∈ H such that x≠ 0 and ⟨ x,e_{k} ⟩=0 for all k, so 3 fails, consequently 3 implies 1.
Finally if ⟨ x,e_{k} ⟩=0 for all k then ⟨ x,y ⟩=0 for all y∈Lin((e_{n})) and moreover for all y∈CLin((e_{n})), by the Lemma on continuity of the inner product. But then x∉CLin((e_{n})) and 2 also fails because ⟨ x,x ⟩=0 is not possible. Thus 2 implies 1.
x = ∑_{n=1}^{∞} ⟨ x,e_{n} ⟩e_{n} and ‖x‖^{2} = ∑_{n=1}^{∞} |⟨ x,e_{n} ⟩|^{2}.
Proof. Take a countable dense set (x_{k}), then H=CLin((x_{k})). Delete all vectors which are linear combinations of preceding vectors, orthonormalise the remaining set by Gram–Schmidt, and apply the previous proposition.
Most pleasant compliments are usually orthogonal to our real qualities.
An advise based on observations
M^{⊥}={x∈ V: ⟨ x,m ⟩=0 ∀ m∈ M}. 
Proof. Clearly M^{⊥} is a subspace of H because x, y∈ M^{⊥} implies ax+by∈ M^{⊥}:
⟨ ax+by,m ⟩= a⟨ x,m ⟩+ b⟨ y,m ⟩=0. 
Also if all x_{n}∈ M^{⊥} and x_{n}→ x then x∈ M^{⊥} due to the inner product limit Lemma.
Proof. For a given x there exists a unique closest point m in M by the Theorem on nearest point, and by the Theorem on perpendicular (x−m)⊥ y for all y∈ M.
So x= m + (x−m)= m+n with m∈ M and n∈ M^{⊥}. The identity ‖x‖^{2}=‖m‖^{2}+‖n‖^{2} is just Pythagoras’ theorem and M∩ M^{⊥}={0} because the null vector is the only vector orthogonal to itself.
Finally (M^{⊥})^{⊥}=M. We have H=M⊕ M^{⊥}=(M^{⊥})^{⊥}⊕ M^{⊥}, for any x∈(M^{⊥})^{⊥} there is a decomposition x=m+n with m∈ M and n∈ M^{⊥}, but then n is orthogonal to itself and therefore is zero.
P_{M}^{2}=P_{M}, kerP_{M}=M^{⊥}, P_{M⊥}=I−P_{M}. (19) 
Proof. Let us define P_{M}(x)=m where x=m+n is the decomposition from the previous theorem. The linearity of this operator follows from the fact that both M and M^{⊥} are linear subspaces. Also P_{M}(m)=m for all m∈ M and the image of P_{M} is M. Thus P_{M}^{2}=P_{M}. Also if P_{M}(x)=0 then x⊥ M, i.e. kerP_{M}=M^{⊥}. Similarly P_{M⊥}(x)=n where x=m+n and P_{M}+P_{M⊥}=I.
 a_{k} e_{k} = 
 a_{k} e_{k} + 
 a_{k} e_{k}. 
All bases are equal, but some are more equal than others.
As we saw already, any separable Hilbert space possesses an orthonormal basis (infinitely many of them indeed). Are they equally good? This depends on our purposes. For the solution of differential equations which arise in mathematical physics (wave, heat, Laplace equations, etc.) there is a preferred choice. The fundamental formula d/dx e^{ax}=ae^{ax} reduces the derivative to a multiplication by a. We could benefit from this observation if the orthonormal basis is constructed out of exponentials. This helps to solve differential equations as was demonstrated in Subsection 1.2.
7.40pm Fourier series: Episode II
Today’s TV listing
Now we wish to address questions stated in Remark 9. Let us consider the space L_{2}[−π,π]. As we saw in Example 3 there is an orthonormal sequence e_{n}(t)=(2π)^{−1/2}e^{int} in L_{2}[−π,π]. We will show that it is an orthonormal basis, i.e.
f(t)∈ L_{2}[−π,π] ⇔ f(t) = ∑_{k=−∞}^{∞} ⟨ f,e_{k} ⟩e_{k}(t),
with convergence in L_{2} norm. To do this we show that CLin{e_{k}:k∈ℤ}=L_{2}[−π,π].
Let CP[−π,π] denote the continuous functions f on [−π,π] such that f(π)=f(−π). We also define f outside of the interval [−π,π] by periodicity.
Proof. Let f∈L_{2}[−π,π]. Given є>0 there exists g∈ C[−π,π] such that ‖f−g‖<є/2. From the continuity of g on a compact set it follows that there is M such that |g(t)|<M for all t∈[−π,π].
We can now replace g by a periodic g′, which coincides with g on [−π,π−δ] for an arbitrary δ>0 and has the same bound: |g′(t)|<M, see Figure 9. Then
‖g−g′‖_{2}^{2}= ∫_{π−δ}^{π} |g(t)−g′(t)|^{2} dt ≤ (2M)^{2}δ.
So if δ<є^{2}/(4M)^{2} then ‖g−g′‖<є/2 and ‖f−g′‖<є.
Now if we could show that CLin{e_{k}: k ∈ ℤ} includes CP[−π,π] then it also includes L_{2}[−π,π].
f_{n}= ∑_{k=−n}^{n} ⟨ f,e_{k} ⟩ e_{k}, for n=0,1,2,… (20)
We want to show that ‖f−f_{n}‖_{2}→ 0. To this end we define the nth Fejér sum by the formula
F_{n}= (f_{0}+f_{1}+⋯+f_{n})/(n+1), (21)
and show that
‖F_{n}−f‖_{∞} → 0.
Then we conclude
‖F_{n}−f‖_{2}= ( ∫_{−π}^{π} |F_{n}(t)−f(t)|^{2} dt )^{1/2} ≤ (2π)^{1/2} ‖F_{n}−f‖_{∞}→ 0.
Since F_{n}∈Lin((e_{n})) then f∈CLin((e_{n})) and hence f=∑_{−∞}^{∞}⟨ f,e_{k} ⟩e_{k}.
It took 19 years of his life to prove this theorem

Proof. From notation (20):

Then from (21):

which finishes the proof.
K_{n}(t)= (1/(n+1)) · sin^{2}((n+1)t/2) / sin^{2}(t/2), for t∉2πℤ. (24)
Proof. Let z=e^{it}, then:

by switch from counting in rows to counting in columns in Table 1.
Let w=e^{it/2}, i.e. z=w^{2}, then
if w≠ ± 1. For the value of K_{n}(0) we substitute w=1 into (25).
The first eleven Fejér kernels are shown on Figure 10, we could observe that:
Proof. The first property immediately follows from the explicit formula (24). In contrast the second property is easier to deduce from expression with double sum (23):

since the formula (15).
Finally if |t|>δ then sin^{2}(t/2)≥ sin^{2}(δ/2)>0 by the monotonicity of sine on [0,π/2], so:
0≤ K_{n}(t) ≤ 1/((n+1) sin^{2}(δ/2)),
implying:
0≤ ∫_{δ≤|t|≤π} K_{n}(t) dt ≤ 2(π−δ)/((n+1) sin^{2}(δ/2)) → 0 as n→ ∞.
Therefore the third property follows from the squeeze rule.
Proof. Idea of the proof: if in the formula (22)
F_{n}(x)= (1/2π) ∫_{−π}^{π} f(t) K_{n}(x−t) dt,
t is a long way from x, K_{n} is small (see Lemma 7 and Figure 10); for t near x, K_{n} is big with total “weight” 2π, so the weighted average of f(t) is near f(x).
Here are the details. Using property 2 and the periodicity of f and K_{n} we can trivially express
f(x)= f(x) (1/2π) ∫_{x−π}^{x+π} K_{n}(x−t) dt = (1/2π) ∫_{x−π}^{x+π} f(x) K_{n}(x−t) dt.
Similarly we rewrite (22) as
F_{n}(x)= (1/2π) ∫_{x−π}^{x+π} f(t) K_{n}(x−t) dt,
then

Given є>0 split into three intervals: I_{1}=[x−π,x−δ], I_{2}=[x−δ,x+δ], I_{3}=[x+δ,x+π], where δ is chosen such that |f(t)−f(x)|<є/2 for t∈ I_{2}, which is possible by continuity of f. So
(1/2π) ∫_{I_{2}} |f(x)−f(t)| K_{n}(x−t) dt ≤ (є/2) (1/2π) ∫_{I_{2}} K_{n}(x−t) dt < є/2.
And

if n is sufficiently large due to property 3 of K_{n}. Hence |f(x)−F_{n}(x)|<є for a large n independent of x.
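The uniform convergence of the Fejér sums can be observed numerically. The sketch below is an illustration only, under several assumptions that are not part of the notes: the test function is f(t)=|t| (continuous with f(π)=f(−π)), the Fourier coefficients are approximated by Riemann sums, and the Fejér sum is evaluated in the equivalent form F_{n}(x)=∑_{|k|≤n}(1−|k|/(n+1)) c_{k}e^{ikx}, which follows from averaging the partial sums f_{0},…,f_{n}. All function names are hypothetical.

```python
import cmath
import math

def fourier_coefficient(f, k, samples=1024):
    """c_k = (1/2π) ∫_{-π}^{π} f(t) e^{-ikt} dt, approximated by a Riemann sum."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        t = -math.pi + j * h
        total += f(t) * cmath.exp(-1j * k * t)
    return total / samples

def fejer_sum(coeffs, n, x):
    """F_n(x) = sum over |k| <= n of (1 - |k|/(n+1)) c_k e^{ikx}."""
    value = 0j
    for k in range(-n, n + 1):
        value += (1 - abs(k) / (n + 1)) * coeffs[k] * cmath.exp(1j * k * x)
    return value.real

def sup_error(f, n, points=201):
    """Maximum of |F_n(x) - f(x)| over a uniform grid on [-π, π]."""
    coeffs = {k: fourier_coefficient(f, k) for k in range(-n, n + 1)}
    xs = [-math.pi + 2 * math.pi * i / (points - 1) for i in range(points)]
    return max(abs(fejer_sum(coeffs, n, x) - f(x)) for x in xs)

f = abs  # f(t) = |t|
errors = {n: sup_error(f, n) for n in (5, 20, 50)}
```

The dictionary `errors` holds grid approximations of ‖F_{n}−f‖_{∞}; they should decrease as n grows, in agreement with the theorem.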
We have almost finished the demonstration that e_{n}(t)=(2π)^{−1/2}e^{int} is an orthonormal basis of L_{2}[−π,π]:
∑_{n=−∞}^{∞} ⟨ f,e_{n} ⟩e_{n}= ∑_{n=−∞}^{∞} c_{n}e^{int} where c_{n}= ⟨ f,e_{n} ⟩/√(2π) = (1/2π) ∫_{−π}^{π} f(t)e^{−int} dt.
lim_{k→∞} ‖ f− ∑_{n=−k}^{k} c_{n}e^{int} ‖_{2}=0.
Proof. This follows from the previous Theorem, Lemma 1 about density of CP in L_{2}, and Theorem 15 on orthonormal basis.
The following result first appeared in the framework of L_{2}[−π,π] and only later was understood to be a general property of inner product spaces.
⟨ f,g ⟩= ∫_{−π}^{π} f(t) \overline{g(t)} dt=2π ∑_{n=−∞}^{∞} c_{n} \overline{d_{n}}. (26)
More generally if f and g are two vectors of a Hilbert space H with an orthonormal basis (e_{n})_{−∞}^{∞} then
⟨ f,g ⟩= ∑_{n=−∞}^{∞} c_{n} \overline{d_{n}}, where c_{n}=⟨ f,e_{n} ⟩, d_{n}=⟨ g,e_{n} ⟩,
are the Fourier coefficients of f and g.
Proof. In fact we need only prove the second, more general, statement; the first one is its particular realisation. Let f_{n}=∑_{k=−n}^{n} c_{k}e_{k} and g_{n}=∑_{k=−n}^{n} d_{k}e_{k} be the partial sums of the corresponding Fourier series. Then from the orthonormality of (e_{n}) and the linearity of the inner product:
⟨ f_{n},g_{n} ⟩=⟨ ∑_{k=−n}^{n} c_{k}e_{k}, ∑_{k=−n}^{n} d_{k}e_{k} ⟩= ∑_{k=−n}^{n} c_{k} \overline{d_{k}}.
This formula together with the facts that f_{n}→ f and g_{n}→ g (following from Corollary 9) and the Lemma about continuity of the inner product implies the assertion.
Proof. The necessity, i.e. the implication f∈L_{2} ⇒ ⟨ f,f ⟩=‖f‖^{2}=2π∑ |c_{k}|^{2}, follows from the previous Theorem. The sufficiency follows from the Riesz–Fischer Theorem.
[Wf](x)=⟨ f,e_{x} ⟩ (27) 
Heat and noise but not a fire?
Answer:
We now provide a few examples which demonstrate the importance of Fourier series in many questions. The first two (Example 14 and Theorem 15) belong to pure mathematics and the last two are of a more applied nature.
⟨ f,e_{n} ⟩= (2π)^{−1/2} ∫_{−π}^{π} te^{−int} dt= { (−1)^{n} √(2π) i/n, n≠ 0; 0, n=0 } (check!),
‖f‖_{2}^{2}= ∫_{−π}^{π} t^{2} dt= 2π^{3}/3.
‖f‖_{2}^{2}=2π ∑_{n≠0} |(−1)^{n} i/n|^{2}=4π ∑_{n=1}^{∞} 1/n^{2}.
∑_{n=1}^{∞} 1/n^{2} = π^{2}/6.
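The example with f(t)=t can be checked numerically. The sketch below assumes the standard values that the computation leads to, namely ‖f‖_{2}^{2}=2π^{3}/3 and the Parseval sum 4π∑_{n≥1} 1/n^{2}; the function name `parseval_gap` is hypothetical.

```python
import math

def parseval_gap(terms):
    """Difference between ‖f‖² = 2π³/3 and the partial Parseval sum 4π Σ_{n<=terms} 1/n²."""
    lhs = 2 * math.pi ** 3 / 3
    rhs = 4 * math.pi * sum(1.0 / n ** 2 for n in range(1, terms + 1))
    return lhs - rhs

gaps = [parseval_gap(N) for N in (10, 100, 1000)]
```

The gaps are positive and shrink roughly like 4π/N, which is consistent with ∑_{n≥1} 1/n^{2}=π^{2}/6.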
Here is another important result.
Proof. Change variable: t=2π(x−(a+b)/2)/(b−a); this maps x∈[a,b] onto t∈[−π,π]. Let P denote the subspace of polynomials in C[−π,π] and P̄ its closure in the supremum norm. Then e^{int}∈ P̄ for any n∈ℤ since the Taylor series converges uniformly on [−π,π]. Consequently P̄ contains the closed linear span (in the supremum norm) of e^{int}, n∈ℤ, which is CP[−π,π] by the Fejér theorem. Thus P̄ ⊇ CP[−π,π] and we extend that to non-periodic functions as follows (why could we not make use of Lemma 1 here, by the way?).
For any f∈C[−π,π] let λ=(f(π)−f(−π))/(2π) then f_{1}(t)=f(t)−λ t∈ CP[−π,π] and could be approximated by a polynomial p_{1}(t) from the above discussion. Then f(t) is approximated by the polynomial p(t)=p_{1}(t)+λ t.
It is easy to see that the rôle of the exponentials e^{int} in the above proof is rather modest: they can be replaced by any functions which have a Taylor expansion. The real glory of Fourier analysis is demonstrated in the two following examples.
Consider a rod of length 2π. The temperature at its point x∈[−π,π] at a moment t∈[0,∞) is described by a function u(t,x) on [0,∞)×[−π,π]. The equation describing the dynamics of the temperature distribution is:
∂u(t,x)/∂t = ∂^{2}u(t,x)/∂x^{2} or, equivalently, (∂_{t}−∂_{x}^{2}) u(t,x)=0. (28)
For any fixed moment t_{0} the function u(t_{0},x) depends only on x∈[−π,π] and according to Corollary 9 can be represented by its Fourier series:
u(t_{0},x)= ∑_{n=−∞}^{∞} ⟨ u,e_{n} ⟩e_{n}= ∑_{n=−∞}^{∞} c_{n}(t_{0})e^{inx},
where
c_{n}(t_{0})= ⟨ u,e_{n} ⟩/√(2π) = (1/2π) ∫_{−π}^{π} u(t_{0},x)e^{−inx} dx,
with Fourier coefficients c_{n}(t_{0}) depending on t_{0}. We substitute that decomposition into the heat equation (28) to obtain:
(∂_{t}−∂_{x}^{2}) u(t,x)= ∑_{n=−∞}^{∞} (c′_{n}(t)+n^{2}c_{n}(t))e^{inx}=0. (29)
Since the functions e^{inx} form a basis, the last equation (29) holds if and only if
c′_{n}(t)+n^{2}c_{n}(t)=0 for all n and t. (30) 
Equations from the system (30) have general solutions of the form:
c_{n}(t)=c_{n}(0)e^{−n^{2}t} for all t∈[0,∞), (31)
producing a general solution of the heat equation (28) in the form:
u(t,x)= ∑_{n=−∞}^{∞} c_{n}(0)e^{−n^{2}t}e^{inx} = ∑_{n=−∞}^{∞} c_{n}(0)e^{−n^{2}t+inx}, (32)
where the constants c_{n}(0) can be determined from a boundary condition at t=0. For example, if it is known that the initial distribution of temperature was u(0,x)=g(x) for a function g(x)∈L_{2}[−π,π] then c_{n}(0) is the nth Fourier coefficient of g(x).
The general solution (32) helps produce both the analytical study of the heat equation (28) and numerical simulation. For example, from (32) obviously follows that
An example of numerical simulation for the initial value problem with g(x)=2cos(2x) + 1.5sin(x). It clearly illustrates our conclusions above.
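A simulation of this kind can be sketched directly from the general solution (32). The code below is not the program used for the figure; it is a minimal illustration which assumes the initial value g(x)=2cos(2x)+1.5sin(x), approximates c_{n}(0) by Riemann sums, and truncates the series at |n|≤8. All names are hypothetical.

```python
import cmath
import math

def fourier_coefficient(g, n, samples=256):
    """c_n(0) = (1/2π) ∫ g(x) e^{-inx} dx via a Riemann sum on a uniform grid."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        x = -math.pi + j * h
        total += g(x) * cmath.exp(-1j * n * x)
    return total / samples

def heat_solution(g, t, x, modes=8):
    """Truncation of (32): sum over |n| <= modes of c_n(0) e^{-n² t} e^{inx}."""
    total = 0j
    for n in range(-modes, modes + 1):
        total += fourier_coefficient(g, n) * math.exp(-n * n * t) * cmath.exp(1j * n * x)
    return total.real

def g(x):
    return 2 * math.cos(2 * x) + 1.5 * math.sin(x)

def exact(t, x):
    # each harmonic e^{inx} of g is damped by the factor e^{-n² t}
    return 2 * math.exp(-4 * t) * math.cos(2 * x) + 1.5 * math.exp(-t) * math.sin(x)
```

For this trigonometric initial value the truncated series agrees with the closed form `exact(t, x)` up to rounding, and the higher harmonic cos(2x) decays visibly faster than sin(x).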
The earliest observations are that
A musical tone, say G5, performed on different instruments clearly has something in common and something different, see Figure 13 for comparisons. The decomposition into pure harmonics, i.e. finding the Fourier coefficients of the signal, can provide a complete characterisation, see Figure 14.
The Fourier analysis tells that:
Fourier analysis is very useful in signal processing and is indeed a fundamental tool. However it is not universal and has very serious limitations. Consider the simple case of the signals plotted in Figure 15(a) and (b). They are both made out of the same two pure harmonics:
These appear to be two very different signals. However the Fourier transforms performed over the whole interval do not seem to be very different, see Figure 15(c). Both transforms (drawn in blue-green and pink) have two major peaks corresponding to the pure frequencies. It is not very easy to extract differences between the signals from their Fourier transforms (yet this should be possible according to our study).
An even better picture can be obtained if we use the windowed Fourier transform, namely use a sliding “window” of constant width instead of the entire interval for the Fourier transform. A still better analysis can be obtained by means of wavelets, already mentioned in Remark 12 in connection with Plancherel’s formula. Roughly, wavelets correspond to a sliding window of variable size: narrow for high frequencies and wide for low.
Everything has another side
An orthonormal basis allows us to reduce any question on a Hilbert space to a question on sequences of numbers. This is a powerful but sometimes heavy technique. Sometimes we need a smaller and faster tool to study questions which are represented by a single number; for example, to demonstrate that two vectors are different it is enough to show that they have unequal values of a single coordinate. In such cases linear functionals are just what we need.
–Is it functional?
–Yes, it works!
α(ax+by)=aα(x)+bα(y), for all x,y∈ V and a,b∈ℂ. 
We will not consider any functionals but linear, thus below functional always means linear functional.
Proof. Implication 1 ⇒ 2 is trivial.
Show 2 ⇒ 3. By the definition of continuity: for any є>0 there exists δ>0 such that ‖v‖<δ implies |α(v)−α(0)|<є. Take є=1 then |α(δ x)|<1 for all x with norm less than 1 because ‖δ x‖< δ. But from the linearity of α the inequality |α(δ x)|<1 implies |α(x)|<1/δ for all ‖x‖< 1, hence, by scaling, |α(x)|≤1/δ<∞ for all ‖x‖≤ 1.
3 ⇒ 1. Let the mentioned supremum be M. For any x, y∈ V such that x≠ y the vector (x−y)/‖x−y‖ has norm 1. Thus |α((x−y)/‖x−y‖)|≤ M. By the linearity of α this implies that |α(x)−α(y)|≤ M‖x−y‖. Thus α is continuous.
‖α‖= sup_{‖x‖≤1} |α(x)|. (33)
Proof. Due to Exercise 6 we only need to show that X^{*} is complete. Let (α_{n}) be a Cauchy sequence in X^{*}, then for any x∈ X the scalars α_{n}(x) form a Cauchy sequence, since |α_{m}(x)−α_{n}(x)|≤‖α_{m}−α_{n}‖·‖x‖. Thus the sequence has a limit and we define α by α(x)=lim_{n→∞}α_{n}(x). Clearly α is a linear functional on X. We should show that it is bounded and α_{n}→ α. Given є>0 there exists N such that ‖α_{n}−α_{m}‖<є for all n, m≥ N. If ‖x‖≤ 1 then |α_{n}(x)−α_{m}(x)|≤ є, let m→∞ then |α_{n}(x)−α(x)|≤ є, so
|α(x)|≤ |α_{n}(x)|+є≤ ‖α_{n}‖+ є,
i.e. ‖α‖ is finite and ‖α_{n}−α‖≤ є, thus α_{n}→α.
Study one and get any other for free!
Hilbert spaces sale
Proof. Uniqueness: if ⟨ x,y ⟩=⟨ x,y′ ⟩ ⇔ ⟨ x,y−y′ ⟩=0 for all x∈ H then y−y′ is selforthogonal and thus is zero (Exercise 1).
Existence: we may assume that α≢0 (otherwise take y=0), then M=kerα is a closed proper subspace of H. Since H=M⊕ M^{⊥}, there exists a nonzero z∈ M^{⊥}, by scaling we could get α(z)=1. Then for any x∈ H:
x=(x−α(x)z)+α(x)z, with x−α(x)z∈ M, α(x)z∈ M^{⊥}. 
Because ⟨ x,z ⟩=α(x)⟨ z,z ⟩=α(x)‖z‖^{2} for any x∈ H we set y=z/‖z‖^{2}.
Equality of the norms ‖α‖_{H*}=‖y‖_{H} follows from the Cauchy–Bunyakovskii–Schwarz inequality in the form |α(x)|≤ ‖x‖·‖y‖ and the identity α(y/‖y‖)=‖y‖.
‖α‖= ‖t^{2}‖= ( ∫_{0}^{1} (t^{2})^{2} dt )^{1/2} = 1/√5.
All the space’s a stage,
and all functionals and operators
merely players!
All our previous considerations were only a preparation of the stage and now the main actors come forward to perform a play. Vector spaces are not so interesting while we consider them in statics; what really makes them exciting is their transformations. The natural first step is to consider transformations which respect both the linear structure and the norm.
kerT ={x∈ X: Tx=0} Im T={y∈ Y: y=Tx, for some x∈ X}. 
As usual we are interested also in connections with the second (topological) structure:
⎪⎪ ⎪⎪  T  ⎪⎪ ⎪⎪  =sup{  ⎪⎪ ⎪⎪  Tx  ⎪⎪ ⎪⎪  _{Y}:  ⎪⎪ ⎪⎪  x  ⎪⎪ ⎪⎪  _{X}≤ 1}. (34) 
T is a bounded linear operator if ‖T‖=sup{‖Tx‖: ‖x‖≤1}<∞.

Proof. Proof essentially follows the proof of similar Theorem 4.
Proof. The proof repeats the proof of Theorem 7, which is a particular case of the present theorem for Y=ℂ, see Example 3.
Proof. Clearly (ST)x=S(Tx)∈ Z, and
⎪⎪ ⎪⎪  STx  ⎪⎪ ⎪⎪  ≤  ⎪⎪ ⎪⎪  S  ⎪⎪ ⎪⎪  ⎪⎪ ⎪⎪  Tx  ⎪⎪ ⎪⎪  ≤  ⎪⎪ ⎪⎪  S  ⎪⎪ ⎪⎪  ⎪⎪ ⎪⎪  T  ⎪⎪ ⎪⎪  ⎪⎪ ⎪⎪  x  ⎪⎪ ⎪⎪  , 
which implies the norm estimate if ‖x‖≤1.
Proof. It is induction by n with the trivial base n=1 and the step following from the previous theorem.
ST= I_{X} and TS=I_{Y}. 
⟨ Th,k ⟩_{K}=⟨ h,T^{*}k ⟩_{H} for all h∈ H, k∈ K. 
Proof. For any fixed k∈ K the expression h↦ ⟨ Th,k ⟩_{K} defines a bounded linear functional on H. By the Riesz–Fréchet lemma there is a unique y∈ H such that ⟨ Th,k ⟩_{K}=⟨ h,y ⟩_{H} for all h∈ H. Define T^{*} k =y then T^{*} is linear:

So T^{*}(λ_{1}k_{1}+λ_{2}k_{2})=λ_{1}T^{*}k_{1}+λ_{2}T^{*}k_{2}. T^{**} is defined by ⟨ k,T^{**}h ⟩=⟨ T^{*}k,h ⟩ and the identity ⟨ T^{**}h,k ⟩=⟨ h,T^{*}k ⟩=⟨ Th,k ⟩ for all h and k shows T^{**}=T. Also:

which implies ‖T^{*}k‖≤‖T‖·‖k‖, consequently ‖T^{*}‖≤‖T‖. The opposite inequality follows from the identity ‖T‖=‖T^{**}‖.

D(x_{1},x_{2},…)=(λ_{1} x_{1}, λ_{2} x_{2}, …). 
D^{*} (x_{1},x_{2},…)=(λ̄_{1} x_{1}, λ̄_{2} x_{2}, …), 
‖T‖= sup_{‖x‖=1} |⟨ Tx,x ⟩|.
Proof. If Tx=0 for all x∈ H, both sides of the identity are 0. So we suppose that ∃ x∈ H for which Tx≠ 0.
We see that |⟨ Tx,x ⟩|≤ ‖Tx‖‖x‖ ≤ ‖T‖‖x‖^{2}, so sup_{‖x‖=1} |⟨ Tx,x ⟩|≤ ‖T‖. To get the inequality the other way around, we first write s:=sup_{‖x‖=1} |⟨ Tx,x ⟩|. Then for any x∈ H, we have |⟨ Tx,x ⟩|≤ s‖x‖^{2}.
We now consider
⟨ T(x+y),x+y ⟩ =⟨ Tx,x ⟩ +⟨ Tx,y ⟩+⟨ Ty,x ⟩ +⟨ Ty,y ⟩ = ⟨ Tx,x ⟩ +2ℜ ⟨ Tx,y ⟩ +⟨ Ty,y ⟩ 
(because T being Hermitian gives ⟨ Ty,x ⟩=⟨ y,Tx ⟩ =\overline{⟨ Tx,y ⟩}) and, similarly,
⟨ T(x−y),x−y ⟩ = ⟨ Tx,x ⟩ −2ℜ ⟨ Tx,y ⟩ +⟨ Ty,y ⟩. 
Subtracting gives

by the parallelogram identity.
Now, for x∈ H such that Tx≠ 0, we put y=‖Tx‖^{−1}‖x‖ Tx. Then ‖y‖=‖x‖ and when we substitute into the previous inequality, we get
4‖Tx‖‖x‖=4ℜ⟨ Tx,y ⟩ ≤ 4s‖x‖^{2},
So ‖Tx‖≤ s‖x‖ and it follows that ‖T‖≤ s, as required.
Proof. 1⇒2. Clearly unitarity of operator implies its invertibility and hence surjectivity. Also
⎪⎪ ⎪⎪  Ux  ⎪⎪ ⎪⎪  ^{2}=⟨ Ux,Ux ⟩=⟨ x,U^{*}Ux ⟩=⟨ x,x ⟩=  ⎪⎪ ⎪⎪  x  ⎪⎪ ⎪⎪  ^{2}. 
2⇒3. Using the polarisation identity (cf. polarisation in equation (12)):

Take T=U^{*}U and T=I, then

3⇒1. Indeed ⟨ U^{*}U x,y ⟩=⟨ x,y ⟩ implies ⟨ (U^{*}U−I)x,y ⟩=0 for all x,y∈ H, then U^{*}U=I. Since U should be invertible by surjectivity we see that U^{*}=U^{−1}.
Beware of ghosts^{2} in this area!
As we saw, operators can be added and multiplied by each other; in some sense they behave like numbers, but are much more complicated. In this lecture we will associate to each operator a set of complex numbers which reflects certain (unfortunately not all) properties of this operator.
The analogy between operators and numbers becomes even deeper since we can construct functions of operators (called functional calculus) in the way we build numeric functions. The most important function of this sort is called the resolvent (see Definition 5). The methods of analytic functions are very powerful in operator theory and students may wish to refresh their knowledge of complex analysis before this part.
An eigenvalue of operator T∈B(H) is a complex number λ such that there exists a nonzero x∈ H, called eigenvector with property Tx=λ x, in other words x∈ker(T−λ I).
In finite dimensions T−λ I is invertible if and only if λ is not an eigenvalue. In infinite dimensions it is not the same: the right shift operator S is not invertible but 0 is not its eigenvalue because Sx=0 implies x=0 (check!).
ρ (T)={λ∈ℂ: T−λ I is invertible}. 
σ(T)={λ∈ℂ: T−λ I is not invertible}. 
Even this example demonstrates that the spectrum does not provide a complete description of an operator even in the finite-dimensional case. For example, both operators in ℂ^{2} given by the matrices (0 0; 0 0) and (0 0; 1 0), written row by row, have the single-point spectrum {0}, however they are rather different. The situation becomes even worse in infinite-dimensional spaces.
For the proof we will need several Lemmas.
(I−A)^{−1}=I+A+A^{2}+A^{3}+…= ∑_{k=0}^{∞} A^{k}. (35)
Proof. Define the sequence of operators B_{n}=I+A+⋯+A^{n}, the partial sums of the infinite series (35). It is a Cauchy sequence, indeed:

for a large m. By the completeness of B(H) there is a limit, say B, of the sequence B_{n}. It is simple algebra to check that (I−A)B_{n}=B_{n}(I−A)=I−A^{n+1}; passing to the limit in the norm topology, where A^{n+1}→ 0 and B_{n}→ B, we get:
(I−A)B=B(I−A)=I ⇔ B=(I−A)^{−1}. 
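The convergence of the series (35) is easy to observe for matrices. The following sketch assumes an illustrative 2×2 matrix A with small entries (so that its norm is less than 1) and uses hypothetical helper names; it forms the partial sum B_{n} and multiplies it by I−A.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_partial_sum(A, terms):
    """B_n = I + A + ... + A^terms."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(terms):
        power = matmul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[0.2, 0.1], [0.3, 0.4]]            # illustrative matrix with norm < 1
I_minus_A = [[0.8, -0.1], [-0.3, 0.6]]
B = neumann_partial_sum(A, 60)
product = matmul(I_minus_A, B)          # equals I − A^61, close to the identity
```

Since (I−A)B_{n}=I−A^{n+1}, the matrix `product` differs from the identity only by the tiny term A^{61}.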
R(λ,T)=(T−λ I)^{−1}. (36) 
Proof.
R(λ,T)= (T−λ I)^{−1}=− ∑_{k=0}^{∞} λ^{−k−1}T^{k}. (37)

R(λ,T)−R(µ,T)=(λ−µ)R(λ,T)R(µ,T) (38) 
Proof. Let us assume the opposite, σ(T)=∅; then the resolvent function R(λ,T) is well defined for all λ∈ℂ. As can be seen from the von Neumann series (37), R(λ,T)→ 0 as λ→ ∞. Thus for any vectors x, y∈ H the function f(λ)=⟨ R(λ,T)x,y ⟩ is an analytic (see Exercise 3) function tending to zero at infinity. Then by the Liouville theorem from complex analysis R(λ,T)=0, which is impossible. Thus the spectrum is not empty.
Proof.[Proof of Theorem 3] Spectrum is nonempty by Lemma 8 and compact by Corollary 6.
The following definition is of interest.
r(T)=sup{  ⎪ ⎪  λ  ⎪ ⎪  : λ∈ σ(T)}. 
From Lemma 1 it immediately follows that r(T)≤‖T‖. A more accurate estimate is given by the following theorem.
We start from the following general lemma:
Proof. The statement follows from the observation that for any n and m=nk+l with 0≤ l< n we have a_{m}≤ ka_{n}+la_{1}; thus, for big m we get a_{m}/m≤ a_{n}/n +la_{1}/m ≤ a_{n}/n+є.
Proof.[Proof of Theorem 11] The existence of the limit lim_{n→∞}‖T^{n}‖^{1/n} in (39) follows from the previous Lemma since by Lemma 9 log‖T^{n+m}‖≤ log‖T^{n}‖+log‖T^{m}‖. Now we use some results from complex analysis. The Laurent series for the resolvent R(λ,T) in the neighbourhood of infinity is given by the von Neumann series (37). The radius of its convergence (which is equal, obviously, to r(T)) by the Hadamard theorem is exactly lim_{n→∞}‖T^{n}‖^{1/n}.
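The limit in the spectral radius formula can be watched numerically for a small matrix. The sketch below rests on two assumptions that should be kept in mind: the matrix T is an illustrative upper triangular example with eigenvalues 0.5 and 0.25, and the Frobenius norm is used in place of the operator norm (in finite dimensions all norms are equivalent, so the limit of ‖T^{n}‖^{1/n} is the same). The helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def frobenius(A):
    """Frobenius norm, used here as a stand-in for the operator norm."""
    return sum(x * x for row in A for x in row) ** 0.5

def gelfand_estimate(T, n):
    """‖T^n‖^{1/n} computed with the Frobenius norm."""
    P = T
    for _ in range(n - 1):
        P = matmul(P, T)
    return frobenius(P) ** (1.0 / n)

T = [[0.5, 1.0], [0.0, 0.25]]   # eigenvalues 0.5 and 0.25, so r(T) = 0.5
estimates = [gelfand_estimate(T, n) for n in (1, 10, 100)]
```

The first estimate is just the norm of T, which is larger than 1, while the later ones approach r(T)=0.5 from above.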
Proof. Indeed, as is known from complex analysis, the boundary of the convergence circle of a Laurent (or Taylor) series contains a singular point; a singular point of the resolvent obviously belongs to the spectrum.
Proof. If (T−λ I)V=V(T−λ I)=I then by taking adjoints V^{*}(T^{*}−λ̄ I)=(T^{*}−λ̄ I)V^{*}=I. So λ ∈ ρ(T) implies λ̄∈ρ(T^{*}); using the property T^{**}=T we can invert the implication and get the statement of the proposition.
Proof.
U=(T−iI)(T+iI)^{−1}. 
U−µ I=(T−iI)(T+iI)^{−1}−(λ−i)(λ+i)^{−1}I= 2i(λ+i)^{−1}(T−λ I)(T+iI)^{−1}, 
The above reduction of a selfadjoint operator to a unitary one (it can be done in the opposite direction as well!) is an important tool which can be applied in other questions as well, e.g. in the following exercise.
It is not easy to study linear operators “in general” and there are many questions about operators in Hilbert spaces raised many decades ago which are still unanswered. Therefore it is reasonable to single out classes of operators which have (relatively) simple properties. Such a class of operators, closer to finite dimensional ones, will be studied here.
These operators are so compact that we even can fit them in our course
Let us recall some topological definitions and results.
In the finite dimensional vector spaces ℝ^{n} or ℂ^{n} there is the following equivalent definition of compactness (equivalence of 1 and 2 is known as Heine–Borel theorem):
The set of finite rank operators is denoted by F(X,Y) and the set of compact operators by K(X,Y).
We intend to show that F(X,Y)⊂K(X,Y).
Proof. The proof is given by an explicit construction. Let N=dimZ and z_{1}, z_{2}, …, z_{N} be a basis in Z. Let us define
S: l_{2}^{N} → Z by S(a_{1},a_{2},…,a_{N})= ∑_{k=1}^{N} a_{k} z_{k},
then we have an estimation of norm:

So ‖S‖≤ (∑_{1}^{N} ‖z_{k}‖^{2})^{1/2} and S is continuous.
Clearly S has a trivial kernel, in particular ‖Sa‖>0 if ‖a‖=1. By the Heine–Borel theorem the unit sphere in l_{2}^{N} is compact, consequently the continuous function a↦ ‖∑_{1}^{N} a_{k} z_{k}‖ attains its lower bound, which has to be positive. This means there exists δ>0 such that ‖a‖=1 implies ‖Sa‖>δ, or, equivalently, if ‖z‖<δ then ‖S^{−1} z‖<1. The latter means that ‖S^{−1}‖≤ δ^{−1}, i.e. S^{−1} is bounded.
Proof. Let T∈F(X,Y), if (x_{n})_{1}^{∞} is a bounded sequence in X then (Tx_{n})_{1}^{∞}⊂ Z=Im T is also bounded. Let S: l_{2}^{N}→ Z be a map constructed in the above Lemma. The sequence (S^{−1}T x_{n})_{1}^{∞} is bounded in l_{2}^{N} and thus has a limiting point, say a_{0}. Then Sa_{0} is a limiting point of (T x_{n})_{1}^{∞}.
There is a simple condition which allows us to determine which diagonal operators are compact (in particular the identity operator I_{X} is not compact if dimX =∞):
Proof. If λ_{n}↛0 then there exists a subsequence λ_{n_{k}} and δ>0 such that |λ_{n_{k}}|>δ for all k. Now the sequence (e_{n_{k}}) is bounded but its image T e_{n_{k}}=λ_{n_{k}} e_{n_{k}} has no convergent subsequence because for any k≠ l:
‖λ_{n_{k}}e_{n_{k}}−λ_{n_{l}}e_{n_{l}}‖= (|λ_{n_{k}}|^{2} + |λ_{n_{l}}|^{2})^{1/2}≥ √2 δ,
i.e. T e_{n_{k}} is not a Cauchy sequence, see Figure 16.
For the converse, note that if λ_{n}→ 0 then we can define a finite rank operator T_{m}, m≥ 1, the m-“truncation” of T, by:
T_{m} e_{n} = { T e_{n}=λ_{n}e_{n}, 1≤ n≤ m; 0, n>m. } (40)
Then obviously
(T−T_{m}) e_{n} = { 0, 1≤ n≤ m; λ_{n}e_{n}, n>m, }
and ‖T−T_{m}‖=sup_{n>m} |λ_{n}| → 0 if m→ ∞. All T_{m} are finite rank operators (so are compact) and T is also compact as their limit, by the next Theorem.
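The estimate for the truncations can be illustrated with a concrete diagonal. The sketch below assumes the illustrative choice λ_{n}=1/n and replaces the supremum over all n>m by a maximum over a finite range (the `horizon` parameter, a hypothetical name), which is exact for a decreasing sequence.

```python
def truncation_error(lam, m, horizon=10_000):
    """‖T − T_m‖ = sup over n > m of |λ_n|, evaluated for m < n <= horizon."""
    return max(abs(lam(n)) for n in range(m + 1, horizon + 1))

def lam(n):
    return 1.0 / n   # an illustrative sequence with λ_n → 0

errors = [truncation_error(lam, m) for m in (1, 10, 100)]
```

For this sequence the errors are 1/2, 1/11 and 1/101, so the finite rank truncations approach T in norm; for λ_{n}=1 (the identity operator) the same quantity would stay equal to 1 for every m.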
Proof.
Take a bounded sequence (x_{n})_{1}^{∞}. From compactness
of T_{1}  ⇒ ∃  subsequence (x_{n}^{(1)})_{1}^{∞} of (x_{n})_{1}^{∞}  s.t.  (T_{1}x_{n}^{(1)})_{1}^{∞} is convergent. 
of T_{2}  ⇒ ∃  subsequence (x_{n}^{(2)})_{1}^{∞} of (x_{n}^{(1)})_{1}^{∞}  s.t.  (T_{2}x_{n}^{(2)})_{1}^{∞} is convergent. 
of T_{3}  ⇒ ∃  subsequence (x_{n}^{(3)})_{1}^{∞} of (x_{n}^{(2)})_{1}^{∞}  s.t.  (T_{3}x_{n}^{(3)})_{1}^{∞} is convergent. 
…  …  …  …  … 
Could we find a subsequence which converges for all T_{m}
simultaneously? The first guess “take the intersection of all
above sequences (x_{n}^{(k)})_{1}^{∞}” does not work because the
intersection could be empty. The way out is provided by the
diagonal argument (see Table 2):
a subsequence (T_{m} x_{k}^{(k)})_{1}^{∞} is convergent for
all m, because at latest after the term x_{m}^{(m)} it is a
subsequence of (x_{k}^{(m)})_{1}^{∞}.
T_{1}x_{1}^{(1)} T_{1}x_{2}^{(1)} T_{1}x_{3}^{(1)} … T_{1}x_{n}^{(1)} … → a_{1}
T_{2}x_{1}^{(2)} T_{2}x_{2}^{(2)} T_{2}x_{3}^{(2)} … T_{2}x_{n}^{(2)} … → a_{2}
T_{3}x_{1}^{(3)} T_{3}x_{2}^{(3)} T_{3}x_{3}^{(3)} … T_{3}x_{n}^{(3)} … → a_{3}
… … … … … …
T_{n}x_{1}^{(n)} T_{n}x_{2}^{(n)} T_{n}x_{3}^{(n)} … T_{n}x_{n}^{(n)} … → a_{n}
… … … … … … ↓
↘ a
We claim that the subsequence (T x_{k}^{(k)})_{1}^{∞} of (T x_{n})_{1}^{∞} is convergent as well. We use here the є/3 argument (see Figure 17): for a given є>0 choose p∈ℕ such that ‖T−T_{p}‖<є/3.
Because (T_{p} x_{k}^{(k)}) is convergent, it is a Cauchy sequence, thus there exists n_{0}>p such that ‖T_{p} x_{k}^{(k)}−T_{p} x_{l}^{(l)}‖< є/3 for all k, l>n_{0}. Then:

Thus T is compact.
A relation to compact operator is as follows.
Proof. Let T∈ B(H,K) have a convergent series ∑ ‖T e_{n}‖^{2} in an orthonormal basis (e_{n})_{1}^{∞} of H. We again (see (40)) define the m-truncation of T by the formula
T_{m} e_{n} = { T e_{n}, 1≤ n≤ m; 0, n>m. } (41)
Then T_{m}(∑_{1}^{∞}a_{k} e_{k})=∑_{1}^{m} a_{k} Te_{k} and each T_{m} is a finite rank operator because its image is spanned by the finite set of vectors Te_{1}, …, Te_{m}. We claim that ‖T−T_{m}‖→ 0. Indeed by linearity and the definition of T_{m}:

Thus:

so ‖T−T_{m}‖→ 0 and by the previous Theorem T is compact as a limit of compact operators.
‖T‖≤ ( ∑_{n=1}^{∞} ‖Te_{n}‖^{2} )^{1/2}.
Proof. Just consider difference of T and T_{0}=0 in (42)–(43).
(T f)(x)= ∫_{0}^{1} K(x,y)f(y) dy, f(y)∈L_{2}[0,1], (44)
Proof. Let (e_{n})_{−∞}^{∞} be an orthonormal basis of L_{2}[0,1], e.g. (e^{2π i nt})_{n∈ℤ}. Let us consider the kernel K_{x}(y)=K(x,y) as a function of the argument y depending on the parameter x. Then:
(T e_{n})(x)= ∫_{0}^{1} K(x,y)e_{n}(y) dy= ∫_{0}^{1} K_{x}(y)e_{n}(y) dy= ⟨ K_{x},ē_{n} ⟩.
So ‖T e_{n}‖^{2}= ∫_{0}^{1} |⟨ K_{x},ē_{n} ⟩|^{2} dx. Consequently:

(Tf)(x)= ∫_{0}^{1} (x−y)f(y) dy =x ∫_{0}^{1} f(y) dy − ∫_{0}^{1} yf(y) dy

Tf= 
 ⟨ f,e_{1} ⟩e_{1}− 
 ⟨ f,e_{2} ⟩e_{2}, 
Recall from Section 6.4 that an operator T is normal if TT^{*}=T^{*}T; Hermitian (T^{*}=T) and unitary (T^{*}=T^{−1}) operators are normal.
Proof.

λ⟨ x,y ⟩=⟨ Tx,y ⟩ =⟨ x,T^{*}y ⟩=µ⟨ x,y ⟩ 
⎪⎪ ⎪⎪  Sx  ⎪⎪ ⎪⎪  ^{2}=⟨ Sx,Sx ⟩=⟨ S^{2}x,x ⟩≤  ⎪⎪ ⎪⎪  S^{2}  ⎪⎪ ⎪⎪  ⎪⎪ ⎪⎪  x  ⎪⎪ ⎪⎪  ^{2} 
Now we claim ‖S‖=‖T‖^{2}. From Theorem 9 and 15 we get ‖S‖=‖T^{*}T‖≤ ‖T‖^{2}. On the other hand if ‖x‖=1 then
‖T^{*}T‖≥ |⟨ T^{*}Tx,x ⟩|=⟨ Tx,Tx ⟩= ‖Tx‖^{2}
implies the opposite inequality ‖S‖≥‖T‖^{2}. And because (T^{2^{m}})^{*}T^{2^{m}}=(T^{*}T)^{2^{m}} we get the equality
‖T^{2^{m}}‖^{2}= ‖(T^{*}T)^{2^{m}}‖= ‖T^{*}T‖^{2^{m}} = ‖T‖^{2^{m+1}}.
Thus:
r(T)= lim_{m→∞} ‖T^{2^{m}}‖^{1/2^{m}}= lim_{m→∞} ‖T‖^{2^{m+1}/2^{m+1}} = ‖T‖.
by the spectral radius formula (39).
0  1 
0  0 
Proof.
Proof.[Solution] Or straightforwardly assume the opposite: there exist a δ>0 and infinitely many eigenvalues λ_{n} such that |λ_{n}|>δ. By the previous Theorem there is an orthonormal sequence v_{n} of corresponding eigenvectors T v_{n}=λ_{n} v_{n}. Now the sequence (v_{n}) is bounded but its image T v_{n}=λ_{n} v_{n} has no convergent subsequence because for any k≠ l:
‖λ_{k}v_{k}−λ_{l}v_{l}‖= (|λ_{k}|^{2} + |λ_{l}|^{2})^{1/2}≥ √2 δ,
i.e. T v_{n} is not a Cauchy sequence, see Figure 16.
Proof. Assume without loss of generality that T≠ 0. Let λ∈σ(T), without loss of generality (multiplying by a scalar) λ=1.
We claim that if 1 is not an eigenvalue then there exist δ>0 such that
⎪⎪ ⎪⎪  (I−T)x  ⎪⎪ ⎪⎪  ≥ δ  ⎪⎪ ⎪⎪  x  ⎪⎪ ⎪⎪  . (46) 
Otherwise there exists a sequence of vectors (x_{n}) with unit norm such that (I−T)x_{n}→ 0. Then from the compactness of T for a subsequence (x_{n_{k}}) there is y∈ H such that Tx_{n_{k}} → y, then x_{n_{k}}→ y, implying Ty=y and y≠ 0, i.e. y is an eigenvector with eigenvalue 1.
Now we claim Im (I−T) is closed, i.e. y∈\overline{Im(I−T)} implies y∈Im(I−T). Indeed, if (I−T)x_{n} → y, then there is a subsequence (x_{n_{k}}) such that Tx_{n_{k}}→ z implying x_{n_{k}}→ y+z, then (I−T)(z+y)=y.
Finally I−T is injective, i.e. ker(I−T)={0}, by (46). By the property 1, ker(I−T^{*})={0} as well. But because always ker(I−T^{*})=Im(I−T)^{⊥} (check!) we get surjectivity, i.e. Im(I−T)^{⊥}={0}, of I−T. Thus (I−T)^{−1} exists and is bounded because (46) implies ‖y‖≥δ ‖(I−T)^{−1}y‖. Thus 1∉σ(T).
The existence of an eigenvalue λ such that |λ|=‖T‖ follows from the combination of Lemma 13 and Theorem 3.
Tx= ∑_{n=1}^{∞} λ_{n} ⟨ x,e_{n} ⟩ e_{n}, for all x∈ H. (47)
Conversely, if T is given by a formula (47) then it is compact and normal.
Proof. Suppose T≠ 0. Then by the previous Theorem there exists an eigenvalue λ_{1} such that |λ_{1}|=‖T‖ with a corresponding eigenvector e_{1} of unit norm. Let H_{1}=Lin(e_{1})^{⊥}. If x∈ H_{1} then
⟨ Tx,e_{1} ⟩=⟨ x,T^{*}e_{1} ⟩=⟨ x,λ̄_{1} e_{1} ⟩=λ_{1}⟨ x,e_{1} ⟩=0, (48)
thus Tx∈ H_{1} and similarly T^{*} x ∈ H_{1}. Write T_{1}=T|_{H_{1}} which is again a normal compact operator with a norm not exceeding ‖T‖. We can inductively repeat this procedure for T_{1}, obtaining a sequence of eigenvalues λ_{2}, λ_{3}, … with eigenvectors e_{2}, e_{3}, …. If T_{n}=0 for a finite n then the theorem is already proved. Otherwise we have an infinite sequence λ_{n}→ 0. Let
x= ∑_{k=1}^{n} ⟨ x,e_{k} ⟩e_{k} +y_{n} ⇒ ‖x‖^{2}= ∑_{k=1}^{n} |⟨ x,e_{k} ⟩|^{2} + ‖y_{n}‖^{2}, y_{n}∈ H_{n},
from Pythagoras’s theorem. Then ‖y_{n}‖≤ ‖x‖ and ‖T y_{n}‖≤ ‖T_{n}‖‖y_{n}‖≤ |λ_{n}|‖x‖→ 0 by Lemma 3. Thus
T x = lim_{n→∞} ( ∑_{k=1}^{n} ⟨ x,e_{k} ⟩ Te_{k} + Ty_{n} ) = ∑_{n=1}^{∞} λ_{n}⟨ x,e_{n} ⟩ e_{n}
Conversely, if T x = ∑_{1}^{∞}λ_{n}⟨ x,e_{n} ⟩ e_{n} then
⟨ Tx,y ⟩= ∑_{n=1}^{∞} λ_{n}⟨ x,e_{n} ⟩ ⟨ e_{n},y ⟩ = ∑_{n=1}^{∞} ⟨ x,e_{n} ⟩ \overline{λ̄_{n}⟨ y,e_{n} ⟩},
thus T^{*} y = ∑_{1}^{∞}λ̄_{n}⟨ y,e_{n} ⟩ e_{n}. Then we get the normality of T: T^{*}Tx=TT^{*}x= ∑_{1}^{∞} |λ_{n}|^{2}⟨ x,e_{n} ⟩ e_{n}. Also T is compact because it is a uniform limit of the finite rank operators T_{n}x=∑_{k=1}^{n} λ_{k}⟨ x,e_{k} ⟩e_{k}.
Tx= ∑_{n=1}^{∞} λ_{n}⟨ x,g_{n} ⟩ g_{n},
Proof. Let (e_{n}) be the orthonormal sequence constructed in the proof of the previous Theorem. Then x is perpendicular to all e_{n} if and only if it is in the kernel of T. Let (f_{n}) be any orthonormal basis of kerT. Then the union of (e_{n}) and (f_{n}) is the orthonormal basis (g_{n}) we have looked for.
Proof. The operator T^{*}T is compact and Hermitian (hence normal). From the previous Corollary there is an orthonormal basis (e_{n}) such that T^{*}T x= ∑_{n} λ_{n}⟨ x,e_{n} ⟩e_{n} for some positive λ_{n}=‖T e_{n}‖^{2}. Let µ_{n}=‖Te_{n}‖ and f_{n}=Te_{n}/µ_{n}. Then f_{n} is an orthonormal sequence (check!) and
Tx= ∑_{n} ⟨ x,e_{n} ⟩ Te_{n} = ∑_{n} ⟨ x,e_{n} ⟩ µ_{n} f_{n}.
Proof. Sufficiency follows from 9. Necessity: by the previous Corollary Tx=∑_{n} ⟨ x,e_{n} ⟩ µ_{n} f_{n}, thus T is a uniform limit of the operators T_{m} x=∑_{n=1}^{m} ⟨ x,e_{n} ⟩ µ_{n} f_{n}, which are of finite rank.
In this lecture we will study the Fredholm equation defined as follows. Let the integral operator with a kernel K(x,y) defined on [a,b]×[a,b] be defined as before:
(Tφ)(x)= ∫_{a}^{b} K(x,y)φ(y) dy. (49)
The Fredholm equation of the first and second kinds correspondingly are:
Tφ=f and φ −λ Tφ=f, (50) 
for a function f on [a,b]. A special case is given by the Volterra equation, defined by an integral operator (49) T with a kernel K(x,y)=0 for all y>x, which can be written as:
(Tφ)(x)= ∫_{a}^{x} K(x,y)φ(y) dy. (51)
We will consider integral operators with kernels K such that ∫_{a}^{b}∫_{a}^{b} |K(x,y)|^{2} dx dy<∞, then by Theorem 15 T is a Hilbert–Schmidt operator and in particular bounded.
As a reason to study Fredholm operators we mention that the solution of differential equations in mathematical physics (notably heat and wave equations) requires a decomposition of a function f as a linear combination of functions K(x,y) with “coefficients” φ. This is a continuous analogue of the discrete decomposition into Fourier series.
Using ideas from the proof of Lemma 4 we define Neumann series for the resolvent:
(I−λ T)^{−1}=I+λ T + λ^{2}T^{2}+⋯, (52) 
which is valid for all |λ|<‖T‖^{−1}.
φ(x)−λ ∫_{0}^{1} y φ(y) dy=x^{2}, on L_{2}[0,1].
K(x,y)=  ⎧ ⎨ ⎩ 


(T^{n}f)(x) = 
 y 
 dy= 
 . 

Among other integral operators there is an important subclass with separable kernel, namely a kernel which has a form:
K(x,y)= ∑_{j=1}^{n} g_{j}(x)h_{j}(y). (53)
In such a case:

i.e. the image of T is spanned by g_{1}(x), …, g_{n}(x) and is finite dimensional, consequently the solution of such equation reduces to linear algebra.
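The reduction to linear algebra can be illustrated on the simplest separable kernel. The sketch below makes several assumptions for the sake of an example: the equation is φ(x)−λ∫_{0}^{1} yφ(y) dy=x^{2} on [0,1], so that K(x,y)=g(x)h(y) with g(x)=1 and h(y)=y; the solution is then φ(x)=x^{2}+λc with a single unknown number c=∫_{0}^{1} yφ(y) dy. The function names are hypothetical.

```python
from fractions import Fraction

def solve_rank_one(lam):
    """Return the constant C such that phi(x) = x^2 + C solves
    phi(x) - lam * ∫_0^1 y*phi(y) dy = x^2."""
    hf = Fraction(1, 4)      # ∫_0^1 h(y) f(y) dy with h(y) = y, f(y) = y^2
    hg = Fraction(1, 2)      # ∫_0^1 h(y) g(y) dy with g(y) = 1
    c = hf / (1 - lam * hg)  # c = ∫ h*phi solves the scalar equation c = hf + lam*c*hg
    return lam * c

def neumann_constant(lam, terms):
    """Constant part of the partial Neumann sum (52) applied to f(x) = x^2:
    T^n f is the constant (1/4)(1/2)^(n-1) for n >= 1."""
    return sum(lam ** n * Fraction(1, 4) * Fraction(1, 2) ** (n - 1)
               for n in range(1, terms + 1))

lam = Fraction(1)
exact = solve_rank_one(lam)          # constant term of the solution
approx = neumann_constant(lam, 40)   # the same constant from 40 terms of (52)
```

For λ=1 the scalar equation gives φ(x)=x^{2}+1/2, and the partial Neumann sums approach the same constant geometrically. The scalar equation breaks down at λ=2, where 1−λ∫_{0}^{1} h g dy vanishes.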



We develop some Hilbert–Schmidt theory for integral operators.
Tφ= ∑_{n=1}^{∞} λ_{n} ⟨ φ,v_{n} ⟩v_{n} where φ= ∑_{n=1}^{∞} ⟨ φ,v_{n} ⟩v_{n}
Proof.


⎪ ⎪  v_{n}(x_{1})−v_{n}(x_{2})  ⎪ ⎪  ≤ 
 ⎪⎪ ⎪⎪  v_{n}  ⎪⎪ ⎪⎪  _{2} 
 ⎪ ⎪  K(x_{1},y)−K(x_{2},y)  ⎪ ⎪  dy 
φ= ∑_{n=1}^{∞} (⟨ f,v_{n} ⟩/(1−λ λ_{n})) v_{n}. (54)
Proof. Let φ=∑_{1}^{∞}a_{n} v_{n} where a_{n}=⟨ φ,v_{n} ⟩, then
φ−λ Tφ= ∑_{n=1}^{∞} a_{n}(1−λ λ_{n}) v_{n} =f= ∑_{n=1}^{∞} ⟨ f,v_{n} ⟩v_{n}
if and only if a_{n}=⟨ f,v_{n} ⟩/(1−λ λ_{n}) for all n. Note 1−λ λ_{n}≠ 0 since λ^{−1}∉σ(T).
Because λ_{n}→ 0 we get the convergence of ∑_{1}^{∞} |a_{n}|^{2} by its comparison with ∑_{1}^{∞} |⟨ f,v_{n} ⟩|^{2}=‖f‖^{2}, thus the solution exists and is unique by the Riesz–Fischer Theorem.
See Exercise 30 for an example.

Proof.
(I−λ T)φ= 
 (1−λ λ_{n})⟨ φ,v_{n} ⟩v_{n} = 
 (1−λ λ_{n})⟨ φ,v_{n} ⟩v_{n}. 
φ= 

 v_{n} +φ_{0}, for any φ_{0}∈Lin(v_{1},…,v_{N}), 
(Tφ)(x)= 
 (2xy−x−y+1)φ(y) dy. 
(Tφ)(x)=x 
 (2y−1)φ(y) dy+ 
 (−y+1)φ(y) dy, 
 or T is given by the matrix  ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 


φ=f+ 
 ⟨ f,v_{2} ⟩v_{2}+Cv_{1}=f+ 
 ⟨ f,v_{2} ⟩v_{2}+Cv_{1}, C∈ℂ. 
φ=f+ 
 ⟨ f,v_{1} ⟩v_{1}+Cv_{2}=f− 
 ⟨ f,v_{2} ⟩v_{2}+Cv_{2}, C∈ℂ. 
We will work with either the field of real numbers ℝ or the complex numbers ℂ. To avoid repetition, we use K to denote either ℝ or ℂ.
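Recall, see Defn. 3, that a norm on a vector space V is a map ‖·‖:V→[0,∞) such that ‖u‖=0 only when u=0; ‖λ u‖=|λ| ‖u‖ for λ∈K and u∈ V; and ‖u+v‖≤ ‖u‖+‖v‖ for u,v∈ V (the triangle inequality).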
Note, that the second and third conditions imply that linear operations—multiplication by a scalar and addition of vectors respectively—are continuous in the topology defined by the norm.
A norm induces a metric, see Defn. 1, on V by setting d(u,v)=‖u−v‖. When V is complete, see Defn. 6, for this metric, we say that V is a Banach space.
We will use the following simple inequality:
|ab| ≤ |a|^{p}/p + |b|^{q}/q, (57)
where 1<p,q<∞ and 1/p+1/q=1.
Proof.[First proof: analytic] Obviously, it is enough to prove the inequality for positive reals a=|a| and b=|b|. If p>1 then 0<1/p < 1. Consider the function φ(t)=t^{m}−mt for 0<m<1. From its derivative φ′(t)=m(t^{m−1}−1) we find the only critical point t=1 on [0,∞), which is the maximum of φ. Thus we write the inequality φ(t)≤ φ(1) for t=a^{p}/b^{q} and m=1/p. After a transformation we get a· b^{−q/p}−1≤ (1/p)(a^{p}b^{−q}−1), and multiplication by b^{q} with rearrangement (using q−q/p=1 and 1−1/p=1/q) leads to the desired result.
Proof.[Second proof: geometric] Consider the plane with coordinates (x,y) and take the curve y=x^{p−1} which is the same as x=y^{q−1}. Comparing areas on the figure:
we see that S_{1}+S_{2}≥ ab for any positive reals a and b. Elementary integration shows:
S_{1}= ∫_{0}^{a} x^{p−1} dx= a^{p}/p, S_{2}= ∫_{0}^{b} y^{q−1} dy= b^{q}/q.
This finishes the demonstration.
∑_{j=1}^{n} |u_{j} v_{j}| ≤ ( ∑_{j=1}^{n} |u_{j}|^{p} )^{1/p} ( ∑_{j=1}^{n} |v_{j}|^{q} )^{1/q}.
Proof. For reasons that will become clear soon, we use the notation ‖u‖_{p}=( ∑_{j=1}^{n} |u_{j}|^{p} )^{1/p} and ‖v‖_{q}= ( ∑_{j=1}^{n} |v_{j}|^{q} )^{1/q} and define for 1≤ i ≤ n:
a_{i}= u_{i}/‖u‖_{p} and b_{i}= v_{i}/‖v‖_{q}.
Summing up for 1≤ i ≤ n all inequalities obtained from (57):
|a_{i} b_{i}| ≤ |a_{i}|^{p}/p + |b_{i}|^{q}/q,
we get the result, since ∑_{i} |a_{i}|^{p}=∑_{i} |b_{i}|^{q}=1 and 1/p+1/q=1.
Using Hölder's inequality we can derive the following one:
( ∑_{j=1}^{n} |u_{j}+v_{j}|^{p} )^{1/p} ≤ ( ∑_{j=1}^{n} |u_{j}|^{p} )^{1/p} + ( ∑_{j=1}^{n} |v_{j}|^{p} )^{1/p}.
Proof. For p>1 we have:
∑_{1}^{n} |u_{k}+v_{k}|^{p} ≤ ∑_{1}^{n} |u_{k}| |u_{k}+v_{k}|^{p−1} + ∑_{1}^{n} |v_{k}| |u_{k}+v_{k}|^{p−1}. (58)
By Hölder's inequality
∑_{1}^{n} |u_{k}| |u_{k}+v_{k}|^{p−1} ≤ ( ∑_{1}^{n} |u_{k}|^{p} )^{1/p} ( ∑_{1}^{n} |u_{k}+v_{k}|^{q(p−1)} )^{1/q}.
Adding a similar inequality for the second term on the right-hand side of (58) and dividing by (∑_{1}^{n} |u_{k}+v_{k}|^{q(p−1)})^{1/q} yields the result, because q(p−1)=p and 1−1/q=1/p.
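Both inequalities are easy to test numerically. The sketch below is only a sanity check on sample data, not a proof; the vectors u, v and the exponents p are arbitrary illustrative choices, and q is the conjugate exponent with 1/p+1/q=1.

```python
def p_norm(u, p):
    """The quantity (sum |u_j|^p)^(1/p) for a finite vector u."""
    return sum(abs(x) ** p for x in u) ** (1.0 / p)

def holder_gap(u, v, p):
    """Right-hand side minus left-hand side of Hoelder's inequality."""
    q = p / (p - 1.0)  # conjugate exponent, 1/p + 1/q = 1
    lhs = sum(abs(a * b) for a, b in zip(u, v))
    return p_norm(u, p) * p_norm(v, q) - lhs

def minkowski_gap(u, v, p):
    """Right-hand side minus left-hand side of Minkowski's inequality."""
    s = [a + b for a, b in zip(u, v)]
    return p_norm(u, p) + p_norm(v, p) - p_norm(s, p)

# Arbitrary sample vectors and exponents (assumptions for illustration).
u = [1.0, -2.0, 3.0]
v = [0.5, 4.0, -1.0]
gaps = [(p, holder_gap(u, v, p), minkowski_gap(u, v, p)) for p in (1.5, 2.0, 3.0)]
```

Each gap should be non-negative up to rounding errors; for p=2 the first gap is the one in the Cauchy–Schwarz inequality.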
Minkowski’s inequality shows that for 1≤ p<∞ (the case p=1 is easy) we can define a norm ‖·‖_{p} on K^{n} by
‖u‖_{p} = ( ∑_{j=1}^{n} |u_{j}|^{p} )^{1/p} ( u =(u_{1},⋯,u_{n})∈K^{n} ).
See Figure 2 for an illustration of various norms of this type defined in ℝ^{2}.
We can define an infinite analogue of this. Let 1≤ p<∞, and let l_{p} be the space of all scalar sequences (x_{n}) with ∑_{n} |x_{n}|^{p} < ∞. A careful use of Minkowski’s inequality shows that l_{p} is a vector space. Then l_{p} becomes a normed space for the ‖·‖_{p} norm. Note also that l_{2} is the Hilbert space introduced before in Example 2.
Recall that a Cauchy sequence, see Defn. 5, in a normed space is bounded: if (x_{n}) is Cauchy then we can find N with ‖x_{n}−x_{m}‖<1 for all n,m≥ N. Then ‖x_{n}‖ ≤ ‖x_{n}−x_{N}‖ + ‖x_{N}‖ < ‖x_{N}‖+1 for n≥ N, so in particular, ‖x_{n}‖ ≤ max(‖x_{1}‖,‖x_{2}‖,⋯,‖x_{N−1}‖,‖x_{N}‖+1).
Proof. We repeat the proof of Thm. 24 changing 2 to p. Let (x^{(n)}) be a Cauchy sequence in l_{p}; we wish to show this converges to some vector in l_{p}.
For each n, x^{(n)}∈l_{p} so is a sequence of scalars, say (x_{k}^{(n)})_{k=1}^{∞}. As (x^{(n)}) is Cauchy, for each є>0 there exists N_{є} so that ‖x^{(n)} − x^{(m)}‖_{p} ≤ є for n,m≥ N_{є}.
For k fixed,
|x_{k}^{(n)} − x_{k}^{(m)}| ≤ ( ∑_{j} |x_{j}^{(n)} − x_{j}^{(m)}|^{p} )^{1/p} = ‖x^{(n)} − x^{(m)}‖_{p} ≤ є,
when n,m≥ N_{є}. Thus the scalar sequence (x_{k}^{(n)})_{n=1}^{∞} is Cauchy in K and hence converges, to x_{k} say. Let x=(x_{k}), so that x is a candidate for the limit of (x^{(n)}).
Firstly, we check that x−x^{(n)}∈l_{p} for some n. Indeed, for a given є>0 find n_{0} such that ‖x^{(n)}−x^{(m)}‖<є for all n,m>n_{0}. For any K and m:
∑_{k=1}^{K} |x_{k}^{(n)}−x_{k}^{(m)}|^{p} ≤ ‖x^{(n)}−x^{(m)}‖^{p}<є^{p}.
Let m→ ∞, then ∑_{k=1}^{K} |x_{k}^{(n)}−x_{k}|^{p} ≤ є^{p}. Let K→ ∞, then ∑_{k=1}^{∞} |x_{k}^{(n)}−x_{k}|^{p} ≤ є^{p}. Thus x^{(n)}−x∈l_{p} and, because l_{p} is a linear space, x = x^{(n)}−(x^{(n)}−x) is also in l_{p}.
Finally, we saw above that for any є >0 there is n_{0} such that ‖x^{(n)}−x‖<є for all n>n_{0}. Thus x^{(n)}→ x.
For p=∞, there are two analogues of the l_{p} spaces. First, we define l_{∞} to be the vector space of all bounded scalar sequences, with the sup-norm (‖·‖_{∞}-norm):
‖(x_{n})‖_{∞} = sup_{n∈ℕ} |x_{n}| ( (x_{n})∈ l_{∞} ). (59)
Second, we define c_{0} to be the space of all scalar sequences (x_{n}) which converge to 0. We equip c_{0} with the sup norm (59). This is defined, as if x_{n}→0, then (x_{n}) is bounded. Hence c_{0} is a subspace of l_{∞}, and we can check (exercise!) that c_{0} is closed.
Proof. This is another variant of the previous proof of Thm. 6. We do the l_{∞} case. Again, let (x^{(n)}) be a Cauchy sequence in l_{∞}, and for each n, let x^{(n)}=(x_{k}^{(n)})_{k=1}^{∞}. For є>0 we can find N such that ‖x^{(n)}−x^{(m)}‖_{∞} < є for n,m≥ N. Thus, for any k, we see that |x_{k}^{(n)} − x_{k}^{(m)}| < є when n,m≥ N. So (x_{k}^{(n)})_{n=1}^{∞} is Cauchy, and hence converges, say to x_{k}∈K. Let x=(x_{k}).
Let m≥ N, so that for any k, we have that
|x_{k} − x_{k}^{(m)}| = lim_{n→∞} |x_{k}^{(n)} − x_{k}^{(m)}| ≤ є.
As k was arbitrary, we see that sup_{k} |x_{k}−x_{k}^{(m)}| ≤ є. So, firstly, this shows that (x−x^{(m)})∈l_{∞}, and so also x = (x−x^{(m)}) + x^{(m)} ∈ l_{∞}. Secondly, we have shown that ‖x−x^{(m)}‖_{∞} ≤ є when m≥ N, so x^{(m)}→ x in norm.
‖f‖_{p}= ( ∫ |f(t)|^{p} dt )^{1/p}.
Recall what a linear map is, see Defn. 1. A linear map is often called an operator. A linear map T:E→ F between normed spaces is bounded if there exists M>0 such that ‖T(x)‖ ≤ M ‖x‖ for x∈ E, see Defn. 3. We write B(E,F) for the set of operators from E to F. For the natural operations, B(E,F) is a vector space. We norm B(E,F) by setting
‖T‖ = sup { ‖T(x)‖/‖x‖ : x∈ E, x≠0 }. (60)
Proof. The proof essentially follows the proof of the similar Theorem 4. See also the discussion of the usefulness of this theorem there.
Proof. In essence, we follow the same three-step procedure as in Thms. 24, 6 and 8. Let (T_{n}) be a Cauchy sequence in B(E,F). For x∈ E, check that (T_{n}(x)) is Cauchy in F, and hence converges to, say, T(x), as F is complete. Then check that T:E→ F is linear, bounded, and that ‖T_{n}−T‖→ 0.
We write B(E) for B(E,E). For normed spaces E, F and G, and for T∈B(E,F) and S∈B(F,G), we have that ST=S∘ T∈B(E,G) with ‖ST‖ ≤ ‖S‖ ‖T‖.
For T∈B(E,F), if there exists S∈B(F,E) with ST=I_{E}, the identity of E, and TS=I_{F}, then T is said to be invertible, and we write T^{−1}=S. In this case, we say that E and F are isomorphic spaces, and that T is an isomorphism.
If ‖T(x)‖=‖x‖ for each x∈ E, we say that T is an isometry. If additionally T is an isomorphism, then T is an isometric isomorphism, and we say that E and F are isometrically isomorphic.
Let E be a normed vector space, and let E^{*} (also written E′) be B(E,K), the space of bounded linear maps from E to K, which we call functionals, or more correctly, bounded linear functionals, see Defn. 1. Notice that as K is complete, the above theorem shows that E^{*} is always a Banach space.
φ_{u}(x) = ∑_{j=1}^{∞} u_{j} x_{j} ( x=(x_{j})∈l_{p} ).
Proof. By Hölder’s inequality, we see that
|φ_{u}(x)| ≤ ∑_{j=1}^{∞} |u_{j}| |x_{j}| ≤ ( ∑_{j=1}^{∞} |u_{j}|^{q} )^{1/q} ( ∑_{j=1}^{∞} |x_{j}|^{p} )^{1/p} = ‖u‖_{q} ‖x‖_{p}.
So the sum converges, and hence φ_{u} is defined. Clearly φ_{u} is linear, and the above estimate also shows that ‖φ_{u}‖ ≤ ‖u‖_{q}. The map u↦ φ_{u} is also clearly linear, and we’ve just shown that it is norm-decreasing.
Now let φ∈(l_{p})^{*}. For each n, let e_{n} = (0,⋯,0,1,0,⋯) with the 1 in the nth position. Then, for x=(x_{n})∈l_{p},
‖ x − ∑_{k=1}^{n} x_{k} e_{k} ‖_{p} = ( ∑_{k=n+1}^{∞} |x_{k}|^{p} )^{1/p} → 0,
as n→∞. As φ is continuous, we see that
φ(x) = lim_{n→∞} ∑_{k=1}^{n} φ(x_{k}e_{k}) = ∑_{k=1}^{∞} x_{k} φ(e_{k}).
Let u_{k}=φ(e_{k}) for each k. If u=(u_{k})∈l_{q} then we would have that φ=φ_{u}.
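Let us fix N∈ℕ, and define
x_{k} = ū_{k} |u_{k}|^{q−2} if k≤ N and u_{k}≠ 0, and x_{k} = 0 otherwise (here ū_{k} is the complex conjugate of u_{k}).
Then we see that
∑_{k=1}^{∞} |x_{k}|^{p} = ∑_{k=1}^{N} |u_{k}|^{p(q−1)} = ∑_{k=1}^{N} |u_{k}|^{q},
as p(q−1) = q. Then, by the previous paragraph,
φ(x) = ∑_{k=1}^{∞} x_{k} u_{k} = ∑_{k=1}^{N} |u_{k}|^{q}.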
Hence
‖φ‖ ≥ |φ(x)|/‖x‖_{p} = ( ∑_{k=1}^{N} |u_{k}|^{q} )^{1−1/p} = ( ∑_{k=1}^{N} |u_{k}|^{q} )^{1/q}.
By letting N→∞, it follows that u∈l_{q} with ‖u‖_{q} ≤ ‖φ‖. So φ=φ_{u} and ‖φ‖ = ‖φ_{u}‖ ≤ ‖u‖_{q}. Hence every element of (l_{p})^{*} arises as φ_{u} for some u, and also ‖φ_{u}‖ = ‖u‖_{q}.
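The key step of this proof, the choice of a vector x on which φ_{u} attains the value ‖u‖_{q}‖x‖_{p}, can be checked numerically. The following sketch assumes a real finitely supported sequence u (so complex conjugation reduces to the sign) and the illustrative exponent p=3; both are assumptions made only for the example.

```python
def p_norm(u, p):
    """The quantity (sum |u_k|^p)^(1/p) for a finitely supported sequence."""
    return sum(abs(a) ** p for a in u) ** (1.0 / p)

def extremal_vector(u, q):
    """x_k = sign(u_k) * |u_k|^(q-1), the real-valued case of the vector
    used in the proof; x_k = 0 wherever u_k = 0."""
    return [0.0 if a == 0 else (1.0 if a > 0 else -1.0) * abs(a) ** (q - 1.0)
            for a in u]

p = 3.0
q = p / (p - 1.0)              # conjugate exponent, here q = 1.5
u = [2.0, -1.0, 0.0, 0.5]      # illustrative finitely supported sequence
x = extremal_vector(u, q)

phi_x = sum(a * b for a, b in zip(u, x))   # phi_u(x) = sum of u_k x_k
ratio = phi_x / p_norm(x, p)               # a lower bound for the norm of phi_u
```

Up to rounding, ratio coincides with p_norm(u, q), in agreement with ‖φ_{u}‖ = ‖u‖_{q}.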
Loosely speaking, we say that l_{q} = (l_{p})^{*}, although we should always be careful to keep in mind the exact map which gives this.
Similarly, we can show that c_{0}^{*}=l_{1} and that (l_{1})^{*}=l_{∞} (the implementing isometric isomorphism is given by the same summation formula).
Mathematical induction is a well-known method to prove statements depending on a natural number. Mathematical induction is based on the following property of natural numbers: any nonempty subset of ℕ has a least element. This observation can be generalised to the transfinite induction described as follows.
A poset is a set X with a relation ≼ such that a≼ a for all a∈ X, if a≼ b and b≼ a then a=b, and if a≼ b and b≼ c, then a≼ c. We say that (X,≼) is total if for every a,b∈ X, either a≼ b or b≼ a. For a subset S⊆ X, an element a∈ X is an upper bound for S if s≼ a for every s∈ S. An element a∈ X is maximal if whenever b∈ X is such that a≼ b, then also b≼ a.
Then Zorn’s Lemma tells us that if X is a nonempty poset such that every total subset has an upper bound, then X has a maximal element. Really this is an axiom which we have to assume, in addition to the usual axioms of set theory. Zorn’s Lemma is equivalent to the axiom of choice and Zermelo’s theorem.
Proof. We do the real case. An “extension” of φ is a bounded linear map φ_{G}:G→ℝ such that F⊆ G⊆ E, φ_{G}(x)=φ(x) for x∈ F, and ‖φ_{G}‖≤‖φ‖. We introduce a partial order on the pairs (G, φ_{G}) of subspaces and functionals as follows: (G_{1}, φ_{G_1})≼ (G_{2}, φ_{G_2}) if and only if G_{1}⊆ G_{2} and φ_{G_1}(x)=φ_{G_2}(x) for all x∈ G_{1}. A Zorn’s Lemma argument shows that a maximal extension φ_{G}:G→ℝ exists. We shall show that if G≠E, then we can extend φ_{G}, a contradiction.
Let x∉G, so an extension φ_{1} of φ to the linear span of G and x must have the form
φ_{1}(x′+ax) = φ(x′) + a α (x′∈ G, a∈ℝ),
for some α∈ℝ. Under this, φ_{1} is linear and extends φ, but we also need to ensure that ‖φ_{1}‖≤‖φ‖. That is, we need
|φ(x′) + aα| ≤ ‖φ‖ ‖x′+ax‖ (x′∈ G, a∈ℝ). (61)
This is straightforward for a=0; otherwise, to simplify the proof, put −a y=x′ in (61) and divide both sides of the inequality by |a|. Thus we need to show that there exists α such that
|α−φ(y)| ≤ ‖φ‖ ‖x−y‖ for all y∈ G,
or
φ(y)− ‖φ‖ ‖x−y‖ ≤ α ≤ φ(y)+ ‖φ‖ ‖x−y‖.
For any y_{1} and y_{2} in G we have:
φ(y_{1})−φ(y_{2})≤ ‖φ‖ ‖y_{1}−y_{2}‖ ≤ ‖φ‖ ( ‖x−y_{2}‖ + ‖x−y_{1}‖ ).
Thus
φ(y_{1})− ‖φ‖ ‖x−y_{1}‖ ≤ φ(y_{2})+ ‖φ‖ ‖x−y_{2}‖.
As y_{1} and y_{2} were arbitrary,
sup_{y∈ G} (φ(y) − ‖φ‖ ‖x−y‖ ) ≤ inf_{y∈ G} (φ(y) + ‖φ‖ ‖x−y‖ ).
Hence we can choose α between the inf and the sup.
The complex case follows by “complexification”.
The Hahn–Banach theorem tells us that a functional from a subspace can be extended to the whole space without increasing the norm. In particular, extending a functional on a one-dimensional subspace yields the following.
Another useful result which can be proved by Hahn–Banach is the following.
Proof. 1⇒2 follows because we can find a sequence (y_{n}) in F with y_{n}→ x; then it’s immediate that φ(x)=0, because φ is continuous. Conversely, we show that if 1 doesn’t hold then 2 doesn’t hold (that is, the contrapositive to 2⇒1).
So, x∉F. Define ψ:{F,x}→K by
ψ(y+tx) = t (y∈ F, t∈K). 
This is well-defined: for y, y′∈ F, if y+tx=y′+t′x then either t=t′, or otherwise x = (t−t′)^{−1}(y′−y) ∈ F, which is a contradiction. The map ψ is obviously linear, so we need to show that it is bounded. Towards a contradiction, suppose that ψ is not bounded, so we can find a sequence (y_{n}+t_{n}x) with ‖y_{n}+t_{n}x‖≤1 for each n, and yet |ψ(y_{n}+t_{n}x)| = |t_{n}| →∞. Then ‖t_{n}^{−1} y_{n} + x‖ ≤ 1/|t_{n}| → 0, so that the sequence (−t_{n}^{−1}y_{n}), which is in F, converges to x. So x is in the closure of F, a contradiction. So ψ is bounded. By the Hahn–Banach theorem, we can find some φ∈ E^{*} extending ψ. For y∈ F, we have φ(y)=ψ(y)=0, while φ(x)=ψ(x)=1, so 2 doesn’t hold, as required.
We define E^{**} = (E^{*})^{*} to be the bidual of E, and define J:E→ E^{**} as follows. For x∈ E, J(x) should be in E^{**}, that is, a map E^{*}→K. We define this to be the map φ↦φ(x) for φ∈ E^{*}. We write this as
J(x)(φ) = φ(x) (x∈ E, φ∈ E^{*}). 
The Corollary 16 shows that J is an isometry; when J is surjective (that is, when J is an isomorphism), we say that E is reflexive. For example, l_{p} is reflexive for 1<p<∞. On the other hand c_{0} is not reflexive.
This section is not examinable. Standard facts about topology will be used in later sections of the course.
All our topological spaces are assumed Hausdorff. Let X be a compact space, and let C_{K}(X) be the space of continuous functions from X to K, with pointwise operations, so that C_{K}(X) is a vector space. We norm C_{K}(X) by setting
‖f‖_{∞} = sup_{x∈ X} |f(x)| (f∈ C_{K}(X)).
Let E be a vector space, and let ‖·‖_{(1)} and ‖·‖_{(2)} be norms on E. These norms are equivalent if there exists m>0 with
m^{−1} ‖x‖_{(2)} ≤ ‖x‖_{(1)} ≤ m ‖x‖_{(2)} (x∈ E).
Proof. Use the above lemma to construct a sequence (x_{n}) in the closed unit ball of E with, say, ‖x_{n}−x_{m}‖≥1/2 for each n≠m. Then (x_{n}) can have no convergent subsequence, and so the closed unit ball cannot be compact.
The presentation in this section is close to [, , ].
Note that in the third condition we admit any countable unions. The usage of “σ” in the names of σ-algebra and σ-ring is a reference to this. If we replace the condition by its finite version, that is, closure under finite unions A_{1}∪ … ∪ A_{n} only,
then we obtain the definition of an algebra.
For a σ-algebra R and A,B∈R, we have
A ⋂ B = X∖ ( X∖(A⋂ B) ) = X ∖ ( (X∖ A)⋃(X∖ B) ) ∈R.
Similarly, R is closed under taking (countably) infinite intersections.
If we drop the first condition from the definition of a (σ-)algebra (but keep the above conclusion from it!) we get a (σ-)ring, that is, a (σ-)ring is closed under (countable) unions, (countable) intersections and subtractions of sets.
Sets A_{k} are pairwise disjoint if A_{n}∩ A_{m}=∅ for n≠m. We denote the union of pairwise disjoint sets by ⊔, e.g. A ⊔ B ⊔ C.
It is easy to work with a vector space through its basis. For a ring of sets the following notion works as a helpful “basis”.
Again, any semiring contains the empty set.
As the intersection of a family of σ-algebras is again a σ-algebra, and the power set 2^{X} is a σ-algebra, it follows that given any collection D⊆ 2^{X}, there is a σ-algebra R such that D⊆R and such that if S is any other σ-algebra with D⊆S, then R⊆S. We call R the σ-algebra generated by D.
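On a finite set X the generated σ-algebra can be computed directly, because countable unions reduce to finite ones. The sketch below is an illustration for a small hypothetical example (X={1,2,3,4} and D={{1},{1,2}}); it repeatedly adds complements and pairwise unions until nothing new appears.

```python
def generated_algebra(X, D):
    """Smallest family of subsets of the finite set X that contains D, the
    empty set and X, and is closed under complements and unions."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(A) for A in D}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:
            candidates = [X - A] + [A | B for B in current]
            for C in candidates:
                if C not in family:
                    family.add(C)
                    changed = True
    return family

# Hypothetical example: the atoms turn out to be {1}, {2} and {3, 4}.
R = generated_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
```

Here R consists of the 8 possible unions of the atoms {1}, {2}, {3,4}; closure under intersections follows from the identity A∩B=X∖((X∖A)∪(X∖B)).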
We introduce the symbols +∞, −∞, and treat these as being “extended real numbers”, so −∞ < t < ∞ for t∈ℝ. We define t+∞ = ∞, t∞ = ∞ if t>0 and so forth. We do not (and cannot, in a consistent manner) define ∞ − ∞ or 0∞.
In analysis we are interested in infinities and limits, thus the following extension of additivity is very important.
We will see further examples of measures which are not σ-additive in Section 12.4.
A measure µ is σ-finite if X is a union of a countable number of sets X_{k}, such that for any A∈ R and any k∈ ℕ the intersection A∩ X_{k} is in R and µ(A∩ X_{k})<∞.
lim_{n→ ∞} µ(A_{n}) = µ( ⋃_{n} A_{n} ).
lim_{n→ ∞} µ(A_{n}) = µ( ⋂_{n} A_{n} ).
Proof. The first two properties are easy to see. The last two properties follow from the theorem in real analysis that any monotonic sequence of real numbers converges (recall that we admit +∞ as a value of a limit).
From now on we consider only finite measures; an extension to σ-finite measures will be done later.
Proof. If an extension exists, it shall satisfy µ(A)=∑_{k=1}^{n} µ′(A_{k}), where A_{k}∈ S. We need to show two things for this definition:
∑_{j} µ′(A_{j})= ∑_{j} ∑_{k} µ′(C_{jk}) = ∑_{k} ∑_{j} µ′(C_{jk})= ∑_{k} µ′(B_{k}).
µ(A)= ∑_{k} ∑_{j} µ′(C_{jk})= ∑_{k} µ(A_{k}).
Finally, we show the σ-additivity. For a set A=⊔_{k=1}^{∞}A_{k}, where A and A_{k}∈ R(S), find presentations A=⊔_{j=1}^{n} B_{j}, B_{j}∈ S and A_{k}=⊔_{l=1}^{m(k)} B_{lk}, B_{lk}∈ S. Define C_{jlk}=B_{j} ∩ B_{lk}∈ S, then B_{j}=⊔_{k=1}^{∞}⊔_{l=1}^{m(k)} C_{jlk} and A_{k}= ⊔_{j=1}^{n} ⊔_{l=1}^{m(k)} C_{jlk}. Then, from the σ-additivity of µ′:

where we changed the summation order in series with nonnegative terms.
In a similar way we can extend a measure from a semiring to the corresponding σ-ring; however, it can be done even for a larger family. The procedure recalls the famous story of Baron Munchausen saving himself from drowning in a swamp by pulling on his own hair. Indeed, initially we knew the measure for elements of the semiring S or their finite disjoint unions from R(S). For an arbitrary set A we may assign a measure from an element of R(S) which “approximates” A. But how do we measure such an approximation? Well, to this end we use the measure on R(S) again (pulling on his own hair)!
Coming back to exact definitions, we introduce the following notion.
µ^{*}(A)=inf { ∑_{k} µ(A_{k}), such that A⊆ ⋃_{k} A_{k}, A_{k}∈ S }.
The final condition says that an outer measure is countably subadditive. Note that an outer measure may fail to be a measure in the sense of Defn. 6 due to a lack of additivity.
µ^{*}(A) = inf { ∑_{j=1}^{∞} (b_{j}−a_{j}) : A⊆ ⋃_{j=1}^{∞}[a_{j},b_{j}) }.
For example, for outer Lebesgue measure we have µ^{*}(A)=0 for any countable set, which follows, as clearly µ^{*}({x})=0 for any x∈ℝ.
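The standard covering argument behind this claim can be made concrete: the k-th point of an enumeration is covered by an interval of length ε/2^{k}, so the total length stays below ε. The sketch below uses a finite list of rationals and a specific ε purely as illustrative assumptions; a finite list cannot, of course, exhaust a countable set.

```python
from fractions import Fraction

def cover(points, eps):
    """Cover the k-th point x_k (k = 1, 2, ...) by [x_k, x_k + eps/2^k)."""
    return [(x, x + eps / 2 ** k) for k, x in enumerate(points, start=1)]

# A finite initial segment of an enumeration of rationals in [0,1]
# (repetitions are harmless for the estimate).
points = [Fraction(a, b) for b in range(1, 8) for a in range(0, b + 1)]
eps = Fraction(1, 1000)

intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
```

The total length equals ε(1−2^{−n}) for n points, hence it is smaller than ε no matter how many points are enumerated.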
Proof. For є>0, as [a,b] ⊆ [a,b+є), we have that µ^{*}([a,b])≤ (b−a)+є. As є>0 was arbitrary, µ^{*}([a,b]) ≤ b−a.
To show the opposite inequality we observe that [a,b)⊂[a,b] and µ^{*}[a,b) =b−a (because [a,b) is in the semiring) so µ^{*}[a,b]≥ b−a by 2.
Our next aim is to construct measures from outer measures. We use the notation A▵ B=(A∪ B)∖ (A∩ B) for symmetric difference of sets.
Obviously all elements of S are measurable. An alternative definition of a measurable set is due to Carathéodory.
µ^{*}(A) = µ^{*}(A⋂ E) + µ^{*}(A∖ E), 
As µ^{*} is subadditive, this is equivalent to
µ^{*}(A) ≥ µ^{*}(A⋂ E) + µ^{*}(A∖ E) (A⊆ X), 
as the other inequality is automatic.
Suppose now that the ring R(S) is an algebra (i.e., contains the maximal element X). Then, the outer measure of any set is finite, and the following theorem holds:
Proof.[Sketch of proof] Clearly, R(S)⊂ L. Now we show that µ^{*}(A)=µ(A) for a set A∈ R(S). If A⊂ ∪_{k} A_{k} for A_{k} ∈ S, then µ(A)≤ ∑_{k} µ(A_{k}); taking the infimum we get µ(A)≤µ^{*}(A). For the opposite inequality, any A∈ R(S) has a disjoint representation A=⊔_{k} A_{k}, A_{k}∈ S, thus µ^{*}(A)≤ ∑_{k} µ(A_{k})=µ(A).
Now we will show that R(S) is an incomplete metric space, with the measure µ being a uniformly continuous function on it. Measurable sets form the completion of R(S), and µ is extended to this completion by continuity.
Define a distance between elements A, B∈ L as the outer measure of the symmetric difference of A and B: d(A,B)=µ^{*}(A▵ B). Introduce equivalence relation A∼ B if d(A,B)=0 and use the inclusion for the triangle inequality:
A▵ B ⊆ (A▵ C) ⋃ (C▵ B) 
Then, by the definition, Lebesgue measurable sets make the closure of R(S) with respect to this distance.
We can check that measurable sets form an algebra. To this end we need to make estimations, say, of µ^{*}((A_{1}∩ A_{2})▵ (B_{1}∩ B_{2})) in terms of µ^{*}(A_{i}▵ B_{i}). A demonstration for any finite number of sets is performed by mathematical induction. The above two-set case provides both the base and the step of the induction.
Now, we show that L is a σ-algebra. Let A_{k}∈ L and A=∪_{k} A_{k}. Then for any ε>0 there exists B_{k}∈ R(S), such that µ^{*}(A_{k}▵ B_{k})<ε/2^{k}. Define B=∪_{k} B_{k}. Then
( ⋃_{k} A_{k} ) ▵ ( ⋃_{k} B_{k} ) ⊂ ⋃_{k} ( A_{k} ▵ B_{k} ) implies µ^{*}(A▵ B)<ε.
We cannot stop at this point since B=∪_{k} B_{k} may not be in R(S). Thus, define B′_{1}=B_{1} and B′_{k}=B_{k}∖ ∪_{i=1}^{k−1} B_{i}, so the B′_{k} are pairwise disjoint. Then B=⊔_{k} B′_{k} and B′_{k}∈R(S). From the convergence of the series there is N such that ∑_{k=N}^{∞}µ(B′_{k})<ε. Let B′=∪_{k=1}^{N} B′_{k}, which is in R(S). Then µ^{*}(B▵ B′)≤ ε and, thus, µ^{*}(A▵ B′)≤ 2ε.
To check that µ^{*} is measure on L we use the following
Proof.[Proof of the Lemma] Use inclusions A⊂ B∪(A▵ B) and B⊂ A∪(A▵ B).
To show additivity take A_{1,2}∈L, A=A_{1}⊔ A_{2}, B_{1,2}∈R(S) and µ^{*}(A_{i}▵ B_{i})<ε. Then µ^{*}(A▵(B_{1}∪ B_{2}))<2ε and |µ^{*}(A) − µ^{*}(B_{1}∪ B_{2})| <2ε. Next, µ^{*}(B_{1}∪ B_{2})=µ(B_{1}∪ B_{2})=µ (B_{1}) +µ (B_{2})−µ (B_{1}∩ B_{2}), but µ (B_{1}∩ B_{2})=d(B_{1}∩ B_{2},∅)=d(B_{1}∩ B_{2},A_{1}∩ A_{2})<2ε. Therefore
|µ^{*}(B_{1}⋃ B_{2})−µ (B_{1}) −µ (B_{2})| <2ε.
Combining everything together we get:
|µ^{*}(A)−µ^{*}(A_{1})−µ^{*}(A_{2})| <6ε.
Thus µ^{*} is additive.
Check the countable additivity for A=⊔_{k} A_{k}. The inequality µ^{*}(A)≤ ∑_{k}µ^{*}(A_{k}) follows from countable subadditivity. The opposite inequality is the limiting case of the finite inequality µ^{*}(A)≥ µ^{*}(⊔_{k=1}^{N} A_{k})=∑_{k=1}^{N}µ^{*}(A_{k}) following from additivity and monotonicity of µ^{*}.
Proof. This is a common trick, using the density and the countability of the rationals. As σ-algebras are closed under taking complements, we need only show that open sets are Lebesgue measurable.
Intervals (a,b) are Lebesgue measurable by the very definition. Now let U⊆ℝ be open. For each x∈ U, there exists a_{x}<b_{x} with x∈(a_{x},b_{x})⊆ U. By making a_{x} slightly larger, and b_{x} slightly smaller, we can ensure that a_{x},b_{x}∈ℚ. Thus U = ∪_{x} (a_{x}, b_{x}). Each interval is measurable, and there are at most a countable number of them (endpoints make a countable set) thus U is the countable (or finite) union of Lebesgue measurable sets, and hence U is Lebesgue measurable itself.
We now extend the construction from finite measures to σ-finite ones. Let µ be a σ-additive and σ-finite measure defined on a semiring in X=⊔_{k} X_{k}, where the restriction of µ to every X_{k} is finite. Consider the Lebesgue extension µ_{k} of µ defined within X_{k}. A set A⊂ X is measurable if every intersection A∩ X_{k} is µ_{k}-measurable. For such a measurable set A we define its measure by the identity:
µ(A)= ∑_{k} µ_{k}(A∩ X_{k}).
We call a measure µ defined on L complete if whenever E⊆ X is such that there exists F∈L with µ(F)=0 and E⊆ F, we have that E∈L. Measures constructed from outer measures by the above theorem are always complete. On the example sheet, we saw how to form a complete measure from a given measure. We call sets like E null sets: complete measures are useful, because it is helpful to be able to say that null sets are in our σalgebra. Null sets can be quite complicated. For the Lebesgue measure, all countable subsets of ℝ are null, but then so is the Cantor set, which is uncountable.
We start from the following observation.
In view of this, it will be helpful to extend the notion of a measure to obtain a linear space.
In the following “charge” means “real charge”.
The opposite statement is also true:
To prove the theorem we need the following definition.
The relation of variation to charge is as follows:
Finally to prove the Thm. 28 we use the following
From the Thm. 28 we can deduce
d(ν_{1},ν_{2})= sup_{A∈L} |ν_{1}(A)−ν_{2}(A)|.
The following result is also important:
ν (A⋂ E) ≥ 0, ν(B⋂ E)≤ 0. 
Proof.[Sketch of proof] We only sketch this. We say that A∈L is positive if
ν(E⋂ A)≥0 (E∈L), 
and similarly define what it means for a measurable set to be negative. Suppose that ν never takes the value −∞ (the other case follows by considering the charge −ν).
Let β = inf ν(B_{0}) where we take the infimum over all negative sets B_{0}. If β=−∞ then for each n, we can find a negative B_{n} with ν(B_{n})≤ −n. But then B=∪_{n} B_{n} would be negative with ν(B)≤ −n for any n, so that ν(B)=−∞, a contradiction.
So β>−∞ and so for each n we can find a negative B_{n} with ν(B_{n}) < β+1/n. Then we can show that B = ∪_{n} B_{n} is negative, and argue that ν(B) ≤ β. As B is negative, actually ν(B) = β.
There then follows a very tedious argument, by contradiction, to show that A=X∖ B is a positive set. Then (A,B) is the required decomposition.
Consider the semiring S of intervals [a,b). There is a simple description of all measures on it. For a measure µ define
F_{µ}(t)= µ([0,t)) if t>0; 0 if t=0; −µ([t,0)) if t<0. (62)
F_{µ} is monotonic and any monotonic function F defines a measure µ on S by µ([a,b))=F(b)−F(a). The correspondence is one-to-one with the additional assumption F(0)=0.
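The correspondence between monotonic functions and measures on intervals can be sketched in a few lines. The function F(t)=arctan(t) below is a hypothetical choice of a monotonic function with F(0)=0, and the recovery formula assumes the usual convention F_{µ}(t)=µ([0,t)) for t>0 and F_{µ}(t)=−µ([t,0)) for t<0.

```python
import math

def F(t):
    """A hypothetical monotonic function with F(0) = 0."""
    return math.atan(t)

def mu(a, b):
    """Measure of the interval [a, b) with a <= b, defined by F."""
    return F(b) - F(a)

def F_from_mu(t):
    """Recover F from mu, assuming F(0) = 0."""
    if t > 0:
        return mu(0.0, t)
    if t < 0:
        return -mu(t, 0.0)
    return 0.0

# Additivity on a disjoint union [-1, 2) = [-1, 0.5) U [0.5, 2).
whole = mu(-1.0, 2.0)
parts = mu(-1.0, 0.5) + mu(0.5, 2.0)
```

The values whole and parts agree up to rounding, and F_from_mu reproduces F, which reflects the one-to-one correspondence under the normalisation F(0)=0.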
Proof. The necessity: F(t)−F(t−0)=lim_{ε→ 0}µ([t−ε,t))=0.
For sufficiency assume [a,b)=⊔_{k} [a_{k},b_{k}). The inequality µ([a,b))≥ ∑_{k} µ([a_{k},b_{k})) follows from additivity and monotonicity. For the opposite inequality take δ and δ_{k} such that F(b)−F(b−δ)<ε and F(a_{k})−F(a_{k}−δ_{k})<ε/2^{k} (use the left continuity of F). Then the interval [a,b−δ] is covered by the open intervals (a_{k}−δ_{k},b_{k}), so there is a finite subcovering. Thus µ([a,b−δ ))≤∑_{j=1}^{N} µ([a_{k_j}−δ_{k_j},b_{k_j})).
Another possibility to build measures is their product. In particular, it allows us to extend various measures defined through (62) on the real line to ℝ^{n}.
We now come to the main use of measure theory: to define a general theory of integration.
From now on, by a measure space we shall mean a triple (X,L,µ), where X is a set, L is a σ-algebra on X, and µ is a σ-additive measure defined on L. We say that the members of L are measurable, or L-measurable, if necessary to avoid confusion.
E_{c}(f)={x∈ X: f(x)<c}
A complex-valued function is measurable if its real and imaginary parts are measurable.
Proof. Use that any open set U⊂ ℝ is a union of countable set of intervals (a,b), cf. proof of Cor. 23.
Proof. The preimage of (−∞,c) under a continuous g is an open set, and its preimage under f is measurable.
Proof. Use Cor. 3 to show measurability of λ f,  f  and f^{2}.
Next use the following identities:

If (f_{n}) is a non-increasing sequence of measurable functions converging to f, then E_{c}(f)=∪_{n} E_{c}(f_{n}).
Moreover any limit can be replaced by two monotonic limits:
lim_{n→ ∞} f_{n}(x)= lim_{n→ ∞} lim_{k→ ∞} max (f_{n}(x), f_{n+1}(x),…,f_{n+k}(x)). (63)
Finally if f_{1} is measurable and f_{2}=f_{1} almost everywhere, then f_{2} is measurable as well.
We can define several types of convergence for measurable functions
sup_{x∈ X} |f_{n}(x)−f(x)| → 0;
f_{n}(x)→ f(x) for all x∈ X∖ A, µ(A)=0; 
µ({x∈ X: |f_{n}(x)−f(x)| >ε }) → 0.
Clearly uniform convergence implies both convergence a.e. and convergence in measure.
Proof. Define A_{n}(ε)={x∈ X: |f_{n}(x)−f(x)| ≥ ε}. Let B_{n}(ε)=∪_{k≥ n} A_{k}(ε). Clearly B_{n}(ε)⊃ B_{n+1}(ε), let B(ε)=∩_{1}^{∞}B_{n}(ε). If x∈ B(ε) then f_{n}(x)↛f(x). Thus µ(B(ε))=0, but µ(B(ε))=lim_{n→ ∞}µ(B_{n}(ε)). Since A_{n}(ε)⊂ B_{n}(ε) we see that µ(A_{n}(ε))→ 0.
Note that the construction of the sets B_{n}(ε) is just another implementation of the “two monotonic limits” trick (63) for sets.
However we can slightly “fix” either the set or the sequence to “upgrade” the convergence as shown in the following two theorems.
Proof. We use A_{n}(ε) and B_{n}(ε) from the proof of Thm. 6. For every ε>0 we have seen that µ(B_{n}(ε))→ 0, thus for each k there is N(k) such that µ(B_{N(k)}(1/k))<σ/2^{k}. Put E_{σ}=∪_{k} B_{N(k)}(1/k).
Proof. In the notation of the two previous proofs: for every natural k take n_{k} such that µ(A_{n_k}(1/k))< 1/2^{k}. Define C_{m}=∪_{k=m}^{∞}A_{n_k}(1/k) and C=∩ C_{m}. Then µ(C_{m})≤ ∑_{k=m}^{∞} 1/2^{k}=1/2^{m−1} and, thus, µ(C)=0. If x∉C then there is N such that x∉A_{n_k}(1/k) for all k>N. That means that |f_{n_k}(x)−f(x)| <1/k for all such k, i.e. f_{n_k}(x)→ f(x).
It is worth noting that we can use the last two theorems successively and upgrade convergence in measure to uniform convergence of a subsequence on a subset.
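A classical illustration of why a subsequence is needed is the so-called “typewriter” sequence on [0,1), which is introduced here only as an assumed example and is not taken from the text above. For n=2^{k}+j with 0≤ j<2^{k}, let f_{n} be the indicator function of [j/2^{k},(j+1)/2^{k}). Then f_{n}→ 0 in measure, f_{n}(x) does not converge at any x, and the subsequence f_{2^{k}} converges to 0 for every x>0.

```python
def typewriter_interval(n):
    """For n = 2^k + j with 0 <= j < 2^k return the ends of [j/2^k, (j+1)/2^k)."""
    k = n.bit_length() - 1
    j = n - 2 ** k
    return j / 2 ** k, (j + 1) / 2 ** k

def f(n, x):
    """Indicator function of the n-th dyadic interval."""
    a, b = typewriter_interval(n)
    return 1 if a <= x < b else 0

def measure_where_positive(n):
    """Lebesgue measure of the set where f_n is not zero."""
    a, b = typewriter_interval(n)
    return b - a

x = 0.3  # an arbitrary sample point of [0,1)

# In every block 2^k <= n < 2^(k+1) exactly one f_n equals 1 at x,
# so f_n(x) = 1 infinitely often and f_n(x) = 0 infinitely often.
hits_per_block = [sum(f(n, x) for n in range(2 ** k, 2 ** (k + 1)))
                  for k in range(8)]

# Along the subsequence n = 2^k the values at x are eventually 0.
subsequence_values = [f(2 ** k, x) for k in range(2, 10)]
```

The measure of the set where f_{n}≠ 0 is 2^{−k}, which tends to zero, while hits_per_block shows that pointwise convergence fails along the full sequence.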
First we define a sort of “basis” for the space of integrable functions.
χ_{A}(x) = 1 if x∈ A, and χ_{A}(x) = 0 if x∉A.
Then, if χ_{A} is measurable, then χ_{A}^{−1}( (1/2,3/2) ) = A ∈ L; conversely, if A∈L, then X∖ A∈L, and we see that for any U⊆ℝ open, χ_{A}^{−1}(U) is either ∅, A, X∖ A, or X, all of which are in L. So χ_{A} is measurable if and only if A∈L.
f = ∑_{k} t_{k} χ_{A_k} (64)
Moreover in the above representation the sets A_{k} can be pairwise disjoint and all t_{k}≠ 0 pairwise different. In this case the representation is unique.
Notice that it is now obvious that
 ⎪ ⎪  t_{k}  ⎪ ⎪  µ(A_{k}) if f has the above unique representation f = 
 t_{k} χ_{Ak} (65) 
It is another combinatorial exercise to show that this definition is independent of the way we write f.
∫_{A} f dµ = ∑_{k} t_{k} µ(A_{k}⋂ A).
Clearly the series converges for any simple summable function f. Moreover
Proof. This is another slightly tedious combinatorial exercise. You need to prove that the integral of a simple function is well-defined, in the sense that it is independent of the way we choose to write the simple function.
We will denote by S(X) the collection of all simple summable functions on X.
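The integral of a simple function and its independence of the chosen representation can be illustrated on a finite measure space. Everything in the sketch below is a hypothetical example: X={1,2,3,4}, the measure is given by point weights, and a simple function is stored as a list of pairs (t_{k}, A_{k}) with pairwise disjoint sets A_{k}.

```python
from fractions import Fraction

# Hypothetical finite measure space: X = {1, 2, 3, 4} with point weights.
weights = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}
X = {1, 2, 3, 4}

def mu(A):
    """Measure of a subset of X."""
    return sum(weights[x] for x in A)

def integral(rep, A):
    """Integral over A of the simple function sum_k t_k * chi_{A_k},
    computed as sum_k t_k * mu(A_k intersected with A)."""
    return sum(t * mu(Ak & A) for t, Ak in rep)

# Two representations of the same function: 3 on {1,2}, 5 on {3}, 0 on {4}.
rep1 = [(3, {1, 2}), (5, {3})]
rep2 = [(3, {1}), (3, {2}), (5, {3}), (0, {4})]

value1 = integral(rep1, X)
value2 = integral(rep2, X)
partial = integral(rep1, {2, 3})
```

Both representations give the same value 23/8 over X, and the integral over the subset {2,3} equals 11/8.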
d_{1}(f,g)= ∫_{X} |f(x)−g(x)| dµ(x) (66)
| ∫_{X} f(x) dµ(x)− ∫_{X} g(x) dµ(x) | ≤ d_{1}(f,g).
Proof. The proof is almost obvious; for example, Property 1 easily follows from Lem. 18.
We will outline 3 only. Let f be the indicator function of a set B; then A→ ∫_{A} f dµ=µ(A∩ B) is a σ-additive measure (and thus a charge). By Cor. 33 the same is true for finite linear combinations of indicator functions and their limits in the sense of the distance d_{1}.
We can identify functions which have the same values a.e. Then S(X) becomes a metric space with the distance d_{1} (66). The space may be incomplete and we may wish to look for its completion. However, if we simply try to assign a limiting point to every Cauchy sequence in S(X), then the resulting space becomes so huge that it will be impossible to realise it as a space of functions on X. To reduce the number of Cauchy sequences in S(X) eligible to have a limit, we impose an additional condition. A convenient reduction to functions on X appears if we ask for both convergence in the d_{1} metric and pointwise convergence on X a.e.
Clearly, if a function is summable, then any equivalent function is summable as well. The set of equivalence classes will be denoted by L_{1}(X).
Proof. Define E_{kn}(f)={x∈ X: k/n≤ f(x)< (k+1)/n} and f_{n}=∑_{k} (k/n) χ_{E_{kn}} (note that the sum is finite due to the boundedness of f).
Since |f_{n}(x)−f(x)| <1/n we have uniform convergence (thus convergence a.e.) and (f_{n}) is a Cauchy sequence: d_{1}(f_{n},f_{m})=∫_{X} |f_{n}−f_{m}| dµ≤ (1/n+1/m)µ(X).
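The approximating simple functions of this proof take the value k/n on the set where k/n≤ f(x)<(k+1)/n, that is, f_{n}(x)=⌊n f(x)⌋/n. A minimal sketch, assuming the bounded function f(x)=sin(3x) on [0,1] and a finite grid of sample points (both are illustrative choices only):

```python
import math

def simple_approx(f, n):
    """Return f_n with f_n(x) = k/n on the set where k/n <= f(x) < (k+1)/n."""
    return lambda x: math.floor(n * f(x)) / n

def f(x):
    """A hypothetical bounded function on [0,1]."""
    return math.sin(3 * x)

n = 50
f_n = simple_approx(f, n)
grid = [i / 1000 for i in range(1001)]

# The uniform error on the sample grid; it should stay below 1/n.
max_error = max(abs(f(x) - f_n(x)) for x in grid)
```

Since f_{n}≤ f<f_{n}+1/n pointwise, the error max_error stays below 1/n=0.02 up to rounding, which is the uniform convergence used in the proof.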
Another simple result, which is useful on many occasions is as follows.
Proof. For any ε>0, by the Egorov’s theorem 8 we can find E, such that
Combining this we find that for n>N, d_{1}(f_{n},f)< M · ε/(2M) + µ(X) · ε/(2µ(X)) = ε.
To build integral we need the following
Proof. Let φ_{n}=f_{n}−g_{n}, then this is a Cauchy sequence with zero limit a.e. Assume the opposite to the statement: there exist δ>0 and a sequence (n_{k}) such that ∫_{X} | φ_{n_{k}} | dµ>δ. Rescaling and renumbering, we can obtain ∫_{X} | φ_{n} | dµ>1.
Take a quickly convergent subsequence using the Cauchy property:
d_{1}(φ_{n_{k}},φ_{n_{k+1}})≤ 1/2^{k+2}.
Renumbering again, assume d_{1}(φ_{k},φ_{k+1})≤ 1/2^{k+2}.
Since φ_{1} is simple, that is φ_{1}=∑_{k} t_{k} χ_{A_{k}}, we have ∑_{k} | t_{k} | µ(A_{k})=∫_{X} | φ_{1} | dµ≥ 1. Thus there exists N such that ∑_{k=1}^{N} | t_{k} | µ(A_{k})≥ 3/4. Put A=⊔_{k=1}^{N} A_{k} and C=max_{1≤ k ≤ N} | t_{k} | =max_{x∈ A} | φ_{1}(x) |.
By Egorov’s Theorem 8 there is E⊂ A such that µ(E)<1/(4C) and φ_{n}⇒ 0 on B=A∖ E. Then
∫_{B} | φ_{1} | dµ = ∫_{A} | φ_{1} | dµ − ∫_{E} | φ_{1} | dµ ≥ 3/4 − (1/(4C)) · C = 1/2.
Since
| ∫_{B} | φ_{n} | dµ − ∫_{B} | φ_{n+1} | dµ | ≤ d_{1}(φ_{n},φ_{n+1}) ≤ 1/2^{n+2},
we get
∫_{B} | φ_{n} | dµ ≥ ∫_{B} | φ_{1} | dµ − ∑_{k=1}^{n−1} | ∫_{B} | φ_{k} | dµ − ∫_{B} | φ_{k+1} | dµ | ≥ 1/2 − ∑_{k=1}^{n−1} 1/2^{k+2} > 1/4.
But this contradicts the fact that ∫_{B} | φ_{n} | dµ → 0, which follows from the uniform convergence φ_{n}⇒ 0 on B.
∫_{X} f dµ = lim_{n→ ∞} ∫_{X} f_{n} dµ,
Proof. The proof follows from Prop. 21 and continuity of extension.
The space L_{1} was defined from dual convergence: in the d_{1} metric and a.e. Can we get the continuity of the integral from convergence almost everywhere alone? No, in general. However, we will now state some results on continuity of the integral under convergence a.e. with some additional assumptions. Finally, we show that L_{1}(X) is closed in the d_{1} metric.
If f_{n}→^{a.e.} f, then f∈L_{1}(X) and for any measurable A:
lim_{n→ ∞} ∫_{A} f_{n} dµ = ∫_{A} f dµ.
Proof. For any measurable A the expression ν(A)=∫_{A} φ dµ defines a finite measure on X due to nonnegativeness of φ and Thm. 31.
∫_{A} f dµ = ∫_{A} g dν.
Proof of the Lemma. Let M be the set of all g such that the Lemma is true. M includes every indicator function g=χ_{B} of a measurable B:
∫_{A} f dµ = ∫_{A} φχ_{B} dµ = ∫_{A∩ B} φ dµ = ν(A∩ B) = ∫_{A} g dν.
Thus M also contains finite linear combinations of indicators. For any n∈ℕ and a bounded g, the two functions g_{−}(x)=(1/n)[ng(x)] and g_{+}(x)=g_{−}(x)+1/n are finite linear combinations of indicators and are in M. Since g_{−}(x)≤ g(x)≤ g_{+}(x) we have
∫_{A} g_{−} dν = ∫_{A} φ g_{−} dµ ≤ ∫_{A} φ g dµ ≤ ∫_{A} φ g_{+} dµ = ∫_{A} g_{+} dν.
By the squeeze rule, as n→ ∞ both outer terms tend to ∫_{A} g dν, so the middle term is equal to ∫_{A} g dν, that is, g∈ M.
For the proof of the theorem define:

Then g_{n} is bounded by 1 and g_{n}→^{a.e.} g. To show the theorem it will be enough to show lim_{n→ ∞}∫_{A} g_{n} dν=∫_{A} g dν. For the uniformly bounded functions on the finite measure set this can be derived from the Egorov’s Thm. 8, see an example of this in the proof of Lemma 28.
In the above proof summability of φ was used to obtain the
finiteness of the measure ν, which is required for Egorov’s
Thm. 8.
µ{x∈ X: f(x)>c} ≤ (1/c) ∫_{X} f dµ. (67)
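Chebyshev's inequality (67) is easy to test on a finite measure space, where the integral is a weighted sum. The sketch below is only an illustration: the values of f, the weights of the points and the thresholds c are hypothetical numbers chosen for the example.

```python
def chebyshev_sides(values, weights, c):
    """Return (mu{f > c}, (1/c) * integral of f) for a non-negative f
    on a finite measure space whose points carry the given weights."""
    measure_of_level_set = sum(w for v, w in zip(values, weights) if v > c)
    integral = sum(v * w for v, w in zip(values, weights))
    return measure_of_level_set, integral / c

# Hypothetical data: f takes these values on five points with these measures.
values = [0.0, 0.5, 2.0, 3.5, 10.0]
weights = [1.0, 2.0, 0.5, 0.25, 0.1]

results = [chebyshev_sides(values, weights, c) for c in (0.25, 1.0, 3.0, 20.0)]
for lhs, rhs in results:
    print(lhs, "<=", rhs)
```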
Proof. Replacing f_{n} by f_{n}−f_{1} and f by f−f_{1} we can assume f_{n}≥ 0 and f≥ 0. Let E be the set where f is infinite, then E=∩_{N}∪_{n} E_{Nn}, where E_{Nn}={x∈ X: f_{n}(x)≥ N}. By Chebyshev’s inequality we have
Nµ(E_{Nn}) ≤ ∫_{E_{Nn}} f_{n} dµ ≤ ∫_{X} f_{n} dµ ≤ C,
then µ(E_{Nn})≤ C/N. Thus µ(E)=lim_{N→∞}lim_{n→∞} µ(E_{Nn})=0.
Thus f is finite a.e.
Proof of the Lemma. Necessity: if f is summable then for any set A⊂ X we have ∫_{A} f dµ≤ ∫_{X} f dµ<∞, thus the supremum is finite.
Sufficiency: let sup∫_{A} f dµ=M<∞, and define B={x∈ X: f(x)=0} and A_{k}={x∈ X: 2^{k}≤ f(x)<2^{k+1}} for k∈ℤ. Then µ(A_{k})≤ M/2^{k} and X=B⊔(⊔_{k∈ℤ}A_{k}). Define

Then g(x)≤ f(x) < 2g(x). The function g is a simple function; its summability follows from the estimate ∫_{⊔_{k=−n}^{n} A_{k}} g dµ ≤ ∫_{⊔_{k=−n}^{n} A_{k}} f dµ ≤ M, which is valid for any n; taking n→ ∞ we get summability of g. Furthermore, f_{n} →^{a.e.} f and f_{n}(x)≤ f(x) <2g(x), so we use the Lebesgue Thm. 33 on dominated convergence to obtain the conclusion.
Let A be a finite measure set such that f is bounded on A, then
∫_{A} f dµ = lim_{n→ ∞} ∫_{A} f_{n} dµ ≤ lim_{n→ ∞} ∫_{X} f_{n} dµ ≤ C.
This shows summability of f by the previous Lemma. The rest of the statement and (the contrapositive of) the second part follow from the Lebesgue Thm. 33 on dominated convergence.
Now we can extend this result dropping the monotonicity assumption.
Proof. Let us replace the limit f_{n}→ f by two monotonic limits. Define:

Then g_{n} is a nondecreasing sequence of functions and lim_{n→ ∞} g_{n}(x)=f(x) a.e. Since g_{n}≤ f_{n}, from monotonicity of integral we get ∫_{X} g_{n} dµ≤ C for all n. Then Levi’s Thm. 37 implies that f is summable and ∫_{X} f dµ≤ C.
Now we can show that L_{1}(X) is complete:
Proof. It is clear that the distance function d_{1} indeed defines a norm ‖f‖_{1}=d_{1}(f,0). We only need to demonstrate the completeness. We again utilise the procedure from Rem. 7. Take a Cauchy sequence (f_{n}) and, passing to a subsequence if necessary, assume that it is quickly convergent, that is d_{1}(f_{n},f_{n+1})≤ 1/2^{n}. Put φ_{1}=f_{1} and φ_{n}=f_{n}−f_{n−1} for n>1. The sequence ψ_{n}(x)=∑_{k=1}^{n} | φ_{k}(x) | is monotonic, and the integrals ∫_{X} ψ_{n} dµ are bounded by the same constant ‖f_{1}‖_{1}+1. Thus, by B. Levi’s Thm. 37 and its proof, ψ_{n}→ ψ for a summable a.e. finite function ψ (the first step is completed). Therefore, the series ∑φ_{k}(x) converges a.e. as well to a function f. But this means that f_{n} →^{a.e.} f. We also notice | f_{n}(x) | ≤ ψ(x). Thus by the Lebesgue Thm. 33 on dominated convergence f∈ L_{1}(X) (the second step is completed). Furthermore,
0 ≤ lim_{n→ ∞} ∫_{X} | f_{n}−f | dµ ≤ lim_{n→ ∞} ∑_{k=n+1}^{∞} ‖φ_{k}‖_{1} = 0.
That is, f_{n}→ f in the norm of L_{1}(X). (That completes the third step and the whole proof).
The next important property of the Lebesgue integral is its
absolute continuity.
Proof. If f is essentially bounded by M, then it is enough to set δ=ε/M. In general let:

Then ∫_{X} | f | dµ=∑_{k=0}^{∞}∫_{A_{k}} | f | dµ, thus there is an N such that ∑_{k=N}^{∞}∫_{A_{k}} | f | dµ=∫_{C_{N}} | f | dµ<ε/2. Now put δ =ε/(2N+2), then for any A⊂ X with µ(A)<δ:
| ∫_{A} f dµ | ≤ ∫_{A} | f | dµ = ∫_{A∖ C_{N}} | f | dµ + ∫_{A∩ C_{N}} | f | dµ < ε/2 + ε/2 = ε.
The geometrical interpretation of an integral in calculus as the “area under the graph” is well known. If we advance from “area” to “measure”, then the Lebesgue integral can be treated as a theory of measures of very special shapes created by graphs of functions. These shapes belong to the product space of the function's domain and its range. We introduced product measures in Defn. 38; now we will study them in some detail using the Lebesgue integral. We start from the following
Proof. For any C=A× B∈ S× T let us define f_{C}(x)=χ_{A}(x)ν(B). Then
(µ×ν)(C)=µ(A)ν(B)= ∫_{X} f_{C} dµ.
If the same set C has a representation C=⊔_{k} C_{k} for C_{k}∈ S× T, then σ-additivity of ν implies f_{C}=∑_{k} f_{C_{k}}. By the Lebesgue theorem 33 on dominated convergence:
∫_{X} f_{C} dµ = ∑_{k} ∫_{X} f_{C_{k}} dµ.
Thus
(µ×ν)(C) = ∑_{k} (µ×ν)(C_{k}).
The above correspondence C↦ f_{C} can be extended to the ring R(S× T) generated by S× T by the formula:
f_{C} = ∑_{k} f_{C_{k}}, for C=⊔_{k} C_{k}∈ R(S× T).
We have the uniform continuity of this correspondence:
‖f_{C_{1}}−f_{C_{2}}‖_{1} ≤ (µ×ν)(C_{1}▵ C_{2})=d_{1}(C_{1},C_{2})
because from the representation C_{1}=A_{1}⊔ B and C_{2}=A_{2}⊔ B, where B=C_{1}∩ C_{2}, one can see that f_{C_{1}}−f_{C_{2}}=f_{A_{1}}−f_{A_{2}} and f_{C_{1}▵ C_{2}}=f_{A_{1}}+f_{A_{2}}, together with | f_{A_{1}}−f_{A_{2}} | ≤ f_{A_{1}}+f_{A_{2}} for non-negative functions.
Thus the map C↦ f_{C} can be extended to a map from the σ-algebra L(X× Y) of µ×ν-measurable sets to L_{1}(X) by the formula f_{lim_{n} C_{n}}=lim_{n} f_{C_{n}}.
The following lemma provides the geometric interpretation of the function f_{C} as the size of the slice of the set C along x=const.
Proof. For sets from the ring R(S× T) it is true by the definition. If C^{(n)} is a monotonic sequence of sets, then ν(lim_{n} C_{x}^{(n)})=lim_{n} ν(C_{x}^{(n)}) by σ-additivity of measures. Thus the property ν(C_{x})=f_{C}(x) is preserved by monotonic limits. The following result is of separate interest:
Proof of Lem. 47. Let C be a measurable set, and take C_{n}∈R(S× T) approximating C up to 2^{−n} in µ×ν. Let C′=∩_{n=1}^{∞}∪_{k =1}^{∞}C_{n+k}, then

Then (µ×ν)(C′▵ C)≤ 2^{1−n} for any n∈ℕ.
Coming back to Lem. 46 we notice that (in the above notations)
f_{C}=f_{C′} almost everywhere. Then:
f_{C}(x) =^{a.e.} f_{C′}(x)=ν(C′_{x})=ν(C_{x}).
The following theorem generalizes the meaning of the integral as “area under the graph”.
(µ×ν)(C)= ∫_{X} f_{C} dµ, (68)
Proof. If C has a finite measure, then the statement is reduced to Lem. 46 and a passage to limit in (68).
If C has an infinite measure, then there exists a sequence of C_{n}⊂ C, such that ∪_{n} C_{n}=C and (µ×ν)(C_{n})→ ∞. Then f_{C}(x)=lim_{n} f_{C_{n}}(x) and
∫_{X} f_{C_{n}} dµ=(µ×ν)(C_{n})→ +∞.
Thus f_{C} is measurable and non-summable.
This theorem justifies the well-known technique of calculating areas (volumes) as integrals of lengths (areas) of the sections.
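On a product of two finite measure spaces the slicing technique reduces to reordering a finite sum, which makes it easy to test. The following sketch is an illustration under hypothetical data: the point names, their weights under µ and ν, and the set C are all invented for the example. It compares the product measure of C with the integral of the slice sizes f_{C}(x)=ν(C_{x}).

```python
# Hypothetical finite measure spaces (X, mu) and (Y, nu): point -> weight.
mu = {"x1": 0.5, "x2": 1.5, "x3": 2.0}
nu = {"y1": 1.0, "y2": 0.25}

# A hypothetical subset C of X x Y.
C = {("x1", "y1"), ("x1", "y2"), ("x2", "y2"), ("x3", "y1")}

def product_measure(C):
    """(mu x nu)(C) computed directly as a sum over the points of C."""
    return sum(mu[x] * nu[y] for x, y in C)

def slice_measure(C, x):
    """f_C(x) = nu(C_x), where C_x = {y : (x, y) in C} is the slice of C over x."""
    return sum(nu[y] for (x0, y) in C if x0 == x)

def integral_of_slices(C):
    """Integral of f_C over X with respect to mu."""
    return sum(slice_measure(C, x) * mu[x] for x in mu)

print(product_measure(C), integral_of_slices(C))
```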
(µ×ν×λ)(C) = ∫_{X× Y} λ(C_{xy}) d(µ×ν)(x,y) = ∫_{Z} (µ×ν)(C_{z}) dλ(z), (69)


Proof. From the decomposition f=f_{+}−f_{−} we can reduce our consideration to nonnegative functions. Let us consider the product of three spaces (X,µ), (Y,ν), (ℝ,λ), with λ=dz being the Lebesgue measure on ℝ. Define
C={(x,y,z)∈ X× Y× ℝ: 0≤ z≤ f(x,y)}. 
Using the relation (69) we get:

the theorem follows from those relations.
Here, we consider another topic in the measure theory which benefits from the integration theory.
The above definition may not seem to justify the name “absolute continuity”, but this will become clear from the following important theorem.
ν(A) = ∫_{A} f dµ,
Sketch of the proof. First we will assume that ν is a measure. Let D be the collection of measurable functions g:X→[0,∞) such that
∫_{E} g dµ ≤ ν(E) (E∈L).
Let α = sup_{g∈D} ∫_{X} g dµ ≤ ν(X) < ∞. So we can find a sequence (g_{n}) in D with ∫_{X} g_{n} dµ → α.
We define f_{0}(x) = sup_{n} g_{n}(x). We can show that f_{0}=∞ only on a set of µ-measure zero, so if we adjust f_{0} on this set, we get a measurable function f:X→[0,∞). There is now a long argument to show that f is as required.
If ν is a charge, we can find f by applying the previous operation to the measures ν_{+} and ν_{−} (as it is easy to verify that ν_{+},ν_{−}⋘µ).
We show that f is essentially unique. If g is another function inducing ν, then
∫_{E} (f−g) dµ = ν(E) − ν(E) = 0 (E∈L).
Let E = {x∈ X : f(x)−g(x)≥ 0}, so as f−g is measurable, E∈L. Then ∫_{E} (f−g) dµ =0 and f−g≥0 on E, so by our result from integration theory, we have that f−g=0 almost everywhere on E. Similarly, if F = {x∈ X : f(x)−g(x)≤ 0}, then F∈L and f−g=0 almost everywhere on F. As E∪ F=X, we conclude that f=g almost everywhere.
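On a finite set the density in the Radon–Nikodym theorem can be written down explicitly as a ratio of point masses, which also illustrates why it is unique only almost everywhere. The sketch below is an illustration with hypothetical data (the point names and masses are invented); the helper names `density` and `integrate` are not from the notes.

```python
from itertools import combinations

def density(nu, mu):
    """Density f with nu(A) = sum over A of f * mu, for a charge nu that is
    absolutely continuous with respect to a measure mu on a finite set."""
    f = {}
    for x in mu:
        if mu[x] == 0:
            if nu[x] != 0:
                raise ValueError("nu is not absolutely continuous w.r.t. mu")
            f[x] = 0.0  # arbitrary value: f is determined only mu-almost everywhere
        else:
            f[x] = nu[x] / mu[x]
    return f

def integrate(f, mu, A):
    """Integral of f over the subset A with respect to mu."""
    return sum(f[x] * mu[x] for x in A)

# Hypothetical measure mu and charge nu given by their point masses.
mu = {"a": 0.5, "b": 2.0, "c": 0.0, "d": 1.0}
nu = {"a": -1.0, "b": 3.0, "c": 0.0, "d": 0.25}

f = density(nu, mu)
subsets = [A for r in range(len(mu) + 1) for A in combinations(mu, r)]
max_defect = max(abs(integrate(f, mu, A) - sum(nu[x] for x in A)) for A in subsets)
print(f, max_defect)
```

Changing the value of f at the point "c", which has µ-measure zero, gives another valid density; this is the finite counterpart of the essential uniqueness shown above.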
Proof. By the Radon–Nikodym theorem there is a function f∈L_{1}(X,µ) such that ν(A)=∫_{A} f dµ. Then | ν |(A)=∫_{A} | f | dµ and we get the statement from Theorem 43 on absolute continuity of the Lebesgue integral.
In this section we describe various Banach spaces of functions on sets with measure.
Let (X,L,µ) be a measure space. For 1≤ p<∞, we define L_{p}(µ) to be the space of measurable functions f:X→K such that
∫_{X} | f |^{p} dµ < ∞.
We define ‖·‖_{p} : L_{p}(µ)→[0,∞) by
‖f‖_{p} = ( ∫_{X} | f |^{p} dµ )^{1/p} (f∈ L_{p}(µ)).
Notice that if f=0 almost everywhere, then | f |^{p}=0 almost everywhere, and so ‖f‖_{p}=0. However, there can be nonzero functions such that f=0 almost everywhere. So ‖·‖_{p} is not a norm on L_{p}(µ).
∫_{X} | fg | dµ ≤ ‖f‖_{p} ‖g‖_{q}. (71)
Proof. Recall that we know from Lem. 2 that
| ab | ≤ | a |^{p}/p + | b |^{q}/q (a,b∈K).
Now we follow the steps in the proof of Prop. 4. Define measurable functions a,b:X→K by setting
a(x) = f(x)/‖f‖_{p}, b(x) = g(x)/‖g‖_{q} (x∈ X).
So we have that
| a(x) b(x) | ≤ | f(x) |^{p}/(p ‖f‖_{p}^{p}) + | g(x) |^{q}/(q ‖g‖_{q}^{q}) (x∈ X).
By integrating, we see that
∫_{X} | ab | dµ ≤ (1/(p ‖f‖_{p}^{p})) ∫_{X} | f |^{p} dµ + (1/(q ‖g‖_{q}^{q})) ∫_{X} | g |^{q} dµ = 1/p + 1/q = 1.
Hence, by the definition of a and b,
∫_{X} | fg | dµ ≤ ‖f‖_{p} ‖g‖_{q},
as required.
Proof. Part 1 is easy. For 2, we need a version of Minkowski’s Inequality, which will follow from the previous lemma. We essentially repeat the proof of Prop. 5.
Notice that the p=1 case is easy, so suppose that 1<p<∞. We have that
‖f+g‖_{p}^{p} = ∫_{X} | f+g |^{p} dµ ≤ ∫_{X} | f | | f+g |^{p−1} dµ + ∫_{X} | g | | f+g |^{p−1} dµ.
Applying the lemma, this is
≤ ‖f‖_{p} ( ∫_{X} | f+g |^{q(p−1)} dµ )^{1/q} + ‖g‖_{p} ( ∫_{X} | f+g |^{q(p−1)} dµ )^{1/q}.
As q(p−1)=p, we see that
‖f+g‖_{p}^{p} ≤ ( ‖f‖_{p} + ‖g‖_{p} ) ‖f+g‖_{p}^{p/q}.
As p−p/q = 1, we conclude that
‖f+g‖_{p} ≤ ‖f‖_{p} + ‖g‖_{p},
as required.
In particular, if f,g∈ L_{p}(µ) then af+g∈ L_{p}(µ), showing that L_{p}(µ) is a vector space.
We define an equivalence relation ∼ on the space of measurable functions by setting f∼ g if and only if f=g almost everywhere. We can check that ∼ is an equivalence relation (the slightly nontrivial part is that ∼ is transitive).
Proof. We need to show that addition, and scalar multiplication, are well-defined on L_{p}(µ)/∼. Let a∈K and f_{1},f_{2},g_{1},g_{2}∈ L_{p}(µ) with f_{1}∼ f_{2} and g_{1}∼ g_{2}. Then it’s easy to see that af_{1}+g_{1} ∼ af_{2}+g_{2}; but this is all that’s required!
If f ∼ g then | f |^{p} = | g |^{p} almost everywhere, and so ‖f‖_{p} = ‖g‖_{p}. So ‖·‖_{p} is well-defined on equivalence classes. In particular, if f∼ 0 then ‖f‖_{p}=0. Conversely, if ‖f‖_{p}=0 then ∫_{X} | f |^{p} dµ=0, so as | f |^{p} is a non-negative function, we must have that | f |^{p}=0 almost everywhere. Hence f=0 almost everywhere, so f∼ 0. That is,
{ f∈ L_{p}(µ) : f∼ 0 } = { f∈ L_{p}(µ) : ‖f‖_{p}=0 }.
It follows from the above lemma that this is a subspace of L_{p}(µ).
The above lemma now immediately shows that ‖·‖_{p} is a norm on L_{p}(µ)/∼.
We will abuse notation and continue to write members of L_{p}(µ) as functions. Really they are equivalence classes, and so care must be taken when dealing with L_{p}(µ). For example, if f∈ L_{p}(µ), it does not make sense to talk about the value of f at a point.
Proof. Consider first the case of a finite measure space X. Let f_{n} be a Cauchy sequence in L_{p}(µ). From the Hölder inequality (71) we see that ‖f_{n}−f_{m}‖_{1}≤ ‖f_{n}−f_{m}‖_{p} (µ(X))^{1/q}. Thus, f_{n} is also a Cauchy sequence in L_{1}(µ). Thus by Theorem 42 there is a limit function f∈ L_{1}(µ). Moreover, from the proof of that theorem we know that there is a subsequence f_{n_{k}} of f_{n} convergent to f almost everywhere. Thus in the Cauchy sequence inequality
∫_{X} | f_{n_{k}} −f_{n_{m}} |^{p} dµ <ε
we can pass to the limit m→ ∞ by the Fatou Lemma 39 and conclude:
∫_{X} | f_{n_{k}} −f |^{p} dµ ≤ ε.
So f_{n_{k}} converges to f in L_{p}(µ); since (f_{n}) is a Cauchy sequence, f_{n} converges to f in L_{p}(µ) as well.
For a σ-finite measure µ we represent X=⊔_{k} X_{k} with µ(X_{k})<+∞ for all k. The restriction (f_{n}^{(k)}) of a Cauchy sequence (f_{n})⊂L_{p}(X,µ) to every X_{k} is a Cauchy sequence in L_{p}(X_{k},µ). By the previous paragraph there is the limit f^{(k)}∈ L_{p}(X_{k},µ). Define a function f∈L_{p}(X,µ) by the identities f(x)=f^{(k)}(x) if x∈ X_{k}. By the additivity of the integral, the Cauchy condition on (f_{n}) can be written as:
∫_{X} | f_{n}−f_{m} |^{p} dµ = ∑_{k=1}^{∞} ∫_{X_{k}} | f_{n}^{(k)}−f_{m}^{(k)} |^{p} dµ < ε.
It implies for any M:
∑_{k=1}^{M} ∫_{X_{k}} | f_{n}^{(k)}−f_{m}^{(k)} |^{p} dµ < ε.
In the last inequality we can pass to the limit m→ ∞:
∑_{k=1}^{M} ∫_{X_{k}} | f_{n}^{(k)}−f^{(k)} |^{p} dµ ≤ ε.
Since the last inequality is independent of M we conclude:
∫_{X} | f_{n}−f |^{p} dµ = ∑_{k=1}^{∞} ∫_{X_{k}} | f_{n}^{(k)}−f^{(k)} |^{p} dµ ≤ ε.
Thus we conclude that f_{n}→ f in L_{p}(X,µ).
F:L_{p}(µ)→K, g ↦ ∫_{X} fg dµ (g∈L_{p}(µ)).
Proof. This proof is very similar to the proof of Thm. 13. For f∈ L_{q}(µ) and g∈ L_{p}(µ), it follows by Hölder’s Inequality (71) that fg is summable, and
| ∫_{X} fg dµ | ≤ ∫_{X} | fg | dµ ≤ ‖f‖_{q} ‖g‖_{p}.
Let f_{1},f_{2}∈ L_{q}(µ) and g_{1},g_{2}∈ L_{p}(µ) with f_{1}∼ f_{2} and g_{1}∼ g_{2}. Then f_{1}g_{1} = f_{2}g_{1} almost everywhere and f_{2}g_{1} = f_{2}g_{2} almost everywhere, so f_{1}g_{1} = f_{2}g_{2} almost everywhere, and hence
∫_{X} f_{1}g_{1} dµ = ∫_{X} f_{2}g_{2} dµ.
So Φ is well-defined.
Clearly Φ is linear, and we have shown that ‖Φ(f)‖ ≤ ‖f‖_{q}.
Let f∈ L_{q}(µ) and define g:X→K by
g(x) = \overline{f(x)} | f(x) |^{q−2} if f(x)≠0, and g(x) = 0 otherwise.
Then | g(x) | = | f(x) |^{q−1} for all x∈ X, and so
∫_{X} | g |^{p} dµ = ∫_{X} | f |^{p(q−1)} dµ = ∫_{X} | f |^{q} dµ,
so ‖g‖_{p} = ‖f‖_{q}^{q/p}, and so, in particular, g∈L_{p}(µ). Let F=Φ(f), so that
F(g) = ∫_{X} fg dµ = ∫_{X} | f |^{q} dµ = ‖f‖_{q}^{q}.
Thus ‖F‖ ≥ ‖f‖_{q}^{q} / ‖g‖_{p} = ‖f‖_{q}. So we conclude that ‖F‖ = ‖f‖_{q}, showing that Φ is an isometry.
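The norm-attaining step of this proof can be verified numerically on a finite measure space. The sketch below is an illustration under stated assumptions: it takes the extremal function to be g = conj(f) | f |^{q−2} where f≠0 and g=0 elsewhere (the standard choice, consistent with | g |=| f |^{q−1}), and the complex values of f, the weights and the exponent p are hypothetical.

```python
def norm(values, weights, r):
    """The L_r norm on a finite measure space with the given point weights."""
    return sum(abs(v) ** r * w for v, w in zip(values, weights)) ** (1.0 / r)

def extremal(f, q):
    """g = conj(f) * |f|^(q-2) where f != 0, and 0 elsewhere (assumed extremal function)."""
    return [v.conjugate() * abs(v) ** (q - 2) if v != 0 else 0j for v in f]

# Hypothetical data: a complex-valued f on a four-point measure space.
weights = [0.5, 1.0, 2.0, 0.25]
f = [1 + 2j, 0j, -0.5j, 3 - 1j]
p = 3.0
q = p / (p - 1)  # conjugate exponent, here 1.5

g = extremal(f, q)
F_g = sum(a * b * w for a, b, w in zip(f, g, weights))  # F(g) = integral of f*g
ratio = abs(F_g) / norm(g, weights, p)                  # a lower bound for the norm of F
print(ratio, norm(f, weights, q))
```

The two printed numbers should agree up to rounding, reflecting that the lower bound for the norm of F coincides with the q-norm of f.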
F(g) = ∫_{X} fg dµ (g∈L_{p}(µ)).
Sketch of the proof. As µ(X)<∞, for E∈L, we have that ‖χ_{E}‖_{p} = µ(E)^{1/p} < ∞. So χ_{E}∈L_{p}(µ), and hence we can define
ν(E) = F(χ_{E}) (E∈L).
We proceed to show that ν is a signed (or complex) measure. Then we can apply the Radon–Nikodym Theorem 53 to find a function f:X→K such that
F(χ_{E}) = ν(E) = ∫_{E} f dµ (E∈L).
There is then a long argument to show that f∈ L_{q}(µ), which we skip here. Finally, we need to show that
∫_{X} fg dµ = F(g)
for all g∈ L_{p}(µ), and not just for g=χ_{E}. That follows for simple functions with a finite set of values by linearity of the Lebesgue integral and F. Then, it can be extended by continuity to the entire space L_{p}(µ) in view of the following Prop. 12.
We note that f∈L_{p}(X) if and only if | f |^{p} is summable, thus we can use all results from Section 13 to investigate L_{p}(X).
Proof. Let f∈L_{p}(µ), and suppose for now that f≥0. For each n∈ℕ, let
f_{n} = min(n, (1/n) ⌊ n f ⌋).
Then each f_{n} is simple, f_{n} → f pointwise, and so | f_{n}−f |^{p}→0 pointwise. For each n, we have that
0 ≤ f_{n} ≤ f, 0 ≤ f−f_{n} ≤ f,
so that | f−f_{n} |^{p} ≤ | f |^{p} for all n. As ∫ | f |^{p} dµ<∞, we can apply the Dominated Convergence Theorem to see that
lim_{n→ ∞} ∫_{X} | f_{n}−f |^{p} dµ = 0,
that is, ‖f_{n}−f‖_{p} → 0.
The general case follows by taking positive and negative parts, and if K=ℂ, by taking real and imaginary parts first.
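The role of the truncation by n in this approximation is visible for an unbounded function. The sketch below is an illustration under stated assumptions: it uses the approximants f_{n}=min(n, ⌊nf⌋/n), replaces [0,1] by a hypothetical uniform grid of N points of mass 1/N each, and takes f(x)=x^{−1/4} sampled at the grid midpoints as an example of a large but p-summable function with p=2.

```python
import math

def lp_distance(f_vals, g_vals, weights, p):
    """The L_p distance between two functions on a finite measure space."""
    return sum(abs(a - b) ** p * w
               for a, b, w in zip(f_vals, g_vals, weights)) ** (1.0 / p)

def truncate_and_round(v, n):
    """Value of f_n = min(n, floor(n*f)/n) at a point where f equals v >= 0."""
    return min(n, math.floor(n * v) / n)

# Hypothetical discretisation of [0, 1]: N midpoints, each of mass 1/N.
N = 10000
points = [(i + 0.5) / N for i in range(N)]
weights = [1.0 / N] * N
f_vals = [x ** -0.25 for x in points]
p = 2.0

levels = (1, 2, 4, 8, 16, 32)
errors = [lp_distance(f_vals, [truncate_and_round(v, n) for v in f_vals], weights, p)
          for n in levels]
for n, err in zip(levels, errors):
    print(n, err)
```

Once n exceeds the largest sampled value of f, the truncation is inactive and the error is bounded by 1/n times µ(X)^{1/p}.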
Let ([0,1],L,µ) be the restriction of Lebesgue
measure to [0,1]. We often write L_{p}([0,1]) instead
of L_{p}(µ).
Proof. As [0,1] is a finite measure space, and each member of C_{K}([0,1]) is bounded, it is easy to see that each f∈ C_{K}([0,1]) is such that ‖f‖_{p}<∞. So it makes sense to regard C_{K}([0,1]) as a subspace of L_{p}(µ). If C_{K}([0,1]) is not dense in L_{p}(µ), then we can find a nonzero F∈L_{p}([0,1])^{*} with F(f)=0 for each f∈ C_{K}([0,1]). This was a corollary of the Hahn–Banach theorem 15.
So there exists a nonzero g∈ L_{q}([0,1]) with
∫_{[0,1]} fg dµ = 0 (f∈ C_{K}([0,1])).
Let a<b in [0,1]. By approximating χ_{(a,b)} by a continuous function, we can show that ∫_{(a,b)} g dµ = ∫ g χ_{(a,b)} dµ = 0.
Suppose for now that K=ℝ. Let A = { x∈[0,1] : g(x)≥0 } ∈ L. By the definition of the Lebesgue (outer) measure, for є>0, there exist sequences (a_{n}) and (b_{n}) with A ⊆ ∪_{n} (a_{n},b_{n}), and ∑_{n} (b_{n}−a_{n}) ≤ µ(A) + є.
For each N, consider ∪_{n=1}^{N} (a_{n},b_{n}). If some (a_{i},b_{i}) overlaps (a_{j},b_{j}), then we could just consider the larger interval (min(a_{i},a_{j}), max(b_{i},b_{j})). Formally, by an induction argument, we see that we can write ∪_{n=1}^{N} (a_{n},b_{n}) as a finite union of some disjoint open intervals, which, abusing notation, we still denote by (a_{n},b_{n}). By linearity, it hence follows that for N∈ℕ, if we set B_{N} = ⊔_{n=1}^{N} (a_{n},b_{n}), then
∫ g χ_{B_{N}} dµ = ∫ g χ_{(a_{1},b_{1})⊔⋯⊔(a_{N},b_{N})} dµ = 0.
Let B=∪_{n} (a_{n},b_{n}), so A⊆ B and µ(B) ≤ ∑_{n} (b_{n}−a_{n}) ≤ µ(A)+є. We then have that
| ∫ g χ_{B_{N}} dµ − ∫ g χ_{B} dµ | = | ∫ g χ_{B∖ ((a_{1},b_{1})⊔⋯⊔(a_{N},b_{N}))} dµ |.
We now apply Hölder’s inequality to get
| ∫ g χ_{B∖ B_{N}} dµ | ≤ ‖g‖_{q} ‖χ_{B∖ B_{N}}‖_{p} = ‖g‖_{q} µ(B∖ B_{N})^{1/p}.
We can make this arbitrarily small by making N large. Hence we conclude that
∫  g χ_{B} dµ=0. 
Then we apply Hölder’s inequality again to see that
| ∫ gχ_{A} dµ | = | ∫ gχ_{A} dµ − ∫ gχ_{B} dµ | = | ∫ g χ_{B∖ A} dµ | ≤ ‖g‖_{q} µ(B∖ A)^{1/p} ≤ ‖g‖_{q} є^{1/p}.
As є>0 was arbitrary, we see that ∫_{A} g dµ=0. As g is non-negative on A, we conclude that g=0 almost everywhere on A.
A similar argument applied to the set {x∈[0,1] : g(x)≤0} allows us to conclude that g=0 almost everywhere. If K=ℂ, then take real and imaginary parts.
Let K be a compact (always assumed Hausdorff) topological space.
Notice that if f:K→K is a continuous function (from the compact space K to the scalar field, which is denoted by the same letter in these notes), then clearly f is B(K)-measurable (the inverse image of an open set will be open, and hence certainly Borel). So if µ:B(K)→K is a finite real or complex charge (for K=ℝ or K=ℂ respectively), then f will be µ-summable (as f is bounded) and so we can define
φ_{µ}:C_{K}(K) → K, φ_{µ}(f) = ∫_{K} f dµ (f∈ C_{K}(K)).
Clearly φ_{µ} is linear. Suppose for now that µ is positive, so that
| φ_{µ}(f) | ≤ ∫_{K} | f | dµ ≤ ‖f‖_{∞} µ(K) (f∈ C_{K}(K)).
So φ_{µ}∈ C_{K}(K)^{*} with ‖φ_{µ}‖≤ µ(K).
The aim of this section is to show that all of C_{K}(K)^{*} arises in this way. First we need to define a class of measures which are in a good agreement with the topological structure.

Note the similarity between this notion and definition of outer measure.
µ(∅)=0, µ( 
 )=1, µ(A)=+∞, 
µ(A)=  ⎧ ⎨ ⎩ 

The following subspace of the space of all simple functions is helpful.
The regularity of the Lebesgue measure allows us to prove a stronger version of Prop. 12.
Proof. By Prop. 12, for a given f∈L_{1}(ℝ) and ε>0 there exists a simple function f_{0}=∑_{k=1}^{n} c_{k} χ_{A_{k}} such that ‖f−f_{0}‖_{1}<ε/2. By regularity of the Lebesgue measure, for every k there is an open set C_{k}⊃ A_{k} such that 0≤µ(C_{k})−µ(A_{k})<ε/(2n | c_{k} |). Clearly, C_{k}=⊔_{j} (a_{jk},b_{jk}). We define a step function f_{1}=∑_{k=1}^{n} c_{k} χ_{C_{k}}=∑_{k=1}^{n}∑_{j} c_{k} χ_{(a_{jk},b_{jk})}, then ‖f_{0}−f_{1}‖_{1}≤ ∑_{k=1}^{n} | c_{k} | ε/(2n | c_{k} |) =ε/2. Thus ‖f−f_{1}‖_{1}<ε.
As we are working only with compact spaces, for us, “compact” is the same as “closed”. Regular measures somehow interact “well” with the underlying topology on K.
We let M_{ℝ}(K) and M_{ℂ}(K) be the collection of all finite, regular real or complex charges (that is, signed or complex measures) on B(K).
Recall, Defn. 29, that for µ∈ M_{K}(K) we define the variation of µ
‖µ‖ = sup { ∑_{n=1}^{∞} | µ(A_{n}) | },
where the supremum is taken over all sequences (A_{n}) of pairwise disjoint members of B(K), with ⊔_{n} A_{n}=K. Such (A_{n}) are called partitions.
Proof. If µ=0 then clearly ‖µ‖=0. If ‖µ‖=0, then for A∈B(K), let A_{1}=A, A_{2}=K∖ A and A_{3}=A_{4}=⋯=∅. Then (A_{n}) is a partition, and so
0 = ∑_{n=1}^{∞} | µ(A_{n}) | = | µ(A) | + | µ(K∖ A) |.
Hence µ(A)=0, and so as A was arbitrary, we have that µ=0.
Clearly ‖aµ‖ = | a | ‖µ‖ for a∈K and µ∈ M_{K}(K).
For µ,λ∈ M_{K}(K) and a partition (A_{n}), we have that
∑_{n} | (µ+λ)(A_{n}) | = ∑_{n} | µ(A_{n})+λ(A_{n}) | ≤ ∑_{n} | µ(A_{n}) | + ∑_{n} | λ(A_{n}) | ≤ ‖µ‖ + ‖λ‖.
As (A_{n}) was arbitrary, we see that ‖µ+λ‖ ≤ ‖µ‖ + ‖λ‖.
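For a charge on a small finite set the supremum over partitions can be computed by brute force, which gives a concrete check of the norm properties just proved. The sketch below is an illustration: the four-point set and the point masses of µ and λ are hypothetical, and the helper names are not from the notes. Empty members of a partition contribute nothing, so only partitions into non-empty blocks are enumerated.

```python
def partitions(items):
    """Yield all partitions of a list into non-empty pairwise disjoint blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part

def variation(charge):
    """Supremum over partitions (A_n) of the sums of |charge(A_n)|."""
    points = list(charge)
    return max(sum(abs(sum(charge[x] for x in block)) for block in part)
               for part in partitions(points))

# Hypothetical charges on a four-point set, given by their point masses.
mu = {"a": 1.5, "b": -2.0, "c": 0.25, "d": -0.75}
lam = {"a": -1.0, "b": 0.5, "c": 3.0, "d": 0.0}
total = {x: mu[x] + lam[x] for x in mu}

print(variation(mu), variation(lam), variation(total))
```

In this finite setting the supremum is attained on the partition into single points, so the variation is the sum of the absolute values of the point masses.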
To get a handle on the “regular” condition, we need to know a little more about C_{K}(K).
Proof. See a book on (point set) topology.
µ(U) = sup { ∫_{K} f dµ : f∈ C_{ℝ}(K), 0≤ f≤χ_{U} }.
Proof. If 0≤ f≤χ_{U}, then
0 = ∫_{K} 0 dµ ≤ ∫_{K} f dµ ≤ ∫_{K} χ_{U} dµ = µ(U).
Conversely, let F=K∖ U, a closed set. Let E⊆ U be closed. By Urysohn Lemma 21, there exists f:K→[0,1] continuous with f(E)={1} and f(F)={0}. So χ_{E} ≤ f ≤ χ_{U}, and hence
µ(E) ≤ ∫_{K} f dµ ≤ µ(U).
As µ is regular,
µ(U) = sup { µ(E) : E⊆ U closed } ≤ sup { ∫_{K} f dµ : 0≤ f≤χ_{U} } ≤ µ(U).
Hence we have equality throughout.
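For the Lebesgue measure on K=[0,1] and an open interval U=(a,b), the Urysohn functions of this proof can be taken to be trapezoids, and the supremum can be watched directly. The sketch below is an illustration with hypothetical choices of a, b and of the ramp width h; it uses exact rational arithmetic, and the trapezoidal rule is exact here because all breakpoints of the piecewise linear functions lie on the grid.

```python
from fractions import Fraction

def trapezoid(a, b, h):
    """Continuous f with 0 <= f <= chi_(a,b), equal to 1 on the closed set [a+h, b-h]."""
    def f(x):
        if x <= a or x >= b:
            return Fraction(0)
        return min(Fraction(1), (x - a) / h, (b - x) / h)
    return f

def integral(f, n=1000):
    """Trapezoidal rule on [0, 1] with n equal steps."""
    xs = [Fraction(i, n) for i in range(n + 1)]
    return sum((f(xs[i]) + f(xs[i + 1])) / 2 * Fraction(1, n) for i in range(n))

# Hypothetical open set U = (1/5, 7/10), so that mu(U) = 1/2.
a, b = Fraction(1, 5), Fraction(7, 10)
hs = [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]
integrals = [integral(trapezoid(a, b, h)) for h in hs]
print(integrals)
```

Each integral equals µ(U)−h, so the values increase towards µ(U) as the closed sets [a+h, b−h] exhaust U.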
The next result tells that the variation coincides with the norm on
real charges viewed as linear functionals on C_{ℝ}(K).
‖µ‖ = ‖φ_{µ}‖ := sup { | ∫_{K} f dµ | : f∈ C_{ℝ}(K), ‖f‖_{∞}≤ 1 }.
Proof. Let (A,B) be a Hahn decomposition (Thm. 34) for µ. For f∈ C_{ℝ}(K) with ‖f‖_{∞}≤ 1, we have that

using the fact that µ(B)≤0 and that (A,B) is a partition of K.
Conversely, as µ is regular, for є>0, there exist closed sets E and F with E⊆ A, F⊆ B, and with µ_{+}(E)> µ_{+}(A)−є and µ_{−}(F)>µ_{−}(B)−є. By Urysohn Lemma 21, there exists f:K→[0,1] continuous with f(E)={1} and f(F)={0}. Let g=2f−1, so g is continuous, g takes values in [−1,1], and g(E)={1}, g(F)={−1}. Then

As E⊆ A, we have µ(E) = µ_{+}(E), and as F⊆ B, we have −µ(F)=µ_{−}(F). So

As є>0 was arbitrary, we see that ‖φ_{µ}‖ ≥ | µ(A) | + | µ(B) | =‖µ‖.
Thus, we know that M_{ℝ}(K) is
isometrically embedded in C_{ℝ}(K)^{*}.
To facilitate an approach to the key point of this Subsection we will require some more definitions.
Proof. For any real-valued function f such that ‖f‖_{∞}≤ 1 the function 1−f is non-negative, thus F(1)−F(f)=F(1−f)≥ 0. Thus F(1)≥ F(f), and applying the same argument to −f we get | F(f) | ≤ F(1); that is, F is bounded and its norm is F(1).
So for a positive functional you know the exact place where to spot its norm, while a general linear functional can attain its norm at a generic point (if any) of the unit ball in C(X). It is also remarkable that any bounded linear functional can be represented by a pair of positive ones.