Mathematics Prelims

December 22, 2008

Convergence in Probability/Measure

Filed under: Measure Theory,Probability Theory — cjohnson @ 11:05 pm

If each X_n : \Omega \to \mathbb{R} is a random variable on the probability space (\Omega, \mathcal{F}, P), then one way to define the convergence of the sequence of random variables (X_n)_{n \in \mathbb{N}} is to simply define the limit pointwise, as you would do with any old function.  That is, we say X_n converges pointwise to a random variable X if for every \omega \in \Omega we have

\displaystyle X_n(\omega) \to X(\omega)

Similarly, we say that X_n \to X almost everywhere (almost surely) if the measure (probability) of the set of points where X_n(\omega) \not\to X(\omega) is zero.

Another way to define convergence of measurable functions in general is to talk about convergence in measure.  Here we ask that, for each fixed tolerance \delta > 0, the measure of the set of points where X_n differs from X by more than \delta tends to zero.  That is, if for each \delta > 0,

\displaystyle P(\{\omega \in \Omega : |X_n(\omega) - X(\omega)| > \delta\}) \to 0

(Here we’re using random variables and a probability space, but the same definition works for measurable functions on general measure spaces.)  To say that X_n converges to X in probability (denoted X_n \xrightarrow{P} X) is to say that for any \delta > 0 and \epsilon > 0, there exists an N \in \mathbb{N} such that for every n > N, the probability of the set of points where X_n isn’t “close” to X (here “close” means within \delta of X) is less than \epsilon.
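To see the definition in action numerically, here is a minimal Python sketch (the particular random variables are an arbitrary choice for illustration): take X standard normal and X_n = X + Z_n/n with Z_n standard normal, so that |X_n - X| = |Z_n|/n, and estimate P(|X_n - X| > \delta) by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100_000                     # number of sample points omega
delta = 0.05

X = rng.standard_normal(m)      # X(omega)
for n in [1, 5, 25, 125]:
    Xn = X + rng.standard_normal(m) / n        # X_n = X + Z_n / n
    prob = np.mean(np.abs(Xn - X) > delta)     # estimate P(|X_n - X| > delta)
    print(f"n = {n:3d}   P(|X_n - X| > {delta}) ~ {prob:.4f}")
```

The printed probabilities shrink toward zero as n grows, which is exactly what X_n \xrightarrow{P} X requires.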

As noted in Cohn, convergence in measure does not imply convergence pointwise (not even almost everywhere!), or vice versa.  For instance, the sequence of functions f_n = 1_{[n, \infty)} on \mathbb{R} with Lebesgue measure converges pointwise to the zero function, but does not converge in measure, since for any \delta \in (0, 1) the set where |f_n| > \delta has infinite measure for every n.  However, in the case of a finite measure (e.g., a probability) we have the following proposition.

Proposition: If (\Omega, \mathcal{F}, P) is a measure space with P(\Omega) finite, and (X_n)_{n \in \mathbb{N}} is a sequence of measurable functions with X_n \to X a.e., then X_n \xrightarrow{P} X.

Proof: Let \delta > 0 be given.  Define A_n = \{ \omega \in \Omega : |X_n(\omega) - X(\omega)| > \delta \} and B_n = \bigcup_{k=n}^\infty A_k.  Clearly the B_n form a decreasing sequence whose intersection is a subset of the set of points \omega \in \Omega for which X_n(\omega) \not\to X(\omega); by assumption this set has measure zero, so P(\bigcap B_n) = 0.  Since P(\Omega) is finite, continuity from above gives P(B_n) \to P(\bigcap B_n) = 0, and as A_n \subseteq B_n we have P(A_n) \leq P(B_n) \to 0, which is exactly convergence in measure.

(These examples and proofs are from Donald L. Cohn’s “Measure Theory.”)

December 19, 2008

Expected Value of a Random Variable

Filed under: Measure Theory,Probability Theory — cjohnson @ 8:27 pm

If X : \Omega \to \mathbb{R} is a random variable on the probability space (\Omega, \mathcal{F}, P), then the expected value (aka expectation) of X is, in essence, a weighted average of the values X may take on.  In the case of simple random variables this weighted average is precisely the expectation.  For a general random variable we’re actually taking a limit of the expectations of simple random variables that converge to the random variable of interest.

Suppose X is a non-negative simple random variable of the form

\displaystyle X(\omega) = \sum_{i=1}^n \alpha_i 1_{A_i}(\omega)

where the A_i \in \mathcal{F} are disjoint and, for the sake of simplicity, \alpha_1 < \alpha_2 < ... < \alpha_n.  The expected value of X, denoted \mathbb{E}[X], is

\displaystyle \mathbb{E}[X] = \sum_{i=1}^n \alpha_i P(A_i).
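Since this is just a finite weighted sum, it’s trivial to compute; a quick Python sketch with made-up values and probabilities:

```python
# E[X] = sum_i alpha_i * P(A_i) for X = sum_i alpha_i * 1_{A_i}
# with the A_i disjoint.  Values and probabilities are illustrative.
alphas = [1.0, 2.0, 5.0]    # alpha_1 < alpha_2 < alpha_3
probs  = [0.5, 0.3, 0.2]    # P(A_1), P(A_2), P(A_3)

expectation = sum(a * p for a, p in zip(alphas, probs))
print(expectation)          # 1*0.5 + 2*0.3 + 5*0.2 = 2.1
```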

When X is a non-negative random variable that isn’t simple, we take \mathbb{E}[X] to be the supremum of the expectations of non-negative simple random variables less than or equal to X.  That is,

\displaystyle \mathbb{E}[X] = \sup \left\{ \mathbb{E}[Z] : 0 \leq Z \leq X \right\}

where each Z is a simple random variable.  When X is not non-negative, we break X into its positive and negative parts, X^{+} and X^{-}, and provided \mathbb{E}[X^{+}] and \mathbb{E}[X^{-}] are not both infinite, we take the expectation to be

\displaystyle \mathbb{E}[X] = \mathbb{E}[X^{+}] - \mathbb{E}[X^{-}].

Notice that our definition of \mathbb{E}[X] is simply the Lebesgue integral of X with respect to the probability measure P over all of \Omega.

\displaystyle \mathbb{E}[X] = \int_{\Omega} X \, dP

As such, useful properties of Lebesgue integrals, such as linearity, and powerful theorems like monotone and dominated convergence, hold for expectations.  One slightly less obvious nice property is that if F is the distribution function of X, then we have

\displaystyle \mathbb{E}[X] = \int_{\Omega} X \, dP = \int_{-\infty}^\infty x \, dF(x)

where the integral on the right refers to the Lebesgue-Stieltjes integral using the probability measure obtained from F, or simply the usual Riemann-Stieltjes integral in those cases that it exists.  To see this, simply show that the result holds for (non-negative) simple random variables, note that for a general non-negative random variable there exists an increasing sequence of non-negative simple random variables that converges to it, and apply the monotone convergence theorem.  For general random variables, break the function into its positive and negative parts and apply the previous result.
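The supremum definition can be watched converging with the standard dyadic approximation: Z_n = \lfloor 2^n X \rfloor / 2^n (with a truncation at n for unbounded X) is an increasing sequence of simple random variables with Z_n \to X.  A Python sketch for X uniform on [0, 1], where \mathbb{E}[X] = 1/2 (the choice of distribution is just for illustration):

```python
# Z_n = floor(2^n X) / 2^n is simple, 0 <= Z_n <= X, and Z_n increases to X,
# so E[Z_n] should increase to E[X] = 1/2 when X is uniform on [0, 1].
for n in range(1, 11):
    m = 2 ** n
    # P(Z_n = k/m) = P(k/m <= X < (k+1)/m) = 1/m for this X
    e_zn = sum((k / m) * (1 / m) for k in range(m))
    print(f"n = {n:2d}   E[Z_n] = {e_zn:.6f}")   # (m - 1) / (2m) -> 0.5
```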

November 1, 2008

Distribution of a Random Variable

Filed under: Measure Theory,Probability Theory — cjohnson @ 12:27 pm

Let (\Omega, \mathcal{F}, P) be a probability space and X : \Omega \to \mathbb{R} a random variable.  Define a function F(x) as the probability X takes on a value less than or equal to x.

\displaystyle F(x) := P(X^{-1}((-\infty, x]))

That is, F(x) is the measure of the set of \omega \in \Omega such that X(\omega) \leq x.  Notice that F(x) is an increasing function as a \leq b implies (-\infty, a] \subseteq (-\infty, b] and so

\displaystyle X^{-1}((-\infty, a]) = \{ \omega \in \Omega : X(\omega) \leq a \}

\displaystyle \, \, \subseteq \{ \omega \in \Omega : X(\omega) \leq b \}

\displaystyle \, \, = X^{-1}((-\infty, b])

And since a measure is monotonic (i.e., A \subseteq B \implies P(A) \leq P(B)), we have that F(a) \leq F(b).

Recall that if (E_n)_{n \in \mathbb{N}} is a decreasing sequence of measurable sets (i.e., E_n \supseteq E_{n+1}), then P\left(\bigcap_{n \in \mathbb{N}} E_n\right) = \lim_{n \to \infty} P(E_n), provided there exists an E_n with P(E_n) finite (in our case P is a probability measure, so this is given to us anyway).  We can use this fact to show that our F function is right-continuous.

Let (x_n)_{n \in \mathbb{N}} be a real-valued sequence that decreases to x.  Then we have that

\displaystyle \lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} P(X^{-1}((-\infty, x_n]))

\displaystyle = P \left( \bigcap_{n = 1}^\infty X^{-1}((-\infty, x_n])\right)

\displaystyle = P \left( X^{-1} \left( \bigcap_{n=1}^\infty (-\infty, x_n] \right) \right)

\displaystyle = P(X^{-1}((-\infty, x]))

\displaystyle = F(x)

This shows that \displaystyle \lim_{x \to x_0^+} F(x) = F(x_0) and so F is right-continuous.

Additionally, \lim_{x \to \infty} F(x) = 1 and \lim_{x \to -\infty} F(x) = 0.

The function F that we’ve described is known as the cumulative distribution function (or just distribution function) of our random variable X.  If F(x) only takes on countably many values, then we say that X is a discrete random variable.  If F(x) is a continuous function, we say X is a continuous random variable.
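As a concrete example, the distribution function of a fair six-sided die roll is a step function, and a few evaluations make the right-continuity at the jumps visible; a small Python sketch:

```python
import math

def die_cdf(x: float) -> float:
    """F(x) = P(X <= x) for X uniform on {1, ..., 6}."""
    return min(max(math.floor(x), 0), 6) / 6

print(die_cdf(2.0))      # 2/6: the jump at x = 2 is included (right-continuity)
print(die_cdf(1.9999))   # 1/6: just to the left of the jump
print(die_cdf(2.0001))   # 2/6: F is flat immediately to the right
```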

Now that we’ve seen how to construct this distribution function given a random variable X, the natural question to ask is the converse: if we have a function that satisfies the properties of a distribution function that we’ve listed, is there a corresponding random variable?

Suppose that F(x) is an increasing, right-continuous function with \lim_{x \to -\infty} F(x) = 0 and \lim_{x \to \infty} F(x) = 1.  Using the ideas from the Lebesgue-Stieltjes measure article, we have that F(x) gives us a measure and sigma-algebra on \mathbb{R}.  Let P_F and \mathcal{F}_F be the measure and sigma-algebra, respectively, that we get from F(x).  It follows from the fact that \lim_{x \to \infty} F(x) = 1 and \lim_{x \to -\infty} F(x) = 0 that P_F(\mathbb{R}) = 1, and so (\mathbb{R}, \mathcal{F}_F, P_F) is a probability space.

Define X : \mathbb{R} \to \mathbb{R} as x \mapsto x.  Note that this is a random variable as

\displaystyle X^{-1}((-\infty, x]) = (-\infty, x]

To see that X is a measurable function (random variable), fix t \in \mathbb{R}; we show that (-\infty, t] belongs to \mathcal{F}_F.  Let A \subseteq \mathbb{R} and \epsilon > 0.  There exists a sequence (I_n)_{n \in \mathbb{N}} of intervals that covers A and

\displaystyle P_F^*(A) \leq \sum_{n=1}^\infty \ell_F(I_n) \leq P_F^*(A) + \epsilon

Define J_n := I_n \cap (-\infty, t] and K_n := I_n \cap (t, \infty); note that \bigcup_n J_n and \bigcup_n K_n cover A \cap (-\infty, t] and A \cap (t, \infty), respectively.  Also notice that \ell_F(I_n) = \ell_F(J_n) + \ell_F(K_n).  This gives us

\displaystyle P_F^*(A) \leq \sum_{n=1}^\infty \ell_F(J_n) + \sum_{n=1}^\infty \ell_F(K_n) \leq P_F^*(A) + \epsilon

\displaystyle \implies P_F^*(A) \leq P_F^*(A \cap (-\infty, t]) + P_F^*(A \cap (t, \infty)) \leq P_F^*(A) + \epsilon

\displaystyle \implies P_F^*(A) = P_F^*(A \cap (-\infty, t]) + P_F^*(A \cap (-\infty, t]^\complement)

since \epsilon was arbitrary.  So (-\infty, t] \in \mathcal{F}_F, and X is a measurable function (random variable).  This means that any function F which is increasing, right-continuous, and has \lim_{x \to -\infty} F(x) = 0 and \lim_{x \to \infty} F(x) = 1 is the distribution function of some random variable.
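A computational cousin of this construction is inverse transform sampling: if U is uniform on (0, 1), then the generalized inverse F^{-1}(U) has distribution F.  A Python sketch with the (illustrative) choice F(x) = 1 - e^{-x} for x \geq 0:

```python
import numpy as np

rng = np.random.default_rng(0)

# F(x) = 1 - exp(-x) on [0, oo), so F^{-1}(u) = -log(1 - u).
u = rng.uniform(size=200_000)
x = -np.log(1.0 - u)             # samples distributed according to F

# Sanity check: the fraction of samples <= t should approximate F(t).
for t in [0.5, 1.0, 2.0]:
    print(f"t = {t}:  empirical {np.mean(x <= t):.4f}   F(t) = {1 - np.exp(-t):.4f}")
```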

October 27, 2008

The Lebesgue-Stieltjes Measure

Filed under: Analysis,Measure Theory — cjohnson @ 5:19 pm

Consider an increasing, right-continuous function F : \mathbb{R} \to \mathbb{R}.  We can measure the length of an interval I in \mathbb{R} with end points a and b (e.g., {}[a, b]) as

\displaystyle \ell_F(I) = F(b) - \lim_{x \to a^-} F(x).

(Capinski uses \ell_F(I) = F(b) - F(a) with the interval I restricted to being of the form (a, b], but I believe this gives the same measure in the end.)

Using this definition of the length of an interval, we can then construct an outer measure on \mathbb{R}, call it \mu_F^*, as follows.

\displaystyle \mu_F^*(A) = \inf\left\{ \sum_{n=1}^\infty \ell_F(I_n) : A \subseteq \bigcup_{n=1}^\infty I_n \right\}

Where each I_n is a bounded interval.  Proceeding as we would in defining the usual Lebesgue measure on \mathbb{R}, we will let \mu_F be a measure on \mathcal{M}_F where

\displaystyle \mathcal{M}_F = \{ E \subseteq \mathbb{R} : \forall A \subseteq \mathbb{R}, \, \mu_F^*(A) = \mu_F^*(A \cap E) + \mu_F^*(A \cap E^\complement) \}

Now we’ve gone from an increasing, right-continuous function to a measure on \mathbb{R}.  Note that sets that were null under the Lebesgue measure may not be null anymore, depending on our choice of F.  For instance, if we have

\displaystyle F(x) = \left\{ \begin{array}{ll} x & : x < 0 \\ 1 + x &: x \geq 0 \end{array}\right.

Then \mu_F(\{0\}) = 1, though with the standard Lebesgue measure we have \mu(\{0\}) = 0.
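A quick numerical check of that example: the F-length \ell_F([-1/n, 1/n]) = F(1/n) - \lim_{x \to (-1/n)^-} F(x) should shrink down to the jump size 1 as n grows, matching \mu_F(\{0\}) = 1.  A Python sketch:

```python
def F(x: float) -> float:
    return x if x < 0 else 1.0 + x    # jump of size 1 at x = 0

# ell_F([a, b]) = F(b) - lim_{x -> a^-} F(x); this F is continuous at
# every a < 0, so the left limit there is just F(a).
for n in [1, 10, 100, 1000]:
    a, b = -1.0 / n, 1.0 / n
    print(f"ell_F([{a:8.4f}, {b:.4f}]) = {F(b) - F(a):.6f}")   # -> 1
```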

It will be convenient to adopt the convention that, if F is an increasing, right-continuous function, then

\displaystyle \int_E g \, dF

is actually short-hand for the Lebesgue integral of g over E using the measure obtained from F as we’ve described above.  This is normally referred to as the Lebesgue-Stieltjes integral with integrator F.

October 26, 2008

Random Variables

Filed under: Measure Theory,Probability Theory,Topology — cjohnson @ 8:32 pm

If (\Omega, \mathcal{F}, P) is a probability space and (Y, \Sigma) is a measurable space (i.e., a set Y along with a sigma-algebra \Sigma on Y), then a random variable is a measurable function X : \Omega \to Y.  That is, for each A \in \Sigma, we have X^{-1}(A) = \{ \omega \in \Omega : X(\omega) \in A \} \in \mathcal{F}.  Generally speaking, we’ll be taking (Y, \Sigma) to be (\mathbb{R}, \mathcal{B}) where \mathcal{B} is the Borel algebra on \mathbb{R}.

Given a topological space (S, \tau), there exists a sigma-algebra \mathcal{B}(S), called the Borel algebra on S, that contains all open sets (members of \tau) and is the smallest such sigma-algebra.  This means that for each \mathcal{O} \in \tau we have \mathcal{O} \in \mathcal{B}(S) and also if \Sigma is another sigma-algebra on S with this property, then \mathcal{B}(S) \subseteq \Sigma.  In general, given a collection C of subsets of S, there exists a sigma-algebra, which we’ll call \mathcal{F}_C, that is the smallest sigma-algebra on S (smallest using \subseteq as the ordering relation) containing each c \in C; we say that \mathcal{F}_C is the sigma-algebra generated by C.  In this sense, the Borel algebra on S is the sigma-algebra generated by the topology \tau.  We will usually just write \mathcal{B} to mean \mathcal{B}(\mathbb{R}).

In the special case of (\mathbb{R}, \mathcal{B}), there’s an easy way to check to see that a function X : \Omega \to \mathbb{R} is a random variable (or a measurable function in general): we just look at the pre-images of intervals.  Since the pre-images of functions are well-behaved with respect to set operations like union, intersection, and complement, it in fact suffices to only consider pre-images of the form {}[a, \infty).  That is, if we show that X^{-1}([a, \infty)) \in \mathcal{F} for every a \in \mathbb{R}, we will have shown that X is measurable, and so a random variable.  (Actually, we can look at all intervals of the form (a, \infty), {}[a, \infty), (-\infty, a) or (-\infty, a].  Using properties of sigma-algebras we can easily show that if we have all intervals of any of these forms, we have all intervals of any other form.  Again, using properties of sigma-algebras it’s easy to take that and show that we have all countable unions of intervals — namely all countable unions of open intervals, i.e., all open sets.)

To see this, suppose for every a \in \mathbb{R} we have X^{-1}([a, \infty)) \in \mathcal{F}.

\displaystyle X^{-1}([a, \infty)) \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) \geq a \} \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) \geq a \}^\complement \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) < a \} \in \mathcal{F}

\displaystyle \implies X^{-1}((-\infty, a)) \in \mathcal{F}

So now we have the pre-images of all intervals of the form {}[a, \infty) and (-\infty, a).  If we can also get pre-images of intervals of the form (a, \infty), it’ll be an easy jump to countable unions of open intervals.  Note that

\displaystyle (a, \infty) = \bigcup_{n=1}^\infty \left[ a + \frac{1}{n}, \infty \right)

Now we easily see that

\displaystyle X^{-1}((a, \infty)) = X^{-1} \left( \bigcup_{n=1}^\infty \left[ a + \frac{1}{n}, \infty \right)\right)

\displaystyle = \bigcup_{n=1}^\infty X^{-1}\left(\left[ a + \frac{1}{n}, \infty \right)\right)

And we have pre-images of intervals of the form (a, \infty).  Combined with the fact we have intervals of the form (-\infty, b), it’s easy to see that we also have intervals of the form (a, b).  Using properties of sigma-algebras, it’s easy to show now that we have the pre-image of any open set.  This tells us that if X^{-1}([a, \infty)) \in \mathcal{F} for every a \in \mathbb{R}, then we have that X is measurable, and so a random variable.

In the case that \Omega is a countable set, we take \mathcal{F} to be 2^\Omega (the powerset of \Omega), and so any function X : \Omega \to \mathbb{R} is a random variable.  This is because the pre-image of {}[a, \infty) is always some subset of \Omega (even if it’s empty); we have for every a \in \mathbb{R} that X^{-1}([a, \infty)) \in 2^\Omega, and so every function on a countable sample space is a random variable.
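On a finite sample space the pre-image condition can be checked exhaustively; here’s a small Python sketch with a made-up \Omega and X:

```python
# With F = 2^Omega, every X is measurable: X^{-1}([a, oo)) is always
# *some* subset of Omega, hence a member of the power set.
omega = ["a", "b", "c", "d"]
X = {"a": 1.0, "b": 2.5, "c": 2.5, "d": 4.0}   # an arbitrary function

def preimage(a: float) -> set:
    """X^{-1}([a, oo)) = {omega : X(omega) >= a}."""
    return {w for w in omega if X[w] >= a}

for a in [0.0, 2.0, 3.0, 5.0]:
    print(f"X^-1([{a}, oo)) = {preimage(a)}")   # last one is the empty set
```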

October 25, 2008

Independence and Conditional Probability

Filed under: Measure Theory,Probability Theory — cjohnson @ 6:39 pm

Suppose that (\Omega, \mathcal{F}, P) is a measure space, A a measurable set with P(A) \in (0, \infty).  We can create a new measure space (A, \mathcal{F}_A, P_A) where

\displaystyle \mathcal{F}_A = \{ S \cap A : S \in \mathcal{F} \}

\displaystyle P_A(E) = \frac{P(E)}{P(A)} \quad \text{for } E \in \mathcal{F}_A

Note that (A, \mathcal{F}_A, P_A) is a probability space, as P_A(A) = 1, and any other set E \in \mathcal{F}_A is a subset of A.

Supposing (\Omega, \mathcal{F}, P) is a probability space, we can use this new probability space in our definition of conditional probability.  The probability P_A(E) represents the probability of E occurring, where we already know A has occurred.  Normally, instead of going through the trouble of writing out a new sigma-algebra and probability measure each time, we simply take P(B|A) to be the probability of B \cap A using the P_A measure defined above.  Of course, our measure and sigma-algebra are so simple that we can just write this in one line as

\displaystyle P(B|A) = \frac{P(B \cap A)}{P(A)}

We call this the probability of B given A.  Now if P(B|A) = P(B), we say that A and B are independent events.  If this is the case then we have

\displaystyle P(B) = \frac{P(B \cap A)}{P(A)}

\displaystyle \implies P(A) P(B) = P(B \cap A)

This is certainly a useful property as it makes proofs of interesting facts fall out easily when we consider sequences of independent random variables (a related idea) later.

Now consider the fact that P(A|B) = \frac{P(A \cap B)}{P(B)} implies P(A \cap B) = P(A|B)P(B).  Plugging into the formula for P(B|A) we arrive at the following, known as Bayes’ theorem.

\displaystyle P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A|B) P(B)}{P(A)}

Note that P(A|B) \neq P(B|A).
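All of this is easy to sanity-check on a finite sample space.  A Python sketch with a fair die, taking A = “roll is even” and B = “roll is at least 5” (an illustrative choice that happens to make A and B independent):

```python
from fractions import Fraction

def P(event):                        # P(E) = |E| / 6 for a fair die
    return Fraction(len(event), 6)

A = {2, 4, 6}                        # roll is even
B = {5, 6}                           # roll is at least 5

P_B_given_A = P(A & B) / P(A)        # (1/6) / (1/2) = 1/3
P_A_given_B = P(A & B) / P(B)        # (1/6) / (1/3) = 1/2
print(P_B_given_A, P_A_given_B)      # 1/3 and 1/2 -- not equal!
print(P_B_given_A == P(B))           # True: A and B are independent
print(P_A_given_B * P(B) / P(A))     # Bayes' theorem recovers P(B|A) = 1/3
```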

October 24, 2008

Probability Spaces

Filed under: Measure Theory,Probability Theory — cjohnson @ 1:03 am

Recall that given a set \Omega, a \sigma-algebra on \Omega is a collection of subsets of \Omega, call it \mathcal{F}, that satisfies the following properties.

  • \displaystyle \emptyset, \Omega \in \mathcal{F}
  • \displaystyle A \in \mathcal{F} \implies A^\complement \in \mathcal{F}
  • \displaystyle (A_n)_{n \in \mathbb{N}} \subseteq \mathcal{F} \implies \bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}

A function P : \mathcal{F} \to [0, \infty] is called a measure if it satisfies the properties

  • \displaystyle A \subseteq B \implies P(A) \leq P(B)
  • \displaystyle (A_n)_{n \in \mathbb{N}} \subseteq \mathcal{F}, \; A_i \cap A_j = \emptyset \text{ for } i \neq j \implies P\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n \in \mathbb{N}} P(A_n)
  • \displaystyle P(\emptyset) = 0

The triple (\Omega, \mathcal{F}, P) is then called a measure space.  In the event that P(\Omega) = 1 we say that our triple is in fact a probability space.  This is a formalization of our intuitive ideas of what a “probability space” should be — we have a set of all things that could conceivably happen (a sample space), a family of subsets of items from that sample space (events), and a way to assign a numerical value (a probability) to each of those events.  The properties of a measure, in particular, aren’t particularly surprising in this context.  If we have one event contained in another event, we’d expect the probability of the larger event to be greater than the probability of the smaller event; if we have a sequence of disjoint events, the probability at least one of them occurs is the sum of their probabilities; the probability something happens is one, and the probability nothing happens is zero.

One especially useful property of measures in general is that if A and B are measurable sets with B \subseteq A and A has finite measure, then P(B) = P(A) - P(A \setminus B).  When dealing with probabilities, we’re given that P(\Omega) = 1, so for any A \in \mathcal{F} we have P(A^\complement) = P(\Omega) - P(\Omega \setminus A^\complement) = 1 - P(A).

The traditional first examples of probability theory, dice, coins, and cards, are easy to express in these more formal terms.  A roll of a standard six sided die, for instance, has \Omega = \{1, 2, 3, 4, 5, 6\}, \mathcal{F} = 2^{\Omega} and P(A) = \frac{|A|}{6}, where 2^{\Omega} denotes the power set (the collection of all subsets) of \Omega and |A| is the cardinality of A.  In general, given a finite sample space, the power set of \Omega defines a \sigma-algebra, and P(A) = \frac{|A|}{|\Omega|} gives a measure where all events of the same size (cardinality) are equally likely.
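Since 2^\Omega has only 64 members here, the measure properties can be verified exhaustively; a Python sketch:

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({1, 2, 3, 4, 5, 6})
def P(event):
    return Fraction(len(event), len(omega))

# Enumerate the power set 2^Omega (all 64 events).
events = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

# Monotonicity, and additivity over disjoint pairs.
assert all(P(a) <= P(b) for a in events for b in events if a <= b)
assert all(P(a | b) == P(a) + P(b)
           for a in events for b in events if not (a & b))
print(P(omega), P(frozenset()))      # 1 and 0
```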

October 22, 2008

Total and Bounded Variation

Filed under: Analysis,Real Analysis — cjohnson @ 5:57 pm

The total variation of a function f over an interval \left[a, b\right], which we’ll denote V_{[a, b]}(f), describes how much the function varies over that interval.  That is, suppose P = \{a = x_0 < x_1 < ... < x_n = b\} is a partition of \left[a, b\right].  With respect to this particular P, the variation is simply

\displaystyle \sum_{k=1}^n |f(x_k) - f(x_{k-1})|.

The total variation of f is then the supremum of sums such as the above, but over all partitions of \left[a, b \right]:

\displaystyle V_{[a, b]}(f) = \sup_{P \vdash [a, b]} \sum_{k=1}^n |f(x_k) - f(x_{k-1})|.

In the case of f : \mathbb{R} \to \mathbb{C}, with f continuous, V_{[a, b]}(f) gives the length of the curve in \mathbb{C} defined by f.  We take the interval, break it into lots of little pieces, then look at the corresponding pieces of the curve in the complex plane.  As the absolute value in the complex plane gives the distance between two points, we’re basically connecting the dots on our curve and measuring the lengths of the connecting line segments.  As we make our partition finer and finer, the pieces of the curve get smaller and smaller, giving a better approximation to the length of the curve.  When we take the supremum, we’re getting the “best” approximation (the true value).

Notice that the total variation doesn’t give us the length of a curve for f : \mathbb{R} \to \mathbb{R}, though.  For instance, consider the interval [0, 1] and the function f(x) = x.  Obviously the length of the curve in \mathbb{R}^2 is \sqrt{2}.  However, if we partition the interval, the terms |f(x_k) - f(x_{k-1})| only give the vertical component of the length.  Summing these terms over any partition we get 1, not \sqrt{2}.

We say that f is of bounded variation if its total variation is finite.  That is, if there exists an M \in \mathbb{R} such that for any partition P of \left[a, b\right] we choose,

\displaystyle \sum_{k=1}^n |f(x_k) - f(x_{k-1})| \leq M.

Update: We have to be a little bit careful with the discussion of the length of the curve in the above.  A better way to state it is in terms of the trace of the curve (the range of the function).  If the function is continuous then the total variation gives the length of the trace, which in the case f : \mathbb{R} \to \mathbb{R} is the length “travelled” along the path when projected to the y axis.  There’s a nice animation of this idea on Wikipedia’s page on total variation.
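For continuously differentiable f the total variation over [a, b] equals \int_a^b |f'(x)| \, dx, which gives an easy numerical check: sums over finer and finer uniform partitions should climb toward that integral.  A Python sketch with f = \sin on [0, 2\pi], where the total variation is 4:

```python
import math

def variation(f, a, b, n):
    """Sum |f(x_k) - f(x_{k-1})| over a uniform partition with n pieces."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(f(xs[k]) - f(xs[k - 1])) for k in range(1, n + 1))

for n in [10, 100, 1000, 10_000]:
    v = variation(math.sin, 0.0, 2 * math.pi, n)
    print(f"n = {n:6d}   V ~ {v:.6f}")   # increases toward 4
```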

October 19, 2008

The Riemann-Stieltjes Integral

Filed under: Analysis,Calculus,Real Analysis — cjohnson @ 10:58 am

The technique of integration that is generally taught in a second semester calculus course is called Riemann integration.  This is given by taking a closed, bounded interval \left[a, b\right] and partitioning it with a set of points P = \{x_0, x_1, ..., x_n\} where a = x_0 < x_1 < ... < x_{n-1} < x_n = b.  Given a function f : [a, b] \to \mathbb{R} we then consider sums of the form

\displaystyle \sum_{k=1}^n f(y_k) [x_k - x_{k-1}]

where y_k \in [x_{k-1}, x_k].  Sums such as this are called Riemann sums.  In particular, we’d like to consider Riemann sums that use the largest and smallest values of f over each sub-interval of the partition (more precisely the supremum and infimum, since a maximum or minimum may not be attained).  Defining M_k and m_k as follows, we can then do this easily.

\displaystyle M_k = \sup_{x \in [x_{k-1}, x_k]} f(x)

\displaystyle m_k = \inf_{x \in [x_{k-1}, x_k]} f(x)

We can then define the upper and lower Riemann sums of f with respect to the partition P, denoted U(P, f) and L(P, f), respectively, as follows.

\displaystyle U(P, f) = \sum_{k=1}^n M_k [x_k - x_{k-1}]

\displaystyle L(P, f) = \sum_{k=1}^n m_k [x_k - x_{k-1}]

Now, we can take the infimum and supremum of these over all partitions of \left[a, b\right] to get the upper and lower Riemann integrals of f over the interval \left[a, b\right]:

\displaystyle \overline{\int_a^b f} = \inf_{P \vdash [a, b]} U(P, f)

\displaystyle \underline{\int_a^b f} = \sup_{P \vdash [a, b]} L(P, f)

(Where P \vdash [a, b] means that P is a partition of \left[a, b\right].)

In the event that \overline{\int_a^b f} = \underline{\int_a^b f}, we say that the function f is Riemann integrable, and simply write \int_a^b f for this common value.

While the Riemann integral is certainly a useful tool, it has some severe restrictions.  It is only defined for bounded intervals, but that is easily fixed by taking a limit as one (or both) of the endpoints goes to infinity.  There are some serious problems with having a sequence of functions, as the integral of the limit may not equal the limit of the integrals.  In those cases we have to either impose some pretty severe restrictions on how the sequence converges (i.e., require uniform convergence), or use more advanced tools from measure theory (namely the monotone and dominated convergence theorems with the Lebesgue integral).

Another less serious limitation is that it’s not immediately clear how to extend the Riemann integral to allow us to integrate in other spaces, namely over \mathbb{R}^2 or \mathbb{C}.  An important, though very simple, extension of the Riemann integral that can help us rectify those problems (as well as make notation in probability theory a bit more compact) by letting us consider contour integrals is the Riemann-Stieltjes integral.

The Riemann-Stieltjes integral is defined almost exactly like the Riemann integral is above, except that instead of multiplying by the factor \left[x_k - x_{k-1}\right] in our Riemann sum, we multiply by \left[g(x_k) - g(x_{k-1})\right].  That is, given two functions f, g : [a, b] \to \mathbb{R} we can define,

\displaystyle U(P, f, g) = \sum_{k=1}^n M_k [g(x_k) - g(x_{k-1})]

\displaystyle L(P, f, g) = \sum_{k=1}^n m_k [g(x_k) - g(x_{k-1})]

And we define the upper and lower integrals of f with respect to g as

\displaystyle \overline{\int_a^b f \, dg} = \inf_{P \vdash [a, b]} U(P, f, g)

\displaystyle \underline{\int_a^b f\, dg} = \sup_{P \vdash [a, b]} L(P, f, g)

Again, if these values coincide, we refer to this value as \int_a^b f \, dg.  We call f the integrand and g the integrator.
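Here’s a Python sketch of the upper and lower sums on uniform partitions, for the (illustrative) choice f(x) = x with integrator g(x) = x^2 on [0, 1], where \int_0^1 x \, d(x^2) = \int_0^1 2x^2 \, dx = 2/3; taking g(x) = x instead recovers the ordinary Riemann sums.  Since this f is increasing, the supremum and infimum on each subinterval sit at its endpoints, which keeps the code short:

```python
def upper_lower_rs(f, g, a, b, n):
    """U(P, f, g) and L(P, f, g) on a uniform partition, assuming f is
    increasing so that M_k = f(x_k) and m_k = f(x_{k-1})."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    upper = sum(f(xs[k]) * (g(xs[k]) - g(xs[k - 1])) for k in range(1, n + 1))
    lower = sum(f(xs[k - 1]) * (g(xs[k]) - g(xs[k - 1])) for k in range(1, n + 1))
    return upper, lower

for n in [10, 100, 1000]:
    u, l = upper_lower_rs(lambda x: x, lambda x: x * x, 0.0, 1.0, n)
    print(f"n = {n:4d}   L = {l:.6f}   U = {u:.6f}")   # both tend to 2/3
```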

Of course, now we may ask if the Riemann-Stieltjes integral has all of the properties of the traditional Riemann integral, and what new properties it may have that the Riemann integral does not.  One property that’s easy to check, though, is that of linearity.

Thanks to properties of the supremum and infimum, we know that if \alpha \geq 0 is a constant and S is a set of real numbers, then \sup (\alpha S) = \alpha \sup S (for \alpha < 0 the supremum and infimum trade places, but the same argument works).  Carrying this into our definition of the Riemann-Stieltjes integral, we have that if \alpha is a constant, and f, g are functions such that \int_a^b f \, dg exists, then \int_a^b (\alpha f) \, dg = \alpha \int_a^b f \, dg.

For sums of functions we have to be slightly careful: on each subinterval \sup_x (f(x) + h(x)) \leq \sup_x f(x) + \sup_x h(x) and \inf_x (f(x) + h(x)) \geq \inf_x f(x) + \inf_x h(x), so for an increasing integrator g we get U(P, f + h, g) \leq U(P, f, g) + U(P, h, g) and L(P, f + h, g) \geq L(P, f, g) + L(P, h, g).  Taking the infimum and supremum over all partitions (using common refinements) then squeezes the upper and lower integrals together, giving \int_a^b (f + h) \, dg = \int_a^b f \, dg + \int_a^b h \, dg whenever \int_a^b f \, dg and \int_a^b h \, dg both exist.

July 21, 2008

The Banach Fixed Point Theorem

Filed under: Analysis,Functional Analysis — cjohnson @ 10:06 pm

A mapping T : X \to X from a metric space X to itself is called a contraction if there exists an \alpha \in (0, 1) such that for every x, y \in X we have d(Tx, Ty) \leq \alpha d(x, y).  The Banach fixed point theorem (aka the contraction theorem) says that every contraction on a non-empty complete metric space has exactly one fixed point.

Proof:  Suppose X is a complete metric space and that T : X \to X is a contraction.  Let x_0 \in X be any point.  Define x_1 = T x_0, x_2 = T x_1 = T^2 x_0, x_3 = T x_2 = T^3 x_0 and so forth.  Writing d(x_{n+1}, x_n) in terms of Tx_n and Tx_{n-1}, we see that d(x_{n+1}, x_n) \leq \alpha \, d(x_n, x_{n-1}), and iterating this down to x_0 gives d(x_{n+1}, x_n) \leq \alpha^n d(x_1, x_0).  Now consider d(x_n, x_m) where we assume, without loss of generality, that n > m.  By the triangle inequality we have

\displaystyle d(x_n, x_m) \leq (\alpha^{n-1} + \alpha^{n-2} + ... + \alpha^m) d(x_1, x_0)

\displaystyle \qquad = \frac{\alpha^m (1 - \alpha^{n - m})}{1 - \alpha} d(x_1, x_0)

\displaystyle \qquad \leq \frac{\alpha^m}{1 - \alpha} d(x_0, x_1)

We can make this value as small as we’d like by picking a large enough m, so the sequence (x_n)_{n \in \mathbb{N}} is Cauchy and, since X is complete, must converge.  Call the limit of this sequence x.  Now consider the distance between x and Tx.

\displaystyle d(x, Tx) \leq d(x, x_m) + d(x_m, Tx) \leq d(x, x_m) + \alpha d(x_{m-1}, x)

By picking a large enough m, we can make this as small as we’d like as well, so we have d(x, Tx) = 0 so T has a fixed point.  For the uniqueness of x, suppose y is also a fixed point.

\displaystyle d(x, y) = d(Tx, Ty) \leq \alpha d(x, y)

The only way this holds is if d(x, y) = 0, so x = y.  Note that the x we constructed didn’t depend on our initial value of x_0, so every sequence we construct by picking a point in X and iterating T will result in the same limit.
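The proof is also an algorithm: pick any starting point and iterate T until successive terms settle down.  A Python sketch using T = \cos (which maps \mathbb{R} into [-1, 1] and is a contraction there with \alpha = \sin 1 < 1), together with the a priori error bound \frac{\alpha^m}{1 - \alpha} d(x_0, x_1) read off from the proof:

```python
import math

def banach_iterate(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = T(x_n) until successive terms are within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x = banach_iterate(math.cos, 5.0)
print(x, math.cos(x) - x)          # fixed point ~ 0.7390851, residual ~ 0

# A priori bound: d(x_m, x) <= alpha^m / (1 - alpha) * d(x_0, x_1).
alpha, x0 = math.sin(1.0), 0.0
x1, xm = math.cos(x0), x0
for _ in range(20):
    xm = math.cos(xm)
print(abs(xm - x) <= alpha ** 20 / (1 - alpha) * abs(x1 - x0))   # True
```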
