Mathematics Prelims

December 22, 2008

Convergence in Probability/Measure

Filed under: Measure Theory,Probability Theory — cjohnson @ 11:05 pm

If each X_n : \Omega \to \mathbb{R} is a random variable on the probability space (\Omega, \mathcal{F}, P), then one way to define convergence of the sequence (X_n)_{n \in \mathbb{N}} is simply pointwise, as you would do with any old function.  That is, if there exists a random variable X such that for every \omega \in \Omega we have

\displaystyle X_n(\omega) \to X(\omega)

then we say that X_n converges to X pointwise.

Similarly, we say that X_n \to X almost everywhere (almost surely) if the measure (probability) of the set of points where X_n(\omega) \not\to X(\omega) is zero.

Another way to define convergence of measurable functions in general is to talk about convergence in measure.  Here, instead of looking at individual points, we require that for each fixed tolerance \delta the measure of the set of points where X_n differs from X by more than \delta shrinks to zero.  That is, X_n converges to X in measure if for each \delta > 0,

\displaystyle P(\{\omega \in \Omega : |X_n(\omega) - X(\omega)| > \delta\}) \to 0

(Here we’re using random variables and a probability space, but the same definition applies to measurable functions on general measure spaces.)  To say that X_n converges to X in probability (denoted X_n \xrightarrow{P} X) is to say that for any \epsilon > 0 there exists an N \in \mathbb{N} such that for every n > N, the measure of the set of points where X_n isn’t “close” to X (here “close” means within \delta of X) is less than \epsilon.
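
To make the definition concrete, here is a minimal simulation sketch (my own illustration, not from Cohn, assuming NumPy is available): take X_n to be the mean of n fair coin flips, which converges in probability to 1/2 by the weak law of large numbers, and estimate P(|X_n - 1/2| > \delta) by Monte Carlo.

```python
import numpy as np

# Illustration (my own example): X_n = mean of n fair coin flips converges
# in probability to 1/2 by the weak law of large numbers.
rng = np.random.default_rng(0)
delta = 0.05
trials = 100_000

for n in [10, 100, 1_000, 10_000]:
    # Each trial draws one realization of X_n = (number of heads) / n.
    X_n = rng.binomial(n, 0.5, size=trials) / n
    # Monte Carlo estimate of P(|X_n - 1/2| > delta), which should shrink to 0.
    print(n, np.mean(np.abs(X_n - 0.5) > delta))
```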

As noted in Cohn, convergence in measure does not imply convergence pointwise (not even almost everywhere!), or vice versa.  For instance, the sequence of functions f_n = 1_{[n, \infty)} on \mathbb{R} with Lebesgue measure converges pointwise to the zero function, but does not converge in measure.   However, in the case of a finite measure (e.g., a probability measure) we have the following proposition.

Proposition: If (\Omega, \mathcal{F}, P) is a measure space with P(\Omega) finite and (X_n)_{n \in \mathbb{N}} is a sequence of measurable functions with X_n \to X a.e., then X_n \xrightarrow{P} X.

Proof: Let \delta > 0 be given.  Define A_n = \{ \omega \in \Omega : |X_n(\omega) - X(\omega)| > \delta \} and B_n = \bigcup_{k=n}^\infty A_k.  Clearly (B_n) is a decreasing sequence of sets whose intersection is contained in the set of points \omega \in \Omega for which X_n(\omega) \not\to X(\omega); by assumption this set has measure zero, so P\left(\bigcap_n B_n\right) = 0.  Since P(\Omega) is finite, continuity from above gives P(B_n) \to P\left(\bigcap_n B_n\right) = 0, and since A_n \subseteq B_n we have P(A_n) \leq P(B_n) \to 0, which is exactly what it means for X_n to converge in measure to X.

(These examples and proofs are from Donald L. Cohn’s “Measure Theory.”)

December 19, 2008

Expected Value of a Random Variable

Filed under: Measure Theory,Probability Theory — cjohnson @ 8:27 pm

If X : \Omega \to \mathbb{R} is a random variable on the probability space (\Omega, \mathcal{F}, P), then the expected value (aka expectation) of X is, in essence, a weighted average of the values X may take on.  In the case of simple random variables this weighted average is precisely the expectation.  For a general random variable we’re actually taking a limit of the expectations of simple random variables that converge to the random variable of interest.

Suppose first that X is a non-negative simple random variable of the form

\displaystyle X(\omega) = \sum_{i=1}^n \alpha_i 1_{A_i}(\omega)

where the A_i \in \mathcal{F} are disjoint and, for the sake of simplicity, \alpha_1 < \alpha_2 < \cdots < \alpha_n.  The expected value of X, denoted \mathbb{E}[X], is

\displaystyle \mathbb{E}[X] = \sum_{i=1}^n \alpha_i P(A_i).
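
As a tiny concrete instance of this formula (my own example, not from the post): a fair six-sided die is a simple random variable with \alpha_i = i and P(A_i) = 1/6, so the sum can be evaluated directly.

```python
# Minimal sketch: expectation of a simple random variable X = sum_i alpha_i * 1_{A_i},
# here a fair six-sided die with alpha_i = i and P(A_i) = 1/6.
alphas = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

expectation = sum(a * p for a, p in zip(alphas, probs))
print(expectation)  # 3.5
```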

When X is non-negative but not simple, we take \mathbb{E}[X] to be the supremum of the expectations of non-negative simple random variables dominated by X.  That is,

\displaystyle \mathbb{E}[X] = \sup \left\{ \mathbb{E}[Z] : 0 \leq Z \leq X \right\}

where each Z is a simple random variable.  When X is not non-negative, we break X into its positive and negative parts, X^{+} and X^{-}, and provided \mathbb{E}[X^{+}] and \mathbb{E}[X^{-}] are not both infinite, we take the expectation to be

\displaystyle \mathbb{E}[X] = \mathbb{E}[X^{+}] - \mathbb{E}[X^{-}].

Notice that our definition of \mathbb{E}[X] is simply the Lebesgue integral of X with respect to the probability measure P over all of \Omega.

\displaystyle \mathbb{E}[X] = \int_{\Omega} X dP

As such, useful properties of Lebesgue integrals, such as linearity, and powerful theorems like monotone and dominated convergence, hold for expectations.  One slightly less obvious nice property is that if F is the distribution function of X, then we have

\displaystyle \mathbb{E}[X] = \int_{\Omega} X dP = \int_{-\infty}^\infty x \, dF(x)

where the integral on the right refers to the Lebesgue-Stieltjes integral using the probability measure obtained from F, or simply the usual Riemann-Stieltjes integral in those cases where it exists.  To see this, simply show that the result holds for (non-negative) simple random variables, note that for a general non-negative random variable there exists an increasing sequence of non-negative simple random variables that converges to it, and apply the monotone convergence theorem.  For general random variables, break the function into its positive and negative parts and apply the previous result.
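
Here is a rough numerical check of this identity (a sketch of my own, assuming NumPy; the exponential distribution and grid bounds are arbitrary choices): for X exponential with rate 1 we have F(x) = 1 - e^{-x} and dF(x) = e^{-x} dx, so both sides should be close to 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Left-hand side: Monte Carlo estimate of E[X] = integral of X dP.
samples = rng.exponential(scale=1.0, size=1_000_000)
print(samples.mean())

# Right-hand side: Riemann sum approximating the integral of x dF(x) with
# dF(x) = e^{-x} dx (the integrand is negligible beyond x = 50).
x = np.linspace(0.0, 50.0, 2_000_001)
dx = x[1] - x[0]
print(np.sum(x * np.exp(-x)) * dx)   # both values should be close to 1
```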

November 1, 2008

Distribution of a Random Variable

Filed under: Measure Theory,Probability Theory — cjohnson @ 12:27 pm

Let (\Omega, \mathcal{F}, P) be a probability space and X : \Omega \to \mathbb{R} a random variable.  Define a function F(x) as the probability X takes on a value less than or equal to x.

\displaystyle F(x) := P(X^{-1}((-\infty, x]))

That is, F(x) is the measure of the set of \omega \in \Omega such that X(\omega) \leq x.  Notice that F(x) is an increasing function as a \leq b implies (-\infty, a] \subseteq (-\infty, b] and so

\displaystyle X^{-1}((-\infty, a]) = \{ \omega \in \Omega : X(\omega) \leq a \}

\displaystyle \, \, \subseteq \{ \omega \in \Omega : X(\omega) \leq b \}

\displaystyle \, \, = X^{-1}((-\infty, b])

And since a measure is monotonic (i.e., A \subseteq B \implies P(A) \leq P(B)), we have that F(a) \leq F(b).

Recall that if (E_n)_{n \in \mathbb{N}} is a decreasing sequence of measurable sets (i.e., E_n \supseteq E_{n+1}), then P\left(\bigcap_{n \in \mathbb{N}} E_n\right) = \lim_{n \to \infty} P(E_n), provided there exists an E_n with P(E_n) finite (in our case P is a probability measure, so this is given to us anyway).  We can use this fact to show that our F function is right-continuous.

Let (x_n)_{n \in \mathbb{N}} be a real-valued sequence that decreases to x.  Then we have that

\displaystyle \lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} P(X^{-1}((-\infty, x_n]))

\displaystyle = P \left( \bigcap_{n = 1}^\infty X^{-1}((-\infty, x_n])\right)

\displaystyle = P \left( X^{-1} \left( \bigcap_{n=1}^\infty (-\infty, x_n] \right) \right)

\displaystyle = P(X^{-1}((-\infty, x]))

\displaystyle = F(x)

This shows that \displaystyle \lim_{x \to x_0^+} F(x) = F(x_0) and so F is right-continuous.

Additionally, \lim_{x \to \infty} F(x) = 1 and \lim_{x \to -\infty} F(x) = 0.

The function F that we’ve described is known as the cumulative distribution function (or just distribution function) of our random variable X.  If F(x) only takes on countably many values, then we say that X is a discrete random variable.  If F(x) is a continuous function, we say X is a continuous random variable.
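
As a quick illustration of these properties (my own example, not from the post): the distribution function of a fair coin flip taking values 0 and 1 is a step function, and approaching a jump from the right recovers F, while approaching from the left does not.

```python
# Illustration: the CDF of a fair coin, F(x) = P(X <= x) with
# P(X = 0) = P(X = 1) = 1/2, is right-continuous but has jumps at 0 and 1.
def F(x):
    if x < 0:
        return 0.0
    elif x < 1:
        return 0.5
    else:
        return 1.0

# Approach 0 from the right: F(x_n) -> F(0) = 0.5, matching right-continuity.
print([F(1 / n) for n in (10, 100, 1000)])      # all 0.5
# Approach 0 from the left: F(x_n) -> 0, which differs from F(0) = 0.5.
print([F(-1 / n) for n in (10, 100, 1000)])     # all 0.0
```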

Now that we’ve seen how to construct the distribution function of a given random variable X, the natural question to ask is the converse: if we have a function satisfying the properties listed above, is there a corresponding random variable?

Suppose that F(x) is an increasing, right-continuous function with \lim_{x \to -\infty} F(x) = 0 and \lim_{x \to \infty} F(x) = 1.  Using the ideas from the Lebesgue-Stieltjes measure article, we have that F(x) gives us a measure and sigma-algebra on \mathbb{R}.  Let P_F and \mathcal{F}_F be the measure and sigma-algebra, respectively, that we get from F(x).  It follows from the fact that \lim_{x \to \infty} F(x) = 1 and \lim_{x \to -\infty} F(x) = 0 that P_F(\mathbb{R}) = 1, and so (\mathbb{R}, \mathcal{F}_F, P_F) is a probability space.

Define X : \mathbb{R} \to \mathbb{R} as x \mapsto x.  Note that this is a random variable as

\displaystyle X^{-1}((-\infty, x]) = (-\infty, x]

To see that X is a measurable function (random variable), fix t \in \mathbb{R}; we want to show that (-\infty, t] is P_F^*-measurable.  Let A \subseteq \mathbb{R} and \epsilon > 0 be given.  There exists a sequence (I_n)_{n \in \mathbb{N}} of intervals that covers A and

\displaystyle P_F^*(A) \leq \sum_{n=1}^\infty \ell_F(I_n) \leq P_F^*(A) + \epsilon

Define J_n := I_n \cap (-\infty, t] and K_n := I_n \cap (t, \infty); note that \bigcup_n J_n and \bigcup_n K_n cover A \cap (-\infty, t] and A \cap (t, \infty), respectively.  Also notice that \ell_F(I_n) = \ell_F(J_n) + \ell_F(K_n).  This gives us

\displaystyle P_F^*(A) \leq \sum_{n=1}^\infty \ell_F(J_n) + \sum_{n=1}^\infty \ell_F(K_n) \leq P_F^*(A) + \epsilon

\displaystyle \implies P_F^*(A) \leq P_F^*(A \cap (-\infty, t]) + P_F^*(A \cap (t, \infty)) \leq P_F^*(A) + \epsilon

\displaystyle \implies P_F^*(A) = P_F^*(A \cap (-\infty, t]) + P_F^*(A \cap (-\infty, t]^\complement)

Since \epsilon > 0 was arbitrary, the last equality holds, and so X is a measurable function (random variable).  This means that any function F which is increasing, right-continuous, and satisfies \lim_{x \to -\infty} F(x) = 0 and \lim_{x \to \infty} F(x) = 1 is the distribution function of some random variable.
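
Computationally, a closely related fact (not the identity-map construction above, just an illustration of my own using NumPy) is inverse transform sampling: given such an F, applying F^{-1} to a uniform random variable produces a random variable whose distribution function is F.

```python
import numpy as np

# Inverse transform sampling sketch: here F is the exponential CDF
# F(x) = 1 - exp(-x), whose inverse is F^{-1}(u) = -log(1 - u).
rng = np.random.default_rng(2)

u = rng.uniform(size=100_000)
x = -np.log(1 - u)                    # samples whose CDF is F

# Compare the empirical CDF of the samples with F at a few points.
for t in [0.5, 1.0, 2.0]:
    print(t, np.mean(x <= t), 1 - np.exp(-t))
```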

October 27, 2008

The Lebesgue-Stieltjes Measure

Filed under: Analysis,Measure Theory — cjohnson @ 5:19 pm

Consider an increasing, right-continuous function F : \mathbb{R} \to \mathbb{R}.  We can measure the length of an interval I in \mathbb{R} with end points a and b (e.g., {}[a, b]) as

\displaystyle \ell_F(I) = F(b) - \lim_{x \to a^-} F(x).

(Capinski uses \ell_F(I) = F(b) - F(a) with the interval I restricted to being of the form (a, b], but I believe this gives the same measure in the end.)

Using this definition of the length of an interval, we can then construct an outer measure on \mathbb{R}, call it \mu_F^*, as follows.

\displaystyle \mu_F^*(A) = \inf\left\{ \sum_{n=1}^\infty \ell_F(I_n) : A \subseteq \bigcup_{n=1}^\infty I_n \right\}

Where each I_n is a bounded interval.  Proceeding as we would in defining the usual Lebesgue measure on \mathbb{R}, we then let \mu_F be the restriction of \mu_F^* to the sigma-algebra \mathcal{M}_F, where

\displaystyle \mathcal{M}_F = \{ E \subseteq \mathbb{R} : \forall A \subseteq \mathbb{R}, \, \mu_F^*(A) = \mu_F^*(A \cap E) + \mu_F^*(A \cap E^\complement) \}

Now we’ve gone from an increasing, right-continuous function to a measure on \mathbb{R}.  Note that sets that were null under the Lebesgue measure may no longer be null, depending on our choice of F.  For instance, if we have

\displaystyle F(x) = \left\{ \begin{array}{ll} x & : x < 0 \\ 1 + x &: x \geq 0 \end{array}\right.

Then \mu_F(\{0\}) = 1, though with the standard Lebesgue measure we have \mu(\{0\}) = 0.
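
A quick numerical look at this jump example (a sketch of my own; the left limit at a is approximated by evaluating F slightly below a): closed intervals shrinking down to \{0\} have F-lengths approaching 1, matching \mu_F(\{0\}) = 1.

```python
def F(x):
    # The jump function from the example: F(x) = x for x < 0, 1 + x for x >= 0.
    return x if x < 0 else 1 + x

def length_F(a, b, eps=1e-9):
    # ell_F([a, b]) = F(b) - lim_{x -> a^-} F(x), approximated by F(a - eps).
    return F(b) - F(a - eps)

# Closed intervals [-h, h] shrinking to {0}: their F-lengths approach 1.
for h in [0.1, 0.01, 0.001]:
    print(h, length_F(-h, h))
```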

It will be convenient to adopt the convention that, for an increasing, right-continuous function F,

\displaystyle \int_E g \, dF

is short-hand for the Lebesgue integral of g over E using the measure obtained from F as we’ve described above.  This is normally referred to as the Lebesgue-Stieltjes integral with integrator F.

October 26, 2008

Random Variables

Filed under: Measure Theory,Probability Theory,Topology — cjohnson @ 8:32 pm

If (\Omega, \mathcal{F}, P) is a probability space and (Y, \Sigma) is a measurable space (i.e., a set Y along with a sigma-algebra \Sigma on Y), then a random variable is a measurable function X : \Omega \to Y.  That is, for each A \in \Sigma, we have X^{-1}(A) = \{ \omega \in \Omega : X(\omega) \in A \} \in \mathcal{F}.  Generally speaking, we’ll be taking (Y, \Sigma) to be (\mathbb{R}, \mathcal{B}) where \mathcal{B} is the Borel algebra on \mathbb{R}.

Given a topological space (S, \tau), there exists a sigma-algebra \mathcal{B}(S), called the Borel algebra on S, that contains all open sets (members of \tau) and is the smallest such sigma-algebra.  This means that for each \mathcal{O} \in \tau we have \mathcal{O} \in \mathcal{B}(S) and also if \Sigma is another sigma-algebra on S with this property, then \mathcal{B}(S) \subseteq \Sigma.  In general, given a collection C of subsets of S, there exists a sigma-algebra, which we’ll call \mathcal{F}_C, that is the smallest sigma-algebra on S (smallest using \subseteq as the ordering relation) containing each c \in C; we say that \mathcal{F}_C is the sigma-algebra generated by C.  In this sense, the Borel algebra on S is the sigma-algebra generated by the topology \tau.  We will usually just write \mathcal{B} to mean \mathcal{B}(\mathbb{R}).

In the special case of (\mathbb{R}, \mathcal{B}), there’s an easy way to check that a function X : \Omega \to \mathbb{R} is a random variable (or a measurable function in general): we just look at the pre-images of intervals.  Since pre-images of functions are well-behaved with respect to set operations like union, intersection, and complement, it in fact suffices to consider only pre-images of intervals of the form {}[a, \infty).  That is, if we show that X^{-1}([a, \infty)) \in \mathcal{F} for every a \in \mathbb{R}, we will have shown that X is measurable, and so a random variable.  (Actually, we can look at all intervals of the form (a, \infty), {}[a, \infty), (-\infty, a) or (-\infty, a].  Using properties of sigma-algebras we can easily show that if we have all intervals of any one of these forms, we have all intervals of the other forms.  Again, using properties of sigma-algebras it’s easy to take that and show that we have all countable unions of intervals, in particular all countable unions of open intervals, i.e., all open sets.)

To see this, suppose for every a \in \mathbb{R} we have X^{-1}([a, \infty)) \in \mathcal{F}.

\displaystyle X^{-1}([a, \infty)) \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) \geq a \} \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) \geq a \}^\complement \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) < a \} \in \mathcal{F}

\displaystyle \implies X^{-1}((-\infty, a)) \in \mathcal{F}

So now we have the pre-images of all intervals of the form {}[a, \infty) and (-\infty, a).  If we can also get pre-images for the form (a, \infty), it’ll be an easy jump to countable unions of open intervals.  Note that

\displaystyle (a, \infty) = \bigcup_{n=1}^\infty \left[ a + \frac{1}{n}, \infty \right)

Now we easily see that

\displaystyle X^{-1}((a, \infty)) = X^{-1} \left( \bigcup_{n=1}^\infty \left[ a + \frac{1}{n}, \infty \right)\right)

\displaystyle = \bigcup_{n=1}^\infty X^{-1}\left(\left[ a + \frac{1}{n}, \infty \right)\right)

And we have pre-images of intervals of the form (a, \infty).  Combined with the fact that we have intervals of the form (-\infty, b), it’s easy to see that we also have intervals of the form (a, b).  Using properties of sigma-algebras, it’s now easy to show that we have the pre-image of any open set.  This tells us that if X^{-1}([a, \infty)) \in \mathcal{F} for every a \in \mathbb{R}, then X is measurable, and so a random variable.

In the case that \Omega is a countable set, we take \mathcal{F} to be 2^\Omega (the power set of \Omega), and so any function X : \Omega \to \mathbb{R} is a random variable.  This is because the pre-image of {}[a, \infty) is always some subset of \Omega (possibly empty); for every a \in \mathbb{R} we have X^{-1}([a, \infty)) \in 2^\Omega, and so every function on a countable sample space is a random variable.
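
A small sketch of the pre-image criterion on a finite sample space (my own toy example; the particular X is arbitrary):

```python
# Omega = {1,...,6} with F = 2^Omega, and X(omega) = omega mod 2.
Omega = {1, 2, 3, 4, 5, 6}
X = {omega: omega % 2 for omega in Omega}

def preimage(a):
    # X^{-1}([a, infinity)) = set of outcomes mapped to a value >= a.
    return {omega for omega in Omega if X[omega] >= a}

# Every pre-image is a subset of Omega, hence automatically in the power set,
# so X is measurable (a random variable) no matter what X is.
for a in [-1, 0, 0.5, 1, 2]:
    print(a, preimage(a))
```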

October 25, 2008

Independence and Conditional Probability

Filed under: Measure Theory,Probability Theory — cjohnson @ 6:39 pm

Suppose that (\Omega, \mathcal{F}, P) is a measure space, A a measurable set with P(A) \in (0, \infty).  We can create a new measure space (A, \mathcal{F}_A, P_A) where

\displaystyle \mathcal{F}_A = \{ S \cap A : S \in \mathcal{F} \}

\displaystyle P_A(E) = \frac{P(E)}{P(A)} \quad \text{for } E \in \mathcal{F}_A

Note that (A, \mathcal{F}_A, P_A) is a probability space, as P_A(A) = 1, and any other set E \in \mathcal{F}_A is a subset of A.

Supposing (\Omega, \mathcal{F}, P) is a probability space, we can use this new probability space in our definition of conditional probability.  The probability P_A(E) represents the probability of E occurring, where we already know A has occurred.  Normally, instead of going through the trouble of writing out a new sigma-algebra and probability measure each time, we simply take P(B|A) to be the probability of B \cap A using the P_A measure defined above.  Of course, our measure and sigma-algebra are so simple that we can just write this in one line as

\displaystyle P(B|A) = \frac{P(B \cap A)}{P(A)}

We call this the probability of B given A.  Now if P(B|A) = P(B), we say that A and B are independent events.  If this is the case then we have

\displaystyle P(B) = \frac{P(B \cap A)}{P(A)}

\displaystyle \implies P(A) P(B) = P(B \cap A)

This is certainly a useful property as it makes proofs of interesting facts fall out easily when we consider sequences of independent random variables (a related idea) later.
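
A small dice illustration of conditional probability and independence (my own example, not from the post): condition a fair die roll on the event A that the roll is even.

```python
from fractions import Fraction

Omega = {1, 2, 3, 4, 5, 6}

def P(E):
    return Fraction(len(E), len(Omega))

def cond(B, A):
    # P(B | A) = P(B ∩ A) / P(A)
    return P(B & A) / P(A)

A = {2, 4, 6}                      # the roll is even
B = {1, 2}                         # the roll is at most 2
C = {4, 5, 6}                      # the roll is at least 4

print(cond(B, A), P(B))   # 1/3 and 1/3: A and B are independent
print(cond(C, A), P(C))   # 2/3 versus 1/2: A and C are not independent
```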

Now consider the fact that P(A|B) = \frac{P(A \cap B)}{P(B)} implies P(A \cap B) = P(A|B)P(B).  Plugging this into the formula for P(B|A), we arrive at the following, known as Bayes’ theorem.

\displaystyle P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A|B) P(B)}{P(A)}

Note that in general P(A|B) \neq P(B|A).
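
A standard numerical illustration of the difference (all the numbers here are made up for the example): a rare condition and an imperfect test, where P(condition | positive) turns out to be far smaller than P(positive | condition).

```python
P_D = 0.01                       # P(condition)
P_pos_given_D = 0.95             # P(positive | condition)
P_pos_given_not_D = 0.10         # P(positive | no condition)

# Total probability: P(positive).
P_pos = P_pos_given_D * P_D + P_pos_given_not_D * (1 - P_D)

# Bayes' theorem: P(condition | positive) = P(positive | condition) P(condition) / P(positive).
P_D_given_pos = P_pos_given_D * P_D / P_pos
print(P_D_given_pos)   # roughly 0.088, far from P(positive | condition) = 0.95
```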

October 24, 2008

Probability Spaces

Filed under: Measure Theory,Probability Theory — cjohnson @ 1:03 am

Recall that given a set \Omega, a \sigma-algebra on \Omega is a collection of subsets of \Omega, call it \mathcal{F}, that satisfies the following properties.

  • \displaystyle \emptyset, \Omega \in \mathcal{F}
  • \displaystyle A \in \mathcal{F} \implies A^\complement \in \mathcal{F}
  • \displaystyle (A_n)_{n \in \mathbb{N}} \subseteq \mathcal{F} \implies \bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}

A function P : \mathcal{F} \to [0, \infty] is called a measure if it satisfies the following properties.

  • \displaystyle A \subseteq B \implies P(A) \leq P(B)
  • \displaystyle (A_n)_{n \in \mathbb{N}} \subseteq \mathcal{F}, \, A_i \cap A_j = \emptyset \text{ for } i \neq j \implies P\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n \in \mathbb{N}} P(A_n)
  • \displaystyle P(\emptyset) = 0

The triple (\Omega, \mathcal{F}, P) is then called a measure space.  In the event that P(\Omega) = 1 we say that our triple is in fact a probability space.  This is a formalization of our intuitive ideas of what a “probability space” should be: we have a set of all things that could conceivably happen (a sample space), a family of subsets of that sample space (events), and a way to assign a numerical value (a probability) to each of those events.  The properties of a measure aren’t particularly surprising in this context.  If one event is contained in another, we’d expect the probability of the larger event to be at least that of the smaller event; if we have a sequence of disjoint events, the probability that at least one of them occurs is the sum of their probabilities; the probability that something happens is one, and the probability that nothing happens is zero.

One especially useful property of measures in general is that if A and B are measurable sets with B \subseteq A and A has finite measure, then P(B) = P(A) - P(A \setminus B).  When dealing with probabilities, we’re given that P(\Omega) = 1, so for any A \in \mathcal{F} we have P(A^\complement) = P(\Omega) - P(\Omega \setminus A^\complement) = 1 - P(A).

The traditional first examples of probability theory, dice, coins, and cards, are easy to express in these more formal terms.  A roll of a standard six sided die, for instance, has \Omega = \{1, 2, 3, 4, 5, 6\}, \mathcal{F} = 2^{\Omega} and P(A) = \frac{|A|}{6}, where 2^{\Omega} denotes the power set (the collection of all subsets) of \Omega and |A| is the cardinality of A.  In general, given a finite sample space \Omega, the power set of \Omega defines a \sigma-algebra, and P(A) = \frac{|A|}{|\Omega|} gives a measure where all events of the same size (cardinality) are equally likely.
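
A minimal sketch of the die example in code (my own illustration; the events chosen are arbitrary):

```python
from fractions import Fraction
from itertools import chain, combinations

# Omega = {1,...,6}, F = 2^Omega, P(A) = |A| / 6.
Omega = {1, 2, 3, 4, 5, 6}

def P(A):
    return Fraction(len(A), len(Omega))

# The power set: every subset of Omega is an event.
events = list(chain.from_iterable(combinations(Omega, r) for r in range(len(Omega) + 1)))

print(len(events))          # 64 events in 2^Omega
print(P({2, 4, 6}))         # probability of rolling an even number: 1/2
print(P(set()), P(Omega))   # 0 and 1
```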

July 20, 2008

The Monotone Convergence Theorem

Filed under: Analysis,Measure Theory — cjohnson @ 4:59 pm

If (f_n : E \to \mathbb{R})_{n \in \mathbb{N}} is a sequence of non-negative measurable functions and (f_n(x))_{n \in \mathbb{N}} increases monotonically to f(x) for each x \in E, i.e., f_n \nearrow f pointwise (this can actually be relaxed to just having f_n \nearrow f a.e.), then

\displaystyle \lim_{n \to \infty} \int_E f_n(x) \, dm = \int_E f \, dm.

Proof:

As f_n \nearrow f a.e., we have that f_n \leq f a.e. for each n \in \mathbb{N}, and so \int_E f_n \, dm \leq \int_E f \, dm for each n.  Also notice that since f_n \leq f_{n+1}, the integrals \int_E f_n \, dm form a monotonic sequence bounded above by \int_E f \, dm, so the sequence converges (possibly to \infty).  So we have

\displaystyle \lim_{n \to \infty} \int_E f_n \, dm \leq \int_E f \, dm = \int_E \lim_{n \to \infty} f_n \, dm

By Fatou’s lemma we have that

\displaystyle \int_E f \, dm = \int_E \lim_{n \to \infty} f_n \, dm = \int_E \liminf_{n \to \infty} f_n \, dm

\displaystyle \qquad \leq \liminf_{n \to \infty} \int_E f_n \, dm = \lim_{n \to \infty} \int_E f_n \, dm

Since we have inequalities both ways, we must have equality.
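
A numerical illustration of the theorem (a sketch of my own, assuming NumPy; the choice f_n(x) = \min(n, 1/\sqrt{x}) on (0, 1] is arbitrary): the truncations increase pointwise to 1/\sqrt{x}, and the integrals should climb to \int_0^1 x^{-1/2} \, dx = 2.

```python
import numpy as np

# f_n(x) = min(n, 1/sqrt(x)) increases pointwise to f(x) = 1/sqrt(x) on (0, 1].
N = 1_000_000
x = (np.arange(N) + 0.5) / N          # midpoints of a fine grid on (0, 1]
dx = 1.0 / N
f = 1 / np.sqrt(x)

for n in [1, 2, 5, 10, 100]:
    f_n = np.minimum(n, f)
    print(n, np.sum(f_n) * dx)        # approaches 2 - 1/n, hence 2
```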

Fatou’s Lemma, Part I

Filed under: Analysis,Measure Theory — cjohnson @ 4:38 pm

If E \subset \mathbb{R} is a Lebesgue measurable set and (f_n : E \to \mathbb{R})_{n \in \mathbb{N}} is a sequence of non-negative measurable functions, then

\displaystyle \liminf_{n \to \infty} \int_E f_n \, dm \geq \int_E \liminf_{n \to \infty} f_n \, dm

Proof: Let g_n = \inf_{k \geq n} f_k and f = \liminf_{n \to \infty} f_n = \lim_{n \to \infty} g_n.  Let \phi be a simple function with \phi \leq f.   Assume f > 0 on E, as the case where f = 0 is trivial.  Now define \overline{\phi} as follows.

\overline{\phi}(x) = \left\{ \begin{array}{lr} \phi(x) - \epsilon &: \phi(x) > 0 \\ 0 &: \phi(x) = 0 \end{array} \right.

Where \epsilon > 0 is a value that ensures \overline{\phi} \geq 0.  Now note that \overline{\phi} < f and g_n \nearrow f, so for each x \in E there must exist an N \in \mathbb{N} such that g_n(x) \geq \overline{\phi}(x) for all n > N.

Let A_k = \{ x \in E : g_k(x) \geq \overline{\phi}(x) \} and notice that A_k \subseteq A_{k+1}.  Also, \bigcup_{k=1}^\infty A_k = E.  For k \geq n we have the following.

\displaystyle \int_{A_n \cap E} \overline{\phi} \, dm \leq \int_{A_n \cap E} g_n \, dm

\displaystyle \qquad \leq \int_{A_n \cap E} f_k \, dm

\displaystyle \qquad \leq \int_E f_k \, dm

So we have

\displaystyle \int_{A_n \cap E} \overline{\phi} \, dm \leq \liminf_{k \to \infty} \int_E f_k \, dm

Now notice that \overline{\phi} is a simple function, and so we can write it as

\displaystyle \overline{\phi} = \sum_{i=1}^\ell c_i 1_{B_i}

Now we have

\displaystyle \int_{A_n \cap E} \overline{\phi} \, dm = \sum_{i=1}^\ell c_i m(A_n \cap E \cap B_i)

Letting n \to \infty this gives us

\displaystyle \int_E \overline{\phi} \, dm = \sum_{i=1}^\ell c_i m(E \cap B_i)

This leaves us with

\displaystyle \int_E \overline{\phi} \, dm \leq \liminf_{k \to \infty} \int_E f_k \, dm

If m(\{ x \in E : \phi(x) > 0 \}) < \infty, then we have

\int_E \overline{\phi} \, dm = \int_E \phi \, dm - \epsilon \, m(\{ x \in E : \phi (x) > 0 \})

Letting \epsilon \to 0, and recalling that \phi was an arbitrary simple function with \phi \leq f, this gives the desired result.

Note: This is an awesomely useful lemma, but I hate this proof.
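
For what it’s worth, the inequality can be strict, which a quick numerical example shows (my own illustration, assuming NumPy): take f_n = n \cdot 1_{(0, 1/n)} on (0, 1).

```python
import numpy as np

# Each f_n integrates to 1, but f_n(x) -> 0 for every fixed x, so
# liminf_n of the integrals is 1 while the integral of liminf_n f_n is 0.
N = 1_000_000
x = (np.arange(N) + 0.5) / N          # midpoints of a fine grid on (0, 1)
dx = 1.0 / N

for n in [10, 100, 1000]:
    f_n = np.where(x < 1 / n, n, 0.0)
    print(n, np.sum(f_n) * dx)        # each integral is (approximately) 1
```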

July 17, 2008

The Cantor Set: An Uncountable Null Set

Filed under: Analysis,Measure Theory — cjohnson @ 12:29 am

We’ve seen that every countable set is null, which naturally leads to the question of whether or not any uncountable sets can be null.  The uncountable sets we use most often (namely intervals or unions of intervals) are not null, but constructing an uncountable null set known as the Cantor set is not very difficult.

The Cantor set is defined iteratively, with the actual set being the intersection of all iterations.  We begin by letting C_0 = [0, 1].  Then we construct C_1 from C_0 by removing the middle third of C_0; this gives us C_1 = [0, 1/3] \cup [2/3, 1].  We then construct C_2 by removing the middle third of each of the intervals in C_1, so we have C_2 = [0, 1/9] \cup [2/9, 1/3] \cup [2/3, 7/9] \cup [8/9, 1].  In general, we construct C_n from C_{n-1} by removing the middle third of each interval in C_{n-1}.  Finally, we define the Cantor set C as follows.

\displaystyle C = \bigcap_{n=0}^\infty C_n

Now note that C_n consists of 2^n intervals (we start with 2^0 = 1 interval, C_0 itself, then each iteration doubles the number of intervals, as we split each one into two pieces, the third on the left and the third on the right).  Furthermore, each of these intervals has length \left( \frac{1}{3} \right)^n.  This means the sum total length of C_n is

\displaystyle \sum_{k=1}^{2^n} \left( \frac{1}{3} \right)^n = 2^n \left( \frac{1}{3} \right)^n = \left( \frac{2}{3} \right)^n

Let \epsilon > 0 be given.  Since \frac{2}{3} < 1, we can find an n \in \mathbb{N} such that \left( \frac{2}{3} \right)^n < \epsilon, and so there exists an n such that the sum total length of the intervals that compose C_n is less than \epsilon.  Now note that since C is contained in every C_n, the sum total length of C is at most that of C_n, which is less than any given \epsilon > 0 for a large enough n.  This means that C is null.

Now notice that the Cantor set is not empty, as the end points of any interval appearing in some C_n will be in every C_m, and hence in C.  (Consider the end points of the intervals of C_n.  Those points lie inside intervals of each C_m with m < n, and are end points of intervals of each C_m with m \geq n, so they are never removed.)

We can show that the Cantor set is uncountable by showing that every number in [0, 1] with a ternary expansion containing no ones belongs to the Cantor set.  Basically, if we were to write down the ternary representation of some number in [0, 1] we’d start off with the first place after the decimal point (the 1/3 spot).  If we write down a 0, we’re in the left-hand interval of C_1; if we write down a 2, we’re in the right-hand interval of C_1; and if we were to write down a 1, we’d be in the interval (1/3, 2/3) and thus not in the Cantor set.  When we write down the second digit after the decimal point (the 1/9 spot), we “jump” to the left-hand third of the interval we just picked if we write down a 0, to the right-hand third if we write down a 2, and into the middle third (i.e., outside of the Cantor set) if we write down a 1.  This gives an injective mapping from the set of numbers in [0, 1] whose ternary expansions contain no ones (which is an uncountable set) into C, and so the Cantor set must be uncountable.
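
Here is a small sketch of that ternary description in code (my own illustration; exact rational arithmetic avoids floating-point trouble near interval endpoints): draw ternary digits from \{0, 2\}, form the corresponding point, and check that it lies in C_n for the first several n.

```python
import random
from fractions import Fraction

def in_C_n(x, n):
    # x is in C_n iff, rescaling third-by-third, it never lands strictly
    # inside a removed middle third.
    for _ in range(n):
        if Fraction(1, 3) < x < Fraction(2, 3):
            return False
        x = 3 * x if x <= Fraction(1, 3) else 3 * x - 2
    return True

random.seed(0)
digits = [random.choice([0, 2]) for _ in range(40)]
x = sum(Fraction(d, 3 ** (k + 1)) for k, d in enumerate(digits))

print(float(x), all(in_C_n(x, n) for n in range(1, 30)))   # True: x survives every C_n checked
```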
