Mathematics Prelims

December 23, 2008

Periodicity, Positive Recurrence and Null Recurrence

Filed under: Stochastics — cjohnson @ 12:25 am

Imagine a chain where we start in some state (call it state zero), and the probability of being in state zero after n transitions is zero unless n is a multiple of some number, say d.  If d is the smallest such number, then we say that that state has period d.  We’ve already seen an example of a chain with such a state in the simple random walk.  In the simple random walk it was impossible to move from any state back to itself in an odd number of transitions: to do so you have to step left the same number of times you step right, which is of course an even number.  Since two divides every even number, we have that the state has a period of two.  A state with period one is called aperiodic.

If a state is recurrent, we say that the state is positive recurrent if the expected amount of time between recurrences (when the chain is in that state) is finite.  A state that is recurrent but not positive recurrent is called null recurrent.  A positive recurrent, aperiodic state is called ergodic.

An example of a null recurrent state appears in the simple random walk.  We’ve seen that all states are recurrent, but it turns out that all states are in fact null recurrent.  For a discussion of the null recurrence of states in the simple random walk, see section 2.2 in The Theory of Stochastic Processes by Cox and Miller.

December 22, 2008

Convergence in Probability/Measure

Filed under: Measure Theory, Probability Theory — cjohnson @ 11:05 pm

If X_n : \Omega \to \mathbb{R} is a random variable on the probability space (\Omega, \mathcal{F}, P), then one way to define the convergence of a sequence of random variables (X_n)_{n \in \mathbb{N}} is to simply define the limit pointwise, as you would do with any old function.  That is, if there exists a random variable X such that for every \omega \in \Omega we have

\displaystyle X_n(\omega) \to X(\omega)

Similarly, we say that X_n \to X almost everywhere (almost surely) if the measure (probability) of the set of points where X_n(\omega) \not\to X(\omega) is zero.

Another way to define convergence of measurable functions in general is to talk about convergence in measure.  This is a notion of convergence where the set of points where the functions in the sequence do not get arbitrarily close to the corresponding point in the limit function has measure zero.  That is if for each \delta > 0,

\displaystyle P(\{\omega \in \Omega : |X_n(\omega) - X(\omega)| > \delta\}) \to 0

(Here we’re using random variables and a probability space, but the same thing is used for measurable functions of general measure spaces.)  To say that X_n converges to X in probability (denoted X_n \xrightarrow{P} X), is to say that for any \epsilon > 0, there exists an N \in \mathbb{N} such that for every n > N, the set of points where X_n isn’t “close” to X (here “close” means within \delta distance of X) is less than \epsilon.

As noted in Cohn, convergence in measure does not imply convergence pointwise (not even almost everywhere!), or vice versa.  For instance, the set of functions f_n = 1_{[n, \infty)} pointwise converges to the zero function, but does not converge in measure.   However, in the case of a finite measure (e.g., a probability) we have the following proposition.

Proposition If (\Omega, \mathcal{F}, P) is a measure space with P(\Omega) finite, then if (X_n)_{n \in \mathbb{N}} is a sequence of measurable functions with X_n \to X a.e., then X_n \xrightarrow{P} X.

Proof: Let \delta > 0 given.  Define A_n = \{ \omega \in \Omega : |X_n(\omega) - X(\omega)| > \delta \} and B_n = \bigcup_{k=n}^\infty A_k.  Clearly B_n forms a decreasing sequence whose intersection is a subset of the points \omega \in \Omega for which X_n(\omega) \not\to \Omega, by assumption this set has measure zero, P(\bigcap B_n) = 0, which gives that the functions converge in measure to X.

(These examples and proofs are from Donald L. Cohn’s “Measure Theory.”)

Recurrence of States

Filed under: Probability Theory, Stochastics — cjohnson @ 9:25 pm

For some states in a Markov chain, there is a positive probability that once the chain transitions out of that state, it will never come back.  For instance, in the chain with transition matrix

\displaystyle \left[ \begin{array}{cc} 0 & 1 \\ 0 & 1 \end{array} \right]

Once the chain transitions out of state one, it will never again enter state one, as it will stay in state two forever.  There are times, however, when the chain will (with probability one) re-enter that state.  For example, consider the chain

\displaystyle \left[ \begin{array}{cc} 1/2 & 1/2 \\ 1/2 & 1/2 \end{array} \right]

We note that the probability the chain re-enters state one at time n is given by (1/2)^n as chain must stay in state two for n-1 transitions, each of which has probability 1/2, and then must finally transition to state one with probability 1/2.  Now, the probability the state ever re-enters state one is the union of the events where it first enters at n=1, or at n=2, or n=3, … These are all disjoint events, though, so we have

\displaystyle \sum_{n=1}^\infty \left(\frac{1}{2}\right)^n = 1

We say that state i is recurrent if, given X_0 = i, the probability the chain will ever re-enter state i is one.  If a state is not recurrent, it is called transient.  If a state is recurrent, then it will (with probability one) be entered infinitely often, assuming it’s entered at all, as the chain essentially “restarts” itself each time it enters that state, due to the Markov property.

Now suppose states i and j belong to the same class and that state i is recurrent.  Since i and j belong to the same class, we have P_{ij} > 0.  Note that 1 - P_{ij} (the probability of not transitioning to state j from state i) is less than one.  Each time the chain is in state i there is a chance it will not transition to state j.  However, since i is recurrent, it will be entered infinitely often, so the chance of never entering state j is

\displaystyle \prod_{n=1}^\infty (1 - P_{ij}) = 0

So, if the chain enters state i, it will (with probability one) also enter state j at some point.  This shows that if i is recurrent, then j must be as well.  This shows that recurrence (and transcience) is a class property: in any given class, all states are transient or all states are recurrent.

An easier way to test for recurrence is to note that state i is recurrent if and only if

\displaystyle \sum_{n=1}^\infty P_{ii}^{(n)} = \infty

(This follows by setting I_n to be the Bernoulli random variable that is one if X_n = i and zero otherwise.  This expected number of times state one is entered is infinite, so \mathbb{E}\left[\sum I_n \right] is infinite.  By linearity, though, this expectation is simply \sum \mathbb{E}[I_n] = \sum P_{ii}^{(n)}.)

Consider the simple random walk where the probability of stepping to the left or right at each transition is 1/2.  We will suppose that our states are the integers, like we’re walking along an axis, and that we start in state zero.  Note that all states communicate with all other states, so the chain is irreducible (there is only one class of states).

Now, supposing we’re in state zero, what’s the probability we ever re-enter state zero?  To enter state zero, we must have had the same number of steps left as right.  This is binomially distributed, so the probability that X_n = 0 is

\displaystyle \binom{n}{n/2} \left(\frac{1}{2}\right)^{n/2} \left(\frac{1}{2}\right)^{n/2}

Of course, this depends on n being even, so we can rewrite this as

\displaystyle P_{00}^{(2n)} = \binom{2n}{n} \left( \frac{1}{2} \right)^{2n}

Using Stirling’s approximation to the factorial, re-writing the above binomial, we have that

\displaystyle P_{00}^{(2n)} \sim \frac{4^n}{\sqrt{\pi n}}

and so summing over all n we have that \sum P_{00}^{(2n)} = \infty which implies zero is a recurrent state.  This in turn implies all states are recurrent.

December 20, 2008

Communication and Equivalence of States

Filed under: Probability Theory, Stochastics — cjohnson @ 4:41 pm

When states i and j, i \neq j, of a Markov chain can be reached from one another (that is, there is a positive probability of transitioning to i from j, and a positive probability of transitioning to j from i; note these needn’t be one-step transitions), we say that the two states communicate.  Note that every state communicates with itself, as if we’re in state i we’ll be in state i after zero transitions.  Communication of states forms an equivalence relation, and so when we say two states belong to the same class, we mean that they belong to the same equivalence class with this relation.   If a chain has only one class of states, the chain is called irreducible.

For instance, consider the chain that simply moves from state one to state two, from state two to state three, then from state three back to state one, and the process repeats.  Such a chain would have a transition probability matrix of

\displaystyle \left[ \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right]

This chain has only one class and so is irreducible.

The chain with transition matrix

\displaystyle \left[ \begin{array}{cccc} 1/2 & 1/4 & 1/4 & 0 \\ 0 & 1/2 & 1/2 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 0 & 0 & 0 & 1 \end{array}\right]

has two classes, \{1, 2, 3\} and \{4\}.  The first three states communicate with one another as if you’re any one of \{1, 2, 3\}, then there is a positive probability of transitioning to any of \{1, 2, 3\}.  Notice that we can transition from state two to state one by going through state three first.  State four is in a class by itself, though, as state four can only communicate with itself: Once you’re in state four, you stay in state four with probability one.

The Chapman-Kolmogorov Equations

Filed under: Probability Theory, Stochastics — cjohnson @ 12:46 pm

Supposing we’re given each P_{ij} for a Markov chain, we have all of the one-step transition probabilities.  Calculating the n-step transition probabilities, we arrive at the Chapman-Kolmogorov equations.  We will let P_{ij}^(n) denote the probability that we arrive at state j after n transitions, given that we start in i:

\displaystyle P_{ij}^(n) = P(X_n = j | X_0 = i)

We begin by considering n = 2.  If X_0 = i, we find the chance of X_2 = j by considering all the paths that start at i and end at j; that is, we look at all posibilities of X_1.  Since we have each P_{ik}, we just look at the chance we first transition to state k, then from k transition to j.  This gives

\displaystyle P_{ij}^(2) = \sum_k P(X_2 = j \cap X_1 = k | X_0 = i)

\displaystyle \, \, = \sum_k P(X_2 = j| X_1 = k) P(X_1 = k | X_0 = i)

\displaystyle \, \, = \sum_k P(X_1 = j | X_0 = k) P(X_1 = k | X_0 = i)

\displaystyle \, \, = \sum_k P_{kj} P_{ik}

\displaystyle \, \, = \sum_k P_{ik} P_{kj}

Note this gives the two-step transition probabilities form the matrix P^2.  We can extend this result inductively, then we have that the n-step transition probability matrix is simply P^n.

Continuing with the weather example from last time, we have that the two-step probability transition matrix is

\displaystyle \left[ \begin{array}{ccc} 1/2 & 1/4 & 1/4 \\ 1/8 & 1/2 & 3/8 \\ 1/4 & 0 & 3/4 \end{array} \right]^2 = \left[ \begin{array}{ccc} 11/32 & 1/4 & 13/32 \\ 7/32 & 9/32 & 1/2 \\ 5/16 & 1/16 & 5/8 \end{array}\right]

And so, according to our model, if it’s rainy today, there is an 11/32 chance it will be rainy two days from now, 1/4 chance it snows two days from now, and 13/32 chance it’s sunny.

Introduction to Markov Chains

Filed under: Probability Theory, Stochastics — cjohnson @ 11:15 am

A stochastic process is an indexed collection of random variables, where the index is normally thought of as time.  Generally we will assume that this time index is either discrete (i.e., from the set \mathbb{N} or \mathbb{N}_0) or continous ({}[0, \infty)).  A Markov chain, unless stated otherwise, we will assume to be a discrete time process indexed by \mathbb{N}_0.  This is a process where each X_n is a random variable whose range is a countable set, which we will generally take to be \mathbb{N} or the first however-many natural numbers.  We then say that X_n is the state of the process at time n, and the range of the X_n is called the state space.

The primary property of a Markov chain is called the Markov property which effectively says that the distribution of X_{n + 1}, given X_n does not depend on any of X_0, X_1, ..., X_{n-1}.  That is, the next state we transition to depends only on the current state, and none of the previous.  Formally,

\displaystyle P(X_{n + 1} = j | X_n = i, X_{n-1} = i_{n-1}, X_{n-2} = i_{n-2}, ..., X_0 = i_0)

\displaystyle = P(X_{n+1} = j | X_n = i)

For example, imagine that you were to start flipping a coin and you record the number of heads you get as you flip the coin, where each time you flip a head you add one to your total, but you subtract one each time you flip a tail.  That is, if you were to flip HHTHTT your total would be zero.  This is an example of a Markov chain because the probability your total after the next flip depends only what your total currently is, and not the “path” you took to get to that total.  In our example, the probability we have one at the next flip is 1/2 (we flip heads) and the probability we have negative one at the next flip is 1/2 (we flip tails).  This is also true if we were to flip HT or TTTTTTHHHHHH or any other string of heads and tails with the same number of heads as tails.  A process such as the one we’ve described is called a random walk.

Because of this Markov property, we can describe the probability of transitioning between states as a matrix.  Starting off with an example with finitely many states, suppose that if it’s raining today it will rain tomorrow with probability 1/2, snow with probability 1/4, or be sunny with probability 1/4.  If it’s snowing today, it will rain tomorrow with probaiblity 1/8, snow with probability 1/2, and it will be sunny with probability 3/8.  If it’s sunny today, it will rain tomorrow with probability 1/4, snow with probability 0, and be sunny with probability 3/4.  The probability transition matrix is then

\displaystyle \left[ \begin{array}{ccc} 1/2 & 1/4 & 1/4 \\ 1/8 & 1/2 & 3/8 \\ 1/4 & 0 & 3/4 \end{array} \right]

Where, if we call this matrix P, then the entry P_{ij} refers to the probability that we transition to state j, given that we’re in state i.  Also note that each row corresponds to a discrete probability distribution, so we must have that the sum of the entries on any given row is equal to one.

December 19, 2008

Expected Value of a Random Variable

Filed under: Measure Theory, Probability Theory — cjohnson @ 8:27 pm

If X : \Omega \to \mathbb{R} is a random variable on the probability space (\Omega, \mathcal{F}, P), then the expected value (aka expectation) of X is, in essence, a weighted average of of the values X may take on.  In the case of simple random variables this weighted average is precisely the expectation.  For a general random variable we’re actually taking a limit of the expectations of simple random variables that converge to the random variable of interest.

Assuming X is a non-negative simple random variable of the form

\displaystyle X(\omega) = \sum_{i=1}^n \alpha_i 1_{A_i}(\omega)

where A_i \in \mathcal{F}, and for the sake of simplicity that \alpha_1 < \alpha_2 < ... < \alpha_n.  The expected value of X, denoted \mathbb{E}[X], is

\displaystyle \mathbb{E}[X] = \sum_{i=1}^n \alpha_i P(A_i).

When X is not a simple random variable, we take \mathbb{E}[X] to be the supremum of the expectations of simple random variables less than X.  That is,

\displaystyle \mathbb{E}[X] = \sup \left\{ \mathbb{E}[Z] : 0 \leq Z \leq X \right\}

where each Z is a simple random variable.  When X is not non-negative, we break X into its positive and negative parts, X^{+} and X^{-}, and provided \mathbb{E}[X^{+}] and \mathbb{E}[X^{-}] are not both infinite, we take the expectation to be

\displaystyle \mathbb{E}[X] = \mathbb{E}[X^{+}] - \mathbb{E}[X^{-}].

Notice that our definition of \mathbb{E}[X] is simply the Lebesgue integral of X with respect to the probability measure P over all of \Omega.

\displaystyle \mathbb{E}[X] = \int_{\Omega} X dP

As such, useful properties of Lebesgue integrals, such as linearity, and powerful theorems like monotone and dominated convergence, hold for expectations.  One slightly less obvious nice property is that if F is the distribution function of X, then we have

\displaystyle \mathbb{E}[X] = \int_{\Omega} X dP = \int_{-\infty}^\infty x \, dF(x)

where the integral on the right refers to the Lebesgue-Stieltjes integral using the probability measure obtained from F, or simply the usual Riemann-Stieltjes integral in those cases that it exists.  To see this, simply show that the result holds for (non-negative) simple random variables, note that for a general non-negative random variable there exists an increasing sequence of non-negative simple random variables that converges to it, and apply the monotone convergence theorem.  For general random variables, break the function into its positive and negative parts and apply the previous result.

November 1, 2008

Distribution of a Random Variable

Filed under: Measure Theory, Probability Theory — cjohnson @ 12:27 pm

Let (\Omega, \mathcal{F}, P) be a probability space and X : \Omega \to \mathbb{R} a random variable.  Define a function F(x) as the probability X takes on a value less than or equal to x.

\displaystyle F(x) := P(X^{-1}((-\infty, x])

That is, F(x) is the measure of the set of \omega \in \Omega such that X(\omega) \leq x.  Notice that F(x) is an increasing function as a \leq b implies (-\infty, a] \subseteq (-\infty, b] and so

\displaystyle X^{-1}((-\infty, a]) = \{ \omega \in \Omega : X(\omega) \leq a \}

\displaystyle \, \, \subseteq \{ \omega \in \Omega : X(\omega) \leq b \}

\displaystyle \, \, = X^{-1}((-\infty, b])

And since a measure is monotonic (i.e., A \subseteq B \implies P(A) \leq P(B)), we have that F(a) \leq F(b).

Recall that if (E_n)_{n \in \mathbb{N}} is a decreasing sequence of measurable sets (i.e., E_n \supseteq E_{n+1}), then P\left(\bigcap_{n \in \mathbb{N}} E_n\right) = \lim_{n \to \infty} P(E_n), provided there exsists an E_n with P(E_n) finite (in our case P is a probability measure, so this is given to us anyway).  We can use this fact to show that our F function is right-continuous.

Let (x_n)_{n \in \mathbb{N}} be a real-valued sequence that decreases to x.  Then we have that

\displaystyle \lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} P(X^{-1}((-\infty, x_n]))

\displaystyle = P \left( \bigcap_{n = 1}^\infty X^{-1}((-\infty, x_n])\right)

\displaystyle = P \left( X^{-1} \left( \bigcap_{n=1}^\infty (-\infty, x_n] \right) \right)

\displaystyle = P(X^{-1}((-\infty, x]))

\displaystyle = F(x)

This shows that \displaystyle \lim_{x \to x_0^+} F(x) = F(x_0) and so F is right-continuous.

Additionally, \lim_{x \to \infty} F(x) = 1 and \lim_{x \to -\infty} F(x) = 0.

The function F that we’ve described is known as the cumulative distribution function (or just distribution function) of our random variable X.  If F(x) only takes on countably many values, then we say that X is a discrete random variable.  If F(x) is a continuous function, we say X is a continuous random variable.

Now that we’ve seen how to construct this distribution function given a random variable X, the natural question to ask is if we have a function that satisfies the properties of a distribution that we’ve listed, is there a corresponding random variable?

Suppose that F(x) is an increasing, right-continuous function with \lim_{x \to -\infty} F(x) = 0 and \lim_{x \to \infty} F(x) = 1.  Using the ideas from the Lebesgue-Stieltjes measure article, we have that F(x) gives us a measure and sigma-algebra on \mathbb{R}.  Let P_F and \mathcal{F}_F be the measure and sigma-algebra, respectively, that we get from F(x).  It follows from the fact that \lim_{x \to \infty} F(x) = 1 and \lim_{x \to -\infty} F(x) = 0 that P_F(\mathbb{R}) = 1, and so (\mathbb{R}, \mathcal{F}_F, P_F) is a probability space.

Define X : \mathbb{R} \to \mathbb{R} as x \mapsto x.  Note that this is a random variable as

\displaystyle X^{-1}((-\infty, x]) = (-\infty, x]

To see that X is a measurable function (random variable), let A \subseteq \mathbb{R} and \epsilon > 0.  There exists a sequence (I_n)_{n \in \mathbb{N}} of intervals that covers A and

\displaystyle P_F^*(A) \leq \sum_{n=1}^\infty \ell_F(I_N) \leq P_F^*(A) + \epsilon

Define J_n := I_n \cap (-\infty, t] and K_n := I_n \cap (t, \infty); note that \bigcup_n J_n and \bigcup_n K_n cover A \cap (-\infty, t) and A \cap (t, \infty), respectively.  Also notice that \ell_F(I_n) = \ell_F(J_n) + \ell_F(K_n).  This gives us

\displaystyle P_F^*(A) \leq \sum_{n=1}^\infty \ell_F(J_n) + \sum_{n=1}^\infty \ell_F(K_n) \leq P_F^*(A) + \epsilon

\displaystyle \implies P_F^*(A) \leq P_F^*(A \cap (-\infty, t]) + P_F^*(A \cap (t, \infty)) \leq P_F^*(A) + \epsilon

\displaystyle \implies P_F^*(A) = P_F^*(A \cap (-\infty, t]) + P_F^*(A \cap (-\infty, t]^\complement)

so X is a measurable function (random variable).  This means that any function F which is increasing, right-continuous, and \lim_{x \to -\infty} F(x) = 0 and \lim_{x \to \infty} F(X) = 1, is the distribution of some random variable.

October 26, 2008

Random Variables

Filed under: Measure Theory, Probability Theory, Topology — cjohnson @ 8:32 pm

If (\Omega, \mathcal{F}, P) is a probability space and (Y, \Sigma) is a measurable space (i.e., a set Y along with a sigma-algebra \Sigma on Y), then a random variable is a measurable function X : \Omega \to Y.  That is, for each A \in \Sigma, we have X^{-1}(A) = \{ \omega \in \Omega : X(\omega) \in A \} \in \mathcal{F}.  Generally speaking, we’ll be taking (Y, \Sigma) to be (\mathbb{R}, \mathcal{B}) where \mathcal{B} is the Borel algebra on \mathbb{R}.

Given a topological space (S, \tau), there exists a sigma-algebra \mathcal{B}(S), called the Borel algebra on S, that contains all open sets (members of \tau) and is the smallest such sigma-algebra.  This means that for each \mathcal{O} \in \tau we have \mathcal{O} \in \mathcal{B}(S) and also if \Sigma is another sigma-algebra on S with this property, then \mathcal{B}(S) \subseteq \Sigma.  In general, given a collection C of subsets of S, there exists a sigma-algebra, which we’ll call \mathcal{F}_C, that is the smallest sigma-algebra on S (smallest using \subseteq as the ordering relation) containing each c \in C; we say that \mathcal{F}_C is the sigma-algebra generated by C.  In this sense, the Borel algebra on S is the sigma-algebra generated by the topology \tau.  We will usually just write \mathcal{B} to mean \mathcal{B}(\mathbb{R}).

In the special case of (\mathbb{R}, \mathcal{B}), there’s an easy way to check to see that a function X : \Omega \to \mathbb{R} is a random variable (or a measurable function in general): we just look at the pre-images of intervals.  Since the pre-images of functions are well-behaved with respect to set operations like union, intersection, and complement, it in fact suffices to only consider pre-images of the form {}[a, \infty).  That is, if we show that X^{-1}([a, \infty)) \in \mathcal{F} for every a \in \mathbb{R}, we will have shown that X is measurable, and so a random variable.  (Actually, we can look at all intervals of the form (a, \infty), {}[a, \infty), (-\infty, a) or (-\infty, a].  Using properties of sigma-algebras we can easily show that if we have all intervals of any of these forms, we have all intervals of any other form.  Again, using properties of sigma-algebras it’s easy to take that and show that we have all countable unions of intervals — namely all countable unions of open intervals, i.e., all open sets.)

To see this, suppose for every a \in \mathbb{R} we have X^{-1}([a, \infty)) \in \mathcal{F}.

\displaystyle X^{-1}([a, \infty)) \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) \geq a \} \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) \geq a \}^\complement \in \mathcal{F}

\displaystyle \implies \{ \omega \in \Omega : X(\omega) < a \} \in \mathcal{F}

\displaystyle \implies X^{-1}((-\infty, a)) \in \mathcal{F}

So now we have the pre-images of all intervals of the form {}[a, \infty) and (-\infty, a).  If we can also get pre-images for the form (a, \infty), it’ll be an easy jump to countable unions of open intervals.  Note that

\displaystyle (a, \infty) = \bigcup_{n=1}^\infty \left[ a + \frac{1}{n}, \infty \right)

Now we easily see that

\displaystyle X^{-1}((a, \infty)) = X^{-1} \left( \bigcup_{n=1}^\infty \left[ a + \frac{1}{n}, \infty \right)\right)

\displaystyle = \bigcup_{n=1}^\infty X^{-1}\left(\left[ a + \frac{1}{n}, \infty \right)\right)

And we have pre-images of intervals of the form (a, \infty).  Combined with the fact we have intervals of the form (-\infty, b), it’s easy to see that we also have intervals of the form (a, b).  Using properties of sigma-algebras, it’s easy to show now that we have the pre-image of any open set.  This tells us that if X^{-1}([a, \infty)) \in \mathcal{F} for every a \in \mathbb{R}, then we have that X is measurable, and so a random variable.

In the case that \Omega is a countable set, we take \mathcal{F} to be 2^\Omega (the powerset of \Omega), and so any function X : \Omega \to \mathbb{R} is a random variable.  This is because the pre-image of {}[a, \infty) must be something (even if it’s empty); we have for every a \in \mathbb{R} that X^{-1}([a, \infty)) \in 2^\Omega, and so every function on a countable sample-space is a random variable.

October 25, 2008

Independence and Conditional Probability

Filed under: Measure Theory, Probability Theory — cjohnson @ 6:39 pm

Suppose that (\Omega, \mathcal{F}, P) is a measure space, A a measurable set with P(A) \in (0, \infty).  We can create a new measure space (A, \mathcal{F}_A, P_A) where

\displaystyle \mathcal{F}_A = \{ S \cap A : S \in \mathcal{F} \}

\displaystyle P_A = \frac{P(E)}{P(A)}

Note that (A, \mathcal{F}_A, P_A) is a probability space, as P_A(A) = 1, and any other set E \in \mathcal{F}_A is a subset of A.

Supposing (\Omega, \mathcal{F}, P) is a probability space, we can use this new probability space in our definition of conditional probability.  The probability P_A(E) represents the probability of E occurring, where we already know A has occurred.  Normally, instead of going through the trouble of writing out a new sigma-algebra and probability measure each time, we simply take P(B|A) to be the probability of B \cap A using the P_A measure defined above.  Of course, our measure and sigma-algebra are so simple that we can just write this in one line as

\displaystyle P(B|A) = \frac{P(B \cap A)}{P(A)}

We call this the probability of B given A.  Now if P(B|A) = P(B), we say that A and B are independent events.  If this is the case then we have

\displaystyle P(B) = \frac{P(B \cap A)}{P(A)}

\displaystyle \implies P(A) P(B) = P(B \cap A)

This is certainly a useful property as it makes proofs of interesting facts fall out easily when we consider sequences of independent random variables (a related idea) later.

Now consider the fact that P(A|B) = \frac{P(A \cap B)}{P(B)} implies P(A \cap B) = P(A|B)P(B).  Plugging into the formula for P(B|A) we arrive at the following, known as Bayes’ theorem.

\displaystyle P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A|B) P(B)}{P(A)}

Note that P(A|B) \neq P(B|A).

Next Page »

Blog at WordPress.com.