Probability Spaces

October 24, 2008

Filed under: Measure Theory,Probability Theory — cjohnson @ 1:03 am

Recall that given a set $\Omega$, a $\sigma$-algebra on $\Omega$ is a collection $\mathcal{F}$ of subsets of $\Omega$ that satisfies the following properties.

$\displaystyle \emptyset, \Omega \in \mathcal{F}$
$\displaystyle A \in \mathcal{F} \implies A^\complement \in \mathcal{F}$
$\displaystyle (A_n)_{n \in \mathbb{N}} \subseteq \mathcal{F} \implies \bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}$
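
When $\Omega$ is finite, these three properties can be checked mechanically. The following is a minimal sketch, not a general-purpose tool: the function name `is_sigma_algebra` and the example collections are illustrative, and it assumes $\Omega$ is finite, so that closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations


def is_sigma_algebra(omega, family):
    """Check the three sigma-algebra properties for a finite set omega.

    Assumption: omega is finite, so closure under countable unions
    reduces to closure under pairwise unions.
    """
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}

    # Property 1: the empty set and omega both belong to the family.
    if frozenset() not in family or omega not in family:
        return False
    # Every member must be a subset of omega.
    if any(not a <= omega for a in family):
        return False
    # Property 2: closed under complements.
    if any(omega - a not in family for a in family):
        return False
    # Property 3: closed under unions (pairwise suffices for a finite family).
    return all(a | b in family for a, b in combinations(family, 2))


omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]  # generated by the partition {1,2}, {3,4}
broken = [set(), {1}, omega]             # missing the complement {2, 3, 4}
```

Here `is_sigma_algebra(omega, coarse)` holds, while `is_sigma_algebra(omega, broken)` fails because the complement of $\{1\}$ is absent.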

A function $P : \mathcal{F} \to [0, \infty]$ is called a measure if it satisfies the properties

$\displaystyle A \subseteq B \implies P(A) \leq P(B)$
$\displaystyle (A_n)_{n \in \mathbb{N}} \subseteq \mathcal{F}, \ A_i \cap A_j = \emptyset \text{ for } i \neq j \implies P\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n \in \mathbb{N}} P(A_n)$
$\displaystyle P(\emptyset) = 0$

The triple $(\Omega, \mathcal{F}, P)$ is then called a measure space. If in addition $P(\Omega) = 1$, we say that the triple is a probability space. This formalizes our intuitive idea of what a “probability space” should be: we have a set of all things that could conceivably happen (a sample space), a family of subsets of that sample space (events), and a way to assign a numerical value (a probability) to each of those events. In this context the properties of a measure are not surprising. If one event is contained in another, we’d expect the probability of the larger event to be at least as large as the probability of the smaller one; if we have a sequence of pairwise disjoint events, the probability that at least one of them occurs is the sum of their probabilities; and the probability that nothing happens is zero. The extra condition $P(\Omega) = 1$ says that the probability that something happens is one.

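One especially useful property of measures in general is that if $A$ and $B$ are measurable sets with $B \subseteq A$ and $A$ has finite measure, then $P(B) = P(A) - P(A \setminus B)$. This follows from additivity: $B$ and $A \setminus B$ are disjoint and their union is $A$, so $P(A) = P(B) + P(A \setminus B)$, and the finiteness of $P(A)$ is what allows the subtraction. When dealing with probabilities, we’re given that $P(\Omega) = 1$, so for any $A \in \mathcal{F}$ we have $P(A^\complement) = P(\Omega) - P(\Omega \setminus A^\complement) = 1 - P(A)$.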
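
Both identities can be checked numerically on a small example. The sketch below uses a made-up four-outcome space with unequal weights; the outcome labels, the weights, and the helper `prob` are assumptions chosen only for illustration. Exact fractions are used so the comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical weights for a loaded four-outcome experiment; they sum to 1.
weights = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}
omega = frozenset(weights)


def prob(event):
    """P(event): the sum of the weights of the outcomes in the event."""
    return sum((weights[w] for w in event), Fraction(0))


A = frozenset({"a", "b", "c"})
B = frozenset({"a", "c"})  # B is a subset of A

# Difference rule: P(B) = P(A) - P(A \ B)
difference_rule_holds = prob(B) == prob(A) - prob(A - B)

# Complement rule: P(A^c) = 1 - P(A)
complement_rule_holds = prob(omega - A) == 1 - prob(A)
```

With these weights, $P(A) = 7/8$, $P(A \setminus B) = 1/4$ and $P(B) = 5/8$, so both checks come out true.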

The traditional first examples of probability theory (dice, coins, and cards) are easy to express in these more formal terms. A roll of a standard six-sided die, for instance, has $\Omega = \{1, 2, 3, 4, 5, 6\}$, $\mathcal{F} = 2^{\Omega}$ and $P(A) = \frac{|A|}{6}$, where $2^{\Omega}$ denotes the power set (the collection of all subsets) of $\Omega$ and $|A|$ is the cardinality of $A$. In general, given a finite, nonempty sample space $\Omega$, the power set of $\Omega$ is a $\sigma$-algebra, and $P(A) = \frac{|A|}{|\Omega|}$ gives a probability measure under which all events of the same size (cardinality) are equally likely.
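
The die example can be written out directly. This is a small sketch under the assumption of a finite, nonempty sample space; the helper names `power_set` and `uniform_measure` are illustrative and not taken from any library.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """All subsets of a finite set, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def uniform_measure(omega):
    """Return P with P(A) = |A| / |omega| for a finite, nonempty omega."""
    omega = frozenset(omega)

    def P(event):
        event = frozenset(event)
        if not event <= omega:
            raise ValueError("event must be a subset of the sample space")
        return Fraction(len(event), len(omega))

    return P


die = frozenset(range(1, 7))
events = power_set(die)  # the sigma-algebra 2^Omega, with 2**6 = 64 events
P = uniform_measure(die)

even = frozenset({2, 4, 6})
at_least_five = frozenset({5, 6})
```

For example, `P(even)` is $1/2$ and `P(at_least_five)` is $1/3$. Since $\{1, 3\}$ and `even` are disjoint, `P({1, 3}) + P(even)` equals `P({1, 2, 3, 4, 6})`, as additivity requires.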