|
MSC Classification: 60 (Probability theory) |
Prerequisites: combinatorics
Getting Oriented
| Rough Guides to Probability |
Probability is the study of chance…
Combinatorics
Before one can ask “how likely?” one must ask “how many?”. The answer to that question is given by combinatorics. The basic question is: how many ways are there to choose
objects out of a set of
? Well, the answer depends on whether order matters, whether they are distinct objects, and so on:
- There are
ways to order a set of
objects; - Combination: there are
ways to select
objects out of
when order does not matter; - Permutation: there are
ways to make this selection when order matters; - Multichoose: there are
ways to partition
objects into sets of size
.
These are related to powers of polynomials by the Binomial Theorem
(1)
and the Multinomial Theorem
(2)
Basic Probability
Probability theory uses the notion of an experiment, and an outcome for that experiment, also called an event. The sample space of events is the set of possible outcomes. We may, of course, talk about unions, intersections (denoted
rather than
), and complements of events (and sets of events). The probability of certain outcomes is given by the probability measure:
Definition
A probability measure is a function
, where
is the sample space, such that: (1)
; and (2)
if the events
are disjoint.
For example, if all outcomes in a given sample space are equally likely, then the probability of event
occurring is
.
One can conclude from these axioms that
and that if
then
. We can also show that for an increasing (decreasing) sequence of events
, we have
. The Principle of Inclusion and Exclusion, or PIE, states that
, and can easily be extended to any number of sets.
Conditional Probability
If an experiment is repeated, the first outcome may affect the probability of the second. In an extreme example, if someone draws two balls out of a bag contains a black ball and a white ball, then the color of the second ball is completely determined by the color of the first. This is an example of conditional probability
Definition
Conditional probability is the probability of an event
given that an event
has already occurred, given by
.
Note that conditional probability
is itself a probability measure.
The above formulae give rise to several computational rules: for two events, we have
and
. Both of these may be generalized for several events.
If the first outcome does not effect the second, then the two events are independent and
, in which case
.
The Odds Ratio
The odds ratio of an event
is given by

which uniquely determines the probability. If an event
has already occurred, the odds ratio becomes
, so is just multiplied by some factor.
Random Variables
Discrete Random Variables
A random variable
may encode the outcomes of an experiment if they are numerical. For discrete outcomes, the probability mass function is
, and the cumulative distribution function is
. Note that
,
, and that
is right continuous. Given a random variable, we have:
- expected value: the linear function
; - expected value of a function
; - variance:
; - standard deviation:
.
Examples:
Many experiments have just two outcomes: success and failure. This simple experiment gives rise to several random variables:
- Bernoulli random variable
: a single experiment is performed, with probability
of success; - Binomial random variable
:
independent Bernoulli trials are performed, with probability of
successes given by
. This has expected value
and variance
; - Poisson random variable
: an approximation of
for
large, given by
. The parameter
is both the expected value and the variance; - Geometric random variable
: measures the time until the first success in independent Bernoulli trials, given by
. Here,
and
; - Negative binomial random variable
: measures the number of trials before
successes, given by
. We have
and
.
When taking
objects from a sample of
white and
black objects, the probability of getting
white objects is given by the hypergeometric random variable
, with
. The expected value is
.
Continuous Random Variables
Random variables may also be continuous (take the speeds of cars along a highway for example), in which case the //probability mass function
} is defined on a continuous set. Rather than exact values, we usually speak of the random variable lying in some range of values
. When
is continuous, the probability of a precise value being taken is always zero, but if
has a discontinuity at
, then the probability of
taking the value
is just the size of the jump. We have:
- cumulative distribution function:
(so
); - expected value of
:
; - expected value of a function
:
; - variance: as before,
.
A function
of a continuous random variable
has its own probability mass function:
.
Examples:
- Uniform Distribution: all outcomes in an interval
are equally probable, giving a mass function
; - Normal Distribution: given by
, with parameters
(the mean) and
(the standard deviation). It is an amazing fact that this distribution approximates the behavior of almost all experiments when the number of trials is very high; - Exponential Distribution: given by
, with parameter
. Its mean and variance are
and
. - Gamma Distribution: given by
for
, where
is the standard gamma function. It has expected value
and variance
; - Beta Distribution: given by
for
, where
. Its mean and variance are
and
.
The exponential distribution is closely related to the hazard or failure rate function
. Here,
represents the probability that an item which is
years old will fail within an additional time
. For the exponential distribution,
is constant, so failure does not become more likely over time. Actually, the exponential distribution is the only one having a constant failure rate.
Jointly Distributed Random Variables
This section generalizes both random variables and conditional probability. When the outcome of two random values (discrete or continuous) are not independent, we have a joint probability mass function given by
and a corresponding cumulative distribution function
. Note that the cumulative functions for
and
can be found from
by
and
.
The random variables
and
are independent iff their mass function factors into separate functions of
and
. Otherwise, we have a conditional probability mass function given by
.
The distribution of the sum of two random variables is given by their convolution
. This allows us to show that the distribution of several independent normal random variables with parameters
is again a normal random variable with parameters
. Parameters also add for the sum of independent Poisson variables.
Properties of Expectation
The expected value of a function
is
in the discrete case, and with an integral in the continuous case. Clearly, the sum of expected values is the expected value of the sums.
The covariance of
and
is defined as
, which can be shown to equal
. Note that
. Covariance is related to variance by the identity
. We define the correlation between
and
to be
.
Assuming that
, we can define the conditional expected value of
. This is given by
in the discrete case and
in the continuous case. Note that
; as a consequence we have
in the discrete case, and a corresponding formula in the continuous case.
We can also define the conditional variance:
, giving the formula
.
Given a random variable
, the moment generating function is given by
. The moments of
can then be found by differentiating
and evaluating the result at
. This uniquely determines the distribution of the random variable. We also have
, if
and
are independent.
The multivariate normal distribution is defined for linear combinations of a finite set of independent standard normal random variables. They have sample mean
and sample variance
. Both
and
are random variables, for independent identically distributed
, and they are independent. The variable
has a normal distribution, with mean
and variance
.
Going Further
Limit Theorems
The main idea in this section is proving that as the number of trials of an experiment grows very large, we will see a normal distribution.
First, we have two inequalities. The Markov Inequality states that
for nonnegative random variables. The Chebyshev Inequality states that
for all positive
. These can be used to prove:
Theorem (Central Limit Theorem)
Given independent identically distributed random variables
with mean
and variance
, we have
approaches
as
. This just means that the sum
approaches the normal distribution.
As a consequence, we have the Strong Law of Large Numbers, which states that the average success of
trials of such variables approaches their mean, as
.





