A random variable is a function that associates every outcome of a random experiment with some real number. This real number is used to denote a “reward” for every outcome.

$X : \Omega \to \mathbb{R}$ is a random variable if it assigns a real number $X(\omega)$ to every outcome $\omega$ of the sample space $\Omega$.

Range - The range of a random variable is the set of values the random variable can take.

Probability Distribution

A probability distribution describes the probability of the outcomes of a random experiment.

Discrete Random Variable

$X$ is a Discrete Random Variable if its range is a countable set (a finite set or a countably infinite set).

Probability Mass Function

The probability mass function assigns a probability to each possible value of the discrete random variable.

$$p_X(x) = P(X = x)$$

Here $p_X(x)$ is the probability mass function of $X$.

Properties -

  1. $p_X(x) \ge 0$ for all $x$.
  2. $\sum_x p_X(x) = 1$.

Important Discrete Random Variables

Bernoulli Random Variable

A Bernoulli random variable is a discrete random variable that takes values 0 and 1, representing failure and success respectively.

If $P(X = 1) = p$, then $P(X = 0) = 1 - p$. A random variable $X$ being a Bernoulli Random Variable is denoted by

$$X \sim \text{Bernoulli}(p)$$

where $p$ is the probability of success.

Binomial Random Variable

A Binomial random variable counts the number of successes of independent and identically distributed Bernoulli trials.

A random variable $X$ being a Binomial Random Variable is denoted by

$$X \sim \text{Binomial}(n, p)$$

where $n$ is the number of Bernoulli trials and $p$ is the probability of success.

When Bernoulli trials are repeated multiple times, we are interested in the probability of achieving exactly some number of successes. The probability of $k$ successes in one particular sequence of $n$ trials is $p^k (1 - p)^{n - k}$. There can be $\binom{n}{k}$ such sequences in which $k$ successes occur, each with the same probability of $p^k (1 - p)^{n - k}$. Thus the total probability ends up being -

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
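A minimal sketch (assuming scipy is available; not part of the original notes) comparing this counting formula with scipy's binomial PMF:

```python
# Check P(X = k) = C(n, k) * p^k * (1-p)^(n-k) against scipy's binomial PMF.
from math import comb
from scipy.stats import binom

n, p = 10, 0.3
for k in range(n + 1):
    manual = comb(n, k) * p**k * (1 - p) ** (n - k)
    assert abs(manual - binom.pmf(k, n, p)) < 1e-9
```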

Requirements -

  1. Associated with random experiments with only two outcomes (Trials must be Bernoulli trials).
  2. Finite number of trials, $n$.
  3. Probability of success = $p$, Probability of failure = $1 - p$.
  4. Trials must be independent.
  5. The probability distribution of success and failure across the trials should remain identical.

Geometric Random Variable

A Geometric random variable counts the number of independent and identical Bernoulli trials needed until the first success occurs.

A random variable $X$ being a Geometric Random Variable is denoted by

$$X \sim \text{Geometric}(p)$$

where $p$ is the probability of success.

The random variable $X$ holding some value $k$ means that the first success occurs on the $k$-th trial, i.e. after $k - 1$ failures. The probability of the random variable can be written as -

$$P(X = k) = (1 - p)^{k - 1} p$$
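A minimal sketch (assuming scipy is available) checking $(1 - p)^{k - 1} p$ against scipy's geometric PMF, which uses the same trials-until-first-success convention:

```python
# Check the geometric PMF formula against scipy.
from scipy.stats import geom

p = 0.25
for k in range(1, 8):
    manual = (1 - p) ** (k - 1) * p
    assert abs(manual - geom.pmf(k, p)) < 1e-9
```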

Requirements -

  1. Keep doing trials until the first success occurs.
  2. All trials must be independent.
  3. Probability distribution of success and failure must be identical across all trials.

Equally Likely Random Variable

A random variable is an equally likely random variable if all possible values of the random variable have the same probability.

If $X$ can take $n$ possible values, each value has probability $\frac{1}{n}$. This is also called the discrete uniform random variable.

Poisson Random Variable

A Poisson random variable counts the number of events occurring in a fixed interval of time/space when -

  1. Events occur independently.
  2. The average rate is a constant.
  3. Two events do not occur at exactly the same instant.

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

Here $\lambda$ is the average arrival rate over the interval and $k$ is the number of events observed.
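A minimal sketch (assuming scipy is available) comparing this formula with scipy's Poisson PMF:

```python
# Check lambda^k * exp(-lambda) / k! against scipy's Poisson PMF.
from math import exp, factorial
from scipy.stats import poisson

lam = 4.0
for k in range(10):
    manual = lam**k * exp(-lam) / factorial(k)
    assert abs(manual - poisson.pmf(k, lam)) < 1e-9
```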

Continuous Random Variable

$X$ is a Continuous Random Variable if its range is an interval. Any interval is an uncountably infinite set. Thus $P(X = x) = 0$ for any single value $x$, because of the reasoning given here.

This is why we talk about probability around some neighbourhood of $x$ and not at $x$ itself. The probability around some neighbourhood of $x$ is literally just the area under the density curve from $x - \delta$ to $x + \delta$.

Probability Density Function

A function which describes how probability is distributed over the values of a continuous random variable. The probabilities are obtained by integrating the function over the interval.

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$

Here $f_X(x)$ is called the probability density function. Properties of PDF -

  1. The area under the probability density curve over the entire range is 1, i.e. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
  2. $f_X(x) \ge 0$ for all $x$.
  3. Probability of an interval $[a, b]$ is $\int_a^b f_X(x)\,dx$.
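A minimal sketch (assuming scipy is available) obtaining an interval probability by numerically integrating a density, here the standard normal:

```python
# P(-1 <= X <= 1) for X ~ N(0, 1), as the area under the density curve.
from scipy.integrate import quad
from scipy.stats import norm

area, _ = quad(norm.pdf, -1, 1)
print(area)                         # ~0.6827
print(norm.cdf(1) - norm.cdf(-1))   # same probability via the CDF
```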

Important Continuous Random Variables

Uniform Random Variable

A uniform random variable is a continuous random variable whose probability density function is constant for all $x$ in the interval.

If the interval is $[a, b]$, it is denoted by $X \sim \text{Uniform}(a, b)$. As the PDF is a constant, we can say $f_X(x) = c$ for $x \in [a, b]$, where $c$ is unknown. Since the total area must be 1, $c(b - a) = 1$, so $c = \frac{1}{b - a}$.

Thus,

$$f_X(x) = \begin{cases} \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

This is a continuous analog of the Equally Likely Random Variable.

Exponential Random Variable

An exponential random variable models the waiting time until the first occurrence of an event in a process with a constant rate.

A continuous random variable $X$ is exponentially distributed, denoted $X \sim \text{Exponential}(\lambda)$, if its probability density function is -

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Here $\lambda$ is the rate of occurrence of events and is always greater than 0.

This is a continuous analog of the Geometric Random Variable.
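A minimal sketch (assuming scipy is available) verifying that integrating $\lambda e^{-\lambda x}$ from 0 to $t$ reproduces the CDF $1 - e^{-\lambda t}$:

```python
# Integrate the exponential density and compare with the closed-form CDF.
from math import exp
from scipy.integrate import quad

lam = 1.5
for t in (0.5, 1.0, 3.0):
    area, _ = quad(lambda x: lam * exp(-lam * x), 0, t)
    print(area, 1 - exp(-lam * t))   # each pair should match
```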

Gaussian/Normal Distribution

A probability distribution in which values are symmetrically distributed about the mean, with most observations clustering near the mean and fewer occurring as we move away from it.
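The density of $X \sim \mathcal{N}(\mu, \sigma^2)$ is $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$. A minimal sketch (assuming scipy is available) checks how much probability mass clusters within 1, 2, and 3 standard deviations of the mean:

```python
# Probability mass within k standard deviations of the mean of a normal RV.
from scipy.stats import norm

mu, sigma = 0.0, 1.0
for k in (1, 2, 3):
    p = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(k, round(p, 4))   # ~0.6827, 0.9545, 0.9973
```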

Cumulative Distribution

The cumulative distribution function of a random variable $X$ represents the probability accumulated up to $x$ -

$$F_X(x) = P(X \le x)$$

Properties -

  1. ,
  1. is a non-decreasing function. The graph of would look like -
    • Step Function for Discrete Random Variable.
    • Sigmoid Function for Continuous Random Variable.
  2. is only right continuous as approaching from left can get stuck at a “jump” in CDF of discrete random variable. This jump is for that point. We can say that -
  1. CDF always exists irrespective of whether the random variable is discrete or continuous. PDF doesn’t exist for discrete random variables, and PMF doesn’t exist for continuous random variables.
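A minimal sketch (assuming scipy is available) illustrating the jump property: the jump of a discrete CDF at a point equals the PMF there, while a continuous CDF has no jumps:

```python
# Jump of a discrete CDF vs. the PMF, and a smooth continuous CDF value.
from scipy.stats import binom, norm

n, p, x = 4, 0.5, 2
jump = binom.cdf(x, n, p) - binom.cdf(x - 1, n, p)
print(jump, binom.pmf(x, n, p))   # both equal P(X = 2) = 0.375

print(norm.cdf(1.0))              # smooth CDF of a continuous RV, ~0.8413
```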

Expected Value of a Random Variable

The expected value of a random variable is the average of the outcomes achieved by performing a large number of experimental trials.

$$E[X] = \sum_x x\,p_X(x) \;\;\text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx \;\;\text{(continuous)}$$

The expected values for the popular distributions are -

| Distribution | Expected Value |
| --- | --- |
| Bernoulli($p$) | $p$ |
| Binomial($n, p$) | $np$ |
| Geometric($p$) | $\frac{1}{p}$ |
| Poisson($\lambda$) | $\lambda$ |
| Uniform($a, b$) | $\frac{a + b}{2}$ |
| Exponential($\lambda$) | $\frac{1}{\lambda}$ |
| Gaussian($\mu, \sigma^2$) | $\mu$ |

These can be obtained by applying the above formulas and simplifying the equations.

Properties of $E[X]$ -

  1. $E[X]$ doesn’t generally indicate the most probable value of $X$.
  2. $E[aX] = a\,E[X]$, if $a$ is a scalar.
  3. $E[X + a] = E[X] + a$, if $a$ is a scalar.

The expectation of a random variable is a linear operator.

$E[X]$ is the best predictor of the outcome of an experiment.

Suppose the outcome of a random experiment is $x$ and let $X$ be the random variable. On average, we’d like our prediction $\hat{x}$ to be as close as possible to the true outcome of the experiment. So we want to minimize $E[(X - \hat{x})^2]$.

$$E[(X - \hat{x})^2] = E[X^2] - 2\hat{x}\,E[X] + \hat{x}^2 \implies \frac{d}{d\hat{x}}\,E[(X - \hat{x})^2] = -2E[X] + 2\hat{x}$$

For $\frac{d}{d\hat{x}}\,E[(X - \hat{x})^2]$ to be 0, $\hat{x} = E[X]$. Thus $E[X]$ is the best predictor.
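A minimal sketch (assuming numpy is available; the grid search is purely illustrative) showing that the value minimising the mean squared error over samples coincides with the sample mean:

```python
# Grid-search the c that minimises mean((x - c)^2); it lands on the sample mean.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # any distribution works

candidates = np.linspace(0, 6, 601)
mse = [np.mean((x - c) ** 2) for c in candidates]
best = candidates[int(np.argmin(mse))]
print(best, x.mean())   # both ~2.0, the mean of Exponential(scale=2)
```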

Raw Moments

The $k$-th moment of a random variable $X$ about a reference point $c$ is calculated by doing -

$$E[(X - c)^k]$$

The raw moments are the moments about the origin, $c = 0$, i.e. $E[X^k]$.

  • If $k = 1$, then the moment measures the “center of mass” of $X$ around $c$.
  • If $k = 2$, then the moment measures the spread of values of $X$ around $c$.
  • The only difference the value of $k$ makes is whether $k$ is odd or even and how large $k$ is. Larger values of $k$ put more emphasis on the “tails” of the distribution of $X$.
  • If $k$ is odd and not 1, then the moment measures the symmetry of $X$ around $c$.
  • If $k$ is even and not 2, then the moment measures the heaviness of the tails of $X$ away from $c$.

$E[(X - E[X])^k]$ is known as the central moment, with $c = E[X]$. “Central” because the mean of $X - E[X]$ is 0.

Important moments -

  1. The first moment about the origin is known as the mean $E[X]$.
  2. The second central moment is known as the variance $E[(X - E[X])^2]$.
  3. The fourth central moment is known as the kurtosis.

Variance

The variance of a random variable measures the spread of the values of the random variable around its mean -

$$\text{Var}(X) = E[(X - E[X])^2]$$

The square root of the variance is called the standard deviation, $\sigma = \sqrt{\text{Var}(X)}$.
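A minimal sketch (assuming numpy is available) empirically checking $\text{Var}(X) = E[X^2] - (E[X])^2$ and the scaling/shift behaviour $\text{Var}(aX + b) = a^2\,\text{Var}(X)$:

```python
# Empirical check of the two common variance identities on simulated data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.0, size=200_000)

var_def = np.mean((x - x.mean()) ** 2)
var_alt = np.mean(x**2) - x.mean() ** 2
print(var_def, var_alt)                       # both ~3.0 for Poisson(3)

a, b = 2.5, 7.0
print(np.var(a * x + b), a**2 * np.var(x))    # scaling/shift property
```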

Properties of $\text{Var}(X)$ -

  1. $\text{Var}(X) = E[X^2] - (E[X])^2$.
  2. $\text{Var}(X + c) = \text{Var}(X)$, if $c$ is a constant.
  3. $\text{Var}(X) \ge 0$, as the variance is the expected value of a squared quantity.
  4. $\text{Var}(cX) = c^2\,\text{Var}(X)$.

The variances for the popular distributions are -

| Distribution | Variance |
| --- | --- |
| Bernoulli($p$) | $p(1 - p)$ |
| Binomial($n, p$) | $np(1 - p)$ |
| Geometric($p$) | $\frac{1 - p}{p^2}$ |
| Poisson($\lambda$) | $\lambda$ |
| Uniform($a, b$) | $\frac{(b - a)^2}{12}$ |
| Exponential($\lambda$) | $\frac{1}{\lambda^2}$ |
| Gaussian($\mu, \sigma^2$) | $\sigma^2$ |

These can be obtained by applying the above formulas and simplifying the equations.

Joint Distribution

The joint distribution of multiple random variables is the probability associated with every possible combination of values the random variables hold simultaneously.

This distribution is also denoted as $P_{XY}(x, y) = P(X = x, Y = y)$.

Properties -

  1. $P_{XY}(x, y) \ge 0$ for all $(x, y)$.
  2. $\sum_x \sum_y P_{XY}(x, y) = 1$.

Marginal Distribution

The probability distribution of one random variable, obtained by accumulating the joint probability over all possible values of the rest of the random variables, is called the marginal distribution for that random variable.

Discrete Case

For discrete random variables we sum the probabilities like -

$$P_X(x) = \sum_y P_{XY}(x, y)$$

If $P_{XY}(x, y) = P_X(x)\,P_Y(y)$ at every point $(x, y)$, we say that $X$ and $Y$ are independent random variables.
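A minimal sketch (assuming numpy is available; the joint PMF matrix is a made-up toy example) that marginalises a joint PMF and applies the factorisation test for independence:

```python
# Marginalise a joint PMF stored as a matrix and test for independence.
import numpy as np

# Joint PMF P(X = i, Y = j); rows index x, columns index y.
joint = np.array([[0.10, 0.20],
                  [0.15, 0.30],
                  [0.05, 0.20]])
assert np.isclose(joint.sum(), 1.0)

p_x = joint.sum(axis=1)   # marginal of X: sum over y
p_y = joint.sum(axis=0)   # marginal of Y: sum over x

independent = np.allclose(joint, np.outer(p_x, p_y))
print(p_x, p_y, independent)
```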

Continuous Case

For continuous random variables we integrate the probabilities like -

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$$

If $f_{XY}(x, y) = f_X(x)\,f_Y(y)$ at every point $(x, y)$, we say that $X$ and $Y$ are independent random variables.

Joint Cumulative Distribution Function

The joint cumulative distribution function tells the probability of a set of random variables holding values less than or equal to some threshold -

$$F_{XY}(x, y) = P(X \le x, Y \le y)$$

Properties -

  1. $F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0$ and $F_{XY}(\infty, \infty) = 1$.
  2. The marginal CDF is written as $F_X(x) = F_{XY}(x, \infty)$, where the other variable is allowed to take any value. Similarly, $F_Y(y) = F_{XY}(\infty, y)$.
  3. $F_{XY}$ is non-decreasing in each of its arguments.
  4. If $X$ and $Y$ are independent, $F_{XY}(x, y) = F_X(x)\,F_Y(y)$.

Sums and Products of Random Variables

Suppose $X, Y$ are two random variables, then we can compute the moments of quantities such as $Z = X + Y$ or $Z = XY$ that combine them.

Expected Value

If $Z = X + Y$, then

$$E[Z] = E[X + Y] = E[X] + E[Y]$$

This means that $E[X + Y] = E[X] + E[Y]$ holds even if $X$ and $Y$ are not independent.

If $Z = XY$, then we can’t simplify $E[XY]$ unless $P_{XY}(x, y) = P_X(x)\,P_Y(y)$. But if this were to be true then $X, Y$ are independent.

So, if $X, Y$ are independent, then

$$E[XY] = E[X]\,E[Y]$$

Variance

If $Z = X + Y$, then

$$\text{Var}(Z) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$$

If $X, Y$ are independent then $E[XY] = E[X]\,E[Y]$, which would cause this Covariance term to be 0. So if $X, Y$ are independent,

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$
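A minimal sketch (assuming numpy is available) checking $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$ on correlated samples:

```python
# Variance of a sum of correlated variables, checked against the identity.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200_000)
y = 0.6 * x + rng.normal(size=200_000)   # y is correlated with x

lhs = np.var(x + y)
cov = np.cov(x, y, bias=True)[0, 1]
rhs = np.var(x) + np.var(y) + 2 * cov
print(lhs, rhs)   # approximately equal
```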

Joint Moments

The joint moments of $X$ and $Y$ are defined as -

$$E[X^j Y^k] \;\;\text{(about the origin)}, \qquad E[(X - E[X])^j (Y - E[Y])^k] \;\;\text{(central)}$$

If we take $j, k \le 1$ and calculate the moments about the origin,

  1. If we set $j = 1, k = 0$ we get $E[X]$.
  2. Similarly, if we set $j = 0, k = 1$ we get $E[Y]$.
  3. If we set $j = 1, k = 1$ we get $E[XY]$. This can be called the correlation iff $X$ and $Y$ are standardized ($E[X] = E[Y] = 0$ and $\text{Var}(X) = \text{Var}(Y) = 1$).

If we take $j, k \le 1$ and calculate the central moments,

  1. If $j = 1, k = 0$ we get 0. This makes sense as we are centering $X$ and then trying to find its mean.
  2. If $j = 0, k = 1$ we get 0 again due to a similar logic but for $Y$.
  3. If $j = 1, k = 1$ we get the Covariance of $X$ and $Y$.

Covariance

Covariance of two random variables is defined as,

$$\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]$$

If $X, Y$ are independent, then $E[XY] = E[X]\,E[Y]$ and thus the covariance is 0.

Whenever $\text{Cov}(X, Y) = 0$, we say that $X, Y$ are uncorrelated.

  • Independent random variables are uncorrelated.
  • But not all uncorrelated random variables are independent.

Correlation

The covariance between two random variables can help us understand how one variable’s values affect the other’s, but the problem with it is that it is unbounded and depends on the units of measurement.

Example - If $X$ means weight (kg) and $Y$ means height (cm), then $\text{Cov}(X, Y)$ is in the units kg·cm.

To resolve this, we divide the covariance by the product of the standard deviations of the two random variables.

$$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$$

Here $\rho_{XY}$ denotes the correlation coefficient between $X$ and $Y$.

  1. $-1 \le \rho_{XY} \le 1$.
  2. $\rho_{XY} < 0$ means as $X$ increases $Y$ decreases, and vice-versa.
  3. $\rho_{XY} > 0$ means as $X$ increases $Y$ increases, and vice-versa.
  4. $\rho_{XY} = 0$ means a change in $X$ doesn’t imply any change in $Y$. In such a scenario we say $X$ and $Y$ are orthogonal to each other.

If $X$ and $Y$ are independent, $\text{Cov}(X, Y) = 0$ and hence $\rho_{XY} = 0$.

  • But if $\rho_{XY} = 0$, that doesn’t mean that $X$ and $Y$ are independent (see the sketch below).
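A minimal sketch (assuming numpy is available) of the classic example: $Y = X^2$ with a symmetric $X$ is uncorrelated with $X$ yet completely dependent on it:

```python
# Uncorrelated but dependent: Y = X^2 with symmetric X has correlation ~0.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=500_000)
y = x**2                       # fully determined by x, hence not independent

corr = np.corrcoef(x, y)[0, 1]
print(corr)                    # ~0: uncorrelated despite the dependence
```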

But does there exist a pair of random variables such that, if they are uncorrelated, they are also independent? YES!

  • Uncorrelated Gaussian Random Variables are necessarily independent.

Joint Conditional Probability

From the definition of conditional probability we can say that,

$$P_{X|Y}(x \mid y) = \frac{P_{XY}(x, y)}{P_Y(y)}$$

The number $P_{X|Y}(x \mid y)$ is a valid probability as it satisfies all the Axioms of Probability. The conditional PMF is a family of functions of $x$, keeping $y$ constant.

We can similarly write the conditional joint probability for continuous random variables as,

$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}$$

Bayes Theorem

Using this we can rewrite the Bayes Theorem w.r.t. the joint probability -

$$P_{X|Y}(x \mid y) = \frac{P_{Y|X}(y \mid x)\,P_X(x)}{P_Y(y)}$$
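A minimal sketch (assuming numpy is available; the joint PMF is the same toy matrix used earlier) computing conditional PMFs from a joint PMF and verifying Bayes' theorem:

```python
# Conditional PMFs and Bayes' theorem from a joint PMF (rows: x, columns: y).
import numpy as np

joint = np.array([[0.10, 0.20],
                  [0.15, 0.30],
                  [0.05, 0.20]])
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

p_x_given_y = joint / p_y          # P(X = x | Y = y), one column per y
p_y_given_x = (joint.T / p_x).T    # P(Y = y | X = x), one row per x

# Bayes: P(X = x | Y = y) = P(Y = y | X = x) P(X = x) / P(Y = y)
bayes = (p_y_given_x * p_x[:, None]) / p_y
print(np.allclose(p_x_given_y, bayes))   # True
```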