A random variable is a function that associates every outcome of a random experiment with some real number. This real number is used to denote a “reward” for every outcome.

$X : \Omega \to \mathbb{R}$ is a random variable if it assigns a real number $X(\omega)$ to every outcome $\omega$ of the sample space $\Omega$.

Range - The range of a random variable is the set of values the random variable can take.

Probability Distribution

A probability distribution describes the probability of the outcomes of a random experiment.

Discrete Random Variable

$X$ is a Discrete Random Variable if its range is a countable set (a finite set or a countably infinite set).

Probability Mass Function

The probability mass function assigns a probability to each possible value of the discrete random variable.

$$p_X(x) = P(X = x)$$

Here $p_X(x)$ is the probability mass function of $X$.

Properties -

  1. $p_X(x) \ge 0$ for all $x$.
  2. $\sum_x p_X(x) = 1$.

Important Discrete Random Variables

Bernoulli Random Variable

A Bernoulli random variable is a discrete random variable that takes values 0 and 1, representing failure and success respectively.

If $P(X = 1) = p$, then $P(X = 0) = 1 - p$. A random variable $X$ being a Bernoulli Random Variable is denoted by

$$X \sim \text{Bernoulli}(p)$$

where $p$ is the probability of success.

Binomial Random Variable

A Binomial random variable counts the number of successes of independent and identically distributed Bernoulli trials.

A random variable $X$ being a Binomial Random Variable is denoted by

$$X \sim \text{Binomial}(n, p)$$

where $n$ is the number of Bernoulli trials and $p$ is the probability of success.

When Bernoulli trials are repeated multiple times, we are interested in the probability of achieving exactly some number of successes. The probability of $k$ successes in one particular sequence of $n$ trials is $p^k (1 - p)^{n - k}$. There can be $\binom{n}{k}$ such sequences in which $k$ successes occur, each with the same probability of $p^k (1 - p)^{n - k}$. Thus the total probability ends up being -

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
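A minimal sketch (assuming scipy is available; not part of the original notes) comparing this counting formula with scipy's binomial PMF:

```python
# Check P(X = k) = C(n, k) * p^k * (1-p)^(n-k) against scipy's binomial PMF.
from math import comb
from scipy.stats import binom

n, p = 10, 0.3
for k in range(n + 1):
    manual = comb(n, k) * p**k * (1 - p) ** (n - k)
    assert abs(manual - binom.pmf(k, n, p)) < 1e-9
```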

Requirements -

  1. Associated with random experiments with only two outcomes (Trials must be Bernoulli trials).
  2. Finite number of trials, $n$.
  3. Probability of success = $p$, Probability of failure = $1 - p$.
  4. Trials must be independent.
  5. The probability distribution of success and failure across the trials should remain identical.

Geometric Random Variable

A Geometric random variable counts the number of independent and identical Bernoulli trials needed until the first success occurs.

A random variable $X$ being a Geometric Random Variable is denoted by

$$X \sim \text{Geometric}(p)$$

where $p$ is the probability of success.

The random variable $X$ holding some value $k$ means that the first success occurs on the $k$-th trial, i.e. after $k - 1$ failures. The probability of the random variable can be written as -

$$P(X = k) = (1 - p)^{k - 1} p$$
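A minimal sketch (assuming scipy is available) checking $(1 - p)^{k - 1} p$ against scipy's geometric PMF, which uses the same trials-until-first-success convention:

```python
# Check the geometric PMF formula against scipy.
from scipy.stats import geom

p = 0.25
for k in range(1, 8):
    manual = (1 - p) ** (k - 1) * p
    assert abs(manual - geom.pmf(k, p)) < 1e-9
```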

Requirements -

  1. Keep doing trials until the first success occurs.
  2. All trials must be independent.
  3. Probability distribution of success and failure must be identical across all trials.

Equally Likely Random Variable

A random variable is an equally likely random variable if all possible values of the random variable have the same probability.

If $X$ can take $n$ possible values, each value has probability $\frac{1}{n}$. This is also called the discrete uniform random variable.

Poisson Random Variable

A Poisson random variable counts the number of events occurring in a fixed interval of time/space when -

  1. Events occur independently.
  2. The average rate is a constant.
  3. Two events do not occur at exactly the same instant.

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

Here $\lambda$ is the average arrival rate over the interval and $k$ is the number of events observed.
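A minimal sketch (assuming scipy is available) comparing this formula with scipy's Poisson PMF:

```python
# Check lambda^k * exp(-lambda) / k! against scipy's Poisson PMF.
from math import exp, factorial
from scipy.stats import poisson

lam = 4.0
for k in range(10):
    manual = lam**k * exp(-lam) / factorial(k)
    assert abs(manual - poisson.pmf(k, lam)) < 1e-9
```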

Continuous Random Variable

$X$ is a Continuous Random Variable if its range is an interval. Any interval is an uncountably infinite set. Thus $P(X = x) = 0$ for any single value $x$, because of the reasoning given here.

This is why we talk about probability around some neighbourhood of $x$ and not at $x$ itself. The probability around some neighbourhood of $x$ is literally just the area under the density curve from $x - \delta$ to $x + \delta$.

Probability Density Function

A function which describes how probability is distributed over the values of a continuous random variable. The probabilities are obtained by integrating the function over the interval.

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$

Here $f_X(x)$ is called the probability density function. Properties of PDF -

  1. The area under the probability density curve over the entire range is 1, i.e. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
  2. $f_X(x) \ge 0$ for all $x$.
  3. Probability of an interval $[a, b]$ is $\int_a^b f_X(x)\,dx$.
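A minimal sketch (assuming scipy is available) obtaining an interval probability by numerically integrating a density, here the standard normal:

```python
# P(-1 <= X <= 1) for X ~ N(0, 1), as the area under the density curve.
from scipy.integrate import quad
from scipy.stats import norm

area, _ = quad(norm.pdf, -1, 1)
print(area)                         # ~0.6827
print(norm.cdf(1) - norm.cdf(-1))   # same probability via the CDF
```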

Important Continuous Random Variables

Uniform Random Variable

A uniform random variable is a continuous random variable whose probability density function is constant for all $x$ in the interval.

If the interval is $[a, b]$, it is denoted by $X \sim \text{Uniform}(a, b)$. As the PDF is a constant, we can say $f_X(x) = c$ for $x \in [a, b]$, where $c$ is unknown. Since the total area must be 1, $c(b - a) = 1$, so $c = \frac{1}{b - a}$.

Thus,

$$f_X(x) = \begin{cases} \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

This is a continuous analog of the Equally Likely Random Variable.

Exponential Random Variable

An exponential random variable models the waiting time until the first occurrence of an event in a process with a constant rate.

A continuous random variable $X$ is exponentially distributed, denoted $X \sim \text{Exponential}(\lambda)$, if its probability density function is -

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Here $\lambda$ is the rate of occurrence of events and is always greater than 0.

This is a continuous analog of the Geometric Random Variable.
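A minimal sketch (assuming scipy is available) verifying that integrating $\lambda e^{-\lambda x}$ from 0 to $t$ reproduces the CDF $1 - e^{-\lambda t}$:

```python
# Integrate the exponential density and compare with the closed-form CDF.
from math import exp
from scipy.integrate import quad

lam = 1.5
for t in (0.5, 1.0, 3.0):
    area, _ = quad(lambda x: lam * exp(-lam * x), 0, t)
    print(area, 1 - exp(-lam * t))   # each pair should match
```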

Gaussian/Normal Distribution

A probability distribution in which values are symmetrically distributed about the mean, with most observations clustering near the mean and fewer occurring as we move away from it.
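The density of $X \sim \mathcal{N}(\mu, \sigma^2)$ is $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$. A minimal sketch (assuming scipy is available) checks how much probability mass clusters within 1, 2, and 3 standard deviations of the mean:

```python
# Probability mass within k standard deviations of the mean of a normal RV.
from scipy.stats import norm

mu, sigma = 0.0, 1.0
for k in (1, 2, 3):
    p = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(k, round(p, 4))   # ~0.6827, 0.9545, 0.9973
```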

Cumulative Distribution

The cumulative distribution function of a random variable $X$ represents the probability accumulated up to $x$ -

$$F_X(x) = P(X \le x)$$

Properties -

  1. ,
  1. is a non-decreasing function. The graph of would look like -
    • Step Function for Discrete Random Variable.
    • Sigmoid Function for Continuous Random Variable.
  2. is only right continuous as approaching from left can get stuck at a “jump” in CDF of discrete random variable. This jump is for that point. We can say that -
  1. CDF always exists irrespective of whether the random variable is discrete or continuous. PDF doesn’t exist for discrete random variables, and PMF doesn’t exist for continuous random variables.
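A minimal sketch (assuming scipy is available) illustrating the jump property: the jump of a discrete CDF at a point equals the PMF there, while a continuous CDF has no jumps:

```python
# Jump of a discrete CDF vs. the PMF, and a smooth continuous CDF value.
from scipy.stats import binom, norm

n, p, x = 4, 0.5, 2
jump = binom.cdf(x, n, p) - binom.cdf(x - 1, n, p)
print(jump, binom.pmf(x, n, p))   # both equal P(X = 2) = 0.375

print(norm.cdf(1.0))              # smooth CDF of a continuous RV, ~0.8413
```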

Expected Value of a Random Variable

The expected value of a random variable is the average of the outcomes achieved by performing a large number of experimental trials.

$$E[X] = \sum_x x\,p_X(x) \;\;\text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx \;\;\text{(continuous)}$$

The expected values for the popular distributions are -

| Distribution | Expected Value |
| --- | --- |
| Bernoulli($p$) | $p$ |
| Binomial($n, p$) | $np$ |
| Geometric($p$) | $\frac{1}{p}$ |
| Poisson($\lambda$) | $\lambda$ |
| Uniform($a, b$) | $\frac{a + b}{2}$ |
| Exponential($\lambda$) | $\frac{1}{\lambda}$ |
| Gaussian($\mu, \sigma^2$) | $\mu$ |

These can be obtained by applying the above formulas and simplifying the equations.

Properties of $E[X]$ -

  1. $E[X]$ doesn’t generally indicate the most probable value of $X$.
  2. $E[aX] = a\,E[X]$, if $a$ is a scalar.
  3. $E[X + a] = E[X] + a$, if $a$ is a scalar.

The expectation of a random variable is a linear operator.

$E[X]$ is the best predictor of the outcome of an experiment.

Suppose the outcome of a random experiment is $x$ and let $X$ be the random variable. On average, we’d like our prediction $\hat{x}$ to be as close as possible to the true outcome of the experiment. So we want to minimize $E[(X - \hat{x})^2]$.

$$E[(X - \hat{x})^2] = E[X^2] - 2\hat{x}\,E[X] + \hat{x}^2 \implies \frac{d}{d\hat{x}}\,E[(X - \hat{x})^2] = -2E[X] + 2\hat{x}$$

For $\frac{d}{d\hat{x}}\,E[(X - \hat{x})^2]$ to be 0, $\hat{x} = E[X]$. Thus $E[X]$ is the best predictor.
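A minimal sketch (assuming numpy is available; the grid search is purely illustrative) showing that the value minimising the mean squared error over samples coincides with the sample mean:

```python
# Grid-search the c that minimises mean((x - c)^2); it lands on the sample mean.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # any distribution works

candidates = np.linspace(0, 6, 601)
mse = [np.mean((x - c) ** 2) for c in candidates]
best = candidates[int(np.argmin(mse))]
print(best, x.mean())   # both ~2.0, the mean of Exponential(scale=2)
```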

Raw Moments

The $k$-th moment of a random variable $X$ about a reference point $c$ is calculated by doing -

$$E[(X - c)^k]$$

The raw moments are the moments about the origin, $c = 0$, i.e. $E[X^k]$.

  • If $k = 1$, then the moment measures the “center of mass” of $X$ around $c$.
  • If $k = 2$, then the moment measures the spread of values of $X$ around $c$.
  • The only difference the value of $k$ makes is whether $k$ is odd or even and how large $k$ is. Larger values of $k$ put more emphasis on the “tails” of the distribution of $X$.
  • If $k$ is odd and not 1, then the moment measures the symmetry of $X$ around $c$.
  • If $k$ is even and not 2, then the moment measures the heaviness of the tails of $X$ away from $c$.

$E[(X - E[X])^k]$ is known as the central moment, with $c = E[X]$. “Central” because the mean of $X - E[X]$ is 0.

Important moments -

  1. The first moment about the origin is known as the mean $E[X]$.
  2. The second central moment is known as the variance $E[(X - E[X])^2]$.
  3. The fourth central moment is known as the kurtosis.

Variance

The variance of a random variable measures the spread of the values of the random variable around its mean -

$$\text{Var}(X) = E[(X - E[X])^2]$$

The square root of the variance is called the standard deviation, $\sigma = \sqrt{\text{Var}(X)}$.
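A minimal sketch (assuming numpy is available) empirically checking $\text{Var}(X) = E[X^2] - (E[X])^2$ and the scaling/shift behaviour $\text{Var}(aX + b) = a^2\,\text{Var}(X)$:

```python
# Empirical check of the two common variance identities on simulated data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.0, size=200_000)

var_def = np.mean((x - x.mean()) ** 2)
var_alt = np.mean(x**2) - x.mean() ** 2
print(var_def, var_alt)                       # both ~3.0 for Poisson(3)

a, b = 2.5, 7.0
print(np.var(a * x + b), a**2 * np.var(x))    # scaling/shift property
```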

Properties of $\text{Var}(X)$ -

  1. $\text{Var}(X) = E[X^2] - (E[X])^2$.
  2. $\text{Var}(X + c) = \text{Var}(X)$, if $c$ is a constant.
  3. $\text{Var}(X) \ge 0$, as the variance is the expected value of a squared quantity.
  4. $\text{Var}(cX) = c^2\,\text{Var}(X)$.

The variances for the popular distributions are -

| Distribution | Variance |
| --- | --- |
| Bernoulli($p$) | $p(1 - p)$ |
| Binomial($n, p$) | $np(1 - p)$ |
| Geometric($p$) | $\frac{1 - p}{p^2}$ |
| Poisson($\lambda$) | $\lambda$ |
| Uniform($a, b$) | $\frac{(b - a)^2}{12}$ |
| Exponential($\lambda$) | $\frac{1}{\lambda^2}$ |
| Gaussian($\mu, \sigma^2$) | $\sigma^2$ |

These can be obtained by applying the above formulas and simplifying the equations.

Joint Distribution

The joint distribution of multiple random variables is the probability associated with every possible combination of values the random variables hold simultaneously.

This distribution is also denoted as $P_{XY}(x, y) = P(X = x, Y = y)$.

Properties -

  1. $P_{XY}(x, y) \ge 0$ for all $(x, y)$.
  2. $\sum_x \sum_y P_{XY}(x, y) = 1$.

Marginal Distribution

The probability distribution of one random variable, obtained by accumulating the joint probability over all possible values of the rest of the random variables, is called the marginal distribution for that random variable.

Discrete Case

For discrete random variables we sum the probabilities like -

$$P_X(x) = \sum_y P_{XY}(x, y)$$

If $P_{XY}(x, y) = P_X(x)\,P_Y(y)$ at every point $(x, y)$, we say that $X$ and $Y$ are independent random variables.
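A minimal sketch (assuming numpy is available; the joint PMF matrix is a made-up toy example) that marginalises a joint PMF and applies the factorisation test for independence:

```python
# Marginalise a joint PMF stored as a matrix and test for independence.
import numpy as np

# Joint PMF P(X = i, Y = j); rows index x, columns index y.
joint = np.array([[0.10, 0.20],
                  [0.15, 0.30],
                  [0.05, 0.20]])
assert np.isclose(joint.sum(), 1.0)

p_x = joint.sum(axis=1)   # marginal of X: sum over y
p_y = joint.sum(axis=0)   # marginal of Y: sum over x

independent = np.allclose(joint, np.outer(p_x, p_y))
print(p_x, p_y, independent)
```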

Continuous Case

For continuous random variables we integrate the probabilities like -

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$$

If $f_{XY}(x, y) = f_X(x)\,f_Y(y)$ at every point $(x, y)$, we say that $X$ and $Y$ are independent random variables.

Joint Cumulative Distribution Function

The joint cumulative distribution function tells the probability of a set of random variables holding values less than or equal to some threshold -

$$F_{XY}(x, y) = P(X \le x, Y \le y)$$

Properties -

  1. $F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0$ and $F_{XY}(\infty, \infty) = 1$.
  2. The marginal CDF is written as $F_X(x) = F_{XY}(x, \infty)$, where the other variable is allowed to take any value. Similarly, $F_Y(y) = F_{XY}(\infty, y)$.
  3. $F_{XY}$ is non-decreasing in each of its arguments.
  4. If $X$ and $Y$ are independent, $F_{XY}(x, y) = F_X(x)\,F_Y(y)$.

Sums and Products of Random Variables

Suppose $X, Y$ are two random variables, then we can compute the moments of quantities such as $Z = X + Y$ or $Z = XY$ that combine them.

Expected Value

If $Z = X + Y$, then

$$E[Z] = E[X + Y] = E[X] + E[Y]$$

This means that $E[X + Y] = E[X] + E[Y]$ holds even if $X$ and $Y$ are not independent.

If $Z = XY$, then we can’t simplify $E[XY]$ unless $P_{XY}(x, y) = P_X(x)\,P_Y(y)$. But if this were to be true then $X, Y$ are independent.

So, if $X, Y$ are independent, then

$$E[XY] = E[X]\,E[Y]$$

Variance

If $Z = X + Y$, then

$$\text{Var}(Z) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$$

If $X, Y$ are independent then $E[XY] = E[X]\,E[Y]$, which would cause this Covariance term to be 0. So if $X, Y$ are independent,

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$
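A minimal sketch (assuming numpy is available) checking $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$ on correlated samples:

```python
# Variance of a sum of correlated variables, checked against the identity.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200_000)
y = 0.6 * x + rng.normal(size=200_000)   # y is correlated with x

lhs = np.var(x + y)
cov = np.cov(x, y, bias=True)[0, 1]
rhs = np.var(x) + np.var(y) + 2 * cov
print(lhs, rhs)   # approximately equal
```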

Joint Moments

The joint moments of $X$ and $Y$ are defined as -

$$E[X^j Y^k] \;\;\text{(about the origin)}, \qquad E[(X - E[X])^j (Y - E[Y])^k] \;\;\text{(central)}$$

If we take $j, k \le 1$ and calculate the moments about the origin,

  1. If we set $j = 1, k = 0$ we get $E[X]$.
  2. Similarly, if we set $j = 0, k = 1$ we get $E[Y]$.
  3. If we set $j = 1, k = 1$ we get $E[XY]$. This can be called the correlation iff $X$ and $Y$ are standardized ($E[X] = E[Y] = 0$ and $\text{Var}(X) = \text{Var}(Y) = 1$).

If we take $j, k \le 1$ and calculate the central moments,

  1. If $j = 1, k = 0$ we get 0. This makes sense as we are centering $X$ and then trying to find its mean.
  2. If $j = 0, k = 1$ we get 0 again due to a similar logic but for $Y$.
  3. If $j = 1, k = 1$ we get the Covariance of $X$ and $Y$.

Covariance

Covariance of two random variables is defined as,

$$\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]$$

If $X, Y$ are independent, then $E[XY] = E[X]\,E[Y]$ and thus the covariance is 0.

Whenever $\text{Cov}(X, Y) = 0$, we say that $X, Y$ are uncorrelated.

  • Independent random variables are uncorrelated.
  • But not all uncorrelated random variables are independent.

Correlation

The covariance between two random variables can help us understand how one variable’s values affect the other’s, but the problem with it is that it is unbounded and depends on the units of measurement.

Example - If $X$ means weight (kg) and $Y$ means height (cm), then $\text{Cov}(X, Y)$ is in the units kg·cm.

To resolve this, we divide the covariance by the product of the standard deviations of the two random variables.

$$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$$

Here $\rho_{XY}$ denotes the correlation coefficient between $X$ and $Y$.

  1. $-1 \le \rho_{XY} \le 1$.
  2. $\rho_{XY} < 0$ means as $X$ increases $Y$ decreases, and vice-versa.
  3. $\rho_{XY} > 0$ means as $X$ increases $Y$ increases, and vice-versa.
  4. $\rho_{XY} = 0$ means a change in $X$ doesn’t imply any change in $Y$. In such a scenario we say $X$ and $Y$ are orthogonal to each other.

If $X$ and $Y$ are independent, $\text{Cov}(X, Y) = 0$ and hence $\rho_{XY} = 0$.

  • But if $\rho_{XY} = 0$, that doesn’t mean that $X$ and $Y$ are independent (see the sketch below).
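A minimal sketch (assuming numpy is available) of the classic example: $Y = X^2$ with a symmetric $X$ is uncorrelated with $X$ yet completely dependent on it:

```python
# Uncorrelated but dependent: Y = X^2 with symmetric X has correlation ~0.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=500_000)
y = x**2                       # fully determined by x, hence not independent

corr = np.corrcoef(x, y)[0, 1]
print(corr)                    # ~0: uncorrelated despite the dependence
```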

But does there exist a pair of random variables such that, if they are uncorrelated, they are also independent? YES!

  • Uncorrelated Gaussian Random Variables are necessarily independent.

Joint Conditional Probability

From the definition of conditional probability we can say that,

$$P_{X|Y}(x \mid y) = \frac{P_{XY}(x, y)}{P_Y(y)}$$

The number $P_{X|Y}(x \mid y)$ is a valid probability as it satisfies all the Axioms of Probability. The conditional PMF is a family of functions of $x$, keeping $y$ constant.

We can similarly write the conditional joint probability for continuous random variables as,

$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}$$

Bayes Theorem

Using this we can rewrite the Bayes Theorem w.r.t. the joint probability -

$$P_{X|Y}(x \mid y) = \frac{P_{Y|X}(y \mid x)\,P_X(x)}{P_Y(y)}$$
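A minimal sketch (assuming numpy is available; the joint PMF is the same toy matrix used earlier) computing conditional PMFs from a joint PMF and verifying Bayes' theorem:

```python
# Conditional PMFs and Bayes' theorem from a joint PMF (rows: x, columns: y).
import numpy as np

joint = np.array([[0.10, 0.20],
                  [0.15, 0.30],
                  [0.05, 0.20]])
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

p_x_given_y = joint / p_y          # P(X = x | Y = y), one column per y
p_y_given_x = (joint.T / p_x).T    # P(Y = y | X = x), one row per x

# Bayes: P(X = x | Y = y) = P(Y = y | X = x) P(X = x) / P(Y = y)
bayes = (p_y_given_x * p_x[:, None]) / p_y
print(np.allclose(p_x_given_y, bayes))   # True
```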