
Statistics Tutorial Concepts - From Beginner to Pro

Master all key statistical concepts, from data collection to analysis, with this comprehensive tutorial.


Cumulative Distribution Function (CDF)

Updated on 26/09/2024

In probability theory, the Cumulative Distribution Function (CDF) of a real-valued random variable X, evaluated at x, is the probability that X takes a value less than or equal to x.

A random variable is a variable whose value is the numerical outcome of a random phenomenon. The CDF is defined for both discrete and continuous random variables.

The CDF can also be used to specify the distribution of multivariate random variables. The probability that a random variable lies above a certain level, \( P(X > x) = 1 - F_X(x) \), is known as the complementary cumulative distribution function or tail distribution.

In this article, we'll learn about the cumulative distribution function, its properties, formulas, applications, and examples. Let's start by understanding what a Cumulative Distribution Function (CDF) is.

What is Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF) tells us the probability that a random variable will take a value less than or equal to a specific value. Plotted against those values, it shows how probability accumulates across the range of possible outcomes; you can, for example, chart it in Excel to visualize the probability distribution.

The Cumulative Distribution Function (CDF) helps us find the total probability up to a certain point. It's handy for determining the likelihood of a random event and comparing probabilities between different outcomes.

For discrete data, it adds up the probabilities of all values up to and including the value we're interested in. For continuous data, it is the area under the probability density curve up to that point.

Now, let us look at the Cumulative Distribution Function formula.

Cumulative Distribution Function Formula

The Cumulative Distribution Function (CDF) of a random variable X gives the probability that X is less than or equal to a specific value x:

\[ F_X(x) = P(X \leq x) \]

So, if we want to find the probability that X falls between two points a and b (with a < b), we subtract the CDF value at a from the CDF value at b. This holds for both discrete and continuous random variables:

\[ P(a < X \leq b) = F_X(b) - F_X(a) \]
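To see the formula in action, here is a minimal Python sketch, assuming X is the face shown by a fair six-sided die. The helper names `die_cdf` and `prob_between` are hypothetical, chosen only for this illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def die_cdf(x):
    """F_X(x) = P(X <= x) for a fair six-sided die (hypothetical helper)."""
    # Count the faces 1..6 that are <= x; each face has probability 1/6.
    faces_at_or_below = sum(1 for face in range(1, 7) if face <= x)
    return Fraction(faces_at_or_below, 6)


def prob_between(cdf, a, b):
    """P(a < X <= b) = F_X(b) - F_X(a) (hypothetical helper)."""
    return cdf(b) - cdf(a)


# P(2 < X <= 5) covers the faces 3, 4 and 5.
print(prob_between(die_cdf, 2, 5))  # 1/2
```

Because the interval is open on the left and closed on the right, face 2 is excluded and face 5 is included, which is exactly what subtracting the two CDF values does.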

The CDF is calculated slightly differently for continuous random variables. We express it using the probability density function (PDF) \( f_X \), integrated up to x:

\[ F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt \]
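The integral can be approximated numerically. The sketch below assumes a standard normal variable and uses Python's built-in `statistics.NormalDist`: it integrates the PDF with the trapezoidal rule and compares the result with the library's closed-form `cdf`. The function `cdf_by_integration` and its `lower` cutoff (standing in for negative infinity) are illustrative assumptions, not a standard API.

```python
from statistics import NormalDist


def cdf_by_integration(pdf, x, lower=-10.0, steps=20_000):
    """Approximate F_X(x) = integral of pdf from -inf to x (trapezoidal rule).

    The infinite lower limit is replaced by a finite cutoff `lower`, which is
    reasonable only when the density is negligible below that cutoff.
    """
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))  # endpoints get half weight
    for i in range(1, steps):
        total += pdf(lower + i * h)      # interior points get full weight
    return total * h


std_normal = NormalDist(mu=0.0, sigma=1.0)
approx = cdf_by_integration(std_normal.pdf, 1.0)
exact = std_normal.cdf(1.0)
print(approx, exact)  # the two values agree to several decimal places
```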

If the random variable has a positive probability of taking a specific value, say b, then we need to be a bit careful.

That probability equals the size of the jump in the CDF at b: we subtract the limit of the CDF as x approaches b from the left from the CDF value at b. This accounts for the probability concentrated at b:

\[ P(X = b) = F_X(b) - \lim_{x \to b^-} F_X(x) \]

For a continuous random variable the CDF has no jumps, so this probability is 0 for every b.
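A small sketch of this jump formula, assuming the left-hand limit can be approximated by evaluating the CDF just below b. The helper `point_mass` and the tolerance `eps` are hypothetical; a fair die (discrete) and a standard normal variable (continuous) are used for contrast.

```python
from fractions import Fraction
from statistics import NormalDist


def die_cdf(x):
    """F_X(x) = P(X <= x) for a fair six-sided die (hypothetical helper)."""
    return Fraction(sum(1 for face in range(1, 7) if face <= x), 6)


def point_mass(cdf, b, eps=1e-9):
    """Approximate P(X = b) = F(b) - lim_{x -> b^-} F(x).

    The left-hand limit is approximated by F(b - eps); this assumes F has
    no other jump inside the tiny interval (b - eps, b).
    """
    return cdf(b) - cdf(b - eps)


print(point_mass(die_cdf, 3))             # 1/6: the step function jumps at 3
print(point_mass(NormalDist().cdf, 0.0))  # close to 0: a continuous CDF has no jump
```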

Now, let us look at the properties of the Cumulative Distribution Function.

Cumulative Distribution Function Properties

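Every CDF, not only that of a normal distribution, has the following essential properties:

  • \( F_X \) is non-decreasing and right-continuous.
  • \( \lim_{x \to -\infty} F_X(x) = 0 \) and \( \lim_{x \to +\infty} F_X(x) = 1 \).
  • If X is a continuous random variable with density \( f_X \), then for all real numbers a and b with a < b, \( P(a < X \leq b) = \int_a^b f_X(x)\,dx \), and \( f_X \) is the derivative of \( F_X \) wherever \( F_X \) is differentiable.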
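These properties can be checked numerically. The sketch below uses a standard normal distribution from Python's `statistics.NormalDist` as an assumed example; the grid range and the finite-difference step `h` are arbitrary illustrative choices.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)

# 1. Non-decreasing: on a grid from -5 to 5, each CDF value is >= the previous one.
grid = [i / 100 for i in range(-500, 501)]
values = [dist.cdf(x) for x in grid]
non_decreasing = all(b >= a for a, b in zip(values, values[1:]))

# 2. Limits: F(x) is essentially 0 far to the left and 1 far to the right.
left_tail, right_tail = dist.cdf(-10.0), dist.cdf(10.0)

# 3. The density is the derivative of the CDF (central finite difference at x0).
h = 1e-5
x0 = 0.7
numeric_derivative = (dist.cdf(x0 + h) - dist.cdf(x0 - h)) / (2 * h)

print(non_decreasing, left_tail, right_tail)
print(numeric_derivative, dist.pdf(x0))
```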

If X is a purely discrete random variable that takes the values \( x_1, x_2, x_3, \ldots \) with probabilities \( p_i = p(x_i) \), then the CDF of X is discontinuous (it jumps) at the points \( x_i \) and is given by:

\[ F_X(x) = P(X \leq x) = \sum_{x_i \leq x} P(X = x_i) = \sum_{x_i \leq x} p(x_i) \]
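The sum translates directly into code. This minimal sketch assumes the probability mass function is stored as a dictionary from support points to probabilities; `discrete_cdf` is a hypothetical helper, and the example distribution is the number of heads in two fair coin flips.

```python
from fractions import Fraction


def discrete_cdf(pmf, x):
    """F_X(x) = sum of p(x_i) over all support points x_i <= x (hypothetical helper)."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))


# Number of heads in two fair coin flips: P(0) = 1/4, P(1) = 1/2, P(2) = 1/4.
heads_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

for x in [-1, 0, 0.5, 1, 2, 3]:
    print(x, discrete_cdf(heads_pmf, x))
```

Evaluating at non-support points such as -1 or 0.5 shows that the CDF is defined for every real number, staying flat between jumps.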

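The CDF is defined for every real number x, not only at the values X can take; occasionally it is specified implicitly rather than by an explicit formula. It is closely tied to the probability mass function (for discrete variables) and the probability density function (for continuous variables).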

As a straightforward example of a CDF, let X be the number shown when rolling a fair six-sided die.

Each face has probability 1/6, so the CDF at each face value is:

  • P(X ≤ 1) = 1/6 (rolling a 1)
  • P(X ≤ 2) = 2/6 (rolling a 1 or 2)
  • P(X ≤ 3) = 3/6
  • P(X ≤ 4) = 4/6
  • P(X ≤ 5) = 5/6
  • P(X ≤ 6) = 6/6 = 1

These values always fall between 0 and 1 and never decrease. Between face values the CDF stays flat (for example, P(X ≤ 2.5) = P(X ≤ 2) = 2/6) and then jumps by 1/6 at each face, so it is a non-decreasing, right-continuous step function.
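The die example can be reproduced with a short Python sketch; the helper `die_cdf` is hypothetical. It also shows the flat stretches between face values and the limiting values 0 and 1.

```python
from fractions import Fraction


def die_cdf(x):
    """F_X(x) = P(X <= x) for a fair six-sided die (faces 1..6)."""
    return Fraction(sum(1 for face in range(1, 7) if face <= x), 6)


# CDF at each face value; Fraction reduces 2/6 to 1/3, 3/6 to 1/2, and so on.
print([str(die_cdf(k)) for k in range(1, 7)])  # ['1/6', '1/3', '1/2', '2/3', '5/6', '1']

# Between faces the CDF is flat: F(2.5) equals F(2).
print(die_cdf(2.5) == die_cdf(2))  # True

# Below the smallest face it is 0; at and above the largest face it is 1.
print(die_cdf(0.99), die_cdf(6), die_cdf(100))
```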

Now, let us understand how to use the Cumulative Distribution Function Formula.

How to use Cumulative Distribution Functions?

Cumulative probability distribution functions are helpful in telling you the chance that the next thing you observe will be equal to or less than a specific value. Knowing this can be super useful when making decisions, especially when uncertainty is involved.

Moreover, cumulative probabilities correspond to percentiles: if the CDF at a value x equals 0.80, then x is the 80th percentile. That's why CDFs are great for finding percentiles.
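In Python, `statistics.NormalDist` exposes both directions of this relationship: `cdf` maps a value to its cumulative probability, and `inv_cdf` maps a cumulative probability back to the value (the percentile). The sketch assumes a standard normal distribution purely for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Percentile -> value: the 80th percentile is the x with F(x) = 0.80.
x80 = std_normal.inv_cdf(0.80)
print(round(x80, 4))  # about 0.8416

# Value -> cumulative probability: plugging x80 back into the CDF recovers 0.80.
print(round(std_normal.cdf(x80), 4))  # 0.8
```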

Let's say we want to know the probability that an adult male in the U.S. is 6 feet tall or shorter. We use a Cumulative Distribution Function (CDF) to find this. However, we first need to know which distribution describes the data, so that we use the right CDF.

For instance, the heights of adult males in the U.S. approximately follow a normal distribution, which means we use a normal CDF. We also need specific details about this distribution: its average (mean) height and how spread out the heights are (standard deviation).

The average height of an adult male in the U.S. is about 69.2 inches, and the standard deviation is around 2.66 inches. With this information, we can use the normal CDF to determine the probability that a man is 6 feet tall or shorter. Since 6 feet is 72 inches, we evaluate the CDF at 72, which gives about 0.854 (85.4%).
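The calculation can be sketched with Python's `statistics.NormalDist`, assuming heights are normally distributed with the mean and standard deviation quoted above.

```python
from statistics import NormalDist

# Assumed model from the text: adult male height ~ Normal(mean 69.2 in, sd 2.66 in).
men = NormalDist(mu=69.2, sigma=2.66)

# P(height <= 72 inches), i.e. 6 feet or shorter.
p_at_most_6ft = men.cdf(72)
print(round(p_at_most_6ft, 3))  # about 0.854
```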

Now, let's compare distributions.

Comparing Distributions

Cumulative distribution functions are especially useful for comparing two distributions. By comparing the CDFs of two random variables at the same value, we can check whether one is more likely than the other to be less than or equal to that value. This helps us decide which group is more likely to have a particular property.

We will compare how common it is to find men taller than 6 feet with how common it is to find women taller than 6 feet. To do this, we again use the normal CDF, this time to find the probability that a woman is 6 feet tall or shorter. Women's heights also follow an approximately bell-shaped (normal) pattern: most are near the average height, and fewer are much taller or shorter.

We know that women are, on average, about 64.3 inches tall, with a standard deviation of about 2.58 inches.

Evaluating the normal CDF at 72 inches shows that almost all women, about 99.9%, are 6 feet tall or shorter, so a woman taller than 6 feet is among roughly the tallest 0.14% of women. By comparison, about 85.4% of men are 6 feet tall or shorter, leaving about 14.6% taller.

Comparing the two tail probabilities (about 14.6% vs about 0.14%), a man is roughly 103 times more likely than a woman to be taller than 6 feet. For a clothing maker, for example, this is useful information: women over 6 feet tall are rare!
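The comparison can be reproduced under the same normal-model assumption with the parameters quoted above. The tail probabilities come from the complement 1 - F(72); the variable names are illustrative.

```python
from statistics import NormalDist

# Parameters quoted in the text (heights in inches), assuming normal distributions.
men = NormalDist(mu=69.2, sigma=2.66)
women = NormalDist(mu=64.3, sigma=2.58)
six_feet = 72

# Tail probabilities P(height > 72) via the complement 1 - F(72).
p_men_taller = 1 - men.cdf(six_feet)
p_women_taller = 1 - women.cdf(six_feet)

print(f"women <= 6 ft: {women.cdf(six_feet):.4f}")
print(f"men   <= 6 ft: {men.cdf(six_feet):.4f}")
print(f"ratio of tail probabilities: {p_men_taller / p_women_taller:.0f}")
```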

But how do the probability density function and the cumulative distribution function differ, and when is each one more useful?

Probability Density Function vs Cumulative Distribution Function

Both the probability density function (PDF) and the cumulative distribution function (CDF) describe a random variable's distribution. They carry the same underlying probability information but present it in very different ways.

The PDF shows the shape of the distribution, while a CDF would describe the accumulation of probabilities as the value of a random variable increases.

| Aspect | Probability Density Function (PDF) | Cumulative Distribution Function (CDF) |
|---|---|---|
| Representation | Often represented graphically as a curve where the area under the curve represents probabilities. | Typically shown as a smooth curve (continuous) or step function (discrete) where the height at each point is the cumulative probability. |
| Focus | Describes the relative likelihood (density) of the random variable near each value; probabilities of ranges come from areas under the curve. | Focuses on the cumulative probability of the random variable being less than or equal to a specific value. |
| Usefulness | Useful for understanding the shape of the distribution and the relative likelihoods of different outcomes. | Useful for analyzing probabilities up to a certain point and comparing probabilities between different outcomes. |
| Integration | Integrating over the entire range gives the total probability, which equals 1. | The CDF at a point equals the area under the PDF up to that point. |
| Example | In a normal distribution, the PDF curve peaks at the mean and decreases symmetrically on both sides. | For a normal distribution, the CDF rises from 0 toward 1 in an S-shaped curve. |
| Calculation | Calculated by finding the derivative of the cumulative distribution function. | Calculated by summing or integrating probabilities up to a specific point. |
| Comparison | Comparing two PDFs helps understand the relative likelihoods of different outcomes for different distributions. | Comparing two CDFs helps determine which distribution is more likely to have a particular property or outcome. |
| Practical Application | Used in statistical analysis, hypothesis testing, and probability calculations. | Used for predicting probabilities of events, making decisions under uncertainty, and analyzing datasets. |

Frequently Asked Questions

  1. What is a Cumulative Distribution Function (CDF)?

The Cumulative Distribution Function (CDF) gives the probability that a random variable is less than or equal to a certain value.

  1. How is the CDF different from a Probability Density Function (PDF)?

The PDF describes how probability is spread across values (its density at each point; for a continuous variable, probabilities come from areas under the PDF), while the CDF gives the accumulated probability that the random variable is less than or equal to each value.

  1. What does the CDF graph look like?

The CDF graph starts near 0 on the far left and rises to 1 on the far right without ever decreasing: smoothly for a continuous distribution, or in steps for a discrete one.

  1. How is the CDF useful in statistics?

It's helpful in analyzing probabilities of outcomes in a dataset and understanding the cumulative probability distribution of a random variable.

  1. Can the CDF be used for continuous and discrete random variables?

Yes. For discrete random variables, the CDF sums the probabilities of all values up to x; for continuous random variables, it integrates the probability density function up to x.

  1. How can I calculate probabilities using the CDF?

Evaluate the CDF at the relevant points: \( P(X \leq x) = F_X(x) \), \( P(X > x) = 1 - F_X(x) \), and \( P(a < X \leq b) = F_X(b) - F_X(a) \).

  1. What is the relationship between the CDF and the Survival Function?

The Survival Function is one minus the CDF, \( S(x) = 1 - F_X(x) \), representing the probability that the random variable is greater than a particular value.

  1. Can CDF ever be negative?

CDF values are never negative; they range from 0 to 1.
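As a small illustration of the survival function, the sketch below assumes a standard normal variable; `survival` is a hypothetical helper. For a distribution symmetric about 0, \( S(x) = F_X(-x) \), which gives a convenient check.

```python
from statistics import NormalDist


def survival(cdf, x):
    """S(x) = P(X > x) = 1 - F(x) (hypothetical helper)."""
    return 1 - cdf(x)


std_normal = NormalDist()
# For a distribution symmetric about 0, S(x) equals F(-x) up to rounding error.
print(survival(std_normal.cdf, 1.5), std_normal.cdf(-1.5))
```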


Ashish Kumar Korukonda

9+ years experienced data analytics professional, currently leading the entire Analytics unit, which includes Analytical Engineering, Product & Busine…
