Hypothesis Testing Course Online

Hypothesis Testing Course Overview

Hypothesis testing is one of the most pivotal concepts in statistics with many real-life applications. It is used by researchers all over the world to test new theories before implementing them. It helps different companies set a baseline quality of their product and decide on improvements.

Testing a hypothesis involves several steps, from formulating a statistical statement to carrying out the calculations. Nowadays, most of the work in this field is done using software such as Python, Minitab, SQL, or R.

Apart from using software, testing problems can also be solved by hand, though the process would be time-consuming and tedious.

Simply put, hypothesis testing is the examination of claims made about a process with the help of observed data. The process can be anything and need not be a purely statistical problem.

Consider a set of random variables $X_1, X_2, X_3, \ldots, X_n$.

Let $F$ denote the joint distribution function of this set of random variables.

Note that $F$ is chosen in keeping with the experiment's model and belongs to a family of distributions $\mathcal{F}$.

Now, the above problem falls under the umbrella of hypothesis testing if a suggestion of the form

$H_0 : F \in \mathcal{F}_0$

is encountered, where $\mathcal{F}_0$ is a specified proper subset of $\mathcal{F}$.

A statistical hypothesis is a statement used to examine the validity of claims made about the distributions of a set of random variables. The examination process is performed based on a set of observations on the random variable.

The process of examination of the above claims is known as hypothesis testing.

Definition:

If a hypothesis $H_0$ (taken together with the model) specifies the joint distribution of $X_1, X_2, X_3, \ldots, X_n$ completely, then it is known as a simple hypothesis.

If $H_0$ does not specify the joint distribution completely, it is said to be a composite hypothesis.
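As an illustration of the two definitions, consider the following worked case. The normal model and the symbols $\mu$, $\mu_0$, $\sigma$ and $\sigma_0$ are assumptions chosen for the example, not part of the definition itself.

```latex
\begin{align*}
X_1, \ldots, X_n &\overset{\text{iid}}{\sim} N(\mu, \sigma_0^2), \quad \sigma_0 \text{ known} \\
H_0 : \mu = \mu_0 &\quad \text{(simple: the joint distribution is fully specified)} \\
H_1 : \mu \neq \mu_0 &\quad \text{(composite: many values of } \mu \text{ remain possible)} \\
\sigma \text{ unknown},\; H_0 : \mu = \mu_0 &\quad \text{(composite: } \sigma \text{ is left unspecified)}
\end{align*}
```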

Types of Setup

Parametric Setup

A problem of hypothesis testing falls under a parametric setup if it is assumed that the distribution function $F$ of the random variables $X_1, X_2, X_3, \ldots, X_n$ is known (usually assumed to be Normal) except for some parameter or parameters $\theta$.

Non-parametric Setup

A non-parametric setup is used for testing a hypothesis when the distributional assumption, typically normality, is violated. Tests such as the t-test and F-test work efficiently when the random variables follow a normal distribution, but for non-normal distributions these methods are sub-optimal.

Another term used for a non-parametric setup is distribution-free, because the testing procedures used in this case do not depend on the specific form of the distribution of the random variables.
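One way to see what distribution-free means is the sign test, which the text does not name but which serves here as an illustrative example. It tests a hypothesised median using only the signs of the deviations, so no normality assumption is needed. The function name, the sample values and the hypothesised medians below are all made up for the sketch.

```python
from math import comb

def sign_test_p_value(sample, median0):
    """Two-sided exact sign test of H0: population median equals median0.

    Only the signs of (x - median0) are used, so the result does not
    depend on the shape of the underlying distribution.
    """
    # Observations equal to median0 carry no sign information and are dropped.
    diffs = [x - median0 for x in sample if x != median0]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)  # number of positive signs
    # Under H0 the number of positive signs follows Binomial(n, 1/2).
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

# Hypothetical data, for illustration only.
sample = [1.2, 3.4, 2.8, 5.1, 4.4, 3.9, 6.0, 2.2, 4.7, 5.5]
p_low_median = sign_test_p_value(sample, median0=2.0)   # 9 of 10 values lie above 2.0
p_mid_median = sign_test_p_value(sample, median0=4.0)   # 5 above, 5 below
print(p_low_median, p_mid_median)
```

A small P-value for the first call suggests the data are hard to reconcile with a median of 2.0, while the second call gives no evidence against a median of 4.0.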

Null Hypothesis

In a testing problem, the null hypothesis is the statistical statement of "no difference": it asserts, for example, that a parameter equals a specified value or that two or more parameters are equal. Under it, any difference observed in the data is attributed to chance.

It is denoted by $H_0$.

Example:

Consider a testing problem where it is required to test whether the mean of a particular distribution, $F$ (say), takes a specific value $\mu_0$ (say). If $\mu$ denotes the mean of the distribution, the null hypothesis will be

$H_0 : \mu = \mu_0$

Alternative Hypothesis

An alternative or alternate hypothesis is proposed in a testing problem to counter the null hypothesis. If the data from the experiment contradicts the null hypothesis, the alternate hypothesis is suggested as another option.

It is generally represented by $H_1$ or $H_a$.

Example:

Consider a testing problem where it is required to test whether the mean of a particular distribution, $F$ (say), takes a specific value $\mu_0$ (say). If $\mu$ denotes the mean of the distribution, the null hypothesis will be

$H_0 : \mu = \mu_0$

Now, if the data obtained contradicts $H_0$, then it gets rejected by the experimenter, and the alternate hypothesis gets accepted, denoted by

$H_1 : \mu \neq \mu_0$

The testing problem is usually written as

To test $H_0 : \mu = \mu_0$ against $H_1 : \mu \neq \mu_0$

The alternate hypothesis can also be of the form

$H_1 : \mu < \mu_0$ or $H_1 : \mu > \mu_0$

Rejection or acceptance of a null hypothesis

A null hypothesis is rejected or accepted based on the data collected by the experimenter.

Consider the following testing problem:

Let $X_1, X_2, X_3, \ldots, X_n$ be a set of independent and identically distributed random variables following a normal distribution with mean $\mu$ and standard deviation $\sigma_0$, where the value of $\sigma_0$ is known.

To test: $H_0 : \mu = \mu_0$ against $H_1 : \mu \neq \mu_0$

where the value of $\mu_0$ is known.

Now, one can carry out testing in two ways. Either a particular test can be used, or the sample mean can simply be calculated from the observed values of $X$ and compared with $\mu_0$.

Suppose, after calculation, the sample mean comes out to be $\bar{X}$. Two cases may arise.

Case I: $\bar{X} = \mu_0$, or $\bar{X}$ differs from $\mu_0$ by no more than sampling variation can explain

In this case, the observed data does not contradict the null hypothesis, so the null hypothesis is not rejected in favor of the alternate hypothesis.

Case II: $\bar{X} \neq \mu_0$, and the difference is too large to be explained by sampling variation

In this case, the observed data contradicts the null hypothesis, so it gets rejected in favor of the alternate hypothesis.
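The comparison of the sample mean with the hypothesised mean can be made precise with a two-sided z-test, which applies here because the standard deviation is assumed known. The following is a minimal sketch; the function name, the sample values, the hypothesised mean of 10.0, the standard deviation of 0.5 and the 5% level are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_test_two_sided(sample, mu0, sigma0, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 when the standard deviation sigma0 is known."""
    n = len(sample)
    x_bar = fmean(sample)
    # Standardise the distance between the sample mean and the hypothesised mean.
    z = (x_bar - mu0) / (sigma0 / sqrt(n))
    # Critical value that leaves alpha/2 in each tail of the standard normal.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z) > z_crit
    return x_bar, z, z_crit, reject

# Hypothetical observations, for illustration only.
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
x_bar, z, z_crit, reject = z_test_two_sided(sample, mu0=10.0, sigma0=0.5)
print(f"mean={x_bar:.3f}, z={z:.3f}, critical value={z_crit:.3f}, reject H0: {reject}")
```

Here the sample mean is not exactly 10.0, yet the difference is small relative to its standard error, so the sketch lands in Case I and does not reject the null hypothesis.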

How do we correctly calculate the p-value to make an informed decision to accept or reject our null and alternate hypotheses?

In general, the P-value associated with a test statistic in a testing problem is the probability, computed under the null hypothesis, of obtaining a value of the test statistic at least as extreme as the one actually observed. Experimenters use this value to decide whether or not to reject a null hypothesis: a P-value smaller than the chosen significance level means the observed statistic falls within the critical region.

So, the P-value or probability value measures how likely the observed data, or data more extreme, would be if the null hypothesis were true.

Example:
Let there be a bulb manufacturer who claims that a particular lot of bulbs has an average lifetime of $\mu_0$ units. Suppose $N$ bulbs are present in the lot.

This will constitute a testing problem of the form:

To test: $H_0$: Average lifetime of the bulbs is $\mu_0$ units

Against

$H_1$: Average lifetime of the bulbs is not $\mu_0$ units.

Let a sample of size $n$ be drawn randomly from the $N$ bulbs.

Now, if on calculation the average lifetime of the $n$ bulbs attains a value very close to $\mu_0$ (the exact value is practically never attained because of sampling error), then the calculated value of the chosen test statistic will nearly match the value the statistic is expected to take under the null hypothesis. In this case, the P-value will be close to 1 (but never equal to 1).

If instead the average lifetime of the $n$ bulbs differs significantly from $\mu_0$, then the calculated value of the test statistic will also differ significantly from the value that the test statistic is expected to take under the null hypothesis. In this case, the P-value will be close to 0 (but never equal to 0).
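The two situations in the bulb example can be checked numerically. The sketch below assumes a normal model with a known standard deviation so that a two-sided z-test applies. The claimed lifetime of 1000 units, the standard deviation of 50 and the sample size of 25 are hypothetical numbers, not values from the example.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(x_bar, mu0, sigma0, n):
    """P-value of a two-sided z-test: the probability, under H0, of a
    standardised sample mean at least as far from zero as the observed one."""
    z = (x_bar - mu0) / (sigma0 / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Sample mean close to the claimed lifetime: z = 0.2, large P-value.
p_close = two_sided_p_value(x_bar=1002, mu0=1000, sigma0=50, n=25)
# Sample mean far from the claimed lifetime: z = 3.0, small P-value.
p_far = two_sided_p_value(x_bar=1030, mu0=1000, sigma0=50, n=25)
print(f"close to claim: p = {p_close:.4f}")
print(f"far from claim: p = {p_far:.4f}")
```

At a 5% significance level, the first sample would not lead to rejection of the manufacturer's claim, while the second would.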

In a testing problem, the null hypothesis is not rejected in favor of the alternate hypothesis if the calculated value of the chosen test statistic (denoted by $T_{\text{calc}}$, say) falls within the region of acceptance, denoted by $W$.

If the value of $T_{\text{calc}}$ falls outside $W$, then the null hypothesis is rejected in favor of the alternate hypothesis.

Type I Error

It may happen that the null hypothesis is actually true, yet $T_{\text{calc}}$ falls outside the acceptance region $W$, so the null hypothesis gets rejected.

This type of error is known as type I error.

Definition:

The error committed by rejecting a true null hypothesis is known as a type I error.

Type II Error

It may also happen that the null hypothesis is actually false, yet $T_{\text{calc}}$ falls within the acceptance region $W$, so the null hypothesis does not get rejected in favor of the alternate hypothesis. This type of error is known as type II error.

Definition:

The error committed by accepting a false null hypothesis is known as a type II error.

| Decision \ Situation | $H_0$ True | $H_0$ False |
|---|---|---|
| $H_0$ Rejected | Type I Error | Correct Decision |
| $H_0$ Not Rejected | Correct Decision | Type II Error |
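The four cells of the table can be explored by simulation. The sketch below repeatedly draws normal samples and applies a two-sided z-test at the 5% level. The true means, sample size, number of trials and seed are arbitrary choices made for illustration, and the rates it prints are Monte Carlo estimates rather than exact values.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mu, mu0=0.0, sigma0=1.0, n=20, alpha=0.05,
                   trials=4000, seed=1):
    """Estimate how often a two-sided z-test of H0: mu = mu0 rejects
    when the data are really drawn from N(true_mu, sigma0^2)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma0) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma0 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a type I error.
type_i_rate = rejection_rate(true_mu=0.0)
# H0 is false: every failure to reject is a type II error.
type_ii_rate = 1 - rejection_rate(true_mu=0.5)
print(f"estimated type I error rate:  {type_i_rate:.3f}")
print(f"estimated type II error rate: {type_ii_rate:.3f}")
```

The first estimate should sit near the chosen 5% level, while the second depends on how far the true mean is from the hypothesised one and on the sample size.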

In a testing problem, the choice of test should be made keeping both types of errors in mind. A test is termed good if both types of errors are kept under control since, for practical purposes, it is impossible to eliminate both errors entirely.

Now, it is assumed that the commission of the errors is a random event. As such, the experimenters can calculate the probabilities associated with them.

Since the problem of hypothesis testing involves an unknown parameter (say $\theta$), the probabilities will also depend on it.

Probability Associated With Type I Error

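The probability of type I error associated with $\theta$ is the probability that the sample falls in the rejection region although the null hypothesis is true:

$P[\text{Type I Error}] = P_\theta[(X_1, X_2, X_3, \ldots, X_n) \in A] = P_\theta(A), \quad \theta \in \Theta_0$

Where

$X_1, X_2, X_3, \ldots, X_n$ denotes the random sample drawn from the population under study

$A$ denotes the rejection region, that is, the complement of the acceptance region $W$

$\Theta_0$ denotes a specified proper subset of the parameter space $\Theta$

Let $\alpha$ be any number such that $0 < \alpha < 1$. This value indicates the level at which the probability of type I error should be kept for a good test. So we have,

$P_\theta(A) \le \alpha$ for all $\theta \in \Theta_0$, and $\alpha$ is known as the test's significance level.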
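To make the significance level concrete, the sketch below builds the rejection region of a two-sided z-test for a chosen level and then checks that the probability of landing in that region under the null hypothesis equals the level. The normal model with known standard deviation and the numbers used (hypothesised mean 100, standard deviation 15, sample size 36) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def rejection_region(mu0, sigma0, n, alpha):
    """Return (lower, upper): H0 is rejected when the sample mean
    falls below lower or above upper."""
    se = sigma0 / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return mu0 - z_crit * se, mu0 + z_crit * se

def type_i_probability(mu0, sigma0, n, alpha):
    """Probability that the sample mean lands in the rejection region
    when H0 is true, i.e. when the sample mean is N(mu0, sigma0^2 / n)."""
    lower, upper = rejection_region(mu0, sigma0, n, alpha)
    sampling_dist = NormalDist(mu0, sigma0 / sqrt(n))
    return sampling_dist.cdf(lower) + (1 - sampling_dist.cdf(upper))

lower, upper = rejection_region(mu0=100.0, sigma0=15.0, n=36, alpha=0.05)
size = type_i_probability(mu0=100.0, sigma0=15.0, n=36, alpha=0.05)
print(f"reject H0 if sample mean < {lower:.2f} or > {upper:.2f}")
print(f"probability of type I error: {size:.4f}")
```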

Probability Associated With Type II Error

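The probability of type II error associated with $\theta$ is the probability that the sample falls in the acceptance region although the null hypothesis is false:

$P[\text{Type II Error}] = P_\theta[(X_1, X_2, X_3, \ldots, X_n) \in W] = P_\theta(W), \quad \theta \in \Theta - \Theta_0$

Where

$X_1, X_2, X_3, \ldots, X_n$ denotes the random sample drawn from the population under study

$W$ denotes the acceptance region

$\Theta - \Theta_0$ denotes the part of the parameter space $\Theta$ that lies outside $\Theta_0$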
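Because the probability of type II error depends on the true parameter value, it has to be evaluated at a specific alternative. The sketch below does this for a two-sided z-test and shows how the probability shrinks as the sample size grows. The alternative mean of 105, the hypothesised mean of 100, the standard deviation of 15 and the sample sizes are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_probability(mu1, mu0, sigma0, n, alpha=0.05):
    """Probability that the sample mean falls in the acceptance region
    of a two-sided z-test of H0: mu = mu0 when the true mean is mu1."""
    se = sigma0 / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    # Sampling distribution of the mean under the alternative.
    dist_under_alternative = NormalDist(mu1, se)
    return dist_under_alternative.cdf(upper) - dist_under_alternative.cdf(lower)

type_ii_by_n = {
    n: type_ii_probability(mu1=105.0, mu0=100.0, sigma0=15.0, n=n)
    for n in (9, 36, 144)
}
for n, prob in type_ii_by_n.items():
    print(f"n = {n:3d}: probability of type II error = {prob:.3f}")
```

With the significance level held fixed, collecting more observations is the usual way to bring the type II error probability down.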

Relationship Between the Probabilities of Type I and Type II Error

The region of acceptance, $W$, and the rejection region, $A$, can be thought of as two disjoint sets in the sample space. The union of these two sets forms the entire range of values for the test.

The two regions are complements of each other, i.e., $W = A^c$

where $A^c$ is the set complementary to $A$.

So, the probability of type II error can also be written as:

$P_\theta(W) = P_\theta(A^c) = 1 - P_\theta(A)$

for $\theta \in \Theta - \Theta_0$.

The probability $\gamma(\theta) = P_\theta(A)$, regarded as a function of $\theta$, is called the power function of the test.

We have:

$\gamma(\theta)$ = the probability of type I error associated with $\theta$, for $\theta \in \Theta_0$

$\gamma(\theta)$ = 1 - the probability of type II error associated with $\theta$, for $\theta \in \Theta - \Theta_0$

The power function is used to judge the nature of the whole test.
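The power function can be tabulated directly for a simple case. The sketch below evaluates the rejection probability of a two-sided z-test over a small grid of true means. The normal model with known standard deviation and the numbers used (hypothesised mean 100, standard deviation 15, sample size 36, level 5%) are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power(mu, mu0, sigma0, n, alpha=0.05):
    """Probability that a two-sided z-test of H0: mu = mu0 rejects
    when the true mean is mu."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under true mean mu, the z statistic is normal with this mean and variance 1.
    shift = (mu - mu0) / (sigma0 / sqrt(n))
    return std_normal.cdf(-z_crit - shift) + 1 - std_normal.cdf(z_crit - shift)

grid = (95.0, 97.5, 100.0, 102.5, 105.0)
curve = {mu: power(mu, mu0=100.0, sigma0=15.0, n=36) for mu in grid}
for mu, value in curve.items():
    print(f"true mean {mu:6.1f}: rejection probability = {value:.3f}")
```

At the hypothesised mean the function equals the significance level, and it rises symmetrically as the true mean moves away, which is the behaviour one hopes for in a good test.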
