
ANOVA and F-Test In Linear Regression


In this segment, you will learn about ANOVA and F-test. If you don't understand these two concepts in depth, that is fine. You just need to know the basics of these two techniques to understand the model fit.

 

You looked at the t-test previously. It is also known as the one-sample t-test.

 

Let’s now focus on the two-sample t-test. As the name suggests, this test is conducted on two sets of sample data in order to compare their means.

 

Note: A two-sample t-test can be performed for several statistical parameters, but here you will focus only on the two-sample t-test for means, where the standard deviations of both samples are unknown.

 

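The formula for the two-sample t-test is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $s_1$ and $s_2$ are the sample standard deviations, $n_1$ and $n_2$ are the sample sizes, and

df = smaller of $n_1 - 1$ or $n_2 - 1$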

 

Suppose that you want to come up with a hypothesis test regarding the mean age difference between men and women. You can use the two-sample t-test in such a case.
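As a rough illustration of the age example, the sketch below computes the two-sample t statistic for means with unknown standard deviations, t = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2), together with the conservative degrees of freedom (the smaller of n1 - 1 and n2 - 1). The ages are invented purely for illustration, and the function name `two_sample_t` is an assumption, not something from the course material.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(sample_1, sample_2):
    """Return (t, df) for a two-sample t-test on means with unknown variances."""
    n1, n2 = len(sample_1), len(sample_2)
    # statistics.variance is the sample variance (it divides by n - 1)
    standard_error = sqrt(variance(sample_1) / n1 + variance(sample_2) / n2)
    t = (mean(sample_1) - mean(sample_2)) / standard_error
    df = min(n1 - 1, n2 - 1)  # conservative choice: the smaller of the two
    return t, df


# Hypothetical ages, made up for this example only
men_ages = [34, 29, 41, 38, 30, 36]
women_ages = [31, 27, 35, 33, 28]

t_stat, df = two_sample_t(men_ages, women_ages)
print(f"t = {t_stat:.3f}, df = {df}")
```

You would then compare the t value with the critical value from a t-table at the chosen significance level and the computed df.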

 

Two-sample t-tests can validate a hypothesis involving only two groups at a time. For three or more groups, the t-test becomes tedious because you have to run it for every pair of groups. The chance of making at least one Type I error also increases with each additional test. You use ANOVA in such cases.
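To get a feel for why repeated pairwise t-tests become a problem, the sketch below counts the pairwise comparisons needed for k groups and estimates the chance of at least one Type I error across them. It assumes the tests are independent, which is a simplification, and the helper names are hypothetical.

```python
from math import comb


def pairwise_test_count(num_groups):
    """Number of two-sample t-tests needed to compare every pair of groups."""
    return comb(num_groups, 2)


def familywise_error_rate(alpha, num_tests):
    """Chance of at least one Type I error, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


for k in (3, 4, 5):
    m = pairwise_test_count(k)
    rate = familywise_error_rate(0.05, m)
    print(f"{k} groups: {m} tests, error rate about {rate:.3f}")
```

Even with three groups, the overall error rate under this assumption is already well above the 0.05 you set for each individual test.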

 

Analysis of variance (ANOVA) can determine whether the means of three or more groups are different or not. ANOVA uses F-tests to statistically test the equality of means.

 

To understand how ANOVA is applied, let’s go over a simple case:  

 

A test was conducted in a workplace, and the feedback on the three e-commerce platforms was recorded in a dataset. Following is the dataset:
 

Amazon    Flipkart    Snapdeal
7.5       7           5
8.5       9.5         7.5
6         10          8.5
10        6           3
8.5       7.5         6
8         8.5         5
8         10          7
6         6.5
9.5       6.5
10        9
6.5       10

 

To begin with, create a null hypothesis (H0) for your ANOVA test. In this case, your null hypothesis will be: “All the platforms are equally popular”. The alternate hypothesis (HA), thus, becomes “At least one of the platforms has different popularity from the rest”.

 

Represent this information as —

 

$$H_0: \mu_1 = \mu_2 = \dots = \mu_k$$

where k is the number of different populations, groups, or treatment levels. In your case, k is 3.

 

By writing this, you suggest that the means of the different populations are the same, which is your null hypothesis. If your test finds no significant evidence against this statement, you fail to reject the null hypothesis, and the data are consistent with all the platforms being equally popular. If not, you reject it in favour of your alternate hypothesis (HA).

 

There are a couple of things you should keep a note of while using ANOVA:

  • You might think this is a fairly simple problem: calculate the means of the three groups and compare them to accept or reject the null hypothesis. Unfortunately, it’s not that simple, because your hypothesis is about the mean of a ‘population’, while your dataset only contains a ‘sample’ of that population. For instance, the people who gave feedback for, say, Amazon are not the only ones who have used Amazon; there are many others. The respondents form your sample, so the mean you calculate here is that of the ‘sample’ and not of the ‘population’.

  • Another question that might come to mind is: why is the process called analysis of ‘variance’ when you are comparing ‘means’? This is because the math used later in the process relies on the concept of variance to study the means of the groups. It tells you how much the means vary or differ.

 

Now, as variance is the central idea behind ANOVA, let’s briefly revisit the topic:

  1. Variance is the average squared deviation of the data points from the mean. The distance between the sample mean and each data point is measured and squared. Then, you add these squared distances together and take the average. The formula is:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$

Here, $\sigma^2$ represents the variance, x represents the sample data points, $\bar{x}$ represents the sample mean, and n represents the number of sample points.

  2. If you momentarily ignore the averaging step, what you are left with is the ‘sum of squares’. So, the sum of squares is the sum of squared deviations without the division by n.

Sum of squares is given by:

$$SS = \sum (x - \bar{x})^2$$
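As a small worked example of these two quantities, the sketch below takes the Snapdeal scores from the dataset (5, 7.5, 8.5, 3, 6, 5, 7), computes the sum of squared deviations from the sample mean, and then divides by the number of points to get the variance as described above. The variable names are illustrative.

```python
# Snapdeal scores as read from the dataset in this segment
snapdeal = [5, 7.5, 8.5, 3, 6, 5, 7]

n = len(snapdeal)
sample_mean = sum(snapdeal) / n

# Sum of squares: squared deviation of every point from the mean, added up
sum_of_squares = sum((x - sample_mean) ** 2 for x in snapdeal)

# Variance as described in the text: the average of the squared deviations
variance = sum_of_squares / n

print(f"mean = {sample_mean}, sum of squares = {sum_of_squares}, variance = {variance:.4f}")
```

The only difference between the two numbers is the final division, which is why ANOVA can work with sums of squares first and bring in the averaging later.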

                                                                             

 

Let’s have a look at the dataset again:

 

Amazon    Flipkart    Snapdeal
7.5       7           5
8.5       9.5         7.5
6         10          8.5
10        6           3
8.5       7.5         6
8         8.5         5
8         10          7
6         6.5
9.5       6.5
10        9
6.5       10

 

 

There are two kinds of variation that you have to calculate. The ‘sum of squares between’ (SSB) accounts for the variation between the groups, and the ‘sum of squares within’ (SSW) accounts for the variation within each group. The total sum of squares (TSS) is the sum of these two, and it measures the deviation of each observation from the grand mean of the dataset.

 

To understand this more clearly, let’s look at your case:

  • SSB represents the variation of the mean feedback score of each company, say Flipkart, from the grand mean of all the feedback scores.

  • SSW represents the variation of all the feedback scores of a company from that company’s own mean score.

  • TSS represents the variation of all the feedback scores in your dataset from the grand mean.

 

Let’s look at the basic formula you will be using:

Total sum of squares (TSS) = Sum of squares between (SSB) + Sum of squares within (SSW)

$$TSS = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2$$

$$SSB = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$$

$$SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

(source: https://www.easycalculation.com/formulas/eta-squared-formula.html)

Here, ‘i’ indexes the observations in a group or a treatment level, and ‘j’ refers to a particular group or a treatment level. In your case, ‘i’ runs over all the feedback scores of, say, Amazon, and ‘j’ can be Amazon, Flipkart, or Snapdeal.

$n_j$: It represents the number of observations in group j. In your case, it will be the number of feedback scores received for, say, Snapdeal.

$x_{ij}$: It represents observation i in group j. Taken over all i and j, these are all the observations that have been recorded in the dataset.

$\bar{x}_j$: It represents the mean of a particular group or treatment.

$\bar{x}$: It represents the grand mean of all the observations. In your case, this will be the mean of all the feedback scores that have been collected.


 

Let’s now calculate the aforementioned measures for your data:
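One way to carry out this calculation is sketched below. It applies the definitions directly: SSB weights each group's squared distance from the grand mean by the group size, SSW adds up squared distances from each group's own mean, and TSS adds up squared distances from the grand mean. The dictionary name `ratings` and the other variable names are assumptions made for this example; the scores are those from the dataset above.

```python
# Feedback scores per platform, as read from the dataset
ratings = {
    "Amazon": [7.5, 8.5, 6, 10, 8.5, 8, 8, 6, 9.5, 10, 6.5],
    "Flipkart": [7, 9.5, 10, 6, 7.5, 8.5, 10, 6.5, 6.5, 9, 10],
    "Snapdeal": [5, 7.5, 8.5, 3, 6, 5, 7],
}

all_scores = [x for scores in ratings.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)
group_means = {name: sum(s) / len(s) for name, s in ratings.items()}

# Between groups: group size times squared distance of group mean from grand mean
ssb = sum(len(s) * (group_means[name] - grand_mean) ** 2 for name, s in ratings.items())

# Within groups: squared distance of each score from its own group mean
ssw = sum((x - group_means[name]) ** 2 for name, s in ratings.items() for x in s)

# Total: squared distance of each score from the grand mean
tss = sum((x - grand_mean) ** 2 for x in all_scores)

print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}, TSS = {tss:.4f}")
```

The results should agree with the sums of squares listed in the ANOVA table in this segment, and SSB + SSW should equal TSS up to rounding.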

 

 

After you have calculated this data, the next step is to analyse the ANOVA table:                     

 

 

 

You have already calculated the sum of squares. Now, ‘df’ here represents the degrees of freedom.

  • Between groups, df = number of groups - 1

  • Within a group, df = total number of observations - the number of groups

  • Degrees of freedom for the complete dataset = total number of observations - 1    
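The three rules above can be checked with a few lines of arithmetic. This sketch uses the group sizes from the dataset (11 scores each for Amazon and Flipkart, 7 for Snapdeal); the variable names are illustrative.

```python
group_sizes = {"Amazon": 11, "Flipkart": 11, "Snapdeal": 7}

num_groups = len(group_sizes)
total_observations = sum(group_sizes.values())

df_between = num_groups - 1                 # number of groups - 1
df_within = total_observations - num_groups # observations - number of groups
df_total = total_observations - 1           # observations - 1

print(df_between, df_within, df_total)
```

Note that the between and within degrees of freedom add up to the total, mirroring the way SSB and SSW add up to TSS.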

 

Let’s calculate the degrees of freedom for your observations:

 

Source of Variation    Sum of Squares    DOF    Mean Square    F Ratio
Between                24.41849532       2
Within                 66.40909091       26
Total                  90.82758621       28

 

Mean Square = Sum of squares/df

 

Using this formula, you can find the mean square between the groups as well as within the group.

 

Let’s calculate the mean squares for your data:

 

Source of Variation    Sum of Squares    DOF    Mean Square    F Ratio
Between                24.41849532       2      12.20924765
Within                 66.40909091       26     2.554195804
Total                  90.82758621       28

 

Before moving any further, let’s first see what F-test is.

F-tests are named after the test statistic F, which was named in honor of Sir Ronald Fisher. The F-statistic is simply a ratio of two variances.

To use the F-test to determine whether group means are equal, all you need to do is include the correct variances in the ratio. In one-way ANOVA, the F-statistic is given by this ratio:

F = Variation between the sample means/variation within the samples

   = (MSB/MSW)
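As a quick check of this ratio, the sketch below takes the sums of squares and degrees of freedom from the ANOVA table, divides each sum of squares by its df to get the mean squares, and then forms F = MSB/MSW. The function name `f_ratio` is an assumption for this example.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (MSB, MSW, F) from sums of squares and degrees of freedom."""
    ms_between = ss_between / df_between  # mean square between groups
    ms_within = ss_within / df_within     # mean square within groups
    return ms_between, ms_within, ms_between / ms_within


# Values taken from the ANOVA table in this segment
msb, msw, f_value = f_ratio(24.41849532, 2, 66.40909091, 26)
print(f"MSB = {msb:.4f}, MSW = {msw:.4f}, F = {f_value:.2f}")
# prints: MSB = 12.2092, MSW = 2.5542, F = 4.78
```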

 

Now you have to find the critical F value from the F-distribution table for a given significance level and compare it with your calculated F value. In your case, the significance level is 0.05. The table looks like this:

 

 

 

Where the rows represent 'Degrees of Freedom Denominator' and the columns represent 'Degrees of Freedom Numerator'.

 

The degrees of freedom of the numerator are the df between the groups.

The degrees of freedom of the denominator are the df within the groups.

 

The intersection will give us the critical F value. Now, you compare your calculated F value with the critical F value.

  • If calculated F < critical F, you fail to reject the null hypothesis.

  • If calculated F > critical F, you will reject the null hypothesis.

                         

Let’s now do the final calculations in your case to see whether Amazon, Flipkart and Snapdeal are equally popular:

 

 

Your critical F value is 3.3690 and your calculated F value comes out to be 4.78. Since the calculated F is greater than the critical F, you reject the null hypothesis and accept the alternate hypothesis: Amazon, Flipkart, and Snapdeal are not equally popular.
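If you do not have an F table at hand, the critical value for this particular case can be reproduced with a short calculation. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here: the upper-tail probability of F is (1 + 2F/d)^(-d/2), where d is the denominator df. For other numerator df you would normally use a table or a statistics library. The function names are hypothetical.

```python
def f_upper_tail_df1_2(f_value, df_within):
    """P(F > f_value) for an F distribution with 2 numerator df (special case)."""
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


def f_critical_df1_2(alpha, df_within):
    """F value whose upper-tail probability is alpha, for 2 numerator df."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


alpha = 0.05          # significance level
df_within = 26        # denominator degrees of freedom
f_calculated = 4.78   # calculated F value from this segment

f_critical = f_critical_df1_2(alpha, df_within)
p_value = f_upper_tail_df1_2(f_calculated, df_within)
reject_null = f_calculated > f_critical

print(f"critical F = {f_critical:.4f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Comparing the p-value with the significance level leads to the same decision as comparing the calculated F with the critical F.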

 



 
