
Building a Linear Model - Step by Step Guide


Since 'TV' is very strongly correlated with 'Sales', let's first build a simple linear regression model with 'TV' as the predictor variable.


The first important step before building a model is to perform the test-train split. To split the data, you use the train_test_split function.

 

from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train_lm, X_test_lm, y_train_lm, y_test_lm = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=100)

From now on, you will always use the scikit-learn (sklearn) library to perform a test-train split before fitting a model on any data.
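To see what the split produces, here is a minimal sketch on synthetic data. The arrays X and y below are made-up stand-ins for 'TV' and 'Sales' (an assumption, since the course dataset is not reproduced here); only the train_test_split call mirrors the one above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the course data (assumption: 200 rows, made-up values)
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=200)                 # hypothetical 'TV' values
y = 7.0 + 0.05 * X + rng.normal(0, 2, size=200)   # hypothetical 'Sales' values

# Same call as in the text: 70% of rows for training, 30% for testing
X_train_lm, X_test_lm, y_train_lm, y_test_lm = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100
)

print(X_train_lm.shape, X_test_lm.shape)
```

Because random_state is fixed, running the split again returns exactly the same rows in each part, which keeps your results reproducible.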


After you import statsmodels.api, you can create a simple linear regression model in just a few steps.

 

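import statsmodels.api as sm

# Add a column of ones so that statsmodels also fits an intercept
X_train_sm = sm.add_constant(X_train_lm)

# Set up the OLS model on the training data and fit it
lr = sm.OLS(y_train_lm, X_train_sm)
lr_model = lr.fit()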

Here, OLS stands for Ordinary Least Squares, which is the method that 'statsmodels' uses to fit the line. You use the 'add_constant' command so that statsmodels also fits an intercept. If you don't use it, statsmodels fits a line passing through the origin by default.


Now, let's take another look at the summary statistics output by the model.

 

Summary Statistics


F-statistic

You were introduced to two new terms: F-statistic and Prob (F-statistic). Recall that in the last segment, you ran a hypothesis test on beta to determine whether the coefficient output by the model was significant. The F-statistic is similar, except that instead of testing the significance of each beta individually, it tells you whether the overall model fit is significant. It is examined because, even when all of your betas are significant, the overall model fit might still have occurred just by chance.

 

The heuristic is the same as the one you learnt for an ordinary p-value. If 'Prob (F-statistic)' is less than 0.05, you can conclude that the overall model fit is significant. If it is greater than 0.05, you might need to review your model, as the fit might be by chance, i.e. the line may have just luckily fit the data. In the image above, you can see that the p-value of the F-statistic is 1.52e-52, which is practically zero. Since this is far below 0.05, the model for which it was calculated is significant.

 

This becomes more useful when you study multiple linear regression. There you have many betas, one for each predictor variable, and the F-statistic helps you determine whether all the predictor variables taken together are significant. Simply put, it tells you whether the model fit as a whole is significant.

 

R-squared

As you studied earlier, the R-squared value tells you how much of the variance in the data is explained by the model. In our case, the R-squared is about 0.816, which means that the model explains 81.6% of the variance. That is pretty good.
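As a reminder of where this number comes from, R-squared can be computed by hand as 1 - RSS/TSS, where RSS is the residual sum of squares and TSS is the total sum of squares. The sketch below does this with NumPy on synthetic data (the names tv and sales and their values are assumptions, so the result will not be 0.816).

```python
import numpy as np

# Synthetic training data standing in for 'TV' and 'Sales' (assumption)
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, size=140)
sales = 7.0 + 0.05 * tv + rng.normal(0, 2, size=140)

# Least-squares line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(tv, sales, 1)
predicted = intercept + slope * tv

rss = np.sum((sales - predicted) ** 2)      # variation left unexplained by the line
tss = np.sum((sales - sales.mean()) ** 2)   # total variation in 'sales'
r_squared = 1 - rss / tss

print(round(r_squared, 3))
```

For a simple linear regression with an intercept, this value equals the square of the correlation between the predictor and the target, which is why a strongly correlated predictor gives a high R-squared.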

 

Coefficients and p-values

The p-values of the coefficients (in this case just one coefficient, for TV) tell you whether each coefficient is significant. In this case, the coefficient of TV came out to be 0.0545 with a standard error of about 0.002. The t-value is the coefficient divided by its standard error; computed from the unrounded values, it came out to be 24.722, which led to a practically zero p-value. Hence, you can say that your coefficient is indeed significant.

 

Apart from these, the summary outputs a few more metrics that you do not need right now. You'll learn about some of them in multiple linear regression.

 

Let's see how the model actually looks by plotting it.


You visualised the predicted regression line on the scatter plot of the training data, which is one of the things you should do as part of model evaluation.


Additional Reading

Calculating the F-statistic by hand is a complex task and is not required, so it is outside the scope of this course. Interested students can check out this link.