COURSES
MBAData Science & AnalyticsDoctorate Software & Tech AI | ML MarketingManagement
Professional Certificate Programme in HR Management and AnalyticsPost Graduate Certificate in Product ManagementExecutive Post Graduate Program in Healthcare ManagementExecutive PG Programme in Human Resource ManagementMBA in International Finance (integrated with ACCA, UK)Global Master Certificate in Integrated Supply Chain ManagementAdvanced General Management ProgramManagement EssentialsLeadership and Management in New Age BusinessProduct Management Online Certificate ProgramStrategic Human Resources Leadership Cornell Certificate ProgramHuman Resources Management Certificate Program for Indian ExecutivesGlobal Professional Certificate in Effective Leadership and ManagementCSM® Certification TrainingCSPO® Certification TrainingLeading SAFe® 5.1 Training (SAFe® Agilist Certification)SAFe® 5.1 POPM CertificationSAFe® 5.1 Scrum Master Certification (SSM)Implementing SAFe® 5.1 with SPC CertificationSAFe® 5 Release Train Engineer (RTE) CertificationPMP® Certification TrainingPRINCE2® Foundation and Practitioner Certification
Law
Job Linked
Bootcamps
Study Abroad
MS in Data AnalyticsMS in Project ManagementMS in Information TechnologyMasters Degree in Data Analytics and VisualizationMasters Degree in Artificial IntelligenceMBS in Entrepreneurship and MarketingMSc in Data AnalyticsMS in Data AnalyticsMS in Computer ScienceMaster of Science in Business AnalyticsMaster of Business Administration MS in Data ScienceMS in Information TechnologyMaster of Business AdministrationMS in Applied Data ScienceMaster of Business Administration | STEMMS in Data AnalyticsMaster of Business AdministrationMS in Information Technology and Administrative Management MS in Computer Science Master of Business Administration Master of Business Administration-90 ECTSMSc International Business ManagementMS Data Science Master of Business Administration MSc Business Intelligence and Data ScienceMS Data Analytics MS in Management Information SystemsMSc International Business and ManagementMS Engineering ManagementMS in Machine Learning EngineeringMS in Engineering ManagementMSc Data EngineeringMSc Artificial Intelligence EngineeringMPS in InformaticsMPS in Applied Machine IntelligenceMS in Project ManagementMPS in AnalyticsMS in Project ManagementMS in Organizational LeadershipMPS in Analytics - NEU CanadaMBA with specializationMPS in Informatics - NEU Canada Master in Business AdministrationMS in Digital Marketing and MediaMSc Sustainable Tourism and Event ManagementMSc in Circular Economy and Sustainable InnovationMSc in Impact Finance and Fintech ManagementMS Computer ScienceMBA in Technology, Innovation and EntrepreneurshipMSc Data Science with Work PlacementMSc Global Business Management with Work Placement MBA with Work PlacementMS in Robotics and Autonomous SystemsMS in Civil EngineeringMS in Internet of ThingsMSc International Logistics and Supply Chain ManagementMBA- Business InformaticsMSc International ManagementMBA in Strategic Data Driven ManagementMSc Digital MarketingMBA Business and MarketingMSc in Sustainable Global Supply Chain ManagementMSc Digital Business Analytics MSc in International HospitalityMSc Luxury and Innovation ManagementMaster of Business Administration-International Business ManagementMS in Computer EngineeringMS in Industrial and Systems EngineeringMaster in ManagementMSc MarketingMSc Global Supply Chain ManagementMS in Information Systems and Technology with Business Intelligence and Analytics ConcentrationMSc Corporate FinanceMSc Data Analytics for BusinessMaster of Business AdministrationMaster of Business AdministrationMaster of Business AdministrationMSc in International FinanceMSc in International Management and Global LeadershipMaster of Business AdministrationBachelor of BusinessBachelor of Business AnalyticsBachelor of Information TechnologyMaster of Business AdministrationMBA Business AnalyticsMSc in Marketing Analytics and Data IntelligenceMS Biotechnology Management and EntrepreneurshipMSc in Luxury and Fashion ManagementMaster of Business Administration (90 ECTS)Bachelor of Business Administration (180 ECTS)B.Sc. Computer Science (180 ECTS) MSc in International Corporate Finance MSc in Sustainable Luxury and Creative IndustriesMSc Digital MarketingMSc Global Supply Chain Management (PGMP)MSc Marketing (PGMP)MSc Corporate Finance (PGMP)MSc Data Analytics for Business (PGMP)MS Business AnalyticsMaster of Business AdministrationMS Quantitative FinanceMS Fintech ManagementMS Business Analytics PGMPState University of New York Bachelors Program - STEM MSc Business Intelligence and Data Science (PGMP)MSc International Logistics and Supply Chain Management ( PGMP)MSc International Management (PGMP)MSc Psychology & Management (PGMP)MSc Finance (PGMP)State University of New York Bachelor's Year 1 Program
For College Students

Nuances of Logistic Regression - Variable Transformation-II

$$/$$

So, creating dummy variables is one way of transforming variables. Let’s now move on to another technique commonly used for transforming variables — Weight of evidence (WOE) analysis.

$$/$$

So, to summarise, you learnt three important things in this lecture:

  1. Calculating woe values for fine binning and coarse binning

  2. The importance of woe for fine binning and coarse binning

  3. The usage of woe transformation

 

WOE can be calculated using the following equation:

 

 

Or, it can be expressed as: 

 


Once you've calculated woe values, it is also important to note that they should follow an increasing or decreasing trend across bins. If the trend is not monotonic, then you would need to compress the buckets/ bins (coarse buckets) of that variable and then calculate the WOE values again.

 

As mentioned in the lecture, there are two main advantages of WOE:

  1. WOE reflects group identity: This means it captures the general trend of distribution of good and bad customers. E.g. the difference between customers with 30% credit card utilisation and 45% credit card utilisation is not the same as the difference between customers with 45% credit card utilisation and customers with 60% credit card utilisation. This is captured by transforming the variable credit card utilisation using WOE.
  2. WOE helps you in treating missing values logically for both types of variables — categorical and continuous. E.g. in the credit card case, if you replace the continuous variable credit card utilisation with WOE values, you would replace all categories mentioned above (0%-45%, 45% - 60%, etc.) with certain specific values, and that would include the category "missing" as well, which would also be replaced with a WOE value.

Let’s also understand the positives and negatives of woe transformation from Hindol.

$$/$$

So, basically, the pros and cons of a WOE transformation are similar to dummy variables.

  1. Pros: The model becomes more stable because small changes in the continuous variables will not impact the input so much.
  2. Cons: You may end up doing some score clumping.

 

This is because when you are using WOE values in your model, you are doing something similar to creating dummy variables — you are replacing a range of values with an indicative variable. It is just that, instead of replacing it with a simple 1 or 0, which was not thought out at all, you are replacing it with a well thought out WOE value. Hence, the chances of undesired score clumping will be a lot less here.

 

Let's now move on to IV (Information Value), which is a very important concept.

$$/$$

So, information value can be calculated using the following expression:

 

 

Or it can expressed as:

 

 

It is an important indicator of predictive power.

 

Mainly, it helps you understand how the binning of variables should be done. The binning should be done such that the WOE trend across bins is monotonic — either increasing all the time or decreasing all the time. But one more thing that needs to be taken care of is that IV (infomation value) should be high.

 

Comprehension 1: WOE and Information Value Analysis

 

You are required to download the data set from below for answering the questions that follow:

$$/$$

In the attached file, there are three sheets. The first sheet contains three variables (Tenure, Second Contract and Churn) from the telecom data. The second sheet contains the distribution of the binned tenure variable. The third sheet contains the distribution of goods and bad information of the contract variable.