In the last session, you learnt how to perform segmented univariate analysis, for example, how a student’s gender or father’s education affects their percentage in science, maths and reading. But you can use data for several other purposes as well. Often, you can get interesting insights by analysing pairs of continuous variables at a time, for example, how sales figures depend on marketing spend or, for that matter, how any two continuous variables depend on each other. But is there a way to identify and quantify the relationship between two variables?
Let’s start with bivariate analysis of pairs of continuous variables to answer this question.
To summarise, correlation is a number between -1 and 1, which quantifies the extent to which two variables ‘correlate’ with each other.
If one variable increases as the other increases, the correlation is positive.
If one variable decreases as the other increases, the correlation is negative.
If one variable shows no consistent tendency to increase or decrease as the other varies, the correlation is zero.
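The three cases above can be checked on small samples. The following is a minimal sketch, assuming that 'correlation' here means the Pearson correlation coefficient (the text does not name a specific coefficient). The function name `pearson_correlation` and all the data values are made up for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its variable's mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Numerator: how the two variables vary together
    co_variation = sum(a * b for a, b in zip(dx, dy))
    # Denominator: how much each variable varies on its own
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        # Pearson's r cannot be computed when a variable has no variation
        raise ValueError("correlation is undefined when a variable does not vary")
    return co_variation / spread


# Illustrative, made-up data (not taken from any real data set)
marketing_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 105]        # tends to rise as spend rises
price = [5, 6, 7, 8, 9]
demand = [98, 85, 71, 64, 50]        # tends to fall as price rises
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 1, 2]                  # no consistent trend with x

r_positive = pearson_correlation(marketing_spend, sales)
r_negative = pearson_correlation(price, demand)
r_zero = pearson_correlation(x, y)

print(f"spend vs sales:  {r_positive:.3f}")   # positive, close to 1
print(f"price vs demand: {r_negative:.3f}")   # negative, close to -1
print(f"x vs y:          {r_zero:.3f}")       # approximately 0
```

The sign of the result tells you the direction of the relationship, and the magnitude tells you how closely the points follow a straight line.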
In general, a positive correlation means that two variables tend to increase together and decrease together. For example, an increase in rain is accompanied by an increase in humidity. A negative correlation means that as one variable increases, the other tends to decrease. For example, in some cases, as the price of a commodity decreases, its demand increases.
A perfect positive correlation means that the correlation coefficient is exactly 1. This implies that as one variable moves up or down, the other moves in the same direction in a fixed proportion. Similarly, a perfect negative correlation (a coefficient of exactly -1) means that the two variables move in opposite directions in a fixed proportion, while a zero correlation implies that there is no linear relationship between them.
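To see why a 'fixed proportion' gives a coefficient of exactly 1 or -1, here is a short worked derivation. It assumes the Pearson correlation coefficient r, which the text does not name explicitly; the symbols a and b are introduced here only for illustration, with a standing for the fixed proportion.

```latex
r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}

% Suppose y moves in a fixed proportion to x: y_i = a x_i + b with a \neq 0.
% Then \bar{y} = a\bar{x} + b, so y_i - \bar{y} = a\,(x_i - \bar{x}), and

r = \frac{a \sum_{i}(x_i - \bar{x})^2}
         {\sqrt{\sum_{i}(x_i - \bar{x})^2}\;|a|\,\sqrt{\sum_{i}(x_i - \bar{x})^2}}
  = \frac{a}{|a|}
  = \begin{cases} +1 & \text{if } a > 0 \\ -1 & \text{if } a < 0 \end{cases}
```

So the size of the proportion does not matter, only its sign: any exact straight-line relationship with a positive slope gives r = 1, and any with a negative slope gives r = -1.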
So, now you have an idea of how correlation helps you derive insights from pairs of continuous variables. Correlation is widely used in industry, and organisations often face challenges in calculating and representing correlations for a large number of variables at a time. This is what you will learn in the next lecture: how industries solve business problems using correlation analysis.