In the previous session, you learnt how the different types of data visualisation tools can help you to draw specific insights from data, such as the response to a marketing campaign.
Now, let us begin the segment with a situation for you to imagine: You have to audit two retail stores for cell phone sales over the past 80 days. You want to compare the performance of the two stores in terms of the overall units sold. The histograms below plot the number of phones sold by two stores in a city.
Do you think that the information presented by the histograms is enough? No, data visualisation alone would not suffice in this situation.
Would it not be easier to compare the sales performance of the two stores if you simply had to compare two numbers? In statistical terms, this is referred to as measuring and comparing a measure of central tendency for each sample.
Now, in the forthcoming video, you will listen to Thomas as he shares his views on the measures of central tendency.
So, in the video, you learnt from Thomas the importance of using measures of central tendency to represent a set of data. Further, you learnt how the mean, median and mode are computed.
You also learnt that depending upon the type of sample, you can choose one of the following as a measure of central tendency:
Mean
Mean is calculated by dividing the sum of all the data values by the number of values in the sample.
It is commonly represented by the symbol μ for a population and x̄ for a sample.
Median
Median is the value in the middle when you arrange the sample data in ascending order of value, from left to right.
For an odd number of values, the median is the single middle value.
For an even number of values, the median is the average of the two central values.
Mode
In a data set, the value with the highest frequency is the mode.
For qualitative data, it is not possible to compute the mean or median, as there are no numerical values.
Thus, the category with the highest frequency, that is, the mode, is considered the measure of central tendency in such cases.
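The three definitions above can be tried out in a few lines of Python. This is only a minimal sketch using the standard statistics module; the sample values and the list of colours are hypothetical and were chosen purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical sample values, chosen only for illustration.
odd_sample = [7, 3, 9, 5, 1]         # five values
even_sample = [7, 3, 9, 5, 1, 11]    # six values

# Mean: sum of the values divided by the number of values.
mean_odd = mean(odd_sample)          # (7 + 3 + 9 + 5 + 1) / 5

# Median, odd count: the single middle value of 1, 3, 5, 7, 9.
median_odd = median(odd_sample)

# Median, even count: average of the two central values of 1, 3, 5, 7, 9, 11.
median_even = median(even_sample)

# Mode of qualitative data: the category that occurs most often.
colours = ["red", "blue", "red", "green", "red", "blue"]
mode_colour = mode(colours)

print("Mean (odd sample):", mean_odd)
print("Median (odd sample):", median_odd)
print("Median (even sample):", median_even)
print("Mode (colours):", mode_colour)
```

Note that the mean and median are not defined for the colours list, which is why the mode is the only usable measure there.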
Now, in the upcoming video, Thomas will show how to calculate the mean and median of the delivery times of two delivery partners of a kids' apparel company.
The data set can be downloaded from the link below. Please save this data set, as it will be used in future segments of this course.
So, from the example in the video, you learnt how you can calculate the mean and median measures using Excel:
Mean can be calculated using the Excel formula =AVERAGE(A1:A20) if the data is in cells A1:A20 of the Excel worksheet.
Median can be calculated using the Excel formula =MEDIAN(A1:A20) if the data is in cells A1:A20 of the Excel worksheet.
Although it was not covered in the video, it is worth knowing that the mean is used in applications where it is acceptable for very high or very low values to affect the result, especially if you want to use that value as a benchmark. An example is the average number of minutes taken to commute to work.
The median is used when you want a clearer idea of the 'middle group' in a data set, for example, the median salary offered to graduates for a certain position.
The mode is used when you are looking for the value that occurs most often in a data set. For example, if 5 kids have 2, 2, 2, 3 and 4 candies respectively, the mode of this data set is 2.
Now, let us assume that you know the mean value of a data set. Can you then say that the data points in that data set are near the mean value? This may or may not be true. For example, let us consider two simple groups of data:
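As a quick check of the candies example, the sketch below counts how often each value occurs and then computes all three measures for the same data. It uses only Python's standard library; the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, median, mode

# Candies held by the 5 kids in the example above.
candies = [2, 2, 2, 3, 4]

# Frequency of each value: the mode is the value with the highest count.
frequencies = Counter(candies)

candy_mode = mode(candies)
candy_median = median(candies)   # middle value of 2, 2, 2, 3, 4
candy_mean = mean(candies)       # 13 / 5

print("Frequencies:", dict(frequencies))
print("Mode:", candy_mode)
print("Median:", candy_median)
print("Mean:", candy_mean)
```

Here, the mode and the median are both 2, while the mean is pulled slightly higher by the kids who have 3 and 4 candies.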
Group 1: 5, 30, 20, 5
Group 2: 15, 15, 15, 15
The mean of Group 1 is 15, and the mean of Group 2 is 15 as well. Although the mean is the same for both groups, you can see from the data that the individual values in Group 1 lie far from the mean, whereas every value in Group 2 is equal to the mean.
The measures of dispersion become important in such cases, where two data sets have the same mean but their data values are spread differently around it.
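The comparison of Group 1 and Group 2 can be verified with a short Python sketch. It computes the mean of each group and then the difference between each data point and that mean; the variable names are illustrative.

```python
from statistics import mean

group_1 = [5, 30, 20, 5]
group_2 = [15, 15, 15, 15]

mean_1 = mean(group_1)   # 60 / 4
mean_2 = mean(group_2)   # 60 / 4

# Difference between each data point and the mean of its group.
deviations_1 = [x - mean_1 for x in group_1]
deviations_2 = [x - mean_2 for x in group_2]

print("Group 1 mean:", mean_1, "deviations:", deviations_1)
print("Group 2 mean:", mean_2, "deviations:", deviations_2)
```

Both means are 15, but the deviations for Group 1 range from -10 to 15, while those for Group 2 are all zero.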
When outliers are concentrated at one end of a data set, its mean and median can differ considerably. However, if the outliers are spread evenly across both ends of the data set, the mean and median may not differ much.
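The effect of outliers on the mean and median can be illustrated with a small Python sketch. All the values below are hypothetical and were chosen only to make the contrast easy to see.

```python
from statistics import mean, median

# Hypothetical values, chosen only to illustrate the point.
base = [48, 49, 50, 51, 52]

# Outliers concentrated at the high end only.
one_sided = base + [150, 200]

# Outliers placed evenly on both sides of the base values.
two_sided = base + [0, 100]

for label, data in [("One-sided outliers", one_sided),
                    ("Two-sided outliers", two_sided)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

With the one-sided outliers, the mean rises to about 85.71 while the median stays at 51. With the evenly spread outliers, the mean and median are both 50.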
The measures of dispersion help with measuring the dispersion or the scatter of the data points from the central tendency value. You will learn more about the measures of dispersion in the next segment.