Welcome to the first session on inferential statistics. This will be a highly interactive session, with many questions designed to make you think through each concept and explore it actively.
So, let’s get started.
Recall the original question: in the long run (i.e., if the game is played many times), is this game profitable for the players or for the house? Or will everybody break even in the long run?
Recall that we established a three-step process for answering this question:
1. Find all the possible combinations.
2. Find the probability of each combination.
3. Use the probabilities to estimate the profit/loss per player.
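The three steps can be sketched in code. The game from the lesson is not described in this section, so the sketch below uses a made-up stand-in: a player pays a hypothetical entry fee of ₹10, tosses two fair coins, and receives a hypothetical prize that depends on the number of heads. The fee, the prizes and all variable names are assumptions for illustration only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical game, for illustration only: a player pays an entry fee,
# tosses two fair coins and receives a prize that depends on the number of heads.
ENTRY_FEE = 10                          # hypothetical fee (in ₹)
PRIZE_BY_HEADS = {0: 0, 1: 10, 2: 25}   # hypothetical prizes (in ₹)

# Step 1: find all the possible combinations (each coin shows H or T).
combinations = list(product("HT", repeat=2))

# Step 2: find the probability of each combination.
# With fair coins, all combinations are equally likely.
probability = {combo: Fraction(1, len(combinations)) for combo in combinations}

# Step 3: estimate the profit/loss per player as the
# probability-weighted average of (prize - entry fee).
expected_profit = sum(
    p * (PRIZE_BY_HEADS[combo.count("H")] - ENTRY_FEE)
    for combo, p in probability.items()
)

print(f"Expected profit per player: {float(expected_profit):.2f}")  # 1.25
```

For this made-up game the result is positive (₹1.25 per game on average), so it would favour the players in the long run; a negative value would favour the house, and zero would mean everybody breaks even. `Fraction` keeps the probabilities exact instead of accumulating floating-point rounding.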
We have completed step 1, which involves finding all the possible combinations. Now, let’s proceed to step 2, which involves finding the probability of each combination. What are the steps involved in finding the probability? Let’s hear more from Professor Tricha on this.
So, a random variable X assigns a numerical value to each outcome of an experiment, converting the outcomes into measurable quantities.
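Formally, a random variable is a function from the set of possible outcomes (the sample space, written Ω) to the real numbers. As an illustrative example that is not part of the original lesson, toss two coins and let X be the number of heads:

```latex
\begin{aligned}
&X : \Omega \to \mathbb{R}, \qquad \Omega = \{HH, HT, TH, TT\} \\
&X(HH) = 2, \quad X(HT) = X(TH) = 1, \quad X(TT) = 0 \\
&P(X = 1) = P(HT) + P(TH) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2} \quad \text{(fair coins)}
\end{aligned}
```

Once outcomes are numbers, the probability of each value of X is the total probability of all the outcomes that map to that value.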
For example, suppose you are a data analyst at a bank, trying to predict which customers will default on their loans, i.e., stop repaying them. Based on some data, you have made the following predictions:
Customer Number | Yearly Income (in ₹) | Amount of Loan Due (in ₹) | Number of Dependants | Default Prediction (Yes/No) |
1 | 10 lakh | 75 lakh | 3 | Yes |
2 | 15 lakh | 50 lakh | 2 | No |
3 | 20 lakh | 40 lakh | 1 | No |
Instead of processing the yes/no responses, it is much easier to define a random variable X that indicates whether or not a customer is predicted to default. Its values are assigned according to the following rule:
X = 1, if the customer is predicted to default;
X = 0, if the customer is predicted not to default.
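A minimal sketch of applying this rule to the predictions in the table. The helper name `to_random_variable` is hypothetical; the mapping itself is exactly the rule stated above.

```python
# Default predictions from the table above: customer number -> "Yes"/"No".
predictions = {1: "Yes", 2: "No", 3: "No"}

def to_random_variable(prediction: str) -> int:
    """Hypothetical helper: encode a prediction as X = 1 ("Yes") or X = 0 ("No")."""
    encoding = {"Yes": 1, "No": 0}
    return encoding[prediction]  # raises KeyError for any other label

X = {customer: to_random_variable(label) for customer, label in predictions.items()}
print(X)  # {1: 1, 2: 0, 3: 0}
```

Raising an error on an unexpected label (for example, a blank cell) is a deliberate choice here: it is safer than silently assigning 0 or 1 to data the rule does not cover.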
With this encoding, the table becomes:
Customer Number | Yearly Income (in ₹) | Amount of Loan Due (in ₹) | Number of Dependants | X (Random Variable) |
1 | 10 lakh | 75 lakh | 3 | 1 |
2 | 15 lakh | 50 lakh | 2 | 0 |
3 | 20 lakh | 40 lakh | 1 | 0 |
In this form, the table is entirely quantified, i.e., every entry has been converted to a number. Once the data is expressed in quantitative terms, many different kinds of statistical analysis can be performed on it.
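As a small illustration, the sketch below runs two simple descriptive analyses on the quantified table: the mean of X, which for a 0/1 variable equals the share of customers predicted to default, and the average loan-to-income ratio for each value of X. The choice of analyses and the variable names are illustrative assumptions; with only three customers, these numbers describe the table and say nothing reliable about defaults in general.

```python
from statistics import mean

# The quantified table above; income and loan amounts are in lakh (₹).
# Columns: (customer number, yearly income, loan due, dependants, X)
customers = [
    (1, 10, 75, 3, 1),
    (2, 15, 50, 2, 0),
    (3, 20, 40, 1, 0),
]

# Mean of a 0/1 variable = proportion of 1s = share predicted to default.
default_share = mean([row[4] for row in customers])

# Loan-to-income ratio, split by the value of X.
ratio_if_default = [loan / income for _, income, loan, _, x in customers if x == 1]
ratio_if_no_default = [loan / income for _, income, loan, _, x in customers if x == 0]

print(f"Share predicted to default: {default_share:.2f}")                     # 0.33
print(f"Mean loan-to-income ratio when X = 1: {mean(ratio_if_default):.2f}")     # 7.50
print(f"Mean loan-to-income ratio when X = 0: {mean(ratio_if_no_default):.2f}")  # 2.67
```

Neither calculation would be possible directly on the original "Yes"/"No" column; encoding it as X is what lets ordinary arithmetic such as averaging apply.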