In the previous segment, you learnt about the five simple patterns into which you can classify any insight. Now you’ll learn the methods through which you can actually generate those insights.
As explained by Anand above, the five patterns are sufficient for categorising insights. However, your dataset may not be in a format to which you can apply the patterns directly. Therefore, you also need to learn a few techniques to analyse your data so that you can apply the five patterns to it and start generating insights. These analyses can be categorised into two types: exploratory data analysis and hypothesis-driven analysis. Furthermore, these analysis techniques use two methods of data manipulation to extract insights: creating new columns and reducing the number of rows.
Let’s start with the first method through which you can analyse the data to extract insights: deriving new columns.
For example, if you have a dataset where only the revenue and cost information is available, you can create a separate column that calculates the profit and then apply the five patterns to it. As discussed in the video, some of the most common ways of adding new columns to the data are:
These methods are not exhaustive, as a variety of analysis procedures can be used to derive new columns. But they more or less cover the standard way in which you should proceed once you have the data and want to check for new insights.
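The revenue-and-cost example mentioned earlier can be sketched in a few lines of Python. This is only a minimal illustration: the sample rows, the column names and the `add_profit` helper are all assumptions made up for this sketch, not part of the course material.

```python
# Hypothetical sample data: column names and figures are made up for illustration.
rows = [
    {"product": "A", "revenue": 1200.0, "cost": 900.0},
    {"product": "B", "revenue": 800.0, "cost": 950.0},
    {"product": "C", "revenue": 1500.0, "cost": 1000.0},
]


def add_profit(records):
    """Return new records, each with a derived 'profit' column (revenue - cost)."""
    return [{**r, "profit": r["revenue"] - r["cost"]} for r in records]


with_profit = add_profit(rows)

# With the derived column in place, a simple check becomes possible,
# for example listing the products that lose money.
loss_makers = [r["product"] for r in with_profit if r["profit"] < 0]
print(loss_makers)
```

The original rows are left untouched and the derived column lives in a new list, so you can always compare the raw data with the enriched version.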
Another nifty way of creating new columns, and one that is heavily used right now, is machine learning. Watch the following video to understand the different ways in which you can derive new information from a given dataset.
As explained above, the various machine learning techniques that you can use to create new columns are as follows.
(Note: You will learn some of these techniques later in the course.)
To summarise, there are broadly two techniques through which you can create new columns: by performing calculations and through models, either statistical or machine learning. Statistical models often take a sample of the original data and infer from it the behaviour of the entire population, whereas in machine learning you run algorithms on a predefined set of data called "train data" to build the model and then run the model on another set of data called "test data" to test its accuracy and precision. If some of the terms in the previous sentence sound like jargon to you, don't worry. You'll learn these concepts in detail in the next two courses. For the time being, a cursory understanding of the difference is sufficient.
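To make the train data and test data idea a little more concrete, here is a rough sketch that uses only the Python standard library. Everything in it is an assumption chosen for illustration: the generated data, the 75/25 split, the straight-line model fitted by least squares, and mean absolute error as the measure of how well the model does. Real machine learning work would normally rely on a dedicated library and on techniques covered later.

```python
import random

# Hypothetical data: (x, y) pairs generated for illustration only.
rng = random.Random(0)
data = [(x, 3.0 * x + 5.0 + rng.uniform(-1.0, 1.0)) for x in range(1, 21)]

# Shuffle, then keep 75% of the rows as train data and hold out the rest as test data.
rng.shuffle(data)
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]


def fit_line(points):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Build the model on the train data only.
slope, intercept = fit_line(train)

# The model's prediction becomes a new derived column on the held-out rows.
test_rows = [
    {"x": x, "actual": y, "predicted": slope * x + intercept} for x, y in test
]

# Mean absolute error on the test data: how far off the new column is on average.
mae = sum(abs(r["actual"] - r["predicted"]) for r in test_rows) / len(test_rows)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={mae:.2f}")
```

The point of the sketch is the workflow rather than the model: the "predicted" column is derived from a model built on one part of the data and checked against another part that the model never saw.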
After deriving these new columns, you can go ahead and apply the five patterns that you learnt earlier to generate insights. In the next segment, you’ll learn about the other way of analysing the given data: summarising the rows.