Features are the distinctive attributes of a data point. For example, suppose you have a dataset containing thousands of movies, where each movie is represented by its genre, actors, director, producer, etc. These attributes — genre, actors, director, producer — are the features of the dataset.
In the upcoming video, Ujjyaini will share the importance of features.
Why is it important to engineer features for your dataset? Sometimes genre, actors, director and producer are not enough to represent a movie. You have to create more features to extract the relevant information.
Consider the case of an e-commerce website. If you have in-time and out-time as features, the time spent on the platform can be calculated by subtracting one from the other. The time spent is now an engineered feature — a feature created from existing features.
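The subtraction above can be sketched in pandas. This is a minimal illustration with hypothetical column names (`in_time`, `out_time`); the actual schema of an e-commerce dataset would differ.

```python
import pandas as pd

# Hypothetical session data with in-time and out-time columns
df = pd.DataFrame({
    "in_time": pd.to_datetime(["2024-01-01 10:00", "2024-01-01 11:30"]),
    "out_time": pd.to_datetime(["2024-01-01 10:45", "2024-01-01 12:00"]),
})

# Engineered feature: time spent on the platform, in minutes,
# derived by subtracting the two existing features
df["time_spent_min"] = (df["out_time"] - df["in_time"]).dt.total_seconds() / 60
print(df["time_spent_min"].tolist())
```

Here the new column `time_spent_min` did not exist in the raw data; it was created entirely from the two existing features.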
Data cleaning and feature engineering both fall under the broad category of data preparation. Once the data is prepared, you perform univariate analysis on it — the analysis of a single variable. Suppose you want to analyse iPhone sales in India over the last two years. You would look for patterns, check for deviations from the expected curve, and make assumptions.
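A univariate analysis of a single sales series might look like the sketch below. The numbers are invented monthly figures for illustration only; the idea is to describe one variable on its own and flag months that deviate strongly from its typical level.

```python
import pandas as pd

# Hypothetical monthly iPhone sales (thousands of units) over two years,
# with occasional festive-season spikes
sales = pd.Series([52, 48, 55, 60, 58, 95, 61, 57, 54, 59, 62, 110,
                   56, 53, 58, 64, 61, 102, 65, 60, 57, 63, 66, 118])

# Univariate analysis: summary statistics of the single variable
print(sales.describe())

# Flag months that deviate strongly from the expected level,
# using a simple z-score rule as one possible criterion
z = (sales - sales.mean()) / sales.std()
print(sales[z.abs() > 2].index.tolist())
```

The `describe()` output gives the centre and spread of the series, and the z-score check surfaces the months worth investigating further.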
However, in any real dataset, bivariate analysis is preferred, as the features generally have a fair bit of correlation. Consider the two variables (features) discount and iPhone sales. You may want to understand if and how they are related, i.e. if you offer a bigger discount, do iPhone sales increase? Another question could be how Samsung sales relate to iPhone sales: do they move in the same direction or in opposite directions? There are a lot of interesting questions you can ask of any dataset and answer using bivariate analysis.
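The discount-versus-sales question can be sketched with a simple correlation check. The weekly figures below are hypothetical; a correlation coefficient close to +1 would suggest the two variables move together, though correlation alone does not establish that the discount causes the extra sales.

```python
import pandas as pd

# Hypothetical weekly data: discount offered (%) and iPhone units sold
df = pd.DataFrame({
    "discount":     [0, 5, 10, 15, 20, 25],
    "iphone_sales": [40, 46, 55, 61, 70, 78],
})

# Bivariate analysis: Pearson correlation between the two features
corr = df["discount"].corr(df["iphone_sales"])
print(round(corr, 3))
```

The same pattern — putting two variables side by side and measuring how they co-move — applies to the Samsung-versus-iPhone question as well.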
Here, you can read more about Feature Engineering and Data Preparation.