In the previous segment, you learnt how to standardise values. Standardising values, however, does not check whether the values themselves are valid. That is what we will discuss now, as you learn how to fix invalid values.
A data set can contain invalid values in various forms. Some of the values could be truly invalid, e.g., a string 'tr8ml' in a column containing mobile numbers would make no sense and, hence, should be removed. Similarly, a height of 11 feet would be an invalid value in a set containing heights of children.
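A range check like the one implied by the height example can be sketched in plain Python. The upper limit of 7 feet, the sample list, and the names used here are assumptions for illustration only; pick limits that suit your own data.

```python
# Assumed upper limit for a child's height, in feet (illustrative only).
MAX_CHILD_HEIGHT_FT = 7.0


def is_valid_height(height_ft):
    """Return True if the height is positive and within the assumed limit."""
    return 0 < height_ft <= MAX_CHILD_HEIGHT_FT


# Hypothetical sample of children's heights in feet.
heights_ft = [3.5, 4.2, 11.0, 4.8]

# Collect the values that fail the range check so they can be reviewed.
invalid_heights = [h for h in heights_ft if not is_valid_height(h)]
print(invalid_heights)
```

Flagging the values first, rather than deleting them straight away, lets you inspect what is being treated as invalid before you act on it.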
On the other hand, some invalid values can be corrected. For example, a numeric value stored as a string can be converted back to a numeric type.
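As a minimal sketch of this kind of correction, the snippet below converts numeric strings to floats. The helper name `to_number` and the sample values are hypothetical; values that cannot be parsed are returned as `None` instead of raising an error.

```python
def to_number(value):
    """Return the value as a float if it can be parsed, otherwise None."""
    if isinstance(value, (int, float)):
        return float(value)
    try:
        # strip() removes stray spaces around the digits before parsing
        return float(str(value).strip())
    except ValueError:
        return None


# Hypothetical column in which some numbers were stored as strings.
raw_heights = ["4.5", " 3.9 ", 4.2, "n/a"]

heights = [to_number(v) for v in raw_heights]
print(heights)
```

Returning `None` for unparseable entries is one possible design choice: it keeps the list the same length as the original column, so each cleaned value still lines up with its row.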
Let's gain more insights into fixing invalid values.
If you find invalid values and do not know what accurate values could replace them, it is recommended that you treat them as missing. For example, in the case of a string 'tr8ml' in a Contact column, you should remove the invalid value and treat it as a missing value.
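The Contact example can be sketched as follows. The rule that a valid mobile number is exactly 10 digits is an assumption made for illustration, as are the helper name `clean_contact` and the sample data; `None` stands in for a missing value.

```python
import re

# Assumption: a valid mobile number consists of exactly 10 digits.
MOBILE_PATTERN = re.compile(r"\d{10}")


def clean_contact(value):
    """Return the trimmed number if it matches the pattern, otherwise None."""
    if isinstance(value, str) and MOBILE_PATTERN.fullmatch(value.strip()):
        return value.strip()
    return None  # treat anything else as a missing value


# Hypothetical Contact column containing one invalid entry.
contacts = ["9876543210", "tr8ml", " 9123456780 "]

cleaned_contacts = [clean_contact(c) for c in contacts]
print(cleaned_contacts)
```

Once invalid entries are marked as missing, they can be handled with whatever missing-value strategy you apply to the rest of the data set.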
Let’s summarise what you learnt about fixing invalid values. You could use this as a checklist for future data cleaning exercises.
In the next lecture, you will learn how to filter data for the ease of analysis.