Data cleaning plays a significant role in machine learning. It is the process of removing or modifying unwanted data in a dataset, such as redundant records, incomplete information, and irrelevant data. Data of this kind can hurt the performance of a machine learning algorithm, so it is better to clean it first using one or more of several techniques.
Data cleaning is not just detecting bad data and removing it from the dataset; it also means correcting bad data to get better results. Correcting data is itself a very challenging task. The main goal of data cleaning is to make the data standardized and consistent so that a machine learning algorithm can analyze it easily.
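To make these categories concrete, here is a minimal sketch using pandas. It assumes a tiny made-up table (not the Kaggle data), and the column names `house_id`, `area_sqft`, `city`, and `notes` as well as the `clean` helper are hypothetical, chosen only for illustration. It shows one possible fix for each kind of problem: dropping an irrelevant column, standardizing inconsistent text, removing redundant rows, and correcting an incomplete value instead of deleting the row.

```python
import pandas as pd

# Toy data invented for illustration only (not the Kaggle dataset).
raw = pd.DataFrame({
    "house_id": [1, 2, 2, 3, 4],
    "area_sqft": [850.0, 1200.0, 1200.0, None, 950.0],
    "city": ["Pune", "pune ", "pune ", "Mumbai", "MUMBAI"],
    "notes": ["n/a", "n/a", "n/a", "n/a", "n/a"],
})


def clean(df):
    """Hypothetical helper: one simple fix per kind of bad data."""
    out = df.copy()

    # Irrelevant data: drop columns that hold a single constant value.
    constant_cols = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    out = out.drop(columns=constant_cols)

    # Inconsistent data: standardize spacing and capitalization of text.
    out["city"] = out["city"].str.strip().str.title()

    # Redundant data: remove exact duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)

    # Incomplete data: correct rather than delete, here by filling
    # the missing area with the column median (one choice among many).
    out["area_sqft"] = out["area_sqft"].fillna(out["area_sqft"].median())
    return out


cleaned = clean(raw)
print(cleaned)
```

Filling with the median is only one way to correct a missing value. Whether to fill, drop, or flag such rows depends on the dataset and the model.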
Here, I use the dataset from Kaggle's House Prices: Advanced Regression Techniques competition for illustration.
Link: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview
My GitHub link: https://github.com/bhavikapanara/Data-Cleaning