What is Data Cleaning?

Data Cleaning plays a significant role in machine learning. It is the process of removing or modifying unwanted data, such as redundant records, incomplete information, and irrelevant data, from a dataset. This kind of data harms a machine learning algorithm, so it is better to clean it first using one of several techniques.
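As a rough illustration of the three kinds of unwanted data mentioned above, here is a minimal pandas sketch. The table, its column names (`house_id`, `lot_area`, `sale_price`, `internal_note`), and the helper `remove_unwanted` are all made up for this example and are not taken from the Kaggle dataset. Dropping incomplete rows is only one possible choice; in practice you may prefer to fill them in.

```python
import pandas as pd

# Toy data invented for illustration; column names are hypothetical.
raw = pd.DataFrame({
    "house_id": [1, 2, 2, 3, 4],
    "lot_area": [8450.0, 9600.0, 9600.0, None, 11250.0],
    "sale_price": [208500, 181500, 181500, 140000, 223500],
    "internal_note": ["a", "b", "b", "c", "d"],
})


def remove_unwanted(df):
    """Drop an irrelevant column, duplicate rows, and incomplete rows."""
    out = df.drop(columns=["internal_note"])  # irrelevant data
    out = out.drop_duplicates()               # redundant data
    out = out.dropna(subset=["lot_area"])     # incomplete information
    return out.reset_index(drop=True)


cleaned = remove_unwanted(raw)
print(cleaned)
```

After this step the toy table keeps only the rows that are unique and complete, and the column that carries no useful signal is gone.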

Data Cleaning is not just about detecting bad data and removing it from the dataset; it also involves correcting bad data to get better results. Correcting data is itself a very challenging task. The main goal of Data Cleaning is to make data standardized and consistent so that a machine learning algorithm can analyze it easily.
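To make the idea of correcting data rather than deleting it more concrete, here is a small pandas sketch. The sample values, the column names (`neighborhood`, `lot_area`), and the helper `correct_values` are assumptions made for this example. Filling unreadable numbers with the column median is one common option that I picked for illustration, not the only valid one.

```python
import pandas as pd

# Toy data invented for illustration; column names are hypothetical.
raw = pd.DataFrame({
    "neighborhood": [" Downtown", "downtown", "UPTOWN ", "Uptown"],
    "lot_area": ["8450", "9600", "n/a", "11250"],
})


def correct_values(df):
    """Standardize text labels and repair a numeric column stored as text."""
    out = df.copy()
    # Make category labels consistent: trim whitespace, use lower case.
    out["neighborhood"] = out["neighborhood"].str.strip().str.lower()
    # Convert text to numbers; unparseable entries such as "n/a" become NaN.
    out["lot_area"] = pd.to_numeric(out["lot_area"], errors="coerce")
    # Assumption: fill the resulting gaps with the column median.
    out["lot_area"] = out["lot_area"].fillna(out["lot_area"].median())
    return out


fixed = correct_values(raw)
print(fixed)
```

Here no row is thrown away: the four spellings collapse into two consistent labels, and the unreadable area value is replaced instead of removed.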

Here, I have used Kaggle’s competition dataset for illustration.

Link: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview

My Github Link: https://github.com/bhavikapanara/Data-Cleaning

