Pandas – How to remove DataFrame columns with all distinct values?

Let’s create a Pandas DataFrame in which one column, Student_Id, holds a different value in every row.

import pandas as pd
import numpy as np

data = {'Student_Id':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Name':['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark', 'Michael', 'Johnson', 'Kevin'],
        'Age':[27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
        'Location':['USA', 'UK', np.nan, 'France', 'Germany', 'USA', 'Germany', np.nan, 'USA', 'Italy']}
df = pd.DataFrame(data)
df.head(10)

Output:

   Student_Id     Name   Age Location
0           1     Mark  27.0      USA
1           2     Juli  31.0       UK
2           3    Alexa  45.0      NaN
3           4    Kevin   NaN   France
4           5     John  34.0  Germany
5           6    Devid  48.0      USA
6           7     Mark   NaN  Germany
7           8  Michael  31.0      NaN
8           9  Johnson   NaN      USA
9          10    Kevin  27.0    Italy

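Here, the Student_Id column contains only distinct values: every row has a different value. Such a feature is not useful for predicting a target variable because it carries no pattern that generalizes beyond the individual rows. Hence, it is better to remove this kind of feature. The function below lists every column whose number of distinct values equals the number of rows.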

# Return the names of the columns in which every row holds a different value.
# Note: nunique() ignores NaN by default, so a column with missing values is never flagged.
def remove_distinct_value_features(df):
    return [e for e in df.columns if df[e].nunique() == df.shape[0]]

drop_col = remove_distinct_value_features(df)
drop_col

Output:

['Student_Id']
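
Because nunique() skips NaN by default, the check above never flags a column that has missing values, even when all of its non-missing values differ. As a minimal sketch, assuming you would also want to catch such columns, the hypothetical helper below compares the distinct count with the number of non-missing values instead. The sample frame and the name distinct_non_null_columns are illustrative assumptions and not part of the example above.

```python
import numpy as np
import pandas as pd


def distinct_non_null_columns(frame):
    """Hypothetical helper: columns whose non-missing values are all different."""
    return [
        col for col in frame.columns
        # count() is the number of non-NaN entries; nunique() ignores NaN by default
        if frame[col].count() > 0 and frame[col].nunique() == frame[col].count()
    ]


# Illustrative data (an assumption, not the article's DataFrame)
sample = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Code': ['a', 'b', np.nan, 'd'],   # unique apart from one missing value
    'City': ['X', 'Y', 'X', np.nan],   # contains a repeated value
})

# Strict check, as in the function above: distinct count must equal the row count
strict = [c for c in sample.columns if sample[c].nunique() == sample.shape[0]]
# Relaxed check: distinct count must equal the non-missing count
relaxed = distinct_non_null_columns(sample)

print(strict)   # ['Id']
print(relaxed)  # ['Id', 'Code']
```

Which of the two checks is appropriate depends on whether missing values should count against uniqueness in your data.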

Let’s remove the distinct-value columns and create a new DataFrame.

# Create new DataFrame
new_df_columns = [e for e in df.columns if e not in drop_col]
new_df = df[new_df_columns]
new_df

Output:

      Name   Age Location
0     Mark  27.0      USA
1     Juli  31.0       UK
2    Alexa  45.0      NaN
3    Kevin   NaN   France
4     John  34.0  Germany
5    Devid  48.0      USA
6     Mark   NaN  Germany
7  Michael  31.0      NaN
8  Johnson   NaN      USA
9    Kevin  27.0    Italy
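
The same selection can also be written without an explicit loop. The sketch below is one possible alternative, not the method used above: it relies on DataFrame.nunique(), which returns one distinct-value count per column, and on boolean indexing with .loc. The small frame and the variable names are illustrative assumptions.

```python
import pandas as pd

# Illustrative data (an assumption, not the article's DataFrame)
frame = pd.DataFrame({
    'Id': [10, 11, 12],
    'Group': ['a', 'a', 'b'],
})

# DataFrame.nunique() returns one distinct-value count per column;
# a column is kept when that count differs from the number of rows
keep_mask = frame.nunique() != len(frame)

# Boolean indexing on the column axis keeps only the columns marked True
reduced = frame.loc[:, keep_mask]
print(list(reduced.columns))  # ['Group']
```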

You can also remove columns using Pandas’ df.drop().

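# Option 1: drop the columns in place (this modifies df itself).
df.drop(drop_col,axis=1,inplace=True)     # inplace=True

# Option 2: create a new DataFrame and leave the original unchanged.
# Use this instead of Option 1, not after it: once the columns are gone
# from df, dropping them again raises a KeyError.
new_df = df.drop(drop_col,axis=1)         # default inplace=False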
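
To make the difference between the two calls concrete, here is a small sketch on an illustrative frame (the data and variable names are assumptions). It also uses the errors='ignore' argument of drop(), which skips labels that are no longer present instead of raising an error.

```python
import pandas as pd

# Illustrative data (an assumption, not the article's DataFrame)
frame = pd.DataFrame({'Id': [1, 2, 3], 'Group': ['a', 'a', 'b']})
to_drop = ['Id']

# Default (inplace=False): returns a new DataFrame, frame keeps all its columns
copy_without = frame.drop(columns=to_drop)
print(list(frame.columns))         # ['Id', 'Group']
print(list(copy_without.columns))  # ['Group']

# inplace=True: modifies frame directly and returns None
result = frame.drop(columns=to_drop, inplace=True)
print(result)                      # None
print(list(frame.columns))         # ['Group']

# Dropping the same labels again would raise a KeyError;
# errors='ignore' skips labels that are already gone
frame.drop(columns=to_drop, inplace=True, errors='ignore')
```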
