Pandas – How to remove DataFrame columns in which every value is distinct?

Let’s create a Pandas DataFrame in which one feature has a different value in every row.

import pandas as pd
import numpy as np

data = {'Student_Id':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Name':['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark', 'Michael', 'Johnson', 'Kevin'],
        'Age':[27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
        'Location':['USA', 'UK', np.nan, 'France', 'Germany', 'USA', 'Germany', np.nan, 'USA', 'Italy']}
df = pd.DataFrame(data)
df.head(10)

Output:

   Student_Id     Name   Age Location
0           1     Mark  27.0      USA
1           2     Juli  31.0       UK
2           3    Alexa  45.0      NaN
3           4    Kevin   NaN   France
4           5     John  34.0  Germany
5           6    Devid  48.0      USA
6           7     Mark   NaN  Germany
7           8  Michael  31.0      NaN
8           9  Johnson   NaN      USA
9          10    Kevin  27.0    Italy

Here, the Student_Id column contains only distinct values: every row has its own value. Such a feature is not useful for predicting the target variable because it provides no insight into the data. Hence, it is better to remove features of this kind.
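One quick way to check this, sketched here on the same data, is to compare the number of distinct values in each column with the number of rows. Note that nunique() skips NaN by default, so missing values are not counted as distinct values. The variable names distinct_counts and all_distinct are only illustrative.

```python
import numpy as np
import pandas as pd

data = {'Student_Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Name': ['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark', 'Michael', 'Johnson', 'Kevin'],
        'Age': [27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
        'Location': ['USA', 'UK', np.nan, 'France', 'Germany', 'USA', 'Germany', np.nan, 'USA', 'Italy']}
df = pd.DataFrame(data)

# Number of distinct non-null values in each column
distinct_counts = df.nunique()
print(distinct_counts)

# True only for columns whose distinct count equals the number of rows
all_distinct = distinct_counts == len(df)
print(all_distinct)
```

Only Student_Id has as many distinct values as the DataFrame has rows; the other columns contain repeats or missing values.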

# Return the names of the columns in which every row holds a different value.
# Note: nunique() ignores NaN by default.
def remove_distinct_value_features(df):
    return [e for e in df.columns if df[e].nunique() == df.shape[0]]

drop_col = remove_distinct_value_features(df)
drop_col

Output:

['Student_Id']
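Because nunique() ignores missing values by default, a column whose non-missing values are all unique but which also contains a NaN is not flagged by this check. The sketch below uses a small made-up DataFrame (toy, with hypothetical columns id, code and group) and a hypothetical helper, all_distinct_columns, to show how passing dropna=False changes the result. Whether NaN should count as a distinct value depends on your data, so treat this as one possible variation rather than the recommended approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'code' is unique apart from one missing value
toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'code': ['a', 'b', 'c', np.nan],
    'group': ['x', 'x', 'y', np.nan],
})

def all_distinct_columns(frame, count_nan=False):
    """Return columns whose number of distinct values equals the row count.

    With count_nan=True, NaN is counted as one distinct value.
    """
    return [c for c in frame.columns
            if frame[c].nunique(dropna=not count_nan) == len(frame)]

default_cols = all_distinct_columns(toy)                   # NaN ignored
with_nan_cols = all_distinct_columns(toy, count_nan=True)  # NaN counted
print(default_cols)
print(with_nan_cols)
```

With the default behaviour only id is reported; once NaN is counted as a value, code is reported as well.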

Let’s remove the distinct-value columns and create a new DataFrame.

# Create new DataFrame
new_df_columns = [e for e in df.columns if e not in drop_col]
new_df = df[new_df_columns]
new_df

Output:

      Name   Age Location
0     Mark  27.0      USA
1     Juli  31.0       UK
2    Alexa  45.0      NaN
3    Kevin   NaN   France
4     John  34.0  Germany
5    Devid  48.0      USA
6     Mark   NaN  Germany
7  Michael  31.0      NaN
8  Johnson   NaN      USA
9    Kevin  27.0    Italy

You can also remove columns using Pandas’ df.drop().

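# Option 1: create a new DataFrame; the original DataFrame remains unchanged.
new_df = df.drop(drop_col, axis=1)             # default inplace=False

# Option 2: drop the columns from df itself.
# Run this only once: a second in-place drop raises a KeyError
# because the columns are already gone.
df.drop(drop_col, axis=1, inplace=True)        # inplace=True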
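If the drop may run more than once, for example when a notebook cell is re-executed, the column is already gone the second time and drop() raises a KeyError. A minimal sketch, assuming that silently skipping missing labels is acceptable in your workflow, uses the errors='ignore' option of drop(). The three-row DataFrame here is a shortened stand-in for the data above.

```python
import pandas as pd

# Shortened stand-in for the student data
df = pd.DataFrame({'Student_Id': [1, 2, 3],
                   'Name': ['Mark', 'Juli', 'Alexa']})
drop_col = ['Student_Id']

# First in-place drop removes the column
df.drop(drop_col, axis=1, inplace=True)

# A second identical drop fails because the label no longer exists
try:
    df.drop(drop_col, axis=1, inplace=True)
    raised = False
except KeyError:
    raised = True

# errors='ignore' skips labels that are not present instead of raising
df.drop(drop_col, axis=1, inplace=True, errors='ignore')
remaining = list(df.columns)
print(raised, remaining)
```

The trade-off is that errors='ignore' also hides genuine mistakes such as a misspelled column name, so it is best reserved for cases where a missing column is expected.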
