Pandas – How to remove DataFrame columns with constant (same) values?

Let’s create a Pandas DataFrame that contains features with distinct values.

import pandas as pd
import numpy as np

data = {'Student_Id':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Name':['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark', 'Michael', 'Johnson', 'Kevin'],
        'Age':[27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
        'Location':['USA', 'UK', np.nan, 'France', 'Germany', 'USA', 'Germany', np.nan, 'USA', 'Italy'],
        'Program':['Master','Master','Master','Master','Master','Master','Master','Master','Master','Master'] }
df = pd.DataFrame(data)
df.head(10)

Output:

   Student_Id     Name   Age Location Program
0           1     Mark  27.0      USA  Master
1           2     Juli  31.0       UK  Master
2           3    Alexa  45.0      NaN  Master
3           4    Kevin   NaN   France  Master
4           5     John  34.0  Germany  Master
5           6    Devid  48.0      USA  Master
6           7     Mark   NaN  Germany  Master
7           8  Michael  31.0      NaN  Master
8           9  Johnson   NaN      USA  Master
9          10    Kevin  27.0    Italy  Master

Here, the Program column contains the same value in every row. Such a feature is not useful for predicting a target variable, because it carries no information that distinguishes one row from another. Hence, it is better to remove features of this kind.

# Return the names of the constant-value columns of a given DataFrame.
# Note: nunique() ignores NaN by default, so missing values are not counted.
def remove_constant_value_features(df):
    return [e for e in df.columns if df[e].nunique() == 1]

drop_col = remove_constant_value_features(df)
drop_col

Output:

['Program']
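
One caveat worth noting: because nunique() excludes NaN by default, a column that holds a single value plus some missing entries is also reported as constant. The sketch below shows one way to make that choice explicit. The helper name constant_columns, its count_nan parameter, and the sample data are hypothetical and not part of the original example.

```python
import numpy as np
import pandas as pd

def constant_columns(frame, count_nan=False):
    """Return names of columns that hold exactly one distinct value.

    count_nan is a hypothetical switch: when True, NaN is counted as a
    distinct value, so a column mixing one value with NaN is not constant.
    """
    return [col for col in frame.columns
            if frame[col].nunique(dropna=not count_nan) == 1]

# Illustrative data (assumed, not from the tutorial's DataFrame)
sample = pd.DataFrame({
    'a': ['x', 'x', 'x'],      # truly constant
    'b': ['x', np.nan, 'x'],   # one value plus a missing entry
    'c': [1, 2, 3],            # varying
})

ignoring_nan = constant_columns(sample)                  # NaN not counted
counting_nan = constant_columns(sample, count_nan=True)  # NaN counted as a value
print(ignoring_nan)
print(counting_nan)
```

Which behavior is appropriate depends on how you plan to handle missing values later, for example whether they will be imputed before modeling.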

Let’s create a new DataFrame that contains only the non-constant columns.

new_df_columns = [e for e in df.columns if e not in drop_col]
new_df = df[new_df_columns]
new_df

Output:

   Student_Id     Name   Age Location
0           1     Mark  27.0      USA
1           2     Juli  31.0       UK
2           3    Alexa  45.0      NaN
3           4    Kevin   NaN   France
4           5     John  34.0  Germany
5           6    Devid  48.0      USA
6           7     Mark   NaN  Germany
7           8  Michael  31.0      NaN
8           9  Johnson   NaN      USA
9          10    Kevin  27.0    Italy

You can also remove columns using Pandas’ df.drop().

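# Option 1: create a new DataFrame; the original DataFrame remains unchanged.
new_df = df.drop(drop_col, axis=1)         # default inplace=False

# Option 2: drop the columns in place, modifying df itself.
# Use one option or the other: after an in-place drop, dropping the
# same columns again raises a KeyError because they no longer exist.
df.drop(drop_col, axis=1, inplace=True)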
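
As a more compact alternative, the same filtering can probably be done in one step with a boolean mask over the columns. This is a minimal sketch rather than the tutorial's own method. The names frame, keep_mask and reduced, and the small sample data, are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data (assumed): 'program' is the constant column
frame = pd.DataFrame({
    'id': [1, 2, 3],
    'program': ['Master', 'Master', 'Master'],
    'score': [0.5, 0.7, 0.5],
})

# DataFrame.nunique() returns one distinct-value count per column;
# the comparison yields a boolean Series indexed by column name.
keep_mask = frame.nunique() != 1

# Select every row and only the columns where the mask is True.
reduced = frame.loc[:, keep_mask]
kept_columns = list(reduced.columns)
print(kept_columns)
```

Like df.drop() without inplace=True, this returns a new DataFrame and leaves the original untouched.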
