Pandas – How to remove DataFrame columns with constant (same) values?

Let’s create a Pandas DataFrame in which most features have distinct values and one feature holds a single constant value.

import pandas as pd
import numpy as np

data = {'Student_Id':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Name':['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark', 'Michael', 'Johnson', 'Kevin'],
        'Age':[27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
        'Location':['USA', 'UK', np.nan, 'France', 'Germany', 'USA', 'Germany', np.nan, 'USA', 'Italy'],
        'Program':['Master','Master','Master','Master','Master','Master','Master','Master','Master','Master'] }
df = pd.DataFrame(data)
df.head(10)

Output:

   Student_Id     Name   Age Location Program
0           1     Mark  27.0      USA  Master
1           2     Juli  31.0       UK  Master
2           3    Alexa  45.0      NaN  Master
3           4    Kevin   NaN   France  Master
4           5     John  34.0  Germany  Master
5           6    Devid  48.0      USA  Master
6           7     Mark   NaN  Germany  Master
7           8  Michael  31.0      NaN  Master
8           9  Johnson   NaN      USA  Master
9          10    Kevin  27.0    Italy  Master

Here, the Program column contains the same constant value in every row. Such a feature is not useful for predicting the target variable, because a column that never varies cannot help distinguish one row from another. Hence, it is better to remove this kind of feature.

# Function to return the names of the constant-value columns of a given DataFrame.
# Note: nunique() ignores NaN by default.
def get_constant_value_features(df):
    return [e for e in df.columns if df[e].nunique() == 1]

drop_col = get_constant_value_features(df)
drop_col

Output:

['Program']
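
One caveat with this check: `nunique()` skips NaN by default, so a column that holds one value plus some missing entries is also reported as constant. The sketch below shows the difference on a small made-up DataFrame. The helper name `constant_columns`, its `count_nan` flag, and the sample column names are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

def constant_columns(frame, count_nan=False):
    """Return the names of columns that hold exactly one distinct value.

    With count_nan=True, NaN is counted as a distinct value of its own.
    """
    return [c for c in frame.columns
            if frame[c].nunique(dropna=not count_nan) == 1]

# Hypothetical sample data, not the DataFrame from the article
sample = pd.DataFrame({
    'all_same': ['Master', 'Master', 'Master'],
    'same_or_missing': ['USA', np.nan, 'USA'],
    'varied': [1, 2, 3],
})

ignoring_nan = constant_columns(sample)                  # NaN is skipped
counting_nan = constant_columns(sample, count_nan=True)  # NaN counts as a value
print(ignoring_nan)
print(counting_nan)
```

Whether a column like `same_or_missing` should be dropped depends on whether the missing entries carry information for your problem.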

Let’s create a new DataFrame that keeps only the non-constant value columns.

new_df_columns = [e for e in df.columns if e not in drop_col]
new_df = df[new_df_columns]
new_df

Output:

   Student_Id     Name   Age Location
0           1     Mark  27.0      USA
1           2     Juli  31.0       UK
2           3    Alexa  45.0      NaN
3           4    Kevin   NaN   France
4           5     John  34.0  Germany
5           6    Devid  48.0      USA
6           7     Mark   NaN  Germany
7           8  Michael  31.0      NaN
8           9  Johnson   NaN      USA
9          10    Kevin  27.0    Italy

You can also remove columns using Pandas’ df.drop().

# This creates a new DataFrame; the original DataFrame remains unchanged.
new_df = df.drop(drop_col, axis=1)          # default inplace=False

# This drops the columns in place, modifying df itself (the call returns None).
df.drop(drop_col, axis=1, inplace=True)     # inplace=True
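
As an alternative to building the list of column names first, the same filtering can probably be written as a boolean mask over the columns. This is a minimal sketch, assuming a small made-up DataFrame; the variable name `keep_mask` is hypothetical. It relies on `DataFrame.nunique()`, which returns the number of distinct values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the student data
df = pd.DataFrame({
    'Student_Id': [1, 2, 3, 4],
    'Age': [27, 31, np.nan, 27],
    'Program': ['Master', 'Master', 'Master', 'Master'],
})

# True for every column that does not have exactly one distinct value
keep_mask = df.nunique() != 1

# Select the columns to keep; df itself is left unchanged
new_df = df.loc[:, keep_mask]
print(new_df.columns.tolist())
```

The comparison `!= 1` mirrors the `== 1` check used earlier, so the same columns are removed as with the list-based approach.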
