Let’s create a Pandas DataFrame in which one feature holds the same value in every row, while the other features contain distinct values.
import pandas as pd
import numpy as np

data = {
    'Student_Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Name': ['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark', 'Michael', 'Johnson', 'Kevin'],
    'Age': [27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
    'Location': ['USA', 'UK', np.nan, 'France', 'Germany', 'USA', 'Germany', np.nan, 'USA', 'Italy'],
    'Program': ['Master', 'Master', 'Master', 'Master', 'Master', 'Master', 'Master', 'Master', 'Master', 'Master'],
}

df = pd.DataFrame(data)
df.head(10)
Output:
   Student_Id     Name   Age  Location  Program
0           1     Mark  27.0       USA   Master
1           2     Juli  31.0        UK   Master
2           3    Alexa  45.0       NaN   Master
3           4    Kevin   NaN    France   Master
4           5     John  34.0   Germany   Master
5           6    Devid  48.0       USA   Master
6           7     Mark   NaN   Germany   Master
7           8  Michael  31.0       NaN   Master
8           9  Johnson   NaN       USA   Master
9          10    Kevin  27.0     Italy   Master
Here, the Program column contains the same value in every row. Such a feature is not useful for predicting the target variable, because a value that never changes cannot help distinguish one row from another. Hence, it is better to remove features of this kind.
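# Function to return the constant-value columns of a given DataFrame
def remove_constant_value_features(df):
    # nunique() ignores NaN by default, so a column is treated as constant
    # when it holds exactly one distinct non-null value.
    return [e for e in df.columns if df[e].nunique() == 1]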
drop_col = remove_constant_value_features(df)
drop_col
Output:
['Program']
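One detail worth checking is how missing values affect this test. The sketch below uses a small hypothetical DataFrame (`toy` and its column names are made up for illustration, not taken from the data above) to show that `nunique()` skips NaN by default, while `nunique(dropna=False)` counts NaN as a value of its own.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate NaN handling.
toy = pd.DataFrame({
    'all_same': ['Master', 'Master', 'Master'],
    'same_or_missing': ['Master', np.nan, 'Master'],
    'all_missing': [np.nan, np.nan, np.nan],
    'varied': [1, 2, 3],
})

# Default (dropna=True): NaN is ignored when counting distinct values,
# so 'same_or_missing' is flagged and 'all_missing' (count 0) is not.
ignoring_nan = [c for c in toy.columns if toy[c].nunique() == 1]

# dropna=False: NaN counts as a distinct value,
# so 'same_or_missing' has two values and 'all_missing' has one.
counting_nan = [c for c in toy.columns if toy[c].nunique(dropna=False) == 1]

print(ignoring_nan)
print(counting_nan)
```

Which behaviour is preferable depends on the data: if a missing value is itself informative, a column holding one value plus NaN may be worth keeping.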
Let’s create a new DataFrame that contains only the non-constant columns.
new_df_columns = [e for e in df.columns if e not in drop_col]
new_df = df[new_df_columns]
new_df
Output:
   Student_Id     Name   Age  Location
0           1     Mark  27.0       USA
1           2     Juli  31.0        UK
2           3    Alexa  45.0       NaN
3           4    Kevin   NaN    France
4           5     John  34.0   Germany
5           6    Devid  48.0       USA
6           7     Mark   NaN   Germany
7           8  Michael  31.0       NaN
8           9  Johnson   NaN       USA
9          10    Kevin  27.0     Italy
You can also remove columns using Pandas’ df.drop().
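# Option 1: create a new DataFrame; the original DataFrame remains the same.
new_df = df.drop(drop_col, axis=1)  # default inplace=False

# Option 2: drop the columns in place; df itself is modified and None is returned.
# Running this a second time raises a KeyError, because the columns are already gone.
df.drop(drop_col, axis=1, inplace=True)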
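The same filtering can also be sketched without building a list of columns first, by selecting columns with a boolean mask. This is only an alternative under stated assumptions: `toy`, `keep`, and `reduced` are hypothetical names, and the condition `> 1` differs slightly from the `== 1` test used earlier because it also discards columns that are entirely NaN (their distinct count is 0).

```python
import pandas as pd

# Hypothetical toy data mirroring the shape of the example above.
toy = pd.DataFrame({
    'Student_Id': [1, 2, 3],
    'Name': ['Mark', 'Juli', 'Alexa'],
    'Program': ['Master', 'Master', 'Master'],
})

# DataFrame.nunique() returns one distinct-value count per column;
# comparing with 1 gives a boolean mask indexed by column name.
keep = toy.nunique() > 1

# Boolean column selection returns a new DataFrame; toy is left unchanged.
reduced = toy.loc[:, keep]

print(list(reduced.columns))
```

Because the selection returns a new object, the original DataFrame keeps all of its columns, similar to calling drop() with the default inplace=False.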
. . .