In this tutorial, you will learn how to count and list the unique values in a Pandas DataFrame. Real-world datasets often contain duplicate values.

Let’s create a Pandas DataFrame that contains duplicate values.
import pandas as pd
import numpy as np
data = {'Id':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Name':['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark', 'Michael', 'Johnson', 'Kevin'],
'Age':[27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
'Location':['USA', 'UK', np.nan, 'France', 'Germany', 'USA', 'Germany', np.nan, 'USA', 'Italy']}
df = pd.DataFrame(data)
df.head(10)
Output:
   Id     Name   Age Location
0   1     Mark  27.0      USA
1   2     Juli  31.0       UK
2   3    Alexa  45.0      NaN
3   4    Kevin   NaN   France
4   5     John  34.0  Germany
5   6    Devid  48.0      USA
6   7     Mark   NaN  Germany
7   8  Michael  31.0      NaN
8   9  Johnson   NaN      USA
9  10    Kevin  27.0    Italy
Count Unique Values
Pandas provides the DataFrame.nunique() method to count the distinct values along the requested axis.
DataFrame.nunique(axis=0, dropna=True)
Parameters
axis : {0 or ‘index’, 1 or ‘columns’}, default 0 (0 counts per column, 1 counts per row.)
dropna : bool, default True (Don’t include NaN in the counts.)
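As a rough illustration of how these two parameters change the result, here is a minimal sketch on a small made-up frame. The name `demo` and its values are assumptions for this example only and are not part of the tutorial's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with one missing value per column
demo = pd.DataFrame({
    'Age': [27, 31, np.nan, 27],
    'Location': ['USA', np.nan, 'USA', 'UK'],
})

# Default: one count per column, NaN ignored
per_column = demo.nunique()

# dropna=False treats NaN as one more distinct value
per_column_with_nan = demo.nunique(dropna=False)

# axis=1 counts the distinct values within each row instead
per_row = demo.nunique(axis=1)

print(per_column)
print(per_column_with_nan)
print(per_row)
```

With the default settings each column here has 2 distinct values; with dropna=False each has 3, because the missing entry is counted once.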
Let’s define a function that counts the non-null values and the unique values for each column in a DataFrame.
# Function to count the non-null and unique values for each column in a DataFrame
def count_unique_values(data):
    total = data.count()  # Count the non-null values in each column
    temp = pd.DataFrame(total)
    temp.columns = ['Total']
    uniques = []
    for col in data.columns:
        unique = data[col].nunique()  # Count the unique values in this column
        uniques.append(unique)
    temp['Uniques'] = uniques
    return np.transpose(temp)  # Columns of the input become columns of the result

count_unique_values(df)
Output:
         Id  Name  Age  Location
Total    10    10    7         8
Uniques  10     8    5         5
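The same summary can probably be produced without an explicit loop, since count() and nunique() both already work column by column. The following is one possible sketch, not the tutorial's own method; the name `summary` is an assumption, and the data is repeated so that the block runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as the tutorial's DataFrame, rebuilt so this block is self-contained
data = {'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Name': ['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark',
                 'Michael', 'Johnson', 'Kevin'],
        'Age': [27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
        'Location': ['USA', 'UK', np.nan, 'France', 'Germany', 'USA',
                     'Germany', np.nan, 'USA', 'Italy']}
df = pd.DataFrame(data)

# count() gives non-null values per column, nunique() gives distinct values per column;
# combining the two Series and transposing yields one row per statistic
summary = pd.DataFrame({'Total': df.count(), 'Uniques': df.nunique()}).T
print(summary)
```

This relies only on the vectorized column-wise methods, so it tends to be shorter and easier to read than collecting the counts in a list.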
Unique Values
Pandas also provides the Series.unique() method, along with the equivalent pd.unique() function, which returns the unique values of the input column/Series as an array, in order of first appearance.
Example:
>>> df = pd.DataFrame({'name':['Huli', 'bee', 'Mark'], 'age':[11, 30, 30]})
>>> print(df)
   name  age
0  Huli   11
1   bee   30
2  Mark   30
>>> df['name'].unique()
array(['Huli', 'bee', 'Mark'], dtype=object)
>>> df['age'].unique()
array([11, 30], dtype=int64)
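One detail worth keeping in mind is that unique() keeps NaN as a value, while nunique() ignores it by default, so the two can disagree on how many distinct values a column has. Here is a small sketch of that difference; the Series `ages` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with repeated and missing values
ages = pd.Series([27, 31, np.nan, 27, np.nan])

# unique() returns the distinct values in order of first appearance, NaN included
distinct = ages.unique()

# The top-level pd.unique() function gives the same result for a Series
distinct_via_function = pd.unique(ages)

# nunique() skips NaN by default, so it reports one value fewer here
n_distinct = ages.nunique()

# Dropping NaN first makes unique() agree with nunique()
distinct_no_nan = ages.dropna().unique()

print(distinct, n_distinct, distinct_no_nan)
```

If the missing entries should not show up in the list of unique values, calling dropna() first, as in the last step, is a simple way to exclude them.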
. . .