Pandas – Count unique values for each column of a DataFrame

In this tutorial, you will learn how to count and list the unique values in each column of a Pandas DataFrame. Real-life datasets often contain duplicate values.

 

Let’s create a Pandas DataFrame that contains duplicate values.

import pandas as pd
import numpy as np

data = {'Id':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Name':['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid', 'Mark', 'Michael', 'Johnson', 'Kevin'],
        'Age':[27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
        'Location':['USA', 'UK', np.nan, 'France', 'Germany', 'USA', 'Germany', np.nan, 'USA', 'Italy']}
df = pd.DataFrame(data)
df.head(10)

Output:

   Id     Name   Age Location
0   1     Mark  27.0      USA
1   2     Juli  31.0       UK
2   3    Alexa  45.0      NaN
3   4    Kevin   NaN   France
4   5     John  34.0  Germany
5   6    Devid  48.0      USA
6   7     Mark   NaN  Germany
7   8  Michael  31.0      NaN
8   9  Johnson   NaN      USA
9  10    Kevin  27.0    Italy

Count Unique Values

Pandas provides the DataFrame.nunique() method to count the number of distinct values along the requested axis.

DataFrame.nunique(axis=0, dropna=True)

Parameters 
axis : {0 or ‘index’, 1 or ‘columns’}, default 0 (0 counts down each column, 1 counts across each row.)
dropna : bool, default True (Don’t include NaN in the counts.)
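
To make the two parameters concrete, here is a minimal sketch that rebuilds the DataFrame from the top of this tutorial and calls nunique() three ways. The variable names (default_counts, counts_with_nan, per_row) are only illustrative and are not part of Pandas.

```python
import numpy as np
import pandas as pd

# Same data as the DataFrame created at the start of the tutorial
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Name': ['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid',
             'Mark', 'Michael', 'Johnson', 'Kevin'],
    'Age': [27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
    'Location': ['USA', 'UK', np.nan, 'France', 'Germany', 'USA',
                 'Germany', np.nan, 'USA', 'Italy'],
})

# Default: one count per column, NaN is ignored
default_counts = df.nunique()

# dropna=False: NaN is counted as one extra distinct value
counts_with_nan = df.nunique(dropna=False)

# axis=1: one count per row instead of per column
per_row = df.nunique(axis=1)

print(default_counts)
print(counts_with_nan)
print(per_row)
```

With dropna=False, the Age and Location columns each report one more distinct value than the default, because NaN is counted once.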

Let’s define a function that reports, for each column in a DataFrame, the number of non-null values and the number of unique values.

# Function to count the non-null and unique values for each column in a DataFrame
def count_unique_values(data):

    # Count the non-null values in each column
    temp = data.count().to_frame('Total')

    # Count the distinct values in each column (NaN is excluded by default)
    temp['Uniques'] = data.nunique()

    # Transpose so the input columns stay as columns in the result
    return temp.T

count_unique_values(df)

Output:

         Id  Name  Age  Location
Total    10    10    7         8
Uniques  10     8    5         5
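
The same summary can probably be produced without a helper function. The sketch below is one possible shortcut rather than the tutorial's own method: it builds the table directly from count() and nunique(). The name summary is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Same data as the DataFrame created at the start of the tutorial
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Name': ['Mark', 'Juli', 'Alexa', 'Kevin', 'John', 'Devid',
             'Mark', 'Michael', 'Johnson', 'Kevin'],
    'Age': [27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27],
    'Location': ['USA', 'UK', np.nan, 'France', 'Germany', 'USA',
                 'Germany', np.nan, 'USA', 'Italy'],
})

# Both Series are indexed by column name, so they line up automatically;
# .T turns the two result columns into the 'Total' and 'Uniques' rows.
summary = pd.DataFrame({'Total': df.count(), 'Uniques': df.nunique()}).T
print(summary)
```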

Unique Values

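Pandas also provides the Series.unique() method (and the equivalent top-level pd.unique() function), which returns an array of the distinct values in the input column/Series, in order of first appearance.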

Example:

>>> df = pd.DataFrame({'name':['Huli', 'bee', 'Mark'], 'age':[11, 30, 30]})
>>> print(df)
   name  age
0  Huli   11
1   bee   30
2  Mark   30

>>> df['name'].unique()
array(['Huli', 'bee', 'Mark'], dtype=object)

>>> df['age'].unique()
array([11, 30], dtype=int64)
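
One difference from nunique() is worth seeing in code: unique() keeps NaN in its result, while nunique() drops it by default. The following is a small sketch using the Age values from the first DataFrame in this tutorial; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Age values from the tutorial's first DataFrame, including three NaN entries
ages = pd.Series([27, 31, 45, np.nan, 34, 48, np.nan, 31, np.nan, 27], name='Age')

# Top-level function; gives the same result as ages.unique()
unique_ages = pd.unique(ages)
print(unique_ages)

# unique() keeps NaN as one of the distinct values ...
n_with_nan = len(unique_ages)

# ... while nunique() excludes it unless dropna=False is passed
n_without_nan = ages.nunique()

print(n_with_nan, n_without_nan)
```

So len(series.unique()) matches series.nunique(dropna=False), not the default series.nunique(), whenever the column contains missing values.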
