Pandas – MultiIndex

Multi-level (hierarchical) indexing lets you represent higher-dimensional data in a two-dimensional DataFrame, which makes fairly sophisticated analysis and manipulation possible.

In this tutorial, you will learn how to create and use a hierarchical (multi-level) index.


In [1]:
# Let's define the DataFrame
import pandas as pd
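data = [['Mark','Test_1','Maths',75], ['Mark','Test_2','Science',85],
        ['Juli','Test_1','Physics',65], ['Juli','Test_2','Maths',70],
        ['Kevin','Test_1','Science',80], ['Kevin','Test_2','History',90]]
df = pd.DataFrame(data, columns=['Name','Test','Subject','Score'])
df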

    Name    Test  Subject  Score
0   Mark  Test_1    Maths     75
1   Mark  Test_2  Science     85
2   Juli  Test_1  Physics     65
3   Juli  Test_2    Maths     70
4  Kevin  Test_1  Science     80
5  Kevin  Test_2  History     90

The pandas set_index() method sets the DataFrame index using one or more existing columns.

Syntax: DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

keys - label or array-like or list of labels/arrays
drop - (default True) Delete columns to be used as the new index.
append - (default False) Whether to append columns to existing index.
inplace - (default False) Modify the DataFrame in place (do not create a new object).
verify_integrity - (default False) Check the new index for duplicates.
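
The parameters above can be tried out on a small frame. The following is a minimal sketch, not part of the original walkthrough: the variable names (scores, kept, appended, duplicate_error) are illustrative, and the data is a shortened copy of the tutorial's table.

```python
import pandas as pd

# Shortened copy of the tutorial data (illustrative only)
data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70]]
scores = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])

# drop=False keeps 'Name' as a regular column as well as using it as the index
kept = scores.set_index('Name', drop=False)

# append=True adds 'Test' as a second level after the existing 'Name' index
appended = scores.set_index('Name').set_index('Test', append=True)

# verify_integrity=True raises ValueError if the new index contains duplicates;
# 'Name' repeats here, so the call is expected to fail
try:
    scores.set_index('Name', verify_integrity=True)
    duplicate_error = False
except ValueError:
    duplicate_error = True
```

Because inplace defaults to False, each call returns a new DataFrame and leaves scores unchanged.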

Set the Name column as the index of the DataFrame.

In [2]: df.set_index(['Name'])
         Test  Subject  Score
Name                         
Mark   Test_1    Maths     75
Mark   Test_2  Science     85
Juli   Test_1  Physics     65
Juli   Test_2    Maths     70
Kevin  Test_1  Science     80
Kevin  Test_2  History     90

Create a multi-level index from the columns ‘Name’ and ‘Test’.

In [3]: 
df = df.set_index(['Name','Test'])
df
              Subject  Score
Name  Test                  
Mark  Test_1    Maths     75
      Test_2  Science     85
Juli  Test_1  Physics     65
      Test_2    Maths     70
Kevin Test_1  Science     80
      Test_2  History     90

Extract Specific values

You can extract specific rows from the DataFrame by passing a boolean condition to .loc[].

For example, to get Mark's Test_2 result:

In [4]: df.loc[(df.index.get_level_values('Name') == 'Mark') & 
               (df.index.get_level_values("Test") == 'Test_2')]
             Subject  Score
Name Test                  
Mark Test_2  Science     85
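
The same row can also be reached without building boolean masks. The sketch below is one possible alternative, assuming the MultiIndex on ‘Name’ and ‘Test’ created earlier; the names scores, mark_test2_score and test2_rows are illustrative and not from the original.

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
scores = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])
scores = scores.set_index(['Name', 'Test'])

# A tuple with one label per index level addresses a single row;
# adding a column label returns just that cell
mark_test2_score = scores.loc[('Mark', 'Test_2'), 'Score']

# xs() takes a cross-section on one level: every student's Test_2 row,
# with the 'Test' level dropped from the result
test2_rows = scores.xs('Test_2', level='Test')
```

The tuple form is convenient for a single known key, while the boolean-mask form generalizes to arbitrary conditions on the level values.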


The get_level_values() method returns an Index of the values for the requested level.

It is primarily used to pull a single level of values out of a MultiIndex, but it is also available on a plain Index for compatibility.

Syntax: Index.get_level_values(level)

level - Either the integer position or the name of the level.


# Get the values by name of the level

In [5]: df.index.get_level_values('Name')
Index(['Mark', 'Mark', 'Juli', 'Juli', 'Kevin', 'Kevin'], dtype='object', name='Name')

# Get the values by level number

In [6]: df.index.get_level_values(level=1)
Index(['Test_1', 'Test_2', 'Test_1', 'Test_2', 'Test_1', 'Test_2'], dtype='object', name='Test')
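
Note that get_level_values() returns one entry per row, so labels repeat. As a small sketch (variable names are illustrative), chaining unique() gives the distinct labels of a level in order of first appearance:

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
scores = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])
scores = scores.set_index(['Name', 'Test'])

# One label per row, duplicates included
names_per_row = scores.index.get_level_values('Name')

# Distinct labels of each level, by name and by position
unique_names = names_per_row.unique()
unique_tests = scores.index.get_level_values(1).unique()
```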

Iterate over DataFrame with MultiIndex

In [7]: df
              Subject  Score
Name  Test                  
Mark  Test_1    Maths     75
      Test_2  Science     85
Juli  Test_1  Physics     65
      Test_2    Maths     70
Kevin Test_1  Science     80
      Test_2  History     90

In [8]: 
for key,data in df.groupby(level=0):
    print(data)

             Subject  Score
Name Test                  
Juli Test_1  Physics     65
     Test_2    Maths     70
              Subject  Score
Name  Test                  
Kevin Test_1  Science     80
      Test_2  History     90
             Subject  Score
Name Test                  
Mark Test_1    Maths     75
     Test_2  Science     85
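
The groups above come out in sorted key order (Juli, Kevin, Mark) rather than in the original row order, because groupby() sorts group keys by default. The sketch below, with illustrative variable names, shows how sort=False preserves the original order and how a per-group aggregate can replace an explicit loop:

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
scores = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])
scores = scores.set_index(['Name', 'Test'])

# Default: group keys are sorted
keys_sorted = [name for name, _ in scores.groupby(level=0)]

# sort=False keeps the order in which keys first appear
keys_original = [name for name, _ in scores.groupby(level=0, sort=False)]

# Aggregating per group avoids the explicit loop: mean score per student
mean_scores = scores.groupby(level='Name')['Score'].mean()
```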

.     .     .

Multilevel Columns

Create a DataFrame with multi-level columns.

In [9]:
columns=[('c','a'),('c','b'),('d','x')]    # define the list of tuples
df = pd.DataFrame([[1,4,7],[2,5,8],[3,6,9]],
                  columns=pd.MultiIndex.from_tuples(columns))
df

   c     d
   a  b  x
0  1  4  7
1  2  5  8
2  3  6  9

Basic Indexing with MultiIndex

You can select data by specifying the column label at one or more levels.

# Select data using single level label

In [10]: df['c']          # print the subgroup of the label 'c'
   a  b
0  1  4
1  2  5
2  3  6

# Select data using multilevel label

In [11]: df['c','b']       # print the column of the label 'c' & 'b'
0    4
1    5
2    6
Name: (c, b), dtype: int64
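
Selecting by the outer level works with plain brackets, but selecting by an inner-level label needs a different tool. The following is a hedged sketch using .loc and xs(); the frame is rebuilt here under the illustrative name wide so the example runs on its own.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([('c', 'a'), ('c', 'b'), ('d', 'x')])
wide = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]], columns=columns)

# .loc with a full tuple selects one column, like wide['c', 'b']
col_cb = wide.loc[:, ('c', 'b')]

# xs() on axis=1 selects by an inner-level label; the matched level is
# dropped, so the result keeps only the outer label 'c'
inner_b = wide.xs('b', axis=1, level=1)
```

Here col_cb is a Series, while inner_b is a DataFrame, since more than one outer label could in general share the same inner label.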
