Hierarchical (multi-level) indexing lets you represent higher-dimensional data in a one- or two-dimensional structure such as a Series or DataFrame, which makes some quite sophisticated data analysis and manipulation possible.
This tutorial shows how to create, query, and iterate over a hierarchical index in pandas.
Example:
In [1]:
# Let's define the DataFrame
import pandas as pd
data = [['Mark','Test_1','Maths',75], ['Mark','Test_2','Science',85],
['Juli','Test_1','Physics',65],['Juli','Test_2','Maths',70],
['Kevin','Test_1','Science',80],['Kevin','Test_2','History',90]]
df = pd.DataFrame(data, columns=['Name','Test','Subject','Score'])
df
Out[1]:
    Name    Test  Subject  Score
0   Mark  Test_1    Maths     75
1   Mark  Test_2  Science     85
2   Juli  Test_1  Physics     65
3   Juli  Test_2    Maths     70
4  Kevin  Test_1  Science     80
5  Kevin  Test_2  History     90
The pandas set_index() method sets the DataFrame index from one or more existing columns.
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
keys - label or array-like or list of labels/arrays
drop - (default True) Delete columns to be used as the new index.
append - (default False) Whether to append columns to existing index.
inplace - (default False) Modify the DataFrame in place (do not create a new object).
verify_integrity - (default False) Check the new index for duplicates.
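To make the effect of these parameters more concrete, here is a minimal sketch. The frame `scores` and the variable names are assumptions made up for this illustration; only `set_index()` and its documented parameters come from pandas.

```python
import pandas as pd

# Hypothetical miniature frame, used only for this illustration
scores = pd.DataFrame({
    'Name': ['Mark', 'Mark', 'Juli'],
    'Test': ['Test_1', 'Test_2', 'Test_1'],
    'Score': [75, 85, 65],
})

# drop=False keeps 'Name' as a regular column as well as the index
kept = scores.set_index('Name', drop=False)

# append=True adds 'Test' as an extra level after the existing index
appended = scores.set_index('Test', append=True)

# verify_integrity=True raises ValueError when the new index has duplicates
# ('Mark' appears twice in this frame)
try:
    scores.set_index('Name', verify_integrity=True)
    duplicates_found = False
except ValueError:
    duplicates_found = True

print(list(kept.columns))
print(list(appended.index.names))
print(duplicates_found)
```

Since inplace is left at its default of False, each call returns a new DataFrame and `scores` itself is unchanged.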
Set the Name column as the index of the DataFrame.
In [2]: df.set_index(['Name'])
Out[2]:
         Test  Subject  Score
Name
Mark   Test_1    Maths     75
Mark   Test_2  Science     85
Juli   Test_1  Physics     65
Juli   Test_2    Maths     70
Kevin  Test_1  Science     80
Kevin  Test_2  History     90
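Create the multi-level index using the columns ‘Name’ and ‘Test’. Because inplace=True is passed, df itself is modified rather than a new DataFrame being returned.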
In [3]:
df.set_index(['Name','Test'],inplace=True)
df
Out[3]:
              Subject  Score
Name  Test
Mark  Test_1    Maths     75
      Test_2  Science     85
Juli  Test_1  Physics     65
      Test_2    Maths     70
Kevin Test_1  Science     80
      Test_2  History     90
Extract Specific values
You can extract specific rows from the DataFrame by passing a boolean condition to .loc[].
For example, to get Mark's score in the Test_2 exam:
In [4]: df.loc[(df.index.get_level_values('Name') == 'Mark') &
(df.index.get_level_values("Test") == 'Test_2')]
Out[4]:
             Subject  Score
Name Test
Mark Test_2  Science     85
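The boolean mask above works, but a MultiIndex can also be addressed directly with a tuple key. The following is a minimal sketch that rebuilds the same DataFrame so it can run on its own; the variable names `row`, `score`, and `test_2` are assumptions for this illustration.

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
df = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])
df = df.set_index(['Name', 'Test'])

# A tuple key addresses one row: (outer level, inner level)
row = df.loc[('Mark', 'Test_2')]             # Series holding Subject and Score
score = df.loc[('Mark', 'Test_2'), 'Score']  # single value

# xs() takes a cross-section on one level: every student's Test_2 row
test_2 = df.xs('Test_2', level='Test')

print(row)
print(score)
print(test_2)
```

The tuple form is usually shorter when you know the exact labels, while the boolean mask is more flexible when the condition is not a plain equality.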
pandas.Index.get_level_values
It returns an Index of values for the requested level.
This is primarily useful for getting the values of an individual level from a MultiIndex, but it is provided on Index as well for compatibility.
Index.get_level_values(level)
Parameters
level - It is either the integer position or the name of the level.
Examples:
# Get the values by name of the level
In [5]: df.index.get_level_values('Name')
Out[5]:
Index(['Mark', 'Mark', 'Juli', 'Juli', 'Kevin', 'Kevin'], dtype='object', name='Name')
# Get the values by level number
In [6]: df.index.get_level_values(level=1)
Out[6]:
Index(['Test_1', 'Test_2', 'Test_1', 'Test_2', 'Test_1', 'Test_2'], dtype='object', name='Test')
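Because get_level_values() returns an ordinary Index, the usual Index methods can be applied to the result. The sketch below assumes a small score table like the one in this tutorial and uses isin() and unique() on the 'Name' level; the variable names are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('Mark', 'Test_1'), ('Mark', 'Test_2'), ('Juli', 'Test_1'),
     ('Juli', 'Test_2'), ('Kevin', 'Test_1'), ('Kevin', 'Test_2')],
    names=['Name', 'Test'])
df = pd.DataFrame({'Score': [75, 85, 65, 70, 80, 90]}, index=index)

names = df.index.get_level_values('Name')

# Boolean mask: True for rows whose 'Name' level is Mark or Kevin
mask = names.isin(['Mark', 'Kevin'])
subset = df[mask]

# Distinct labels of the level, in order of first appearance
unique_names = names.unique().tolist()

print(subset)
print(unique_names)
```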
Iterate over DataFrame with MultiIndex
In [7]: df
Out[7]:
              Subject  Score
Name  Test
Mark  Test_1    Maths     75
      Test_2  Science     85
Juli  Test_1  Physics     65
      Test_2    Maths     70
Kevin Test_1  Science     80
      Test_2  History     90
In [8]:
for key, data in df.groupby(level=0):  # groups come back sorted by key
    print(key)
    print(data)
    print("*" * 30)
Out[8]:
Juli
             Subject  Score
Name Test
Juli Test_1  Physics     65
     Test_2    Maths     70
******************************
Kevin
              Subject  Score
Name  Test
Kevin Test_1  Science     80
      Test_2  History     90
******************************
Mark
             Subject  Score
Name Test
Mark Test_1    Maths     75
     Test_2  Science     85
******************************
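Note that the groups above are printed in alphabetical order (Juli, Kevin, Mark) rather than in the original row order, because groupby() sorts the group keys by default. The sketch below, which rebuilds the score column under assumed variable names, shows how sort=False keeps the order of first appearance and how the same grouping can be aggregated instead of looped over.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('Mark', 'Test_1'), ('Mark', 'Test_2'), ('Juli', 'Test_1'),
     ('Juli', 'Test_2'), ('Kevin', 'Test_1'), ('Kevin', 'Test_2')],
    names=['Name', 'Test'])
df = pd.DataFrame({'Score': [75, 85, 65, 70, 80, 90]}, index=index)

# Default behaviour: group keys are sorted
sorted_keys = [key for key, _ in df.groupby(level='Name')]

# sort=False keeps the groups in order of first appearance
original_keys = [key for key, _ in df.groupby(level='Name', sort=False)]

# Aggregating per group is often simpler than an explicit loop
mean_scores = df.groupby(level='Name')['Score'].mean()

print(sorted_keys)
print(original_keys)
print(mean_scores)
```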
. . .
Multilevel Columns
Create a DataFrame with multi-level columns.
In [9]:
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'x': [7, 8, 9]})
columns = [('c', 'a'), ('c', 'b'), ('d', 'x')] # list of (outer, inner) label tuples
df.columns = pd.MultiIndex.from_tuples(columns)
df
Out[9]:
   c     d
   a  b  x
0  1  4  7
1  2  5  8
2  3  6  9
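from_tuples() is not the only constructor. As a sketch of one alternative, MultiIndex.from_arrays() takes one list per level and can build the same columns; the level names 'group' and 'field' here are assumptions added for illustration and are not part of the example above.

```python
import pandas as pd

# One list per level: outer labels first, inner labels second
columns = pd.MultiIndex.from_arrays(
    [['c', 'c', 'd'], ['a', 'b', 'x']],
    names=['group', 'field'])  # hypothetical level names

# Each inner list is one row of the frame
df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]], columns=columns)

print(df)
print(list(df.columns))
```

Naming the levels is optional, but it lets later calls refer to a level by name instead of by position.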
Basic Indexing with MultiIndex
You can select data by specifying the column label.
# Select data using single level label
In [10]: df['c'] # print the subgroup of the label 'c'
Out[10]:
   a  b
0  1  4
1  2  5
2  3  6
# Select data using multilevel label
In [11]: df['c','b'] # print the column of the label 'c' & 'b'
Out[11]:
0    4
1    5
2    6
Name: (c, b), dtype: int64
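The same selections can also be written with .loc[] and xs(). The sketch below rebuilds the multi-level columns so it runs on its own; the variable names `col` and `inner` are assumptions for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'x': [7, 8, 9]})
df.columns = pd.MultiIndex.from_tuples([('c', 'a'), ('c', 'b'), ('d', 'x')])

# .loc with a column tuple selects the same column as df['c', 'b']
col = df.loc[:, ('c', 'b')]

# xs() on axis=1 selects by the inner level alone, whatever the outer label
inner = df.xs('b', axis=1, level=1)

print(col)
print(inner)
```

xs() is handy when only the inner label is known, since plain df['b'] would look for 'b' in the outer level and fail here.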
. . .