Multi-level (hierarchical) indexing lets you store and work with higher-dimensional data in a lower-dimensional structure such as a DataFrame, which makes fairly sophisticated data analysis and manipulation possible.
In this tutorial, you will learn how to use hierarchical (multi-level) indexing.
Example:
In [1]: # Let's define the DataFrame
        import pandas as pd
        data = [['Mark','Test_1','Maths',75], ['Mark','Test_2','Science',85],
                ['Juli','Test_1','Physics',65], ['Juli','Test_2','Maths',70],
                ['Kevin','Test_1','Science',80], ['Kevin','Test_2','History',90]]
        df = pd.DataFrame(data, columns=['Name','Test','Subject','Score'])
        df
Out[1]:
    Name    Test  Subject  Score
0   Mark  Test_1    Maths     75
1   Mark  Test_2  Science     85
2   Juli  Test_1  Physics     65
3   Juli  Test_2    Maths     70
4  Kevin  Test_1  Science     80
5  Kevin  Test_2  History     90
The pandas set_index() method sets the DataFrame index using one or more existing columns.
DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
keys - label or array-like or list of labels/arrays
drop - (default True) Delete columns to be used as the new index.
append - (default False) Whether to append columns to existing index.
inplace - (default False) Modify the DataFrame in place (do not create a new object).
verify_integrity - (default False) Check the new index for duplicates.
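The parameters above can be sketched on the tutorial's sample data. This is a minimal illustration rather than an exhaustive one; the variable names kept, appended, and duplicate_error are chosen here for the example and do not come from pandas.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame.
data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
df = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])

# drop=False keeps 'Name' as a regular column in addition to using it as the index.
kept = df.set_index('Name', drop=False)
print(list(kept.columns))

# append=True adds 'Name' as an extra level after the existing default index.
appended = df.set_index('Name', append=True)
print(appended.index.nlevels)

# verify_integrity=True raises ValueError here because 'Name' contains repeated values.
try:
    df.set_index('Name', verify_integrity=True)
    duplicate_error = False
except ValueError:
    duplicate_error = True
print(duplicate_error)
```

Because inplace defaults to False, none of these calls modify df itself; each one returns a new DataFrame.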
Set the Name column as the index of the DataFrame.
In [2]: df.set_index(['Name'])
Out[2]:
         Test  Subject  Score
Name
Mark   Test_1    Maths     75
Mark   Test_2  Science     85
Juli   Test_1  Physics     65
Juli   Test_2    Maths     70
Kevin  Test_1  Science     80
Kevin  Test_2  History     90
Create the multi-level index using the columns 'Name' and 'Test'.
In [3]: df.set_index(['Name','Test'], inplace=True)
        df
Out[3]:
              Subject  Score
Name  Test
Mark  Test_1    Maths     75
      Test_2  Science     85
Juli  Test_1  Physics     65
      Test_2    Maths     70
Kevin Test_1  Science     80
      Test_2  History     90
Extract Specific values
You can extract specific values from the DataFrame by passing a condition to .loc[].
The following example gets Mark's score for the Test_2 exam.
In [4]: df.loc[(df.index.get_level_values('Name') == 'Mark') &
               (df.index.get_level_values('Test') == 'Test_2')]
Out[4]:
             Subject  Score
Name Test
Mark Test_2  Science     85
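The boolean-mask approach above is general, but when the exact index labels are known, a tuple key is usually shorter. The following is a minimal sketch on the same sample data; the names row, score, and test_2 are illustrative only.

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
df = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])
df = df.set_index(['Name', 'Test'])

# A full (Name, Test) tuple selects a single row, returned as a Series.
row = df.loc[('Mark', 'Test_2')]
print(row['Score'])

# Adding a column label returns the single value directly.
score = df.loc[('Mark', 'Test_2'), 'Score']
print(score)

# xs() selects on one named level; here, every student's Test_2 row.
test_2 = df.xs('Test_2', level='Test')
print(test_2)
```

Note that the tuple form returns a Series for the matching row, whereas the boolean mask returns a one-row DataFrame that keeps both index levels.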
pandas.Index.get_level_values
It returns an Index containing the values of the requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
Index.get_level_values(self, level)
Parameters:
level - It is either the integer position or the name of the level.
Examples:
# Get the values by name of the level
In [5]: df.index.get_level_values('Name')
Out[5]: Index(['Mark', 'Mark', 'Juli', 'Juli', 'Kevin', 'Kevin'], dtype='object', name='Name')

# Get the values by level number
In [6]: df.index.get_level_values(level=1)
Out[6]: Index(['Test_1', 'Test_2', 'Test_1', 'Test_2', 'Test_1', 'Test_2'], dtype='object', name='Test')
Iterate over DataFrame with MultiIndex
In [7]: df
Out[7]:
              Subject  Score
Name  Test
Mark  Test_1    Maths     75
      Test_2  Science     85
Juli  Test_1  Physics     65
      Test_2    Maths     70
Kevin Test_1  Science     80
      Test_2  History     90

In [8]: for key, data in df.groupby(level=0):
            print(key)
            print(data)
            print("*"*30)
Out[8]:
Juli
             Subject  Score
Name Test
Juli Test_1  Physics     65
     Test_2    Maths     70
******************************
Kevin
              Subject  Score
Name  Test
Kevin Test_1  Science     80
      Test_2  History     90
******************************
Mark
             Subject  Score
Name Test
Mark Test_1    Maths     75
     Test_2  Science     85
******************************
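In the output above, the groups come back in sorted key order (Juli, Kevin, Mark) rather than in the order of the rows, because groupby sorts group keys by default. The sketch below, using the same sample data, shows two variations: passing sort=False to keep the order of first appearance, and iterating row by row with iterrows(). The names names and rows are illustrative only.

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
df = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])
df = df.set_index(['Name', 'Test'])

# sort=False keeps the groups in the order the keys first appear in the index.
names = [name for name, _ in df.groupby(level='Name', sort=False)]
print(names)

# iterrows() yields (index, row); with a MultiIndex the index is a tuple
# holding one label per level, so it can be unpacked directly.
rows = []
for (name, test), row in df.iterrows():
    rows.append((name, test, int(row['Score'])))
    print(name, test, row['Score'])
```

Row-by-row iteration is convenient for inspection, but for larger data the grouped or vectorized operations are generally the better fit.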
. . .
Multilevel Columns
Create a DataFrame with multi-level columns.
In [9]: df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6], 'x':[7,8,9]})
        columns = [('c','a'), ('c','b'), ('d','x')]  # define the list of tuples
        df.columns = pd.MultiIndex.from_tuples(columns)
        df
Out[9]:
   c     d
   a  b  x
0  1  4  7
1  2  5  8
2  3  6  9
Basic Indexing with MultiIndex
You can select data by specifying the column label.
# Select data using a single-level label
In [10]: df['c']  # the sub-columns under the label 'c'
Out[10]:
   a  b
0  1  4
1  2  5
2  3  6

# Select data using a multi-level label
In [11]: df['c','b']  # the column with outer label 'c' and inner label 'b'
Out[11]:
0    4
1    5
2    6
Name: (c, b), dtype: int64
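The same selections can also be written with .loc and xs(), which may read more explicitly when both rows and columns are involved. This is a minimal sketch on the multi-level-column DataFrame defined above; the names col and inner_b are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'x': [7, 8, 9]})
df.columns = pd.MultiIndex.from_tuples([('c', 'a'), ('c', 'b'), ('d', 'x')])

# .loc with a column tuple: all rows, the column ('c', 'b').
col = df.loc[:, ('c', 'b')]
print(col)

# xs() on axis=1 selects by the inner column level (level 1);
# the selected level is dropped from the result's columns.
inner_b = df.xs('b', axis=1, level=1)
print(inner_b)
```

Selecting by the inner level with xs() is useful when the outer label is not known in advance.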
. . .