Pandas – MultiIndex – Study Machine Learning

Multi-level (hierarchical) indexing puts more than one level of labels on an axis, which lets you store and work with higher-dimensional data in a DataFrame and perform fairly sophisticated analysis and manipulation on it.

In this tutorial, you will learn how to create and use a hierarchical (multi-level) index.

Example:

In [1]:
# Let's define the DataFrame
import pandas as pd
data = [['Mark','Test_1','Maths',75], ['Mark','Test_2','Science',85],
        ['Juli','Test_1','Physics',65],['Juli','Test_2','Maths',70],
        ['Kevin','Test_1','Science',80],['Kevin','Test_2','History',90]]
df = pd.DataFrame(data, columns=['Name','Test','Subject','Score'])
df

Out[1]:
    Name    Test  Subject  Score
0   Mark  Test_1    Maths     75
1   Mark  Test_2  Science     85
2   Juli  Test_1  Physics     65
3   Juli  Test_2    Maths     70
4  Kevin  Test_1  Science     80
5  Kevin  Test_2  History     90

The pandas set_index() method sets the DataFrame index using one or more existing columns.

DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)

Parameters:

keys - label or array-like or list of labels/arrays
drop - (default True) Delete columns to be used as the new index.
append - (default False) Whether to append columns to existing index.
inplace - (default False) Modify the DataFrame in place (do not create a new object).
verify_integrity - (default False) Check the new index for duplicates.
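The effect of these parameters is easier to see side by side. The following is a minimal sketch on a shortened copy of the data above; the variable names (scores, kept, appended, duplicate_error, unique_idx) are illustrative and not part of the original example.

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70]]
scores = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])

# drop=False keeps 'Name' as a regular column as well as using it as the index.
kept = scores.set_index('Name', drop=False)

# append=True adds 'Name' as an extra level after the existing integer index.
appended = scores.set_index('Name', append=True)

# verify_integrity=True raises ValueError when the new index has duplicates.
# 'Name' on its own repeats, so this call fails.
try:
    scores.set_index('Name', verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

# Each ('Name', 'Test') pair is unique, so the same check passes here.
unique_idx = scores.set_index(['Name', 'Test'], verify_integrity=True)
```

None of these calls changes scores itself, because inplace is left at its default of False.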

Set the Name column as the index of the DataFrame. Because inplace defaults to False, this returns a new DataFrame and leaves df unchanged.

In [2]: df.set_index(['Name'])
Out[2]:
         Test  Subject  Score
Name                         
Mark   Test_1    Maths     75
Mark   Test_2  Science     85
Juli   Test_1  Physics     65
Juli   Test_2    Maths     70
Kevin  Test_1  Science     80
Kevin  Test_2  History     90

Create the multi-level index from the columns ‘Name’ and ‘Test’. Here inplace=True modifies df itself.

In [3]: 
df.set_index(['Name','Test'],inplace=True)
df

Out[3]:
              Subject  Score
Name  Test                  
Mark  Test_1    Maths     75
      Test_2  Science     85
Juli  Test_1  Physics     65
      Test_2    Maths     70
Kevin Test_1  Science     80
      Test_2  History     90

Extract Specific values

You can extract specific values from the DataFrame by passing a boolean condition to .loc[].

The following example gets Mark's Test_2 row, which includes the score.

In [4]: df.loc[(df.index.get_level_values('Name') == 'Mark') & 
               (df.index.get_level_values("Test") == 'Test_2')]
Out[4]:
             Subject  Score
Name Test                  
Mark Test_2  Science     85
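The boolean mask is not the only way to reach that row. As a hedged alternative, .loc[] also accepts a tuple of labels, one per index level, and xs() selects on a single level. The sketch below rebuilds the same data under the illustrative name scores and sorts the index first, which is a precaution for label-based lookups rather than a requirement stated in the tutorial.

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
scores = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])
scores = scores.set_index(['Name', 'Test']).sort_index()

# A full (Name, Test) tuple selects one row, returned as a Series.
mark_test2 = scores.loc[('Mark', 'Test_2')]

# Adding a column label returns the single cell value.
mark_score = scores.loc[('Mark', 'Test_2'), 'Score']

# A label for the first level only returns all of Mark's rows, indexed by Test.
mark_rows = scores.loc['Mark']

# xs() selects on an inner level: every student's Test_2 row, indexed by Name.
test2 = scores.xs('Test_2', level='Test')
```

The mask-based form remains useful when the condition is more than an exact label match, for example a comparison on several values of a level.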

pandas.Index.get_level_values

It returns an Index containing the values of the requested level, one per row.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Index.get_level_values(self, level)

Parameters

level - It is either the integer position or the name of the level.

Examples:

# Get the values by name of the level

In [5]: df.index.get_level_values('Name')
Out[5]:
Index(['Mark', 'Mark', 'Juli', 'Juli', 'Kevin', 'Kevin'], dtype='object', name='Name')

# Get the values by level number

In [6]: df.index.get_level_values(level=1)
Out[6]: 
Index(['Test_1', 'Test_2', 'Test_1', 'Test_2', 'Test_1', 'Test_2'], dtype='object', name='Test')
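Note that get_level_values() returns one label per row, so repeated labels appear repeatedly. A small sketch of how to reduce that to the distinct labels, and of the compatibility behaviour on a plain Index, follows; the names index, per_row, distinct, flat and same are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('Mark', 'Test_1'), ('Mark', 'Test_2'),
     ('Juli', 'Test_1'), ('Juli', 'Test_2')],
    names=['Name', 'Test'])

# One label per row, repeats included.
per_row = index.get_level_values('Name')

# Each label once, in order of first appearance.
distinct = per_row.unique()

# On a plain (single-level) Index, level 0 simply gives back the same values.
flat = pd.Index(['a', 'b', 'c'], name='letters')
same = flat.get_level_values(0)
```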

Iterate over DataFrame with MultiIndex

In [7]: df
Out[7]:
              Subject  Score
Name  Test                  
Mark  Test_1    Maths     75
      Test_2  Science     85
Juli  Test_1  Physics     65
      Test_2    Maths     70
Kevin Test_1  Science     80
      Test_2  History     90

In [8]: 
for key,data in df.groupby(level=0):
    print(key)
    print(data)
    print("*"*30)

Out[8]:
Juli
             Subject  Score
Name Test                  
Juli Test_1  Physics     65
     Test_2    Maths     70
******************************
Kevin
              Subject  Score
Name  Test                  
Kevin Test_1  Science     80
      Test_2  History     90
******************************
Mark
             Subject  Score
Name Test                  
Mark Test_1    Maths     75
     Test_2  Science     85
******************************
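The groups above come out in alphabetical order (Juli, Kevin, Mark) rather than in the order of the rows, because groupby() sorts the group keys by default. The sketch below, using the illustrative name scores for the same data, shows how sort=False keeps the original order and how iterrows() can be used when you want one row at a time instead of one group at a time.

```python
import pandas as pd

data = [['Mark', 'Test_1', 'Maths', 75], ['Mark', 'Test_2', 'Science', 85],
        ['Juli', 'Test_1', 'Physics', 65], ['Juli', 'Test_2', 'Maths', 70],
        ['Kevin', 'Test_1', 'Science', 80], ['Kevin', 'Test_2', 'History', 90]]
scores = pd.DataFrame(data, columns=['Name', 'Test', 'Subject', 'Score'])
scores = scores.set_index(['Name', 'Test'])

# Default behaviour: group keys are sorted.
sorted_keys = [key for key, _ in scores.groupby(level=0)]

# sort=False yields the groups in order of first appearance.
original_keys = [key for key, _ in scores.groupby(level=0, sort=False)]

# iterrows() yields (index, row); with a MultiIndex the index is a tuple
# holding one label per level, so it can be unpacked directly.
rows = []
for (name, test), row in scores.iterrows():
    rows.append((name, test, row['Score']))
```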


Multilevel Columns

Create the DataFrame with multi-level Columns.

In [9]:
df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6],'x':[7,8,9]})
columns=[('c','a'),('c','b'),('d','x')]    # list of (top-level, second-level) label tuples
df.columns=pd.MultiIndex.from_tuples(columns)
df

Out[9]:
   c     d
   a  b  x
0  1  4  7
1  2  5  8
2  3  6  9
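Assigning to df.columns after the fact is one option. As a sketch of an alternative, the MultiIndex can also be built from one list of labels per level with pd.MultiIndex.from_arrays() and passed to the constructor directly. The names wide, columns and expected are illustrative.

```python
import pandas as pd

# One list per level: top-level labels first, second-level labels second.
columns = pd.MultiIndex.from_arrays([['c', 'c', 'd'],
                                     ['a', 'b', 'x']])

# Each inner list is one row of the frame.
wide = pd.DataFrame([[1, 4, 7],
                     [2, 5, 8],
                     [3, 6, 9]], columns=columns)

# The same column index, written as tuples for comparison.
expected = pd.MultiIndex.from_tuples([('c', 'a'), ('c', 'b'), ('d', 'x')])
```

Both constructors describe the same labels; from_arrays() reads level by level, while from_tuples() reads column by column.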

Basic Indexing with MultiIndex

You can select data by specifying column labels.

# Select data using single level label

In [10]: df['c']          # all columns under the top-level label 'c'
Out[10]:
   a  b
0  1  4
1  2  5
2  3  6

# Select data using multilevel label

In [11]: df['c','b']       # the single column with labels ('c', 'b')
Out[11]:
0    4
1    5
2    6
Name: (c, b), dtype: int64
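The same selections can be written with .loc[], which also lets you pick rows at the same time, and xs() can select on the second column level, which plain [] indexing cannot do directly. This is a minimal sketch; the names wide, col_cb, cell and second_b are illustrative.

```python
import pandas as pd

wide = pd.DataFrame(
    [[1, 4, 7], [2, 5, 8], [3, 6, 9]],
    columns=pd.MultiIndex.from_tuples([('c', 'a'), ('c', 'b'), ('d', 'x')]))

# Equivalent to wide['c', 'b']: all rows, column ('c', 'b').
col_cb = wide.loc[:, ('c', 'b')]

# A row label plus a column tuple returns a single cell.
cell = wide.loc[0, ('c', 'b')]

# Select by the second column level; the matched level is dropped,
# leaving only the top-level label 'c' on the result.
second_b = wide.xs('b', axis=1, level=1)
```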
