Pandas – Join

The DataFrame.join() method used to join the columns of another Dataframe either on index or on a key column.

DataFrame.join(selfotheron=Nonehow='left'lsuffix=''rsuffix=''sort=False)

Parameters:

other - DataFrame, Series, or list of DataFrame.
        Index should be similar to one of the columns in this one.
on - str, list of str (optional)
how - {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’

      left: use calling frame’s index (or column if on is specified)
      right: use other’s index.
      outer: form union of calling frame’s index (or column if on is specified)
             with other’s index, and sort it. lexicographically.
      inner: form intersection of calling frame’s index (or column if on is specified)
             with other’s index, preserving the order of the calling’s one.

lsuffix - Suffix to use from left frame’s overlapping columns.(default ‘’)
rsuffix - Suffix to use from right frame’s overlapping columns.(default ‘’)
sort - (default False)

Example

In [1]:
# Let's define the DataFrames
import pandas as pd
df1 = pd.DataFrame({'name': ['Mark', 'Juli', 'Denial'], 'Place': ['Paris', 'London', 'Moscow']}, index=['K0', 'K1', 'K2'])
df2 = pd.DataFrame({'age': [25, 35, 28], 'Gender': ['M', 'F', 'M']}, index=['K0', 'K2', 'K3'])

Join the DataFrame df1 and df2.

In [2]: df1.join(df2)
Out[2]:
      name   Place   age Gender
K0    Mark   Paris  25.0      M
K1    Juli  London   NaN    NaN
K2  Denial  Moscow  35.0      F

In [3]: df1.join(df2, how='outer')         # how='outer'
Out[3]:
      name   Place   age Gender
K0    Mark   Paris  25.0      M
K1    Juli  London   NaN    NaN
K2  Denial  Moscow  35.0      F
K3     NaN     NaN  28.0      M

In [4]: df1.join(df2, how='inner')           # how='inner'
Out[4]: 
      name   Place  age Gender
K0    Mark   Paris   25      M
K2  Denial  Moscow   35      F

Joining key columns on an index

join() takes an optional on argument which may be a column or multiple column names, which specifies that the passed DataFrame is to be aligned on that column in the DataFrame.

In [5]: 
df1 = pd.DataFrame({'key':['k0','k1','k0'] , 'name': ['Mark', 'Juli', 'Denial'] , 
                    'Place': ['Paris', 'London', 'Moscow']})
In [6]:
df2 = pd.DataFrame({'age': [25, 35, 28], 'Gender': ['M', 'F', 'M']}, 
                   index=['k0', 'k1', 'k2'])

In [7]: Result = df1.join(df2,on='key')

Joining Multiple DataFrames

In [8]:
df1 = pd.DataFrame({'A': [1, 2, 3]}, index=['K0', 'K1', 'K2'])
df2 = pd.DataFrame({'A': [4, 5, 6]}, index=['K0', 'K0', 'K3'])
df3 = pd.DataFrame({'A': [7, 8, 9]}, index=['K1', 'K1', 'K2'])

In [9]: Result = df1.join([df2, df3])

.     .     .

Leave a Reply

Your email address will not be published. Required fields are marked *

Python Pandas Tutorials