The DataFrame.join() method used to join the columns of another Dataframe either on index or on a key column.
DataFrame.
join
(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Parameters:
other - DataFrame, Series, or list of DataFrame. Index should be similar to one of the columns in this one. on - str, list of str (optional) how - {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’ left: use calling frame’s index (or column if on is specified) right: use other’s index. outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it. lexicographically. inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one. lsuffix - Suffix to use from left frame’s overlapping columns.(default ‘’) rsuffix - Suffix to use from right frame’s overlapping columns.(default ‘’) sort - (default False)
Example
In [1]: # Let's define the DataFrames import pandas as pd df1 = pd.DataFrame({'name': ['Mark', 'Juli', 'Denial'], 'Place': ['Paris', 'London', 'Moscow']}, index=['K0', 'K1', 'K2']) df2 = pd.DataFrame({'age': [25, 35, 28], 'Gender': ['M', 'F', 'M']}, index=['K0', 'K2', 'K3'])
Join the DataFrame df1 and df2.
In [2]: df1.join(df2) Out[2]: name Place age Gender K0 Mark Paris 25.0 M K1 Juli London NaN NaN K2 Denial Moscow 35.0 F In [3]: df1.join(df2, how='outer') # how='outer' Out[3]: name Place age Gender K0 Mark Paris 25.0 M K1 Juli London NaN NaN K2 Denial Moscow 35.0 F K3 NaN NaN 28.0 M In [4]: df1.join(df2, how='inner') # how='inner' Out[4]: name Place age Gender K0 Mark Paris 25 M K2 Denial Moscow 35 F
Joining key columns on an index
join()
takes an optional on
argument which may be a column or multiple column names, which specifies that the passed DataFrame
is to be aligned on that column in the DataFrame
.
In [5]: df1 = pd.DataFrame({'key':['k0','k1','k0'] , 'name': ['Mark', 'Juli', 'Denial'] , 'Place': ['Paris', 'London', 'Moscow']}) In [6]: df2 = pd.DataFrame({'age': [25, 35, 28], 'Gender': ['M', 'F', 'M']}, index=['k0', 'k1', 'k2']) In [7]: Result = df1.join(df2,on='key')
Joining Multiple DataFrames
In [8]: df1 = pd.DataFrame({'A': [1, 2, 3]}, index=['K0', 'K1', 'K2']) df2 = pd.DataFrame({'A': [4, 5, 6]}, index=['K0', 'K0', 'K3']) df3 = pd.DataFrame({'A': [7, 8, 9]}, index=['K1', 'K1', 'K2']) In [9]: Result = df1.join([df2, df3])
. . .