The pandas concat() function provides a variety of facilities for concatenating Series or DataFrame objects along an axis.
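pandas.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)
Note: this signature comes from older pandas releases. The join_axes parameter was removed in pandas 1.0, and sort has defaulted to False since that release.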
Parameters:
- objs : a sequence or mapping of Series or DataFrame objects
- axis : {0/'index', 1/'columns'}, default 0. The axis to concatenate along.
- join : {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis.
- ignore_index : bool, default False. If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1.
- keys : sequence, default None. Construct a hierarchical index using the passed keys as the outermost level.
- levels : list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.
- names : list, default None. Names for the levels in the resulting hierarchical index.
- verify_integrity : bool, default False. Check whether the new concatenated axis contains duplicates.
- sort : bool, default None. Sort the non-concatenation axis if it is not already aligned.
- copy : bool, default True. If False, do not copy data unnecessarily.
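The sort parameter is easiest to understand on inputs whose columns differ. The following is a minimal sketch; the frames left and right and their column names are made up for illustration and are not part of the example data used in this article.

```python
import pandas as pd

# Two hypothetical frames whose columns only partly overlap.
left = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
right = pd.DataFrame({"c": [5, 6], "a": [7, 8]})

# sort=False keeps the columns in order of first appearance.
unsorted_cols = pd.concat([left, right], sort=False, ignore_index=True)

# sort=True sorts the non-concatenation axis (here, the columns).
sorted_cols = pd.concat([left, right], sort=True, ignore_index=True)

print(list(unsorted_cols.columns))
print(list(sorted_cols.columns))
```

Passing sort explicitly avoids depending on the default, which differs between pandas versions.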
. . .
Example
import pandas as pd

data1 = {'name': ['mark', 'juli'], 'city': ['New York', 'Paris']}
data2 = {'name': ['john', 'alex'], 'city': ['London', 'Tokyo'], 'age': [28, 56]}
data3 = {'name': ['Saty', 'Jonathan'], 'city': ['germany', 'Moscow']}

df1 = pd.DataFrame(data1, index=[0, 1])
df2 = pd.DataFrame(data2, index=[2, 3])
df3 = pd.DataFrame(data3, index=[1, 2])
Let’s concatenate the DataFrames df1 and df3.
In [1]: pd.concat([df1, df3])
Out[1]:
       name      city
0      mark  New York
1      juli     Paris
1      Saty   germany
2  Jonathan    Moscow
Here, the index label 1 is duplicated. To avoid duplicate index labels, use the parameter ignore_index=True.
In [2]: pd.concat([df1, df3], ignore_index=True)
Out[2]:
       name      city
0      mark  New York
1      juli     Paris
2      Saty   germany
3  Jonathan    Moscow
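If duplicate index labels should be treated as an error rather than silently kept or renumbered, verify_integrity=True can be used. The sketch below rebuilds df1 and df3 so that it runs on its own; the variable duplicate_detected is only an illustrative flag.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["mark", "juli"], "city": ["New York", "Paris"]}, index=[0, 1])
df3 = pd.DataFrame({"name": ["Saty", "Jonathan"], "city": ["germany", "Moscow"]}, index=[1, 2])

duplicate_detected = False
try:
    # Index label 1 appears in both frames, so this raises ValueError.
    pd.concat([df1, df3], verify_integrity=True)
except ValueError as exc:
    duplicate_detected = True
    print("concat refused:", exc)
```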
You can also concatenate more than two DataFrames at once. Columns that are missing from an input are filled with NaN.
In [3]: pd.concat([df1, df2, df3])
Out[3]:
       name      city   age
0      mark  New York   NaN
1      juli     Paris   NaN
2      john    London  28.0
3      alex     Tokyo  56.0
1      Saty   germany   NaN
2  Jonathan    Moscow   NaN
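The NaN values in the age column come from the default join='outer', which keeps the union of all columns. As a sketch of the alternative, join='inner' keeps only the columns shared by every input, so age is dropped. The frames are rebuilt here so the snippet is self-contained.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["mark", "juli"], "city": ["New York", "Paris"]}, index=[0, 1])
df2 = pd.DataFrame(
    {"name": ["john", "alex"], "city": ["London", "Tokyo"], "age": [28, 56]},
    index=[2, 3],
)
df3 = pd.DataFrame({"name": ["Saty", "Jonathan"], "city": ["germany", "Moscow"]}, index=[1, 2])

# Only 'name' and 'city' exist in all three frames, so 'age' is dropped.
shared = pd.concat([df1, df2, df3], join="inner")
print(shared)
```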
Concatenate the DataFrames Horizontally
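By default, the pandas concat() function concatenates DataFrames vertically, because the axis parameter defaults to 0. You can also concatenate DataFrames horizontally by specifying axis=1. Rows are then aligned on their index labels, and join controls whether the union ('outer') or the intersection ('inner') of the labels is kept.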
In [4]: pd.concat([df1, df2], axis=1)  # axis=1 (concatenate horizontally)
Out[4]:
   name      city  name    city   age
0  mark  New York   NaN     NaN   NaN
1  juli     Paris   NaN     NaN   NaN
2   NaN       NaN  john  London  28.0
3   NaN       NaN  alex   Tokyo  56.0
In [5]: pd.concat([df1, df3], axis=1, join='inner')  # join='inner'
Out[5]:
   name   city  name     city
1  juli  Paris  Saty  germany
To keep exactly the row labels of df1, reindex the result with df1.index.
In [6]: pd.concat([df1, df3], axis=1).reindex(df1.index)
Out[6]:
   name      city  name     city
0  mark  New York   NaN      NaN
1  juli     Paris  Saty  germany
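The horizontal results above contain two columns called name and two called city. One way to tell them apart, assuming a MultiIndex on the columns is acceptable, is to pass keys together with axis=1. The labels 'left' and 'right' in this sketch are illustrative choices.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["mark", "juli"], "city": ["New York", "Paris"]}, index=[0, 1])
df3 = pd.DataFrame({"name": ["Saty", "Jonathan"], "city": ["germany", "Moscow"]}, index=[1, 2])

# With axis=1, the keys become the outer level of the column index.
wide = pd.concat([df1, df3], axis=1, keys=["left", "right"])
print(wide.columns.tolist())

# Select the columns that came from one input through the outer level.
print(wide["right"])
```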
Construct hierarchical indexing
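By passing the keys parameter, you can construct a hierarchical index (MultiIndex). Each key labels the rows that came from the corresponding input, so those rows can later be selected with .loc.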
In [7]: result = pd.concat([df1, df3], keys=['x', 'y'])
In [8]: result
Out[8]:
         name      city
x 0      mark  New York
  1      juli     Paris
y 1      Saty   germany
  2  Jonathan    Moscow
In [9]: result.loc['y']
Out[9]:
       name     city
1      Saty  germany
2  Jonathan   Moscow
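Since objs may also be a mapping, the same hierarchical index can be built from a dict, whose keys are then used as the keys. The sketch below also uses names to label the index levels; the level names 'source' and 'row' are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["mark", "juli"], "city": ["New York", "Paris"]}, index=[0, 1])
df3 = pd.DataFrame({"name": ["Saty", "Jonathan"], "city": ["germany", "Moscow"]}, index=[1, 2])

# The dict keys play the role of keys=['x', 'y'].
labeled = pd.concat({"x": df1, "y": df3}, names=["source", "row"])
print(labeled)

# Rows from one input can be selected through the outer level.
from_y = labeled.loc["y"]
print(from_y)
```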
. . .
Concatenating Using append
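The append() instance methods on Series and DataFrame are a shortcut for concat(). These methods predate concat() and concatenate along axis=0, namely the index. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls below only work on older versions.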
In [10]: df1.append(df3)
Out[10]:
       name      city
0      mark  New York
1      juli     Paris
1      Saty   germany
2  Jonathan    Moscow
In [11]: df1.append([df3, df2])
Out[11]:
       name      city   age
0      mark  New York   NaN
1      juli     Paris   NaN
1      Saty   germany   NaN
2  Jonathan    Moscow   NaN
2      john    London  28.0
3      alex     Tokyo  56.0
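On pandas 2.0 and later, where DataFrame.append() is no longer available, the same results can be obtained with pd.concat(). A minimal sketch of the equivalent calls, with the frames rebuilt so that it runs on its own:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["mark", "juli"], "city": ["New York", "Paris"]}, index=[0, 1])
df2 = pd.DataFrame(
    {"name": ["john", "alex"], "city": ["London", "Tokyo"], "age": [28, 56]},
    index=[2, 3],
)
df3 = pd.DataFrame({"name": ["Saty", "Jonathan"], "city": ["germany", "Moscow"]}, index=[1, 2])

# Stands in for df1.append(df3)
two_frames = pd.concat([df1, df3])

# Stands in for df1.append([df3, df2])
three_frames = pd.concat([df1, df3, df2])

print(two_frames)
print(three_frames)
```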
. . .