This tutorial explains the data types of the columns of a Pandas DataFrame/Series. It also describes type-casting methods and shows how to select DataFrame columns by data type with the select_dtypes() method.
In [1]:
import pandas as pd
d = {'A' : [10,20,30],'B' : ['a','b','c'],'C':[1.0,2.0,3.0]}
df = pd.DataFrame(d)
df
Out[1]:
A B C
0 10 a 1.0
1 20 b 2.0
2 30 c 3.0
The df.dtypes attribute returns the data type of each column of a DataFrame. Let’s get the data type of each column:
In [2]:
df.dtypes
Out[2]:
A int64
B object
C float64
dtype: object
Note – by default, Pandas stores strings with the object data type, as shown for column B above. Newer Pandas versions may report a dedicated string data type instead.
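As a small sketch of how these data types can be inspected in code, the example below reads the dtype of a single column and uses the helper functions in pd.api.types to test the kind of data a column holds. The variable names are illustrative only and are not part of the original tutorial.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': ['a', 'b', 'c'], 'C': [1.0, 2.0, 3.0]})

# A single column (a Series) exposes one dtype through the .dtype attribute
a_dtype = df['A'].dtype
c_dtype = df['C'].dtype
print(a_dtype, c_dtype)

# The pd.api.types helpers test the kind of dtype without comparing strings
is_a_integer = pd.api.types.is_integer_dtype(df['A'])
is_c_float = pd.api.types.is_float_dtype(df['C'])
is_b_numeric = pd.api.types.is_numeric_dtype(df['B'])
print(is_a_integer, is_c_float, is_b_numeric)
```

These helpers are handy when the exact dtype name may vary, for example for string columns across Pandas versions.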
Type Casting
The astype() method is used to change the data type of a Series or DataFrame.
In [3]:
df['C'] = df['C'].astype('int') # Change Data type of column C to integer
df['C'].dtypes
Out[3]:
dtype('int64')
In [4]:
df['A'] = df['A'].astype('str') # Change Data type of column A to string
df.dtypes
Out[4]:
A object
B object
C int64
dtype: object
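The same method can cast several columns in one call when it is given a dictionary that maps column names to target data types. The sketch below assumes the same example data as above; the name converted is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': ['a', 'b', 'c'], 'C': [1.0, 2.0, 3.0]})

# Map column names to target dtypes; columns not listed keep their dtype
converted = df.astype({'A': 'float64', 'C': 'int64'})
print(converted.dtypes)

# astype() returns a new object, so the original DataFrame is unchanged
print(df.dtypes)
```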
Change the data type to numeric
The pd.to_numeric() function is used to convert its argument to a numeric type.
Syntax:
pd.to_numeric(arg, errors, downcast)
arg - scalar, list, tuple, 1-d array, or Series
errors - {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
'raise' : invalid parsing will raise an exception
‘coerce’: invalid parsing will be set as NaN
‘ignore’: invalid parsing will return the input (deprecated since Pandas 2.2)
downcast - {‘integer’, ‘signed’, ‘unsigned’, ‘float’} , default None
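To illustrate the downcast parameter, here is a minimal sketch that converts small values to the smallest numeric type able to hold them. The input Series and variable names are assumptions made for this example.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])          # stored as int64 by default
floats = pd.Series([1.5, 2.5, 3.5])  # stored as float64 by default

# downcast='integer' picks the smallest signed integer dtype that fits the values
small_int = pd.to_numeric(ints, downcast='integer')

# downcast='unsigned' picks the smallest unsigned integer dtype that fits
small_unsigned = pd.to_numeric(ints, downcast='unsigned')

# downcast='float' picks the smallest float dtype, which is at least float32
small_float = pd.to_numeric(floats, downcast='float')

print(small_int.dtype, small_unsigned.dtype, small_float.dtype)
```

Downcasting can reduce memory use for large columns, as long as the smaller type still covers the range of values.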
In [5]:
df = pd.DataFrame({'A' : [10,20,'x'],'B' : ['a','b','c'],'C':['1','2','3']})
pd.to_numeric(df['C'])
Out[5]:
0 1
1 2
2 3
Name: C, dtype: int64
Note: This method raises an error if the input cannot be converted to a numeric type. Let’s convert column A to numeric.
In [6]: pd.to_numeric(df['A'])
Out[6]: ValueError: Unable to parse string "x" at position 2
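Here, the error is raised because the method cannot convert the alphabetic string "x" to a numeric type. To avoid the error, use the parameter errors='ignore' or errors='coerce'. Note that errors='ignore' is deprecated since Pandas 2.2, so errors='coerce' or catching the exception is the safer choice on recent versions.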
In [7]:
pd.to_numeric(df['A'],errors='ignore') # set parameter errors='ignore'
Out[7]:
0 10
1 20
2 x
Name: A, dtype: object
In [8]:
pd.to_numeric(df['A'],errors='coerce') # set parameter errors='coerce'
Out[8]:
0 10.0
1 20.0
2 NaN
Name: A, dtype: float64
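The sketch below shows two ways to work with values that fail to parse. The first uses errors='coerce' to find the offending rows and fill them in. The second wraps the call in try/except, which gives the same "return the input" behaviour as errors='ignore'. The helper to_numeric_or_original and the fill value 0 are hypothetical choices for this example, not part of Pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 'x']})

# errors='coerce' turns unparseable entries into NaN, which can then be located
coerced = pd.to_numeric(df['A'], errors='coerce')
bad_rows = df.loc[coerced.isna(), 'A']  # original values that failed to parse

# Replace the NaN values (0 is an arbitrary choice) and cast to an integer dtype
cleaned = coerced.fillna(0).astype('int64')


def to_numeric_or_original(series):
    """Hypothetical helper: return the input unchanged if conversion fails."""
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series


unchanged = to_numeric_or_original(df['A'])
print(bad_rows.tolist(), cleaned.tolist())
```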
Selecting columns based on data types
The select_dtypes() method is used to select columns based on their dtypes. Specify the data types you want to keep in the include parameter and the data types you want to drop in the exclude parameter.
In [9]:
df = pd.DataFrame({'A' : [True,True,False],'B' : [1.2,3.5,9.0],'C':[1,2,3]})
df
Out[9]:
A B C
0 True 1.2 1
1 True 3.5 2
2 False 9.0 3
In [10]: df.select_dtypes(include='int')
Out[10]:
C
0 1
1 2
2 3
In [11]: df.select_dtypes(include='number')
Out[11]:
B C
0 1.2 1
1 3.5 2
2 9.0 3
In [12]: df.select_dtypes(include='number',exclude="float")
Out[12]:
C
0 1
1 2
2 3
Note: the select_dtypes() method raises an error if both include and exclude are empty, or if include and exclude contain overlapping elements.
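As a brief sketch of these rules, the example below passes a list of dtypes, uses exclude on its own, and then triggers both error cases and collects their messages. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True, False], 'B': [1.2, 3.5, 9.0], 'C': [1, 2, 3]})

# include and exclude also accept a list of dtypes
bool_and_float = df.select_dtypes(include=['bool', 'float'])

# exclude can be used on its own to drop columns of a given dtype
non_bool = df.select_dtypes(exclude='bool')

messages = []
try:
    df.select_dtypes()  # both include and exclude are empty
except ValueError as exc:
    messages.append(str(exc))

try:
    df.select_dtypes(include='float', exclude='float')  # overlapping dtypes
except ValueError as exc:
    messages.append(str(exc))

print(list(bool_and_float.columns), list(non_bool.columns))
print(messages)
```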