This tutorial explains the data types of the columns in a pandas DataFrame or Series, describes the type-casting methods, and shows how to select DataFrame columns by data type with the select_dtypes() method.
In [1]: import pandas as pd
        d = {'A': [10, 20, 30], 'B': ['a', 'b', 'c'], 'C': [1.0, 2.0, 3.0]}
        df = pd.DataFrame(d)
        df
Out[1]:
    A  B    C
0  10  a  1.0
1  20  b  2.0
2  30  c  3.0
The df.dtypes attribute returns the data type of each column of a DataFrame. Let’s get the data type of each column:
In [2]: df.dtypes
Out[2]:
A      int64
B     object
C    float64
dtype: object
Note: as the output shows, pandas reports a column of strings with the object data type.
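Because df.dtypes is itself a Series indexed by column name, it can be inspected programmatically. The following is a minimal sketch, assuming the helper pandas.api.types.is_numeric_dtype (not used elsewhere in this tutorial) is an acceptable way to test a dtype:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': [10, 20, 30], 'B': ['a', 'b', 'c'], 'C': [1.0, 2.0, 3.0]})

# df.dtypes is a Series of (column name -> dtype), so it can be looped over.
numeric_cols = [col for col, dtype in df.dtypes.items() if is_numeric_dtype(dtype)]
print(numeric_cols)

# A single column (a Series) exposes its type through the dtype attribute.
print(df['A'].dtype)
```

Here only columns A and C should be reported as numeric, since B holds strings.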
Type Casting
The astype() method is used to change the data type of a column or of a whole DataFrame.
In [3]: df['C'] = df['C'].astype('int')  # Change the data type of column C to integer
        df['C'].dtypes
Out[3]: dtype('int64')

In [4]: df['A'] = df['A'].astype('str')  # Change the data type of column A to string
        df.dtypes
Out[4]:
A    object
B    object
C     int64
dtype: object
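astype() can also be called on the whole DataFrame with a mapping of column names to target types, which avoids one assignment per column. A short sketch under that assumption (the variable name converted and the sample values in column C are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': ['a', 'b', 'c'], 'C': [1.7, 2.2, 3.9]})

# Cast several columns at once by passing a {column: dtype} mapping.
# astype() returns a new DataFrame; the original df is left unchanged.
converted = df.astype({'A': 'float64', 'C': 'int64'})

print(converted.dtypes)

# Casting float to int drops the fractional part; it does not round.
print(converted['C'].tolist())
```

Note that 1.7 and 3.9 become 1 and 3, so round the column first if rounding is the intended behaviour.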
Change the data type to numeric
The pandas.to_numeric() function is used to convert values to a numeric type.
Syntax:
pd.to_numeric(arg, errors, downcast)

arg - scalar, list, tuple, 1-d array, or Series
errors - {'ignore', 'raise', 'coerce'}, default 'raise'
    'raise': invalid parsing will raise an exception
    'coerce': invalid parsing will be set as NaN
    'ignore': invalid parsing will return the input
downcast - {'integer', 'signed', 'unsigned', 'float'}, default None
In [5]: df = pd.DataFrame({'A': [10, 20, 'x'], 'B': ['a', 'b', 'c'], 'C': ['1', '2', '3']})
        pd.to_numeric(df['C'])
Out[5]:
0    1
1    2
2    3
Name: C, dtype: int64
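The downcast parameter from the syntax listing is not exercised by the example above. A minimal sketch of its effect, using an illustrative Series s of small values, might look like this:

```python
import pandas as pd

s = pd.Series(['1', '2', '3'])

# Without downcast, parsed integers use the default 64-bit integer type.
default = pd.to_numeric(s)

# downcast='integer' picks the smallest signed integer dtype that fits the values.
small_int = pd.to_numeric(s, downcast='integer')

# downcast='float' picks the smallest float dtype, with float32 as the minimum.
small_float = pd.to_numeric(s, downcast='float')

print(default.dtype, small_int.dtype, small_float.dtype)
```

Downcasting is mainly a memory optimisation: the values are unchanged, but each one is stored in fewer bytes.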
Note: pd.to_numeric() raises an error if the input cannot be converted to a numeric type. Let’s try to convert column A to numeric.
In [6]: pd.to_numeric(df['A'])
Out[6]: ValueError: Unable to parse string "x" at position 2
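Here, the error is raised because the method cannot convert the alphabetic string "x" to a numeric type. To avoid the error, use the parameter errors='ignore' or errors='coerce'. Note that errors='ignore' is deprecated in recent pandas releases (2.2 and later), so errors='coerce' is the safer choice in new code.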
In [7]: pd.to_numeric(df['A'], errors='ignore')  # set parameter errors='ignore'
Out[7]:
0    10
1    20
2     x
Name: A, dtype: object

In [8]: pd.to_numeric(df['A'], errors='coerce')  # set parameter errors='coerce'
Out[8]:
0    10.0
1    20.0
2     NaN
Name: A, dtype: float64
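With errors='coerce', invalid entries silently become NaN, so it is often useful to find out which rows failed before deciding what to do with them. The sketch below shows one possible follow-up; the names numeric_a, failed and clean are illustrative, and dropping the failed rows is only one of several reasonable choices:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 'x']})

# Unparseable entries become NaN instead of raising an exception.
numeric_a = pd.to_numeric(df['A'], errors='coerce')

# Rows that held a value originally but could not be parsed.
failed = numeric_a.isna() & df['A'].notna()
print(df.loc[failed, 'A'].tolist())

# One possible follow-up: drop the failures and restore an integer dtype.
clean = numeric_a.dropna().astype('int64')
print(clean.tolist())
```

The column comes back as float64 after coercion because NaN cannot be stored in a plain integer column, which is why the final astype('int64') is applied only after the NaN rows are removed.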
Selecting columns based on data types
The select_dtypes() method selects columns based on their dtypes. Pass the data types you want to keep in the include parameter and the data types you want to drop in the exclude parameter.
In [9]: df = pd.DataFrame({'A': [True, True, False], 'B': [1.2, 3.5, 9.0], 'C': [1, 2, 3]})
        df
Out[9]:
       A    B  C
0   True  1.2  1
1   True  3.5  2
2  False  9.0  3

In [10]: df.select_dtypes(include='int')
Out[10]:
   C
0  1
1  2
2  3

In [11]: df.select_dtypes(include='number')
Out[11]:
     B  C
0  1.2  1
1  3.5  2
2  9.0  3

In [12]: df.select_dtypes(include='number', exclude="float")
Out[12]:
   C
0  1
1  2
2  3
Note: select_dtypes() raises a ValueError if both include and exclude are empty, or if include and exclude contain overlapping elements.
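Both parameters also accept a list of dtypes, and the two error cases from the note can be reproduced directly. A short sketch, with the variable names bool_and_float, non_numeric and messages chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True, False], 'B': [1.2, 3.5, 9.0], 'C': [1, 2, 3]})

# include (and exclude) accept a list of dtypes.
bool_and_float = df.select_dtypes(include=['bool', 'float64'])
print(list(bool_and_float.columns))

# exclude on its own keeps every column that does not match.
non_numeric = df.select_dtypes(exclude='number')
print(list(non_numeric.columns))

# Both misuse cases raise ValueError: no arguments at all, and overlapping arguments.
messages = []
for kwargs in ({}, {'include': 'number', 'exclude': 'number'}):
    try:
        df.select_dtypes(**kwargs)
    except ValueError as exc:
        messages.append(str(exc))
        print('ValueError:', exc)
```

The boolean column A is not treated as a number, which is why it is the only column left when 'number' is excluded.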