Pandas: Data Types

This tutorial explains the data types of the columns of a Pandas DataFrame/Series and describes the type-casting methods. It also shows how to select DataFrame columns by data type using the select_dtypes() method.

In [1]:
import pandas as pd
d = {'A' : [10,20,30],'B' : ['a','b','c'],'C':[1.0,2.0,3.0]}
df = pd.DataFrame(d)
df
Out[1]:
    A  B    C
0  10  a  1.0
1  20  b  2.0
2  30  c  3.0

The df.dtypes attribute returns the data type of each column of a DataFrame. Let's get the data type of each column:

In [2]: df.dtypes
Out[2]:
A      int64
B     object
C    float64
dtype: object

Note: as the output above shows, a column of strings is reported with the object data type.
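
To check data types in code rather than by reading the dtypes output, one option is the helper functions in pandas.api.types. The sketch below rebuilds the DataFrame from above and collects its numeric columns; the variable name numeric_cols is only illustrative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': [10, 20, 30], 'B': ['a', 'b', 'c'], 'C': [1.0, 2.0, 3.0]})

# Keep the names of the columns whose dtype is numeric (int and float here)
numeric_cols = [col for col in df.columns if is_numeric_dtype(df[col])]
print(numeric_cols)
```

Column B holds strings, so it is left out of the list.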

Type Casting

The astype() method is used to change the data type of a Series or DataFrame.

In [3]: 
df['C'] = df['C'].astype('int')  # Change Data type of column C to integer
df['C'].dtypes
Out[3]: 
dtype('int64')

In [4]:
df['A'] = df['A'].astype('str')  # Change Data type of column A to string
df.dtypes
Out[4]:
A    object
B    object
C     int64
dtype: object
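
astype() also accepts a dictionary that maps column names to target types, which can be handy when several columns need casting at once. A minimal sketch, assuming the original DataFrame from In [1]; the name converted is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': ['a', 'b', 'c'], 'C': [1.0, 2.0, 3.0]})

# Cast several columns in one call; columns not listed (B) keep their dtype
converted = df.astype({'A': 'float64', 'C': 'int32'})
print(converted.dtypes)
```

astype() returns a new object, so df itself is unchanged unless the result is assigned back.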

Change the data type to numeric

The pd.to_numeric() function is used to convert its argument to a numeric type.

Syntax:

pd.to_numeric(arg, errors='raise', downcast=None)

arg      - scalar, list, tuple, 1-d array, or Series
errors   - {'ignore', 'raise', 'coerce'}, default 'raise'
          'raise' : invalid parsing will raise an exception
          'coerce': invalid parsing will be set as NaN
          'ignore': invalid parsing will return the input
downcast - {'integer', 'signed', 'unsigned', 'float'}, default None

In [5]:
df = pd.DataFrame({'A' : [10,20,'x'],'B' : ['a','b','c'],'C':['1','2','3']})
pd.to_numeric(df['C'])
Out[5]:
0    1
1    2
2    3
Name: C, dtype: int64
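
The downcast parameter from the syntax above is not used in these examples. As a rough illustration, it asks pd.to_numeric() to return the smallest numeric type that can hold the converted values. The Series below are made up for this sketch.

```python
import pandas as pd

int_strings = pd.Series(['1', '2', '3'])
float_strings = pd.Series(['1.5', '2.5', '3.5'])

# Smallest signed integer type that can hold the values
as_int = pd.to_numeric(int_strings, downcast='integer')

# Smallest float type that can hold the values (float32 at minimum)
as_float = pd.to_numeric(float_strings, downcast='float')

print(as_int.dtype, as_float.dtype)
```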

Note: This method raises an error if the input cannot be converted to a numeric type. Let's try to convert column A to numeric.

In [6]: pd.to_numeric(df['A'])
Out[6]: ValueError: Unable to parse string "x" at position 2

Here, the error is raised because the method cannot convert the alphabetic string "x" to a numeric type. To avoid the error, pass errors='ignore' or errors='coerce'. Note that errors='ignore' is deprecated as of pandas 2.2, so errors='coerce' is the safer choice in new code.

In [7]: pd.to_numeric(df['A'],errors='ignore')    # set parameter errors='ignore'
Out[7]:
0    10
1    20
2     x
Name: A, dtype: object

In [8]: pd.to_numeric(df['A'],errors='coerce')    # set parameter errors='coerce'
Out[8]:
0    10.0
1    20.0
2     NaN
Name: A, dtype: float64
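
With errors='coerce' the invalid values silently become NaN, so it can be useful to check which entries failed to convert. One possible approach, sketched here with illustrative variable names, is to compare the missing values before and after conversion.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 'x']})

converted = pd.to_numeric(df['A'], errors='coerce')

# Entries that were present in the input but became NaN after conversion
invalid = df.loc[converted.isna() & df['A'].notna(), 'A']
print(invalid.tolist())
```

Checking notna() on the original column keeps values that were already missing from being reported as conversion failures.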

Selecting columns based on data types

The select_dtypes() method selects a subset of columns based on their dtypes. Specify the data types to keep in the include parameter and the data types to drop in the exclude parameter.

In [9]: 
df = pd.DataFrame({'A' : [True,True,False],'B' : [1.2,3.5,9.0],'C':[1,2,3]})
df
Out[9]: 
       A    B  C
0   True  1.2  1
1   True  3.5  2
2  False  9.0  3

In [10]: df.select_dtypes(include='int')
Out[10]:
   C
0  1
1  2
2  3

In [11]: df.select_dtypes(include='number')
Out[11]:
     B  C
0  1.2  1
1  3.5  2
2  9.0  3

In [12]: df.select_dtypes(include='number',exclude="float")
Out[12]:
   C
0  1
1  2
2  3

Note: select_dtypes() raises an error if both include and exclude are empty, or if include and exclude contain overlapping elements.
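
The include and exclude parameters also accept lists, and the result is an ordinary DataFrame, so its column names can be read from .columns. A short sketch using the same DataFrame as In [9]; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True, False], 'B': [1.2, 3.5, 9.0], 'C': [1, 2, 3]})

# A list selects several dtypes together
bool_or_float = df.select_dtypes(include=['bool', 'float64'])
print(list(bool_or_float.columns))

# Names of the columns that are not numeric (bool is not counted as 'number')
non_numeric = df.select_dtypes(exclude='number').columns.tolist()
print(non_numeric)
```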
