In descriptive statistics, a box plot summarizes groups of numerical data through their quartiles. Box plots are widely used in machine learning to detect outliers in data. Lines called whiskers may extend from the box to indicate variability outside the upper and lower quartiles. Individual points plotted beyond the whiskers indicate outliers. A box plot can be drawn horizontally or vertically.
The figure below shows the basic structure of a box plot.
Matplotlib's pyplot.boxplot function is used to draw a box plot.
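To make the quartile and outlier idea concrete, here is a minimal sketch that computes the numbers a box plot draws. It assumes the common 1.5 × IQR rule for the whiskers (Matplotlib's default `whis=1.5`) and the "inclusive" quartile method from Python's standard library, which is only one of several quartile conventions. The function name `box_plot_stats` and the sample values are made up for illustration.

```python
import statistics


def box_plot_stats(values, whis=1.5):
    """Return the quartiles, whisker ends, and outliers of a sample.

    Hypothetical helper for illustration; `whis` is the IQR multiplier
    that decides how far the whiskers may reach.
    """
    data = sorted(float(v) for v in values)
    # Quartiles with linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up data with one extreme value
stats = box_plot_stats(sample)
print(stats)
```

In this sample the value 100 lies far above the upper fence, so it is reported as an outlier while the upper whisker stops at 9.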
Parameters:
- x : the input data (an array or a sequence of arrays)
- notch : bool (default: False). If True, produces a notched box plot; otherwise, a rectangular box plot is produced.
- vert : bool (default: True). If True, makes the boxes vertical; if False, everything is drawn horizontally.
- positions : the positions of the boxes
- widths : the width of each box
- labels : the label for each data set
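The parameters above can be tried on a small synthetic data set. The sketch below is only an illustration under a few assumptions: the three groups are randomly generated, the output file name is arbitrary, and the tick labels are set through the axis rather than the `labels` keyword, because that keyword was renamed `tick_labels` in Matplotlib 3.9. Newer Matplotlib releases also offer an `orientation` parameter as the replacement for `vert`.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

random.seed(0)
# Three made-up groups with different centers and spreads.
groups = {
    "A": [random.gauss(50, 10) for _ in range(200)],
    "B": [random.gauss(60, 5) for _ in range(200)],
    "C": [random.gauss(40, 15) for _ in range(200)],
}

fig, ax = plt.subplots(figsize=(8, 4))
result = ax.boxplot(
    list(groups.values()),
    notch=True,              # notched boxes instead of rectangular ones
    vert=False,              # draw the boxes horizontally
    positions=[1, 2, 4],     # where each box sits along the axis
    widths=[0.3, 0.5, 0.7],  # one width per box
)
# With horizontal boxes the groups sit on the y-axis, so label them there.
ax.set_yticklabels(list(groups.keys()))
ax.set_xlabel("value")
fig.savefig("boxplot_parameters.png")
```

The call returns a dictionary of the drawn artists (boxes, medians, whiskers, caps, fliers), which is handy for later styling.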
Example:
This example uses the data from Kaggle's House Prices: Advanced Regression Techniques competition (train.csv) for demonstration.
    import pandas as pd
    import matplotlib.pyplot as plt

    # train.csv comes from the Kaggle House Prices data set
    df = pd.read_csv("train.csv")

    fig, ax = plt.subplots(1, 4, figsize=(15, 8))
    fig.suptitle("Box Plot Example", color='r', fontsize=20)

    # Marker style for outliers: magenta-filled diamonds
    magenta_diamond = dict(markerfacecolor='m', marker='D')

    # 1. Basic box plot
    ax[0].boxplot(df['GrLivArea'], widths=0.4)
    ax[0].set_title("Basic Plot", fontsize=15, color='g')
    ax[0].set_xlabel("GrLivArea")

    # 2. Notched box plot
    ax[1].boxplot(df['GrLivArea'], notch=True, widths=0.4)
    ax[1].set_title("Notched Plot", fontsize=15, color='g')
    ax[1].set_xlabel("GrLivArea")

    # 3. Custom outlier symbol
    ax[2].boxplot(df['GrLivArea'], flierprops=magenta_diamond, widths=0.4)
    ax[2].set_title("Change Outlier Symbol", fontsize=15, color='g')
    ax[2].set_xlabel("GrLivArea")

    # 4. Outliers hidden
    ax[3].boxplot(df['GrLivArea'], showfliers=False, widths=0.4)
    ax[3].set_title("Hide Outlier Plot", fontsize=15, color='g')
    ax[3].set_xlabel("GrLivArea")

    plt.show()
This produces the following result:
. . .
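Since box plots are often used to spot outliers, it can be useful to read the flagged points back as numbers instead of only looking at them. The following is a minimal sketch, assuming a single vertical box and a small made-up sample in place of the Kaggle data; it relies on the `"fliers"` entry of the dictionary that `boxplot` returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Made-up sample with one value far from the rest.
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 95]

fig, ax = plt.subplots()
result = ax.boxplot(values)

# Each box has one Line2D holding its outlier markers; for a vertical
# box the y-data of that line are the outlier values themselves.
flier_line = result["fliers"][0]
outliers = sorted(float(v) for v in flier_line.get_ydata())
print(outliers)
```

The same approach should carry over to a DataFrame column such as df['GrLivArea'], with one entry in `result["fliers"]` per box drawn.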