Decision Trees are a non-parametric supervised learning method used for classification and regression. A Decision Tree is a non-linear model built by combining a set of linear boundaries. The goal is to create a model that predicts the value of a target variable by learning simple if-then-else decision rules inferred from the data features. The deeper the tree, the more complex the decision rules and the closer the model fits the training data.
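To make this concrete, a fitted tree is equivalent to a set of nested if-then-else tests on the features. The minimal sketch below uses made-up feature names and thresholds purely for illustration; they are not values learned from any model in this tutorial.

# A fitted decision tree behaves like nested if/else tests on the features.
# The feature names and thresholds below are hypothetical, for illustration only.
def predict(sample):
    if sample['feature_a'] <= 2.5:        # first split (root node)
        return 0                          # leaf node
    elif sample['feature_b'] <= 7.0:      # second split
        return 1                          # leaf node
    else:
        return 0                          # leaf node

print(predict({'feature_a': 4.0, 'feature_b': 5.0}))   # prints 1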
The Scikit-Learn library provides implementations of many machine learning algorithms in Python. It is very useful because we don't have to write the algorithms ourselves, but it is still necessary to understand how the model works.
This tutorial explains how a Decision Tree works, with an example and a visualization of the tree.
For the two-feature data points above, the Decision Tree required only one split to train a two-class classification model. Although this problem is simple, more complex data leads to many more splits.
Let's build a Decision Tree classifier on real-life data and visualize what the tree looks like. The data contain the width and height of a house (in feet), and we need to predict whether the shape of the house is regular or irregular. Here the target variable is binary:
Target :
- 0 – Irregular shape of a house
- 1 – Regular shape of a house
| Width (W) | Height (H) | target |
|-----------|------------|--------|
| 5 | 5 | 1 |
| 4 | 3 | 1 |
| 1 | 5 | 0 |
| 9 | 7 | 1 |
| 2 | 7 | 0 |
| 6 | 5 | 1 |
| 9 | 3 | 0 |
| 8 | 4 | 0 |
Let's create a Decision Tree classifier using the Scikit-Learn library.
import pandas as pd
import graphviz
import numpy as np
from sklearn import tree

model = tree.DecisionTreeClassifier()

# Same data as the table above: h = height, w = width, target = shape (1 regular, 0 irregular)
data = {'h': [5, 3, 5, 7, 5, 7, 3, 4],
        'w': [5, 4, 6, 9, 1, 2, 9, 8],
        'target': [1, 1, 1, 1, 0, 0, 0, 0]}
df = pd.DataFrame(data)
df = df.sample(8)                  # shuffle the rows

train_x = df.drop('target', axis=1)
train_y = df['target']
model.fit(train_x, train_y)
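As a quick sanity check, the learned rules can also be printed as plain text. This is a minimal sketch assuming scikit-learn 0.21 or later, which provides sklearn.tree.export_text; the feature names must match the column order of train_x.

from sklearn.tree import export_text

# Print the fitted tree as indented if/else rules.
print(export_text(model, feature_names=['h', 'w']))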
. . .
Visualizing a Decision Tree
A Decision Tree is an interpretable model. Scikit-Learn provides the export_graphviz function to visualize the tree. First of all, we need to install the Graphviz Python package using the following command:
pip install graphviz
We can export the tree in Graphviz format using the export_graphviz exporter, which supports a variety of parameters to control the look of the tree, such as coloring nodes by their class (or by their value for regression).
# Export as dot file
tree.export_graphviz(model, out_file='tree1.dot',
                     feature_names=['h', 'w'],
                     class_names=['0', '1'],
                     rounded=True, proportion=False,
                     precision=2, filled=True)

# Convert to png using a system command (requires Graphviz)
from subprocess import call
call(['dot', '-Tpng', 'tree1.dot', '-o', 'DTtree1.png', '-Gdpi=600'])

# Display in jupyter notebook
from IPython.display import Image
Image(filename='DTtree1.png')
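Note that the dot command above needs the Graphviz system binaries, not only the Python package. As an alternative sketch using the same fitted model, the graphviz Python package can render the dot source returned by export_graphviz directly (it also relies on the Graphviz binaries being installed):

import graphviz

# Export the tree as a dot string and let the graphviz package render it.
dot_data = tree.export_graphviz(model, out_file=None,
                                feature_names=['h', 'w'],
                                class_names=['0', '1'],
                                filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph.render('DTtree1', format='png')   # writes DTtree1.png
graph                                   # displays the tree inline in a Jupyter notebook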
Fig(a) depicts the tree built by the Decision Tree classifier, and Fig(b) shows the splits of the data points made by the tree.
The tree above shows several parameters. Every node displays five of them, except the leaf nodes, which have no split condition.
Gini : measures the impurity of a node. The weighted average Gini impurity decreases as we move down the tree (a worked example follows this list).
samples : the number of observations in the node.
value : the number of samples in each class. For example, the root node holds 8 samples in total, 4 in each class.
class : the class label of the samples in a leaf node; for a non-leaf node, it is the majority class of its samples.
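As a worked example of the Gini value, the root node holds 8 samples, 4 of each class, so its impurity is 1 - (4/8)^2 - (4/8)^2 = 0.5, the maximum possible for two balanced classes. The short check below recomputes this; the class counts are taken from the root node's value field.

# Recompute the Gini impurity of the root node: gini = 1 - sum(p_k ** 2),
# where p_k is the fraction of samples belonging to class k.
counts = [4, 4]                              # 'value' of the root node
total = sum(counts)
gini = 1 - sum((c / total) ** 2 for c in counts)
print(gini)                                  # 0.5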
. . .
Prediction of a test sample
The prediction for a test sample is made by traversing the tree from the root node to a leaf node, and the class of that leaf is the prediction. Let's predict the test samples, which are marked in blue in the graph below.
# Each test row is [h, w], matching the column order of train_x.
test_data = np.array([[8, 7], [2, 8], [3, 1]])
predict = model.predict(test_data)
for e in range(len(predict)):
    print("{} belongs to class {}".format(test_data[e], predict[e]))
This produces the following result:
[8 7] belongs to class 1
[2 8] belongs to class 0
[3 1] belongs to class 0
Here, class 1 represents a regular-shaped house and class 0 represents an irregular-shaped house. Our Decision Tree classifier has predicted all test samples correctly.
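To make the root-to-leaf traversal explicit, scikit-learn exposes decision_path (the nodes a sample passes through) and apply (the leaf it lands in). Here is a minimal sketch that reuses the model and test_data from above:

# Which nodes does each test sample visit, and in which leaf does it end up?
node_indicator = model.decision_path(test_data)   # sparse (n_samples, n_nodes) matrix
leaf_ids = model.apply(test_data)                 # leaf index for each sample

for i in range(len(test_data)):
    # Non-zero columns of row i are the node ids on the sample's path.
    path = node_indicator.indices[node_indicator.indptr[i]:node_indicator.indptr[i + 1]]
    print("{} visits nodes {} and lands in leaf {}".format(test_data[i], list(path), leaf_ids[i]))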
. . .