[vc_row][vc_column][vc_column_text]K-Nearest Neighbours (KNN) is a simple, easy-to-implement, versatile and widely used algorithm in the Machine Learning field. KNN is a non-parametric supervised Machine Learning algorithm, used for both classification and regression problems. The letter K represents the number of nearest neighbours considered by KNN, and it is one of the core factors of the algorithm. Choosing the best value of K for your data points is a challenging task.
Predictions are made for a new observation by searching through the entire training dataset to find the closest neighbours, so the computational complexity of KNN grows with the size of the training dataset. The distance measure can be the Euclidean, Hamming, Manhattan or Minkowski distance; however, the Euclidean distance is the most commonly used in KNN.
The Euclidean, Manhattan and Minkowski distances are used for continuous variables. To fit a KNN model on categorical variables, the Hamming distance is used.
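As a sketch, these four distance measures can be written in a few lines of NumPy (the function names here are our own, chosen for illustration, not taken from any particular library):

```python
import numpy as np

def euclidean(a, b):
    # square root of the sum of squared differences
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # sum of absolute differences
    return np.sum(np.abs(a - b))

def minkowski(a, b, p=3):
    # generalises Euclidean (p=2) and Manhattan (p=1)
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

def hamming(a, b):
    # number of positions where the categorical values differ
    return np.sum(a != b)

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7.0
```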
. . .
Pseudo Code of KNN
Load the data.
Initialise the value of K, i.e. how many nearest neighbour data points should be considered.
To predict the class label for a new observation, iterate through the training data:

Calculate the distance between the test observation and each row of the training data using a chosen distance measure, such as the Euclidean, Hamming, Manhattan or Minkowski distance.
Sort the calculated distances in ascending order.
Take the top K rows from the sorted list of distances.
Return the most frequent class among these rows as the predicted class.
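The steps above can be sketched as a small Python function using NumPy — a minimal illustration under assumed toy data, not a production implementation:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 1: Euclidean distance from the new observation to every training row
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Steps 2-3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4: most frequent class among the k nearest neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy dataset: two "red" points near the origin, three "purple" points further out
X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y_train = np.array(["red", "red", "purple", "purple", "purple"])
print(knn_predict(X_train, y_train, np.array([5.5, 5.0]), k=3))  # purple
```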

. . .
Example
Let’s plot the training dataset to demonstrate how the KNN algorithm works:
Now we will classify a new observation, shown as a green dot, into either the purple or the red class. Here, we will consider three different values of K = 1, 2 and 3 for prediction using the KNN algorithm.[/vc_column_text][/vc_column][/vc_row][vc_row equal_height=”yes”][vc_column width=”1/3″ css=”.vc_custom_1569404234592{paddingright: 0px !important;paddingleft: 0px !important;}”][vc_single_image image=”1532″ img_size=”full” alignment=”center”][/vc_column][vc_column width=”1/3″ css=”.vc_custom_1569404243076{paddingright: 0px !important;paddingleft: 0px !important;}”][vc_single_image image=”1533″ img_size=”full” alignment=”center”][/vc_column][vc_column width=”1/3″ css=”.vc_custom_1569404253048{paddingright: 0px !important;paddingleft: 0px !important;}”][vc_single_image image=”1534″ img_size=”full” alignment=”center”][/vc_column][/vc_row][vc_row][vc_column][vc_column_text]
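As a rough illustration of how the choice of K can change the prediction, here is a small Python sketch. The coordinates below are hypothetical stand-ins, not the actual points plotted in the figures:

```python
from collections import Counter
import math

# Hypothetical 2-D points standing in for the purple and red classes
# (the actual plotted coordinates are not given in the text)
train = [((3.0, 3.0), "purple"), ((5.0, 1.0), "purple"),
         ((3.0, 4.2), "red"), ((2.3, 3.5), "red"), ((1.0, 5.0), "red")]
green_dot = (3.0, 3.5)  # the new observation

# Sort all training points by distance to the green dot once,
# then vote among the first K entries for each value of K
dists = sorted((math.dist(p, green_dot), label) for p, label in train)
for k in (1, 2, 3):
    votes = Counter(label for _, label in dists[:k])
    print(f"K={k}: {votes.most_common(1)[0][0]}")
```

With these points, the nearest single neighbour is purple, but two of the three nearest are red, so the predicted class flips as K grows from 1 to 3 — which is why the choice of K matters.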
. . .
Pros and Cons of KNN
Pros:
Very easy to implement.
No separate training phase is required.
Works well with non-linear data.
Cons:
Prediction can be slow when the training set is large.
Computationally expensive, because it stores all the training samples.
Sensitive to the chosen value of the parameter K.
. . .
[/vc_column_text][/vc_column][/vc_row]