A Beginner's Guide to k Nearest Neighbour Classification

Charlie | July 5th 2018

We perform classification every second that our eyes are open. Whenever we look at something and decide what it is, that is classification. For example, we might look at a chair and notice it has certain features: four legs, a surface to sit on and a back to rest on. First, our brain extracts features from the information sent from our eyes. Next, it looks at these features and decides what the object is. Over the course of our lives we have come to learn what a chair looks like and what makes a chair different to a table. This sounds simple, but historically it is something that machines have found very hard to do accurately.

This article describes how one algorithm, k nearest neighbour (kNN), classifies data. Much like our own brain classifying objects, kNN uses features to perform classification. The features act as coordinates that place every data point in the dataset, so that the machine can measure which points are close to one another. To predict the class of a data point, kNN inspects the classes of its nearest neighbours and adopts the class held by the majority of them.
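To make "close to one another" concrete, here is a minimal sketch of the neighbour search. It assumes straight-line (Euclidean) distance, which is a common choice but not the only one. The function names and the sample points are invented for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points with the same number of features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbours(points, query, k):
    """Return the k points that lie closest to the query point."""
    return sorted(points, key=lambda p: euclidean(p, query))[:k]

# Hypothetical data: each point is described by two features.
points = [(1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 5.0)]

print(nearest_neighbours(points, (1.2, 1.2), k=2))
# [(1.0, 1.0), (2.0, 2.0)]
```

Sorting every point by distance is the simplest approach and is fine for small datasets. Larger datasets usually need a smarter search structure.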

A simple example is classifying a person as male or female based on height and weight. The height and weight are the features of this dataset. For a chosen value of k, kNN looks at the height and weight of a data point and finds the k data points nearest to it. Next, it counts the genders of these points and assigns the majority gender to the data point in question. This is demonstrated in the picture above, where the blue squares are females and the red triangles are males. The dark circle contains the 3 nearest neighbours, 2 male and 1 female, so the class given by kNN where k is 3 would be male. However, if you extend the circle to the 5 nearest neighbours, there are 3 females and still 2 males, so the kNN classification where k is 5 is female.
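The height and weight example can be sketched in a few lines of Python. The measurements below are made up, chosen only so that the votes come out 2 to 1 for male at k = 3 and 3 to 2 for female at k = 5, as described above. The function name knn_predict is also just an illustration, and the features are used unscaled (centimetres and kilograms), which a real project would usually want to revisit.

```python
import math
from collections import Counter

def knn_predict(training, query, k):
    """Classify `query` by a majority vote among its k nearest training points."""
    def distance(row):
        (height, weight), _label = row
        return math.hypot(height - query[0], weight - query[1])

    nearest = sorted(training, key=distance)[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: ((height in cm, weight in kg), gender)
training = [
    ((171, 66), "male"),
    ((172, 67), "male"),
    ((180, 80), "male"),
    ((183, 85), "male"),
    ((168, 64), "female"),
    ((167, 63), "female"),
    ((166, 66), "female"),
    ((160, 55), "female"),
]

query = (170, 65)
print(knn_predict(training, query, k=3))  # male
print(knn_predict(training, query, k=5))  # female
```

The same point gets a different class depending on k, which is why the choice of k matters. With two classes, an odd k avoids tied votes.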

kNN is a very simple but powerful algorithm, and companies are using it more and more. Stay tuned for more articles focused on data science. Empower yourself by understanding the algorithms that you are starting to engage with on a daily basis.

Read this post for a more general beginner's guide to data science. If you think you're an expert, read this data scientist's guide to Docker.