11 unsupervised learning
Machine Learning
♡ supervised learning 👉 ♡ unsupervised learning ♡ reinforcement learning
unsupervised learning is where your program has to find how the data relates to each other. there is no prior training
types of unsupervised learning
♠️ clustering ♠️ association
🎲 clustering
the k means (the distance-based) algorithm is used to try to classify the data into clusters. it will find k clusters (k is the number the user provides). the centroid defines a cluster.
🔎 centroid is a point at the center of the cluster
🎲 the how
first the centroids are randomly assigned. next each point in our dataset are assigned to a cluster. this is done by measuring nearness to the centroids. next the centroid locations are themselves updated by taking the mean value of points in the cluster
🎲 the SSE
how good was our clustering? by using the Sum of Squared Error. it is the sum of the difference between a point and the mean point in the cluster. a lower SSE means that points are closer to their centroids meaning you’ve got it right. we can split clusters with higher SSE into two clusters
🎲 bisecting k means
one way to increase our clustering is to use the bisecting k means method. we choose rhe cluster with the largest SSE, split and repeat until you get the required number of clusters
⚽️ exercise
see an implementation in python