Nathaniel Dake Blog

1. Introduction

In the real world, you may not always have access to the optimal answer; in fact, there may not be an optimal or correct answer at all. In that case, you would want the robot to be able to explore things on its own, and learn just by looking for patterns in the data.

There are two main types of unsupervised learning that we are going to talk about. The first is clustering. Here, instead of training on labels, we try to create our own labels by grouping together data that looks alike. We will cover two methods of clustering: K-means clustering and hierarchical clustering.
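To make the clustering idea concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in numpy. The data, function names, and seed are illustrative choices for this example, not something prescribed by the post:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs: the algorithm recovers the grouping with no labels.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

Note that the algorithm never sees which blob a point came from; the grouping emerges purely from the geometry of the data.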

Next, because in machine learning we like to talk about probability distributions, we will go into Gaussian Mixture Models and Kernel Density Estimation. This is where we talk about how to learn the probability distribution of a set of data.
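As a quick preview of what a Gaussian Mixture Model is, here is a sketch of its generative story: first pick a component according to the mixing weights, then draw from that component's Gaussian. The specific weights, means, and standard deviations below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-component 1-D Gaussian mixture.
weights = np.array([0.3, 0.7])   # mixing coefficients; must sum to 1
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

# Latent step: which component generated each sample (hidden in real data).
components = rng.choice(2, size=10_000, p=weights)
# Observed step: draw from the chosen component's Gaussian.
samples = rng.normal(means[components], stds[components])
```

In practice we only observe `samples`; fitting a GMM means inferring the weights, means, and variances (and the hidden `components`) from the samples alone.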

One interesting fact is that under certain conditions, Gaussian Mixture Models and K-Means Clustering are exactly the same. We will prove that this is the case.



2. What is Unsupervised Learning used for?


2.1 Learning the structure or probability distribution of the data


Density Estimation
The first thing that we can go over that unsupervised learning is used for is density estimation. This is an entire area of study in statistics. We already know that the probability density function tells us the relative likelihood of a random variable taking on a given value. Density estimation is the process of taking samples of the random variable and figuring out the probability density function from those samples. Once you learn the distribution of the variable, you can generate new samples of the variable using that distribution. For example, you could learn the distribution of a Shakespeare play, and then generate text that looks like Shakespeare.
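One classic density estimation technique, which comes up again later in the post, is kernel density estimation. A minimal sketch (the bandwidth `h` and the toy data are illustrative choices): place a small Gaussian bump on every sample and average them to get an estimate of the unknown density.

```python
import numpy as np

def gaussian_kde(samples, h):
    """Kernel density estimate: average a Gaussian bump centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    def pdf(x):
        # Evaluate every kernel at every query point, then average.
        z = (np.asarray(x, dtype=float)[..., None] - samples) / h
        return np.exp(-0.5 * z**2).sum(axis=-1) / (len(samples) * h * np.sqrt(2 * np.pi))
    return pdf

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)   # draws from an "unknown" N(2, 1)
pdf = gaussian_kde(data, h=0.3)                   # h is the kernel bandwidth

# Generating new samples from the learned density: pick a data point, add kernel noise.
new = rng.choice(data, size=100) + rng.normal(0.0, 0.3, size=100)
```

The estimated `pdf` peaks near the true mean of 2, and the last line shows the generative use described above: once the density is learned, drawing from it produces data that looks like the original.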


Latent Variables
Another example is latent variables. Often, we want to know about hidden or underlying causes of the data we are seeing. These can be thought of as latent, missing, or hidden variables. As an example, say you are given a set of documents, but you aren't told what they are. You could do clustering on them and discover that there are a few distinct groups in that set of documents. Then, when you actually read some of the documents, you can see that one group is romance novels, another is children's books, and so on. Often, the data set is so large that it is infeasible to look at the entire dataset yourself, so you need some way of summarizing the data like this. One example of this is topic modeling. Here the latent variable is the topic, and the observed variables are the words.
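The document example above can be sketched in a few lines. This toy corpus, stopword list, and similarity measure are all illustrative; the point is only that documents sharing a latent topic end up measurably more similar than documents that don't, without any labels being provided:

```python
import numpy as np
from collections import Counter

# Four unlabelled documents; two latent topics (romance, children's) hide inside.
docs = [
    "the prince kissed her beneath the moonlit sky",
    "her heart ached with longing for the prince",
    "the little bunny hopped over the rainbow",
    "the bunny and the puppy played in the garden",
]

stop = {"the", "a", "and", "of", "in", "her", "his", "with", "for", "over"}

def tokenize(doc):
    return [w for w in doc.split() if w not in stop]

# Bag-of-words count vectors over the shared vocabulary.
vocab = sorted({w for d in docs for w in tokenize(d)})
X = np.array([[Counter(tokenize(d))[w] for w in vocab] for d in docs], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

Here `cosine(X[0], X[1])` (two romance documents) is positive while `cosine(X[0], X[2])` (romance vs. children's) is zero, so a clustering algorithm run on these vectors would recover the two hidden topics.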

2.2 Dimensionality Reduction

Methods such as Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are commonly used for this: they find a lower-dimensional representation of the data that preserves as much of its variance as possible.
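A minimal sketch of PCA via SVD, on synthetic data that mostly varies along a single direction in 3-D (the data and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points in 3-D that mostly vary along one direction, plus small noise.
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + rng.normal(scale=0.05, size=(200, 3))

# PCA: centre the data, then take the SVD; the rows of Vt are the
# principal components, ordered by how much variance they explain.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()   # fraction of total variance per component

# Dimensionality reduction: project onto the first principal component.
Z = Xc @ Vt[:1].T
```

Since the data was built to lie almost entirely along one direction, the first component captures nearly all the variance, and the 3-D dataset compresses to one dimension with little loss.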


© 2018 Nathaniel Dake