Machine Learning

4. Unsupervised Learning Cluster Analysis

1. Introduction
   In the real world, you may not always have access to the optimal answer, or there may not be an optimal or correct answer at all. In that case, you would want the robot to be able to explore things on its own and learn just by looking for patterns.

2. K-Means Clustering
   We know that we are going to perform K-Means Clustering on data. So let's take a moment to visualize the kind of data we may get.
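
   As a quick preview, here is a minimal sketch of what that looks like in code, assuming scikit-learn's KMeans and synthetic blob data (a stand-in, not the data from the post):

   ```python
   import numpy as np
   from sklearn.datasets import make_blobs
   from sklearn.cluster import KMeans

   # Synthetic stand-in for the kind of data we may get: 3 blobs in 2-D
   X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

   # Fit K-Means with K=3 and inspect the learned centroids and assignments
   kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
   labels = kmeans.fit_predict(X)
   print(kmeans.cluster_centers_)  # (3, 2) array of centroids
   print(labels[:10])              # cluster assignment of the first 10 points
   ```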

3. Hierarchical Clustering
   We are now going to talk about a different technique for building clusters, known as Agglomerative Clustering. If you have ever studied algorithms, you will recognize this as a greedy algorithm: we are going to be purposely short-sighted and make what appears to be the best decision at each step.
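
   To make the greedy merging concrete, here is a hedged sketch using SciPy's hierarchical clustering routines (an assumption on my part; the post itself may build the algorithm from scratch):

   ```python
   import numpy as np
   from scipy.cluster.hierarchy import fcluster, linkage

   # Two well-separated synthetic groups of points
   rng = np.random.default_rng(0)
   X = np.vstack([rng.normal(0, 0.5, size=(20, 2)),   # points near (0, 0)
                  rng.normal(5, 0.5, size=(20, 2))])  # points near (5, 5)

   # Agglomerative step: greedily merge the closest pair of clusters,
   # here using Ward's criterion for "closest"
   Z = linkage(X, method="ward")

   # Cut the resulting merge tree so that exactly 2 flat clusters remain
   labels = fcluster(Z, t=2, criterion="maxclust")
   print(labels)  # 40 labels; the first 20 and last 20 should differ
   ```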

4. Gaussian Mixture Models
   Gaussian Mixture Models are a form of density estimation. They give us an approximation of the probability distribution of our data. We want to use Gaussian Mixture Models when we notice that our data is multimodal (meaning there are multiple modes, or bumps). From probability, we can recall that the mode is just the most common value. For instance, a multimodal distribution can be seen below:
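
   A minimal sketch of fitting a mixture to bimodal data, assuming scikit-learn's GaussianMixture (the data below is synthetic, with bumps near -4 and +4):

   ```python
   import numpy as np
   from sklearn.mixture import GaussianMixture

   # Bimodal 1-D data: two bumps, one near -4 and one near +4
   rng = np.random.default_rng(1)
   x = np.concatenate([rng.normal(-4, 1, 500),
                       rng.normal(4, 1, 500)]).reshape(-1, 1)

   # Fit a 2-component mixture and evaluate the estimated density
   gmm = GaussianMixture(n_components=2, random_state=1).fit(x)
   grid = np.linspace(-8, 8, 9).reshape(-1, 1)
   density = np.exp(gmm.score_samples(grid))  # score_samples returns log p(x)
   print(gmm.means_.ravel())  # component means should land near -4 and +4
   print(density.round(3))
   ```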

5. Customer Segmentation
   The vast majority of my articles are written with the intention of highlighting some mathematical or computer science concept and tying it into a real world example. In other words, we are operating as follows:

5. Hidden Markov Models

1. Hidden Markov Models Introduction
   This post is going to cover Hidden Markov Models, which are used for modeling sequences of data. Sequences appear everywhere, from stock prices to language, credit scoring, and webpage visits.

2. Markov Models and The Markov Property
   What is the Markov property?
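
   Informally, the Markov property says that the next state depends only on the current state, not on the full history. In symbols, for a state sequence $x_1, \dots, x_T$:

   $$p(x_t \mid x_{t-1}, x_{t-2}, \dots, x_1) = p(x_t \mid x_{t-1})$$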

3. Markov Models Example Problems
   We will now look at a model that examines our state of being healthy vs. being sick. Keep in mind that this is very much like something you could do in real life: if you wanted to model a certain situation or environment, you could take some data that you have gathered, build a maximum likelihood model on it, and do things like study the properties that emerge from the model, make predictions from it, or generate the next most likely state.
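
   As a sketch of that maximum likelihood step, here is how a transition matrix can be estimated from a healthy/sick sequence; the sequence below is made up for illustration:

   ```python
   import numpy as np

   # Hypothetical observed sequence of states: 0 = healthy, 1 = sick
   seq = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]

   # Maximum likelihood estimate of the transition matrix: count the
   # observed transitions, then normalize each row so it sums to 1
   counts = np.zeros((2, 2))
   for s, s_next in zip(seq[:-1], seq[1:]):
       counts[s, s_next] += 1
   A = counts / counts.sum(axis=1, keepdims=True)
   print(A)  # A[i, j] = estimated p(next state = j | current state = i)
   ```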

4. From Markov Models to Hidden Markov Models
   We are now going to extend the basic idea of Markov models to Hidden Markov Models. We have talked about latent variables before, and they will be a very important concept as we move forward. They show up in K-Means clustering, Gaussian Mixture Models, principal component analysis, and many other areas. With Hidden Markov Models, the concept even shows up in the name, so you know that hidden (latent) variables are central to this model.

5. Hidden Markov Model Calculations
   This appendix serves as an accompaniment to Hidden Markov Models with discrete observations. We will go over calculations concerning:
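
   As one example of the kind of calculation involved, here is a minimal NumPy sketch of the forward algorithm, which computes the probability of an observation sequence; the model parameters below are hypothetical:

   ```python
   import numpy as np

   # Hypothetical 2-state HMM with 2 possible discrete observation symbols
   pi = np.array([0.6, 0.4])          # initial state distribution
   A = np.array([[0.7, 0.3],          # state transition matrix
                 [0.4, 0.6]])
   B = np.array([[0.9, 0.1],          # B[i, k] = p(symbol k | state i)
                 [0.2, 0.8]])
   obs = [0, 1, 1]                    # an observed symbol sequence

   # Forward algorithm: alpha[i] accumulates p(obs so far, state = i)
   alpha = pi * B[:, obs[0]]
   for o in obs[1:]:
       alpha = (alpha @ A) * B[:, o]
   print(alpha.sum())                 # p(obs) under the model
   ```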

6. Hidden Markov Models with Theano and TensorFlow
   In the last section we went over the training and prediction procedures of Hidden Markov Models. This was all done using only vanilla NumPy and the Expectation-Maximization algorithm. I now want to introduce how both Theano and TensorFlow can be utilized to accomplish the same goal, albeit by a very different process.

7. HMMs with Continuous Observations
   At this point we are ready to look at the application of Hidden Markov Models to continuous observations. All that is meant by continuous observations is that what you observe is a number on a scale, rather than a symbol such as heads or tails, or words. This is an interesting topic in and of itself, because it allows us to think about:

8. HMM Applications
   We have now covered the vast majority of the theory related to HMMs. We have seen how they model the probability of a sequence, and how they can handle sequences that involve latent states. Additionally, we saw how they can be extended to deal with continuous observations by incorporating a Gaussian Mixture Model in place of the $B$ emission matrix. I want to take a post to go over a few different real world applications of HMMs and leave you with a few ideas of what else is possible.
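
   To make that replacement of $B$ concrete, here is a hedged sketch where each state emits from a single Gaussian (a one-component mixture), using scipy.stats.norm; all parameters below are hypothetical:

   ```python
   import numpy as np
   from scipy.stats import norm

   # Hypothetical 2-state HMM where each state emits a real number:
   # the discrete B matrix is replaced by one density per state
   means, stds = np.array([0.0, 5.0]), np.array([1.0, 2.0])

   def emission_probs(x):
       """Replace the B[:, symbol] column lookup with Gaussian densities."""
       return norm.pdf(x, loc=means, scale=stds)

   pi = np.array([0.5, 0.5])
   A = np.array([[0.8, 0.2], [0.3, 0.7]])
   obs = [0.1, 4.8, 5.3]                 # continuous observations

   alpha = pi * emission_probs(obs[0])   # same forward recursion as before
   for x in obs[1:]:
       alpha = (alpha @ A) * emission_probs(x)
   print(alpha.sum())                    # likelihood of the sequence
   ```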

6. Ensemble Methods

1. Bias-Variance Trade-Off
   As we get started with ensemble methods, we will begin by looking at the bias-variance trade-off. There are three key terms in particular that we are going to look at:
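
   As a preview of how those terms fit together, one standard way to write the decomposition of expected squared error, for a model $\hat{f}$ approximating a ground truth $f$ with noise variance $\sigma^2$, is:

   $$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible error}}$$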

2. Bias-Variance Regression Demo
   Let's take a moment to start from the beginning. In any data generating process, we have what is called a ground truth function, which we will call:

3. Bootstrap Estimation
   We previously looked at the bias-variance trade-off, and if you were thinking critically you may have wondered: "Could it be possible in some way to lower bias and variance simultaneously?"
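
   Here is a minimal NumPy sketch of the core bootstrap idea: resample with replacement and recompute the statistic each time (the data below is made up):

   ```python
   import numpy as np

   rng = np.random.default_rng(2)
   data = rng.normal(loc=10, scale=3, size=200)   # hypothetical sample

   # Bootstrap: resample with replacement, recompute the statistic each time
   boot_means = np.array([
       rng.choice(data, size=len(data), replace=True).mean()
       for _ in range(5000)
   ])
   print(boot_means.mean())  # bootstrap estimate of the mean
   print(boot_means.std())   # bootstrap estimate of its standard error
   ```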

4. Random Forest Algorithm
   We are now going to turn to the random forest algorithm, which builds on earlier concepts we have gone over.
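
   A hedged sketch of the algorithm in practice, assuming scikit-learn's RandomForestClassifier and a synthetic dataset:

   ```python
   from sklearn.datasets import make_classification
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.model_selection import train_test_split

   # Synthetic classification problem
   X, y = make_classification(n_samples=500, n_features=10, random_state=0)
   X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

   # A forest of 100 trees, each trained on a bootstrap sample of the data
   # and considering a random subset of features at each split
   forest = RandomForestClassifier(n_estimators=100, random_state=0)
   forest.fit(X_train, y_train)
   print(forest.score(X_test, y_test))  # test accuracy
   ```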

5. AdaBoost Algorithm
   We are now going to talk about boosting, and introduce a realization of the boosting idea: the AdaBoost algorithm. It is still one of the most powerful ensemble methods in existence, and like the random forest it is considered a good off-the-shelf, plug-and-play model.
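
   A minimal sketch, assuming scikit-learn's AdaBoostClassifier (which boosts depth-1 decision trees by default) on synthetic data:

   ```python
   from sklearn.datasets import make_classification
   from sklearn.ensemble import AdaBoostClassifier
   from sklearn.model_selection import train_test_split

   # Synthetic binary classification problem
   X, y = make_classification(n_samples=500, n_features=10, random_state=0)
   X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

   # Sequentially fit weak learners, reweighting the training points so that
   # each new learner focuses on the mistakes of the ensemble so far
   boost = AdaBoostClassifier(n_estimators=100, random_state=0)
   boost.fit(X_train, y_train)
   print(boost.score(X_test, y_test))  # test accuracy
   ```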

6. Summary
   We started off this series talking about the bias-variance trade-off. We showed that the error of any classification or regression model is a combination of bias, variance, and irreducible error. We then demonstrated that irreducible error can't be reduced, but bias and variance can! In the ideal situation both bias and variance are low. The main dilemma we saw was that as we decrease one, the other tends to increase. So, the idea is to find a happy medium where we are optimizing the test error. We learned that ensemble methods are a way to make the trade-off less of a trade-off (i.e. attain low bias and low variance)!

7. Dimensionality Reduction

1. Dimensionality Reduction & Principal Component Analysis
   Principal component analysis is an incredibly powerful technique that all data scientists should be aware of in the fight against the curse of dimensionality. Before we dig into the mechanics of it, however, we need to define what the curse of dimensionality is to begin with.
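
   As a preview, a minimal sketch of PCA as dimensionality reduction, assuming scikit-learn and synthetic high-dimensional data whose variance is concentrated in a few directions:

   ```python
   import numpy as np
   from sklearn.decomposition import PCA

   # 100 points in 50 dimensions, with variance concentrated in 5 directions
   rng = np.random.default_rng(3)
   X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 50))

   # Project onto the top 2 principal components
   pca = PCA(n_components=2)
   X_reduced = pca.fit_transform(X)
   print(X_reduced.shape)                # (100, 2)
   print(pca.explained_variance_ratio_)  # variance captured per component
   ```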

8. Bayesian Machine Learning

1. Bayesian Inference
   So, you want to know about Bayesian techniques and how they are utilized in Machine Learning? Maybe you have heard about the Causal Revolution and want to get a better understanding of the role these techniques played in getting there? Or, you have determined that your problem would be best solved with Bayesian A/B testing? Whatever your reasoning, you have come to the right place. However, before we dissect the techniques listed above, and many others, we need to determine two things:
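
   Whatever the application, everything here rests on Bayes' theorem, which for a hypothesis $H$ and observed data $D$ reads:

   $$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$

   that is, posterior $\propto$ likelihood $\times$ prior.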

2. Bayesian A/B Testing
   In this post we are going to discuss Bayesian Methods and their application to Bayesian A/B testing. In particular we will:
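
   As a quick preview of the machinery involved, here is a hedged sketch of a Beta-Bernoulli A/B comparison in NumPy; the click counts are made up, and the Beta(1, 1) prior is an assumption:

   ```python
   import numpy as np

   rng = np.random.default_rng(4)

   # Hypothetical results: clicks / impressions for variants A and B
   clicks_a, n_a = 45, 1000
   clicks_b, n_b = 60, 1000

   # With a Beta(1, 1) prior, the posterior over each conversion rate is
   # Beta(1 + clicks, 1 + misses); sample from both posteriors and compare
   post_a = rng.beta(1 + clicks_a, 1 + n_a - clicks_a, size=100_000)
   post_b = rng.beta(1 + clicks_b, 1 + n_b - clicks_b, size=100_000)
   print((post_b > post_a).mean())  # estimated P(rate_B > rate_A | data)
   ```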

3. Bayesian Classifiers
   This is an article that I have been excited to write for quite some time! While the model and classification techniques are nothing new, Naive Bayes (and its Bayes Net generalizations) builds upon all of the intuitions that I have laid out in past posts on probability theory, classic frequentist statistical inference, and Bayesian inference. I highly recommend that you take some time to read them over if your background in statistics, probability, and Bayesian inference is rusty (in particular, I will be assuming the reader is comfortable with Bayesian inference).

9. Time Series Forecasting

1. Time Series in Pandas
   If you have been following along with my posts, you may have realized that something I hadn't spent a lot of time dealing with was time series and subsequent forecasting. I have dealt with sequences (both via Recurrent Neural Networks and Markov Models), but given the vast amount of time series data that you can encounter in industry, this post is long overdue.
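
   A minimal sketch of the pandas tools this post leans on (date-aware indexing, resampling, rolling windows); exact method spellings can vary slightly across pandas versions:

   ```python
   import numpy as np
   import pandas as pd

   # A year of synthetic daily observations on a DatetimeIndex
   idx = pd.date_range("2018-01-01", periods=365, freq="D")
   ts = pd.Series(np.random.default_rng(5).normal(size=365).cumsum(), index=idx)

   # Date-aware slicing, resampling, and rolling windows
   print(ts.loc["2018-03"].head())            # all of March 2018
   print(ts.resample("M").mean().head())      # downsample to monthly means
   print(ts.rolling(window=7).mean().tail())  # 7-day moving average
   ```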

2. Time Series Analysis
   I want us to now move on to learning about the main tool we will use in time series forecasting: the Statsmodels library. Statsmodels can be thought of as follows:
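
   As a preview, here is a hedged sketch of fitting a simple model with Statsmodels; this assumes the modern `statsmodels.tsa.arima.model.ARIMA` interface and a synthetic AR(1) series:

   ```python
   import numpy as np
   import pandas as pd
   from statsmodels.tsa.arima.model import ARIMA

   # Synthetic series: an AR(1) process with coefficient 0.8
   rng = np.random.default_rng(6)
   y = np.zeros(200)
   for t in range(1, 200):
       y[t] = 0.8 * y[t - 1] + rng.normal()
   series = pd.Series(y, index=pd.date_range("2018-01-01", periods=200, freq="D"))

   # Fit an ARIMA(1, 0, 0) model and forecast the next 5 steps
   model = ARIMA(series, order=(1, 0, 0))
   result = model.fit()
   print(result.params)            # estimated AR coefficient should be near 0.8
   print(result.forecast(steps=5)) # 5-step-ahead forecast
   ```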


© 2018 Nathaniel Dake