Bio
Welcome! If you have made it to this blog and are at all interested in the background and motives of the writer, read on. Otherwise, make yourself a fresh cup of coffee, get comfortable, and dig into a topic of your choice.

With that said, my name is Nathaniel Dake and I am the creator of all of the content on this site. I am a lover of knowledge, deep understanding, communication of information, and solving complex problems. Every post on this blog is an attempt of mine to simultaneously do three things:

  1. Gain insight into how the mind solves problems (particularly those related to mathematics and computer science), where it gets stuck, and how it can be improved.
  2. Solidify these insights via writing, experimentation, visualizations, equations, and so on: whatever is necessary to ensure a clear, lucid understanding.
  3. Share these insights with others. It is one thing to teach yourself, but another to teach others, each with their own idiosyncratic learning styles and cognitive strengths and weaknesses.

I do this because I find it extremely valuable. In the modern world, the ability to think and reason clearly, make use of different branches of mathematics and computer science, and communicate with others is incredibly powerful. In order to solve some of the most pressing issues of our time, all of the above will prove critical. That said, I truly enjoy teaching and sharing intuitions and understandings (the different lenses through which I view the world), so this blog is certainly a bit of a guilty pleasure.

As far as my professional background is concerned, I have worked in a variety of roles, ranging from Data Scientist/ML Engineer to Data/Software Engineer. The titles may vary, but on the whole I reside at the intersection of applied mathematics, software engineering, and large-scale data systems. For example, I have:

  • Built machine learning models and deployed them to production, user-facing environments.
  • Architected a serverless ETL process that handled the ingestion of over 50 GB of data per day into optimized data stores.
  • Designed data transformation algorithms to ensure that model inputs were as clean and noise-free as possible, enabling robust predictions.
  • Optimized compute-heavy processes to reduce run times and increase performance (reducing server costs and increasing developer iterations per day).

There isn't a day that goes by that I am not pushed out of my comfort zone, learning, growing, and becoming a better problem solver and teammate.

Prior to working in industry I was a Computer Engineering Research Assistant at the University of Florida, which was preceded by my time at Northeastern University, where I received my BS in Mechanical Engineering and Physics. If you want to know more, you can reach out via my LinkedIn!


Popular Posts
To get an idea of how I think about problems and communicate information relating to complex subjects, check out some of my more popular posts.


What can I find in this blog?
During my time studying Data Science and Machine Learning, Software Development, Computer Science, Physics, and Mechanical Engineering, I have learned a lot about how I learn best. It is clear at this point that for a beginner, jumping right into a textbook is rarely the best route to follow. The concepts, and more importantly the language and terminology, will seem very difficult to comprehend, and will most likely leave you discouraged, feeling that the material is simply outside of your grasp.

This is especially true in the fields of Data Science, Applied Mathematics, and Machine Learning, where several technical disciplines intertwine:

  • Statistics
  • Probability
  • Computer Science
  • Calculus
  • Linear Algebra
  • Information Theory

Recently, in an attempt to prevent discouragement and increase practical understanding, there has been a huge push toward the top-down approach. School systems generally teach via a bottom-up approach, giving students the small building blocks that can be combined in the end to create a grand system. This leaves the learner wanting more, often struggling to connect the dots: why is this small building block useful, and why should I care?

The top-down approach throws you right into the deep end, allowing you to work with premade algorithms and libraries without fully understanding the math and intuitions behind the overall system. This, in my opinion, is a much better approach, but it still leaves much to be desired. There are several drawbacks, but I will highlight two:

  1. Without understanding the mathematics and computer science behind the techniques you are using, as well as knowing what you don't know, not only will you be unable to find an optimal solution (which is acceptable in some cases), but you will occasionally be very far off the mark, leading to large errors and poor outcomes.
  2. From the perspective of the data scientist, learning new libraries, tools, and techniques is a never-ending journey. If you do not have a good understanding of the fundamental building blocks, you will be caught up in the Red Queen Effect, forever needing to go faster simply to stay in place.

To prevent the above, I feel that balance is key. The goal of teaching should be to use real-world examples to explain the mechanics of what is going on under the hood. Too often, concepts are treated as black boxes, and rote memorization is used to get through. That may have worked for a time, but in the fields of Data Science and Machine Learning it will not suffice. The world is not one size fits all; it is messy, chaotic, and unclear. That is my job as a Data Scientist: to bring clarity to a problem and help find a resolution.

So, with that said, the central theme of this blog is:

To gain deep insight into our cognitive problem-solving processes, identify sticking points, and build a latticework of mathematical concepts and intuitions, connected through real-world examples.

Anyone can follow a basic process of predetermined steps and arrive at a solution. But we want to build intuitions about what is actually going on, so that if a situation breaks from its cookie-cutter form we can take it in stride and still make sense of it based on fundamental principles.


Content
The general content of this blog is arranged as follows:

  • Mathematics
    This section may very well be the most valuable on the blog. It contains incredibly rich posts, getting to the core of what mathematics is, how it can be used, and why it is so powerful. If you fully understand the posts in this section, you will have a much easier time learning new skills in the world of AI.

  • Machine Learning
    This section is by far the broadest, consisting of Bayesian Techniques, Decision Trees, Probabilistic Graphical Models, Ensemble Methods, Unsupervised Learning, Hidden Markov Models, and Principal Component Analysis. Each post is a great way to get a good overview of its respective topic.

  • Deep Learning
    This section contains everything from the fundamentals of Feedforward Networks and Vanilla Backpropagation, to modern deep learning techniques such as adaptive learning rates and batch normalization, and finally Recurrent Neural Networks and their application to Natural Language Processing problems. Convolutional Neural Networks coming soon.

  • Artificial Intelligence
    Focusing mainly on Reinforcement Learning, specifically the explore-exploit dilemma (and Bayesian techniques), Markov Decision Processes, Dynamic Programming, Temporal Difference Learning, Q-Learning, and Approximation Methods. Reinforcement learning with deep learning techniques coming soon.

  • Natural Language Processing
    This section covers the basics of how mathematical techniques can be applied to text. It was an incredibly powerful moment when I first saw how words could be placed within a mathematical framework and subsequently used to solve real problems. It is closely related to the Recurrent Neural Networks content in the Deep Learning section and the HMMs in the Machine Learning section.


An Individual Post
I should mention that each individual post (which is built from a Jupyter notebook) contains what I personally feel is crucial to understanding a given topic, or my own cognitive processes and biases. Code samples are always written in Python and are mixed with custom visualizations, equations, pseudocode, and whatever else is needed to ensure a clear and effective transfer of knowledge.


© 2018 Nathaniel Dake