As we progress in our understanding of the math surrounding machine learning, AI, and data science, there is a host of linear algebra concepts that we are forced to reckon with. From PCA and its use of eigenvalues and eigenvectors, to neural networks' reliance on linear combinations and matrix multiplication, the list goes on and on. Having a very solid grasp of linear algebra is crucial to understanding how and why these algorithms work.
This notebook in particular is going to focus on the connection between the following:
These concepts are incredibly prevalent and linked to each other in beautiful ways; however, this link is generally missing from the way linear algebra is taught, particularly when studying machine learning. Before moving on, I recommend reviewing my notebook concerning vectors.
If you go to wikipedia, you can find the following definition regarding a linear combination:
A linear combination is an expression constructed from a set of terms by multiplying each term by a constant and adding the results. For example, a linear combination of $x$ and $y$ would be any expression of the form $ax + by$, where a and b are constants.
Now, this can be defined slightly more formally in terms of vectors as follows:
If $v_1,...,v_n$ is a set of vectors, and $a_1,...,a_n$ is a set of scalars, then their linear combination would take the form:
$$a_1\vec{v_1} + a_2\vec{v_2}+...+a_n\vec{v_n}$$
Where it should be noted that all of the $\vec{v}$'s are vectors. Hence, it can be expanded as:
$$a_1\begin{bmatrix}
v_1^1 \\
v_1^2 \\
\vdots \\
v_1^m
\end{bmatrix}
+
a_2\begin{bmatrix}
v_2^1 \\
v_2^2 \\
\vdots \\
v_2^m
\end{bmatrix}
+ \dots +
a_n\begin{bmatrix}
v_n^1 \\
v_n^2 \\
\vdots \\
v_n^m
\end{bmatrix}
$$
where for generality we have defined each $\vec{v_i}$ to be an $m$-dimensional vector. Notice that the final result is a single $m$-dimensional vector. So, for instance, in the simple case where $m = 1$, we could have:
$$a_1\begin{bmatrix}
v_1^1
\end{bmatrix}
+
a_2\begin{bmatrix}
v_2^1
\end{bmatrix}
+ \dots +
a_n\begin{bmatrix}
v_n^1
\end{bmatrix}
$$
$$a_1\begin{bmatrix}
v_1
\end{bmatrix}
+
a_2\begin{bmatrix}
v_2
\end{bmatrix}
+ \dots +
a_n\begin{bmatrix}
v_n
\end{bmatrix}
$$
$$a_1v_1 + a_2v_2 + \dots + a_nv_n$$
And end up with a 1 dimensional vector, often just viewed as a scalar.
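To make the formula above concrete in code, here is a minimal NumPy sketch (the scalars and vectors are made-up values, chosen purely for illustration) that builds a linear combination exactly as written:

```python
import numpy as np

# Hypothetical scalars a_1, ..., a_n and vectors v_1, ..., v_n (each m = 3 dimensional)
a = [2.0, -1.0, 0.5]
vs = [np.array([1.0, 0.0, 2.0]),
      np.array([0.0, 3.0, 1.0]),
      np.array([4.0, -2.0, 0.0])]

# The linear combination a_1*v_1 + a_2*v_2 + ... + a_n*v_n
combination = sum(a_i * v_i for a_i, v_i in zip(a, vs))
print(combination)  # a single 3-dimensional vector: [ 4. -4.  3.]
```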
Now, this definition is good to have in mind; however, we can make it a bit more concrete by expanding it visually. For instance, if you have a pair of numbers that is meant to describe a vector, such as:
$$\begin{bmatrix} 3 \\ -2 \end{bmatrix} $$We can think of each coordinate as a scalar (how does it stretch or squish vectors?). In linear algebra, there are two very important vectors, commonly known as $\hat{i}$ and $\hat{j}$:
Now, we can think of the coordinates of our vector as stretching $\hat{i}$ and $\hat{j}$:
In this sense, the vector that these coordinates describe is the sum of two scaled vectors:
$$(3)\hat{i} + (-2)\hat{j}$$Note that $\hat{i}$ and $\hat{j}$ have a special name; they are referred to as the basis vectors of the xy coordinate system. This means that when you think about vector coordinates as scalars, the basis vectors are what those coordinates are actually scaling.
Now, this brings us to our first definition:
Linear Combination: Any time you are scaling two vectors and then adding them together, you have a linear combination. For example:
$$(3)\hat{i} + (-2)\hat{j}$$
Or, more generally: $$a\vec{v} + b \vec{w}$$
Where both $a$ and $b$ above are scalars.
This can be seen visually:
And we can see that as we scale $\vec{v}$ and $\vec{w}$ we can create many different linear combinations:
This brings up another definition, span.
Span: The set of all possible vectors that you can reach with a linear combination of a given pair of vectors is known as the span of those two vectors.
So, the span of most pairs of 2-d vectors is all of 2-d space; however, if they line up, then their span is just a specific line. When two vectors do happen to line up, we say that they are linearly dependent, and one can be expressed in terms of the other. On the other hand, if they do not line up, they are said to be linearly independent.
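As a rough numerical check of this idea (the example vectors are arbitrary), we can test whether two 2-d vectors line up by looking at the rank of the matrix that has them as its columns:

```python
import numpy as np

def linearly_dependent(v, w, tol=1e-10):
    """Two 2-d vectors line up exactly when the matrix with v and w
    as its columns has rank less than 2."""
    return np.linalg.matrix_rank(np.column_stack([v, w]), tol=tol) < 2

print(linearly_dependent(np.array([1.0, 2.0]), np.array([2.0, 4.0])))   # True  -> span is a line
print(linearly_dependent(np.array([1.0, 2.0]), np.array([3.0, -1.0])))  # False -> span is the whole plane
```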
Linear transformations are absolutely fundamental to understanding matrix-vector multiplication (well, unless you want to rely on memorization). To start, let's just parse the term "linear transformation".
Transformation is essentially just another way of saying function. This is where the first bit of confusion can arise, though, if you are being particularly thoughtful about the process: what exactly is a function? It is helpful to define it before moving forward.
Function
Generally, in mathematics we view a function as a process that takes in an input and returns an output (this coincides nicely with the computer science view as well). It can be viewed as:
Or, expanded as: $$x_1, x_2, ... , x_n \rightarrow f(x_1, x_2, ... , x_n) \rightarrow y$$
This is how it is generally encountered, where anywhere from one to several inputs are taken in, and a single output is produced. However, let's define a function more rigorously:
A function is a process or a relation that associates each element $x$ of a set $X$, the domain of the function, to a single element $y$ of another set $Y$ (possibly the same set), the codomain of the function.
The important point to recognize from the above definition is that, while it is common for a function to map elements from a set $X$ to a different set $Y$, the two sets can be the same. Hence, although it is not encountered quite as often in ML, a function can map $X \rightarrow X$.
Now, back to our term transformation; it is something that takes in inputs, and spits out an output for each one. In the context of linear algebra, we like to think about transformations that take in some vector, and spit out another vector:
$$\begin{bmatrix} 5 \\ 7 \end{bmatrix} \rightarrow L(\vec{v}) \rightarrow \begin{bmatrix} 2 \\ -3 \end{bmatrix} $$This is where we can see an example of a function that does not map to a different space necessarily, but potentially to itself. In other words, generally if we have a function that takes in two inputs, we end up with one output:
$$f(x,y) = z$$However, we can clearly see here that we take in two inputs (coordinates of the vector) and end up with two outputs (coordinates of the transformed vector).
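A tiny sketch of this distinction, with both rules made up purely for illustration: one function maps two inputs to a single number, while the transformation maps a 2-d vector to another 2-d vector.

```python
import numpy as np

def f(x, y):
    """A familiar function: two inputs in, one number out."""
    return x + 2 * y

def L(v):
    """A transformation: a 2-d vector in, a 2-d vector out.
    (The rule used here is made up purely for illustration.)"""
    x, y = v
    return np.array([2 * x + y, x - y])

print(f(5, 7))                   # 19, a single number
print(L(np.array([5.0, 7.0])))   # [17. -2.], another 2-d vector
```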
So, why use the word transformation instead of function if they essentially mean the same thing? Well, it is to be suggestive of movement! The way to think about functions of vectors is to use movement. If a transformation takes some input vector to some output vector, we imagine that input vector moving over to the output vector:
And in order to think about the transformation as a whole, we can think about every possible input vector moving over to its corresponding output vector.
Key Point
We are now seeing that our general interpretation of a function can be expanded (via its full definition), to not simply taking in one or several inputs and producing a single output, but taking in one to several inputs, and producing one to several outputs. This expansion is key to recognizing the relationship between functions, linear transformations, and dot products.
Now let's pose the following question: If we were given the coordinates of a vector, and we then were trying to determine the coordinates of where that vector landed after being linearly transformed, how would we represent this?
$$\begin{bmatrix} x_{in} \\ y_{in} \end{bmatrix} \rightarrow ???? \rightarrow \begin{bmatrix} x_{out} \\ y_{out} \end{bmatrix} $$Well, it turns out that you actually only need to record where the two basis vectors land and everything else will follow from that! For example, let's consider the vector:
$$ \vec{v} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$Where $\vec{v}$ can also be written as:
$$\vec{v} = -1 \hat{i} + 2 \hat{j}$$If we then perform some transformation and watch where all of the vectors go, the property (of linear transformations) that all gridlines remain parallel and evenly spaced has a really important consequence!
The place where $\vec{v}$ lands will be -1 times the vector where $\hat{i}$ landed, and 2 times the vector where $\hat{j}$ landed.
$$\text{Transformed } \vec{v} = -1 (\text{Transformed }\hat{i}) + 2 (\text{Transformed }\hat{j})$$In other words, it started off as a certain linear combination of $\hat{i}$ and $\hat{j}$, and it ended up as that same linear combination of where those two vectors landed! This means that we can determine where $\vec{v}$ must go, based only on where the two basis vectors land!
Because we have a copy of our original gridlines in the background, we can see where the vectors landed:
We see that our transformed $\hat{i}$ and $\hat{j}$ landed at:
$$ \text{Transformed }\hat{i} = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \hspace{1cm} \hspace{1cm} \text{Transformed }\hat{j} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} $$Meaning that our transformed $\vec{v}$ is:
$$\text{Transformed } \vec{v} = -1 \begin{bmatrix} 1 \\ -2 \end{bmatrix} + 2 \begin{bmatrix} 3 \\ 0 \end{bmatrix}$$$$\text{Transformed } \vec{v} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}$$Now, the cool thing about this is that we have just discovered a way to determine where any transformed vector will land, only by knowing where $\hat{i}$ and $\hat{j}$ land, without needing to watch the transformation itself! We can write a vector with more general coordinates:
$$\begin{bmatrix} x \\ y \end{bmatrix}$$And it will land on $x$ times the vector where $\hat{i}$ lands, and $y$ times the vector where $\hat{j}$ lands:
$$ \begin{bmatrix} x \\ y \end{bmatrix} \rightarrow x \begin{bmatrix} 1 \\ -2 \end{bmatrix} + y \begin{bmatrix} 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 1x + 3y\\ -2x + 0y \end{bmatrix} $$We can now get to a very key point:
This is all to say that a 2-dimensional linear transformation is completely described by just four numbers: the 2 coordinates for where $\hat{i}$ lands, and the 2 coordinates for where $\hat{j}$ lands. It is common to package these coordinates in a 2x2 grid of numbers, commonly referred to as a 2x2 matrix:
$$\begin{bmatrix}
1 & 3\\
-2 & 0
\end{bmatrix}$$
Here you can interpret the columns as where $\hat{i}$ lands and where $\hat{j}$ lands!
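A short sketch of this column view, reusing the example matrix above and the $\vec{v}$ from the worked example:

```python
import numpy as np

# Columns of the matrix above: where i-hat and j-hat land
i_hat_lands = np.array([1.0, -2.0])
j_hat_lands = np.array([3.0, 0.0])

def transform(v):
    """Where v lands: its x-coordinate scales the transformed i-hat,
    and its y-coordinate scales the transformed j-hat."""
    x, y = v
    return x * i_hat_lands + y * j_hat_lands

print(transform(np.array([-1.0, 2.0])))  # [5. 2.], matching the worked example above
```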
So, for just one more example, let's say we are given the 2x2 matrix representing a linear transformation:
$$ \begin{bmatrix} 3 & 2\\ -2 & 1 \end{bmatrix}$$And we are then given a random vector:
$$\begin{bmatrix} 5\\ 7 \end{bmatrix}$$If we want to determine where that linear transformation takes that vector, we can take the coordinates of the vector, multiply them by the corresponding columns of the matrix, and then add together what you get:
$$ 5\begin{bmatrix} 3 \\ -2 \end{bmatrix} + 7\begin{bmatrix} 2 \\ 1 \end{bmatrix}$$Let's now try to generalize this as much as possible. Assume our 2x2 matrix (the typographical representation of our linear transformation of the vector space; in other words, a convenient way to package the information needed to describe a linear transformation) is:
$$ \begin{bmatrix} a & b\\ c & d \end{bmatrix}$$If we apply this transformation to some vector:
$$ \begin{bmatrix} x \\ y \end{bmatrix}$$What do we end up with? Well it will be:
$$ x\begin{bmatrix} a \\ c \end{bmatrix} + y\begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} ax + by\\ cx + dy \end{bmatrix}$$Now, what we just did (which I hope has been intuitive and clear) is frequently taught with no intuition whatsoever. It is referred to as matrix-vector multiplication, and would be written as follows:
$$\begin{bmatrix} a & b\\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = x\begin{bmatrix} a \\ c \end{bmatrix} + y\begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} ax + by\\ cx + dy \end{bmatrix}$$Where our matrix (the linear transformation, a form of a function) is on the left, and the vector is on the right. This, when taught without the necessary background, is incredibly confusing and leaves us to rely on rote memorization of symbol-manipulation procedures. When we view this transformation as a type of function that takes in a vector as input and yields an output:
$$\begin{bmatrix} 5 \\ 7 \end{bmatrix} \rightarrow L(\vec{v}) \rightarrow \begin{bmatrix} 2 \\ -3 \end{bmatrix} $$It begins to become far more intuitive why the order matters, and why the top number in the vector is multiplied by the first column of the matrix, and subsequently why the bottom number in the vector is multiplied by the second column in the matrix. It is simply due to:
- The way that vectors are, by convention, packaged and represented.
- The way that matrices are, by convention, packaged and represented.
In order for the process to behave isomorphically to the geometric interpretation, we must follow the conventions that have been laid out. However, if we only knew the conventions, without the underlying intuition, there would be very little meaning in what was going on.
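As a sanity check (reusing the example matrix from before), NumPy's built-in matrix-vector product agrees with the column-scaling view:

```python
import numpy as np

A = np.array([[ 1.0, 3.0],
              [-2.0, 0.0]])    # columns: where i-hat and j-hat land
v = np.array([-1.0, 2.0])

by_columns = v[0] * A[:, 0] + v[1] * A[:, 1]  # scale each column by the matching coordinate, then add
by_matmul  = A @ v                            # NumPy's matrix-vector product

print(by_columns, by_matmul)               # both are [5. 2.]
print(np.allclose(by_columns, by_matmul))  # True
```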
We won't dig into it here, but as a quick note I wanted to touch on matrix multiplication. If we had the following two matrices:
$$\begin{bmatrix} a & b\\ c & d \end{bmatrix}$$$$\begin{bmatrix} e & f\\ g & h \end{bmatrix}$$And we wanted the result of applying one, then the other, to a given vector:
$$\begin{bmatrix} a & b\\ c & d \end{bmatrix} \begin{bmatrix} e & f\\ g & h \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$We can first think about this as a composition of functions. What this would look like from a standard function perspective is:
$$f(g(x))$$Where we first run $x$ through function $g$, and then the result is run through function $f$. In our case, where the function is a matrix (linear transformation), we can say that the vector is first transformed via the matrix:
$$ \begin{bmatrix} e & f\\ g & h \end{bmatrix}$$Which would look like:
$$ \begin{bmatrix} e & f\\ g & h \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ex + fy\\ gx + hy \end{bmatrix} $$And that resulting 2-d vector is then transformed via the second linear transformation:
$$ \begin{bmatrix} a & b\\ c & d \end{bmatrix} \begin{bmatrix} ex + fy\\ gx + hy \end{bmatrix} = \begin{bmatrix} a(ex + fy) + b(gx + hy)\\ c(ex + fy) + d(gx + hy) \end{bmatrix} $$Resulting in a final 2-d vector! We can now easily understand why the following property of matrices exists:
$$M_1M_2 \neq M_2M_1$$Because a matrix represents a transformation (a function), switching the order of the matrices changes the order in which the transformations are applied. In a more familiar case, that would mean:
$$f(g(x)) \rightarrow g(f(x))$$We know that $f(g(x))$ and $g(f(x))$ are not, in general, the same, and it follows that the same holds for matrices!
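A quick numerical illustration of this, with an arbitrary rotation and shear chosen as the two example transformations:

```python
import numpy as np

M1 = np.array([[0.0, -1.0],
               [1.0,  0.0]])   # a 90-degree rotation
M2 = np.array([[1.0, 1.0],
               [0.0, 1.0]])    # a shear
v = np.array([1.0, 0.0])

print(M1 @ (M2 @ v))                  # [0. 1.]: shear first, then rotate
print(M2 @ (M1 @ v))                  # [1. 1.]: rotate first, then shear
print(np.allclose(M1 @ M2, M2 @ M1))  # False: the order of application matters
```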
Up until this point we have been dealing with transformations from 2-d vectors to other 2-d vectors, by the means of a 2x2 matrix. Dealing with non-square matrices and having an underlying intuition is crucial, especially when dealing with machine learning (specifically deep neural networks). More importantly, it should help bring our generalization around to include some of the functions we are more familiar with.
As we go through this, recall what was mentioned at the start of this notebook. There are functions that are dependent on several inputs, such as:
$$f(x,y) = z$$Or: $$f(x_1,x_2, ...,x_n) = z$$
Now, what is interesting about functions of that sort is that we start with multiple dimensions, and the result of our function process is a one dimensional output. Keep that in mind as we move forward.
Okay, so we can start by saying that it is perfectly reasonable to talk about transformations between dimensions, such as:
$$\begin{bmatrix} 2 \\ 7 \end{bmatrix} \rightarrow L(\vec{v}) \rightarrow \begin{bmatrix} 1 \\ 8 \\ 2 \end{bmatrix} $$Above, we have a 2-d input, and a 3-d output. In order to determine the transformation $L$, we just look at the coordinates of where the basis vectors land! For instance, it may be (left column is $\hat{i}$, right column is $\hat{j}$):
$$L = \begin{bmatrix} 2 & 0\\ -1 & 1 \\ -2 & 1 \end{bmatrix}$$Now, the matrix above, which encodes our transformation, has 3 rows and 2 columns, making it a 3x2 matrix. We can intuitively say that a 3x2 matrix has the geometric interpretation of mapping 2 dimensions to 3 dimensions. This is because the 2 columns mean that the input space has 2 basis vectors, and the 3 rows indicate that the landing spot for each of those 2 basis vectors is described with 3 separate coordinates.
Likewise, if we see a 2x3 matrix (2 rows, 3 columns):
$$\begin{bmatrix} 3 & 1 & 4\\ 1 & 5 & 9 \end{bmatrix}$$We know that the 3 columns indicate that we are starting in a space that has 3 basis vectors, and the 2 rows indicate that the landing spot for each of those 3 basis vectors is described with only 2 coordinates.
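Using the two example matrices above (and an arbitrary 3-d input vector), a quick shape check makes the "columns = input dimension, rows = output dimension" reading explicit:

```python
import numpy as np

A_3x2 = np.array([[ 2.0, 0.0],
                  [-1.0, 1.0],
                  [-2.0, 1.0]])        # 3x2: takes 2-d vectors to 3-d vectors
A_2x3 = np.array([[3.0, 1.0, 4.0],
                  [1.0, 5.0, 9.0]])    # 2x3: takes 3-d vectors to 2-d vectors

print((A_3x2 @ np.array([2.0, 7.0])).shape)        # (3,) -- a 3-d output
print((A_2x3 @ np.array([1.0, 0.0, -1.0])).shape)  # (2,) -- a 2-d output
```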
Now, we can complete the bridge from linear transformations to our standard functions. We have come across functions before of the form:
$$f(x,y) = z$$Where we take a 2 dimensional input, and produce a single 1 dimensional output. Well, linear transformations are capable of the same thing! Below, we take a 2-d input, and produce a 1-d output:
$$\begin{bmatrix} 2 \\ 7 \end{bmatrix} \rightarrow L(\vec{v}) \rightarrow \begin{bmatrix} 1.8 \end{bmatrix} $$1-dimensional space is just the number line, so a transformation such as the one above essentially just takes in 2-d vectors, and spits out a single number. A transformation such as this is encoded as a 1x2 matrix, where each column has a single entry:
$$L = \begin{bmatrix} 1 & 2 \end{bmatrix}$$The 2 columns represent where each of the basis vectors lands, and each column requires just one number: the number that basis vector landed on. This is a very interesting concept, with ties to the dot product.
In general, the dot product is introduced as follows: we have 2 vectors of the same length, and we take their dot product by multiplying the corresponding entries and adding the results:
$$\begin{bmatrix} 2 \\ 7 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 8 \\ 2 \end{bmatrix} $$$$(2*1) + (7*8) + (1*2) = 60$$Now, this computation has a very nice geometric interpretation. Let's say we have two vectors, $\vec{v}$ and $\vec{w}$:
$$ \vec{v} = \begin{bmatrix} 4 \\ 1 \end{bmatrix} \hspace{1cm} \vec{w} = \begin{bmatrix} 2 \\ -1 \end{bmatrix} $$To think about the dot product between $\vec{v}$ and $\vec{w}$, think about projecting $\vec{w}$ onto the line that passes through the origin and the tip of $\vec{v}$:
Multiplying the length of the projected $\vec{w}$ by the length of $\vec{v}$, we end up with $\vec{v} \cdot \vec{w}$:
$$ \vec{v} \cdot \vec{w} = (\text{Length of projected } \vec{w})(\text{Length of }\vec{v})= \begin{bmatrix} 4 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ -1 \end{bmatrix} $$Note: if the direction of the projection of $\vec{w}$ is pointing in the opposite direction of $\vec{v}$, that dot product should be negative.
So, when two vectors are generally pointing in the same direction their dot product is positive, when they are perpendicular their dot product is 0, and if they point in generally the opposite direction, their dot product is negative.
Note: Additionally, I should mention that order doesn't matter. We will not go into the derivation here.
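In code, using the $\vec{v}$ and $\vec{w}$ from the example above (plus a pair of perpendicular vectors for the zero case):

```python
import numpy as np

v = np.array([4.0, 1.0])
w = np.array([2.0, -1.0])

print(np.dot(v, w))                   # 4*2 + 1*(-1) = 7.0
print(np.dot(v, w) == np.dot(w, v))   # True: order doesn't matter
print(np.dot(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0 for perpendicular vectors
```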
Why are these two views connected?
You may be sitting there at this point wondering:
"Why does this numerical process of matching coordinates, multiplying pairs, and adding them together, have anything to do with projection?"
Well, in order to give a fully satisfactory answer, we are going to need to unearth something a little deeper: duality. However, before we can get into that, we first need to talk about linear transformations from multiple dimensions to one dimension.
These are functions that take in a 2-d vector and spit out some number (a 1-d output):
$$\begin{bmatrix} 2 \\ 7 \end{bmatrix} \rightarrow L(\vec{v}) \rightarrow \begin{bmatrix} 1.8 \end{bmatrix} $$Linear transformations are much more restrictive than your run-of-the-mill function with a 2-d input and a 1-d output. For example, a non-linear transformation (function) may look like:
$$f\Big( \begin{bmatrix} x \\ y \end{bmatrix} \Big) = x^2 + y^2 $$Now, recall from section 2 that whenever we perform a transformation, in order to determine the matrix that represents it (remember: it operates isomorphically to a linear function), we just follow $\hat{i}$ and $\hat{j}$:
Only this time, each one of our basis vectors just lands on a number! When we record where they land as the columns of a matrix ($\hat{i}$ lands on 2 and $\hat{j}$ lands on 1), each of those columns just has a single number:
$$\text{Transformation Matrix:}\begin{bmatrix} 2 & 1 \end{bmatrix} $$This is a 1x2 matrix. Let's go through an example of what it means to apply one of these transformations to a vector. We will start by defining our vector, $\vec{v}$, to be:
$$\vec{v} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$$Let's say that we have a linear transformation that takes $\hat{i}$ to 1, and $\hat{j}$ to -2:
Recall that we can write our original vector $\vec{v}$ as linear combination of 4 times $\hat{i}$ plus 3 times $\hat{j}$:
$$(4)\hat{i} + (3)\hat{j}$$A consequence of linearity is that when we perform a linear transformation on $\vec{v}$, the transformed output can be written as:
$$\text{Transformed } \vec{v} = 4 (\text{Transformed }\hat{i}) + 3 (\text{Transformed }\hat{j})$$$$4(1) + 3(-2) = -2$$When we perform this calculation purely numerically, it is matrix vector multiplication:
$$\begin{bmatrix} 1 & -2 \end{bmatrix} \begin{bmatrix} 4 \\ 3 \end{bmatrix} = 4*1 + 3*-2 = -2 $$Where again, $\begin{bmatrix}1 & -2\end{bmatrix}$ is our transform, and $\begin{bmatrix}4 \\3\end{bmatrix}$ is our vector. Now, this numerical operation of multiplying a 1x2 matrix by a 2x1 vector feels just like taking the dot product of two vectors. The 1x2 matrix looks just like a vector that we tipped on its side. In fact, we can say that there is a nice association between 1x2 matrices and 2-d vectors:
$$\text{1 x 2 Matrices} \longleftrightarrow \text{2-d vectors}$$At this point it is worth noting again that a matrix is a representation of a linear transformation, or function. What is often confusing in linear algebra is when matrices and vectors begin to look the same. For instance, we could have a matrix that is 2x1, looking identical to a vector. That would mean it expects 1 input dimension and produces 2 output dimensions. It is important to remain aware of which typographical objects are meant to represent matrices and which represent vectors as we continue.
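Here is the numerical identity between the two views, with the same numbers as in the example above:

```python
import numpy as np

M = np.array([[1.0, -2.0]])   # a 1x2 matrix: a transformation from 2-d to 1-d
v = np.array([4.0, 3.0])      # the vector being transformed

print(M @ v)                             # [-2.]: matrix-vector multiplication
print(np.dot(np.array([1.0, -2.0]), v))  # -2.0: the matrix "tipped on its side", dotted with v
```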
Back to where we were, the above association suggests something very cool from the geometric view:
There is some kind of connection between linear transformations that take vectors to numbers, and vectors themselves.
We can discover more about this connection with the following example. Let's take a copy of the number line and place it diagonally in space somehow, with the number 0 sitting at the origin:
Now, think of the 2-dimensional unit vector whose tip sits where the number 1 on the number line is.
We will call that vector $\hat{u}$. This vector plays a large role in what is about to happen. If we project 2-d vectors straight onto this number line:
In effect, we have just defined a function that takes 2-d vectors to numbers. What is more, this function is actually linear, since it passes the visual test: any line of evenly spaced dots remains evenly spaced once it lands on the number line.
To be clear, even though the number line is embedded in 2-d space, the outputs are numbers, not 2-d vectors. Again, we should be thinking of a function that takes in two coordinates and outputs a single coordinate:
$$\begin{bmatrix} x \\ y \end{bmatrix} \rightarrow L(\vec{v}) \rightarrow \begin{bmatrix} z \end{bmatrix} $$But the vector $\hat{u}$ is a two-dimensional vector living in the input space. It is just situated in such a way that it overlaps with the embedding of the number line.
With this projection, we just defined a linear transformation from 2-d vectors, to numbers. This means that we will be able to find a 1x2 matrix that describes this transformation:
$$\begin{bmatrix} \text{Where } \hat{i} \text{ lands} & \text{Where } \hat{j} \text{ lands} \end{bmatrix}$$To find this 1x2 matrix, we can zoom in on our diagonal number line setup and think about where $\hat{i}$ and $\hat{j}$ each land, since those landing spots are going to be the columns of the matrix:
We can actually reason through this using a piece of symmetry: since $\hat{i}$ and $\hat{u}$ are both unit vectors, projecting $\hat{i}$ onto the line passing through $\hat{u}$ looks completely symmetric to projecting $\hat{u}$ onto the x-axis.
So, when we ask what number does $\hat{i}$ land on when it gets projected, the answer will be the same as what number $\hat{u}$ lands on when it gets projected onto the x-axis. But, projecting $\hat{u}$ onto the x-axis just means taking the x coordinate of $\hat{u}$:
We can use nearly identical reasoning when determining where $\hat{j}$ lands!
So, the entries of a 1x2 transformation matrix describing the projection transformation are going to be the coordinates of $\hat{u}$:
$$\begin{bmatrix} u_x & u_y \end{bmatrix}$$And computing that transformation for arbitrary vectors in space:
Requires multiplying that matrix by those vectors:
$$ \begin{bmatrix} u_x & u_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = u_x * x + u_y * y $$This is computationally identical to taking the dot product with $\hat{u}$!
$$ \begin{bmatrix} u_x \\ u_y \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = u_x * x + u_y * y $$Key Point:
This is why taking the dot product with a unit vector can be interpreted as projecting a vector onto the span of that unit vector and taking the length.
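A small sketch of this key point, with a made-up unit vector $\hat{u}$ and an arbitrary vector $\vec{v}$:

```python
import numpy as np

u_hat = np.array([1.0, 1.0]) / np.sqrt(2)   # a unit vector defining a diagonal number line
v     = np.array([3.0, 1.0])                # an arbitrary 2-d vector to project

as_matrix = np.array([[u_hat[0], u_hat[1]]]) @ v  # apply the 1x2 projection matrix
as_dot    = np.dot(u_hat, v)                      # dot with u-hat

print(as_matrix, as_dot)               # the same number: the signed length of v projected onto u-hat
print(np.allclose(as_matrix, as_dot))  # True
```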
Let's take a moment to think about what just happened here: we started with a linear transformation from 2-d space to the number line, which was not defined in terms of numerical vectors or numerical dot products. Rather, it was defined by projecting space onto a diagonal copy of the number line. But, because the transformation was linear, it was necessarily described by some 1x2 matrix. And since multiplying a 1x2 matrix by a 2-d vector is the same as turning that matrix on its side and taking the dot product, this transformation was inescapably related to some 2-d vector.
The lesson here is that any time you have one of these linear transformations, whose output space is the number line, no matter how it was defined, there is going to be some unique vector $\vec{v}$ corresponding to that transformation, in the sense that applying that transformation is the same as taking a dot product with that vector.
What just happened above in math is an example of duality. Loosely speaking, it is defined as:
Duality: A natural-but-surprising correspondence between two types of mathematical things.
For the linear algebra case we just saw, we would say that the dual of a vector is the linear transformation that it encodes, and the dual of a linear transformation from some space to one dimension is a certain vector in that space.
In summation:
The dot product is a very useful geometric tool for understanding projections, and for testing whether or not vectors generally point in the same direction.
The above statement is probably the most important thing to remember about the dot product. But at a deeper level:
Dotting two vectors together is a way to translate one of them into the world of transformations.