Nathaniel Dake Blog

2. Exploring Inverse Functions, Exponentials, and Logarithms

Throughout fields ranging from mathematics and computer science, to physics and engineering, as well as economics and biology, you will frequently run into the concepts of the exponential and the logarithm. Often you memorized these concepts years ago, committed the rules for manipulating them to memory, and then simply treated them with an axiomatic esteem; they are felt to be fundamental givens rather than derivable from lower level principles. As with many things, this generally presents no problem. For instance, if I asked you to simplify the following expression:

$$x^3 \cdot x^4 = ?$$

Surely you would arrive at:

$$x^3 \cdot x^4 = x^7$$

And if I asked someone who fancies themselves to be rather mathematically inclined, they may simplify the following:

$$log_{10}(100) = ?$$

To be:

$$log_{10}(100) = 2$$

If you just finished a course that deals specifically with exponentials and logarithms, you may even be able to reduce:

$$e ^{log_e(x)} = ?$$

To simply be:

$$e ^{log_e(x)} = x$$

But, most likely, you called upon a specific mechanical process that you had once memorized, and applied it to a very (relatively) familiar problem. Cognitively you were operating on a low level; shunting symbols around and following a set of steps (an algorithm) in order to arrive at an end result. There are several issues that can arise from having an approach that lives too much in the low level.

Why are you doing what you are doing?
To be clear, operating at a low level is an incredibly effective way to solve complex problems. Consider the following: You are presented with a nicely restricted problem, take a brief moment to analyze it, classify it into a bucket that is similar to a problem that you solved before, and then immediately jump into the lower level mechanical mode. Once there, you are recalling a set of steps that can be used to solve a certain problem, operating nearly automatically. This type of automatic association is talked about at length in Daniel Kahneman's Thinking, Fast and Slow$^1$. You are not in a period of deep contemplation; rather, autopilot has taken over and you are performing manipulations in order to reach your end goal. But, why? Why are you performing the steps that you are performing, in the specific order that you are performing them in?

The real world is more than the low level
In a real world environment, there is much more going on than a simple, well constrained test problem conveys. By spending too much time at the low level, or by being inefficient at transitioning from low to high level thinking, we risk missing critical insights into problems, and potentially may not be able to solve them at all.

Thinking via analogy tends to break down at the extremes
If you encounter a problem that looks similar to something that you have seen before, you will implement the same process that you used to solve the previous problem. However, they almost never have the same conditions, assumptions, edge cases, etc; by nature you apply a process to a problem that is not the same as the one that you originally solved. This requires a high level consideration of what is different, and how to handle that difference.

The goal of this post is to show how essential having a high level perspective is when it comes to applying mathematics to the real world. In the process I will be performing a deep dive into the following concepts:

  • Exponentials
  • Logarithms
  • Inverse Functions

As you follow along, however, the focus should not only be on the topics presently discussed, but more deeply the paradigm of high and low level thinking, its power, and how it applies to your own problem solving process.

1.1 Motivation

To begin, I want to start by offering a real world motivation from a recent experience of mine. I was working on a problem relating probabilities to plausibilities, specifically stepping through the proof of probability theory as an extension of logical reasoning. For the sake of this post the proof itself, and probability theory for that matter, is not particularly important (for those interested it can be found here).

What is important is the particular point at which I hit a wall. For context, the entire proof was rather long, and I had been working through it over the course of several hours. My high level thinking was mainly drained, and I was relying on the automatic low level processes that I had learned over the years to carry me through the proof. I finally arrived at the following:

$$p(\alpha) = \frac{1}{w(\alpha)^\frac{log(2)}{log(w(\alpha))}}$$

Several lines below the author casually stated that:

$$p(\alpha) = \frac{1}{2}$$

And I was at a bit of a loss as to how they arrived at this point. Let me be clear, this is not necessarily a complex simplification, and certainly not one that is difficult once broken down. A 10th grader who had recently worked with exponentials and logarithms would most likely have been able to handle it. The key insight to keep in mind here is that this small simplification was in the context of a very large proof (~20 pages). There was no test problem that stated: "Please simplify the following expression", immediately priming my associative memory for simplification processes related to exponentials and logarithms (see Thinking, Fast and Slow, Chapter 4). No, rather this was sneakily tucked in on page 17, where the reader is most certainly going to have nearly let their high level cognitive guard down. It requires high level thinking to step in, especially if the mechanical process of exponential/logarithm simplification is not fresh in one's mind.
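
For reference, once the machinery developed below is in place, the step takes one line. Writing $w(\alpha)$ as $b^{log(w(\alpha))}$, where $b$ is whatever base the logarithm is taken in (this is precisely the inverse relationship we will build up to), we get:

$$w(\alpha)^{\frac{log(2)}{log(w(\alpha))}} = \Big(b^{log(w(\alpha))}\Big)^{\frac{log(2)}{log(w(\alpha))}} = b^{log(2)} = 2 \quad \Rightarrow \quad p(\alpha) = \frac{1}{2}$$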

1.2 Insight into problem solving via Associative Activation and Priming

As I have briefly alluded to, psychologist Daniel Kahneman has extensively researched and written about the ideas of associative activation, associative memory, and priming. I want to take a moment to briefly define each, since I will reference them frequently in this post (and I encourage those interested to read Thinking, Fast and Slow).

Associative Activation: Ideas that have been evoked trigger many other ideas, in a spreading cascade of activity in your brain.
Associative Memory: A vast network of nodes, where each node is an idea, and each node links to many others.
Priming: Exposure to one stimulus influences a response to a subsequent stimulus.

Now you may be asking, "if this post is about inverse functions, exponentials, and logarithms, then why are we talking about psychology"? The reason is that if you dig into these psychological concepts, you will realize that the standard mode of problem solving that most people enter is not effective. We rely on different types of priming to activate portions of associative memory that will then help us solve a problem. What happens if this priming does not occur? Unfortunately, often all is lost.

As an example, let's look back to the motivating problem of this post. The first issue is that it uses a different set of notation/symbols than one is used to seeing. Anyone who has taken a calculus course has most certainly dealt with the functions $ln(x)$ and $e^x$ rather extensively. There is a good chance that if the above simplification had been written with that notation a priming would have occurred in my mind, activating my associative memory, which would have immediately pulled in the simplification rules relating to $ln$ and $e$. However, the slight notational change caused my associative memory to be nearly useless in this scenario. It was not triggered to pull in the necessary background information. A main reason for this error was that in the past I had memorized a very specific case ($ln$ and $e$) of a more general fundamental concept. You may think that the more general concept is the standard exponential and logarithm (with any base). In fact I am actually referring to one level up the hierarchy: the concept of inverse functions. Instead of memorizing that:

$$e^{ln(x)} = ln(e^x) = x$$

I should have internalized the concept of a function, what it does, and what it means to actually take the inverse of a function. This would have allowed me to make the necessary connections in that moment and not stall out. This is a common issue when problem solving via analogy.

2. Inverse Functions

We are now ready to discuss what exactly an inverse function is and to get a handle on some very useful visualizations. As I discussed in great detail in my post on composition of functions, a function very simply maps an input to a response/output (if this is not familiar please take a minute to read the previous post). So, we can write:

$$x = input$$
$$y = output$$
$$f = function$$
$$y = f(x)$$

If we let $f(x) = 5x + 2$, graphically we then have:

In [1]:
import math
from math import log, e

import numpy as np
from scipy.stats import bernoulli, binom, norm
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rc, animation
from IPython.core.display import display, HTML

from _plotly_future_ import v4_subplots
import cufflinks
import plotly.plotly as py
import plotly
import plotly.graph_objs as go
from plotly.offline import iplot

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

sns.set(style="white", palette="husl")
sns.set_context("talk")
sns.set_style("ticks")
In [2]:
fig, ax = plt.subplots(figsize=(8,5))

plt.axhline(y=0, color='grey', alpha=0.2)
plt.axvline(x=0, color='grey', alpha=0.2)

lower_bound = -2
upper_bound = 10
composition_upper_bound = 25 
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y = 5 * x + 2 

plt.plot(x, y, lw=2, c=sns.xkcd_rgb["red"])
plt.title(r'f(x) = $5x + 2$ = y', pad=10)
ax.set_xlabel('X', fontsize=20)
ax.set_ylabel('Y', fontsize=20)
plt.show()

Recall from the previous post, we essentially mapped our input space from the line $y = x$ to this function, $f$; visually this looked like:

In [3]:
fig, (ax1) = plt.subplots(1, 1, figsize=(8,4))

def original_func(x):
    return 5 * x + 2 

lower_bound = -2
upper_bound = 10
composition_upper_bound = 25 
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y_output = original_func(x)
y_orig = x

ax1.plot(x, y_orig, lw=2, c=sns.xkcd_rgb["green"])
ax1.plot(x, y_output, lw=2, c=sns.xkcd_rgb["red"])

func_arrow_square_1 = ax1.annotate(
    '',
    xy=(2, original_func(2)),
    xytext=(2, 2),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_1_inp, = ax1.plot(2, 2, 'og', zorder=5)
marker_1_resp, = ax1.plot(2, original_func(2), 'or', zorder=5)

func_arrow_square_2 = ax1.annotate(
    '',
    xy=(4, original_func(4)),
    xytext=(4, 4),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_2_inp, = ax1.plot(4, 4, 'og', zorder=5)
marker_2_resp, = ax1.plot(4, original_func(4), 'or', zorder=5)

func_arrow_square_3 = ax1.annotate(
    '',
    xy=(6, original_func(6)),
    xytext=(6, 6),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_3_inp, = ax1.plot(6, 6, 'og', zorder=5)
marker_3_resp, = ax1.plot(6, original_func(6), 'or', zorder=5)

ax1.set_xlabel('X', fontsize=20)
ax1.set_ylabel('Y', fontsize=20)
plt.legend(["Input to Function, $y=x$", "Output of Function, $f(x)=5x+2$"], loc="upper left", fontsize=15)
ax1.axhline(y=0, color='grey', alpha=0.2)
ax1.axvline(x=0, color='grey', alpha=0.2)
plt.show()

Where the input was the green line, $y=x$; we then pass the values, $y$, to the function $f$ as input. $f$ transforms this input into a response, shown by the red line. With this visualization present, defining the inverse is incredibly straightforward! It is simply:

Inverse Function: A function that maps a transformed input back to its original form. In other words, it reverses a function.

In our graph above, the inverse function will be whatever can take the red line as an input and transform it back to the green line. In our case, since our function is $f(x) = 5x + 2$, the inverse can be defined as whatever can take the response, $f(x)$, and return $x$. We will define it as $f^{-1}$:

$$f^{-1}(f(x)) = x$$

In the present scenario that function happens to be:

$$f(x) = 5x + 2$$
$$f(x) - 2 = 5x$$
$$\frac{f(x) - 2}{5} = x$$
$$f^{-1}(x) = \frac{x - 2}{5}$$

Where above, in the final step we replace $f(x)$ with $x$, and $x$ with $f^{-1}(x)$.
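
As a quick numerical sanity check (a minimal sketch; the names f and f_inverse are mine), we can confirm that composing the two functions returns the original input:

def f(x):
    return 5 * x + 2

def f_inverse(x):
    return (x - 2) / 5

x = np.linspace(-2, 10, 100)
print(np.allclose(f_inverse(f(x)), x))  # True: f_inverse maps f(x) back to x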

2.1 Inversion Visualization

Visually, we can see that this inverse function will map the function output, $f(x)$, back to the original input $x$:

In [5]:
fig, (ax1) = plt.subplots(1, 1, figsize=(8,4))

def inverse_func(x):
    return (x-2)/5 

lower_bound = -2
upper_bound = 10
composition_upper_bound = 25 
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y_orig = original_func(x)
y_output = inverse_func(y_orig)

ax1.plot(x, y_orig, lw=2, c=sns.xkcd_rgb["green"])
ax1.plot(x, y_output, lw=2, c=sns.xkcd_rgb["red"])

func_arrow_square_1 = ax1.annotate(
    '',
    xytext=(2, original_func(2)),
    xy=(2, 2),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_1_inp, = ax1.plot(2, 2, 'or', zorder=5)
marker_1_resp, = ax1.plot(2, original_func(2), 'og', zorder=5)

func_arrow_square_2 = ax1.annotate(
    '',
    xytext=(4, original_func(4)),
    xy=(4, 4),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_2_inp, = ax1.plot(4, 4, 'or', zorder=5)
marker_2_resp, = ax1.plot(4, original_func(4), 'og', zorder=5)

func_arrow_square_3 = ax1.annotate(
    '',
    xytext=(6, original_func(6)),
    xy=(6, 6),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_3_inp, = ax1.plot(6, 6, 'or', zorder=5)
marker_3_resp, = ax1.plot(6, original_func(6), 'og', zorder=5)

ax1.set_xlabel('X', fontsize=20)
ax1.set_ylabel('Y', fontsize=20)
plt.legend(
    [
        "Input to Function, $f(x) = 5x +2$",
        r"Output of Function, $f^{-1}(f(x)) = \frac{f(x)-2}{5}$ = x"], 
    loc="upper left",
    fontsize=13
)
ax1.axhline(y=0, color='grey', alpha=0.2)
ax1.axvline(x=0, color='grey', alpha=0.2)
plt.show()

It is important to keep in mind that above, the inverse function $f^{-1}(x)$ is not taking the standard input $y=x$; rather, it is taking the output of $f(x)$ (the green line) as input. This distinction is made clear in the visual below. On the left the input is $f(x)$, and on the right it is simply $x$:

In [6]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18,5))

lower_bound = -2
upper_bound = 10
composition_upper_bound = 25 
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y_orig = original_func(x)
y_output = inverse_func(y_orig)
y_output_original = inverse_func(x)

ax1.plot(x, y_orig, lw=2, c=sns.xkcd_rgb["green"])
ax1.plot(x, y_output, lw=2, c=sns.xkcd_rgb["red"])

func_arrow_square_1 = ax1.annotate(
    '',
    xytext=(2, original_func(2)),
    xy=(2, 2),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_1_inp, = ax1.plot(2, 2, 'or', zorder=5)
marker_1_resp, = ax1.plot(2, original_func(2), 'og', zorder=5)

func_arrow_square_2 = ax1.annotate(
    '',
    xytext=(4, original_func(4)),
    xy=(4, 4),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_2_inp, = ax1.plot(4, 4, 'or', zorder=5)
marker_2_resp, = ax1.plot(4, original_func(4), 'og', zorder=5)

func_arrow_square_3 = ax1.annotate(
    '',
    xytext=(6, original_func(6)),
    xy=(6, 6),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_3_inp, = ax1.plot(6, 6, 'or', zorder=5)
marker_3_resp, = ax1.plot(6, original_func(6), 'og', zorder=5)

ax2.plot(x, x, lw=2, c=sns.xkcd_rgb["green"])
ax2.plot(x, y_output_original, lw=2, c=sns.xkcd_rgb["red"])

func_arrow_square_1 = ax2.annotate(
    '',
    xytext=(2, 2),
    xy=(2, inverse_func(2)),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_1_inp, = ax2.plot(2, 2, 'og', zorder=5)
marker_1_resp, = ax2.plot(2, inverse_func(2), 'or', zorder=5)

func_arrow_square_2 = ax2.annotate(
    '',
    xytext=(4, 4),
    xy=(4, inverse_func(4)),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_2_inp, = ax2.plot(4, 4, 'og', zorder=5)
marker_2_resp, = ax2.plot(4, inverse_func(4), 'or', zorder=5)

func_arrow_square_3 = ax2.annotate(
    '',
    xytext=(6, 6),
    xy=(6, inverse_func(6)),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_3_inp, = ax2.plot(6, 6, 'og', zorder=5)
marker_3_resp, = ax2.plot(6, inverse_func(6), 'or', zorder=5)

ax1.set_xlabel('X', fontsize=20)
ax1.set_ylabel('Y', fontsize=20)

ax1.legend(
    [
        "Input to Function, $f(x) = 5x +2$",
        r"Output of Function, $f^{-1}(f(x)) = \frac{f(x)-2}{5} = x$"
    ],
    loc="upper left",
    fontsize=13
)
ax1.axhline(y=0, color='grey', alpha=0.2)
ax1.axvline(x=0, color='grey', alpha=0.2)

ax2.legend(
    [
        "Input to Function, $f(x) = x$",
        r"Output of Function, $f^{-1}(x) = \frac{x-2}{5}$"
    ],
    loc="upper left",
    fontsize=13
)
ax2.axhline(y=0, color='grey', alpha=0.2)
ax2.axvline(x=0, color='grey', alpha=0.2)
plt.show()

To see this as one fluid function composition, let's look at the animation below:

In [48]:
func_1 = original_func
func_2 = inverse_func

# ZOOMED ANIMATION
lower_bound = -4
upper_bound = -1 * lower_bound
composition_upper_bound = upper_bound * 2 + upper_bound 
length = 2000

# Turn off interactive plotting
plt.ioff()                        

# Create figure and axis object   
fig = plt.figure(figsize=(10, 6), dpi=200)       
ax1 = plt.subplot(111)

# Add x and y axis lines
ax1.axhline(y=0, color='grey')
ax1.axvline(x=0, color='grey')

plt.tight_layout()

# Create x input space, plot line x = y
x = np.linspace(lower_bound, upper_bound, length)
y = x 
func_comp = lambda t: func_2(func_1(t))  # full composition: f^{-1}(f(x)) = x
func_comp_y = func_comp(x)

# Create iterable input axes, as well as set color of response curve
ax_x, = ax1.plot(x, y, lw=3, c=sns.xkcd_rgb["soft green"], zorder=1)   
ax_func_1, = ax1.plot(0, 0, lw=3, c=sns.xkcd_rgb["red"], zorder=2)   
ax_func_2, = ax1.plot(0, 0, lw=3, c=sns.xkcd_rgb["red"], zorder=3)   

# Create markers
marker_x, = ax1.plot(lower_bound, 400, 'og', zorder=5)
marker_func_1, = ax1.plot(lower_bound, 400, 'or', zorder=5)
marker_func_2, = ax1.plot(lower_bound, 400, 'or', zorder=5)
# marker_exponentiated, = ax1.plot(lower_bound, 400, 'or', zorder=5)

offset = 0.5 # General offset

# ------------- Create arrow representing func_1 function---------------
func_arrow_func_1 = ax1.annotate(
    '',
    xy=(lower_bound, func_1(lower_bound)),
    xytext=(lower_bound, lower_bound),
    arrowprops=dict(facecolor='black', shrink=0.05),
)

# ------------- Create label for arrow, representing func_1 function ----------------
offset_func_1 = 0.5
epsilon = 0.000001
func_label_func_1 = ax1.annotate(
    func_1.__name__,
    xy=(lower_bound, func_1(lower_bound)/2),
    xytext=(lower_bound + offset_func_1, (func_1(lower_bound) - lower_bound)/2 + offset_func_1),
    arrowprops=dict(
        color='grey',
        arrowstyle="-",
        connectionstyle="angle3,angleA=0,angleB=-90"
    ),
    bbox=dict(boxstyle="square", alpha=0.1, ec="gray"),
    size=20,
)

# ------------- Create arrow representing func_2 function---------------
func_2_hide_coord = -10
func_arrow_func_2 = ax1.annotate(
    '',
    xy=(func_2_hide_coord, func_2_hide_coord),
    xytext=(func_2_hide_coord, func_2_hide_coord),
    arrowprops=dict(facecolor='black', shrink=0.05),
)

# ------------- Create label for arrow, representing func_2 function ----------------
offset_func_2 = 1
shift = 1
func_label_func_2 = ax1.annotate(
    func_2.__name__,
    xy=(func_2_hide_coord, func_2_hide_coord),
    xytext=(func_2_hide_coord+0.01, func_2_hide_coord),
    arrowprops=dict(
        color='grey',
        arrowstyle="-",
        connectionstyle="angle3,angleA=0,angleB=-90"
    ),
    bbox=dict(boxstyle="square", alpha=0.1, ec="gray"),
    size=20,
)


# Composition animation function
def animate_composition(current):
    if round(current, 5) < upper_bound:
        # Gathering x axis metrics
        x = np.linspace(lower_bound, current, length)
        func_1_of_x = func_1(x)

        # Set output curve, marker_x, marker_squared
        ax_func_1.set_data(x, func_1_of_x) 
        marker_x.set_data(current, current)
        marker_func_1.set_data(current, func_1_of_x[-1])

        # Set function arrow head and tail position
        func_arrow_func_1.set_position((current + epsilon, current))
        func_arrow_func_1.xy = (current, func_1_of_x[-1])

        # Label location, followed by label arrow head
        func_label_func_1.set_position((current + offset + epsilon, (func_1_of_x[-1] - current)/2 + offset))
        func_label_func_1.xy = (current, (func_1_of_x[-1] - current)/2 + current)
        
    elif round(current, 5) == upper_bound:
        # End of func 1, start of func 2
        func_arrow_func_1.remove()
        marker_x.remove()
        func_label_func_1.remove()
        
        x = np.linspace(lower_bound, current, length)
        func_1_of_x = func_1(x)
        
        # Updating squared curve to be input to negate function (setting color to green)
        marker_func_1.set_color("green")
        ax1.plot(x, y, lw=3, c=sns.xkcd_rgb["grey"])  
        ax1.plot(x, func_1_of_x, c=sns.xkcd_rgb["soft green"], linewidth=3)

    elif round(current, 5) > upper_bound and round(current, 5) < (upper_bound*3) :
        current -= upper_bound*2 
        
        # Gathering x axis metrics
        x = np.linspace(lower_bound, current, length)
        func_1_of_x = func_1(x)
        x_func_1_func_2 = func_2(func_1_of_x)

        # Set output curve, marker1, marker2
        ax_func_2.set_data(x, x_func_1_func_2) 
        marker_func_1.set_data(current, func_1_of_x[-1])
        marker_func_2.set_data(current, x_func_1_func_2[-1])

        # Set function arrow head and tail position
        func_arrow_func_2.set_position((current + 0.000001, func_1_of_x[-1])) # Arrow tail
        func_arrow_func_2.xy = (current, x_func_1_func_2[-1]) # Arrow head

        # Label location, followed by label arrow head
        func_label_func_2.set_position((current + offset + 0.000001, (x_func_1_func_2[-1] - current)/2 + offset - shift))
        func_label_func_2.xy = (current, (func_1_of_x[-1] - current)/2 + current)   

    return ax_x,

# Composition init function
def init_composition():
    ax1.set_xlim(lower_bound, upper_bound)                               
    ax1.set_ylim(-25, 25) 
    return ax_x,

""" Define steps and create animation object """
step = 0.025
# step = 0.05
steps = np.arange(lower_bound, composition_upper_bound, step)

# Shrink current axis by 20%
box = ax1.get_position()
ax1.set_position([box.x0, box.y0, box.width * 0.65, box.height])

# Put a legend to the right of the current axis
ax1.legend(
    (marker_x, marker_func_1),
    ['Input to function', 'Output of function'],
    loc='center left',
    bbox_to_anchor=(1, 0.5)
)

# # For rendering html video in cell
# gif_video = animation.FuncAnimation(
#         fig,
#         animate_composition,
#         steps,
#         init_func=init_composition, 
#         interval=50,
#         blit=True
#     )


# gif_video.save('initial_to_inverse.gif', writer='imagemagick')

# html_video = HTML(
#     animation.FuncAnimation(
#         fig,
#         animate_composition,
#         steps,
#         init_func=init_composition, 
#         interval=50,
#         blit=True
#     ).to_html5_video()
# )
# display(html_video)
plt.close()

It is clear that the inverse can be utilized in order to reverse a transformation performed on a given input via a specific function.

2.2 Inversion Mechanics

Visually this most likely makes sense, but it can be useful to break down the mechanics of how this actually works. It is helpful to think of inversion as the reversal of a set of steps. Imagine for a moment that you wanted to give instructions to a friend on how to get to your house (home $A$). One way of doing this, assuming you knew where they lived (home $B$), would be to give a list of $right$, $left$ and $straight$ instructions that if followed would allow them to navigate to your location successfully. For example, a set of instructions may look like:

Home $B$ $\rightarrow$ Home $A$
1. Right
2. Straight
3. Straight
4. Left
5. Left
6. Right

If they then wanted to get back to their own home, they could simply reverse the order of your original set of instructions and take the opposite of every instruction you had given. In the above example, we can visualize this as starting at $6$, inverting the instruction, and having that be the new $1$. So, in other words, $right$ gets mapped to $left$ and set as the new 1st instruction. The entire set would be:

Home $A$ $\rightarrow$ Home $B$
1. Left
2. Straight
3. Straight
4. Right
5. Right
6. Left

This process is essentially what is happening when taking a mathematical inverse (in fact, the above direction example can easily be mapped to the mathematical domain via letting $left = 180 ^{\circ}$, $right = 0 ^{\circ}$ and $straight = 90 ^{\circ}$). Consider our original example:

$$f(x) = 5x + 2$$

In this case we have an input $x$, and we perform several transformations of the input domain, effectively mapping it to the output domain. More concretely, we are taking the input domain, $x$, scaling by $5$ (multiplication), and then shifting by 2 (addition):

$$x \rightarrow \text{scale by 5} \rightarrow \text{add 2} \rightarrow \text{output domain: } f(x)$$

When viewing our function, $f$, not simply as a holistic transformation, but rather as a composition of smaller transformations, the inverse begins to make much more sense. In order to return to the original input domain, $x$, from the output domain, $f(x)$, we must reverse the transformations that occurred (just as when dealing with directions). This looks like:

$$f(x) \rightarrow \text{subtract 2} \rightarrow \text{Divide by 5} \rightarrow \text{original input domain: } x$$

An interesting thing to note is that even in this deconstructed view we are still performing mini inversions as we reverse transformations to end up at the original input domain. By subtracting 2 we are essentially performing the inverse of the add 2 operation. The same applies when dividing by 5; we are performing the inverse of multiplying by 5. So a way of thinking of an inverse function is that it is a composition of small inverse transformations, reversing that which was performed by the original function.

From a notational standpoint, we simply take the composition of mini inverse transformations and call it $f^{-1}(x)$:

$$f^{-1}(x) = \frac{x -2}{5}$$

And, for clarity, I find it helpful to have the parameter of $f^{-1}$ be $f(x)$:

$$f^{-1}(f(x)) = \frac{f(x) -2}{5} = x$$

Since that is what we are really trying to map back to the original input domain, $x$.
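
To make this "composition of mini inversions" view concrete, here is a minimal sketch (the step names are hypothetical helpers of my own) where the forward function applies its transformations in order, and the inverse applies the opposite transformations in the reverse order:

def scale_by_5(x):
    return 5 * x

def add_2(x):
    return x + 2

def subtract_2(x):   # inverse of add_2
    return x - 2

def divide_by_5(x):  # inverse of scale_by_5
    return x / 5

x = 3
f_of_x = add_2(scale_by_5(x))                  # forward: 3 -> 15 -> 17
x_recovered = divide_by_5(subtract_2(f_of_x))  # reverse: 17 -> 15 -> 3.0
print(f_of_x, x_recovered)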

3. Exponentials

3.1 Repeated Multiplication of a Variable

Now that we have a solid understanding of what an inverse function is, we can move on to exponentials. We are going to build up our understanding one step at a time, and I really encourage you to let some of this section resonate with you over the course of a few days.

Often exponentiation is viewed as a fundamental mathematical operation, but in reality it is important to remember that it originally was created as a simple way to write repeated multiplication of a specific variable. Formally, it can be written as:

$${base}^n = \overbrace{base \times \dots \times base}^\text{n times}$$

Where if we were to let $n= 7$ and the $base$ be $b$, we would have the following equivalence:

$$b^7 = b \times b \times b \times b \times b \times b \times b$$
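
We can mirror this definition directly in python (a small sketch; repeated_multiplication is a hypothetical helper name) and confirm that it agrees with the built in ** operator:

from functools import reduce
from operator import mul

def repeated_multiplication(base, n):
    # Multiply base by itself n times
    return reduce(mul, [base] * n)

print(repeated_multiplication(2, 7) == 2 ** 7)  # True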

3.2 Abstractions and Hierarchical Thinking

Now, in a sense when we introduce a new notation (here the superscript representing the number of times that we perform a repeated multiplication of the base, $b$), we are in reality introducing a higher level abstraction, or, put another way, adding an additional level to our hierarchy of mathematical processes. This fact is something that is often not explained very well, and the costs are immense. Relating back to the original focus of this post:

By not having a solid understanding of the foundational underpinnings, you have no hope that the intuitive representation you call upon from associative memory links to the necessary abstractions that you utilized.

Put another way, you need to be able to realize (automatically), that exponentiation can intuitively be thought of as repeated multiplication of a base. What often happens in mathematics is that the fact that this notation represents an abstraction is lost; it is treated as a fundamental, similar to addition. You may ask, why are abstractions necessary at all? Why even introduce this new notation and not simply use something such as:

$$\overbrace{b \times \dots \times b}^\text{n times}$$

Think of how often the concept of an exponential comes up in mathematics; a simple answer to our question is that it would be incredibly cumbersome to use the above notation to write out the concept of repeated multiplication of a base. However, there is a deeper principle behind this, and that is abstraction. From the perspective of learning and cognitive science, what we have done is create a new pattern (exponentiation) that is made up of a preexisting pattern (multiplication).

In reality this is how the brain is designed to work; it is structured to think in a hierarchical fashion, and specifically evolved this way because we live in a hierarchical world$^2$. Look to language (characters make up words, which make up sentences, which make up paragraphs, which in turn make up books and thoughts), or biology (molecules make up organic compounds, which in turn make up macromolecules, which make up cells, which make up tissues and organs, which finally make up living organisms)$^{3}$. This idea of abstracting out patterns to be composed of lower level patterns is incredibly prevalent in mathematics. My previous post on functions highlights this; the function $f(x) = 3x + 4$ can be viewed from a high level as a single function transformation, or from a lower level as two specific fundamental transformations, a scaling of the input domain by 3 and a shifting of the scaled input by 4.

Unfortunately, this abstraction is all too frequently either forgotten, or, more commonly, never truly learned and understood in the first place. What ends up happening is the following:

Instead of abstractions that are linked together, they are disconnected. This means that when dealing with a difficult problem where it would be valuable to relate the higher level abstraction of exponentiation back to that of the lower level multiplication, we can't. Now, this isn't to say that we cannot understand the relation or recall it if given a push in the right direction. We do have both concepts in our memory and can make that connection. The issue is that our associative memory will not automatically pull in the hierarchical chain on its own; it requires priming. When trying to solve incredibly complex problems, how useful is it if all you can say is: "If only someone would tell me the steps I need to take and the angle I need to view this problem from, then I could solve it!"? We need to actively ensure that lower level patterns are linked to the higher level abstractions, or risk not being able to make the connections that allow for extraordinary problem solving skills.

3.3 Exponentiation Visualization

With that said, I want us to get a feel for the exponentiation process and its subsequent visualization:

In [40]:
fig, (ax1) = plt.subplots(1, 1, figsize=(8,5))

def exponentiate_base(x, base=2):
    return base**x

lower_bound = 0
upper_bound = 5
composition_upper_bound = 25 
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y_output = exponentiate_base(x)
y_orig = x

ax1.plot(x, y_orig, lw=2, c=sns.xkcd_rgb["green"])
ax1.plot(x, y_output, lw=2, c=sns.xkcd_rgb["red"])

func_arrow_square_1 = ax1.annotate(
    '',
    xy=(3, exponentiate_base(3)),
    xytext=(3, 3),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_1_inp, = ax1.plot(3, 3, 'og', zorder=5)
marker_1_resp, = ax1.plot(3, exponentiate_base(3), 'or', zorder=5)

func_arrow_square_2 = ax1.annotate(
    '',
    xy=(4.1, exponentiate_base(4.1)),
    xytext=(4.1, 4.1),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_2_inp, = ax1.plot(4.1, 4.1, 'og', zorder=5)
marker_2_resp, = ax1.plot(4.1, exponentiate_base(4.1), 'or', zorder=5)

func_arrow_square_3 = ax1.annotate(
    '',
    xy=(5, exponentiate_base(5)),
    xytext=(5, 5),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_3_inp, = ax1.plot(5, 5, 'og', zorder=5)
marker_3_resp, = ax1.plot(5, exponentiate_base(5), 'or', zorder=5)

ax1.set_xlabel('X', fontsize=20)
ax1.set_ylabel('Y', fontsize=20)
plt.legend(["Input to Function, $y=x$", "Output of Function, $f(x)=2^x$"], loc="upper left", fontsize=15)
ax1.axhline(y=0, color='grey', alpha=0.2)
ax1.axvline(x=0, color='grey', alpha=0.2)
plt.show()

Now, I hope that there is an oddity concerning the above exponential curve that sticks out. Clearly locations where $x$ has an integer value, $\{0, 1, 2, 3, 4, 5\}$, make intuitive sense when matched to the explanation we gave earlier about what an exponent was meant to represent: repeated multiplication. For instance, we know that our base is $2$, and if the value of the $x$ exponent is $3$:

$$2^3 = 2 \cdot 2 \cdot 2 = 8$$

Or when $x = 5$:

$$2^5 = 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 32$$

But, clearly our curve is defined at non integer values as well. What exactly does it mean to raise $2$ to the exponent $3.486$? Put another way, what does it mean to multiply $2$ by itself $3.486$ times? We must be able to account for this scenario, or else our exponentiation abstraction will lose its continuity, and subsequently its usefulness. We need to extend our understanding of an exponential from the discrete to the continuous case. Let's try and gain some intuition for what this really means.

3.4 Raising to a fraction

3.4.1 Can we extend our intuitive perspective?

We will start by seeing if we can extend this idea of exponentiation to repeated multiplication in the case of fractional exponents. Often when dealing with mathematical concepts we will encounter the idea of extension. For instance, there may be an intuitive situation that humans encounter where a mathematical principle fits nicely. Then, suddenly there are areas where the principle isn't quite as intuitive and breaks down. This is where the idea of extension comes into play; we extend the principle to include this new terrain.

For example, consider our number system. Originally designed with a base of ten due to the fact that we have 10 fingers, it could be used to count the number of chickens one had, how many soldiers were under command, and so on. Intuitively it was not terribly difficult to wrap one's head around. If you had 5 chickens and gave 3 to a friend, you only had 2 left (i.e. $5 - 3 = 2$). But with the advent of currency, the number system required an expansion. Now say that I had 1,000 dollars but I needed 2,000 in order to secure a new home. I may borrow the additional 1,000 from someone, meaning my net balance is -1,000. The number system needed to be expanded in order to account for this; enter negative numbers. The same thing occurred when complex numbers were established. The system needed to be expanded to account for the fact that it had no value for the quantity $\sqrt{-1}$, and yet it arose in areas such as electrical engineering.

This same principle applies in our present situation. While it may not be immediately intuitive, we need a way to extend our system. We will start by doing so via a logical progression. First, consider the following:

$$2^3 = 2 \cdot 2 \cdot 2 = 8$$

Above, we can see that we had $3$ repeated multiplications by $2$. Now, consider:

$$2^5 = 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 32$$

Now we have $5$ repeated multiplications by $2$. So, we can view the exponent as the number of repeated multiplications we have by $2$ (our base). How about:

$$2^{\frac{1}{2}} = ?$$

Well, if we extend our thought process we may come to the conclusion that we now simply have half a multiplication by $2$. In other words, $2^{\frac{1}{2}}$ is the number that yields half a multiplication by $2$. In order to get a full multiplication by $2$, we must multiply that number twice:

$$2^{\frac{1}{2}} \cdot 2^{\frac{1}{2}} = 2^1$$

This can be shown via a simple example. Take the following equation:

$$5 \cdot 7 = 35$$

Now, I am going to define a variable $a$ to be equal to one half of a multiplication by $5$, ($a = 5^{\frac{1}{2}}$). Specifically, I am going to define $a = 2.23606797749979$. Now, based on our above reasoning, two half multiplications by $5$ should be equivalent to a single multiplication by $5$. Hence:

$$a \cdot a \cdot 7 = 35$$

Let's see how that looks via python. We can see that $a \times 7$ is equal to:

In [7]:
a = 2.23606797749979
a_times_seven = a * 7

display(f"a * 7 = {a_times_seven}")
'a * 7 = 15.652475842498529'

Now we can multiply that above value, $a \times 7$, times another $a$: $a \times (a \times 7)$

In [8]:
a_times_a_times_seven = a * a_times_seven

display(f"a * a * 7 = {a_times_a_times_seven}")
'a * a * 7 = 35.0'

And we can see that we end up at just about 35 (numerical precision errors may arise due to my setting of $a$). However, there is something that is not quite right about this intuitive extension. Let's look at the exact verbiage I used when defining $5^{\frac{1}{2}}$: one half of a multiplication by $5$. I hope you can see that this is incredibly ambiguous; it could just as well represent $1 \times 5 \times \frac{1}{2}$. I don't think anyone can argue that that also is a way to think about one half of a multiplication by $5$. So, our intuitive phrasing has essentially led us to an inconsistency:

$$a \cdot a \cdot 7 \neq 2 \cdot a \cdot 7 $$

We know that the above terms on the left and right are in fact not equivalent, however, unless we determine a better way to extend our concept of exponentiation, and how fractional exponents are handled, this is what we will be left with; an inconsistent extension.

3.4.2 Can we formally derive a solution without thinking outside the box?

Now, we can also take a more formal perspective. Recall our earlier definition of an exponential:

$$b^n = \overbrace{b \times \dots \times b}^\text{n times}$$

We have a simple extension to allow $n$ to take on negative values. We can write:

$$b^{-n} = \frac{1}{\underbrace{b \times \dots \times b}_\text{n times}}$$

Where we can see via an example that this holds:

$$\frac{b^3}{b^5} = b^{-2} = \frac{1}{b^2}$$

And we can let $b = 2$:

$$\frac{2^3}{2^5} = \frac{8}{32} = \frac{1}{4}$$
$$\frac{2^3}{2^5} = 2^{-2} = \frac{1}{2^2} = \frac{1}{4}$$
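
And a one line check of the $b = 2$ example in python:

print(2**3 / 2**5, 2**-2, 1 / 2**2)  # 0.25 0.25 0.25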

Now, again we ask: what if $n$ is a fraction?

$$b^{\frac{1}{n}}$$

Well, let's say for a moment that this value, $b^{\frac{1}{n}}$, is equal to $x$:

$$b^{\frac{1}{n}} = x$$

If we raise each side to the $n$:

$$\big(b^{\frac{1}{n}}\big)^n = \overbrace{ b^{\frac{1}{n}} \cdot b^{\frac{1}{n}} \cdots b^{\frac{1}{n}}}^\text{n times} = b$$
$$b = x^n$$

Keeping in mind that our goal is to find $x$, the value that $b^{\frac{1}{n}}$ is equal to, we need to do the following:

Figure out what number, multiplied by itself $n$ times, is equal to $b$. That number is $x$, the value of $b^{\frac{1}{n}}$.
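
For example, letting $b = 8$ and $n = 3$ (a quick sketch of the reasoning above):

b, n = 8, 3
x = b ** (1 / n)  # the number that, multiplied by itself n times, equals b
print(x)          # 2.0
print(x * x * x)  # recovers b: 8.0 (up to floating point precision)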

3.4.3 Did that actually yield the intuitive understanding we were looking for?

Now, there may still be confusion about determining these values in practice. Sure, if you are given $8^{\frac{1}{3}}$ you may be able to see that $2 \cdot 2 \cdot 2$ yields $8$, and you have your answer. But what about the many scenarios where things aren't quite so simple? For instance, what about $2^{0.848}$? There are different ways of thinking about this, for instance whether $0.848$ can easily be written as a fraction. However, I want to take another approach right now. First, I want to point out that while this may feel uncomfortable (the fact that an answer doesn't come easily to mind is never fun), realize that it isn't that different from long division. What exactly do I mean by that?

Consider the following. You undoubtedly feel very comfortable with the concept of division; simply how many times one quantity can fit into another. For instance, $\frac{30}{6}$ most likely gives no pause, with the number $5$ coming to mind instantly. However, what about $\frac{839234235789}{45689458}$? I would reason that a number doesn't exactly pop into your head. Long ago you were most likely taught the long division algorithm, but hopefully haven't had to use it in years due to having a calculator on hand. The key point is that there was a particular algorithm that was used to actually solve for the correct value. In rather basic cases, the algorithm is not necessary. But the point remains that a set of steps is used to reach a concrete conclusion (a numeric output) to an abstract problem (how many times does one quantity go into another). Not knowing the algorithm (long division) doesn't hinder your understanding of the actual abstraction.

The above situation is very much like the one we are in now. We have a quantity that we want to determine, namely $2^{0.848}$. The basic scenario makes intuitive sense and is very easy to reason about (i.e. $2^4$ or $8^{\frac{1}{3}}$). However, its extension is a bit harder to solve and will require more advanced reasoning. The point is the same as with long division: knowing the algorithm isn't essential in order to be productive with exponentials. The key to really keep in mind is as follows:

There is some number that $2^{0.848}$ evaluates to. It is somewhere between $1$ and $2$. In order to solve for it you most likely need a more advanced extension.
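
As an aside, the root extension we just derived already gives us one (inefficient, but intuitive) way to pin this number down: $0.848 = \frac{848}{1000}$, so $2^{0.848}$ is the $1000$th root of $2^{848}$. A quick sketch:

# 0.848 = 848/1000, so 2**0.848 is the 1000th root of 2**848
via_fraction = (2 ** 848) ** (1 / 1000)
print(via_fraction)  # ~1.7998
print(2 ** 0.848)    # native python agrees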

Now, let's take a deep dive into that advanced extension!

3.4.4 Exponentials as repeated multiplication: A specific case of a general function

As stated several times already, we generally think of exponentials as repeated multiplication:

$$2^3 = 2 \times 2 \times 2$$

I want to define the function $f$ to represent the concept of exponentiation:

$$f(x \mid b) = b^x = \overbrace{b \times \dots \times b}^{\text{x times}}$$

And, for notational purposes I will from here on out drop the $b$ as a given in the function input; but do not forget that it is indeed there. Now, as discussed earlier, a consequence of our notation is that:

$$2^{3+5} = \overbrace{2 \times 2 \times 2}^{\text{3 times}} \times \overbrace{2 \times 2 \times 2 \times 2 \times 2}^{\text{5 times}}$$
$$2^{3+5} = 2^3 \cdot 2^5$$

This can generally be written as:

$$2^{x+y} = 2^x \cdot 2^y$$

This is known as the exponential property. Now, what does this say about $f$ (our function that is representing exponentiation)? Well, $f$ clearly possesses the property:

$$f(x+y) = f(x) \cdot f(y)$$
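
Note that nothing about this property requires integer inputs, which we can quickly check numerically (using the math module imported at the top of the post):

x, y = 1.3, 2.4
print(math.isclose(2 ** (x + y), 2 ** x * 2 ** y))  # True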

Now, again, when $x$ is a negative or fractional number (i.e. $2^{\frac{1}{2}}$, $2^{-1}$), our intuitive understanding of exponentiation breaks down. What does it mean to multiply $2$ by itself half of a time?

To get around this we do something that is very common throughout mathematics: We extend the original definition, which only makes sense for counting numbers, to something that applies to all sorts of numbers. However, we do not do this randomly!

Fractional and negative exponent definitions are entirely motivated to be sure that the exponential property, $2^{x+y} = 2^x \cdot 2^y$, holds.

Now, let's consider what the above property means from the perspective of group theory (see appendix B). It is saying that adding the inputs, $x$ and $y$, corresponds to multiplying the outputs of the function, $2^x$ and $2^y$. To make this more clear, recall that from a group theory perspective, addition is defined as follows:

$$addition(\overbrace{x,y}^\text{adders}) = \overbrace{x + y}^\text{adder}$$
$$addition(3, 4) = 3 + 4 = 7$$

We see that addition takes in two adders (shifts), and outputs an adder (another shift, the sum of the two adders).

Multiplication is defined as follows:

$$multiplication(\overbrace{x ,y}^\text{multipliers}) = \overbrace{x \cdot y}^\text{multiplier}$$
$$multiplication(3, 4) = 3 \times 4 = 12$$

Where in this case multiplication takes in two multipliers (stretches) and outputs a multiplier (another stretch, the product of two multipliers).

So, in our case of exponentiation, we can think of our inputs to $f$ not simply as numbers, but as members of the additive group of sliding actions. Likewise, we think of the outputs of the function not merely as numbers, but as members of the multiplicative group of stretching and squishing actions.

Hence, exponentiation is really a function that satisfies the property discussed earlier:

$$f(\overbrace{x+y}^\text{adders}) = \overbrace{f(x) \cdot f(y)}^\text{multiplier}$$
$$exponentiation(\overbrace{x+y}^\text{adders}) = \overbrace{b^x \cdot b^y}^\text{multiplier}$$

To be clear, our input is an adder because, as discussed in the appendix, $x+y$ is the addition of $x$ and $y$, which outputs an adder. Likewise, our output is a multiplier since it is the multiplication of $f(x)$ and $f(y)$, which is a multiplier. So, $f$ takes in an adder, $x+y$, and outputs a multiplier, $f(x) \cdot f(y)$.

If this is all a lot to digest, let us try to break it down as follows. I want us to treat every real number simultaneously as three things:

  1. A point on the infinitely extending line.
  2. An adder. This is an action that slides the line along itself.
  3. A multiplier. This is an action that stretches or squishes the line.

To be clear, I am not saying that adders and multipliers are just ways to visualize what addition and multiplication do by moving around numbers on the line. Rather, we should think of numbers primarily as being movements of the pure unlabeled line, and associating them with points on the line is just a convenient way to record the specific movement of each adder/multiplier. If we wanted to dig into the weeds, we could technically say that multipliers don't act on the line; rather, they transform adders to adders. So, technically, we have points, functions of points (adders), and functions of functions of points (multipliers). Each number is an overloaded term referring to any one of these three things.

Now, for any function $f$ that satisfies our property $f(x+y) = f(x) \cdot f(y)$ (takes in an adder and outputs a multiplier), we will write it as:

$$b^x$$

Where:

$$b = f(1)$$

Now, this awful notation originates from the relationship that our function $f$ has with repeated multiplication, which originally defined exponentiation. However, as we extend the idea of exponentiation we see that repeated multiplication is simply a specific case within a more general framework. Note that the reason I describe this notation as awful is because of the priming that it triggers; it immediately causes one to think of repeated multiplication, and particularly that repeated multiplication defines exponentiation.

In reality repeated multiplication arises as a specific case of exponentiation and its defining properties.

Note that historically we did in fact start thinking about exponentiation as repeated multiplication. So, if $x$ happens to be a counting number ($1,2,3,...$), we can write it as:

$$\overbrace{1+ \dots + 1}^\text{x times}$$

And hence the property $f(x+y) = f(x) \cdot f(y)$ implies:

$$f(x) = f \Big( \overbrace{1+ \dots + 1}^\text{x times} \Big) = \overbrace{ f(1) \dots f(1) }^\text{x times}$$

Where if $x = 5$:

$$f(5) = f \Big( 1 + 1 + 1 + 1 + 1 \Big) = f(1) \cdot f(1) \cdot f(1) \cdot f(1) \cdot f(1) = b \cdot b \cdot b \cdot b \cdot b = b^5$$

Now, one of these functions, $f$, is in some sense the most "natural", and we write it either as $exp(x)$ or $e^x$. Note, this just means that the number $e$ is the value of the special function at 1:

$$f(1) = e$$

In other words, using past terminology, at $f(1)$ the base is seen to be $e$. If it is unclear what I mean by special function, remember that currently we are just defining $f$ as any function that satisfies the property $f(x+y) = f(x) \cdot f(y)$. Many functions can do that, for instance:

$$2^x, \quad 5^x, \quad \pi^x, \quad e^x$$

Of all of these functions, I am simply saying that one is more "natural", and that is $e^x$. $e$ is simply the value of this "natural" function when evaluated at $x=1$; i.e. $e$ is the base. Now, this particular function is defined explicitly with the following infinite sum:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \dots$$

For those unfamiliar or a bit rusty with infinite sums and series, it is important to know that many functions can be defined by a given series. To make this clear, let's say I have a function $g$, and it is defined as:

$$g(x) = x^2 + 2x^3 - 5$$

Clearly, the polynomial $x^2 + 2x^3 - 5$ defines $g$. Likewise, if we have our special function, $f$, which notationally we write as $e^x$, there may be no simple polynomial that can define it; in that case we can use a series.

Generally, this series is derived with a good deal of handwaving, and references to the fact that $e^x$ is its own derivative (see appendix A). However, we will walk through a much more natural derivation in the next section. Before we do though, I would like to have a slight aside on mathematical function notation.

Function Notation

I have already mentioned my dislike of the notation $b^x$ for exponentials, but I think that it is worth digging into the concept of other special functions which are given their own unique notation. We can define any function we want via a generic variable, say $g$. We can let $g$ be:

$$g(x) = x^2 + 2x^3 - 5$$

As discussed in the previous section. Now, $g$ as defined above has nothing particularly special about it; however, there are certain types of functions which come up so frequently that it makes sense to provide a special and unique notation. For instance, if $g$ is defined as:

$$g(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} -\frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n + 1)!}x^{2n + 1}$$

You may recognize that this is the series representation of the $sin$ function. Considering that the $sin$ function comes up frequently in mathematics, it makes sense to simply write $sin$, instead of the entire series above, every time we want to use it. It is a useful abstraction; we are moving up one level in our hierarchy and it enables us to free up our brain to solve more challenging problems (instead of trying to dissect what series we are even dealing with). To be clear though, if you try to evaluate the following expression via python or a calculator:

$$sin(x) \cdot 5x + 4x^2 - 10$$

The $sin$ portion will in fact be evaluated via the series representation:

$$\Big( \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n + 1)!}x^{2n + 1} \Big) \cdot 5x + 4x^2 - 10$$

Again, it is just an abstraction. The same thing applies to $cos$, $tan$ and other trigonometric functions. Likewise, the same holds for our exponential function! Repeating the same argument from above, if I define a function $g$ as:

$$g(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \dots$$

That is a completely acceptable and appropriate function definition. However, if this particular $g$ is used very frequently, it may make sense to define an alternate notation and abstract the series away. The notation used for this particular function is $e^x$. Again, just as $sin$ was notation that we chose, $e^x$ is notation that we chose (for historical reasons). But, do not forget that it is an abstraction for the underlying series that actually defines it.
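
To drive the point home, here is a minimal sketch of evaluating $sin$ via the series above (the function name is mine, and the series is simply truncated after upper_n terms):

def taylor_series_sin(x, upper_n):
    # sum_{n=0}^{upper_n - 1} (-1)^n / (2n + 1)! * x^(2n + 1)
    accumulator = 0
    for n in range(upper_n):
        accumulator += (-1) ** n / math.factorial(2 * n + 1) * x ** (2 * n + 1)
    return accumulator

print(taylor_series_sin(1.2, 10))  # ~0.93204
print(math.sin(1.2))               # native python agrees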

3.4.5 Constructing the series for $e^x$

Let's imagine for a moment that we didn't know the series defining $e^x$, and we were asked to define a function $f(x)$ with the functional property $f(x+y) = f(x) \cdot f(y)$ for all numbers $x$ and $y$. How could we go about constructing such a function? If we only knew how to add and multiply numbers, the only functions we can define explicitly are polynomials, so we will begin by assuming $f$ is of the form:

$$f(x) = c_0 + c_1x + c_2x^2 + \dots + c_nx^n$$

For some constants, $c_0,\dots,c_n$. To start, adding $0$ to the input should not change the output, which from our property means that:

$$f(x) = f(x+0) = f(x) \cdot f(0)$$

In order for the above equality to hold, $f(0)$ must equal $1$. Hence, if we evaluate $f(0)$:

$$f(0) = 1 = c_0 + c_1 \cdot 0 + c_2\cdot 0^2 + \dots + c_n \cdot 0^n$$
$$c_0 = 1$$

Now, we can write our defining property $f(x+y) = f(x) \cdot f(y)$ in terms of our polynomial, but for now only look at the terms of degree at most $1$:

$$1 + c_1(x + y) + \dots = (1 + c_1x + \dots)(1 + c_1y + \dots)$$
$$1 + c_1x + c_1y + \dots = 1 + c_1x + c_1y + \dots$$

So, no matter what $c_1$ is, each side will look the same to start (and then continue with some higher order terms). This means that we can freely choose what $c_1$ will be! Note that this will be the only choice we are allowed to make; from here on out we will be constrained. This choice will completely determine which exponential our function $f(x)$ will be (see 3.4.6). The number that feels the least arbitrary is $1$, so we can let $c_1 = 1$. We will come back to what would have happened if we chose a different value soon.

Next, we can expand our expression a little further in order to look at the quadratic terms ($x^2, xy, y^2$):

$$\dots + c_2(x + y)^2 + \dots = (1 + x + c_2x^2 + \dots)(1 + y + c_2y^2 + \dots)$$
$$\dots + c_2x^2 + 2 c_2 xy + c_2y^2 + \dots = \dots + c_2x^2 + xy + c_2y^2 + \dots$$

Above our $x^2$ and $y^2$ terms are guaranteed to match up, since on either side they have a single $c_2$ coefficient, but for the $xy$ terms to match up on either side we must have:

$$2 c_2 xy = xy$$
$$2 c_2 = 1$$
$$c_2 = \frac{1}{2}$$

To be sure that we have found the correct pattern, we can look at the cubic terms ($x^3, x^2y, xy^2, y^3$):

$$\dots + c_3(x+y)^3 + \dots = (1 + x + \frac{1}{2}x^2 + c_3x^3 + \dots)(1 + y + \frac{1}{2}y^2 + c_3y^3 + \dots)$$
$$\dots + c_3x^3 + 3c_3x^2y + 3c_3xy^2 + c_3y^3 + \dots = \dots + c_3 x^3 + \frac{1}{2}x^2y + \frac{1}{2}xy^2 + c_3y^3 + \dots$$

We can see that these terms will match up if $c_3 = \frac{1}{6}$:

$$3c_3x^2y = \frac{1}{2}x^2y$$
$$3c_3 = \frac{1}{2}$$
$$c_3 = \frac{1}{2 \cdot 3} = \frac{1}{6}$$

If we continue along with this process, we will see that at each step we require:

$$n c_n = c_{n-1}$$

And hence, each term must be:

$$c_n = \frac{1}{n!}$$

With that said, in order to be sure that our series is in fact correct, we would still have to verify that, once forced into the above coefficients, every term will indeed match up. In other words, how do we know that the $x^my^k$ term of $f(x+y)$ will always be the same as that of $f(x)\cdot f(y)$, no matter what the values of $m$ and $k$? Well, we can expand the polynomials of $f(x)$ and $f(y)$ and see that:

$$f(x) \cdot f(y) = (\dots + \frac{1}{m!} x^m+ \dots)(\dots + \frac{1}{k!}y^k + \dots) = \dots + \frac{1}{m!} \frac{1}{k!} x^m y^k+\dots $$

Where we know that the coefficient of the $m$th order term is $\frac{1}{m!}$ based on our above reasoning. Likewise, for $f(x + y)$ we can expand the relevant term:

$$f(x + y) = \dots + c_{m+k}(x + y)^{m + k} + \dots = \dots + \frac{1}{(m+k)!}{m + k \choose m} x^m y^k + \dots = \dots + \frac{1}{m!k!} x^m y^k + \dots$$

Where $c_{m+k}$ is $\frac{1}{(m+k)!}$, and ${m+k \choose m} = \frac{(m+k)!}{m!k!}$. Note that above we made use of the binomial expansion.

Now with all of that said, I want to stress the following:

We should be thinking of exponentials purely in terms of the property $f(x+y) = f(x) \cdot f(y)$, and not in terms of the infinite sum that we used to define them explicitly. The infinite sum should be used only for an explicit computation (which is exactly what python does when we interact with exponentials), as well as to prove that such a function satisfying our property even exists in the first place.
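
If you would like to verify the coefficient matching symbolically rather than by hand, here is a small sketch using sympy (an extra dependency, not imported above). We truncate the series at degree $N$ and confirm that every term of $f(x+y) - f(x) \cdot f(y)$ with total degree below $N$ cancels:

import sympy as sp

x, y = sp.symbols('x y')
N = 6  # truncation degree; any N works, larger is just slower

def f(t):
    # Truncated series with c_n = 1/n!
    return sum(t ** n / sp.factorial(n) for n in range(N))

diff = sp.expand(f(x + y) - f(x) * f(y))

# Only truncation leftovers (total degree >= N) should survive
low_order = [term for term in diff.as_ordered_terms()
             if sp.Poly(term, x, y).total_degree() < N]
print(low_order)  # [] -> all low order terms match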

3.4.6 Choosing a different $c_1$

I mentioned that the value we chose for $c_1$ would completely define our exponential. We chose $c_1$ to be $1$. But what if we had chosen some other number? Well, choosing another number corresponds to defining an exponential function whose base is not $e$. For instance, we could choose to define $2^x$. How would we do this? Well, making use of the derivative of $2^t$ (see appendix A), and knowledge of a Taylor Series, it turns out that $c_1$ is really just equal to the natural log of our base! So, for $f(x) = e^x$:

$$c_1 = ln(e) = 1$$

And for $f(x) = 2^x$:

$$c_1 = ln(2)$$

And $f(x) = a^x$

$$c_1 = ln(a)$$

This value of $c_1$ is then propagated through the rest of the series, leaving us with:

$$f(x) = 1 + x\cdot ln(a) + \frac{x^2 \cdot ln(a)^2}{2!} + \frac{x^3 \cdot ln(a)^3}{3!} + \dots + \frac{x^n \cdot ln(a)^n}{n!} + \dots$$

I want to repeat once again that there is nothing special about $e$ that allows $c_1$ to equal $1$ in our series; rather, $e$ is defined to be the base of the special case where $c_1 = 1$, which corresponds to the fact that the derivative of $e^x$ is simply $e^x$. These properties define the number $e$.
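
We can sanity check this numerically (a sketch; the helper name is mine, and log is the natural log imported from math at the top of the post):

def taylor_series_exponential(x, base, upper_n):
    # Series for base**x, with c_1 = ln(base) propagated through each term:
    # sum over n of (x * ln(base))**n / n!
    accumulator = 0
    for n in range(upper_n):
        accumulator += (x * log(base)) ** n / math.factorial(n)
    return accumulator

print(taylor_series_exponential(3.48, 2, 50))  # ~11.158
print(2 ** 3.48)                               # native python agrees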

3.4.7 Exponential implementation in python

I mentioned that when performing explicit computation of exponentials, python actually implements the function via its series expansion. However, I don't want you to just take my word for it; let's actually put together an implementation quickly. I will redefine the expansion below:

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots $$

And our python function will essentially look like:

import math

def taylor_series_exponential_base_e(x, upper):
    # Sum the first `upper` terms of the Taylor series for e^x
    accumulator = 0
    for n in range(0, upper):
        accumulator += x**n / math.factorial(n)
    return accumulator

Let's see how it works in practice via taylor_series_exponential_base_e(3.48, 10), comparing to $e^{3.48}$:

In [50]:
def taylor_series_exponential_base_e(x, upper_n):
    accumulator = 0
    for n in range(0, upper_n):
        accumulator += x**n / math.factorial(n)
    return accumulator
In [51]:
display(f"Our taylor series exponential: {taylor_series_exponential_base_e(3.48, 10)}")
display(f"Native python exponential: {e**3.48}")
'Our taylor series exponential: 32.35631136853784'
'Native python exponential: 32.45972207586379'

Excellent, it is close to the value calculated by native python, and with $n$ only being equal to $10$. Let's see how it performs with $n=100$, taylor_series_exponential_base_e(3.48, 100):

In [52]:
display(f"Our taylor series exponential: {taylor_series_exponential_base_e(3.48, 100)}")
'Our taylor series exponential: 32.45972207586377'

We now have a nearly perfect approximation. We can see how our function approaches that of native python via the plot below:

In [53]:
num_n = 20
n = np.arange(num_n)

decimal_power = 3.48

built_in_python = math.e ** decimal_power
built_in_python = np.full(num_n, built_in_python)
taylor_series_exp = []
for i in n:   
    taylor_series_exp.append(taylor_series_exponential_base_e(decimal_power, i))
In [54]:
trace1 = go.Scatter(
    x=n,
    y=built_in_python,
    marker = dict(
        color = 'red',
    ),
    name="Built in Python"
)

trace2 = go.Scatter(
    x=n,
    y=taylor_series_exp,
    marker = dict(
        color = 'green',
    ),
    name="Taylor Series Approximation"
)

data = [trace1, trace2]
layout = go.Layout(
    showlegend=False,
    width=750,
    height=450,
    title=r"$\text{Built in Python Exponential vs. Taylor Series Approximation, } e^{3.48}$",
    xaxis=dict(title="n - Upper bound of Taylor series"),
    yaxis=dict(title="Output")
)

fig = go.Figure(data=data, layout=layout)

# plotly.offline.iplot(fig)
html_fig = plotly.io.to_html(fig, include_plotlyjs=True)
display(HTML(html_fig))

And we can also view this another way, by plotting the difference and seeing how it approaches $0$ as $n$ increases:

In [55]:
trace1 = go.Scatter(
    x=n,
    y=built_in_python - taylor_series_exp,
    marker = dict(
        color = 'red',
    ),
)

data = [trace1]
layout = go.Layout(
    showlegend=False,
    width=750,
    height=450,
    title=r"$\text{Difference Between Built in Python Exponential vs. Taylor Series Approximation, } e^{3.48}$",
    xaxis=dict(title="n (Upper bound of Taylor series)"),
    yaxis=dict(title="Difference")
)

fig = go.Figure(data=data, layout=layout)

# plotly.offline.iplot(fig)
html_fig = plotly.io.to_html(fig, include_plotlyjs=True)
display(HTML(html_fig))

Finally, we can see how our function performs when our exponent, $x$, is variable, for different upper bounds:

In [56]:
n = [5,10,15,20]
x = np.arange(11)

built_in_python = math.e ** x
taylor_series_exp = []
for i in n:
    series_for_specific_n = []
    for _x in x:
        series_for_specific_n.append(taylor_series_exponential_base_e(_x, i))
    taylor_series_exp.append(series_for_specific_n)
In [57]:
trace1 = go.Scatter(
    x=x,
    y=built_in_python,
    marker = dict(
        color = 'red',
    ),
    name="Built in Python"
)

trace2 = go.Scatter(
    x=x,
    y=taylor_series_exp[0],
    name=f"Taylor Series Approximation, n = {n[0]}"
)

trace3 = go.Scatter(
    x=x,
    y=taylor_series_exp[1],
    name=f"Taylor Series Approximation, n = {n[1]}"
)

trace4 = go.Scatter(
    x=x,
    y=taylor_series_exp[2],
    name=f"Taylor Series Approximation, n = {n[2]}"
)

trace5 = go.Scatter(
    x=x,
    y=taylor_series_exp[3],
    name=f"Taylor Series Approximation, n = {n[3]}"
)

data = [trace1, trace2, trace3, trace4, trace5]
layout = go.Layout(
    width=850,
    height=450,
    title="Built in Python Exponential vs. Taylor Series Approximation",
    xaxis=dict(title="n - Upper bound of Taylor series"),
    yaxis=dict(title="Output")
)

fig = go.Figure(data=data, layout=layout)

# plotly.offline.iplot(fig)
html_fig = plotly.io.to_html(fig, include_plotlyjs=True)
display(HTML(html_fig))

We can see very clearly that as we let $n \rightarrow \infty$ our series converges to that of built in python. With only 5 lines of code we have implemented the Taylor series expansion of $e^x$, its analytical definition.

What is the point?

I want to take a moment to stop and discuss the reason for digging so deep into the exponential function in the previous section. Remember, the goal of this entire post is to discuss the importance of being able to transition between low and high level thinking. Often when being taught mathematics (and problem solving in general), this is "achieved" via specific questions that are meant as primers. In the real world, we cannot expect to be provided with primers and must have a deep understanding of the concepts we are working with.

As we can see, the exponential can be viewed on many different levels. Those who can understand those levels, relate one to another, pull them from associative memory, and call upon them when needed have a tremendous advantage when it comes to solving open ended problems.

4. Logarithms

Having explored the depths of inverse functions and exponentiation, it is time to move on to logarithms. Now, logarithms have some wonderful properties and we will get to them all in due time. But first, it is important to build up a very intuitive understanding of what they actually are. What we are going to see is that logarithms are actually the inverse of the exponential function.

4.1 Logarithms as repeated divison via a base

To start, it is worth recalling our original mathematical definition of exponentiation:

$${b}^n = \overbrace{b \times \dots \times b}^\text{n times}$$

We can move this to the more familiar functional notation:

$$f(x) = {b}^x = \overbrace{b \times \dots \times b}^\text{x times}$$

Now, for the moment we are only going to consider the behavior of $f$ when dealing with integer values of $x$. Let's calculate $f$ for $x = \{0, 1, \dots, 10\}$, when $b=2$:

| $x$ | $2^x$ | $f(x)$ |
|-----|-------|--------|
| 0 | $2^0$ | 1 |
| 1 | $2^1$ | 2 |
| 2 | $2^2$ | 4 |
| 3 | $2^3$ | 8 |
| 4 | $2^4$ | 16 |
| 5 | $2^5$ | 32 |
| 6 | $2^6$ | 64 |
| 7 | $2^7$ | 128 |
| 8 | $2^8$ | 256 |
| 9 | $2^9$ | 512 |
| 10 | $2^{10}$ | 1024 |

Thinking back to our discussion on inverse functions, I pose the following question:

If we know the output of the function, how can we reverse our steps in order to find the specific input, $x$, that led to it?

Let's look at the particular example of $f(x) = y = 32$. We can see from the table that in order to get $f(x) = 32$, $x$ must be equal to $5$. We also know that our base is $2$. What $f$ essentially did was multiply together $x$ $2$'s, where in this case $x = 5$:

$$2 \times 2 \times 2 \times 2 \times 2$$

So, what we really need to do is take our output, $32$, and divide by the base, $2$, until we reach our starting point (which in reality is $1$). So, that may look like:

$$\frac{\frac{\frac{\frac{\frac{32}{2}}{2}}{2}}{2}}{2} = 1$$

Another way of writing this that may be a bit more pleasant to the eye is:

$$32 \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = 1$$

Where here we are performing repeated division. What is very important to acknowledge is the similarity between what we just did above and the way that we found the inverse function earlier. Previously, we found the inverse function by taking the output and, in reverse order, performing the inverse of each operation in the original function $f$ applied to the input. We are doing the same thing here! Our function, $f$, takes an input $x$, and then multiplies $1$ by the base, $b$, $x$ times. In order to reverse that we simply need to take the output and divide by the base, $b$, until we get $1$. The number of divisions that it takes in order to reach $1$ is equal to $x$. The key takeaway from this is:

The logarithm is the inverse of the exponential function.

To put this a bit more formally, we can write the following:

$$f(x) = b^x = \overbrace{b \times \dots \times b}^\text{x times}$$
$$f^{-1}(x) = x \times \overbrace{\frac{1}{b} \times \dots \times \frac{1}{b}}^{f^{-1}(x) \text{ times}}$$

Where, if the input to our inverse function is $f(x)$:

$$f^{-1}\big(f(x)\big) = f(x) \times \overbrace{\frac{1}{b} \times \dots \times \frac{1}{b}}^\text{x times} = x $$

Something that was a bit challenging for me to fully internalize was how to represent this function in a purely mathematical way. You may have noticed that the way I defined $f^{-1}(x)$ is actually somewhat circular! Let's take a look again:

$$f^{-1}(x) = x \times \overbrace{\frac{1}{b} \times \dots \times \frac{1}{b}}^{f^{-1}(x) \text{ times}}$$

And let $x = 32$:

$$f^{-1}(32) = 32 \times \overbrace{\frac{1}{2} \times \dots \times \frac{1}{2}}^{f^{-1}(x) \text{ times}}$$

We want to know the original exponent that our base of $2$ was raised to in order to yield $32$, which in this case is $5$. However, the above function requires dividing by $2$, $f^{-1}(x)$ times; the issue is that we don't know $f^{-1}(x)$, as it is the exact thing we are trying to solve for! So, how do we move forward? Well, we could count the number of times we divide by $2$, but representing this mathematically is slightly confusing at first. We will go through a mathematical representation and a python implementation.

First we could represent $f^{-1}(x)$ as:

$$ f^{-1}(x) = \begin{cases} n & \text{if } \; \frac{x}{b} = 1 \\ f^{-1}\big(\frac{x}{b}\big), \;\; n = n + 1 & \text{otherwise} \end{cases} $$

Where $n$ starts at $1$. This recursive definition may look a little strange at first, so we can walk through how it would work in our particular example. We would start by passing in $32$:

$$f^{-1}(32)$$

We would first check to see if $\frac{x}{b}$ was equal to $1$. In this case that is $\frac{32}{2} = 16$, and hence that condition is not hit. Hence, we move to the second statement, which says we call $f^{-1}$ passing in $\frac{x}{b}$, and we also increment $n$ (so it now has a value of $2$). In this case that means we are passing in $\frac{32}{2} = 16$. We can evaluate $f^{-1}(16)$ the same way as we did the original: we check to see if the input, in this case $x=16$, divided by $b$ is equal to $1$. It is equal to $8$, so this is not true. We move to the next line, where we increment $n$ by 1 (so its value is now $3$), and then call $f^{-1}$ with $8$ as an input. We continue this for two more iterations. When we finally pass in $2$, we see that $\frac{x}{b} = \frac{2}{2} = 1$, and our first condition is hit. This means that we return the value of $n$, which is currently $5$; our function works! Its recursive progression can be seen in the table below:

| $x$ | $\frac{x}{b}$ | $n$ initial | $n$ updated |
|-----|---------------|-------------|-------------|
| 32 | 16 | 1 | 2 |
| 16 | 8 | 2 | 3 |
| 8 | 4 | 3 | 4 |
| 4 | 2 | 4 | 5 |
| 2 | 1 | 5 | NA |

The equation above is one way of writing the logarithm in a purely mathematical notation. Note that while mathematical functions are not supposed to contain state (which I am doing above), an algorithmic process used to find a particular value can in fact contain state. Consider the algorithm used in long division! There is a great deal of intermediate state held in this process. We are simply doing the same thing above, since traditional mathematical notation does not have a great way to represent a sequence of unknown length, whose length can be found via an iterative process. So, from a technical standpoint my "function" above is more appropriately referred to as an algorithm. However, the important point to take away is the actual implementation.

A python implementation would look something like:

def log_base_2(x, n=1):
    # Counts divisions by 2; assumes x is a positive integer power of 2
    b = 2.0
    if x / b == 1.0:
        return n
    n += 1
    return log_base_2(x/b, n)
In [89]:
def log_base_2(x, n=1):
    b = 2.0
    if x / b == 1.0:
        return n
    n += 1
    return log_base_2(x/b, n)
In [90]:
display(f"Python log base 2 implementation, evaluating log_2(32): {log_base_2(32)}")
'Python log base 2 implementation, evaluating log_2(32): 5'

And we can see that this function does indeed do the job. History has us write this function with the following notation:

$$log_2(x) = y$$
$$log_2(32) = 5$$

Where it can be read as:

What number must our base, $2$ in this case, be raised to in order to yield $x$?
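For completeness, the same counting process can also be written iteratively rather than recursively; a minimal sketch (the function name is my own, and it again assumes $x$ is a positive integer power of the base):

def log_base_b(x, b=2.0):
    # Count how many divisions by the base are needed to reach 1
    n = 0
    while x > 1.0:
        x /= b
        n += 1
    return n

display(f"Iterative log base 2 of 32: {log_base_b(32)}")  # 5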

Notational Aside

As I mentioned earlier when dealing with exponentials, I am not a fan at all of the notation used to represent them; this disdain is amplified when it comes to logarithms. The idea that an inverse relationship would be represented by such dissimilar notation, $b^x$ vs. $log_b(x)$, is simply careless if you ask me. A good explanation of the shortcomings can be found here, but I will go over it briefly. My issue arises from the fact that exponentials, roots, and logarithms are highly related, but their notational differences lead you to think nothing of the sort:

$$ \text{Exponential} \longrightarrow x^y = z$$
$$ \text{Logarithm} \longrightarrow log_x(z) = y$$
$$ \text{Root} \longrightarrow \sqrt[y]{z} = x$$

Let's use the example that I referenced often throughout this post, the fact that $2 \cdot 2 \cdot 2 = 8$. There are three separate ways to explain this relationship:

$$2^3 = 8$$
$$log_2(8) = 3$$
$$ \sqrt[3]{8} = 2$$

We have created three different ways to explain the same relationship. The exponential notation communicates via position, the root notation via a new symbol, and the logarithm notation via a word.

Mathematics is, in many ways, meant to make seemingly different facts look the same. However, in this case it takes three facts which should obviously be the same and makes them look artificially different.

The link above goes into ideas for better notation, and if you are interested I encourage you to check it out. However, my intention here was just to highlight that the notation is very unsatisfactory, and that struggles to internalize it are normal. Making this fact clear will hopefully allow us to move on and work with what we have.

I think this is a great time to really solidify the idea of functional abstraction, which I touched upon earlier. We could in all reality define a function $f(x, y)$ to abstractly represent the multiplication of $x$ and $y$, where $f$ simply returns their product. However, multiplication is used so frequently that it made sense to create a symbol(s) to represent it: $x \times y, x \cdot y, (x)(y)$.

Likewise, we could simply define exponentiation abstractly via the function $f(x)$, but it was so commonly used that we wanted to create a specific notation for it as well, $b^x$. The exact same thing occurred with logarithms. We could simply define the logarithm as $g(x)$, with the property that it is the inverse of exponentiation: $g(b^x) = x$. Just as with exponentiation, when I said that there were many functions $f$ that satisfied our property of $f(x+y) = f(x)f(y)$ (infinitely many, one for each base in the set of real numbers), the same applies to our $g$ for logarithms. Because the concept of inverting an exponential was so common, we provided (albeit awful) notation that would allow for easier manipulation.

4.2 Logarithm Visualization

At this point, as we did with exponentiation, I want us to get a feel for the logarithm process and its subsequent visualization:

In [141]:
fig, (ax1) = plt.subplots(1, 1, figsize=(10,5))

def logarithm_base_2(x):
    return np.log2(x)

lower_bound = 0.0001
upper_bound = 10
composition_upper_bound = 25 
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y_output = logarithm_base_2(x)
y_orig = x

ax1.plot(x, y_orig, lw=2, c=sns.xkcd_rgb["green"])
ax1.plot(x, y_output, lw=2, c=sns.xkcd_rgb["red"])

func_arrow_square_1 = ax1.annotate(
    '',
    xy=(3, logarithm_base_2(3)),
    xytext=(3, 3),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_1_inp, = ax1.plot(3, 3, 'og', zorder=5)
marker_1_resp, = ax1.plot(3, logarithm_base_2(3), 'or', zorder=5)

func_arrow_square_2 = ax1.annotate(
    '',
    xy=(6, logarithm_base_2(6)),
    xytext=(6, 6),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_2_inp, = ax1.plot(6, 6, 'og', zorder=5)
marker_2_resp, = ax1.plot(6, logarithm_base_2(6), 'or', zorder=5)

func_arrow_square_3 = ax1.annotate(
    '',
    xy=(8.5, logarithm_base_2(8.5)),
    xytext=(8.5, 8.5),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_3_inp, = ax1.plot(8.5, 8.5, 'og', zorder=5)
marker_3_resp, = ax1.plot(8.5, logarithm_base_2(8.5), 'or', zorder=5)

ax1.set_xlabel('X', fontsize=20)
ax1.set_ylabel('Y', fontsize=20)
plt.legend(["Input to Function, $y=x$", "Output of Function, $f(x)=log_2(x)$"], loc="upper left", fontsize=12)
ax1.axhline(y=0, color='grey', alpha=0.2)
ax1.axvline(x=0, color='grey', alpha=0.2)

plt.show()
In [138]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,5))

def logarithm_base_2(x):
    return np.log2(x)

lower_bound = 0.0001
upper_bound = 10000
composition_upper_bound = 25 
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y_output = logarithm_base_2(x)
y_orig = x

ax1.plot(x, y_output, lw=2, c=sns.xkcd_rgb["red"])


lower_bound = 0.0001
upper_bound = 50
composition_upper_bound = 25 
length = 2000
x = np.linspace(lower_bound, upper_bound, length)
y_output = logarithm_base_2(x)
y_orig = x
ax2.plot(x, y_orig, lw=2, c=sns.xkcd_rgb["green"])
ax2.plot(x, y_output, lw=2, c=sns.xkcd_rgb["red"])


func_arrow_square_1 = ax2.annotate(
    '',
    xy=(25, logarithm_base_2(25)),
    xytext=(25, 25),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_1_inp, = ax2.plot(25, 25, 'og', zorder=5)
marker_1_resp, = ax2.plot(25, logarithm_base_2(25), 'or', zorder=5)

func_arrow_square_2 = ax2.annotate(
    '',
    xy=(45, logarithm_base_2(45)),
    xytext=(45, 45),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_2_inp, = ax2.plot(45, 45, 'og', zorder=5)
marker_2_resp, = ax2.plot(45, logarithm_base_2(45), 'or', zorder=5)

ax1.set_xlabel('X', fontsize=20)
ax1.set_ylabel('Y', fontsize=20)
plt.legend(["$f(x)=log_2(x)$"], fontsize=12)
ax1.axhline(y=0, color='grey', alpha=0.2)
ax1.axvline(x=0, color='grey', alpha=0.2)
ax2.axhline(y=0, color='grey', alpha=0.2)
ax2.axvline(x=0, color='grey', alpha=0.2)
ax2.set_xlabel('X', fontsize=20)
ax1.set_title("$f(x) = log_2(x)$, upper bound: 10,000")
ax2.set_title("$f(x) = log_2(x)$, upper bound: 50")
plt.show()

There are a few key things to note about the plots above. First, take direct note of the shape of the plot. We want to internalize how the logarithm grows as the input increases. As the input increases the logarithm does not grow linearly with it; rather, its growth tapers. This can be directly contrasted with the exponential, whose growth increases as the input increases. It is this contrast that allows a logarithm to invert an exponential (or, perhaps the more correct phrasing would be that it is the fact that the logarithm is the inverse of the exponential that provides this contrast).

Also, we can get a feel for how this growth tapers off. In the left panel of the second figure we can see that when our input is $10,000$ our logarithm only yields ~$13$ (a linear growth curve would yield $10,000$). When we discuss logarithmic complexity as it relates to computer science we will see why this is important.
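We can quickly verify that claim (assuming numpy is imported as np, as in the cells above):

display(f"log_2(10,000) = {np.log2(10000):.4f}")  # ~13.2877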

Below, we can see what happens when our input to the logarithm is the output of the exponential function:

In [161]:
fig, (ax1) = plt.subplots(1, 1, figsize=(10,5))

def logarithm_base_2(x):
    return np.log2(x)

lower_bound = 0.0001
upper_bound = 5
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y_orig = np.exp(x)
y_output = logarithm_base_2(y_orig)


ax1.plot(x, y_orig, lw=2, c=sns.xkcd_rgb["green"])
ax1.plot(x, y_output, lw=2, c=sns.xkcd_rgb["red"])

func_arrow_square_1 = ax1.annotate(
    '',
    xy=(3, logarithm_base_2(np.exp(3))),
    xytext=(3, np.exp(3)),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_1_inp, = ax1.plot(3, np.exp(3), 'og', zorder=5)
marker_1_resp, = ax1.plot(3, logarithm_base_2(np.exp(3)), 'or', zorder=5)

func_arrow_square_2 = ax1.annotate(
    '',
    xy=(4.1, logarithm_base_2(np.exp(4.1))),
    xytext=(4.1, np.exp(4.1)),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_2_inp, = ax1.plot(4.1, np.exp(4.1), 'og', zorder=5)
marker_2_resp, = ax1.plot(4.1, logarithm_base_2(np.exp(4.1)), 'or', zorder=5)

func_arrow_square_3 = ax1.annotate(
    '',
    xy=(5, logarithm_base_2(np.exp(5))),
    xytext=(5, np.exp(5)),
    arrowprops=dict(facecolor='black', shrink=0.05),
)
marker_3_inp, = ax1.plot(5, np.exp(5), 'og', zorder=5)
marker_3_resp, = ax1.plot(5, logarithm_base_2(np.exp(5)), 'or', zorder=5)

ax1.set_xlabel('X', fontsize=20)
ax1.set_ylabel('Y', fontsize=20)
plt.legend(["Input to Function, $y=x$", "Output of Function, $f(x)=log_2(x)$"], loc="upper left", fontsize=12)
ax1.axhline(y=0, color='grey', alpha=0.2)
ax1.axvline(x=0, color='grey', alpha=0.2)

plt.show()

It is simply transformed back to the original $x$. In other words, the above plot demonstrates:

$$x = log_2(2^x)$$

And of course we could perform the reverse as well:

$$x = 2^{log_2(x)}$$

This inversion is further demonstrated via the animation below. We can clearly see the logarithm reverse the exponential function back to the original $x$:

In [ ]:
def natural_log(x):
    return np.log(x)

def exponentiate(x):
    return np.exp(x)

def func_comp(x):
    return exponentiate(natural_log(x))

func_1 = exponentiate
func_2 = natural_log

# ZOOMED ANIMATION
lower_bound = -4
upper_bound = -1 * lower_bound
composition_upper_bound = upper_bound * 2 + upper_bound 
length = 2000

# Turn off interactive plotting
plt.ioff()                        

# Create figure and axis object   
fig = plt.figure(figsize=(10, 6), dpi=200)       
ax1 = plt.subplot(111)

# Add x and y axis lines
ax1.axhline(y=0, color='grey')
ax1.axvline(x=0, color='grey')

plt.tight_layout()

# Create x input space, plot line x = y
x = np.linspace(lower_bound, upper_bound, length)
y = x 
func_comp_y = func_comp(x)

# Create iterable input axes, as well as set color of response curve
ax_x, = ax1.plot(x, y, lw=3, c=sns.xkcd_rgb["soft green"], zorder=1)   
ax_func_1, = ax1.plot(0, 0, lw=3, c=sns.xkcd_rgb["red"], zorder=2)   
ax_func_2, = ax1.plot(0, 0, lw=3, c=sns.xkcd_rgb["red"], zorder=3)   

# Create markers
marker_x, = ax1.plot(lower_bound, 400, 'og', zorder=5)
marker_func_1, = ax1.plot(lower_bound, 400, 'or', zorder=5)
marker_func_2, = ax1.plot(lower_bound, 400, 'or', zorder=5)
# marker_exponentiated, = ax1.plot(lower_bound, 400, 'or', zorder=5)

offset = 0.5 # General offset

# ------------- Create arrow representing func_1 function---------------
func_arrow_func_1 = ax1.annotate(
    '',
    xy=(lower_bound, func_1(lower_bound)),
    xytext=(lower_bound, lower_bound),
    arrowprops=dict(facecolor='black', shrink=0.05),
)

# ------------- Create label for arrow, representing func_1 function ----------------
offset_func_1 = 0.5
epsilon = 0.000001
func_label_func_1 = ax1.annotate(
    func_1.__name__,
    xy=(lower_bound, func_1(lower_bound)/2),
    xytext=(lower_bound + offset_func_1, (func_1(lower_bound) - lower_bound)/2 + offset_func_1),
    arrowprops=dict(
        color='grey',
        arrowstyle="-",
        connectionstyle="angle3,angleA=0,angleB=-90"
    ),
    bbox=dict(boxstyle="square", alpha=0.1, ec="gray"),
    size=20,
)

# ------------- Create arrow representing func_2 function---------------
func_2_hide_coord = -10
func_arrow_func_2 = ax1.annotate(
    '',
    xy=(func_2_hide_coord, func_2_hide_coord),
    xytext=(func_2_hide_coord, func_2_hide_coord),
    arrowprops=dict(facecolor='black', shrink=0.05),
)

# ------------- Create label for arrow, representing func_2 function ----------------
offset_func_2 = 1
shift = 1
func_label_func_2 = ax1.annotate(
    func_2.__name__,
    xy=(func_2_hide_coord, func_2_hide_coord),
    xytext=(func_2_hide_coord+0.01, func_2_hide_coord),
    arrowprops=dict(
        color='grey',
        arrowstyle="-",
        connectionstyle="angle3,angleA=0,angleB=-90"
    ),
    bbox=dict(boxstyle="square", alpha=0.1, ec="gray"),
    size=20,
)


# Composition animation function
def animate_composition(current):
    if round(current, 5) < upper_bound:
        # Gathering x axis metrics
        x = np.linspace(lower_bound, current, length)
        func_1_of_x = func_1(x)

        # Set output curve, marker_x, marker_squared
        ax_func_1.set_data(x, func_1_of_x) 
        marker_x.set_data(current, current)
        marker_func_1.set_data(current, func_1_of_x[-1])

        # Set function arrow head and tail position
        func_arrow_func_1.set_position((current + epsilon, current))
        func_arrow_func_1.xy = (current, func_1_of_x[-1])

        # Label location, followed by label arrow head
        func_label_func_1.set_position((current + offset + epsilon, (func_1_of_x[-1] - current)/2 + offset))
        func_label_func_1.xy = (current, (func_1_of_x[-1] - current)/2 + current)
        
    elif round(current, 5) == upper_bound:
        # End of func 1, start of func 2
        func_arrow_func_1.remove()
        marker_x.remove()
        func_label_func_1.remove()
        
        x = np.linspace(lower_bound, current, length)
        func_1_of_x = func_1(x)
        
        # Updating squared curve to be input to negate function (setting color to green)
        marker_func_1.set_color("green")
        ax1.plot(x, y, lw=3, c=sns.xkcd_rgb["grey"])  
        ax1.plot(x, func_1_of_x, c=sns.xkcd_rgb["soft green"], linewidth=3)

    elif round(current, 5) > upper_bound and round(current, 5) < (upper_bound*3) :
        current -= upper_bound*2 
        
        # Gathering x axis metrics
        x = np.linspace(lower_bound, current, length)
        func_1_of_x = func_1(x)
        x_func_1_func_2 = func_2(func_1_of_x)

        # Set output curve, marker1, marker2
        ax_func_2.set_data(x, x_func_1_func_2) 
        marker_func_1.set_data(current, func_1_of_x[-1])
        marker_func_2.set_data(current, x_func_1_func_2[-1])

        # Set function arrow head and tail position
        func_arrow_func_2.set_position((current + 0.000001, func_1_of_x[-1])) # Arrow tail
        func_arrow_func_2.xy = (current, x_func_1_func_2[-1]) # Arrow head

        # Label location, followed by label arrow head
        func_label_func_2.set_position((current + offset + 0.000001, (x_func_1_func_2[-1] - current)/2 + offset - shift))
        func_label_func_2.xy = (current, (func_1_of_x[-1] - current)/2 + current)   

    return ax_x,

# Composition init function
def init_composition():
    ax1.set_xlim(lower_bound, upper_bound)                               
    ax1.set_ylim(-4, 60) 
    return ax_x,

""" Define steps and create animation object """
step = 0.025
# step = 0.05
steps = np.arange(lower_bound, composition_upper_bound, step)

# Shrink current axis by 20%
box = ax1.get_position()
ax1.set_position([box.x0, box.y0, box.width * 0.65, box.height])

# Put a legend to the right of the current axis
ax1.legend(
    (marker_x, marker_func_1),
    ['Input to function', 'Output of function'],
    loc='center left',
    bbox_to_anchor=(1, 0.5)
)

# # For rendering html video in cell
gif_video = animation.FuncAnimation(
        fig,
        animate_composition,
        steps,
        init_func=init_composition, 
        interval=50,
        blit=True
    )

gif_video.save('exponential_to_logarithm_composition.gif', writer='imagemagick')

# html_video = HTML(
#     animation.FuncAnimation(
#         fig,
#         animate_composition,
#         steps,
#         init_func=init_composition, 
#         interval=50,
#         blit=True
#     ).to_html5_video()
# )
# display(html_video)
plt.close()

I also want to demonstrate how logarithms of different bases compare to each other:

In [285]:
lower_bound = 0.001
upper_bound = 15
length = 2000

x = np.linspace(lower_bound, upper_bound, length)
y_b_e = np.log(x)
y_b_2 = np.log2(x)
y_b_10 = np.log10(x)

trace2 = go.Scatter(
    x=x,
    y=y_b_e,
    marker = dict(
        color = 'red',
    ),
    name=r"$log_e(x)$"
)

trace1 = go.Scatter(
    x=x,
    y=y_b_2,
    name=r"$log_2(x)$"
)

trace3 = go.Scatter(
    x=x,
    y=y_b_10,
    name=r"$log_{10}(x)$"
)

data = [trace1, trace2, trace3]
layout = go.Layout(
    width=650,
    height=450,
    title="Logarithms with different bases",
    xaxis=dict(title="X"),
    yaxis=dict(title="Y")
)

fig = go.Figure(data=data, layout=layout)

# plotly.offline.iplot(fig)
html_fig = plotly.io.to_html(fig, include_plotlyjs=True)
display(HTML(html_fig))

We can see above that the larger the base, the more slowly the logarithm grows. Why, intuitively, is that the case? As is often helpful with logarithms, let's look at their exponential counterparts. Say $x=10$ in our logarithm plot above, and hence in our exponential equation the value on the right will be 10:

$$2^? = 10$$
$$e^? = 10$$
$$10^? = 10$$

We are trying to figure out what power our bases need to be raised to in order to yield 10. It turns out the answer is:

$$2^{3.3219} = 10$$
$$e^{2.3025} = 10$$
$$10^1 = 10$$

Hence, the result of the logarithm in these cases would be:

$$log_2(10) = 3.3219$$
$$log_e(10) = 2.3025$$
$$log_{10}(10) = 1$$

Intuitively, the larger our base, the fewer times it must be multiplied by itself in order to yield a particular value (in this case $10$). Hence, a logarithm with a smaller base will grow faster than one with a larger base.
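We can verify these values numerically with math.log, which accepts an explicit base as its second argument:

from math import log, e

# The larger the base, the smaller the exponent needed to reach 10
for base in [2, e, 10]:
    display(f"log base {base}: {log(10, base):.4f}")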

4.3 Logarithms Extension

Now that we have gone over how to view logarithms in the context of reversing exponentiation (when only viewing exponentiation as repeated multiplication), a few things should be clear. First and foremost, logarithms are not your traditional type of mathematical function. Often we think of a function as a definitive procedure where we perform a unique set of consecutive transforms to our input variable. For instance:

$$f(x) = 5x + 3x^2 - 2$$

We take our input variable, $x$, scale it by 5, and then add that to its square scaled by 3, from which we subtract 2. This gives us our transformed variable, $f(x)$. Our function transformation is complete when we have performed all operations. What makes logarithms feel a little different is the idea that you don't quite know when to stop; an internal state must be kept in mind in order to determine how many times we must divide our input, $x$, by our base, $b$, in order to get the original power $b$ was raised to, $y$.

However, if we take a more abstract and mature view of what a function really is, this is completely acceptable (and normal)! Again, I recommend reading my previous post on functions, but the key idea is that a function is just a transformation of a variable to a different domain. Clearly that is what is occurring here! In the case of:

$$log_2(32) = 5$$

We take in $32$ as input, and repeatedly divide by 2 until we get 1. The number of divisions is our output. Just like any sort of language or notation, it takes the brain some time to really internalize the pattern, so I suggest looking at it as many ways as possible and really giving a good effort to let it sink in.

4.3.1 A Defining Property

With that said, while we have now walked through how to think about logarithms in the discrete case (as they relate to integers), I am sure you are wondering how exactly they handle all real numbers. To show this, we are going to take a similar route to that of our exponential derivation: we will extend our definition. First and foremost, we are going to state that logarithms are defined by the following fundamental property:

$$f(xy) = f(x) + f(y)$$

In other words:

Logarithms map multiplication into addition.

From the group theory perspective, the logarithm maps the multiplicative group to the additive group. Now, based on this property, and the fact that the logarithm is the inverse of the exponential, how could we go about deriving its series representation?

Well, to start, I am going to assume that we are using the natural logarithm, $ln = log_e$, going forward. This will allow for a slightly cleaner derivation. Next, I want to dig into the derivative of the natural log function:

$$\frac{d(ln(x))}{dx} = ?$$

We can use the chain rule and a few logarithm/exponential properties to derive this:

$$e^{ln(x)} = x$$
$$\frac{d(e^{ln(x)})}{dx} = \frac{d(x)}{dx} = 1$$
$$e^{ln(x)} \big[ \frac{d(ln(x))}{dx} \big] = 1$$
$$x\big[ \frac{d \big(ln(x) \big)}{dx} \big] = 1$$
$$\frac{d\big (ln(x) \big)}{dx} = \frac{1}{x}$$

Great, so we have solved for the derivative of $ln(x)$, proving that it is simply equal to $\frac{1}{x}$. Now, what can this help us with? Well, recall that the area under the curve of the derivative of a function, $g'$, yields the original function, $g$ (see my post on the fundamental theorem of calculus). If we place bounds on our integral, we see that:

$$\int_a^b f(x)dx = F(b) - F(a)$$

So, we can take the integral as follows:

$$\int_a^b \frac{1}{t} dt = ln(b) - ln(a)$$

Where I am simply inserting $t$ in order to leave $x$ available as our input variable. And what to do in regards to the bounds? Well, if we want to just end up with $ln(x)$, we can utilize the fact that $ln(1) = 0$, so our lower bound should be $1$. Additionally, based on the fundamental theorem of calculus, we know that the upper bound should be $x$:

$$ln(x) = \int_1^x \frac{1}{t} dt = ln(x) - ln(1) = ln(x)$$

You may be wondering, "why is the above function a logarithm"? The answer is because it satisfies the fundamental property of the logarithm: $f(xy) = f(x) + f(y)$. This is exactly the same type of situation we dealt with when extending the exponential. Let's quickly prove that it does in fact satisfy our property. To start, I want to let $ln$ be represented by $F$:

$$F(x) = ln(x) = \int_1^x \frac{1}{t} dt$$

Where we know the derivative of $F$ to be:

$$F'(x) = \frac{1}{x}$$

I will then let:

$$h(x) = F(ax)$$

And subsequently $h'$ is:

$$h'(x) = F'(ax) \frac{d(ax)}{dx} = \frac{1}{ax}a = \frac{1}{x}$$

So, $F(ax)$ and $F(x)$ have the same derivative, and hence $F(ax) - F(x)$ has a derivative of $0$, meaning it has no rate of change (it is a constant function). In other words:

$$F(ax) - F(x) = constant$$

If we then let $x = 1$:

$$F(a \cdot 1) - F(1) = F(a) - 0 = F(a)$$
$$constant = F(a)$$

Hence, we must have:

$$F(ax) - F(x) = F(a)$$

Or, equally:

$$F(ax) = F(x) + F(a)$$

We can then replace $F$ with $ln$, proving that the equation does indeed satisfy our fundamental property:

$$ln(ax) = ln(x) + ln(a)$$
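A quick numerical spot check of this property (the values of $a$ and $x$ here are chosen arbitrarily):

import numpy as np

a, x = 7.0, 3.0
display(f"ln(a * x)     = {np.log(a * x)}")
display(f"ln(a) + ln(x) = {np.log(a) + np.log(x)}")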

4.3.2 Analytical Representations

Now, based on this representation of $ln(x)$ we can show that the series representation is equal to:

$$ln(x) = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}(x-1)^k}{k}$$

However, it can be shown that this series converges to the $ln$ function only in the region $0 < x \leq 2$. Outside of this region the approximations get worse and worse. We can see this via a quick implementation:

def taylor_series_ln_func(x, upper_n):
    # Sum the first (upper_n - 1) terms of the series for ln(x), centered at x = 1
    accumulator = 0
    for n in range(1, upper_n):
        val = ((-1)**(n-1) * (x - 1)**n) / n
        accumulator += val
    return accumulator
In [269]:
def taylor_series_ln_func(x, upper_n):
    accumulator = 0
    for n in range(1, upper_n):
        val = ((-1)**(n-1) * (x - 1)**n) / n
        accumulator += val
    return accumulator
In [270]:
n = [3,4,5]
x = np.linspace(0.001,5,1000)

built_in_python = np.log(x)
taylor_series_ln = []
for i in n:
    series_for_specific_n = []
    for _x in x:
        series_for_specific_n.append(taylor_series_ln_func(_x, i))
    taylor_series_ln.append(series_for_specific_n)
In [271]:
trace1 = go.Scatter(
    x=x,
    y=built_in_python,
    marker = dict(
        color = 'red',
    ),
    name="Numpy ln"
)

trace2 = go.Scatter(
    x=x,
    y=taylor_series_ln[0],
    name=f"Taylor Series Approximation, n = {n[0]}"
)

trace3 = go.Scatter(
    x=x,
    y=taylor_series_ln[1],
    name=f"Taylor Series Approximation, n = {n[1]}"
)

trace4 = go.Scatter(
    x=x,
    y=taylor_series_ln[2],
    name=f"Taylor Series Approximation, n = {n[2]}"
)

data = [trace1, trace2, trace3, trace4]
layout = go.Layout(
    width=850,
    height=450,
    title="Numpy Natural Log vs. Taylor Series ln Approximation",
    xaxis=dict(title="x"),
    yaxis=dict(title="Output")
)

fig = go.Figure(data=data, layout=layout)

# plotly.offline.iplot(fig)
html_fig = plotly.io.to_html(fig, include_plotlyjs=True)
display(HTML(html_fig))

Well, that clearly isn't ideal if we want a nice implementation! How else can we calculate the natural log with greater precision? A key insight is to take advantage of the relation to the exponential! If we do that, we can find the following property:

$$ln(x) = \lim_{n \rightarrow \infty} n(x^{\frac{1}{n}} - 1)$$

And an implementation in python would look like:

def ln_property_func(x, n):
    return n * ((x ** (1 / n)) - 1)
In [79]:
def ln_property_func(x, n):
    return n * ((x ** (1 / n)) - 1)
In [80]:
n = [3,10,50]
x = np.linspace(0.001,5,1000)

built_in_python = np.log(x)
ln_property_list = []
for i in n:
    ln_property_for_specific_n = []
    for _x in x:
        ln_property_for_specific_n.append(ln_property_func(_x, i))
    ln_property_list.append(ln_property_for_specific_n)
In [276]:
trace1 = go.Scatter(
    x=x,
    y=built_in_python,
    marker = dict(
        color = 'red',
    ),
    name="Numpy ln"
)

trace2 = go.Scatter(
    x=x,
    y=ln_property_list[0],
    name=f"ln property approximation, n = {n[0]}"
)

trace3 = go.Scatter(
    x=x,
    y=ln_property_list[1],
    name=f"ln property approximation, n = {n[1]}"
)

trace4 = go.Scatter(
    x=x,
    y=ln_property_list[2],
    marker = dict(color = 'blue',),
    name=f"ln property approximation, n = {n[2]}"
)

data = [trace1, trace2, trace3, trace4]
layout = go.Layout(
    width=850,
    height=450,
    title="Numpy Natural Log vs. ln property approximation",
    xaxis=dict(title="x"),
    yaxis=dict(title="Output")
)

fig = go.Figure(data=data, layout=layout)

# plotly.offline.iplot(fig)
html_fig = plotly.io.to_html(fig, include_plotlyjs=True)
display(HTML(html_fig))

Clearly we can see above that this is a far more accurate approximation! Note that we could also implement a function that calculates the integral under $\frac{1}{x}$ to find $ln(x)$, but I will leave that to the reader to experiment with! Additional methods can be explored as well.

5. Computational Complexity

One area where the exponential and logarithm are incredibly important is in the analysis of algorithm run times in computer science. This could easily constitute its own post, so I won't go into great detail; suffice it to say that if you are ever working in the world of software you will certainly encounter it.

In essence, when performing certain computations different algorithms will accomplish the computation in a different number of base operations. An example will make this more clear. Say we have a function my_algorithm, and it is meant to find a specific name in a python list. If our list has no inherent ordering, that would look like:

def my_algorithm(name_list, specific_name):
    # Linear scan: check each name in turn until we find a match
    for name in name_list:
        if name == specific_name:
            return name

Imagine now that the name we are looking for is john, and it is the last name in the list:

["tim", "max", "sam", "zach", "john"]

In this case we had to perform 5 checks, each time seeing if the name was john. What computational complexity (specifically Big-O analysis) digs into is what happens as the number of names in the list becomes large. It looks at how the number of checks grows as the number of names, $n$, grows. Now, to abstract a bit, these checks can really be any operation (the addition of two numbers, the calling of another function, etc). What we are really concerned about is how our number of operations grows as the length of our list grows. In this case, it is simply a linear relationship: for each new name in the list, we need to perform another check.

Some algorithms will perform worse, and others better. For instance, a binary search can find a name in a sorted list in roughly $log(n)$ checks; a sketch is shown below, and the different complexities are then visualized in the plot that follows:
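Here is a minimal sketch of such a binary search (one of several ways to write it); it assumes the list is already sorted:

def binary_search(sorted_names, specific_name):
    # Each iteration halves the remaining search space, giving ~log2(n) checks
    low, high = 0, len(sorted_names) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_names[mid] == specific_name:
            return sorted_names[mid]
        elif sorted_names[mid] < specific_name:
            low = mid + 1
        else:
            high = mid - 1
    return None

binary_search(["john", "max", "sam", "tim", "zach"], "john")  # found in ~log2(5) checks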

In [104]:
from scipy.special import factorial

x = np.arange(0.0001,100, 1)
y_linear = x
y_square = x**2
y_exponential = 2**x
y_logarithm = np.log(x)
y_factorial = factorial(x)

trace1 = go.Scatter(
    x=x,
    y=y_linear,
    marker = dict(
        color = 'red',
    ),
    name="Linear"
)

trace2 = go.Scatter(
    x=x,
    y=y_square,
    name="Square"
)

trace3 = go.Scatter(
    x=x,
    y=y_exponential,
    name="Exponential"
)

trace4 = go.Scatter(
    x=x,
    y=y_logarithm,
    marker = dict(color = 'blue',),
    name="Logarithm"
)

trace5 = go.Scatter(
    x=x,
    y=y_factorial,
    name="Factorial"
)

data = [trace1, trace2, trace3, trace4, trace5]
layout = go.Layout(
    width=650,
    height=450,
    title="Computational Complexity Comparison",
    xaxis=dict(title="n"),
    yaxis=dict(title="Number of Operations", range=[0,100])
)

fig = go.Figure(data=data, layout=layout)

# plotly.offline.iplot(fig)
html_fig = plotly.io.to_html(fig, include_plotlyjs=True)
display(HTML(html_fig))

The key takeaway here is not to suddenly become a computational complexity expert; rather, it is to show why an intuitive understanding of the exponential and logarithm function is so crucial to being an effective applied mathematician. What is interesting is that often people will see these plot visualizations and have an entirely different concept brought into memory. What I mean by that is: consider the logarithmic curve above. What it generally makes people think of is "diminishing returns". As you continue to increase your input, you get less and less of a return. However, remember that a logarithm is also the transform that converts multiplication into addition! This is an entirely different way of viewing the same concept. It is this ability to hold these competing concepts both in memory, and to use them fluidly in the context of a mathematical problem, that is truly invaluable.

6. In the real world

This post would not be complete if I didn't take a moment to discuss the place of both exponentials and logarithms in the real world. This is yet another lens through which to view these mathematical constructs.

6.1 Exponentials

Exponentials show up all through out the real world with examples such as:

  • Investments: Compound interest allows for money to grow exponentially with time, starting out small but becoming incredibly formidable as the years pass. The logic behind investments is actually a man made construct; humans chose to design compound interest based on an exponential growth function.
  • Social Media Followers: As you gain more followers, it becomes easier to gain even more. This is due in large part to network effects (gaining 1 follower means that you have the potential to gain 10 more if they share with their friends) and a lack of system constraints (at first, there is no real limit to the number of social media followers one may have).
  • Population Growth: Whether it be rabbits or bacteria, in a system lacking constraints population growth tends to be exponential.
  • Half Life Process: When an organism dies, the carbon-14 in its body starts to decay with a half-life of roughly 5,730 years. The amount of carbon-12 on the other hand does not change$^8$. By comparing the ratio of carbon-12 to carbon-14 we can get an accurate estimate of the age of organic material.

Generally, if we are dealing with exponentials (in all examples above) we are talking about growth (or decay, if the base is less than $1$). If (in this context) we then talk about logarithms, we are generally trying to figure out the time it would take to achieve this growth.

6.2 Logarithms

Logarithms also show up all through out the real world; they arise in areas such as:

  • Weight Loss and Strength Training: When first training for sport or trying to lose weight, results can be seen quickly; however, as time progresses they become harder to come by, slowly tapering off.
  • Language Proficiency: Once you have become a fluent speaker of a language only meager gains remain.
  • Enjoyment of Food: The first bite of a piece of pizza is sure to be delicious, but after several slices your enjoyment has certainly tapered off.

When dealing with the logarithm, our intuitions about the logic behind them will differ depending on context. If our context is that of an exponential, the logarithm is most likely being used as an inverse in order to find the time required to achieve certain growth. However, as discussed above, the logarithm can be used to model certain physical processes itself, specifically diminishing returns; it isn't simply the sidekick of the exponential.

7. Wrapping Up

Now, we have covered a good deal of ground at this point, so I'd like to take a moment and tie things back to the problem that spurred all of this. Recall, it was the simplification of the following expression, inside the body of a very long proof:

$$p(\alpha) = \frac{1}{w(\alpha)^\frac{log(2)}{log(w(\alpha))}} = \frac{1}{2}$$

I want you to ask yourself the following question: if you were working through a complex, original proof/derivation, and you ran into the above, could you simplify it? The key is to realize the beautiful interplay of exponentials, logarithms, inverses, and composition of functions. Now, of course at this point (or if presented with this problem on an exam), you could use the change of base logarithm property, which can be found in Appendix C. However, this takes us back to the problem of priming! In the real world any sort of priming will be incredibly hazy, or entirely nonexistent. So, rather than use a memorized property, I am going to walk through the derivation from scratch, keeping in mind the exponential, the logarithm, and the incredibly important fact that they are inverses of each other.

First and foremost, looking at the denominator we can see that we have a certain number raised to an exponent that is made of the division of two logarithms:

$$w(\alpha)^\frac{log(2)}{log(w(\alpha))}$$

What should spring to mind isn't necessarily the property $b^{log_bx} = x$, but rather simply that we have an expression containing an exponential and a logarithm, and that these two functions are inverses of each other. With that simple idea in mind, it is natural for the property $b^{log_bx} = x$ to come into play; however, the main point is that it should originate from a clear understanding of the inverse relationship.

At this point, knowing that exponentials and logarithms are inverses of each other, we may realize that $log_bx$ simply yields an exponent (that $b$ must be raised to in order to yield $x$). In the context of our problem, we can view the fraction of logarithms as:

$$ \frac{\text{exponent our base must be raised to to get 2}} {\text{exponent our base must be raised to to get }w(\alpha) } $$

So, this fraction is really representing two exponents. The base has been ambiguous up until this point, but we can just let it equal $10$ for simplicity. Now, let's think about this in the context of exponentials. We can write the numerator of the above fraction as:

$$10^x = 2 \longrightarrow x = log_{10}(2)$$

And the denominator as:

$$10^x = w(\alpha) \longrightarrow x = log_{10}(w(\alpha))$$

With these representations handy, the inverse relationship at the front of our minds, and the goal of simplifying this expression if possible, how can we move forward? Well, in our expression $w(\alpha)^\frac{log(2)}{log(w(\alpha))}$, the base is $w(\alpha)$, so if we can convert our logarithm base to $w(\alpha)$, we can nicely reduce our expression. To do so, let's think about how we would change our exponential base to be that of $w(\alpha)$. First, imagine that our base was $3$ and we wanted to convert it to $9$:

$$3^x = 9^{\frac{1}{2} \cdot x} = 9^{\frac{x}{2}} = y$$

So, via the above example we have seen that in order to change the base of an exponential function, we simply need to scale the input, $x$ (i.e. the exponent), by the exponent the new base ($9$) must be raised to in order to yield the old base ($3$). What we have done here is solve a simpler case, as proposed by George Polya in How To Solve It$^9$. There is a general principle that we can take away from the above example, namely:

To change the base of an exponential function, we simply need to scale the input, $x$ (i.e. the exponent), by the exponent the new base must be raised to in order to yield the old base.

We can write this as:

$$9^{\frac{\text{original input}}{\text{Number old base is raised to to yield new base}}} = 9^{\frac{x}{2}} = y$$

If we then recall that we find exponents via logarithms, we can rewrite the numerator and the denominator above as:

$$\text{Numerator: } log_3(y) = x$$
$$\text{Denominator: } log_3(9) = 2$$

And hence our relationship can be written as:

$$9^{\frac{x}{2}} = 9^{\frac{log_3(y)}{log_3(9)}}$$

Meaning, in order to change our exponential's base from $3$ to $9$, we simply scale the input to our function from simply $x$, to:

$$\frac{x}{2} = \frac{log_3(y)}{log_3(9)}$$

Now, to pull it all together, remember that the logarithm is going to return an exponent. So, if we had a base of $9$, and we wanted to know the exponent that it needed to be raised to in order to yield $y$:

$$log_9 y = \frac{x}{2}$$

From the above two relationships we have the equivalence:

$$log_9 y = \frac{log_3(y)}{log_3(9)}$$

Which can be easily extended to our situation! We can see that our fraction containing logarithms can be written as:

$$log_{w(\alpha)}(2) = \frac{log(2)}{log(w(\alpha))}$$

Subbing that into our exponential:

$$w(\alpha)^\frac{log(2)}{log(w(\alpha))} = w(\alpha)^{log_{w(\alpha)}(2)}$$

And since we know that logarithms and exponentials are inverses of each other, our input $2$ is essentially being logged (with a base of $w(\alpha)$) and then that output is being immediately exponentiated (with a base of $w(\alpha)$), simply yielding 2:

$$w(\alpha)^{log_{w(\alpha)}(2)} = 2$$

And just like that we have arrived at our final reduced expression:

$$p(\alpha) = \frac{1}{w(\alpha)^\frac{log(2)}{log(w(\alpha))}} = \frac{1}{2}$$
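As a quick sanity check, we can substitute an arbitrary positive value (not equal to $1$) in place of $w(\alpha)$:

from math import log

w = 17.3  # arbitrary stand-in for w(alpha)
display(f"w^(log(2)/log(w)) = {w ** (log(2) / log(w))}")  # ~2.0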

With that we are done! The key thing to take away from all of this is that we essentially used the change of base rule, only we came upon it intuitively via a deep understanding of exponentials, logarithms, and inverse functions. By being able to work with these mathematical tools on different levels and in different contexts, we significantly advance our abilities as problem solvers. It is this ability to traverse the hierarchical ladder of mind that is so valuable.

Appendix A: Derivative of Exponential

To determine the derivative of an exponential function, I want us to start by defining a function, $M$:

$$M(t) = 2^t$$

We can start by trying to take its derivative:

$$\frac{d M(t)}{dt} = \frac{2^{t+dt} - 2^t}{dt} = \frac{2^t \cdot 2^{dt} - 2^t}{dt} = 2^t \Big( \frac{2^{dt} - 1}{dt} \Big)$$

Where we specifically make use of property that $M(x+y) = M(x) \cdot M(y)$ in the 2nd and 3rd representation, i.e.:

$$2^{t+dt} = 2^t \cdot 2^{dt}$$

We were essentially relating an additive input to a multiplicative output. With that said, we want to think about what our final term on the right approaches as $dt \rightarrow 0$:

$$\lim_{dt \rightarrow 0} \; \; 2^t \Big( \frac{2^{dt} - 1}{dt} \Big)$$

Well, let's dig in to the term on the right:

$$\Big( \frac{2^{dt} - 1}{dt} \Big)$$

We can see that it holds all information relating to $dt$, and is entirely separate from the $t$ term itself. In other words, it doesn't depend on the actual time that we started!

Unlike derivatives of other functions, the derivative of $M(t) = 2^t$ has all of the stuff that depends on $dt$ separate from the value of $t$ itself.

Take a moment to think about what this means. It means that we can characterize the derivative of $2^t$ as the product of itself times some constant:

$$\frac{dM(t)}{dt} = \overbrace{2^t}^\text{itself} \cdot \overbrace{\Big( \frac{2^{dt} - 1}{dt} \Big)}^\text{a constant}$$

We can actually evaluate that constant with a bit of python, letting dt = 0.000001:

dt = 0.000001
dt_constant_term = (2**dt - 1) / dt
In [58]:
dt = 0.000001

dt_constant_term = (2**dt - 1) / dt

display(f"dt constant term: {dt_constant_term}")
'dt constant term: 0.6931474207938493'

Now we can rewrite the derivative as:

$$\frac{dM(t)}{dt} = 2^t \cdot (0.693\dots)$$

At this point you are probably wondering why exactly the constant has this odd value of $0.693\dots$. To explore this a bit, let's also take the derivative of $8^t$ as well:

dt = 0.000001
dt_constant_term = (8**dt - 1) / dt
In [59]:
dt = 0.000001

dt_constant_term = (8**dt - 1) / dt

display(f"dt constant term: {dt_constant_term}")
'dt constant term: 2.0794437036730784'
$$\frac{dM(t)}{dt} = 8^t \cdot (2.079\dots)$$

Note that this constant $2.079 \dots$ is exactly $3$ times the size of our previous constant. We can see that there is a pattern to these constants; they are not random. But what exactly is the pattern? In order to determine the pattern behind our constants, we must ask: where will the proportionality constant equal 1? In other words, is there a number $a$, where:

$$\frac{d(a^t)}{dt} = a^t \cdot 1$$

It turns out there is! Specifically, this is the case when $a$ is equal to $e = 2.71828...$.

This is actually what defines the number $e$!
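Mirroring the constant computations above, we can check this numerically:

from math import e

dt = 0.000001

e_constant_term = (e**dt - 1) / dt  # should be very close to 1

display(f"dt constant term for base e: {e_constant_term}")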

Let's see why; to start, I will define $M$ as:

$$M(t) = e^t$$

We can then proceed with our derivation:

$$\frac{d M(t)}{dt} = e^t \cdot \overbrace{ \Big( \frac{e^{dt} - 1}{dt} \Big)}^{\text{as } dt \rightarrow 0 \text{, this} \\ \text{approaches 1}} = e^t \cdot (1.0000 \dots)$$

Then, if we recall the chain rule:

$$F(x) = f(g(x))$$
$$F'(x) = f'(g(x)) \cdot g'(x)$$

We can find the derivative of $e^{ct}$ as follows:

$$\frac{d(e^{ct})}{dt} = e^{ct} \cdot c$$

And, if we make use of the nice inverse relationship between exponentiation and logarithms, we can actually write $2$ as:

$$2 = e^{ln(2)}$$

And hence:

$$2^t = \big( e^{ln(2)}\big)^t = e^{ln(2) \cdot t}$$

We can then differentiate $2^t$ as follows:

$$\frac{d(2^t)}{dt} = ln(2) \cdot e^{ln(2) \cdot t} = ln(2) \cdot 2^t$$

Where $ln(2)$ is our proportionality constant! To check, we can evaluate quickly:

from math import log, e
ln_2 = log(2, e)
In [42]:
display(f"ln(2) = {log(2, e)}")
'ln(2) = 0.6931471805599453'

It is worth noting that throughout calculus you rarely see exponentials written as some base to a power $t$. You almost always see them as $e^{ct}$, just as we can write $2^t$ as $e^{ln(2) \cdot t}$.

This is why in all branches of science $e$ shows up so regularly! There are many ways to write down any particular exponential function. When you see something written as $e^{ct}$, that was a choice that was made to write it that way! $e$ is by no means some "fundamental" property of the function itself.

Key Takeaway
All sorts of phenomena involve some rate of change that is proportional to the thing that is changing. For example, the rate of growth of a population does tend to be proportional to the size of the population itself (assuming there isn't a limited resource to slow growth). Likewise, if you invest your money, the rate at which it grows is proportional to the amount of money there at any time:

$$M(t) = e^{(1+r)t}$$
$$\frac{dM}{dt} = (1 + r) M $$

Above, we chose to express $M$ as $e$ to the power of some constant times $t$.
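To see this proportionality in action, here is a crude Euler-integration sketch (the rate and step size are chosen arbitrarily) showing that stepping $\frac{dM}{dt} = (1+r)M$ forward in time recovers $e^{(1+r)t}$:

import numpy as np

r = 0.05            # arbitrary rate
t_final = 1.0
steps = 100_000
dt = t_final / steps

# Euler integration of dM/dt = (1 + r) * M, starting from M(0) = 1
M = 1.0
for _ in range(steps):
    M += (1 + r) * M * dt

display(f"Euler estimate:  {M}")
display(f"e^((1 + r) * t): {np.exp((1 + r) * t_final)}")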

Appendix B: Group Theory Introduction

Much of modern mathematics is rooted in understanding how a collection of actions is organized by the relation between pairs of actions and the single action you get by composing them. This branch of math is known as Group Theory. Group Theory revolves entirely around studying the nature of symmetry. We often think of starting with some entity, say a square, and then thinking about all of the actions you can take on the square that leave it indistinguishable from how it started$^6$. For instance, rotating a square 90 degrees leaves it looking no different than prior to the rotation. The same goes for a 180 degree rotation, or flipping over the x-axis, and many others. All of these actions make up a group of symmetries, or just group for short.

Now, groups are extremely general; many different ideas can be framed in terms of symmetries. The most familiar example may in fact be numbers themselves$^7$. There are two different ways to think of numbers as groups:

  1. In a way that composing actions looks like addition.
  2. In a way that composing actions looks like multiplication.

At first this may seem a little strange, since we generally don't think of numbers as actions; rather we think of numbers as counting things, but bear with me for a second.

Additive Group of Real Numbers

Think of all the different ways that you can slide a number line along itself. This collection of all sliding actions is a group! It is a group of symmetries on an infinite line. We can associate each action with a unique point on the thing that it is actually acting on. We simply follow where the point that starts at $0$ ends up. We can see this below:

Above we are looking at the adder 3; it represents a shifting action, taking the point 0 to 3. In the most general terms, the number 3 is associated with the action of sliding to the right by 3 units. This group of actions is known as the Additive group of real numbers, a name owed to what the group operation of applying one action followed by another looks like. For instance, if we slide right by 3 units, and then follow that up by sliding right by 2 units, the overall effect is the same as sliding to the right by $3 + 2 = 5$ units.
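We can even mimic this in code. Below is a minimal sketch (the helpers slide and compose are purely illustrative names of my own) where each number is represented as a sliding action, and composing two slides has the same effect as adding their numbers:

def slide(x):
    # The action "slide the number line right by x units"
    return lambda point: point + x

def compose(f, g):
    # Apply action g first, then action f
    return lambda point: f(g(point))

slide_then_slide = compose(slide(2), slide(3))
display(f"slide 3, then slide 2, moves 0 to: {slide_then_slide(0)}")
display(f"slide(3 + 2) moves 0 to:           {slide(3 + 2)(0)}")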

The main takeaway is that this gives an alternate view into what numbers actually are: one example in a much larger category of groups; groups of symmetries acting on some object. The arithmetic of adding numbers is just one example of the arithmetic that any group of symmetries has within it.

Multiplicative Group of Positive Real Numbers

Now let's consider a new group of actions on the number line: all of the different ways that you can stretch or squish it, keeping everything evenly spaced, with $0$ fixed in place. Again, this group of actions has the nice property that we can associate each action in the group with a specific point on the thing that it is acting on.

In this case, there is only one stretching action that brings the point $1$ to the point $3$; stretching by a factor of 3.

In this way, every single positive number is associated with a unique stretching or squishing action. Now, think about what composing actions looks like in this group. If we apply a stretch by 3 action, and follow it by a stretch by 2 action, the overall effect is that of a stretch by 6 action. In other words, it is the product of the two original numbers. So, applying one stretching action followed by another corresponds to multiplying the numbers that they are associated with. Hence, the name of this group is the Multiplicative group of positive real numbers. So, ordinary multiplication is yet another example of a group, and the arithmetic within groups.
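The same illustrative sketch carries over (reusing the compose helper from the additive example above): each positive number becomes a stretching action, and composing two stretches has the same effect as multiplying their numbers:

def stretch(x):
    # The action "stretch the number line by a factor of x, keeping 0 fixed"
    return lambda point: point * x

stretch_then_stretch = compose(stretch(2), stretch(3))
display(f"stretch by 3, then by 2, moves 1 to: {stretch_then_stretch(1)}")
display(f"stretch(3 * 2) moves 1 to:           {stretch(3 * 2)(1)}")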

Appendix C: Logarithm Property Derivations

To wrap up this logarithm section, I want to just go over some quick derivations of the main logarithm properties that are used frequently in practice:

$$\text{Power Rule} \; \longrightarrow \; log_ax^n = n \, log_ax$$$$\text{Change of Base Rule} \; \longrightarrow \; log_b(x) = \frac{log_ax}{log_ab}$$$$\text{Product Rule} \; \longrightarrow \; log_a(xy) = log_ax + log_ay$$$$\text{Division Rule} \; \longrightarrow \; log_a(\frac{x}{y}) = log_ax - log_ay$$
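The power rule and change of base rule are derived in detail below; before doing so, we can quickly sanity check the product and division rules numerically (the base and arguments here are arbitrary choices):

from math import log

a, x, y = 10, 6, 3  # arbitrary base and arguments

display(f"log_a(xy):         {log(x * y, a)}")
display(f"log_a x + log_a y: {log(x, a) + log(y, a)}")
display(f"log_a(x/y):        {log(x / y, a)}")
display(f"log_a x - log_a y: {log(x, a) - log(y, a)}")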

Power Rule
To start we can attack the power rule. We will make use of the following two properties:

$$log_a(a^x) = x$$$$log_a(x) = y \longleftrightarrow a^y = x$$

We can then define $m$ as:

$$m = log_ax$$

And switch to exponent notation:

$$a^m = x$$

We can then raise both sides to the $n$:

$$(a^m)^n = x^n$$

And then take the $log_a$ of both sides:

$$log_a(a^{mn}) = log_a(x^n)$$

The $log$ and exponent on the left cancel out:

$$mn = log_a(x^n)$$

And we can plug in our value for $m$:

$$n log_ax = log_a(x^n)$$

The key intuition here is that this property is heavily related to an intuitive property of exponentials, namely $(x^m)^n = x^{mn}$. Another way of deriving it is to utilize the product rule:

$$log_ax^n = log_a(\overbrace{x \cdot x \cdot \dots \cdot x}^{\text{n times}}) = \overbrace{log_ax + \dots + log_ax}^{\text{n times}} = n \cdot log_ax$$
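A quick numerical check of the power rule (again with arbitrary values):

from math import log

a, x, n = 10, 7, 3  # arbitrary base, argument, and power

display(f"log_a(x^n):  {log(x**n, a)}")
display(f"n * log_a x: {n * log(x, a)}")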

Change of Base Rule
We can start by defining:

$$y = log_b x$$

And its exponential equivalent:

$$b^y = b^{log_bx}$$$$b^y = x$$

Now we can take the $log_a$ of each side of the equation:

$$log_a(b^y) = log_a(x)$$

Use the power rule:

$$ylog_a(b) = log_a(x)$$

Substitute our original definition of $y$:

$$ log_b x \cdot log_a(b) = log_a(x)$$

And we have our final rule:

$$ log_b x = \frac{log_a(x)}{log_a(b)}$$
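And a corresponding numerical check of the change of base rule (arbitrary values once more):

from math import log

b, x, a = 2, 37, 10  # arbitrary bases and argument

display(f"log_b(x):          {log(x, b)}")
display(f"log_a(x)/log_a(b): {log(x, a) / log(b, a)}")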

Appendix: References

  1. Thinking, Fast and Slow, Chapters 1 and 3, Daniel Kahneman
  2. How to Create a Mind, Ray Kurzweil
  3. The Organization of Complex Systems, Herbert A. Simon
  4. Numbers as groups
  5. How to think about exponentials
  6. The Model Thinker, Chapter 8
  7. How to Solve It, George Polya

© 2018 Nathaniel Dake