14. TensorFlow Basics

We are now going to cover TensorFlow basics. This section introduces basic variables, functions, and expressions, and shows how to optimize a simple function.

We can start with our imports.

import numpy as np
import tensorflow as tf

Okay, so in TensorFlow, a placeholder is like a Theano variable.

# Placeholder - you must specify the type; shape and name are optional
A = tf.placeholder(tf.float32, shape=(5, 5), name='A')

We can now create a placeholder for a vector, giving it no shape or name.

# Vector - as stated above, shape and name are optional
v = tf.placeholder(tf.float32)

We can then do matrix multiplication, similar to what we did in Theano. The name matmul feels a bit more appropriate than dot.

w = tf.matmul(A, v)

As in Theano, A and v do not yet have values, so we need to feed them. In TensorFlow, the actual work is done in what is called a Session.

with tf.Session() as session:                          # Opening a session
  # Run matrix multiplication. feed_dict tells what A and v are. NumPy arrays can be used to send values
  # Note: v needs to be shape=(5,1), not just shape (5,). This is more like "real" matrix multiplication
  output = session.run(w, feed_dict={A: np.random.randn(5, 5), v: np.random.randn(5, 1)})
  
  # Print the output returned by this session
  print(output, type(output))

[[-0.4347386 ]
 [ 0.5099224 ]
 [ 0.4925214 ]
 [ 0.43968624]
 [ 1.0491427 ]] <class 'numpy.ndarray'>
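
The shape note in the code comment above can be checked in plain NumPy. This is only a small sketch with illustrative array names (A_val, v_col, v_flat are not part of the TensorFlow graph). NumPy accepts a flat (5,) vector, whereas tf.matmul expects inputs of rank 2 or higher, which is why v is fed with shape (5, 1).

```python
import numpy as np

A_val = np.random.randn(5, 5)
v_col = np.random.randn(5, 1)    # column vector: a 5x1 matrix
v_flat = np.random.randn(5)      # rank-1 array of shape (5,)

col_result = A_val @ v_col       # matrix times column vector
flat_result = A_val @ v_flat     # NumPy treats the flat array as a vector

print(col_result.shape)          # (5, 1), same layout as the session output above
print(flat_result.shape)         # (5,)
```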

We can see above that the output returned is just a NumPy array! Something important to note: TensorFlow variables are like Theano shared variables, whereas Theano variables are like TensorFlow placeholders.

A tf.Variable can be initialized with a NumPy array or a TensorFlow tensor, or really anything that can be converted to a tensor.

Now, we are going to have a variable we can update using gradient descent.

shape = (2, 2)
# Create TensorFlow Variable, passing in random_normal as its initial value 
x = tf.Variable(tf.random_normal(shape))
x_1 = tf.Variable(np.random.randn(2,2))     # We can also pass in numpy array
t = tf.Variable(0)                          # Or we can pass in a scalar

TensorFlow variables must be initialized before they can be used.

init = tf.global_variables_initializer()

Now we can open a session, and run our init.

with tf.Session() as session:          # Open session
  out = session.run(init)              # Run init operation
  print('out: ', out)
  
  print('x: ', x.eval())                      # After running init, we can print values with eval()
  print('t: ', t.eval())

out:  None
x:  [[0.1909285  0.36801875]
 [1.0030868  1.1062077 ]]
t:  0
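
One detail to keep in mind when a NumPy array is used as the initial value: np.random.randn returns float64, while tf.random_normal defaults to float32, so x and x_1 above end up with different dtypes. The sketch below (w_init is an illustrative name) shows the cast that brings a NumPy initializer in line with float32 tensors.

```python
import numpy as np

w_init = np.random.randn(2, 2)
w_init32 = w_init.astype(np.float32)   # cast before handing the array to a float32 graph

print(w_init.dtype)     # float64
print(w_init32.dtype)   # float32
```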

Okay, now let's try to find the minimum of a simple cost function like we did in Theano.

u = tf.Variable(20.0)        # Create a variable, initialize it to 20
cost = u*u + u + 1.0         # Create same cost function that we did in theano example

One big difference between Theano and TensorFlow is that you do not write the updates yourself in TensorFlow. Instead you choose an optimizer (TensorFlow has a bunch) that implements the algorithm you want. For example, GradientDescentOptimizer is just regular gradient descent, and if we want a learning rate of 0.3, we can pass it in. Check the documentation for more information on the parameters. We then tell it which expression we want to minimize.

train_op = tf.train.GradientDescentOptimizer(0.3).minimize(cost)

Now, again we will initialize our variables and then open our session. Oddly enough, while the weight update is automated, the loop itself is not. So we can just call train_op until convergence. This is useful regardless, since it allows us to track the cost function.

init = tf.global_variables_initializer()
with tf.Session() as session: 
  session.run(init)
  for i in range(12):
    session.run(train_op)
    print("i = %d, cost = %.3f, u = %.3f" % (i, cost.eval(), u.eval()))

i = 0, cost = 67.990, u = 7.700
i = 1, cost = 11.508, u = 2.780
i = 2, cost = 2.471, u = 0.812
i = 3, cost = 1.025, u = 0.025
i = 4, cost = 0.794, u = -0.290
i = 5, cost = 0.757, u = -0.416
i = 6, cost = 0.751, u = -0.466
i = 7, cost = 0.750, u = -0.487
i = 8, cost = 0.750, u = -0.495
i = 9, cost = 0.750, u = -0.498
i = 10, cost = 0.750, u = -0.499
i = 11, cost = 0.750, u = -0.500
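
To see what the optimizer is doing for us, the same loop can be written by hand in plain Python. This is a sketch for comparison only: the gradient 2u + 1 is derived manually here, whereas TensorFlow obtains it from the graph, and the names cost_fn, grad_fn and history are illustrative.

```python
def cost_fn(u):
    return u * u + u + 1.0

def grad_fn(u):
    # d/du (u^2 + u + 1) = 2u + 1, derived by hand for this sketch
    return 2.0 * u + 1.0

u = 20.0
learning_rate = 0.3
history = []                       # (u, cost) after each update

for i in range(12):
    u = u - learning_rate * grad_fn(u)   # plain gradient descent step
    history.append((u, cost_fn(u)))
    print("i = %d, cost = %.3f, u = %.3f" % (i, cost_fn(u), u))
```

Setting 2u + 1 = 0 gives the analytic minimum at u = -0.5 with cost 0.75, which is where both loops settle.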

2. TensorFlow: Neural Network

We are now going to create a neural network in TensorFlow. We can start with our usual imports.

import numpy as np
import tensorflow as tf                # Needed by main() below
import matplotlib.pyplot as plt        # Pulling in so that we can plot the log-likelihood
import seaborn as sns
from util import get_normalized_data, y2indicator # Util to get data and create ind matrix

# Seaborn Plot Styling
sns.set(style="white", palette="husl")
sns.set_context("poster")
sns.set_style("ticks")
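
The util module is not shown in this walkthrough. As a rough guide to what y2indicator is assumed to do (turn integer class labels into a one-hot indicator matrix), here is a hypothetical stand-in; the real helper may differ in signature and dtype.

```python
import numpy as np

def y2indicator_sketch(y, num_classes=10):
    # Hypothetical stand-in for util.y2indicator:
    # one row per sample, with a 1.0 in the column of that sample's label
    y = np.asarray(y, dtype=np.int64)
    ind = np.zeros((len(y), num_classes), dtype=np.float32)
    ind[np.arange(len(y)), y] = 1.0
    return ind

labels = np.array([3, 0, 9])
ind = y2indicator_sketch(labels)
print(ind.shape)              # (3, 10)
print(ind.argmax(axis=1))     # recovers the original labels
```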

We can also use the same error rate calculation from the Theano walkthrough.

def error_rate(p, t):
  return np.mean(p != t)
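
As a quick sanity check of error_rate, here is a small usage sketch with made-up predictions and targets (the arrays are illustrative, not taken from the dataset).

```python
import numpy as np

def error_rate(p, t):
    return np.mean(p != t)

predictions = np.array([7, 2, 1, 0])
targets = np.array([7, 2, 1, 4])

rate = error_rate(predictions, targets)   # one mismatch out of four
print(rate)                               # 0.25
```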

And now we can create our main function!

def main():
  """------------- Step 1: Get our data and define the usual variables ------------"""
  X, Y = get_normalized_data()
  
  max_iter = 20
  print_period = 10
  
  lr = 0.00004
  reg = 0.01
  
  Xtrain = X[:-1000,]
  Ytrain = Y[:-1000]
  Xtest  = X[-1000:,]
  Ytest  = Y[-1000:]
  Ytrain_ind = y2indicator(Ytrain)
  Ytest_ind = y2indicator(Ytest)

  N, D = Xtrain.shape
  batch_sz = 500
  n_batches = N // batch_sz

  M1 = 300   # 300 hidden units in first layer 
  M2 = 100   # 100 hidden units in second layer
  K = 10    # 10 classes 
  W1_init = np.random.randn(D, M1) / 28
  b1_init = np.zeros(M1)
  W2_init = np.random.randn(M1, M2) / np.sqrt(M1)
  b2_init = np.zeros(M2)
  W3_init = np.random.randn(M2, K) / np.sqrt(M2)   # For TensorFlow we are going to 
  b3_init = np.zeros(K)                            # add another hidden layer to our nn
  
  """------------- Step 2: Define TensorFlow variables and expressions ---------------"""
  # Define Variables and expressions
  X = tf.placeholder(tf.float32, shape=(None, D), name='X')
  T = tf.placeholder(tf.float32, shape=(None, K), name='T')
  W1 = tf.Variable(W1_init.astype(np.float32))
  b1 = tf.Variable(b1_init.astype(np.float32))
  W2 = tf.Variable(W2_init.astype(np.float32))
  b2 = tf.Variable(b2_init.astype(np.float32))
  W3 = tf.Variable(W3_init.astype(np.float32))
  b3 = tf.Variable(b3_init.astype(np.float32))
  
  # Define the model using tensorflow functions
  Z1 = tf.nn.relu( tf.matmul(X, W1) + b1 )           # 1st hidden layer output
  Z2 = tf.nn.relu( tf.matmul(Z1, W2) + b2 )          # 2nd hidden layer output
  Yish = tf.matmul(Z2, W3) + b3 
  
  # Note: called Yish above because it is not really Y. It is just the matrix multiplication
  # of Z2 and W3 plus b3, without the softmax. The softmax is folded into the cost op
  # below, which computes softmax and cross-entropy together in a numerically stable way.
  
  # softmax_cross_entropy_with_logits takes in the "logits"
  # If you wanted to know the actual output of the neural net,
  # you could pass "Yish" into tf.nn.softmax(logits)
  cost = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=Yish, labels=T))
  
  # Now we need our train and prediction functions. 
  # We choose the optimizer but don't implement the algorithm ourselves
  # Let's go with RMSprop, since we just learned about it. It includes momentum!
  train_op = tf.train.RMSPropOptimizer(lr, decay=0.99, momentum=0.9).minimize(cost)
  
  # Used to calculate the error rate
  predict_op = tf.argmax(Yish, 1)
  
  costs = []
  init = tf.global_variables_initializer()      # Initialize variables 
  with tf.Session() as session:                 # Start session
    session.run(init)                           # Run init function

    for i in range(max_iter):                   # Usual for loop
        for j in range(n_batches):
            Xbatch = Xtrain[j*batch_sz:(j*batch_sz + batch_sz),]
            Ybatch = Ytrain_ind[j*batch_sz:(j*batch_sz + batch_sz),]
            
            # In Theano we just call the function; in TensorFlow we ask the session to run the op
            session.run(train_op, feed_dict={X: Xbatch, T: Ybatch})
            if j % print_period == 0:
              test_cost = session.run(cost, feed_dict={X: Xtest, T: Ytest_ind})
              prediction = session.run(predict_op, feed_dict={X: Xtest})
              err = error_rate(prediction, Ytest)
              print("Cost / err at iteration i=%d, j=%d: %.3f / %.3f" % (i, j, test_cost, err))
              costs.append(test_cost)
              
  fig, ax = plt.subplots(figsize=(12,8))        
  plt.plot(costs)
  plt.show()
            
            
if __name__ == '__main__':
  main()

Reading in and transforming data...
Cost / err at iteration i=0, j=0: 2377.992 / 0.915
Cost / err at iteration i=0, j=10: 1580.701 / 0.358
Cost / err at iteration i=0, j=20: 899.760 / 0.233
Cost / err at iteration i=0, j=30: 607.762 / 0.161
Cost / err at iteration i=0, j=40: 492.200 / 0.146
Cost / err at iteration i=0, j=50: 429.369 / 0.122
Cost / err at iteration i=0, j=60: 382.317 / 0.110
Cost / err at iteration i=0, j=70: 363.119 / 0.104
Cost / err at iteration i=0, j=80: 339.043 / 0.096
Cost / err at iteration i=1, j=0: 332.146 / 0.097
Cost / err at iteration i=1, j=10: 308.637 / 0.091
Cost / err at iteration i=1, j=20: 294.324 / 0.085
Cost / err at iteration i=1, j=30: 277.907 / 0.077
Cost / err at iteration i=1, j=40: 265.106 / 0.075
Cost / err at iteration i=1, j=50: 262.576 / 0.071
Cost / err at iteration i=1, j=60: 250.367 / 0.076
Cost / err at iteration i=1, j=70: 246.810 / 0.067
Cost / err at iteration i=1, j=80: 239.462 / 0.068
Cost / err at iteration i=2, j=0: 237.938 / 0.068
Cost / err at iteration i=2, j=10: 231.635 / 0.062
Cost / err at iteration i=2, j=20: 224.735 / 0.063
Cost / err at iteration i=2, j=30: 217.435 / 0.059
Cost / err at iteration i=2, j=40: 208.233 / 0.058
Cost / err at iteration i=2, j=50: 207.137 / 0.053
Cost / err at iteration i=2, j=60: 197.169 / 0.057
Cost / err at iteration i=2, j=70: 197.700 / 0.053
Cost / err at iteration i=2, j=80: 198.244 / 0.052
Cost / err at iteration i=3, j=0: 198.480 / 0.053
Cost / err at iteration i=3, j=10: 201.338 / 0.052
Cost / err at iteration i=3, j=20: 194.644 / 0.056
Cost / err at iteration i=3, j=30: 186.661 / 0.050
Cost / err at iteration i=3, j=40: 178.027 / 0.049
Cost / err at iteration i=3, j=50: 179.487 / 0.050
Cost / err at iteration i=3, j=60: 169.187 / 0.045
Cost / err at iteration i=3, j=70: 173.373 / 0.049
Cost / err at iteration i=3, j=80: 170.823 / 0.044
Cost / err at iteration i=4, j=0: 171.323 / 0.044
Cost / err at iteration i=4, j=10: 176.780 / 0.048
Cost / err at iteration i=4, j=20: 168.316 / 0.044
Cost / err at iteration i=4, j=30: 170.296 / 0.046
Cost / err at iteration i=4, j=40: 164.295 / 0.045
Cost / err at iteration i=4, j=50: 165.945 / 0.042
Cost / err at iteration i=4, j=60: 154.873 / 0.041
Cost / err at iteration i=4, j=70: 161.653 / 0.044
Cost / err at iteration i=4, j=80: 156.672 / 0.043
Cost / err at iteration i=5, j=0: 157.078 / 0.043
Cost / err at iteration i=5, j=10: 162.809 / 0.043
Cost / err at iteration i=5, j=20: 153.625 / 0.039
Cost / err at iteration i=5, j=30: 159.543 / 0.037
Cost / err at iteration i=5, j=40: 154.257 / 0.041
Cost / err at iteration i=5, j=50: 155.760 / 0.042
Cost / err at iteration i=5, j=60: 144.425 / 0.036
Cost / err at iteration i=5, j=70: 151.085 / 0.038
Cost / err at iteration i=5, j=80: 149.526 / 0.034
Cost / err at iteration i=6, j=0: 150.336 / 0.036
Cost / err at iteration i=6, j=10: 154.673 / 0.038
Cost / err at iteration i=6, j=20: 147.732 / 0.037
Cost / err at iteration i=6, j=30: 156.231 / 0.034
Cost / err at iteration i=6, j=40: 151.054 / 0.041
Cost / err at iteration i=6, j=50: 151.366 / 0.037
Cost / err at iteration i=6, j=60: 139.566 / 0.035
Cost / err at iteration i=6, j=70: 145.670 / 0.033
Cost / err at iteration i=6, j=80: 146.704 / 0.032
Cost / err at iteration i=7, j=0: 147.762 / 0.032
Cost / err at iteration i=7, j=10: 149.364 / 0.035
Cost / err at iteration i=7, j=20: 142.968 / 0.033
Cost / err at iteration i=7, j=30: 154.572 / 0.035
Cost / err at iteration i=7, j=40: 149.902 / 0.038
Cost / err at iteration i=7, j=50: 150.078 / 0.038
Cost / err at iteration i=7, j=60: 138.088 / 0.032
Cost / err at iteration i=7, j=70: 140.942 / 0.034
Cost / err at iteration i=7, j=80: 146.252 / 0.033
Cost / err at iteration i=8, j=0: 148.250 / 0.034
Cost / err at iteration i=8, j=10: 148.050 / 0.032
Cost / err at iteration i=8, j=20: 141.994 / 0.030
Cost / err at iteration i=8, j=30: 156.973 / 0.035
Cost / err at iteration i=8, j=40: 151.744 / 0.038
Cost / err at iteration i=8, j=50: 151.797 / 0.037
Cost / err at iteration i=8, j=60: 141.103 / 0.033
Cost / err at iteration i=8, j=70: 141.099 / 0.031
Cost / err at iteration i=8, j=80: 147.255 / 0.033
Cost / err at iteration i=9, j=0: 149.857 / 0.033
Cost / err at iteration i=9, j=10: 148.180 / 0.031
Cost / err at iteration i=9, j=20: 142.375 / 0.031
Cost / err at iteration i=9, j=30: 159.417 / 0.034
Cost / err at iteration i=9, j=40: 154.496 / 0.039
Cost / err at iteration i=9, j=50: 154.557 / 0.036
Cost / err at iteration i=9, j=60: 144.599 / 0.034
Cost / err at iteration i=9, j=70: 142.154 / 0.032
Cost / err at iteration i=9, j=80: 149.696 / 0.032
Cost / err at iteration i=10, j=0: 153.326 / 0.034
Cost / err at iteration i=10, j=10: 150.079 / 0.027
Cost / err at iteration i=10, j=20: 143.347 / 0.030
Cost / err at iteration i=10, j=30: 160.333 / 0.033
Cost / err at iteration i=10, j=40: 158.118 / 0.036
Cost / err at iteration i=10, j=50: 158.516 / 0.035
Cost / err at iteration i=10, j=60: 150.110 / 0.035
Cost / err at iteration i=10, j=70: 144.167 / 0.033
Cost / err at iteration i=10, j=80: 152.761 / 0.033
Cost / err at iteration i=11, j=0: 157.007 / 0.034
Cost / err at iteration i=11, j=10: 152.156 / 0.026
Cost / err at iteration i=11, j=20: 146.377 / 0.030
Cost / err at iteration i=11, j=30: 163.497 / 0.030
Cost / err at iteration i=11, j=40: 164.000 / 0.034
Cost / err at iteration i=11, j=50: 159.997 / 0.034
Cost / err at iteration i=11, j=60: 152.331 / 0.037
Cost / err at iteration i=11, j=70: 146.288 / 0.033
Cost / err at iteration i=11, j=80: 151.482 / 0.033
Cost / err at iteration i=12, j=0: 156.947 / 0.034
Cost / err at iteration i=12, j=10: 156.475 / 0.027
Cost / err at iteration i=12, j=20: 149.089 / 0.031
Cost / err at iteration i=12, j=30: 165.447 / 0.031
Cost / err at iteration i=12, j=40: 168.512 / 0.034
Cost / err at iteration i=12, j=50: 163.669 / 0.033
Cost / err at iteration i=12, j=60: 155.995 / 0.037
Cost / err at iteration i=12, j=70: 149.282 / 0.034
Cost / err at iteration i=12, j=80: 152.937 / 0.031
Cost / err at iteration i=13, j=0: 158.655 / 0.034
Cost / err at iteration i=13, j=10: 161.237 / 0.029
Cost / err at iteration i=13, j=20: 152.915 / 0.032
Cost / err at iteration i=13, j=30: 169.678 / 0.034
Cost / err at iteration i=13, j=40: 175.407 / 0.037
Cost / err at iteration i=13, j=50: 169.002 / 0.035
Cost / err at iteration i=13, j=60: 161.081 / 0.035
Cost / err at iteration i=13, j=70: 155.246 / 0.031
Cost / err at iteration i=13, j=80: 156.768 / 0.031
Cost / err at iteration i=14, j=0: 161.611 / 0.033
Cost / err at iteration i=14, j=10: 165.174 / 0.030
Cost / err at iteration i=14, j=20: 154.571 / 0.030
Cost / err at iteration i=14, j=30: 170.587 / 0.031
Cost / err at iteration i=14, j=40: 181.405 / 0.038
Cost / err at iteration i=14, j=50: 175.270 / 0.036
Cost / err at iteration i=14, j=60: 167.391 / 0.035
Cost / err at iteration i=14, j=70: 162.670 / 0.032
Cost / err at iteration i=14, j=80: 162.555 / 0.032
Cost / err at iteration i=15, j=0: 166.955 / 0.032
Cost / err at iteration i=15, j=10: 171.512 / 0.030
Cost / err at iteration i=15, j=20: 159.807 / 0.028
Cost / err at iteration i=15, j=30: 174.853 / 0.029
Cost / err at iteration i=15, j=40: 190.530 / 0.039
Cost / err at iteration i=15, j=50: 183.269 / 0.036
Cost / err at iteration i=15, j=60: 174.226 / 0.035
Cost / err at iteration i=15, j=70: 169.026 / 0.030
Cost / err at iteration i=15, j=80: 168.624 / 0.032
Cost / err at iteration i=16, j=0: 172.216 / 0.031
Cost / err at iteration i=16, j=10: 177.128 / 0.031
Cost / err at iteration i=16, j=20: 164.209 / 0.027
Cost / err at iteration i=16, j=30: 179.276 / 0.028
Cost / err at iteration i=16, j=40: 196.415 / 0.035
Cost / err at iteration i=16, j=50: 189.817 / 0.037
Cost / err at iteration i=16, j=60: 181.795 / 0.033
Cost / err at iteration i=16, j=70: 178.305 / 0.030
Cost / err at iteration i=16, j=80: 177.308 / 0.031
Cost / err at iteration i=17, j=0: 179.851 / 0.031
Cost / err at iteration i=17, j=10: 186.060 / 0.032
Cost / err at iteration i=17, j=20: 170.839 / 0.028
Cost / err at iteration i=17, j=30: 185.961 / 0.029
Cost / err at iteration i=17, j=40: 201.099 / 0.034
Cost / err at iteration i=17, j=50: 197.416 / 0.035
Cost / err at iteration i=17, j=60: 186.098 / 0.030
Cost / err at iteration i=17, j=70: 183.531 / 0.031
Cost / err at iteration i=17, j=80: 187.359 / 0.030
Cost / err at iteration i=18, j=0: 186.930 / 0.029
Cost / err at iteration i=18, j=10: 198.604 / 0.033
Cost / err at iteration i=18, j=20: 178.906 / 0.025
Cost / err at iteration i=18, j=30: 195.487 / 0.025
Cost / err at iteration i=18, j=40: 209.658 / 0.033
Cost / err at iteration i=18, j=50: 206.294 / 0.036
Cost / err at iteration i=18, j=60: 196.687 / 0.032
Cost / err at iteration i=18, j=70: 188.350 / 0.027
Cost / err at iteration i=18, j=80: 187.418 / 0.028
Cost / err at iteration i=19, j=0: 187.155 / 0.028
Cost / err at iteration i=19, j=10: 210.489 / 0.030
Cost / err at iteration i=19, j=20: 193.358 / 0.023
Cost / err at iteration i=19, j=30: 203.273 / 0.031
Cost / err at iteration i=19, j=40: 213.058 / 0.034
Cost / err at iteration i=19, j=50: 234.467 / 0.041
Cost / err at iteration i=19, j=60: 219.604 / 0.031
Cost / err at iteration i=19, j=70: 221.491 / 0.030
Cost / err at iteration i=19, j=80: 223.441 / 0.029
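
On why the model above stops at the logits (Yish) and leaves the softmax to the cost: computing softmax and cross-entropy together allows the usual max-shift trick, which avoids overflow in exp. The NumPy sketch below illustrates that idea under stated assumptions. It is not TensorFlow's implementation, and the function name is made up for illustration.

```python
import numpy as np

def softmax_cross_entropy_sketch(logits, labels):
    # Shift each row by its max so exp() cannot overflow, then take the log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # One loss value per row; labels is a one-hot indicator matrix
    return -(labels * log_softmax).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 0.0, -1000.0]])   # second row would overflow a naive softmax
labels = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])

per_example = softmax_cross_entropy_sketch(logits, labels)
total_cost = per_example.sum()    # plays the role of tf.reduce_sum in the cost above
print(per_example, total_cost)
```

Because the per-example losses are summed rather than averaged, the printed cost scales with the number of examples fed in, which is why the test cost above is in the hundreds.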

3. TensorFlow - Appendix

We will now quickly touch on a few of the concepts that TensorFlow relies on. First, we can talk about the graph.

3.1 Graph

A graph is a useful construct in deep learning because a neural network is a special case of a graph. Recall that a graph is just a set of nodes and edges. In deep learning, each node represents some value, or computations on other values.

So, why do we need a graph? Well, we have seen that backpropagation is very hard. It is NOT something we would want to have to write manually. Even with only 1 hidden layer the equations are difficult to derive; now imagine trying to do that for 100 hidden layers. However, we know that differentiation follows some very basic rules. For example, for a node such as $E = C + D$, the partial derivative of $E$ with respect to $C$ is 1, $\frac{\partial E}{\partial C} = 1$, and it does not depend on $D$ at all. So the edges of a graph tell us which way to calculate the derivatives.

You may also recall that in the deep learning notebook, we talked about how there is a recursiveness to backpropagation. No matter what layer you are in, the derivative only depends on some error term that was calculated at the layer ahead, and the operation is the same each time. Keep in mind that a tree is just a special case of a graph.
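
The idea that edges carry simple local derivatives, and that the same step is repeated node by node, can be sketched in a few lines of plain Python. This is a toy reverse-mode differentiation under stated assumptions, not how TensorFlow is implemented; Node, add, mul and backward are illustrative names.

```python
class Node:
    """A value in the graph plus the edges (parents and local derivatives) that produced it."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents    # tuple of (parent_node, d(self)/d(parent))
        self.grad = 0.0

def add(a, b):
    # d(a + b)/da = 1 and d(a + b)/db = 1, independent of the other input
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def backward(output):
    # Collect nodes so that every node appears after its parents
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(output)
    # Walk back from the output, applying the same chain-rule step at every node
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

u = Node(20.0)
cost = add(add(mul(u, u), u), Node(1.0))   # cost = u*u + u + 1
backward(cost)
print(cost.value, u.grad)                  # 421.0 41.0
```

The gradient 41 at u = 20 matches 2u + 1 from the earlier cost function example.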

3.2 Sessions

A session is a TensorFlow-specific construct. We know that Google is the king of distributed systems. The key point when we talked about graphs was that none of the variables contained actual numbers (and the numbers you want to plug in may be too big to fit on just one machine).

So, if we define C = A + B, we don't know what number C should be unless we provide the numbers for A and B as well. All we know is how to calculate C. In other words, the actual values for A and B have not yet been loaded into TensorFlow's "memory" (memory is being used loosely here). Why is this important? Well, if we are doing computations on the CPU, then we will load our data (arrays) into the main RAM. However, if we are doing computations on the GPU, then we will load data into GPU RAM, which is separate. In the Google world, they distribute computation across multiple GPUs, because sometimes the data is too big to fit on even 1 GPU. So, a session allows you to specify where you are going to do your computation, so that when you pass in actual numbers, they go to the right place and enough space is allocated for them to exist.

This also explains why we need to initialize variables, and pass in data through feed_dict. It is like telling TensorFlow: "here is the value you are going to use for A, please copy it into your memory. Here is the value you are going to use for B, please copy it into your memory. Now perform the computation we asked for."
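
The separation between describing C = A + B and actually computing it can be mimicked with a toy example in plain Python. This is only a sketch of the idea. The Placeholder and Add classes and the run function are hypothetical stand-ins, and they ignore everything a real session does about devices and memory.

```python
class Placeholder:
    """A named slot with no value until one is supplied at run time."""
    def __init__(self, name):
        self.name = name

class Add:
    """A node that only records how to compute its value from its inputs."""
    def __init__(self, left, right):
        self.left, self.right = left, right

def run(node, feed_dict):
    # Toy stand-in for session.run: look up placeholders in feed_dict, then evaluate
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise ValueError("no value fed for placeholder %r" % node.name)
        return feed_dict[node]
    return run(node.left, feed_dict) + run(node.right, feed_dict)

A = Placeholder("A")
B = Placeholder("B")
C = Add(A, B)                        # nothing is computed yet, C only knows its recipe

result = run(C, {A: 2.0, B: 3.0})    # values arrive only at run time
print(result)                        # 5.0
```

Calling run(C, {A: 2.0}) without a value for B raises an error, much like running a TensorFlow op without feeding every placeholder it depends on.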