We are now going to cover TensorFlow basics. This section introduces variables, functions, and expressions, and shows how to optimize a simple function.
We can start with our imports.
import numpy as np
import tensorflow as tf
Okay, so in TensorFlow, a placeholder is like a Theano variable.
# Placeholder - you must specify the type, shape and name are optional
A = tf.placeholder(tf.float32, shape=(5, 5), name='A')
We can now create a vector, but give it no shape or name.
# Vector - as stated above, shape and name are optional
v = tf.placeholder(tf.float32)
We can then do matrix multiplication, similar to what we did in Theano. matmul feels a bit more appropriate than dot.
w = tf.matmul(A, v)
Similar to theano, we need to feed the variables values, since A and v do not yet have values. In TensorFlow you do the actual work in what is called a Session.
with tf.Session() as session: # Opening a session
    # Run matrix multiplication. feed_dict tells what A and v are. np can be used to send values
    # Note: v needs to be shape=(5,1), not just shape (5,). This is more like "real" mat mult
    output = session.run(w, feed_dict={A: np.random.randn(5, 5), v: np.random.randn(5, 1)})
    # Print the output returned by this session
    print(output, type(output))
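The shape note in the comments above can be checked with NumPy alone. This is a small sketch, not TensorFlow code: it only shows how a `(5,)` array differs from a `(5, 1)` array under matrix multiplication. `tf.matmul` expects both arguments to have at least two dimensions, which is why the `(5, 1)` form is fed in. The names `A_val`, `v_flat`, and `v_col` are illustrative.

```python
import numpy as np

A_val = np.random.randn(5, 5)
v_flat = np.random.randn(5)       # shape (5,): a 1-D array
v_col = v_flat.reshape(5, 1)      # shape (5, 1): an explicit column vector

out_flat = A_val.dot(v_flat)      # NumPy accepts the 1-D input and returns shape (5,)
out_col = A_val.dot(v_col)        # 2-D times 2-D returns shape (5, 1)

print(out_flat.shape, out_col.shape)
# The numbers are the same; only the layout differs
print(np.allclose(out_flat, out_col.ravel()))
```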
We can see above that the output returned is just a NumPy array! An important thing to note is that TensorFlow variables are like Theano shared variables, whereas Theano variables are like TensorFlow placeholders.
A tf.Variable can be initialized with a NumPy array or a TensorFlow tensor, or really anything that can be converted into a tensor.
Now, we are going to have a variable we can update using gradient descent.
shape = (2, 2)
# Create TensorFlow Variable, passing in random_normal as its initial value
x = tf.Variable(tf.random_normal(shape))
x_1 = tf.Variable(np.random.randn(2,2)) # We can also pass in numpy array
t = tf.Variable(0) # Or we can pass in a scalar
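One detail worth checking when initializing from NumPy is the dtype. `np.random.randn` returns `float64`, whereas `tf.random_normal` defaults to `float32`, and TensorFlow 1.x generally does not convert between the two automatically inside an expression. The sketch below uses only NumPy to show the dtypes involved and the usual cast; the names `x1_init`, `x1_init32`, and `t_init` are illustrative.

```python
import numpy as np

x1_init = np.random.randn(2, 2)
print(x1_init.dtype)                     # float64

# Casting before creating the variable keeps everything in float32
x1_init32 = x1_init.astype(np.float32)
print(x1_init32.dtype)                   # float32

# A plain Python 0 is an integer, so a variable built from it is integer-typed
t_init = np.array(0)
print(t_init.dtype.kind)                 # i
```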
TensorFlow variables must be initialized before they can be used.
init = tf.global_variables_initializer()
Now we can open a session, and run our init.
with tf.Session() as session: # Open session
    out = session.run(init) # Run init operation (returns None)
    print('out: ', out)
    print('x: ', x.eval()) # After running init we can print values with eval()
    print('t: ', t.eval())
Okay, now let's try to find the minimum of a simple cost function like we did in theano.
u = tf.Variable(20.0) # Create a variable, initialize it to 20
cost = u*u + u + 1.0 # Create same cost function that we did in theano example
One big difference between Theano and TensorFlow is that you do not write the updates yourself in TensorFlow. Instead you choose an optimizer (TensorFlow has a bunch) that implements the algorithm you want. For example, GradientDescentOptimizer is just regular gradient descent, and if we want a learning rate of 0.3, we can pass it in. Check the documentation for more information on its parameters. We then tell it what expression we want to minimize.
train_op = tf.train.GradientDescentOptimizer(0.3).minimize(cost)
Now, again we will initialize our variables and then open our session. Oddly enough, while the weight update is automated, the loop itself is not. So we can just call train_op until convergence. This is useful regardless, since it allows us to track the cost function.
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    for i in range(12):
        session.run(train_op)
        print("i = %d, cost = %.3f, u = %.3f" % (i, cost.eval(), u.eval()))
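For intuition, the update that plain gradient descent performs on this cost can be written by hand. The sketch below is plain Python rather than TensorFlow, and the helper names `cost_fn` and `grad_fn` are made up for illustration. It uses the same starting point (20), learning rate (0.3), and number of steps (12) as above, with the derivative $2u + 1$ worked out manually.

```python
def cost_fn(u):
    return u * u + u + 1.0

def grad_fn(u):
    # d/du (u^2 + u + 1) = 2u + 1
    return 2.0 * u + 1.0

learning_rate = 0.3
u_val = 20.0
history = []

for i in range(12):
    # One gradient descent step: move against the gradient
    u_val = u_val - learning_rate * grad_fn(u_val)
    history.append((i, cost_fn(u_val), u_val))
    print("i = %d, cost = %.3f, u = %.3f" % (i, cost_fn(u_val), u_val))
```

Setting $2u + 1 = 0$ gives the minimum at $u = -0.5$, where the cost is 0.75, so the printed values should approach those numbers.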
We are now going to create a neural network in tensorflow. We can start with our usual imports.
import numpy as np
import matplotlib.pyplot as plt # Pulling in so that we can plot the log-likelihood
import seaborn as sns
from util import get_normalized_data, y2indicator # Util to get data and create ind matrix
# Seaborn Plot Styling
sns.set(style="white", palette="husl")
sns.set_context("poster")
sns.set_style("ticks")
We can also use the same error rate calculation from the theano walkthrough.
def error_rate(p, t):
    return np.mean(p != t)
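The `y2indicator` helper is imported from `util` and is not shown in this walkthrough. A plausible sketch follows, assuming it one-hot encodes integer labels into an N x K indicator matrix. The name `y2indicator_sketch` and its `K` argument are assumptions for illustration, not the actual helper. A small `error_rate` check is included as well.

```python
import numpy as np

def y2indicator_sketch(y, K=10):
    # Hypothetical stand-in for util.y2indicator: one row per sample,
    # with a 1 in the column matching that sample's class label
    y = np.asarray(y, dtype=np.int32)
    ind = np.zeros((len(y), K), dtype=np.float32)
    ind[np.arange(len(y)), y] = 1
    return ind

def error_rate(p, t):
    # Fraction of predictions that differ from the targets
    return np.mean(p != t)

labels = np.array([0, 2, 1])
ind = y2indicator_sketch(labels, K=3)
print(ind)

predictions = np.array([0, 2, 2])
print(error_rate(predictions, labels))  # one of the three predictions is wrong
```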
And now we can create our main function!
def main():
    """------------- Step 1: Get our data and define the usual variables ------------"""
    X, Y = get_normalized_data()
    max_iter = 20
    print_period = 10
    lr = 0.00004
    reg = 0.01
    Xtrain = X[:-1000,]
    Ytrain = Y[:-1000]
    Xtest = X[-1000:,]
    Ytest = Y[-1000:]
    Ytrain_ind = y2indicator(Ytrain)
    Ytest_ind = y2indicator(Ytest)
    N, D = Xtrain.shape
    batch_sz = 500
    n_batches = N // batch_sz
    M1 = 300 # 300 hidden units in first layer
    M2 = 100 # 100 hidden units in second layer
    K = 10 # 10 classes
    W1_init = np.random.randn(D, M1) / 28
    b1_init = np.zeros(M1)
    W2_init = np.random.randn(M1, M2) / np.sqrt(M1)
    b2_init = np.zeros(M2)
    W3_init = np.random.randn(M2, K) / np.sqrt(M2) # For TensorFlow we are going to
    b3_init = np.zeros(K) # add another hidden layer to our nn
    """------------- Step 2: Define TensorFlow variables and expressions ---------------"""
    # Define Variables and expressions
    X = tf.placeholder(tf.float32, shape=(None, D), name='X')
    T = tf.placeholder(tf.float32, shape=(None, K), name='T')
    W1 = tf.Variable(W1_init.astype(np.float32))
    b1 = tf.Variable(b1_init.astype(np.float32))
    W2 = tf.Variable(W2_init.astype(np.float32))
    b2 = tf.Variable(b2_init.astype(np.float32))
    W3 = tf.Variable(W3_init.astype(np.float32))
    b3 = tf.Variable(b3_init.astype(np.float32))
    # Define the model using tensorflow functions
    Z1 = tf.nn.relu( tf.matmul(X, W1) + b1 ) # 1st hidden layer output
    Z2 = tf.nn.relu( tf.matmul(Z1, W2) + b2 ) # 2nd hidden layer output
    Yish = tf.matmul(Z2, W3) + b3
    # Note: called Yish above because it is not really Y. It is just the matrix multiplication
    # of Z2 and W3 plus b3, without doing the softmax. The softmax is left out here because
    # the cost function below applies it internally.
    # softmax_cross_entropy_with_logits takes in the "logits"
    # If you wanted to know the actual output of the neural net,
    # you could pass "Yish" into tf.nn.softmax(logits)
    cost = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=Yish, labels=T))
    # Now we need our train and prediction functions.
    # We choose the optimizer but don't implement the algorithm ourselves
    # Let's go with RMSprop, since we just learned about it. It includes momentum!
    train_op = tf.train.RMSPropOptimizer(lr, decay=0.99, momentum=0.9).minimize(cost)
    # Used to calculate the error rate
    predict_op = tf.argmax(Yish, 1)
    costs = []
    init = tf.global_variables_initializer() # Initialize variables
    with tf.Session() as session: # Start session
        session.run(init) # Run init function
        for i in range(max_iter): # Usual for loop
            for j in range(n_batches):
                Xbatch = Xtrain[j*batch_sz:(j*batch_sz + batch_sz),]
                Ybatch = Ytrain_ind[j*batch_sz:(j*batch_sz + batch_sz),]
                # In Theano we just call the function; in TensorFlow we ask the session to run the op
                session.run(train_op, feed_dict={X: Xbatch, T: Ybatch})
                if j % print_period == 0:
                    test_cost = session.run(cost, feed_dict={X: Xtest, T: Ytest_ind})
                    prediction = session.run(predict_op, feed_dict={X: Xtest})
                    err = error_rate(prediction, Ytest)
                    print("Cost / err at iteration i=%d, j=%d: %.3f / %.3f" % (i, j, test_cost, err))
                    costs.append(test_cost)
    fig, ax = plt.subplots(figsize=(12,8))
    plt.plot(costs)
    plt.show()
if __name__ == '__main__':
    main()
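To make the role of `Yish` concrete, here is a sketch in plain Python of what a softmax cross-entropy on logits computes for a single sample. It is not TensorFlow's implementation, and the function names are made up for illustration. Working on the logits directly allows the log-sum-exp trick, which keeps the computation numerically stable. It also shows why `tf.argmax(Yish, 1)` gives the same prediction as taking the argmax of the softmax output.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating to avoid overflow
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, targets):
    # -sum_k t_k * log(softmax(logits)_k), computed via log-sum-exp
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for z, t in zip(logits, targets))

logits = [2.0, 1.0, 0.1]      # unnormalized scores for 3 classes
targets = [1.0, 0.0, 0.0]     # indicator (one-hot) row for class 0

probs = softmax(logits)
loss = softmax_cross_entropy(logits, targets)
print(probs, loss)

# Softmax is monotonic, so the argmax is the same before and after it
pred_from_logits = logits.index(max(logits))
pred_from_probs = probs.index(max(probs))
print(pred_from_logits == pred_from_probs)
```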
We are now going to briefly touch on a few of the concepts that TensorFlow relies on. First we can talk about the graph.
A graph is a useful construct in deep learning because a neural network is a special case of a graph. Recall that a graph is just a set of nodes and edges. In deep learning, each node represents some value, or computations on other values.
So, why do we need a graph? Well, we have seen that backpropagation is very hard. It is NOT something we would want to have to write manually. Even with only 1 hidden layer the equations are difficult to derive; now imagine trying to do that for 100 hidden layers. However, we know that differentiation follows some very basic rules. For example, if a node is computed as, say, $E = C + D$, then the partial derivative of $E$ with respect to $C$ is 1, $\frac{\partial E}{\partial C} = 1$, and it does not depend on $D$ at all. So the edges of a graph tell us which way to calculate the derivatives.
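As a small worked case (the graph here is an assumption chosen for illustration, not one taken from the original material), take $C = A \cdot B$ and $E = C + D$. Each edge contributes one local derivative, and the chain rule multiplies them along the path from $E$ back to an input:

$$
\frac{\partial E}{\partial C} = 1, \qquad
\frac{\partial E}{\partial D} = 1, \qquad
\frac{\partial E}{\partial A} = \frac{\partial E}{\partial C} \, \frac{\partial C}{\partial A} = 1 \cdot B = B
$$

Nothing in this calculation required looking at the whole expression at once; each step only used the derivative of one node with respect to its direct inputs.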
You may also recall that in the deep learning notebook, we talked about how there is a recursiveness to backpropagation. No matter what layer you are in, the derivative only depends on an error term that was calculated at the layer ahead, and it is the same operation each time. Keep in mind that a tree is just a special case of a graph.
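The recursive structure described above can be sketched in a few lines of plain Python. This is a toy reverse-mode differentiation over a graph, not how TensorFlow is implemented; the `Node` class and the `add`, `mul`, and `backward` helpers are all hypothetical names. Each node stores its value and, for every input, the local derivative along that edge. It is applied to the same cost used earlier, $u^2 + u + 1$ at $u = 20$.

```python
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (input_node, local derivative of this node w.r.t. that input)
        self.parents = parents
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def backward(output):
    # Order the nodes so that every node comes after its inputs
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(output)

    # Walk backwards: each node passes its error term along its edges
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

u = Node(20.0)
one = Node(1.0)
cost = add(add(mul(u, u), u), one)   # u*u + u + 1
backward(cost)
print(cost.value, u.grad)            # 421.0 41.0
```

The gradient 41 matches the hand-derived $2u + 1$ at $u = 20$, even though `backward` never saw that formula; it only combined local derivatives along the edges.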
A session is a TensorFlow-specific construct. We know that Google is the king of distributed systems. The key point when we talked about graphs was that none of the variables contained actual numbers (and the numbers you want to plug in may be too big to fit on just one machine).
So, if we define C = A + B, we don't know what number C should be unless we provide the numbers for A and B as well. All we know is how to calculate C. In other words, the actual values for A and B have not yet been loaded into TensorFlow's "memory" (memory is being used loosely here). Why is this important? Well, if we are doing computations on the CPU, then we will load our data (arrays) into the main RAM. However, if we are doing computations on the GPU, then we will load data into GPU RAM, which is separate. In the Google world, they distribute computation across multiple GPUs, so sometimes data is too big to even fit on 1 GPU. So, a session allows you to specify where you are going to do your computation, so that when you pass in actual numbers, they go to the right place and enough space is allocated for them to exist.
This also explains why we need to initialize variables, and pass in data through feed_dict. It is like telling tensorflow: "here is the value you are going to use for A, please copy it into your memory. Here is the value you are going to use for B, please copy it into your memory. Now perform the computation we asked for."
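The idea that C = A + B only describes a computation until values are supplied can be illustrated with a toy sketch in plain Python. The `Placeholder` and `Add` classes and the `run` function below are hypothetical and far simpler than a real session (there is no device placement or memory management); they only mimic the "define first, feed values later" pattern.

```python
class Placeholder:
    def __init__(self, name):
        self.name = name

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

def run(node, feed_dict):
    # Evaluate a node by recursively evaluating its inputs
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise ValueError("no value fed for placeholder %r" % node.name)
        return feed_dict[node]
    if isinstance(node, Add):
        return run(node.left, feed_dict) + run(node.right, feed_dict)
    raise TypeError("unknown node type")

A = Placeholder("A")
B = Placeholder("B")
C = Add(A, B)    # Only describes the computation; no numbers are involved yet

result = run(C, feed_dict={A: 2.0, B: 3.0})
print(result)    # 5.0
```

Calling `run(C, feed_dict={A: 1.0})` without a value for `B` raises an error in this sketch, which mirrors why every placeholder needed by an expression has to appear in `feed_dict`.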