Going from regular numpy to pandas is a pretty straightforward leap; going from numpy to theano is not, and certain things do not happen the way you would expect. So let's start by talking about variables.
# We use theano.tensor so often that we just call it T
import theano.tensor as T
We can create a scalar, a vector, and a matrix as follows:
# initializing various types of variables
c = T.scalar('c')
v = T.vector('v')
A = T.matrix('A')
There are also tensors in theano, which are for arrays of dimensionality 3 and greater. You run into these when you work with images that have not been flattened. For example, we may have vectors that are 784 in length but need to be reshaped to 28x28 to be viewed as images. So, if you wanted to store the images as squares and you had N images, that would be an N x 28 x 28 array, which is a 3-dimensional tensor. If you also had 3 color channels, you would have an N x 3 x 28 x 28 tensor, which is 4-dimensional.
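Declaring these works the same way as the scalar, vector, and matrix above; a minimal sketch (the variable names here are just for illustration):
# 3-D and 4-D symbolic variables for stacks of images
images = T.tensor3('images')              # would hold an N x 28 x 28 array
color_images = T.tensor4('color_images')  # would hold an N x 3 x 28 x 28 array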
With that said, notice that the 3 variables we created above do not have values; they are just symbols. Even so, we can do algebra on them!
w = A.dot(v)
However, we still have not done any actual multiplication, which would be impossible since A and v don't have values yet. So, how do we assign values to A and v and compute the result w? This is where theano functions come into play. First, let's import the top-level theano module.
import theano
We can use this to create a theano function. Each function creation specifies the inputs and outputs.
matrix_times_vector = theano.function(inputs=[A,v], outputs=w)
Now we can import numpy so that we can create real arrays and call the function.
import numpy as np
A_val = np.array([[1,2], [3,4]])
v_val = np.array([5,6])
w_val = matrix_times_vector(A_val, v_val)
print(w_val)
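The result is just the matrix-vector product: [1*5 + 2*6, 3*5 + 4*6] = [17, 39].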
Now, this is nothing too impressive so far. However, one of the biggest advantages of theano is that it links all of these variables up into a graph, and it can use that structure to calculate gradients for you using the chain rule. In theano, regular variables are not updateable; to make an updateable variable we need what is called a shared variable. Let's do that now.
# Creating a shared variable so that we can do gradient descent
# This will add another layer of complexity to the theano function
# First argument is the initial value, second argument is its name
x = theano.shared(20.0, 'x')
Let's also create a simple cost function that we can solve by hand and that we know has a global minimum.
# Cost function that has a minimum value
cost = x*x + x + 1
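Since the derivative of x*x + x + 1 is 2x + 1, setting it to zero tells us the minimum is at x = -0.5, where the cost equals 0.75. That gives us a known answer to check gradient descent against.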
Now, let's tell theano how we want to update x, by giving it an update expression.
# In theano you do not have to compute gradients yourself, it calculates them automatically
# Grad function takes in two parameters:
# parameter 1: function you want to take the gradient of
# parameter 2: variable you want the gradient with respect to
# You can pass in multiple variables as a list into the second parameter
x_update = x - 0.3*T.grad(cost, x)
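As the comments note, you can also differentiate with respect to several variables at once by passing them as a list. A minimal sketch, using made-up shared variables a and b (not part of the example above):
a = theano.shared(1.0, 'a')
b = theano.shared(2.0, 'b')
multi_cost = a*a + b*b
grad_a, grad_b = T.grad(multi_cost, [a, b])  # one gradient expression per variable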
Now we can create the theano train function. This is like the previous function we created, except we are going to add a new argument, updates. The updates argument takes in a list of tuples, and each tuple has 2 things in it: the shared variable to update and the expression for its new value.
# Note that there are no inputs
train = theano.function(inputs=[], outputs=cost, updates=[(x, x_update)])
So, we have created a function to train, but we have not actually called it yet. Notice that x is not an input; it is the thing that we update. In later examples the inputs will be the data and labels, so the inputs param takes in the data and labels, while the updates param takes in your model parameters with their updates.
Now we have to write our own loop to call the training function.
for i in range(25):
    cost_val = train()
    print(cost_val)
# Print the optimal value of x
print(x.get_value())
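Because the update rule works out to x <- x - 0.3*(2x + 1) = 0.4x - 0.3, x shrinks toward -0.5 on every step, and after 25 iterations the printed value is -0.5 to many decimal places, matching the minimum we computed by hand.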
We are now going to build a neural network with theano, using the basics that we learned in part 1. We can start with our imports.
import numpy as np
import theano
import theano.tensor as T
import matplotlib.pyplot as plt # Pulling in so that we can plot the cost over time
import seaborn as sns
from util import get_normalized_data, y2indicator # Util to get data and create ind matrix
# Seaborn Plot Styling
sns.set(style="white", palette="husl")
sns.set_context("poster")
sns.set_style("ticks")
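The util module itself is not shown here; get_normalized_data loads the flattened, normalized image data, and y2indicator turns the integer labels into a one-hot indicator matrix. A minimal sketch of what such an indicator helper might look like (the actual util code may differ):
def y2indicator(y):
    # Turn integer labels 0..K-1 into an N x K one-hot matrix
    N = len(y)
    K = int(np.max(y)) + 1
    ind = np.zeros((N, K))
    for i in range(N):
        ind[i, int(y[i])] = 1
    return ind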
We can now define an error rate function. Newer versions of Theano ship with a ReLU function, but in case yours does not, we can write one right now. Note that both of these functions rely on boolean True/False values being treated as 1 and 0.
def error_rate(p, t):
    return np.mean(p != t)

def relu(a):
    return a * (a > 0)
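As a quick sanity check, here is what these two helpers do on tiny made-up arrays:
p = np.array([0, 1, 2, 2])
t = np.array([0, 1, 1, 2])
print(error_rate(p, t))             # 0.25, since one of the four predictions is wrong
print(relu(np.array([-2.0, 3.0])))  # negative entries are zeroed out, positives pass through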
We can now write our main function.
def main():
    """------------- Step 1: Get our data and define the usual variables ------------"""
    X, Y = get_normalized_data()

    max_iter = 20
    print_period = 10
    lr = 0.00004
    reg = 0.01

    Xtrain = X[:-1000,]
    Ytrain = Y[:-1000]
    Xtest = X[-1000:,]
    Ytest = Y[-1000:]
    Ytrain_ind = y2indicator(Ytrain)
    Ytest_ind = y2indicator(Ytest)

    N, D = Xtrain.shape
    batch_sz = 500
    n_batches = N // batch_sz

    M = 300  # 300 hidden units
    K = 10   # 10 classes
    W1_init = np.random.randn(D, M) / 28  # 28 = sqrt(D), since D = 784
    b1_init = np.zeros(M)
    W2_init = np.random.randn(M, K) / np.sqrt(M)
    b2_init = np.zeros(K)
"""------------- Step 2: Define theano variables and expressions ---------------"""
thX = T.matrix('X') # Placeholder for X input matrix
thT = T.matrix('T') # Placeholder for the targets
W1 = theano.shared(W1_init, 'W1') # All parameters will be shared variables
b1 = theano.shared(b1_init, 'b1') # Shared variable: first arg is initial value
W2 = theano.shared(W2_init, 'W2') # second arg is name
b2 = theano.shared(b2_init, 'b2')
thZ = relu( thX.dot(W1) + b1) # Create function to solve for Z using relu
thY = T.nnet.softmax( thZ.dot(W2) + b2) # Create function to solve for Y using softmax
# cost is sum of targets times log of predictions plus regularization
cost = ( -(thT * T.log(thY)).sum() + reg*((W1*W1).sum() +
(b1*b1).sum() + (W2*W2).sum() + (b2*b2).sum()))
prediction = T.argmax(thY, axis=1) # Need prediction to calculate error rate
"""------------- Step 3: Create training/update expressions ---------------"""
# We can just include regularization as part of the cost because it is also
# automatically differentiated!
# update_W1 = W1 - lr*(T.grad(cost, W1) + reg*W1)
# update_b1 = b1 - lr*(T.grad(cost, b1) + reg*b1)
# update_W2 = W2 - lr*(T.grad(cost, W2) + reg*W2)
# update_b2 = b2 - lr*(T.grad(cost, b2) + reg*b2)
update_W1 = W1 - lr*T.grad(cost, W1)
update_b1 = b1 - lr*T.grad(cost, b1)
update_W2 = W2 - lr*T.grad(cost, W2)
update_b2 = b2 - lr*T.grad(cost, b2)
# Now we create our train function. Takes in placeholder for X input matrix and
# placeholder for targets matrix
train = theano.function(
inputs=[thX, thT],
updates=[(W1, update_W1), (b1, update_b1), (W2, update_W2), (b2, update_b2)]
)
# Create a function to get prediction because we want to do it over the whole dataset
get_prediction = theano.function(
inputs = [thX, thT],
outputs = [cost, prediction],
)
    # Training loop
    costs = []
    for i in range(max_iter):
        for j in range(n_batches):
            Xbatch = Xtrain[j*batch_sz:(j*batch_sz + batch_sz),]
            Ybatch = Ytrain_ind[j*batch_sz:(j*batch_sz + batch_sz),]

            train(Xbatch, Ybatch)  # Calling the train function we created
            if j % print_period == 0:
                # Calling the prediction function we created to get the cost and predictions
                cost_val, prediction_val = get_prediction(Xtest, Ytest_ind)
                err = error_rate(prediction_val, Ytest)
                print("Cost / err at iteration i=%d, j=%d: %.3f / %.3f" % (i, j, cost_val, err))
                costs.append(cost_val)

    fig, ax = plt.subplots(figsize=(12, 8))
    plt.plot(costs)
    plt.show()
if __name__ == '__main__':
    main()