Nathaniel Dake Blog

7. TensorFlow

We are now going to go through a quick TensorFlow example. You may be wondering: "What is TensorFlow?" So here is a quick synopsis from the https://www.tensorflow.org/ website:

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

So TensorFlow is one of the main APIs used to build neural nets and other machine learning algorithms. It is state of the art as of 2018, so let's get started.

We can begin with our imports.

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

Now let's create our random training data. This is the same process that we used earlier in our training and prediction walkthroughs. We have three Gaussian clouds, 500 points per class, centered at (0,-2), (2,2), and (-2,2). We will make a scatter plot of the data in case you forgot what it looked like, and we will also create an indicator matrix (i.e. a one-hot encoding) for the targets.

In [3]:
# create random training data again
Nclass = 500
D = 2 # dimensionality of input
M = 3 # hidden layer size
K = 3 # number of classes

X1 = np.random.randn(Nclass, D) + np.array([0, -2])
X2 = np.random.randn(Nclass, D) + np.array([2, 2])
X3 = np.random.randn(Nclass, D) + np.array([-2, 2])
X = np.vstack([X1, X2, X3]).astype(np.float32)

Y = np.array([0]*Nclass + [1]*Nclass + [2]*Nclass)

# let's see what it looks like
fig = plt.figure(figsize=(14,10))
plt.scatter(X[:,0], X[:,1], c=Y, s=100, alpha=0.5)
plt.show()

N = len(Y)
# turn Y into an indicator matrix for training
T = np.zeros((N, K))
for i in range(N):
    T[i, Y[i]] = 1
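
As a side note, the indicator matrix can also be built without a Python loop. Here is a minimal sketch using NumPy integer indexing; the names T_loop and T_vec are made up for this comparison and are not used elsewhere in the post.

```python
import numpy as np

Nclass = 500   # same per-class count as above
K = 3          # number of classes

Y = np.array([0] * Nclass + [1] * Nclass + [2] * Nclass)
N = len(Y)

# loop version, as in the cell above
T_loop = np.zeros((N, K))
for i in range(N):
    T_loop[i, Y[i]] = 1

# vectorized version: row i gets a 1 in column Y[i]
T_vec = np.zeros((N, K))
T_vec[np.arange(N), Y] = 1

print(np.array_equal(T_loop, T_vec))
```

Both versions give the same matrix; the vectorized one just avoids N separate assignments.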

TensorFlow works with its own kinds of variables, so we need a function for initializing the weights. This will return a tf.Variable. It is going to be initialized the same way as in numpy, because the TensorFlow functions we use here have numpy analogs. So it is going to be a random normal with the given shape and a standard deviation of 0.01.

In [9]:
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

We will also want to define the forward direction, just as we did before with numpy. It is going to take in X, W1, b1, W2, b2 just as before. It will then use tensorflow functions: a tensorflow sigmoid and a tensorflow matrix multiplication. We then return just the activation (not the softmax). This is one main difference between numpy and tensorflow: when we calculate the cost, we will want the logits, aka the activations, as the input, not the output of the softmax.

In [5]:
def forward(X, W1, b1, W2, b2):
    Z = tf.nn.sigmoid(tf.matmul(X, W1) + b1)
    return tf.matmul(Z, W2) + b2
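
To make the shapes concrete, here is a rough NumPy analog of the same forward pass. This is only a sketch: forward_np and sigmoid are hypothetical helper names, and the small N is chosen just for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_np(X, W1, b1, W2, b2):
    Z = sigmoid(X.dot(W1) + b1)   # hidden layer, shape (N, M); b1 is broadcast over rows
    return Z.dot(W2) + b2         # logits, shape (N, K); no softmax applied

rng = np.random.default_rng(0)
N, D, M, K = 5, 2, 3, 3           # toy sizes, N is arbitrary
X = rng.standard_normal((N, D))
W1 = rng.standard_normal((D, M)) * 0.01   # random normal, std 0.01
b1 = rng.standard_normal(M) * 0.01
W2 = rng.standard_normal((M, K)) * 0.01
b2 = rng.standard_normal(K) * 0.01

logits = forward_np(X, W1, b1, W2, b2)
print(logits.shape)
```

The output has one row per sample and one column per class, and the rows are raw scores rather than probabilities.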

Next up, we will create tensorflow placeholders. These represent your X and Y data. What happens in tensorflow is that it creates a graph, so it knows how to calculate everything, but nothing has a value yet. So the tfX below is just a placeholder for the data. We will say it is of type float32, and the shape of it is None by D. This way the second dimension will be D because that is the dimensionality of the data, but we can pass in any size N. We will do the same thing with tfY, a placeholder for the targets, which has shape None by K.

In [7]:
tfX = tf.placeholder(tf.float32, [None, D])
tfY = tf.placeholder(tf.float32, [None, K])

Now we will create our weights, similar to how we did in numpy.

In [10]:
W1 = init_weights([D, M])         # shape (D, M)
b1 = init_weights([M])            # shape (M,)
W2 = init_weights([M, K])         # shape (M, K)
b2 = init_weights([K])            # shape (K,)

Now we can get the output variable. Remember, this has no value yet.

In [11]:
py_x = forward(tfX, W1, b1, W2, b2)

Note: Here is another thing that is different about tensorflow. We define the cost function as the mean of the function softmax_cross_entropy_with_logits. This takes in our logits, py_x, and the targets, tfY.

In [13]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    labels=tfY,
    logits=py_x
    )
)
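
To see what this cost computes, here is a small NumPy sketch of a softmax cross-entropy taken directly from logits and averaged over the samples. The function name softmax_cross_entropy_np and the toy inputs are made up for illustration; this is not TensorFlow's actual implementation, just the same formula.

```python
import numpy as np

def softmax_cross_entropy_np(logits, targets):
    # subtract the row-wise max for numerical stability (the result is unchanged)
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log of the softmax, computed without forming the softmax itself
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # cross-entropy per sample, then the mean over samples
    per_sample = -(targets * log_softmax).sum(axis=1)
    return per_sample.mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

cost = softmax_cross_entropy_np(logits, targets)
print(cost)
```

Working from the logits lets the log and the exponential be combined in a numerically stable way, which is one reason to pass logits rather than softmax outputs into the cost.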

TensorFlow is going to calculate the gradients and do gradient descent automatically, so we don't have to specify the derivatives ourselves. We are going to create what is called a train op, which comes from tf.train. We pass in a learning rate (0.05 here) and tell the optimizer to minimize the cost.

In [14]:
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)
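
Conceptually, each run of a gradient descent step moves every weight a small amount against its gradient. Here is a minimal sketch of that update rule on a toy one-dimensional cost, where we write the gradient by hand (in tensorflow it is derived for us). The cost function and the starting point are assumptions chosen only for illustration.

```python
# toy cost: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
def grad(w):
    return 2.0 * (w - 3.0)

learning_rate = 0.05  # same value passed to the optimizer above
w = 0.0               # arbitrary starting point

for step in range(1000):
    w = w - learning_rate * grad(w)   # the gradient descent update

print(w)
```

After enough steps w settles at the minimum of the toy cost, w = 3.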

Next we will create a predict op that is just the argmax of py_x along axis 1.

In [15]:
predict_op = tf.argmax(py_x, 1)
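
Taking the argmax of the logits directly is fine, because softmax preserves the ordering within each row, so the largest logit is also the largest probability. A quick NumPy check on some made-up logits:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# toy logits, one row per sample
logits = np.array([[ 1.5, -0.3,  0.2],
                   [-2.0,  0.1,  3.0],
                   [ 0.0,  0.7,  0.4]])

pred_from_logits = np.argmax(logits, axis=1)
pred_from_probs = np.argmax(softmax(logits), axis=1)

print(pred_from_logits, pred_from_probs)
```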

In tensorflow we need to create things called sessions. For now just treat this as something that needs to be done.

In [16]:
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

Now we will run our neural network by performing gradient descent. We will call sess.run() twice per iteration: the first time to perform a backpropagation training step, and the second time to make our predictions. Each call takes in what's called a feed_dict. This is a dictionary where the keys are tensorflow placeholders, and the values are the actual values you want to pass in (our X inputs and the T targets).

In [18]:
for i in range(1000):
    sess.run(train_op, feed_dict={tfX: X, tfY: T})            # one gradient descent step
    pred = sess.run(predict_op, feed_dict={tfX: X, tfY: T})   # returns our predictions
    if i % 100 == 0:
        print("Accuracy:", np.mean(Y == pred))
Accuracy: 0.9626666666666667
Accuracy: 0.962
Accuracy: 0.9613333333333334
Accuracy: 0.9606666666666667
Accuracy: 0.962
Accuracy: 0.962
Accuracy: 0.9613333333333334
Accuracy: 0.9613333333333334
Accuracy: 0.962
Accuracy: 0.962

Scikit Learn Artificial Neural Network

In the real world we won't be implementing our neural networks from scratch; it simply doesn't make sense to do that when we have awesome libraries already there for us to utilize. Scikit Learn has a great implementation of this! This can be done in only 3 lines of code: a line to create the model, a line to train the model, and a line to make predictions.

In [21]:
import pandas as pd
from sklearn.neural_network import MLPClassifier

def get_data():
    df = pd.read_csv('data/ecommerce_data.csv')
    data = df.to_numpy(copy=True)  # DataFrame.as_matrix() was removed in pandas 1.0
    np.random.shuffle(data)
    X = data[:,:-1]
    Y = data[:,-1].astype(np.int32)

    # one-hot encode the categorical data
    N, D = X.shape
    X2 = np.zeros((N, D+3))
    X2[:,0:(D-1)] = X[:,0:(D-1)] # non-categorical

    # one-hot
    for n in range(N):
        t = int(X[n,D-1])
        X2[n,t+D-1] = 1
    X = X2

    # split train and test
    Xtrain = X[:-100]
    Ytrain = Y[:-100]
    Xtest = X[-100:]
    Ytest = Y[-100:]

    # normalize columns 1 and 2
    for i in (1, 2):
        m = Xtrain[:,i].mean()
        s = Xtrain[:,i].std()
        Xtrain[:,i] = (Xtrain[:,i] - m) / s
        Xtest[:,i] = (Xtest[:,i] - m) / s

    return Xtrain, Ytrain, Xtest, Ytest

Xtrain, Ytrain, Xtest, Ytest = get_data()
# create the neural network
model = MLPClassifier(hidden_layer_sizes=(20, 20), max_iter=2000)

# train the neural network
model.fit(Xtrain, Ytrain)

# print the train and test accuracy
train_accuracy = model.score(Xtrain, Ytrain)
test_accuracy = model.score(Xtest, Ytest)
print("train accuracy:", train_accuracy, "test accuracy:", test_accuracy)
train accuracy: 0.9975 test accuracy: 0.93
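
One detail in get_data worth pointing out is that the mean and standard deviation used for normalization come from the training rows only, and the same values are then applied to the test rows. Here is a small sketch of that idea on a single made-up numeric column (the toy data and split sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=4.0, size=(200, 1))   # toy numeric column

# hold out the last 50 rows as the test set
Xtrain, Xtest = X[:-50].copy(), X[-50:].copy()

# statistics come from the training rows only
m = Xtrain[:, 0].mean()
s = Xtrain[:, 0].std()

# the same m and s are applied to both splits
Xtrain[:, 0] = (Xtrain[:, 0] - m) / s
Xtest[:, 0] = (Xtest[:, 0] - m) / s

print(Xtrain[:, 0].mean(), Xtrain[:, 0].std())
```

The training column ends up with mean 0 and standard deviation 1. The test column will be close to that but not exact, which is expected since its statistics were never used.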

© 2018 Nathaniel Dake