Font Classification
with 5 Deep Learning Models Using TensorFlow
Alina Li Zhang
March 2019
TensorFlow User Group Toronto - Women in AI
What you can get from this presentation:
Build decent Deep Learning models
with a few lines of code in
TensorFlow.
SansSerif vs. Serif
Data Engineering
convert images to 36*36-pixel grayscale
- grayscale
- rgb
- rgba
add labels to dataset
- SansSerif 0
- Serif 1
split the dataset into two sets (train and validation) -> permute (see the sketch below)
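A minimal data-preparation sketch along these lines, assuming PIL/numpy and an illustrative folder layout (not the talk's exact code; the real pipeline is in the repo linked near the end):

import glob
import numpy as np
from PIL import Image

def load_images(pattern, label):
    # Convert each image to 36x36 grayscale, flatten to 1296 values in [0, 1]
    data, labels = [], []
    for path in glob.glob(pattern):
        img = Image.open(path).convert("L").resize((36, 36))
        data.append(np.asarray(img, dtype=np.float32).reshape(-1) / 255.0)
        labels.append(label)
    return np.array(data), np.array(labels)

# SansSerif -> 0, Serif -> 1, then one-hot encode for the softmax models
sans_x, sans_y = load_images("fonts/sans_serif/*.png", 0)
serif_x, serif_y = load_images("fonts/serif/*.png", 1)
X = np.concatenate([sans_x, serif_x])
Y = np.eye(2)[np.concatenate([sans_y, serif_y])]

# Permute, then split into train and validation sets
idx = np.random.permutation(len(X))
split = int(0.8 * len(X))
train_dataset, train_labels = X[idx[:split]], Y[idx[:split]]
valid_dataset, valid_labels = X[idx[split:]], Y[idx[split:]]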
5 Models
● Logistic regression
● Single hidden layer model
● Multiple hidden layer model
● Deep CNN with convolutional and pooling layer
● Deeper CNN with 2 conv and pooling layers
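The code on the following slides uses tf, np, math, and tqdm without showing imports or data loading; a minimal shared setup (an assumed setup, not shown in the deck; the exact one is in the repo linked near the end) would be:

import math
import numpy as np
import tensorflow as tf   # written against TensorFlow 1.x APIs
from tqdm import tqdm

# train_dataset, train_labels, valid_dataset, valid_labels come from the
# data engineering step: flattened 36x36 grayscale images and one-hot labels.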
Logistic regression model - build model
sess = tf.InteractiveSession()
# These will be inputs
## Input pixels, flattened
x = tf.placeholder("float", [None, 1296])
## Known labels
y_ = tf.placeholder("float", [None,2])
# Variables
W = tf.Variable(tf.zeros([1296,2]))
b = tf.Variable(tf.zeros([2]))
# Just initialize
sess.run(tf.global_variables_initializer())
# Define model
y = tf.nn.softmax(tf.matmul(x,W) + b)
### End model specification, begin training code
Logistic regression model - training
# Climb on cross-entropy
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits_v2(
logits = y + 1e-50, labels = y_))
# How we train
train_step = tf.train.GradientDescentOptimizer(
0.02).minimize(cross_entropy)
…
# Actually train
epochs = 3000
train_acc = np.zeros(epochs//10)
test_acc = np.zeros(epochs//10)
for i in tqdm(range(epochs)):
    ...
    train_step.run(feed_dict={
        x: train_dataset,
        y_: train_labels})
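Later slides call accuracy.eval(...) without defining it; a standard definition consistent with this model (an assumption, not shown in the deck) is:

# Fraction of examples whose predicted class matches the label
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))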
Logistic regression model - computed weights
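This slide showed the trained weights rendered as images; a sketch of how such a visualization could be produced (assuming matplotlib, and reusing W and sess from the code above):

import matplotlib.pyplot as plt

# One weight per pixel per class: reshape each column back to 36x36 to view it as an image
weights = sess.run(W)                  # shape (1296, 2)
fig, axes = plt.subplots(1, 2)
for cls, ax in enumerate(axes):
    ax.imshow(weights[:, cls].reshape(36, 36), cmap="gray")
    ax.set_title(["SansSerif", "Serif"][cls])
plt.show()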
Single hidden layer model
# Hidden layer
num_hidden = 128
W1 = tf.Variable(tf.truncated_normal([1296, num_hidden],
stddev=1./math.sqrt(1296)))
b1 = tf.Variable(tf.constant(0.1,shape=[num_hidden]))
h1 = tf.sigmoid(tf.matmul(x,W1) + b1)
# Output Layer
W2 = tf.Variable(tf.truncated_normal([num_hidden, 2],
stddev=1./math.sqrt(2)))
b2 = tf.Variable(tf.constant(0.1,shape=[2]))
# Just initialize
sess.run(tf.global_variables_initializer())
# Define model
y = tf.nn.softmax(tf.matmul(h1,W2) + b2)
### End model specification, begin training code
# Actually train
epochs = 20000
train_acc = np.zeros(epochs//10)
test_acc = np.zeros(epochs//10)
for i in tqdm(range(epochs), ascii=True):
    if i % 10 == 0:
        # Check accuracy on train set
        A = accuracy.eval(feed_dict={
            x: train_dataset,
            y_: train_labels})
        train_acc[i//10] = A
        # And now the validation set
        A = accuracy.eval(feed_dict={
            x: valid_dataset,
            y_: valid_labels})
        test_acc[i//10] = A
    train_step.run(feed_dict={
        x: train_dataset,
        y_: train_labels})
Single hidden layer model
The multiple hidden layer model
# Hidden layer 1
num_hidden1 = 256
W1 = tf.Variable(tf.truncated_normal([1296,num_hidden1],
stddev=1./math.sqrt(1296)))
b1 = tf.Variable(tf.constant(0.1,shape=[num_hidden1]))
h1 = tf.sigmoid(tf.matmul(x,W1) + b1)
# Hidden Layer 2
num_hidden2 = 64
W2 = tf.Variable(tf.truncated_normal([num_hidden1,
num_hidden2],stddev=2./math.sqrt(num_hidden1)))
b2 = tf.Variable(tf.constant(0.2,shape=[num_hidden2]))
h2 = tf.sigmoid(tf.matmul(h1,W2) + b2)
# Output Layer
W3 = tf.Variable(tf.truncated_normal([num_hidden2, 2],
stddev=1./math.sqrt(2)))
b3 = tf.Variable(tf.constant(0.1,shape=[2]))
# Just initialize
sess.run(tf.global_variables_initializer())
# Define model
y = tf.nn.softmax(tf.matmul(h2,W3) + b3)
### End model specification, begin training code
The multiple hidden layer model
Deep CNN with convolutional and pooling layer
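The convolution below reads from x_im, which never appears on the slides; presumably it is the flat 1296-pixel input reshaped back into a 36x36, single-channel image, e.g.:

# Reshape flattened pixels to [batch, height, width, channels] for conv2d
x_im = tf.reshape(x, [-1, 36, 36, 1])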
# Conv layer 1
num_filters = 4
winx = 5
winy = 5
W1 = tf.Variable(tf.truncated_normal(
[winx, winy, 1 , num_filters],
stddev=1./math.sqrt(winx*winy)))
b1 = tf.Variable(tf.constant(0.1,
shape=[num_filters]))
# 5x5 convolution, pad with zeros on edges
xw = tf.nn.conv2d(x_im, W1,
strides=[1, 1, 1, 1],
padding='SAME')
h1 = tf.nn.relu(xw + b1)
# 2x2 Max pooling, no padding on edges
p1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='VALID')
# Need to flatten convolutional output for use in dense layer
p1_size = np.product(
[s.value for s in p1.get_shape()[1:]])
p1f = tf.reshape(p1, [-1, p1_size ])
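# With 36x36 inputs: SAME conv keeps 36x36, the 2x2 VALID pool gives 18x18, so p1_size = 18*18*4 = 1296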
# Dense layer
num_hidden = 32
W2 = tf.Variable(tf.truncated_normal(
[p1_size, num_hidden],
stddev=2./math.sqrt(p1_size)))
b2 = tf.Variable(tf.constant(0.2,
shape=[num_hidden]))
h2 = tf.nn.relu(tf.matmul(p1f,W2) + b2)
# Output Layer
W3 = tf.Variable(tf.truncated_normal(
[num_hidden, 2],
stddev=1./math.sqrt(num_hidden)))
b3 = tf.Variable(tf.constant(0.1,shape=[2]))
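The slide stops at the output-layer variables; presumably the model is completed the same way as the earlier ones, i.e. something like:

# Initialize and define the model (same pattern as the previous slides)
sess.run(tf.global_variables_initializer())
y = tf.nn.softmax(tf.matmul(h2, W3) + b3)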
Deep CNN with convolutional and pooling layer
Deeper CNN with 2 conv and pooling layers
# Conv layer 1
num_filters1 = 16
winx1 = 5
winy1 = 5
W1 = tf.Variable(tf.truncated_normal(
[winx1, winy1, 1 , num_filters1],
stddev=1./math.sqrt(winx1*winy1)))
b1 = tf.Variable(tf.constant(0.1,
shape=[num_filters1]))
# 5x5 convolution, pad with zeros on edges
xw = tf.nn.conv2d(x_im, W1,
strides=[1, 1, 1, 1],
padding='SAME')
h1 = tf.nn.relu(xw + b1)
# 2x2 Max pooling, no padding on edges
p1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='VALID')
# Conv layer 2
num_filters2 = 4
winx2 = 3
winy2 = 3
W2 = tf.Variable(tf.truncated_normal(
[winx2, winy2, num_filters1, num_filters2],
stddev=1./math.sqrt(winx2*winy2)))
b2 = tf.Variable(tf.constant(0.1,
shape=[num_filters2]))
# 3x3 convolution, pad with zeros on edges
p1w2 = tf.nn.conv2d(p1, W2,
strides=[1, 1, 1, 1], padding='SAME')
h2 = tf.nn.relu(p1w2 + b2)   # new name so the first conv's h1 is not shadowed
# 2x2 Max pooling, no padding on edges
p2 = tf.nn.max_pool(h2, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='VALID')
Deeper CNN with 2 conv and pooling layers
# Need to flatten convolutional output
p2_size = np.product(
[s.value for s in p2.get_shape()[1:]])
p2f = tf.reshape(p2, [-1, p2_size ])
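# With 36x36 inputs: 36x36 -> 18x18 after pool 1 -> 9x9 after pool 2, so p2_size = 9*9*4 = 324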
# Dense layer
num_hidden = 32
W3 = tf.Variable(tf.truncated_normal(
[p2_size, num_hidden],
stddev=2./math.sqrt(p2_size)))
b3 = tf.Variable(tf.constant(0.2,
shape=[num_hidden]))
h3 = tf.nn.relu(tf.matmul(p2f,W3) + b3)
# Output Layer
W4 = tf.Variable(tf.truncated_normal(
[num_hidden, 2],
stddev=1./math.sqrt(num_hidden)))
b4 = tf.Variable(tf.constant(0.1,shape=[2]))
# Just initialize
sess.run(tf.global_variables_initializer())
# Define model
y = tf.nn.softmax(tf.matmul(h3,W4) + b4)
Deeper CNN with 2 conv and pooling layers
Why is accuracy decreasing?
source code: https://github.com/alinazhanguwo/fontClassification
Future Work
Expanding the source dataset (see the sketch below):
- introduce random noise
- flip images
- rotate images
- etc.
MORE DATA > FINE-TUNED ALGORITHM
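A minimal augmentation sketch along these lines, assuming the flattened 36x36 grayscale arrays from the data engineering step (numpy/scipy only, not the talk's code):

import numpy as np
from scipy.ndimage import rotate

def augment(flat_images, noise_std=0.05, max_angle=10):
    # Return noisy, horizontally flipped, and slightly rotated copies of 36x36 images
    imgs = flat_images.reshape(-1, 36, 36)
    noisy = np.clip(imgs + np.random.normal(0, noise_std, imgs.shape), 0.0, 1.0)
    flipped = imgs[:, :, ::-1]
    rotated = np.stack([
        rotate(im, np.random.uniform(-max_angle, max_angle),
               reshape=False, mode="nearest")
        for im in imgs])
    return np.concatenate([noisy, flipped, rotated]).reshape(-1, 1296)

# Labels are simply repeated for each augmented copy
aug_dataset = augment(train_dataset)
aug_labels = np.tile(train_labels, (3, 1))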
Summary - Model Evolution
Q & A

Editor's Notes

  • #8 Assign a weight to each pixel in the image, then take the weighted sum of those pixels (beta for the weights and X for the pixels). This gives us a score for that image being a particular font. Every font has its own set of weights, as fonts value pixels differently. To convert these scores into proper probabilities (represented by Y), we use the softmax function, which forces every output to lie between 0 and 1 and all outputs to sum to 1.
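    In the slides' notation (x for the flattened pixels, W for the weights, b for the bias), the model is simply $y = \mathrm{softmax}(xW + b)$, where $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$.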
  • #9 Optimizing our model really means minimizing how wrong we are. With our labels in one-hot style, it is easy to compare them with the class probabilities predicted by the model. The categorical cross-entropy function is a formal way to measure this. While the exact statistics are beyond the scope of this talk, you can think of it as penalizing the model more for less accurate predictions. To compute it, we multiply our one-hot true labels element-wise with the natural log of the predicted probabilities, then sum these values and negate them.
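    Written out for a single example, with one-hot labels $t$ (y_ in the code) and predicted probabilities $p$ (y in the code): $H(t, p) = -\sum_i t_i \log p_i$; tf.reduce_mean then averages this over the batch.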
  • #10 After some small steps with basic computations, we successfully build a decent model with just logistic regression and a few lines of TensorFlow code.
  • #11 First, let's specify how many neurons we want with num_hidden = 128; this is essentially how many nonlinear combinations will get passed to the logistic regression at the end. To accommodate this, we also need to update the shapes of the W1 and b1 weight tensors. They are now feeding into our hidden neurons, so they need to match that shape: W1 = tf.Variable(tf.truncated_normal([1296, num_hidden], stddev=1./math.sqrt(1296))) b1 = tf.Variable(tf.constant(0.1,shape=[num_hidden])) The activation of the weighted sum is computed with the single h1 line, which multiplies the input pixels by their respective weights for each neuron, adds the neuron bias term, and puts the result through the sigmoid activation function; at this point, we have 128 intermediate values: h1 = tf.sigmoid(tf.matmul(x,W1) + b1) Now it's just your friendly logistic regression again; you already know what to do. These newly computed 128 features need their own set of weights and biases to compute a score on the output classes; that's W2 and b2, respectively. Note how the shape matches the 128 hidden neurons and the 2 output classes: W2 = tf.Variable(tf.truncated_normal([num_hidden, 2], stddev=1./math.sqrt(2))) b2 = tf.Variable(tf.constant(0.1,shape=[2])) sess.run(tf.global_variables_initializer()) We initialize all these weights with this strange truncated_normal call. With neural networks, we want a good spread of initial values so our weights can climb to meaningful values rather than just getting zeroed out. truncated_normal draws random values from a normal distribution with the given standard deviation, scaled to the number of inputs as is standard practice, but throws out values that are too extreme, hence the "truncated" part. With our weights and neurons all defined, we set up the final softmax model just as before, except we take care to use our 128 neurons, h1, as the input, along with the associated weights and biases, W2 and b2: y = tf.nn.softmax(tf.matmul(h1,W2) + b2)