Overview

Today’s goal: to fit Convolutional Neural Network (CNN) models to image data using Keras

Resources

1 Keras

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result as fast as possible is key to doing good research.

Keras is simple (but not simplistic), flexible (simple workflows should be quick and easy, but you can also write arbitrarily advanced workflows in a clear way), and powerful, providing both performance and scalability. For more information, see here.

Exercise 16.1a

Install Keras on your PC: see Appendix F - Installing Keras on how to do this.

2 Loading data

In this tutorial, you will continue with the images that you prepared in tutorial 14.

Exercise 16.2a
Load the tidyverse, imager and abind packages. Then, load the data you saved at the end of tutorial 14: the .rds files train0, train1, val0, val1 and testgrid. If you want, you can also download the .rds files from Brightspace (Image analyses > Data > Images 1 - prepared), which we saved when rendering these pages. To load an .rds file, you can use the function readRDS, see tutorial 1.

# Load libraries
library(tidyverse)
library(imager)
library(abind)

# Load data from tutorial 14
train0 <- readRDS("data/processed/images1/train0.rds")
train1 <- readRDS("data/processed/images1/train1.rds")
val0 <- readRDS("data/processed/images1/val0.rds")
val1 <- readRDS("data/processed/images1/val1.rds")
testgrid <- readRDS("data/processed/images1/testgrid.rds")

3 Preparing data

The imager package is convenient for processing your image data (as we saw in tutorial 14), but Keras expects our data to look a bit different from the cimg objects. So let’s convert our data now to make it suitable for training, validating and testing CNNs with Keras.

Keras expects one array for all your training input data, one array for all your validation input data and one array for all your test input data. In these arrays the first dimension represents the different data records (so images), the second dimension represents the x-coordinate of the pixels, the third dimension the y-coordinate of the pixels and the final dimension the colour channels.

What should the dimensions be for our training, validation and test data?

We have 200 training images (the 50 selected “bush” and 50 selected “other” images, plus all of them augmented by mirroring along the vertical axis), 100 validation images and 5047 (which can differ a bit in your case) test images. All images are 50 pixels wide by 50 pixels high. Finally all images consist of 3 colour channels (red, green and blue).

This means that our training input data array should become 200 by 50 by 50 by 3, our validation data 100 by 50 by 50 by 3 and our test data 5047 by 50 by 50 by 3.
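Before converting anything, it doesn’t hurt to check the loaded objects against these expected numbers. A minimal sanity check, assuming the objects are still the lists of 50x50 cimg objects from tutorial 14:

# Quick sanity check of the loaded objects against the expected numbers
length(train1) + length(train0)  # should be 200 training images in total
length(val1) + length(val0)      # should be 100 validation images in total
length(testgrid)                 # number of test grid cells (5047 for us)
dim(train1[[1]])                 # a cimg has dimensions x, y, depth (z) and colour (cc)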

Exercise 16.3a
Let’s create the input training, validation and test arrays. For this we need the abind package to bind arrays together. Start by taking a look at the help file of the abind function of this package. Now combine the train1 and train0 objects (in that order!) into one array along the third dimension and call it trainx. Do the same for the validation set, and convert the test set into an array as well (call the outputs valx and testx, respectively). You can do this with only one line of code per dataset if you use the do.call function again, which in this case should abind all the elements of the lists together. Check afterwards whether the conversion went all right.

# create training, validation and test arrays
trainx <- do.call(abind, list(c(train1, train0), along = 3))
valx   <- do.call(abind, list(c(val1, val0), along = 3))
testx  <- do.call(abind, list(testgrid, along = 3))

# check dimensions
dim(trainx)
## [1]  50  50 200   3
dim(valx)
## [1]  50  50 100   3
dim(testx)
## [1]   50   50 5047    3
Exercise 16.3b

Something is still not right about our input data: Keras wants the first dimension to be the different data records (i.e. images), yet we currently have these mapped to the third dimension of the array. However, we can fix this easily with a base R function:

# change order array elements
trainx <- aperm(trainx, c(3,1,2,4))
dim(trainx)
## [1] 200  50  50   3
valx <- aperm(valx, c(3,1,2,4))
dim(valx)
## [1] 100  50  50   3
testx <- aperm(testx, c(3,1,2,4))
dim(testx)
## [1] 5047   50   50    3

For the labels (or output data), Keras expects one matrix for the training data, one matrix for the validation data and one matrix for the test data. The first dimension of this matrix (the rows) again represents the different data records (so images) and the second dimension (columns) represents the different labels. Because we want to perform binary classification, we need to create two labels and thus two columns.

Exercise 16.3c
Create these matrices now for the training and validation set (for the test set we did not create labels) and call them trainy and valy. Remember that the first half of the records in the trainx and valx arrays are the images of bush and the second half the images of no-bush. Make sure that your second column represents the bush class and that the matrix contains only ones (for TRUE) and zeroes (for FALSE). Check if the output is correct.

Below, we create the 2-column matrices with \(n\) rows, where the first \(\frac{1}{2}n\) rows get the value 0 for the first column and the value 1 for the second column, and the last \(\frac{1}{2}n\) rows get the value 1 for the first column and the value 0 for the second column. We do this with the matrix function, specifying that there are 2 columns (ncol = 2) and supplying a data vector of length \(2n\): first \(\frac{1}{2}n\) times the value 0, then \(n\) times the value 1, and again \(\frac{1}{2}n\) times the value 0. This vector fills the \(n \times 2\) matrix by column:

# Creating the matrices with labels
trainy <- matrix(c(rep(0, 100),
                   rep(1, 200),
                   rep(0, 100)), 
                 ncol = 2)
valy <- matrix(c(rep(0, 50),
                 rep(1, 100),
                 rep(0, 50)), 
               ncol = 2)

# Checking the matrices with labels
dim(trainy)
## [1] 200   2
head(trainy)
##      [,1] [,2]
## [1,]    0    1
## [2,]    0    1
## [3,]    0    1
## [4,]    0    1
## [5,]    0    1
## [6,]    0    1
tail(trainy)
##        [,1] [,2]
## [195,]    1    0
## [196,]    1    0
## [197,]    1    0
## [198,]    1    0
## [199,]    1    0
## [200,]    1    0
dim(valy)
## [1] 100   2
head(valy)
##      [,1] [,2]
## [1,]    0    1
## [2,]    0    1
## [3,]    0    1
## [4,]    0    1
## [5,]    0    1
## [6,]    0    1
tail(valy)
##        [,1] [,2]
##  [95,]    1    0
##  [96,]    1    0
##  [97,]    1    0
##  [98,]    1    0
##  [99,]    1    0
## [100,]    1    0
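As an aside, once the keras package is loaded (we do that in the next section), the same one-hot label matrices can also be built with its to_categorical helper. A minimal sketch, assuming bush is coded as 1 so that it ends up in the second column:

# Alternative sketch: one-hot label matrices via to_categorical (keras package)
trainy_alt <- to_categorical(c(rep(1, 100), rep(0, 100)))
valy_alt   <- to_categorical(c(rep(1, 50), rep(0, 50)))
all(trainy_alt == trainy)  # should be TRUE if the coding matches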

4 Training the CNN

We have now prepared the data such that we can finally start training a CNN!

Exercise 16.4a

Now we’re going to set up the code for defining, compiling and fitting the model. If you haven’t done so before, read through these two tutorials: Guide to Keras Basics and Guide to the Sequential Model.

Load the keras package, and define a CNN with one convolutional layer with 50 filters of size 3x3 and ReLU activation, followed by a flattening layer and the final dense layer to predict bush and no-bush with a Softmax activation. Use the stochastic gradient descent optimizer with default settings, binary cross-entropy for the loss function and accuracy for the metric. Run this for 100 epochs with a batch size of 10 and validate the outcome for every epoch on the validation set. Make sure to store this run of 100 epochs into an object.

# Load keras
library(keras)

# specify model
model <- keras_model_sequential() %>% 
  layer_conv_2d(filters = 50,
                kernel_size = c(3,3),
                activation = 'relu', 
                input_shape = c(50,50,3)) %>% 
  layer_flatten() %>% 
  layer_dense(units = 2, activation = 'softmax')

# Compile model
model <- model %>% 
  compile(optimizer = optimizer_sgd(),
          loss = 'binary_crossentropy',
          metrics = list('accuracy'))

# Fit model
history <- model %>% 
  fit(trainx, 
      trainy, 
      batch_size = 10, 
      epochs = 100,
      validation_data = list(valx, valy))

While R is busy fitting your model, the performance on the training and validation set is plotted for each epoch completed so far. When it is done with all 100 epochs, print the output to the R console; it will display some statistics:

history
## 
## Final epoch (plot to see history):
##         loss: 0.06438
##     accuracy: 0.98
##     val_loss: 0.05965
## val_accuracy: 0.97

The object itself is a list, and the second element of this list contains the accuracy and loss of both the training and validation set over the subsequent epochs. With the plot function these four vectors will be visualized as smooth lines in two charts (one for the loss and one for the accuracy, each with a line for the training and the validation set). Try this now.

plot(history)
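If you prefer to inspect the numbers behind this plot directly, the per-epoch metrics can be extracted from the history object. A minimal sketch, assuming they are stored under history$metrics (as in recent versions of the keras package):

# Access the per-epoch metrics behind the plot directly
str(history$metrics)                     # loss, accuracy, val_loss, val_accuracy
max(history$metrics$val_accuracy)        # best validation accuracy reached
which.max(history$metrics$val_accuracy)  # epoch in which it was reached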

Well, that doesn’t look too bad for a first try with only a small model. Eventually the accuracy on our validation set reaches more than 90%, but this took quite a number of epochs and the performance doesn’t look very stable. Given that everyone has different training and validation data, your results may of course differ. It may even be that with this model architecture your model is not able to make good predictions at all on the validation (or even training) set. Don’t worry if that’s the case for you. But hang on, is it really a small model?

summary(model)
## Model: "sequential_3"
## ____________________________________________________________________________________________________________________________________________
##  Layer (type)                                                  Output Shape                                            Param #              
## ============================================================================================================================================
##  conv2d_4 (Conv2D)                                             (None, 48, 48, 50)                                      1400                 
##  flatten_3 (Flatten)                                           (None, 115200)                                          0                    
##  dense_3 (Dense)                                               (None, 2)                                               230402               
## ============================================================================================================================================
## Total params: 231802 (905.48 KB)
## Trainable params: 231802 (905.48 KB)
## Non-trainable params: 0 (0.00 Byte)
## ____________________________________________________________________________________________________________________________________________

Our small model still has more than 230 thousand parameters that were trained… This is of course way more than what is common for general frequentist models, but for a CNN it is indeed still small. However, we see that most of the parameters sit in the final dense layer. If we reduce the size of the feature vector that is used as the input of that final layer, we can substantially reduce the number of parameters.
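These counts follow directly from the layer shapes in the summary: each 3x3 filter spans 3 colour channels and has 1 bias, and the dense layer has one weight per combination of input and output plus one bias per output:

# Where the parameter counts in the summary come from
(3 * 3 * 3 + 1) * 50  # conv layer: (3x3 kernel x 3 channels + 1 bias) x 50 filters = 1400
48 * 48 * 50          # flattened feature vector: 115200 values
115200 * 2 + 2        # dense layer: one weight per input per class + 2 biases = 230402
1400 + 230402         # total: 231802 parameters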

As you may remember from the second practical, we can reduce the size of our data with a pooling layer. Let’s try this now.

Create a new model (called model2), but now add a max pooling layer with a pool size of 3 by 3 after the convolution layer.

Then compile and fit the model again. Make sure to save the output as history2 and visualize this afterwards.

How many trainable parameters will this new CNN have?
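You can reason this out before running the code: max pooling itself adds no parameters, but it shrinks the feature maps and thereby the feature vector feeding the dense layer:

# Expected parameter count of model2 (max pooling adds no parameters itself)
floor(48 / 3)  # 3x3 max pooling shrinks the 48x48 feature maps to 16x16
16 * 16 * 50   # flattened feature vector shrinks to 12800 values
12800 * 2 + 2  # dense layer: 25602 parameters
1400 + 25602   # total: 27002 trainable parameters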

model2 <- keras_model_sequential() %>% 
  layer_conv_2d(filters = 50,
                kernel_size = c(3,3),
                activation = 'relu', 
                input_shape = c(50,50,3)) %>% 
  layer_max_pooling_2d(pool_size = c(3,3)) %>% 
  layer_flatten() %>% 
  layer_dense(units = 2, activation = 'softmax')
summary(model2)
## Model: "sequential_4"
## ____________________________________________________________________________________________________________________________________________
##  Layer (type)                                                  Output Shape                                            Param #              
## ============================================================================================================================================
##  conv2d_5 (Conv2D)                                             (None, 48, 48, 50)                                      1400                 
##  max_pooling2d_2 (MaxPooling2D)                                (None, 16, 16, 50)                                      0                    
##  flatten_4 (Flatten)                                           (None, 12800)                                           0                    
##  dense_4 (Dense)                                               (None, 2)                                               25602                
## ============================================================================================================================================
## Total params: 27002 (105.48 KB)
## Trainable params: 27002 (105.48 KB)
## Non-trainable params: 0 (0.00 Byte)
## ____________________________________________________________________________________________________________________________________________
model2 <- model2 %>% 
  compile(optimizer = optimizer_sgd(),
          loss = 'binary_crossentropy',
          metrics = list('accuracy'))

history2 <- model2 %>% 
  fit(trainx, 
      trainy, 
      batch_size = 10, 
      epochs = 100,
      validation_data = list(valx, valy))
plot(history2)

Great, that looks better! The training accuracy especially increases quite steadily over the epochs, but the validation accuracy could still do with a bit of improvement. Adding one or two extra convolution layers does not add a lot of parameters to our model, and with these extra convolution layers the model will be able to detect more complicated (higher-order) patterns in the images than (for example) edges.

Try to add one or two convolution layers to the model and run it again. You can add these before or after the max pooling layer; note how this choice impacts your number of parameters. Add a dropout layer after the last (or perhaps even after each) convolution layer with a drop rate of 0.25 to reduce overfitting. Finally, because we convolve multiple times, the images will lose quite a number of pixels from their edges in the final layers (remember from the previous practical?). To prevent this, add padding = "same" as an argument to the convolution layers. This pads the images with an extra border of zeroes before executing the convolution, so that the output images have the same dimensions as the input.

See if the performance has improved in this 3rd model:

  • do the accuracies show a small spread?
  • do the accuracies quickly improve over the epochs?
  • are the training and validation accuracies in the same range?
  • do the accuracies reach an asymptote?
  • are the accuracies close to 100%?

The model below is our own try; how does it compare to yours?

model3 <- keras_model_sequential() %>% 
  layer_conv_2d(filters = 50,
                kernel_size = c(3,3),
                activation = 'relu', 
                input_shape = c(50,50,3),
                padding = "same") %>% 
  layer_dropout(rate = 0.25) %>%
  layer_max_pooling_2d(pool_size = c(3,3)) %>% 
  layer_conv_2d(filters = 10,
                kernel_size = c(3,3),
                activation = 'relu', 
                padding = "same") %>% 
  layer_dropout(rate = 0.25) %>% 
  layer_flatten() %>% 
  layer_dense(units = 2, activation = 'softmax')
summary(model3)
## Model: "sequential_5"
## ____________________________________________________________________________________________________________________________________________
##  Layer (type)                                                  Output Shape                                            Param #              
## ============================================================================================================================================
##  conv2d_7 (Conv2D)                                             (None, 50, 50, 50)                                      1400                 
##  dropout_3 (Dropout)                                           (None, 50, 50, 50)                                      0                    
##  max_pooling2d_3 (MaxPooling2D)                                (None, 16, 16, 50)                                      0                    
##  conv2d_6 (Conv2D)                                             (None, 16, 16, 10)                                      4510                 
##  dropout_2 (Dropout)                                           (None, 16, 16, 10)                                      0                    
##  flatten_5 (Flatten)                                           (None, 2560)                                            0                    
##  dense_5 (Dense)                                               (None, 2)                                               5122                 
## ============================================================================================================================================
## Total params: 11032 (43.09 KB)
## Trainable params: 11032 (43.09 KB)
## Non-trainable params: 0 (0.00 Byte)
## ____________________________________________________________________________________________________________________________________________
model3 <- model3 %>% 
  compile(optimizer = optimizer_sgd(),
          loss = 'binary_crossentropy',
          metrics = list('accuracy'))

history3 <- model3 %>% 
  fit(trainx, 
      trainy, 
      batch_size = 10, 
      epochs = 100,
      validation_data = list(valx, valy))
plot(history3)
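To compare the three runs side by side, you can pull the validation accuracies out of the three history objects and plot them together. A minimal sketch, again assuming the per-epoch metrics are stored under $metrics:

# Compare the validation accuracies of the three models per epoch
acc <- tibble(epoch = rep(1:100, times = 3),
              model = rep(c("model 1", "model 2", "model 3"), each = 100),
              val_accuracy = c(history$metrics$val_accuracy,
                               history2$metrics$val_accuracy,
                               history3$metrics$val_accuracy))
ggplot(acc, aes(x = epoch, y = val_accuracy, colour = model)) +
  geom_line()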

5 Evaluating predictions

Although the third performance plot does not look too different from the second, we’re happy enough with this last model to apply it to the test image.

Why do we want to apply the model to a test image?

It could be that the model is overfitted on our validation image, for example because our annotated examples did not include enough variation.

We have not labelled any of our test grid cells, but visualizing which grid cells our model predicts to be bush and which not is very useful to visually check the generalizability of the model. Let’s do that now; generating the predictions on the test set takes only one line of code.

# Predict fitted CNN on test data
testpred <- model3 %>%
  predict(testx)
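The resulting testpred is a matrix with one row per grid cell and one column per class, holding the predicted class probabilities (column 2 = bush, matching our label matrices). A quick way to inspect it, using 0.5 as a convenient default threshold:

# Inspect the test predictions
head(testpred)              # class probabilities per grid cell
mean(testpred[, 2] > 0.5)   # share of grid cells predicted as bush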

Now we’re going to visualize the grid cells of the two predicted classes in the test image.

You have been working and thinking very hard up to this point, so we’re going to make life a little easier for you by supplying the code to plot the test image classes. Copy and paste this code and run it yourself.

With this code you make two copies of the test grid cells (one image for bush and one for the other class) and make each cell’s transparency inversely proportional to the probability of the predicted class. After that you determine (using your original test image) how many grid cells fit in both the x- and y-dimension; you can check whether these numbers are correct by inspecting the filenames of the grid cells in your test folder. Then you split the imlist into equal chunks, where each sublist contains all the grid cells of one row in the image. Please note that the order in which you should do this depends on the order in which you supplied your grid cells to the testx object in the first place: if you did this per column instead of per row, then you should of course also evaluate the predictions per column. Finally, you paste all the grid cells together again per row (in the loop) and then all the rows together after that.

test0 <- testgrid
test1 <- testgrid

for(i in seq_len(nrow(testpred))) {
  test0[[i]] <- colorise(test0[[i]],
                         px.all(test0[[i]]),
                         col = 'white',
                         alpha = testpred[i,2])
  test1[[i]] <- colorise(test1[[i]],
                         px.all(test1[[i]]),
                         col = 'white',
                         alpha = testpred[i,1])
}

max_x <- length(testgrid[grepl('y = 1 - 50',names(testgrid))])
max_y <- length(testgrid[grepl('x = 1 - 50',names(testgrid))])

test0 <- split(test0, rep(seq_len(max_x),
                          each = max_y))
test1 <- split(test1, rep(seq_len(max_x),
                          each = max_y))

for(x in seq_len(max_x)) {
  test0[[x]] <- imappend(test0[[x]], axis = 'y')
  test1[[x]] <- imappend(test1[[x]], axis = 'y')
}

test0 <- imappend(test0, axis = 'x')
test1 <- imappend(test1, axis = 'x')

plot(test0, main = "no bush")

plot(test1, main = "bush")

Not bad… We can see that what is bush is generally predicted as bush, and what is not bush is also predicted as such. However, we do see that, especially in the lower left corner, a lot of presumably high shrubs or grass is predicted as bush. Furthermore, some small bushes that are not connected to other bushes are wrongly classified as not being bush. This last issue is probably an artefact of the regular grid cells, which might be too large for these bushes or cut them in half. Still, not a result to be ashamed of!

6 Conclusion

Well, actually there is not really a conclusion here. Some stories have an open ending, just like this practical. We have created quite a good model already, with a good accuracy on the validation set and sensible results on the test set. However, we’ve also seen that the test set contains some predictions that could have been better.

7 Challenge

Challenge

Now see if you can beat our model… Check, for example, whether you can correctly identify the grid cells in the bottom left corner of the test image. Or see if you can create a more stable accuracy and loss on the validation set. Or see if you can achieve the same performance with a simpler model. Or go nuts and create a very complicated model. There are a lot more switches and levers that you can tweak in a CNN: we haven’t, for example, optimized our batch size and learning rate. Also think about the final layers; after the flattening we immediately predicted the two classes, so see if some extra hidden dense layers improve the performance. Maybe try different optimizer algorithms, a different loss function, or add learning rate decay. Take a look at the keras tutorials again and at the help files of the functions to see what you can tweak. Finally, and maybe most importantly: take a look at your training and validation data! Do you have enough training records, and do these data have enough variation per class? Updating the labels may improve the performance of the model as well, especially on the patch in the bottom left of the test image.

To do all this, try to create a script in which you loop over multiple combinations of hyperparameter values to optimize the performance of your CNN (i.e., a grid search). You can do this by, for example, creating a data.frame or tibble with all combinations of hyperparameter values (using the handy expand.grid function that you’ve used before in the lecture about the HPC). Then write a for loop over the rows of this data.frame or tibble that uses the hyperparameter values of the corresponding row at each iteration; a minimal sketch is given below. Just don’t forget to save the output during each iteration of the loop, or it will be a waste of your time and your CPU’s sweat and tears (not talking about experience here, of course…).
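As a starting point, here is a minimal sketch of such a grid search. The hyperparameter values, the model architecture and the output file are just examples, so pick your own; note that older versions of keras use lr instead of learning_rate in optimizer_sgd:

# Minimal grid-search sketch: fit one model per hyperparameter combination
grid <- expand.grid(filters = c(25, 50),
                    batch_size = c(10, 32),
                    learning_rate = c(0.01, 0.001))

results <- vector("list", nrow(grid))
for (i in seq_len(nrow(grid))) {
  mod <- keras_model_sequential() %>%
    layer_conv_2d(filters = grid$filters[i],
                  kernel_size = c(3, 3),
                  activation = 'relu',
                  input_shape = c(50, 50, 3),
                  padding = "same") %>%
    layer_max_pooling_2d(pool_size = c(3, 3)) %>%
    layer_flatten() %>%
    layer_dense(units = 2, activation = 'softmax') %>%
    compile(optimizer = optimizer_sgd(learning_rate = grid$learning_rate[i]),
            loss = 'binary_crossentropy',
            metrics = list('accuracy'))
  fit_i <- mod %>%
    fit(trainx, trainy,
        batch_size = grid$batch_size[i],
        epochs = 100,
        validation_data = list(valx, valy),
        verbose = 0)
  results[[i]] <- fit_i$metrics
  # Save after every iteration, so nothing is lost if a later fit crashes
  saveRDS(results, "gridsearch_results.rds")
}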

Please let us know what you come up with, we’re always happy to learn from you… And above all: enjoy!

8 Recap

Today, we finally fitted a CNN to the image data that we prepared over the last days. By first cutting the original images into many thousands of smaller 50x50-pixel tiles, and then training a CNN image classifier on the bush vs. other classes, we have in some sense implemented a segmentation of the full image by classifying all smaller tiles into the two classes.

An R script of today’s exercises can be downloaded here