CIFAR-10 Example

Prerequisites for this tutorial are a good knowledge of Python and nuts-flow. Please read the nuts-flow tutorial if you haven’t. Some knowledge of Keras, and of course deep-learning, will be helpful.

Task

In this example we will implement a nuts-ml pipeline to classify CIFAR-10 images. CIFAR-10 is a classical benchmark problem in image recognition. Given are 10 categories (airplane, dog, ship, …) and the task is to classify small images of these objects accordingly.

[Image: example CIFAR-10 images for the ten categories]

The CIFAR-10 dataset consists of 60000 RGB images of size 32x32. There are 6000 images per class and the dataset is split into 50000 training images and 10000 test images. For more details see the Tech report.

In the following we will show how to use nuts-flow/ml and Keras to train a Convolutional Neural Network (CNN) on the CIFAR-10 data. For readability some code will be omitted (e.g. import statements) but the complete code and more examples can be found under nutsml/examples.

Network

The network architecture for the CNN is a slightly modified version of the Keras CNN example (Keras version 2.x) with the notable exception of the last line, where the model is wrapped in a KerasNetwork.

INPUT_SHAPE = (32, 32, 3)
NUM_CLASSES = 10

def create_network():
    model = Sequential()
    model.add(Convolution2D(32, (3, 3), padding='same',
                            input_shape=INPUT_SHAPE))
    model.add(Activation('relu'))
    model.add(Convolution2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))

    model.add(Convolution2D(64, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Convolution2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))

    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(NUM_CLASSES))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])

    return KerasNetwork(model, 'weights_cifar10.hd5')

Wrapping the model allows us to use the CNN as a nut within a nuts-flow, which simplifies training. The wrapper also takes a path to a weights file for check-pointing. Weights are saved as an HDF5 file in the standard Keras format.

Note

So far only wrappers for Keras and Lasagne models are provided. However, any deep-learning library that accepts an iterable over mini-batches for training will work with nuts-ml.
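
Because a nuts-flow pipeline is an ordinary Python iterable, plugging in another library amounts to replacing the network nut with a Map over the mini-batches. Here is a minimal sketch, assuming a hypothetical user-defined my_train_step and the build_batch nut defined later in this tutorial:

from nutsflow import Map, Collect

def my_train_step(batch):
    # hypothetical: run your library's forward/backward pass on the
    # mini-batch here and return the loss
    return 0.0

losses = train_samples >> build_batch >> Map(my_train_step) >> Collect()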

Loading data

In many image processing applications the complete set of training images is too large to fit in memory and images are loaded in a streamed fashion. See read_images.py for an example that loads images sequentially.

CIFAR-10, however, is a small benchmark data set and fits in memory. We therefore take advantage of the function cifar10.load_data() provided by Keras, load all images into memory, and rearrange the data slightly

def load_samples():
    from tensorflow.keras.datasets import cifar10
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    train_samples = list(zip(x_train, map(int, y_train)))
    test_samples = list(zip(x_test, map(int, y_test)))
    return train_samples, test_samples

Specifically, we convert the class labels to plain Python integers, and zip inputs x and outputs y to create lists of training and test samples. Samples are then tuples of the format (image, label), where the image is a NumPy array of shape (32,32,3), and the label is an integer between 0 and 9, indicating the class. We can verify the type and shape of the samples by running the following flow (complete code here)

train_samples, test_samples = load_samples()
train_samples >> Take(3) >> PrintColType() >> Consume()

which takes the first three samples and prints for each sample the data type and content information for the sample columns

item 0: <tuple>
  0: <ndarray> shape:32x32x3 dtype:uint8 range:0-255
  1: <int> 6
item 1: <tuple>
  0: <ndarray> shape:32x32x3 dtype:uint8 range:5-254
  1: <int> 9
item 2: <tuple>
  0: <ndarray> shape:32x32x3 dtype:uint8 range:20-255
  1: <int> 9

Note

The standard formats for image data in nuts-ml are NumPy arrays of shape (h,w,3) for RGB images, (h,w) for gray-scale images and (h,w,4) for RGBA images.

Not only can we inspect the type of the data, but we can also have a look at the images themselves

train_samples, test_samples = load_samples()
train_samples >> Take(3) >> PrintType() >> ViewImage(0) >> Consume()

[Image: ViewImage window displaying a CIFAR-10 training image]

Training

We will introduce the code for the network training in pieces before showing the complete code later. First, let us create the network and load the sample data using the functions introduced above

network = create_network()
train_samples, test_samples = load_samples()

Having a network and samples we can now train the network (for one epoch) with the following nuts-flow

train_samples >> augment >> rerange >> Shuffle(100) \
              >> build_batch >> network.train() >> Consume()

The flow augments the training images by random transformations, re-ranges pixel values to [0, 1], shuffles the samples, builds mini-batches, trains the network and consumes outputs of the training (losses, accuracies).

Consume and Shuffle are nuts from nuts-flow. Image augmentation, re-ranging and batch-building are parts of nuts-ml that we describe in detail in the next sections.

Augmentation

Deep learning requires large data sets, and a common strategy to increase the amount of image data is to augment the data set with randomly perturbed copies, e.g. rotated or blurred images. Here we want to augment the CIFAR-10 data set by flipping images horizontally and changing the brightness

p = 0.1
augment = (AugmentImage(0)
           .by('identical', 1.0)
           .by('fliplr', p)
           .by('brightness', p, [0.7, 1.3]))

The AugmentImage nut takes as a parameter the index of the image within the sample (image, label), here position 0. Augmentations are specified by invoking by(transformation, probability, *args).

We augment by passing the unchanged image ('identical') through with probability 1.0 (all of them), flipping images horizontally for 10% of the samples (p = 0.1), and randomly changing the brightness in range [0.7, 1.3], again with 10% probability p. We can have a look at the augmented images and their labels using the following flow (complete code here)

train_samples, test_samples = load_samples()
train_samples >> augment >> ViewImageAnnotation(0, 1, pause=1) >> Consume()

In detail: for every sample processed by AugmentImage, the image is extracted from position 0 of the sample tuple, and new samples with the same label but with augmented images are output. For each input image the identical output image is generated (identical), and additional augmented samples (fliplr, brightness) are created with 10% probability each, resulting on average in 20% more training data.
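
As a quick sanity check we can count the augmented samples; a sketch, assuming the Count() sink from nuts-flow:

n = train_samples >> Take(1000) >> augment >> Count()
print(n)  # roughly 1200 on average, i.e. about 20% more samples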

Transformation

Images returned by load_samples() are NumPy arrays with integers in range [0, 255]. The network, however, expects floating point numbers (float32) in range [0, 1]. We therefore transform images by re-ranging

rerange = TransformImage(0).by('rerange', 0, 255, 0, 1, 'float32')

where TransformImage takes as a parameter the index of the image within the sample, and transformations are defined by invoking by(transformation, *args).

Note

Transformations are chained, meaning that an input image is transformed by sequentially applying all transformations to the image, resulting in one output image. Consequently, the number of input and output images after transformation is the same. Augmentations, on the other hand, are applied independently, and the number of input and output images can differ.
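
For illustration, a chained transformation could be sketched as follows, assuming that 'fliplr' is also registered as a transformation; each input image is re-ranged and then flipped, yielding exactly one output image per input:

transform = (TransformImage(0)
             .by('rerange', 0, 255, 0, 1, 'float32')
             .by('fliplr'))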

See TransformImage in transformer.py for a list of available transformations. Each transformation can also be used for augmentation. Custom transformations can be added via register

>>> from nutsml import TransformImage, AugmentImage
>>> my_brightness = lambda image, c: image * c
>>> TransformImage.register('my_brightness', my_brightness)

>>> transform = TransformImage(0).by('my_brightness', 1.5)
>>> augment = AugmentImage(0).by('my_brightness', [0.7, 1.3])

While transformations take specific parameter values, e.g. 1.5 for brightness, augmentations take ranges, e.g. [0.7, 1.3], from which parameter values are uniformly sampled.

Batching

Networks are trained with mini-batches of samples, e.g. a stack of images with their corresponding class labels. BuildBatch(batchsize) is used to build these batches. The following example creates a batcher that extracts images from column 0 of the samples and class labels from column 1. Class labels are encoded as one-hot vectors, while images within the batch are represented as NumPy arrays with dtype float32.

NUM_CLASSES = 10
BATCH_SIZE = 32

build_batch = (BuildBatch(BATCH_SIZE)
                .input(0, 'image', 'float32')
                .output(1, 'one_hot', 'uint8', NUM_CLASSES))

Having a batcher we can now build a complete pipeline that trains the network for one epoch

train_samples >> augment >> rerange >> build_batch >> network.train() >> Consume()

Note

Consume(), Collect(), Unzip() or some other data sink is needed. Without a consumer at the end of the pipeline no data is processed.

Usually it is a good idea to shuffle the data (especially after augmentation) to ensure that each mini-batch contains a good mix of examples from different classes. Complete shuffling is not feasible if the training images do not fit in memory, but we can perform a partial shuffling, e.g. over a buffer of 100 samples. Let's also train for more than one epoch

EPOCHS = 20
for epoch in range(EPOCHS):
    (train_samples >> augment >> rerange >> Shuffle(100) >> build_batch >>
     network.train() >> Consume())

Training results

Instead of consuming (and throwing away) the outputs of the training we can collect and print the results (loss, accuracy)

for epoch in range(EPOCHS):
    t_loss, t_acc = (train_samples >> augment >> rerange >> Shuffle(100) >>
                     build_batch >> network.train() >> Unzip())

    print("train loss  :", t_loss >> Mean())
    print("train acc   :", t_acc >> Mean())

network.train() takes mini-batches as input and outputs loss and accuracy per mini-batch, as specified in create_network(). Unzip() transforms the resulting sequence of (loss, accuracy) tuples into a sequence of losses t_loss and a sequence of accuracies t_acc. Finally, we print the mean (over mini-batches) of the training loss and accuracy.
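
To see what Unzip() does in isolation, here is a minimal sketch on plain (loss, accuracy) tuples:

losses, accs = [(0.5, 0.8), (0.4, 0.9)] >> Unzip()
print(losses >> Mean())  # 0.45
print(accs >> Mean())    # 0.85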

Validation

The CIFAR-10 data set is divided into a training and a test set but does not come with a validation set by default. However, we can easily split the training set into a new training set and a validation set

train_samples, val_samples = train_samples >> SplitRandom(0.8)

The new training set will contain 80% of the original set and the validation set the remainder.

Note

SplitRandom() can split into more than two sets and can take constraints into account; SplitLeaveOneOut() performs leave-one-out splits.
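
For instance, a three-way split could be sketched as follows, assuming that the split ratios can be passed as a tuple:

# 'samples' stands for any list of samples
train, val, test = samples >> SplitRandom(ratio=(0.7, 0.2, 0.1))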

The performance of the network on the validation data can then be computed analogously to the training results. The important differences are that we use the validation data, call network.validate() instead of network.train(), perform no augmentation, and do not need to shuffle the data

for epoch in range(EPOCHS):
    v_loss, v_acc = val_samples >> rerange >> build_batch >> network.validate() >> Unzip()
    print("val loss  :", v_loss >> Mean())
    print("val acc   :", v_acc >> Mean())

Again, printed results are mean values over mini-batch losses and accuracies.

Evaluation

Validation accuracy averaged over mini-batches provides a reasonable estimate for the prediction accuracy and is, for instance, useful for early stopping, but is not an accurate measure of the true classification performance. Typically we want to evaluate on an independent test set and average over samples, not mini-batches. The code below calls network.evaluate() to compute the categorical_accuracy over all test samples

e_acc = test_samples >> rerange >> build_batch >> network.evaluate([categorical_accuracy])
print("evaluation acc  :", e_acc)

In contrast to the training or validation accuracies computed by network.train() or network.validate(), network.evaluate() returns a single number per metric and no averaging is required.

Check-pointing

A common method to enable the continuation of an interrupted training or to implement early-stopping is to save the network weights, either at regular intervals (e.g. at each epoch) or when the validation accuracy reaches a new high. Network weights can easily be saved by invoking the save() method

network.save()

where the path to the weights file was specified when wrapping the model via KerasNetwork(model, weightsfile) in create_network().

For early-stopping we want to save the weights depending on the validation loss or accuracy. The following code shows how to compute the validation accuracy and uses save_best() to save the weights for the network with the highest accuracy

v_acc = val_samples >> rerange >> build_batch >> network.validate() >> Get(1) >> Mean()
network.save_best(v_acc, isloss=False)

Note that the computation of the validation accuracy is slightly different from before. Here we need only the accuracies but not the losses, and therefore call Get(1) to extract them. Since the output then contains only accuracies and no (loss, acc) tuples anymore, we can directly call Mean() and do not need Unzip().
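
A minimal sketch of Get(1) on plain (loss, acc) tuples:

[(0.5, 0.8), (0.4, 0.9)] >> Get(1) >> Collect()  # [0.8, 0.9]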

If we want to save the network with the smallest loss instead, we can write

v_loss = val_samples >> rerange >> build_batch >> network.validate() >> Get(0) >> Mean()
network.save_best(v_loss, isloss=True)

Reading

The CIFAR-10 benchmark dataset is small enough to fit in memory. However, in many practical applications the image datasets are too large to be loaded in memory entirely and images need to be read sequentially from the file system. The following example shows how to read PNG images from a folder and to display them

show_image = ViewImage(0, pause=1, figsize=(2, 2), interpolation='spline36')
glob('images/*.png') >> ReadImage(None) >> show_image >> Consume()

ReadImage takes a sequence of file paths as input, here generated using glob, reads the images from the file system, and returns tuples of the form (image,), where images are NumPy arrays. We can then display the images with ViewImage, where 0 indicates the column of the input sample that contains the image and pause=1 forces a pause of one second between images. See cifar/read_images.py for a complete code example.

A common method to organize image data for network training on the file system is to store them in sub-folders named after the class labels, for instance

images\
  0\
     img123.jpg
     img456.jpg
     ...
  9\
     img789.jpg

We can read these images with their corresponding class labels using the following code

ReadLabelDirs('images', '*.jpg') >> ReadImage(0) >> show_image >> Consume()

where ReadLabelDirs returns tuples of the form (filepath, label). See mnist/read_images.py for a complete example using the MNIST data.
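
The samples flowing out of ReadLabelDirs would look roughly like this (a sketch; the labels are assumed to be the sub-folder names as strings):

('images/0/img123.jpg', '0')
('images/0/img456.jpg', '0')
('images/9/img789.jpg', '9')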

Writing

Often we not only want to read image data but also write it, e.g. after transformation or augmentation. The following code writes the first 20 CIFAR-10 training images in PNG format to the file system

train_samples, _ = load_samples()
imagepath = 'images/img*.png'
train_samples >> Take(20) >> WriteImage(0, imagepath) >> Consume()

The filenames for the images are generated automatically by replacing the * in imagepath with a running number. For instance, the code above would create the following files

./images/img0.png
./images/img1.png
...
./images/img19.png

A more complex example that includes the class label of an image in its filename can be seen in cifar/write_images.py .

Prediction

After having trained and evaluated a network, we usually want to apply it and predict labels for new images. Here is an example

samples = glob('images/*.png') >> ReadImage(None)

pred_batch = BuildBatch(BATCH_SIZE).input(0, 'image', 'float32')

predictions = (samples >> rerange >> pred_batch >> network.predict() >>
               Map(ArgMax()) >> Collect())
print(predictions)

As before we read images from the file system with ReadImage, re-range them and build a batch. Note that it would be easy to add a transformation that resizes the new input images to the shape required by the network.
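
Such a resizing step could be sketched as follows, assuming 'resize' is among the transformations in transformer.py:

resize = TransformImage(0).by('resize', 32, 32)
predictions = (samples >> resize >> rerange >> pred_batch >>
               network.predict() >> Map(ArgMax()) >> Collect())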

Note

For classification the batch needs to be created differently (without class labels) compared to training/evaluation, since class labels are not available - that is what we want to predict!

We call network.predict() to retrieve the predictions of the network for the input images. The output is a softmax vector (see create_network()) and we use Map(ArgMax()) to get the class index. If you want the class index together with the class probability, Map(ArgMax(retvalue=True)) can be called instead.
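
A minimal sketch of both variants on a softmax-like vector (the output format of retvalue=True is assumed to be (index, probability)):

[[0.1, 0.7, 0.2]] >> Map(ArgMax()) >> Collect()               # [1]
[[0.1, 0.7, 0.2]] >> Map(ArgMax(retvalue=True)) >> Collect()  # [(1, 0.7)]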

cifar/cnn_classify.py contains a more complex example that displays the image with the true and predicted class names.

Code

Here is the complete code (without imports) for the network training. The entire code can be found in cifar/cnn_train.py.

rerange = TransformImage(0).by('rerange', 0, 255, 0, 1, 'float32')
build_batch = (BuildBatch(BATCH_SIZE)
               .input(0, 'image', 'float32')
               .output(1, 'one_hot', 'uint8', NUM_CLASSES))
p = 0.1
augment = (AugmentImage(0)
           .by('identical', 1.0)
           .by('brightness', p, [0.7, 1.3])
           .by('color', p, [0.7, 1.3])
           .by('shear', p, [0, 0.1])
           .by('fliplr', p)
           .by('rotate', p, [-10, 10]))
plot_eval = PlotLines((0, 1), layout=(2, 1))

network = create_network()

train_samples, test_samples = load_samples()
train_samples, val_samples = train_samples >> SplitRandom(0.8)

for epoch in range(NUM_EPOCHS):
    print('EPOCH:', epoch)

    t_loss, t_acc = (train_samples >> PrintProgress(train_samples) >>
                     Pick(PICK) >> augment >> rerange >> Shuffle(100) >>
                     build_batch >> network.train() >> Unzip())
    t_loss, t_acc = t_loss >> Mean(), t_acc >> Mean()
    print("train loss : {:.6f}".format(t_loss))
    print("train acc  : {:.1f}".format(100 * t_acc))

    v_loss, v_acc = (val_samples >> rerange >>
                     build_batch >> network.validate() >> Unzip())
    v_loss, v_acc = v_loss >> Mean(), v_acc >> Mean()
    print('val loss   : {:.6f}'.format(v_loss))
    print('val acc    : {:.1f}'.format(100 * v_acc))

    network.save_best(v_acc, isloss=False)
    plot_eval((t_acc, v_acc))

print('testing...')
e_acc = (test_samples >> rerange >> build_batch >>
         network.evaluate([categorical_accuracy]))
print('test acc   : {:.1f}'.format(100 * e_acc))