CIFAR-10 Example¶
Prerequisites for this tutorial are a good knowledge of Python and nuts-flow. Please read the nuts-flow tutorial if you haven’t. Some knowledge of Keras, and of course deep learning, will be helpful.
Task¶
In this example we will implement a nuts-ml pipeline to classify CIFAR-10 images. CIFAR-10 is a classical benchmark problem in image recognition: there are 10 categories (airplane, dog, ship, …) and the task is to classify small images of these objects accordingly.
The CIFAR-10 dataset consists of 60000 RGB images of size 32x32. There are 6000 images per class and the dataset is split into 50000 training images and 10000 test images. For more details see the Tech report.
In the following we will show how to use nuts-flow/ml and Keras to train a Convolutional Neural Network (CNN) on the CIFAR-10 data. For readability some code will be omitted (e.g. import statements) but the complete code and more examples can be found under nutsml/examples.
Network¶
The network architecture for the CNN is a slightly modified version of the Keras CNN example (Keras version 2.x), with the notable exception of the last line, where the model is wrapped in a KerasNetwork.
INPUT_SHAPE = (32, 32, 3)
NUM_CLASSES = 10

def create_network():
    model = Sequential()
    model.add(Convolution2D(32, (3, 3), padding='same',
                            input_shape=INPUT_SHAPE))
    model.add(Activation('relu'))
    model.add(Convolution2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))
    model.add(Convolution2D(64, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Convolution2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(NUM_CLASSES))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return KerasNetwork(model, 'weights_cifar10.hd5')
The wrapping allows us to use the CNN as a nut within a nuts-flow, which simplifies training. The wrapper also takes a path to a weights file for check-pointing. Weights are saved in the standard Keras format as an HDF5 file.
Note
So far only wrappers for Keras and Lasagne models are provided. However, any deep-learning library that accepts an iterable over mini-batches for training will work with nuts-ml.
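To make this concrete, here is a minimal sketch of such a training loop without the wrapper. It assumes a compiled Keras model and an iterable batches of (inputs, outputs) pairs as produced by a nuts-ml pipeline (build_batch is introduced later in this tutorial):
# Hedged sketch: manual training over mini-batches without KerasNetwork.
# Assumes `model` is a compiled Keras model with metrics=['accuracy'] and
# `batches` yields (inputs, outputs) pairs.
for x_batch, y_batch in batches:
    loss, acc = model.train_on_batch(x_batch, y_batch)  # standard Keras API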
Loading data¶
In many image processing applications the complete set of training images is too large to fit in memory and images are loaded in a streamed fashion. See read_images.py for an example that loads images sequentially.
CIFAR-10, however, is a small benchmark dataset and fits in memory. We therefore take advantage of the function cifar10.load_data() provided by Keras, and load all images into memory but rearrange the data slightly
def load_samples():
    from tensorflow.python.keras.datasets import cifar10
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    train_samples = list(zip(x_train, map(int, y_train)))
    test_samples = list(zip(x_test, map(int, y_test)))
    return train_samples, test_samples
Specifically, we convert class labels from floats to integers, and zip inputs x and outputs y to create lists with training and test samples. Samples are then tuples of format (image, label), where the image is a NumPy array of shape (32,32,3), and the label is an integer between 0 and 9, indicating the class. We can verify the type and shape of the samples by running the following flow (complete code here)
train_samples, test_samples = load_samples()
train_samples >> Take(3) >> PrintColType() >> Consume()
which takes the first three samples and prints, for each sample, the data type and content information of its columns
item 0: <tuple>
  0: <ndarray> shape:32x32x3 dtype:uint8 range:0-255
  1: <int> 6
item 1: <tuple>
  0: <ndarray> shape:32x32x3 dtype:uint8 range:5-254
  1: <int> 9
item 2: <tuple>
  0: <ndarray> shape:32x32x3 dtype:uint8 range:20-255
  1: <int> 9
Note
The standard formats for image data in nuts-ml are NumPy arrays of shape (h,w,3) for RGB images, (h,w) for gray-scale images, and (h,w,4) for RGBA images.
Not only can we inspect the type of the data but we can also have a look at the images themselves
train_samples, test_samples = load_samples()
train_samples >> Take(3) >> PrintType() >> ViewImage(0) >> Consume()
Training¶
We will introduce the code for the network training in pieces before showing the complete code later. First, let us create the network and load the sample data using the functions introduced above
network = create_network()
train_samples, test_samples = load_samples()
Having a network and samples we can now train the network (for one epoch) with the following nuts-flow
train_samples >> augment >> rerange >> Shuffle(100) \
>> build_batch >> network.train() >> Consume()
The flow augments the training images by random transformations, re-ranges pixel values to [0, 1], shuffles the samples, builds mini-batches, trains the network and consumes outputs of the training (losses, accuracies).
Consume and Shuffle are nuts from nuts-flow. Image augmentation, re-ranging and batch-building are parts of nuts-ml that we describe in detail in the next sections.
Augmentation¶
Deep learning requires large data sets, and a common strategy to increase the amount of image data is to augment the data set with randomly perturbed copies, e.g. rotated or blurred images. Here we want to augment the CIFAR-10 data set by flipping images horizontally and changing the brightness
p = 0.1
augment = (AugmentImage(0)
           .by('identical', 1.0)
           .by('fliplr', p)
           .by('brightness', p, [0.7, 1.3]))
The AugmentImage nut takes as parameter the index of the image within the sample (image, label), here position 0, and augmentations are specified by invoking by(transformation, probability, *args).
We augment by passing the unchanged image ('identical') through with probability 1.0 (all of them), flipping images horizontally for 10% of the samples (p = 0.1), and randomly changing the brightness in range [0.7, 1.3], again with 10% probability p. We can have a look at the augmented images and their labels using the following flow (complete code here)
train_samples, test_samples = load_samples()
train_samples >> augment >> ViewImageAnnotation(0, 1, pause=1) >> Consume()
In detail: for every sample processed by AugmentImage, the image is extracted from position 0 of the sample tuple, and new samples with the same label but with augmented images are output. For each input image the identical output image is generated (identical), and additional augmented samples (fliplr, brightness) are created with 10% probability each, resulting in 20% more training data.
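We can check this expansion rate empirically by counting samples before and after augmentation. This is only a sanity check; since augmentations are applied randomly, the exact count varies from run to run:
train_samples, _ = load_samples()
subset = train_samples >> Take(1000) >> Collect()
augmented = subset >> augment >> Collect()
print(len(subset), len(augmented))  # roughly 1000 vs ~1200 samples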
Transformation¶
Images returned by load_samples() are NumPy arrays with integers in range [0, 255]. The network, however, expects floating point numbers (float32) in range [0, 1]. We therefore transform the images by re-ranging
rerange = TransformImage(0).by('rerange', 0, 255, 0, 1, 'float32')
where TransformImage takes as parameter the index of the image within the sample, and transformations are defined by invoking by(transformation, *args).
Note
Transformations are chained, meaning that an input image is transformed by sequentially applying all transformations to the image, resulting in one output image. Consequently, the number of input and output images after transformation is the same. Augmentations, on the other hand, are applied independently, and the number of input and output images can differ.
See TransformImage in transformer.py for a list of available transformations. Each transformation can also be used for augmentation. Custom transformations can be added via register
>>> from nutsml import TransformImage, AugmentImage
>>> my_brightness = lambda image, c: image * c
>>> TransformImage.register('my_brightness', my_brightness)
>>> transform = TransformImage(0).by('my_brightness', 1.5)
>>> augment = AugmentImage(0).by('my_brightness', [0.7, 1.3])
While transformations take specific parameter values, e.g. 1.5 for brightness, augmentations take ranges, e.g. [0.7, 1.3], from which parameter values are uniformly sampled.
Batching¶
Networks are trained with mini-batches of samples, e.g. a stack of images with their corresponding class labels. BuildBatch(batchsize) is used to build these batches. The following example creates a batcher that extracts images from column 0 of the samples and class labels from column 1. Class labels are encoded as one-hot vectors, while images within the batch are represented as NumPy arrays with dtype float32.
NUM_CLASSES = 10
BATCH_SIZE = 32

build_batch = (BuildBatch(BATCH_SIZE)
               .input(0, 'image', 'float32')
               .output(1, 'one_hot', 'uint8', NUM_CLASSES))
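Analogous to the sample inspection above, we can inspect the first batch. The following is a sketch (the exact printout depends on the nuts-ml version); with BATCH_SIZE = 32 we would expect an image array of shape 32x32x32x3 with dtype float32 and a one-hot label array of shape 32x10 with dtype uint8:
(train_samples >> rerange >> build_batch >> Take(1) >>
 PrintColType() >> Consume())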
Having a batcher we can now build a complete pipeline that trains the network for one epoch
train_samples >> augment >> rerange >> build_batch >> network.train() >> Consume()
Note
Consume(), Collect(), Unzip() or some other data sink is needed. Without a consumer at the end of the pipeline no data is processed.
Usually it is a good idea to shuffle the data (especially after augmentation) to ensure that each mini-batch contains a good mix of the different classes. Complete shuffling is not feasible if the training images do not fit in memory, but we can perform a partial shuffle, e.g. over 100 samples. Let’s also train for more than one epoch
EPOCHS = 20

for epoch in range(EPOCHS):
    (train_samples >> augment >> rerange >> Shuffle(100) >> build_batch >>
     network.train() >> Consume())
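Shuffle(100) performs a partial, buffered shuffle: samples are permuted only within a sliding buffer of 100 elements, so the order is randomized locally while being roughly preserved globally. A toy example (output will vary) makes this visible:
from nutsflow import Range, Shuffle, Collect

# Buffer of size 3: small numbers still tend to come before large ones.
print(Range(10) >> Shuffle(3) >> Collect())  # e.g. [1, 0, 3, 2, 5, 4, 7, 6, 9, 8]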
Training results¶
Instead of consuming (and throwing away) the outputs of the training we can collect and print the results (loss, accuracy)
for epoch in range(EPOCHS):
    t_loss, t_acc = (train_samples >> augment >> rerange >> Shuffle(100) >>
                     build_batch >> network.train() >> Unzip())
    print("train loss :", t_loss >> Mean())
    print("train acc :", t_acc >> Mean())
network.train() takes mini-batches as input and outputs loss and accuracy per mini-batch, as specified in create_network(). Unzip() transforms the output sequence of (loss, accuracy) tuples into a sequence of losses t_loss and a sequence of accuracies t_acc. Finally, we print the mean (over mini-batches) of training loss and accuracy.
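The effect of Unzip() is easy to see on plain tuples, independent of any network:
from nutsflow import Unzip, Mean

# Toy example: a sequence of (loss, accuracy) tuples.
losses, accs = [(0.50, 0.80), (0.40, 0.90)] >> Unzip()
print(losses >> Mean())  # 0.45
print(accs >> Mean())    # 0.85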
Validation¶
The CIFAR-10 data set is divided into a training and a test set but does not come with a validation set by default. However, we can easily split the training set into a new training set and a validation set
train_samples, val_samples = train_samples >> SplitRandom(0.8)
The new training set will contain 80% of the original set and the validation set the remainder.
Note
SplitRandom() can split into more than two sets and can take constraints into account, and SplitLeaveOneOut() performs leave-one-out splits.
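For instance, a three-way split into training, validation and test sets could look like the following sketch (see the nuts-flow documentation for the exact parameters of SplitRandom):
# Hedged sketch: split samples into 70% train, 20% validation, 10% test.
train, val, test = samples >> SplitRandom(ratio=(0.7, 0.2, 0.1))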
The performance of the network on the validation data can then be computed analogously to the way the training results were computed. The important differences are that we use the validation data, call network.validate() instead of network.train(), do not perform augmentation, and do not need to shuffle the data
for epoch in range(EPOCHS):
    v_loss, v_acc = (val_samples >> rerange >> build_batch >>
                     network.validate() >> Unzip())
    print("val loss :", v_loss >> Mean())
    print("val acc :", v_acc >> Mean())
Again, printed results are mean values over mini-batch losses and accuracies.
Evaluation¶
Validation accuracy averaged over mini-batches provides a reasonable estimate for the
prediction accuracy and is, for instance, useful for early stopping,
but is not an accurate measure of the true classification performance. Typically
we want to evaluate on an independent test set and average over samples, not mini-batches.
The code below calls network.evaluate() to compute the categorical_accuracy over all test samples
e_acc = test_samples >> rerange >> build_batch >> network.evaluate([categorical_accuracy])
print("evaluation acc :", e_acc)
In contrast to the training or validation accuracies computed by network.train() or network.validate(), network.evaluate() returns a single number per metric, and no averaging is required.
Check-pointing¶
A common method to enable the continuation of an interrupted training run, or to implement early stopping, is to save the network weights, either at regular intervals (e.g. at each epoch) or when the validation accuracy reaches a new high.
Network weights can easily be saved by invoking the save() method
network.save()
where the path to the weights file was specified when wrapping the model via KerasNetwork(model, weightsfile) in create_network().
For early stopping we want to save the weights depending on the validation loss or accuracy. The following code shows how to compute the validation accuracy and uses save_best() to save the weights for the network with the highest accuracy
v_acc = val_samples >> rerange >> build_batch >> network.validate() >> Get(1) >> Mean()
network.save_best(v_acc, isloss=False)
Note that the computation of the validation accuracy is slightly different from the one shown before. Here we need only the accuracies but not the losses, and therefore call Get(1) to extract them. Since the output then contains only accuracies and no longer (loss, acc) tuples, we can directly call Mean() and don't need Unzip.
If we want to save the network with the smallest loss instead, we can write
v_loss = val_samples >> rerange >> build_batch >> network.validate() >> Get(0) >> Mean()
network.save_best(v_loss, isloss=True)
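Combined with the training loop, a simple early-stopping scheme could look like the following sketch. The patience bookkeeping is hypothetical and not part of nuts-ml; save_best() is the nuts-ml call shown above:
# Hedged sketch: stop when validation accuracy has not improved for
# `patience` consecutive epochs.
patience, best_acc, stale = 5, 0.0, 0
for epoch in range(EPOCHS):
    (train_samples >> augment >> rerange >> Shuffle(100) >>
     build_batch >> network.train() >> Consume())
    v_acc = (val_samples >> rerange >> build_batch >>
             network.validate() >> Get(1) >> Mean())
    network.save_best(v_acc, isloss=False)
    if v_acc > best_acc:
        best_acc, stale = v_acc, 0
    else:
        stale += 1
    if stale >= patience:
        break  # no improvement for `patience` epochs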
Reading¶
The CIFAR-10 benchmark dataset is small enough to fit in memory. However, in many practical applications the image datasets are too large to be loaded in memory entirely and images need to be read sequentially from the file system. The following example shows how to read PNG images from a folder and to display them
show_image = ViewImage(0, pause=1, figsize=(2, 2), interpolation='spline36')
glob('images/*.png') >> ReadImage(None) >> show_image >> Consume()
ReadImage takes a sequence of file paths as input, generated using glob, reads the images from the file system, and returns tuples of shape (image,), where images are NumPy arrays. We can then display the images with ViewImage, where 0 indicates the column of the input sample that contains the image and pause=1 forces a pause of one second between images.
See cifar/read_images.py for a complete code example.
A common method to organize image data for network training on the file system is to store them in sub-folders named after the class labels, for instance
images\
  0\
    img123.jpg
    img456.jpg
    ...
  9\
    img789.jpg
We can read these images with their corresponding class labels using the following code
ReadLabelDirs('images', '*.jpg') >> ReadImage(0) >> show_image >> Consume()
where ReadLabelDirs returns tuples of the form (filepath, label).
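Since the labels read from the directory names are strings, one typically converts them to integers before batching, for instance with MapCol from nuts-flow. A sketch:
# Hedged sketch: build (image, int_label) samples from labeled sub-folders.
samples = (ReadLabelDirs('images', '*.jpg') >> ReadImage(0) >>
           MapCol(1, int) >> Collect())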
See mnist/read_images.py for a complete example using the MNIST data.
Writing¶
Often we not only want to read image data but also write it, e.g. after transformation or augmentation. The following code writes the first 20 CIFAR-10 training images in PNG format to the file system
train_samples, _ = load_samples()
imagepath = 'images/img*.png'
train_samples >> Take(20) >> WriteImage(0, imagepath) >> Consume()
The filenames for the images are generated automatically by replacing the * in imagepath by a running number. For instance, the code above would create the following files
./images/img0.png
./images/img1.png
...
./images/img19.png
A more complex example that includes the class label of an image in its filename can be seen in cifar/write_images.py .
Prediction¶
After having trained and evaluated a network we usually want to apply it and predict labels for new images. Here is an example
samples = glob('images/*.png') >> ReadImage(None)
pred_batch = BuildBatch(BATCH_SIZE).input(0, 'image', 'float32')
predictions = (samples >> rerange >> pred_batch >> network.predict() >>
               Map(ArgMax()) >> Collect())
print(predictions)
As before we read images from the file system with ReadImage, re-range them and build a batch. Note that it would be easy to add a transformation that resizes the new input images to the shape required by the network.
Note
For classification the batch needs to be created differently (without class labels) compared to training/evaluation, since class labels are not available - that is what we want to predict!
We call network.predict to retrieve the prediction of the network for an input image. The output is a softmax vector (see create_network()) and we use Map(ArgMax()) to get the class index. If you want the class index together with the class probability, Map(ArgMax(retvalue=True)) can be called instead.
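To relate predictions back to the input files, we can zip the file paths with the predicted class indices. A sketch reusing the pipeline from above:
filenames = glob('images/*.png')
samples = filenames >> ReadImage(None)
predictions = (samples >> rerange >> pred_batch >> network.predict() >>
               Map(ArgMax()) >> Collect())
for filename, label in zip(filenames, predictions):
    print(filename, '->', label)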
cifar/cnn_classify.py contains a more complex example that displays the image with the true and predicted class names.
Code¶
Here is the complete code (without imports) for the network training. The entire code can be found in cifar/cnn_train.py.
rerange = TransformImage(0).by('rerange', 0, 255, 0, 1, 'float32')

build_batch = (BuildBatch(BATCH_SIZE)
               .input(0, 'image', 'float32')
               .output(1, 'one_hot', 'uint8', NUM_CLASSES))

p = 0.1
augment = (AugmentImage(0)
           .by('identical', 1.0)
           .by('brightness', p, [0.7, 1.3])
           .by('color', p, [0.7, 1.3])
           .by('shear', p, [0, 0.1])
           .by('fliplr', p)
           .by('rotate', p, [-10, 10]))
plot_eval = PlotLines((0, 1), layout=(2, 1))

network = create_network()
train_samples, test_samples = load_samples()
train_samples, val_samples = train_samples >> SplitRandom(0.8)

# NUM_EPOCHS, BATCH_SIZE, NUM_CLASSES and PICK are constants defined in
# the full script cifar/cnn_train.py.
for epoch in range(NUM_EPOCHS):
    print('EPOCH:', epoch)

    t_loss, t_acc = (train_samples >> PrintProgress(train_samples) >>
                     Pick(PICK) >> augment >> rerange >> Shuffle(100) >>
                     build_batch >> network.train() >> Unzip())
    t_loss, t_acc = t_loss >> Mean(), t_acc >> Mean()
    print("train loss : {:.6f}".format(t_loss))
    print("train acc : {:.1f}".format(100 * t_acc))

    v_loss, v_acc = (val_samples >> rerange >>
                     build_batch >> network.validate() >> Unzip())
    v_loss, v_acc = v_loss >> Mean(), v_acc >> Mean()
    print('val loss : {:.6f}'.format(v_loss))
    print('val acc : {:.1f}'.format(100 * v_acc))

    network.save_best(v_acc, isloss=False)
    plot_eval((t_acc, v_acc))  # both values are already means

print('testing...')
e_acc = (test_samples >> rerange >> build_batch >>
         network.evaluate([categorical_accuracy]))
print('test acc : {:.1f}'.format(100 * e_acc))