.. _cifar-example:

CIFAR-10 Example
================

Prerequisites for this tutorial are a good knowledge of Python and
`nuts-flow `_. Please read the `nuts-flow tutorial `_ if you haven't.
Some knowledge of `Keras `_, and of course deep learning, will be
helpful.


Task
----

In this example we will implement a **nuts-ml** pipeline to classify
CIFAR-10 images. `CIFAR-10 `_ is a classical benchmark problem in image
recognition. The dataset contains 10 categories (airplane, dog, ship, ...)
and the task is to classify small images of these objects accordingly.

.. image:: pics/cifar10.png

The CIFAR-10 dataset consists of 60000 RGB images of size 32x32. There are
6000 images per class and the dataset is split into 50000 training images
and 10000 test images. For more details see the `Tech report `_.

In the following we will show how to use **nuts-flow/ml** and `Keras `_
to train a Convolutional Neural Network (CNN) on the CIFAR-10 data. For
readability some code will be omitted (e.g. import statements), but the
complete code and more examples can be found under `nutsml/examples `_.


Network
-------

The network architecture for the CNN is a slightly modified version of the
Keras `CNN `_ example (Keras version 2.x), with the notable exception of
the last line, where the model is wrapped in a ``KerasNetwork``.

.. code:: Python

    from keras.models import Sequential
    from keras.layers import (Convolution2D, Activation, MaxPooling2D,
                              Dropout, Flatten, Dense)
    from nutsml import KerasNetwork

    INPUT_SHAPE = (32, 32, 3)
    NUM_CLASSES = 10


    def create_network():
        model = Sequential()
        model.add(Convolution2D(32, (3, 3), padding='same',
                                input_shape=INPUT_SHAPE))
        model.add(Activation('relu'))
        model.add(Convolution2D(32, (3, 3)))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.5))

        model.add(Convolution2D(64, (3, 3), padding='same'))
        model.add(Activation('relu'))
        model.add(Convolution2D(64, (3, 3)))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.5))

        model.add(Flatten())
        model.add(Dense(512))
        model.add(Activation('relu'))
        model.add(Dropout(0.5))
        model.add(Dense(NUM_CLASSES))
        model.add(Activation('softmax'))

        model.compile(loss='categorical_crossentropy', optimizer='adam',
                      metrics=['accuracy'])
        # wrap the Keras model so that it can be used as a nut in a nuts-flow
        return KerasNetwork(model, 'weights_cifar10.hd5')

The wrapping allows us to use the CNN as a ``nut`` within a **nuts-flow**,
which simplifies training. The wrapper also takes a path to a weights file
for check-pointing. Weights are saved in the standard Keras format as an
`HDF5 `_ file.

.. note::

    So far only wrappers for Keras and Lasagne models are provided.
    However, any deep-learning library that accepts an iterable over
    mini-batches for training will work with **nuts-ml**.


Loading data
------------

In many image processing applications the complete set of training images
is too large to fit in memory, and images are loaded in a streamed fashion.
See `read_images.py `_ for an example that loads images sequentially.
CIFAR-10, however, is a small benchmark data set that fits in memory. We
therefore take advantage of the function ``cifar10.load_data()`` provided
by Keras, load all images into memory, and rearrange the data slightly:

.. code:: Python

    def load_samples():
        from keras.datasets import cifar10
        (x_train, y_train), (x_test, y_test) = cifar10.load_data()
        # labels are returned as arrays of shape (N, 1); flatten to plain ints
        train_samples = list(zip(x_train, map(int, y_train.ravel())))
        test_samples = list(zip(x_test, map(int, y_test.ravel())))
        return train_samples, test_samples

Specifically, we convert the class labels, which Keras returns as a NumPy
array of shape ``(N, 1)``, to plain integers, and zip inputs ``x`` and
outputs ``y`` to create lists of training and test samples. Samples are
then tuples of the form ``(image, label)``, where the image is a NumPy
array of shape ``(32,32,3)`` and the label is an integer between 0 and 9
indicating the class.

We can verify the type and shape of the samples by running the following
flow (`complete code here `_):

.. code:: Python

    train_samples, test_samples = load_samples()
    train_samples >> Take(3) >> PrintColType() >> Consume()

which takes the first three samples and prints the data type and content
information of each sample column:

.. code::

    item 0:
      0: shape:32x32x3 dtype:uint8 range:0-255
      1: 6
    item 1:
      0: shape:32x32x3 dtype:uint8 range:5-254
      1: 9
    item 2:
      0: shape:32x32x3 dtype:uint8 range:20-255
      1: 9

.. note::

    The standard formats for image data in **nuts-ml** are NumPy arrays of
    shape ``(h,w,3)`` for RGB images, ``(h,w)`` for gray-scale images and
    ``(h,w,4)`` for RGBA images.

Not only can we inspect the type of the data, we can also look at the
images themselves:

.. code:: Python

    train_samples, test_samples = load_samples()
    train_samples >> Take(3) >> PrintType() >> ViewImage(0) >> Consume()

.. image:: ../pics/viewimage_cifar10.png


Training
--------

We will introduce the code for the network training in pieces before
showing the complete code later. First, let us create the network and load
the sample data using the functions introduced above:

.. code:: Python

    network = create_network()
    train_samples, test_samples = load_samples()

With a network and samples at hand, we can now train the network (for one
epoch) with the following **nuts-flow**:

.. code:: Python

    train_samples >> augment >> rerange >> Shuffle(100) \
        >> build_batch >> network.train() >> Consume()

The flow *augments* the training images by random transformations,
*re-ranges* pixel values to [0, 1], *shuffles* the samples, *builds*
mini-batches, *trains* the network and *consumes* the outputs of the
training (losses, accuracies). ``Consume`` and ``Shuffle`` are *nuts* from
**nuts-flow**. Image augmentation, re-ranging and batch-building are parts
of **nuts-ml** that we describe in detail in the next sections.


Augmentation
^^^^^^^^^^^^

Deep learning requires large data sets, and a common strategy to increase
the amount of image data is to augment the data set with randomly
perturbed copies, e.g. rotated or blurred images. Here we augment the
CIFAR-10 data set by flipping images horizontally and changing their
brightness:

.. code:: Python

    p = 0.1
    augment = (AugmentImage(0)
               .by('identical', 1.0)
               .by('fliplr', p)
               .by('brightness', p, [0.7, 1.3]))

The ``AugmentImage`` nut takes as parameter the index of the image within
the sample ``(image, label)``, here position 0, and augmentations are
specified by invoking ``by(transformation, probability, *args)``. We pass
the unchanged image (``'identical'``) through with probability 1.0 (i.e.
always), flip images horizontally for 10% of the samples (``p = 0.1``),
and randomly change the brightness within the range ``[0.7, 1.3]``, again
with probability ``p``.

We can look at the augmented images and their labels using the following
flow (`complete code here `_):

.. code:: Python

    train_samples, test_samples = load_samples()
    train_samples >> augment >> ViewImageAnnotation(0, 1, pause=1) >> Consume()

In detail: for every sample processed by ``AugmentImage``, the image is
extracted from position 0 of the sample tuple, and new samples with the
same label but augmented images are output. For each input image the
identical output image is generated (``identical``), and additional
augmented samples (``fliplr``, ``brightness``) are created with 10%
probability each, resulting in 20% more training data on average.


Transformation
^^^^^^^^^^^^^^

Images returned by ``load_samples()`` are NumPy arrays with integers in
the range ``[0, 255]``. The network, however, expects floating point
numbers (``float32``) in the range ``[0, 1]``. We therefore transform
images by *re-ranging*:

.. code:: Python

    rerange = TransformImage(0).by('rerange', 0, 255, 0, 1, 'float32')

where ``TransformImage`` takes as parameter the index of the image within
the sample, and transformations are defined by invoking
``by(transformation, *args)``.

.. note::

    Transformations are chained, meaning that an input image is transformed
    by sequentially applying all transformations to the image, resulting in
    one output image. Consequently, the number of input and output images
    is the same. Augmentations, on the other hand, are applied
    independently, and the number of input and output images can differ.

See ``TransformImage`` in `transformer.py `_ for a list of available
transformations. Each transformation can also be used for augmentation.
Custom transformations can be added via ``register``:

.. doctest::

    >>> from nutsml import TransformImage, AugmentImage
    >>> my_brightness = lambda image, c: image * c
    >>> TransformImage.register('my_brightness', my_brightness)
    >>> transform = TransformImage(0).by('my_brightness', 1.5)
    >>> augment = AugmentImage(0).by('my_brightness', 0.1, [0.7, 1.3])

While transformations take specific parameter values, e.g.
``1.5`` for brightness, augmentations take ranges, e.g. ``[0.7, 1.3]``,
from which parameter values are sampled uniformly.


Batching
^^^^^^^^

Networks are trained with *mini-batches* of samples, e.g. a stack of
images with their corresponding class labels. ``BuildBatch(batchsize)`` is
used to build these batches. The following example creates a batcher that
extracts images from column 0 of the samples and class labels from column
1. Class labels are encoded as one-hot vectors, while images within the
batch are represented as NumPy arrays with dtype ``float32``.

.. code:: Python

    NUM_CLASSES = 10
    BATCH_SIZE = 32

    build_batch = (BuildBatch(BATCH_SIZE)
                   .input(0, 'image', 'float32')
                   .output(1, 'one_hot', 'uint8', NUM_CLASSES))

With a batcher at hand, we can now build a complete pipeline that trains
the network for one epoch:

.. code:: Python

    (train_samples >> augment >> rerange >> build_batch >>
     network.train() >> Consume())

.. note::

    ``Consume()``, ``Collect()``, ``Unzip()`` or some other data sink is
    needed. Without a consumer at the end of the pipeline no data is
    processed.

Usually it is a good idea to shuffle the data (especially after
augmentation, which emits the augmented copies right after their
originals) to ensure that each mini-batch contains a good mix of examples
from different classes. Complete shuffling is not feasible if the training
images do not fit in memory, but we can perform a partial shuffle, e.g.
over 100 samples. Let's also train for more than one epoch:

.. code:: Python

    EPOCHS = 20

    for epoch in range(EPOCHS):
        (train_samples >> augment >> rerange >> Shuffle(100) >>
         build_batch >> network.train() >> Consume())


Training results
^^^^^^^^^^^^^^^^

Instead of consuming (and throwing away) the outputs of the training, we
can collect and print the results (loss, accuracy):

.. code:: Python

    for epoch in range(EPOCHS):
        t_loss, t_acc = (train_samples >> augment >> rerange >> Shuffle(100) >>
                         build_batch >> network.train() >> Unzip())
        print("train loss :", t_loss >> Mean())
        print("train acc  :", t_acc >> Mean())

``network.train()`` takes mini-batches as input and outputs loss and
accuracy per mini-batch, as specified by the metrics in
``create_network()``. ``Unzip()`` transforms the output sequence of
``(loss, accuracy)`` tuples into a sequence of losses ``t_loss`` and a
sequence of accuracies ``t_acc``. Finally, we print the means (over
mini-batches) of the training loss and accuracy.


Validation
----------

The CIFAR-10 data set is divided into a training and a test set but does
not come with a validation set by default. However, we can easily split
the training set into a new training set and a validation set:

.. code:: Python

    train_samples, val_samples = train_samples >> SplitRandom(0.8)

The new training set will contain 80% of the original set and the
validation set the remaining 20%.

.. note::

    ``SplitRandom()`` can split into more than two sets and can take
    constraints into account, and ``SplitLeaveOneOut()`` performs
    leave-one-out splits.

The performance of the network on the validation data can then be computed
analogously to the training results. The important differences are that we
use the validation data, call ``network.validate()`` instead of
``network.train()``, do not perform augmentation, and do not need to
shuffle the data:

.. code:: Python

    for epoch in range(EPOCHS):
        v_loss, v_acc = (val_samples >> rerange >> build_batch >>
                         network.validate() >> Unzip())
        print("val loss :", v_loss >> Mean())
        print("val acc  :", v_acc >> Mean())

Again, the printed results are mean values over mini-batch losses and
accuracies.
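
A mean over per-batch accuracies gives every mini-batch the same weight,
regardless of how many samples it contains. The following plain-Python
sketch (hypothetical correctness flags, no **nuts-ml** involved, Python 3
division assumed) shows how this mean can differ from the accuracy over
individual samples when the last mini-batch is smaller than the others:

.. code:: Python

    def batch_accuracies(correct, batchsize):
        """Accuracy of each mini-batch, given 0/1 correctness flags per sample."""
        batches = [correct[i:i + batchsize]
                   for i in range(0, len(correct), batchsize)]
        return [sum(batch) / len(batch) for batch in batches]


    def mean(values):
        return sum(values) / len(values)


    # hypothetical correctness flags for 10 samples (1 = correctly classified)
    correct = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0]
    per_batch = batch_accuracies(correct, batchsize=4)  # batches of 4, 4 and 2
    print('mean over mini-batches: {:.3f}'.format(mean(per_batch)))  # 0.583
    print('accuracy over samples : {:.3f}'.format(mean(correct)))    # 0.700

The two smaller numbers in the last batch pull the batch-wise mean down to
0.583, while 7 of 10 samples (0.700) are actually classified correctly.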

Evaluation
----------

Validation accuracy averaged over mini-batches provides a reasonable
estimate of the prediction accuracy and is, for instance, useful for early
stopping, but it is not an exact measure of the true classification
performance (a smaller last mini-batch, for example, gets the same weight
as a full one). Typically we want to evaluate on an independent test set
and average over samples, not mini-batches. The code below calls
``network.evaluate()`` to compute the ``categorical_accuracy`` over all
test samples:

.. code:: Python

    from keras.metrics import categorical_accuracy

    e_acc = (test_samples >> rerange >> build_batch >>
             network.evaluate([categorical_accuracy]))
    print("evaluation acc :", e_acc)

In contrast to the training or validation accuracies computed by
``network.train()`` or ``network.validate()``, ``network.evaluate()``
returns a single number per metric, and no averaging is required.


Check-pointing
--------------

A common method to enable the continuation of an interrupted training or
to implement early stopping is to save the network weights, either at
regular intervals (e.g. after each epoch) or when the validation accuracy
reaches a new high. Network weights can easily be saved by invoking the
``save()`` method:

.. code:: Python

    network.save()

where the path to the weights file was specified when wrapping the model
via ``KerasNetwork(model, weightsfile)`` in ``create_network()``.

For *early stopping* we want to save the weights depending on the
validation loss or accuracy. The following code shows how to compute the
validation accuracy and uses ``save_best()`` to save the weights of the
network with the highest accuracy:

.. code:: Python

    v_acc = (val_samples >> rerange >> build_batch >> network.validate() >>
             Get(1) >> Mean())
    network.save_best(v_acc, isloss=False)

Note that the computation of the validation accuracy is slightly different
from the one shown before. Here we need only the accuracies, not the
losses, and therefore call ``Get(1)`` to extract them.
Since the output then contains only accuracies and no longer
``(loss, acc)`` tuples, we can directly call ``Mean()`` and don't need
``Unzip()``. If we want to save the network with the smallest loss
instead, we can write:

.. code:: Python

    v_loss = (val_samples >> rerange >> build_batch >> network.validate() >>
              Get(0) >> Mean())
    network.save_best(v_loss, isloss=True)


Reading
-------

The CIFAR-10 benchmark dataset is small enough to fit in memory. However,
in many practical applications the image datasets are too large to be
loaded into memory entirely, and images need to be read sequentially from
the file system. The following example shows how to read PNG images from
a folder and display them:

.. code:: Python

    from glob import glob

    show_image = ViewImage(0, pause=1, figsize=(2, 2),
                           interpolation='spline36')
    glob('images/*.png') >> ReadImage(None) >> show_image >> Consume()

``ReadImage`` takes a sequence of file paths as input (here generated with
``glob``), reads the images from the file system, and returns tuples of
the form ``(image,)``, where images are NumPy arrays. We can then display
the images with ``ViewImage``, where ``0`` indicates the column in the
input sample that contains the image and ``pause=1`` forces a pause of one
second between images. See `cifar/read_images.py `_ for a complete code
example.

A common way to organize image data for network training on the file
system is to store the images in sub-folders named after the class labels,
for instance:

.. code::

    images\
        0\
            img123.jpg
            img456.jpg
            ...
        9\
            img789.jpg

We can read these images with their corresponding class labels using the
following code:

.. code:: Python

    ReadLabelDirs('images', '*.jpg') >> ReadImage(0) >> show_image >> Consume()

where ``ReadLabelDirs`` returns tuples of the form ``(filepath, label)``.
See `mnist/read_images.py `_ for a complete example using the MNIST data.


Writing
-------

Often we not only want to read image data but also write it, e.g. after
transformation or augmentation.
The following code writes the first 20 CIFAR-10 training images in PNG
format to the file system:

.. code:: Python

    train_samples, _ = load_samples()
    imagepath = 'images/img*.png'
    train_samples >> Take(20) >> WriteImage(0, imagepath) >> Consume()

The filenames of the images are generated automatically by replacing the
``*`` in ``imagepath`` with a running number. For instance, the code above
would create the following files:

.. code::

    ./images/img0.png
    ./images/img1.png
    ...
    ./images/img19.png

A more complex example that includes the class label of an image in its
filename can be found in `cifar/write_images.py `_.


Prediction
----------

After having trained and evaluated a network, we usually want to apply it
and predict labels for new images. Here is an example:

.. code:: Python

    from glob import glob

    samples = glob('images/*.png') >> ReadImage(None)
    pred_batch = BuildBatch(BATCH_SIZE).input(0, 'image', 'float32')
    predictions = (samples >> rerange >> pred_batch >> network.predict() >>
                   Map(ArgMax()) >> Collect())
    print(predictions)

As before, we read images from the file system with ``ReadImage``,
re-range them and build a batch. Note that it would be easy to add a
transformation that resizes new input images to the shape required by the
network.

.. note::

    For classification the batch needs to be created differently (without
    class labels) than for training or evaluation, since class labels are
    not available; they are what we want to predict!

We call ``network.predict()`` to retrieve the prediction of the network
for each input image. The output is a softmax vector (see
``create_network()``), and we use ``Map(ArgMax())`` to get the class
index. If you want the class index together with the class probability,
``Map(ArgMax(retvalue=True))`` can be used instead.
`cifar/cnn_classify.py `_ contains a more complex example that displays
the images with their true and predicted class names.


Code
----

Here is the complete code (without imports) for the network training.
The entire code can be found in `cifar/cnn_train.py `_.

.. code:: Python

    # NUM_EPOCHS, BATCH_SIZE, NUM_CLASSES and PICK are constants
    # defined elsewhere in cnn_train.py
    rerange = TransformImage(0).by('rerange', 0, 255, 0, 1, 'float32')
    build_batch = (BuildBatch(BATCH_SIZE)
                   .input(0, 'image', 'float32')
                   .output(1, 'one_hot', 'uint8', NUM_CLASSES))
    p = 0.1
    augment = (AugmentImage(0)
               .by('identical', 1.0)
               .by('brightness', p, [0.7, 1.3])
               .by('color', p, [0.7, 1.3])
               .by('shear', p, [0, 0.1])
               .by('fliplr', p)
               .by('rotate', p, [-10, 10]))
    plot_eval = PlotLines((0, 1), layout=(2, 1))

    network = create_network()
    train_samples, test_samples = load_samples()
    train_samples, val_samples = train_samples >> SplitRandom(0.8)

    for epoch in range(NUM_EPOCHS):
        print('EPOCH:', epoch)

        t_loss, t_acc = (train_samples >> PrintProgress(train_samples) >>
                         Pick(PICK) >> augment >> rerange >> Shuffle(100) >>
                         build_batch >> network.train() >> Unzip())
        t_loss, t_acc = t_loss >> Mean(), t_acc >> Mean()
        print("train loss : {:.6f}".format(t_loss))
        print("train acc : {:.1f}".format(100 * t_acc))

        v_loss, v_acc = (val_samples >> rerange >>
                         build_batch >> network.validate() >> Unzip())
        v_loss, v_acc = v_loss >> Mean(), v_acc >> Mean()
        print('val loss : {:.6f}'.format(v_loss))
        print('val acc : {:.1f}'.format(100 * v_acc))

        network.save_best(v_acc, isloss=False)
        plot_eval((t_acc, v_acc))  # t_acc already holds the mean accuracy

    print('testing...')
    e_acc = (test_samples >> rerange >> build_batch >>
             network.evaluate([categorical_accuracy]))
    print('test acc : {:.1f}'.format(100 * e_acc))
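
The training loop above calls ``save_best(v_acc, isloss=False)`` after
every epoch, so that weights are written only when the validation accuracy
improves. The following plain-Python sketch illustrates that bookkeeping
with a hypothetical ``BestSaver`` class and a ``save`` callback standing in
for writing the weights file. It is not the **nuts-ml** implementation,
just one way the idea could be expressed:

.. code:: Python

    class BestSaver(object):
        """Hypothetical helper: calls save() only when the value improves."""

        def __init__(self, save, isloss):
            self.save = save      # callback that would write the weights file
            self.isloss = isloss  # True: smaller is better, False: larger is better
            self.best = None      # best value seen so far

        def update(self, value):
            if self.best is None:
                improved = True
            elif self.isloss:
                improved = value < self.best
            else:
                improved = value > self.best
            if improved:
                self.best = value
                self.save()
            return improved


    saved = []  # records the values at which weights would have been saved
    saver = BestSaver(save=lambda: saved.append(saver.best), isloss=False)
    for v_acc in [0.52, 0.61, 0.58, 0.66, 0.64]:  # hypothetical accuracies
        saver.update(v_acc)
    print(saved)  # [0.52, 0.61, 0.66]

With ``isloss=True`` the comparison flips, which matches saving the network
with the smallest validation loss instead of the highest accuracy.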