Training networks
=================

In this section we will learn how to feed mini-batches into a network for training or
inference. Let us assume we have some Keras model of a classification network

.. code:: Python

      model = Sequential()
      model.add(Convolution2D(32, (3, 3), input_shape=INPUT_SHAPE))
      ...
      model.add(Dense(NUM_CLASSES))
      model.add(Activation('softmax'))
      
      model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
   

and let us further assume we have a pipeline that generates mini-batches as
described in the previous section

.. code:: Python

   batches = train_samples >> read_image >> ... >> build_batch    
   
we could then train the model (for a single epoch) using the
``train_on_batch`` method provided by Keras
   
.. code:: Python
    
    for batch in batches:
        model.train_on_batch(*batch)

or a bit more explicitly

.. code:: Python

    for inputs, outputs in batches:
        model.train_on_batch(inputs, outputs)
        
Note that ``batches`` is a generator and not a list of batches -- there is no
consumer such as ``Consume()`` or ``Collect()`` at the end of the pipeline.
We also have to ensure that the shape of the batches matches the ``INPUT_SHAPE``
of the model -- a common source of errors. Use ``PrintType()`` to print the shape of the
generated batches.
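
For instance, ``PrintType()`` can be inserted anywhere in the pipeline. Placed
after ``build_batch`` it prints the data type and shape of each generated
mini-batch. A sketch, assuming ``PrintType`` is imported from **nuts-ml** and
using the placeholder nuts from the previous section:

.. code:: Python

    from nutsflow import Consume
    from nutsml import PrintType

    # Print type and shape of every mini-batch flowing through the pipeline.
    train_samples >> read_image >> ... >> build_batch >> PrintType() >> Consume()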

Keras supports another method for training, ``fit_generator``, which expects an
infinite stream of mini-batches. This is easily achieved by adding a ``Cycle``
nut after the loading of the training samples. Note that ``fit_generator`` also
needs to know how many mini-batches make up an epoch; in Keras 2 this is the
``steps_per_epoch`` argument, with ``STEPS_PER_EPOCH`` standing in for that
number here:

.. code:: Python

    batches = train_samples >> Cycle() >> read_image >> ... >> build_batch
    model.fit_generator(batches, steps_per_epoch=STEPS_PER_EPOCH)

However, the easiest way to train a Keras network is to take advantage of the
``KerasNetwork`` wrapper provided by **nuts-ml**. It takes a Keras model and
wraps it in a nut that can be plugged directly into a pipeline:

.. code:: Python

   network = KerasNetwork(model)   
   train_samples >> read_image >> ... >> build_batch >> network.train() >> Consume()

Note that we need a ``Consume()`` at the end of the pipeline to pull the data through. In the examples
above, ``train_on_batch`` and ``fit_generator`` acted as the consumers.
``network.train()`` trains the network and emits the loss and any specified metric
(e.g. accuracy in this example) per mini-batch. We can collect this output with
``Unzip()``, which splits the stream of ``(loss, metric)`` tuples into separate
iterables, and report average loss and accuracy per epoch.

.. code:: Python

  network = KerasNetwork(model)   
  for epoch in range(EPOCHS):
      t_loss, t_acc = train_samples >> ... >> build_batch >> network.train() >> Unzip()
      print("train loss  :", t_loss >> Mean())
      print("train acc   :", t_acc >> Mean())
   
Apart from the training loss (and accuracy) we often want to know the network's performance
on a validation set. The data preprocessing pipelines in both cases are very similar,
but typically we do not augment when validating. The following code sketch shows
training and validation:

.. code:: Python

  network = KerasNetwork(model)   
  for epoch in range(EPOCHS):
      t_loss, t_acc = (train_samples >> read_image >> transform >> augment >> 
                       Shuffle(100) >> build_batch >> network.train() >> Unzip())
      print("train loss  :", t_loss >> Mean())
      print("train acc   :", t_acc >> Mean())
      
      v_loss, v_acc = (val_samples >> read_image >> transform >> 
                       build_batch >> network.validate() >> Unzip())
      print("val loss  :", v_loss >> Mean())
      print("val acc   :", v_acc >> Mean())
      
Note that we skip the augmentation and shuffling that are part of the training pipeline
when validating.
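
For reference, the ``transform`` and ``augment`` nuts used above could be defined
as sketched below; the concrete transformations and their parameters are
assumptions and belong to the previous section:

.. code:: Python

  from nutsml import TransformImage, AugmentImage

  # Rerange pixel values from [0, 255] to [0, 1] as floats (assumed preprocessing).
  transform = TransformImage(0).by('rerange', 0, 255, 0, 1, 'float32')

  # Randomly brighten or flip images, and pass the originals through as well.
  augment = (AugmentImage(0)
             .by('identical', 1.0)
             .by('brightness', 0.5, [0.7, 1.3])
             .by('fliplr', 0.5))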

Training and validation performance are averaged over batches. The true performance,
however, needs to be computed on a per-sample basis. **nuts-ml** provides ``evaluate()``
for this purpose. For instance, the code sketch below calls ``network.evaluate()`` 
to compute the ``categorical_accuracy`` over all test samples

.. code:: Python

  e_acc = (test_samples >> read_image >> transform >> build_batch >>
           network.evaluate([categorical_accuracy]))
  print("evaluation acc  :", e_acc)

This code would typically run after the epoch loop, when the network training is complete.
Note that ``evaluate`` is a sink (no ``Collect`` needed) and returns a single number per metric (no averaging required).
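
Multiple metrics can be passed in the list, in which case ``evaluate`` should
return one value per metric. A sketch, assuming Keras'
``top_k_categorical_accuracy`` is used in addition to ``categorical_accuracy``:

.. code:: Python

  from keras.metrics import categorical_accuracy, top_k_categorical_accuracy

  # evaluate is a sink and pulls the data itself; one result per metric.
  e_acc, e_top5 = (test_samples >> read_image >> transform >> build_batch >>
                   network.evaluate([categorical_accuracy, top_k_categorical_accuracy]))
  print("evaluation acc  :", e_acc)
  print("evaluation top5 :", e_top5)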

Finally, once we have trained the network and are happy with the classification accuracy,
we would like to use the network for inference/prediction. Prediction differs
from training, validation and evaluation in that we do not know the target/output values
-- those are what we want to infer. Consequently, the mini-batches need to be constructed
without outputs and can then be fed into the ``predict()`` function, which returns
the softmax vectors:

.. code:: Python

  build_pred_batch = BuildBatch(BATCH_SIZE).input(...)

  predictions = (samples >> read_image >> transform >> build_pred_batch >> 
                 network.predict() >> Map(ArgMax()) >> Collect())
       
We use ``Map(ArgMax())`` to retrieve the index of the class with the highest
softmax probability and collect those indices as network predictions. Note that we
could easily convert the class indices to labels using ``ConvertLabel``.
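
A sketch of this conversion, assuming ``LABELS`` holds the class names of the
problem and that ``ConvertLabel`` with ``column=None`` converts the elements
themselves:

.. code:: Python

  from nutsml import ConvertLabel

  LABELS = ['cat', 'dog']  # hypothetical class names

  predictions = (samples >> read_image >> transform >> build_pred_batch >>
                 network.predict() >> Map(ArgMax()) >>
                 ConvertLabel(None, LABELS) >> Collect())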