Printing and Debugging

When creating longer, more complex data flows printing and debugging is often necessary. nuts-flow provides methods for this purpose.

Print

Typical data flows are largely composed of pure functions without side-effects and only the final result is accessible. To display intermediate results the Print nut can be used:

>>> from nutsflow import *
>>> Range(3) >> Print() >> Consume()
0
1
2

Print takes a format parameter (either a format string or a function) that allows to tailor its output

>>> Range(3) >> Print('number: {}') >> Consume()
number: 0
number: 1
number: 2
>>> Range(3) >> Print(lambda x: 'odd : %s' % bool(x % 2)) >> Consume()
odd : False
odd : True
odd : False

Print furthermore supports to print every n-ths element, print at certain time intervals, or filter for the elements to be displayed:

>>> Range(6) >> Print(every_n=2) >> Consume()
1
3
5
>>> Range(6) >> Sleep(1.0) >> Print(every_sec=1.5) >> Consume()
1
3
5
>>> Range(6) >> Print(filterfunc=lambda x: x > 2) >> Consume()
3
4
5

PrintType

When working with Numpy arrays or Pytorch/Tensorflow tensors, Print() is not a good choice, since it prints the (potentially large) array/tensor data. PrintType() on the other hand, prints only the shape and data type of of array/tensor and the value and type for other data.

>>> import numpy as np
>>> mat = np.ones((1024,512), dtype=np.uint8)
>>> data = [(mat, 0), (mat, 1), (mat, 2)]
>>> data >> PrintType() >> Consume()
(<ndarray> 1024x512:uint8, <int> 0)
(<ndarray> 1024x512:uint8, <int> 1)
(<ndarray> 1024x512:uint8, <int> 2)

PrintType() is especially useful to print complex, nested data structures that contain array/tensor data.

>>> batch = [[mat, mat], mat]
>>> batches = [(batch, batch]
>>> batches >> PrintType() >> Consume()
[[<ndarray> 1024x512:uint8, <ndarray> 1024x512:uint8], <ndarray> 1024x512:uint8]
[[<ndarray> 1024x512:uint8, <ndarray> 1024x512:uint8], <ndarray> 1024x512:uint8]

PrintColType

If the data is organized in columns (e.g. tuples) as shown above, PrintColType() can be used to print additonal information such as the range of array/tensor data:

>>> image = np.ones((1024,512), dtype=np.uint8)
>>> images = [(image*1, 1), (image*10, 2), (image*100, 3)]
>>> images >> PrintColType() >> Consume()
item 0: <tuple>
  0: <ndarray> shape:1024x512 dtype:uint8 range:1..1
  1: <int> 1
item 1: <tuple>
  0: <ndarray> shape:1024x512 dtype:uint8 range:10..10
  1: <int> 2
item 2: <tuple>
  0: <ndarray> shape:1024x512 dtype:uint8 range:100..100
  1: <int> 3

This is especially nice, when working with named tuples:

>>> from collections import namedtuple
>>> Sample = namedtuple('Sample', 'image,label')
>>> samples = [Sample(image, 'good'), Sample(image, 'bad')]
>>> samples >> PrintColType() >> Consume()
item 0: <Sample>
  image: <ndarray> shape:1024x512 dtype:uint8 range:1..1
  label: <str> good
item 1: <Sample>
  image: <ndarray> shape:1024x512 dtype:uint8 range:1..1
  label: <str> bad

PrintProgress

For long running flows printing progress information can be displayed by inserting a PrintProgress nut. It, however, requires that the number of elements to be processed is known beforehand.

>>> n = 10
>>> Range(n) >> Sleep(0.1) >> PrintProgress(n, update=0.1) >> Consume() 
progress: 100%

Limit data

Instead of printing all the data the size of data processed can be limited, which is much more efficient. For instance, the Take(n) nut takes the first n elements only:

>>> Range(1000) >> Take(3) >> Collect()
[0, 1, 2]

Alternatively the Head(n) nut can be used that takes n elements and collects them:

>>> Range(1000) >> Head(3)
[0, 1, 2]

The last elements of a flow can be captured by Tail but note that the entire flow is consumed:

>>> Range(1000) >> Tail(3)
[997, 998, 999]

Finally, Pick(n) allows to pick every n-th element:

>>> Range(1000) >> Pick(100) >> Collect()
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

For n < 1, Pick(n) picks element with the given probabiltity, e.g. to pick 10% of the data use Pick(0.1).

No Operation

nuts-flow provides a NOP(nut) nut that can be used to temporarily disable the evaluation of a nut in a flow.

>>> Range(5) >> Square() >> Collect()  # compute squares
[0, 1, 4, 9, 16]
>>> Range(5) >> NOP(Square()) >> Collect()  # Square disabled
[0, 1, 2, 3, 4]

This is often more convenient that commenting-out or temporarily removing a nut for debugging purposes. Note that only single nuts can be disabled with this method.

Conditional

Individual nuts in a flow can also be disabled/enabled or replaced depending on a boolean flag using the If(cond, if_nut, else_nut) nut:

>>> [1, 2, 3] >> If(True, Square()) >> Collect()
[1, 4, 9]
>>> [1, 2, 3] >> If(False, Square()) >> Collect()
[1, 2, 3]
>>> [1, 2, 3] >> If(False, Square(), Take(1)) >> Collect()
[1]

Again this is largely of interest for debugging and limited to operate on single nuts.

Counter

Sometimes only the number of elements processed at a certain stage is of interest. Counter is a nut with the needed side-effect:

>>> count = Counter('cnt')
>>> Range(10) >> count >> Square() >> Sum()
285
>>> count.value
10

Note that Counter does not modify the data flow. Counter also has a filter function to count only certain elements:

>>> greater5 = Counter('gt5', filterfunc = lambda x: x > 5)
>>> Range(10) >> Square() >> greater5 >> Collect()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> greater5
gt5 = 7

Note that the actual value of the counter is stored in value and can be printed but for conveniency print(greater5) prints the name of the counter and its value as well.