Printing and Debugging
When building longer, more complex data flows, printing and debugging are often necessary. nuts-flow provides several nuts for this purpose.
Print
Typical data flows are largely composed of pure functions without side effects, and only the final result is accessible. To display intermediate results, the Print nut can be used:
>>> from nutsflow import *
>>> Range(3) >> Print() >> Consume()
0
1
2
Print takes a format parameter (either a format string or a function) that tailors its output:
>>> Range(3) >> Print('number: {}') >> Consume()
number: 0
number: 1
number: 2
>>> Range(3) >> Print(lambda x: 'odd : %s' % bool(x % 2)) >> Consume()
odd : False
odd : True
odd : False
Print can also print every n-th element, print at certain time intervals, or display only the elements that pass a filter:
>>> Range(6) >> Print(every_n=2) >> Consume()
1
3
5
>>> Range(6) >> Sleep(1.0) >> Print(every_sec=1.5) >> Consume()
1
3
5
>>> Range(6) >> Print(filterfunc=lambda x: x > 2) >> Consume()
3
4
5
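The selection options can be pictured as a pass-through stage that forwards every element unchanged and only decides which ones to show. The following is a minimal plain-Python sketch of that idea, not nuts-flow's implementation. The name `tap` and its `sink` parameter are hypothetical, and combining both conditions with `and` is an assumption of this sketch.

```python
def tap(iterable, every_n=1, filterfunc=None, sink=print):
    """Yield every element unchanged; send selected ones to `sink`.

    Hypothetical helper for illustration, not part of nuts-flow.
    """
    for i, x in enumerate(iterable, start=1):
        # Show every n-th element that also passes the optional filter.
        if i % every_n == 0 and (filterfunc is None or filterfunc(x)):
            sink(x)
        yield x


shown = []
passed = list(tap(range(6), every_n=2, sink=shown.append))
# `passed` holds all six elements; `shown` holds only the selected ones.
```

Collecting the selected elements in a list instead of printing them makes the behaviour easy to check: the data flow itself is never altered by the display step.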
PrintType
When working with NumPy arrays or PyTorch/TensorFlow tensors, Print() is not a good choice, since it prints the (potentially large) array/tensor data. PrintType(), on the other hand, prints only the shape and data type of an array/tensor, and the value and type of other data.
>>> import numpy as np
>>> mat = np.ones((1024,512), dtype=np.uint8)
>>> data = [(mat, 0), (mat, 1), (mat, 2)]
>>> data >> PrintType() >> Consume()
(<ndarray> 1024x512:uint8, <int> 0)
(<ndarray> 1024x512:uint8, <int> 1)
(<ndarray> 1024x512:uint8, <int> 2)
PrintType() is especially useful for printing complex, nested data structures that contain array/tensor data.
>>> batch = [[mat, mat], mat]
>>> batches = [batch, batch]
>>> batches >> PrintType() >> Consume()
[[<ndarray> 1024x512:uint8, <ndarray> 1024x512:uint8], <ndarray> 1024x512:uint8]
[[<ndarray> 1024x512:uint8, <ndarray> 1024x512:uint8], <ndarray> 1024x512:uint8]
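To illustrate how such a summary can be produced for nested data, here is a minimal recursive sketch in plain Python. It is not nuts-flow's code: `describe` and `FakeArray` are hypothetical names, and the sketch assumes that anything with `shape` and `dtype` attributes should be treated as an array or tensor.

```python
class FakeArray:
    """Stand-in for an array/tensor: only `shape` and `dtype` are needed."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype


def describe(x):
    """Return a compact type summary of `x` (hypothetical helper)."""
    if hasattr(x, 'shape') and hasattr(x, 'dtype'):
        # Array-like: report type, shape and dtype instead of the data.
        dims = 'x'.join(str(d) for d in x.shape)
        return '<%s> %s:%s' % (type(x).__name__, dims, x.dtype)
    if isinstance(x, list):
        return '[' + ', '.join(describe(e) for e in x) + ']'
    if isinstance(x, tuple):
        return '(' + ', '.join(describe(e) for e in x) + ')'
    # Anything else: report type and value.
    return '<%s> %s' % (type(x).__name__, x)


fake = FakeArray((1024, 512), 'uint8')
summary = describe([[fake, fake], (fake, 0)])
```

The recursion descends into lists and tuples only, so the size of the summary depends on the nesting structure and not on the amount of array data.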
PrintColType
If the data is organized in columns (e.g. tuples) as shown above, PrintColType() can be used to print additional information such as the range of array/tensor data:
>>> image = np.ones((1024,512), dtype=np.uint8)
>>> images = [(image*1, 1), (image*10, 2), (image*100, 3)]
>>> images >> PrintColType() >> Consume()
item 0: <tuple>
0: <ndarray> shape:1024x512 dtype:uint8 range:1..1
1: <int> 1
item 1: <tuple>
0: <ndarray> shape:1024x512 dtype:uint8 range:10..10
1: <int> 2
item 2: <tuple>
0: <ndarray> shape:1024x512 dtype:uint8 range:100..100
1: <int> 3
This is especially useful when working with named tuples:
>>> from collections import namedtuple
>>> Sample = namedtuple('Sample', 'image,label')
>>> samples = [Sample(image, 'good'), Sample(image, 'bad')]
>>> samples >> PrintColType() >> Consume()
item 0: <Sample>
image: <ndarray> shape:1024x512 dtype:uint8 range:1..1
label: <str> good
item 1: <Sample>
image: <ndarray> shape:1024x512 dtype:uint8 range:1..1
label: <str> bad
PrintProgress
For long-running flows, progress information can be displayed by inserting a PrintProgress nut. This requires, however, that the number of elements to be processed is known beforehand.
>>> n = 10
>>> Range(n) >> Sleep(0.1) >> PrintProgress(n, update=0.1) >> Consume()
progress: 100%
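The need to know the element count in advance follows from how a percentage is computed: an iterator does not reveal its length, so the total has to be supplied. A minimal plain-Python sketch of this, where `with_progress` and `report` are hypothetical names and not part of nuts-flow:

```python
def with_progress(iterable, n, report):
    """Yield elements unchanged and report percent done after each one.

    `n` must be the total number of elements, since it cannot be
    obtained from a generic iterator. Hypothetical helper.
    """
    for i, x in enumerate(iterable, start=1):
        report(100 * i // n)
        yield x


percentages = []
items = list(with_progress(iter(range(4)), 4, percentages.append))
```

Unlike this sketch, which reports after every element, the nut above is shown with an update parameter, presumably to limit how often the display is refreshed.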
Limit data
Instead of printing all the data, the amount of data processed can be limited, which is much more efficient. For instance, the Take(n) nut takes only the first n elements:
>>> Range(1000) >> Take(3) >> Collect()
[0, 1, 2]
Alternatively, the Head(n) nut can be used, which takes the first n elements and collects them:
>>> Range(1000) >> Head(3)
[0, 1, 2]
The last elements of a flow can be captured with Tail(n), but note that the entire flow is consumed:
>>> Range(1000) >> Tail(3)
[997, 998, 999]
Finally, Pick(n) picks every n-th element:
>>> Range(1000) >> Pick(100) >> Collect()
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
For n < 1, Pick(n) picks elements with the given probability, e.g. to pick 10% of the data use Pick(0.1).
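For readers who know the Python standard library, the limiting nuts above behave much like the following itertools and collections idioms. This is an analogy, not a description of how nuts-flow implements them.

```python
from collections import deque
from itertools import islice

# First three elements; iteration stops early (compare Take(3) / Head(3)).
first = list(islice(range(1000), 3))

# Last three elements; the whole input must be read (compare Tail(3)).
last = list(deque(range(1000), maxlen=3))

# Every 100th element, starting with the first (compare Pick(100)).
every_100th = list(islice(range(1000), 0, None, 100))
```

The bounded deque makes the cost of taking the tail explicit: every element passes through it, and only the final three are kept.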
No Operation
nuts-flow provides a NOP(nut) nut that can be used to temporarily disable the evaluation of a nut in a flow.
>>> Range(5) >> Square() >> Collect() # compute squares
[0, 1, 4, 9, 16]
>>> Range(5) >> NOP(Square()) >> Collect() # Square disabled
[0, 1, 2, 3, 4]
This is often more convenient than commenting out or temporarily removing a nut for debugging purposes. Note that only single nuts can be disabled with this method.
Conditional
Individual nuts in a flow can also be disabled/enabled or replaced depending on a boolean flag using the If(cond, if_nut, else_nut) nut:
>>> [1, 2, 3] >> If(True, Square()) >> Collect()
[1, 4, 9]
>>> [1, 2, 3] >> If(False, Square()) >> Collect()
[1, 2, 3]
>>> [1, 2, 3] >> If(False, Square(), Take(1)) >> Collect()
[1]
Again, this is mainly of interest for debugging and is limited to single nuts.
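The idea behind disabling or swapping a stage can be sketched in plain Python as choosing, when the flow is built, which function the data passes through. The names `choose`, `passthrough`, `square` and `take_one` are hypothetical and only illustrate the concept; they are not nuts-flow functions.

```python
from itertools import islice


def square(iterable):
    return (x * x for x in iterable)


def take_one(iterable):
    return islice(iterable, 1)


def passthrough(iterable):
    # Identity stage: forwards the data unchanged.
    return iterable


def choose(cond, if_stage, else_stage=passthrough):
    """Return `if_stage` when `cond` is true, otherwise `else_stage`."""
    return if_stage if cond else else_stage


enabled = list(choose(True, square)([1, 2, 3]))
disabled = list(choose(False, square)([1, 2, 3]))
replaced = list(choose(False, square, take_one)([1, 2, 3]))
```

With the default else-stage the sketch corresponds to switching a stage off; with an explicit else-stage it corresponds to replacing one stage with another.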
Counter
Sometimes only the number of elements processed at a certain stage is of interest. Counter is a nut with the needed side effect:
>>> count = Counter('cnt')
>>> Range(10) >> count >> Square() >> Sum()
285
>>> count.value
10
Note that Counter does not modify the data flow. Counter also has a filter function to count only certain elements:
>>> greater5 = Counter('gt5', filterfunc=lambda x: x > 5)
>>> Range(10) >> Square() >> greater5 >> Collect()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> greater5
gt5 = 7
Note that the actual value of the counter is stored in value and can be printed directly. For convenience, print(greater5) prints the name of the counter together with its value.
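A counting stage of this kind can be sketched in plain Python as a callable object that forwards elements unchanged and increments an attribute as a side effect. `CountingStage` is a hypothetical class written for illustration; it is not nuts-flow's Counter.

```python
class CountingStage:
    """Pass-through stage that counts the elements it sees (hypothetical)."""

    def __init__(self, name, filterfunc=None):
        self.name = name
        self.filterfunc = filterfunc
        self.value = 0

    def __call__(self, iterable):
        for x in iterable:
            # Count only elements that pass the optional filter.
            if self.filterfunc is None or self.filterfunc(x):
                self.value += 1
            yield x  # the data itself is never modified

    def __repr__(self):
        return '%s = %d' % (self.name, self.value)


gt5 = CountingStage('gt5', filterfunc=lambda x: x > 5)
squares = list(gt5(x * x for x in range(10)))
```

Because the stage is a generator, the count is only final once the flow has been fully consumed, which is why the value is read after the result has been collected.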