Printing and Debugging ====================== When creating longer, more complex data flows printing and debugging is often necessary. **nuts-flow** provides methods for this purpose. Print ----- Typical data flows are largely composed of pure functions without side-effects and only the final result is accessible. To display intermediate results the ``Print`` nut can be used: >>> from nutsflow import * >>> Range(3) >> Print() >> Consume() 0 1 2 ``Print`` takes a format parameter (either a `format string `_ or a function) that allows to tailor its output >>> Range(3) >> Print('number: {}') >> Consume() number: 0 number: 1 number: 2 >>> Range(3) >> Print(lambda x: 'odd : %s' % bool(x % 2)) >> Consume() odd : False odd : True odd : False ``Print`` furthermore supports to print every *n-ths* element, print at certain time intervals, or filter for the elements to be displayed: >>> Range(6) >> Print(every_n=2) >> Consume() 1 3 5 >>> Range(6) >> Sleep(1.0) >> Print(every_sec=1.5) >> Consume() 1 3 5 >>> Range(6) >> Print(filterfunc=lambda x: x > 2) >> Consume() 3 4 5 PrintType --------- When working with Numpy arrays or Pytorch/Tensorflow tensors, ``Print()`` is not a good choice, since it prints the (potentially large) array/tensor data. ``PrintType()`` on the other hand, prints only the shape and data type of of array/tensor and the value and type for other data. >>> import numpy as np >>> mat = np.ones((1024,512), dtype=np.uint8) >>> data = [(mat, 0), (mat, 1), (mat, 2)] >>> data >> PrintType() >> Consume() ( 1024x512:uint8, 0) ( 1024x512:uint8, 1) ( 1024x512:uint8, 2) ``PrintType()`` is especially useful to print complex, nested data structures that contain array/tensor data. >>> batch = [[mat, mat], mat] >>> batches = [(batch, batch] >>> batches >> PrintType() >> Consume() [[ 1024x512:uint8, 1024x512:uint8], 1024x512:uint8] [[ 1024x512:uint8, 1024x512:uint8], 1024x512:uint8] PrintColType ------------ If the data is organized in columns (e.g. tuples) as shown above, ``PrintColType()`` can be used to print additonal information such as the range of array/tensor data: >>> image = np.ones((1024,512), dtype=np.uint8) >>> images = [(image*1, 1), (image*10, 2), (image*100, 3)] >>> images >> PrintColType() >> Consume() item 0: 0: shape:1024x512 dtype:uint8 range:1..1 1: 1 item 1: 0: shape:1024x512 dtype:uint8 range:10..10 1: 2 item 2: 0: shape:1024x512 dtype:uint8 range:100..100 1: 3 This is especially nice, when working with named tuples: >>> from collections import namedtuple >>> Sample = namedtuple('Sample', 'image,label') >>> samples = [Sample(image, 'good'), Sample(image, 'bad')] >>> samples >> PrintColType() >> Consume() item 0: image: shape:1024x512 dtype:uint8 range:1..1 label: good item 1: image: shape:1024x512 dtype:uint8 range:1..1 label: bad PrintProgress ------------- For long running flows printing progress information can be displayed by inserting a ``PrintProgress`` nut. It, however, requires that the number of elements to be processed is known beforehand. >>> n = 10 >>> Range(n) >> Sleep(0.1) >> PrintProgress(n, update=0.1) >> Consume() # doctest: +SKIP progress: 100% Limit data ---------- Instead of printing all the data the size of data processed can be limited, which is much more efficient. For instance, the ``Take(n)`` nut takes the first *n* elements only: >>> Range(1000) >> Take(3) >> Collect() [0, 1, 2] Alternatively the ``Head(n)`` nut can be used that takes *n* elements and collects them: >>> Range(1000) >> Head(3) [0, 1, 2] The last elements of a flow can be captured by ``Tail`` but note that the entire flow is consumed: >>> Range(1000) >> Tail(3) [997, 998, 999] Finally, ``Pick(n)`` allows to pick every *n-th* element: >>> Range(1000) >> Pick(100) >> Collect() [0, 100, 200, 300, 400, 500, 600, 700, 800, 900] For ``n < 1``, ``Pick(n)`` picks element with the given probabiltity, e.g. to pick 10% of the data use ``Pick(0.1)``. No Operation ------------ **nuts-flow** provides a ``NOP(nut)`` nut that can be used to temporarily disable the evaluation of a nut in a flow. >>> Range(5) >> Square() >> Collect() # compute squares [0, 1, 4, 9, 16] >>> Range(5) >> NOP(Square()) >> Collect() # Square disabled [0, 1, 2, 3, 4] This is often more convenient that commenting-out or temporarily removing a nut for debugging purposes. Note that only single nuts can be disabled with this method. Conditional ----------- Individual nuts in a flow can also be disabled/enabled or replaced depending on a boolean flag using the ``If(cond, if_nut, else_nut)`` nut: >>> [1, 2, 3] >> If(True, Square()) >> Collect() [1, 4, 9] >>> [1, 2, 3] >> If(False, Square()) >> Collect() [1, 2, 3] >>> [1, 2, 3] >> If(False, Square(), Take(1)) >> Collect() [1] Again this is largely of interest for debugging and limited to operate on single nuts. Counter ------- Sometimes only the number of elements processed at a certain stage is of interest. ``Counter`` is a nut with the needed side-effect: >>> count = Counter('cnt') >>> Range(10) >> count >> Square() >> Sum() 285 >>> count.value 10 Note that ``Counter`` does not modify the data flow. ``Counter`` also has a filter function to count only certain elements: >>> greater5 = Counter('gt5', filterfunc = lambda x: x > 5) >>> Range(10) >> Square() >> greater5 >> Collect() [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] >>> greater5 gt5 = 7 Note that the actual value of the counter is stored in ``value`` and can be printed but for conveniency ``print(greater5)`` prints the name of the counter and its value as well.