Performance
===========

**nuts-flow** does not support concurrency in general but provides nuts
that can improve performance by caching or parallelization.


MapPar
------

Applying a function concurrently to the elements of a flow can be achieved
with the ``MapPar`` nut. The following toy example converts numbers to their
*absolute values* by applying the ``abs`` function in parallel:

>>> from nutsflow import *
>>> [-1, -2, -3] >> MapPar(abs) >> Collect()
[1, 2, 3]

Note that the order of the elements in the iterable is preserved.

Currently, ``MapPar`` is of limited use, since 1) the function applied must be
picklable (serializable with Python's ``pickle`` module; see the sketch at the
end of this section) and 2) ``MapPar`` creates parallel processes, which are
computationally expensive to start.


Cache
-----

**nuts-flow** supports the *caching* of results to disk. Here is an example
in pseudo-code:

.. code:: python

  with Cache() as cache:
      for i in range(num_runs):
          data >> expensive_op >> cache >> ... >> Collect()

Note that *caching* is only useful if 1) the elements to cache are
time-consuming to compute, 2) they can be loaded from disk faster than they
can be recreated, and 3) the same data flow is executed multiple times; the
first run fills the cache and all subsequent runs read from it.

A common use case is *machine learning for vision*, where images are
preprocessed and a classifier is trained by repeatedly executing a data flow:

.. code:: python

  with Cache() as cache:
      for epoch in range(100):
          images >> preprocess >> cache >> network.train() >> Consume()

Cached elements are pickled to a temporary folder, which is deleted when the
``with`` block is exited. The cache can be cleared as follows:

.. code:: python

  with Cache() as cache:
      ...
      cache.clear()


Prefetch
--------

*Prefetching* is another common method employed in (GPU-based) machine
learning to speed up a data flow. Here data is pre-fetched (and pre-processed)
on a separate CPU thread while the GPU is performing machine learning on
another chunk of data:

.. code:: python

  images >> preprocess >> Prefetch() >> network.train() >> Consume()

The following two examples demonstrate the difference between processing a
data flow with and without pre-fetching. First, a flow *without pre-fetching*
that takes one number and prints it:

.. code:: python

  >>> Range(5) >> Print() >> Take(1) >> Consume()
  0

Now the same flow but *with pre-fetching*:

.. code:: python

  >>> Range(5) >> Print() >> Prefetch() >> Take(1) >> Consume()
  0
  1

Since ``Prefetch`` pulls elements ahead of the consumer on a separate thread,
a second number is printed even though only one element is taken from the flow.
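
To make the benefit of prefetching concrete, here is a rough timing sketch.
It uses only the nuts shown above (``Range``, ``Map``, ``Prefetch``,
``Consume``) plus two hypothetical helpers, ``slow_load`` and ``slow_train``,
whose ``sleep`` calls stand in for slow preprocessing and training; the exact
numbers will vary by machine.

.. code:: python

  import time
  from nutsflow import Consume, Map, Prefetch, Range

  def slow_load(x):
      time.sleep(0.05)  # stands in for slow I/O or preprocessing
      return x

  def slow_train(x):
      time.sleep(0.05)  # stands in for the training step
      return x

  # Without prefetching, loading and training run strictly one after the other.
  start = time.time()
  Range(20) >> Map(slow_load) >> Map(slow_train) >> Consume()
  print('without Prefetch: %.2fs' % (time.time() - start))

  # With prefetching, the next element is loaded on a separate thread while
  # the current one is being trained on, so the two delays overlap.
  start = time.time()
  Range(20) >> Map(slow_load) >> Prefetch() >> Map(slow_train) >> Consume()
  print('with Prefetch:    %.2fs' % (time.time() - start))

Because the delays overlap, the prefetching version should finish in roughly
half the time of the purely sequential one.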
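
Returning to the ``MapPar`` limitation mentioned earlier, the following rough
sketch illustrates the picklability requirement. ``double`` is a hypothetical
helper defined at module level, which makes it picklable; the ``__main__``
guard is included because ``MapPar`` starts worker processes.

.. code:: python

  from nutsflow import Collect, MapPar

  def double(x):
      # Module-level functions can be pickled and sent to worker processes.
      return 2 * x

  if __name__ == '__main__':
      print([1, 2, 3] >> MapPar(double) >> Collect())  # [2, 4, 6]

      # A lambda in place of ``double`` cannot be pickled by the standard
      # ``pickle`` module and would typically raise an error:
      # [1, 2, 3] >> MapPar(lambda x: 2 * x) >> Collect()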