Divide and conquer

It is frequently necessary either to split a data flow into multiple flows or to combine several flows into one. The following nuts are designed for this purpose. The Partition and MapMulti nuts may be of interest in this context as well.

Zip

Zip(*iterables) combines two or more iterables like a zipper: at every step it takes one element from each iterable and outputs a tuple of the grouped elements. Here is an example:

>>> from nutsflow import *
>>> numbers = [0, 1, 2]
>>> letters = ['a', 'b', 'c']
>>> numbers >> Zip(letters) >> Collect()
[(0, 'a'), (1, 'b'), (2, 'c')]

Zip finishes when the shortest iterable is exhausted:

>>> Range(100) >> Zip('abc') >> Collect()
[(0, 'a'), (1, 'b'), (2, 'c')]
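Because zipping stops at the shortest input, a very long or even unbounded source can be paired safely with a short one. The following sketch illustrates this with Python's built-in zip and itertools.count, which follow the same shortest-input rule. It uses the standard library only and is not meant to describe how nuts-flow implements Zip.

```python
from itertools import count

# count() yields 0, 1, 2, ... without end; zip stops as soon as
# the shorter input ('abc') is exhausted, so this terminates.
pairs = list(zip(count(), 'abc'))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```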

Note that Zip can zip more than two iterables:

>>> '12' >> Zip('ab', '+-') >> Collect()
[('1', 'a', '+'), ('2', 'b', '-')]

If the output of Zip needs to be flat, Flatten can be applied:

>>> [0, 1, 2] >> Zip('abc') >> Flatten() >> Collect()
[0, 'a', 1, 'b', 2, 'c']

but using Interleave is simpler in this case.

Python’s built-in zip can be used instead of nuts-flow’s Zip:

>>> zip(numbers, letters) >> Print() >> Consume()
(0, 'a')
(1, 'b')
(2, 'c')

Unzip

Unzip(container=None) reverses a Zip operation:

>>> numbers, letters = [0, 1, 2] >> Zip('abc') >> Unzip()
>>> list(numbers)
[0, 1, 2]
>>> list(letters)
['a', 'b', 'c']

By default, Unzip returns iterators, but the results are often needed as lists or other collections (see above). Unzip therefore accepts a container type that is used to collect the results:

>>> zip([0, 1, 2], 'abc') >> Unzip(list) >> Collect()
[[0, 1, 2], ['a', 'b', 'c']]

This is equivalent to Unzip() >> Map(list) >> Collect() but shorter.
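The idea behind unzipping can be sketched in plain Python with zip(*...), which transposes a sequence of tuples into columns. The helper name unzip below is hypothetical and only mirrors the behaviour described above; it is an illustration, not nuts-flow's implementation. Note that zip(*pairs) reads the whole input before it produces the first column.

```python
def unzip(pairs, container=None):
    """Transpose an iterable of tuples into columns (hypothetical helper)."""
    columns = zip(*pairs)  # consumes 'pairs' completely
    if container is None:
        return columns
    # Wrap every column in the requested container type, e.g. list.
    return (container(column) for column in columns)

numbers, letters = unzip(zip([0, 1, 2], 'abc'))
print(numbers, letters)  # (0, 1, 2) ('a', 'b', 'c')

as_lists = list(unzip(zip([0, 1, 2], 'abc'), list))
print(as_lists)  # [[0, 1, 2], ['a', 'b', 'c']]
```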

Interleave

Interleave works like Zip but does not group zipped results in tuples. Instead an iterator over a flattened sequence of interleaved elements is returned:

>>> numbers = [0, 1, 2]
>>> letters = ['a', 'b', 'c']
>>> numbers >> Interleave(letters) >> Collect()
[0, 'a', 1, 'b', 2, 'c']

Also in contrast to Zip, Interleave does not stop when the shortest input iterable is depleted. Elements are returned until all inputs are depleted:

>>> Range(10) >> Interleave('abc') >> Collect()
[0, 'a', 1, 'b', 2, 'c', 3, 4, 5, 6, 7, 8, 9]
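The interleaving behaviour described above, including the continuation after the shortest input is depleted, can be approximated in plain Python with itertools.zip_longest and a sentinel value. The function name interleave and the sentinel _MISSING are assumptions made for this sketch; it is not taken from the nuts-flow source.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking positions of depleted inputs

def interleave(*iterables):
    """Yield elements round-robin until all inputs are depleted."""
    for group in zip_longest(*iterables, fillvalue=_MISSING):
        for item in group:
            if item is not _MISSING:  # skip fill values of shorter inputs
                yield item

result = list(interleave(range(10), 'abc'))
print(result)  # [0, 'a', 1, 'b', 2, 'c', 3, 4, 5, 6, 7, 8, 9]
```

A sentinel object is used rather than None so that inputs containing None are still passed through unchanged.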

Concat

Apart from zipping or interleaving iterators, they can also be concatenated using Concat:

>>> Range(5) >> Concat('abc') >> Collect()
[0, 1, 2, 3, 4, 'a', 'b', 'c']
>>> '12' >> Concat('abcd', [3, 4, 5]) >> Collect()
['1', '2', 'a', 'b', 'c', 'd', 3, 4, 5]

Note that Concat is memory efficient: it materializes neither the input iterables nor the concatenated result in memory, in contrast to the following code:

>>> list(Range(5)) + list('abc')
[0, 1, 2, 3, 4, 'a', 'b', 'c']
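The difference between lazy concatenation and list addition becomes visible with an unbounded input. The sketch below uses itertools.chain from the standard library as a stand-in for lazy concatenation; that Concat behaves the same way with unbounded inputs is an assumption based on the description above, not something verified against nuts-flow.

```python
from itertools import chain, count, islice

# chain is lazy: it never builds the concatenated sequence, so one
# of the inputs may even be infinite (count() never ends).
combined = chain('abc', count())

# Only the five requested elements are ever produced.
first_five = list(islice(combined, 5))
print(first_five)  # ['a', 'b', 'c', 0, 1]
```

The list-based variant, list('abc') + list(count()), would never finish because it tries to materialize the infinite input.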

Tee

Tee([n=2]) creates multiple independent iterators from a single iterable.

>>> numbers1, numbers2 = Range(5) >> Tee(2)
>>> numbers1 >> Collect()
[0, 1, 2, 3, 4]
>>> numbers2 >> Collect()
[0, 1, 2, 3, 4]

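Tee is only useful if the returned iterators are advanced largely synchronously, because every element consumed by one iterator has to be buffered until the other iterators have consumed it as well. Otherwise the memory consumption is the same as simply materializing the input iterable and referencing it, e.g.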

>>> numbers1 = Range(5) >> Collect()
>>> numbers2 = numbers1

A simple example where Tee is useful would be to add each number in the input iterable to its predecessor:

>>> add = lambda a, b: a + b
>>> numbers1, numbers2 = Range(5) >> Tee(2)
>>> numbers1 >> Drop(1) >> Map(add, numbers2) >> Collect()
[1, 3, 5, 7]

Iterators, in contrast to streams, cannot be rewound. Tee provides a way to overcome this limitation.
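The predecessor example can also be written with itertools.tee from the standard library, which makes the synchronous advancement explicit. The function name add_to_predecessor is hypothetical, and the sketch does not claim that nuts-flow's Tee is implemented this way.

```python
from itertools import tee

def add_to_predecessor(iterable):
    """Yield the sum of each element and its predecessor (hypothetical helper)."""
    current, previous = tee(iterable, 2)
    next(current, None)  # advance 'current' by one, like Drop(1)
    # Both iterators now move in lockstep, one element apart, so tee
    # only has to buffer a small, constant number of elements.
    return (a + b for a, b in zip(current, previous))

# Works on a one-shot generator, which could not be iterated twice.
sums = list(add_to_predecessor(n for n in range(5)))
print(sums)  # [1, 3, 5, 7]
```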