Rearranging data

Another common need is to rearrange or restructure data. The following nuts can help with that.

>>> from nutsflow import *

Slice

Slice([start,] stop[, stride]) takes a slice of the data. Like Python’s slicing operation, it extracts a section of the data. If neither start nor stride is provided, Slice extracts the first stop elements:

>>> [1, 2, 3, 4] >> Slice(2) >> Collect()
[1, 2]

If start and stop are provided, the elements from index start up to (but excluding) index stop are extracted:

>>> [1, 2, 3, 4] >> Slice(1, 3) >> Collect()
[2, 3]

Finally, the third parameter specifies a stride. In this example, every second element of the slice starting at index 0 and ending at index 4 (exclusive) is extracted:

>>> [1, 2, 3, 4] >> Slice(0, 4, 2) >> Collect()
[1, 3]
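
For comparison, the same three call forms exist in itertools.islice from the Python standard library. The sketch below uses plain Python only and makes no claim about how Slice is implemented; it merely illustrates the start/stop/stride semantics on an iterator.

```python
from itertools import islice

data = [1, 2, 3, 4]

# islice works on any iterator, so wrap the list to make that explicit.
first_two = list(islice(iter(data), 2))            # like Slice(2)
middle = list(islice(iter(data), 1, 3))            # like Slice(1, 3)
every_second = list(islice(iter(data), 0, 4, 2))   # like Slice(0, 4, 2)

print(first_two)      # [1, 2]
print(middle)         # [2, 3]
print(every_second)   # [1, 3]
```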

Chunk

Chunk(n) groups data into chunks of size n; the last chunk may be shorter:

>>> Range(5) >> Chunk(2) >> Map(list) >> Collect()
[[0, 1], [2, 3], [4]]

Note that each chunk is an iterator over the elements in the chunk, which is why Map(list) is required to convert the chunks to printable lists. A more interesting example might be the sum of the elements within each chunk:

>>> Range(5) >> Chunk(2) >> Map(sum) >> Collect()
[1, 5, 4]
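
To make the grouping explicit, here is a minimal plain-Python sketch of the same idea. The helper name chunks is hypothetical and not part of nutsflow, and unlike Chunk it yields lists rather than iterators, so no Map(list) step is needed.

```python
from itertools import islice

def chunks(iterable, n):
    """Yield successive lists of at most n elements (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))   # pull up to n elements
        if not chunk:                 # input exhausted
            return
        yield chunk

grouped = list(chunks(range(5), 2))
sums = [sum(c) for c in chunks(range(5), 2)]

print(grouped)   # [[0, 1], [2, 3], [4]]
print(sums)      # [1, 5, 4]
```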

Window

Window(n) provides a sliding window of size n over the elements in the input data. For example:

>>> [1, 2, 3, 4, 5] >> Window(3) >> Collect()
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

This works for strings as well. We use Join() to join the individual characters of each generated window into a string:

>>> 'abcdefg' >> Window(4) >> Map(Join()) >> Collect()
['abcd', 'bcde', 'cdef', 'defg']
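
A sliding window can be sketched in plain Python with a bounded deque. The helper name windows is hypothetical, and this is an illustration of the concept, not nutsflow's implementation. In this sketch, an input shorter than n produces no windows.

```python
from collections import deque

def windows(iterable, n):
    """Yield tuples of n consecutive elements (hypothetical helper)."""
    window = deque(maxlen=n)          # oldest element drops out automatically
    for element in iterable:
        window.append(element)
        if len(window) == n:          # emit only once the window is full
            yield tuple(window)

numbers = list(windows([1, 2, 3, 4, 5], 3))
words = [''.join(w) for w in windows('abcdefg', 4)]

print(numbers)   # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
print(words)     # ['abcd', 'bcde', 'cdef', 'defg']
```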

Cycle

Sometimes it is necessary to process an iterable repeatedly. Cycle takes all elements from its input iterable, stores them in memory and returns an iterator that cycles through the elements indefinitely. Here is an example that cycles through 1, 2, 3 and takes the first 10 elements:

>>> [1, 2, 3] >> Cycle() >> Take(10) >> Collect()
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1]

Note that Cycle will consume large amounts of memory if the input iterable is large.
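
The standard library's itertools.cycle behaves in a comparable way and shows why the elements have to be kept in memory: a one-shot generator can only be repeated if a copy of every element is saved. This is a plain-Python illustration, not a statement about Cycle's internals.

```python
from itertools import cycle, islice

# A generator can be consumed only once; cycle() saves a copy of each
# element it has seen so that it can replay them afterwards.
source = (x for x in [1, 2, 3])
first_ten = list(islice(cycle(source), 10))

print(first_ten)   # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
```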

Permutate

Permutate([r]) returns successive r-length permutations of the elements in the input iterable:

>>> [1, 2, 3] >> Permutate(2) >> Collect()
[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

Maybe a more interesting example: how many distinct palindromes can be formed from the letters of a given string?

>>> IsPalindrom = nut_filter(lambda x: x == x[::-1])
>>> 'devoved' >> Permutate() >> IsPalindrom() >> Collect(set) >> Count()
6

If no permutation size r is specified, all full-length permutations are generated. An input of n elements has n! of them, so the computation will not finish in any reasonable time unless n is small!
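
The factorial growth is easy to check with the standard library. The sketch below repeats the palindrome count with itertools.permutations and prints how many full-length permutations must be examined; the function name is_palindrome is illustrative and not part of nutsflow.

```python
from itertools import permutations
from math import factorial

def is_palindrome(seq):
    """True if the sequence reads the same forwards and backwards."""
    return seq == seq[::-1]

word = 'devoved'

# A set removes duplicates that arise from the repeated letters.
distinct = {p for p in permutations(word) if is_palindrome(p)}
palindrome_count = len(distinct)

print(palindrome_count)       # 6
print(factorial(len(word)))   # 5040 permutations for 7 letters
print(factorial(12))          # 479001600 permutations for 12 letters
```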

Combine

Combine(r) returns r-length subsequences of the elements from the input iterable:

>>> [1, 2, 3] >> Combine(2) >> Collect()
[(1, 2), (1, 3), (2, 3)]

Note that Combine(r) returns a subset of the output of Permutate(r): exactly those permutations in which the order of the elements (as given in the input iterable) is preserved.
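
This subset relation can be verified with itertools.combinations and itertools.permutations, which serve here as plain-Python stand-ins for Combine and Permutate. The sketch assumes the input elements are distinct and hashable.

```python
from itertools import combinations, permutations

items = [1, 2, 3]
combos = list(combinations(items, 2))
perms = list(permutations(items, 2))

# Position of each element in the input; assumes distinct elements.
position = {item: i for i, item in enumerate(items)}

def keeps_input_order(p):
    """True if the elements of p appear in the same order as in items."""
    return all(position[a] < position[b] for a, b in zip(p, p[1:]))

ordered = [p for p in perms if keeps_input_order(p)]

print(combos)              # [(1, 2), (1, 3), (2, 3)]
print(ordered == combos)   # True
```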

Dedupe

A very common task is to remove all duplicates from a data set. Dedupe([key]) performs this task and also takes a key function that defines which elements are treated as equal.

Dedupe() preserves the order of the elements in the input. See the following example:

>>> [2, 3, 1, 1, 2, 4] >> Dedupe() >> Collect()
[2, 3, 1, 4]

More complex data often requires a more sophisticated definition of equality, which the key function provides:

>>> data = [(1, 'a'), (2, 'a'), (3, 'b')]
>>> data >> Dedupe(lambda t: t[1]) >> Collect()
[(1, 'a'), (3, 'b')]

Dedupe() stores all unique elements of the input iterable in a set and can therefore consume large amounts of memory!
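
The memory behaviour follows from the usual order-preserving deduplication technique, sketched below in plain Python. The function dedupe is a hypothetical helper, not nutsflow's code, and it assumes that the elements (or their keys) are hashable.

```python
def dedupe(iterable, key=None):
    """Yield elements in input order, skipping those already seen.

    Hypothetical helper; the seen set grows with the number of
    distinct elements (or distinct keys), which is the memory cost.
    """
    seen = set()
    for element in iterable:
        marker = element if key is None else key(element)
        if marker not in seen:
            seen.add(marker)
            yield element

plain = list(dedupe([2, 3, 1, 1, 2, 4]))
data = [(1, 'a'), (2, 'a'), (3, 'b')]
by_letter = list(dedupe(data, key=lambda t: t[1]))

print(plain)       # [2, 3, 1, 4]
print(by_letter)   # [(1, 'a'), (3, 'b')]
```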