.. _rearranging:

Rearranging data
================

Another common need is to rearrange or restructure data. The following
nuts can help with that.

>>> from nutsflow import *


Slice
-----

``Slice([start,] stop[, stride])`` takes a slice of the data. Similar to
Python's slicing operation, it extracts a section of the data.
If neither ``start`` nor ``stride`` is provided, ``Slice`` extracts the first
``stop`` elements:

>>> [1, 2, 3, 4] >> Slice(2) >> Collect()
[1, 2]

If ``start`` and ``stop`` are provided, the elements from index ``start``
up to index ``stop`` (excluded) are extracted:

>>> [1, 2, 3, 4] >> Slice(1, 3) >> Collect()
[2, 3]

Finally, the third parameter specifies a ``stride``. In this example every
second element in the slice starting at index 0 and ending at index 4
(excluded) is extracted:

>>> [1, 2, 3, 4] >> Slice(0, 4, 2) >> Collect()
[1, 3]


Chunk
-----

``Chunk(n)`` is a nut that groups data into chunks of size ``n``:

>>> Range(5) >> Chunk(2) >> Map(list) >> Collect()
[[0, 1], [2, 3], [4]]

Note that each chunk is an iterator over the elements in the chunk,
which is why ``Map(list)`` is required to convert the chunks to printable lists.
A more interesting example might be the sum of the elements within each chunk:

>>> Range(5) >> Chunk(2) >> Map(sum) >> Collect()
[1, 5, 4]


Window
------

``Window(n)`` provides a sliding window of size ``n`` over the elements
in the input data. For example:

>>> [1, 2, 3, 4, 5] >> Window(3) >> Collect()
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

This works for strings as well. We use ``Join()`` to convert the
individual characters in the generated windows to strings:

>>> 'abcdefg' >> Window(4) >> Map(Join()) >> Collect()
['abcd', 'bcde', 'cdef', 'defg']


Cycle
-----

Sometimes it is necessary to process an iterable repeatedly.
``Cycle`` takes all elements from its input iterable, stores them in memory
and returns an iterator that cycles through the elements indefinitely.

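
As a rough plain-Python analogy (not necessarily how nutsflow implements
``Cycle``), the behaviour described above resembles ``itertools.cycle``
combined with ``itertools.islice``. The helper name ``take_cycled`` is
hypothetical and only serves this sketch.

```python
from itertools import cycle, islice


def take_cycled(iterable, n):
    """Return the first n elements of an endless cycle over iterable.

    Hypothetical helper for illustration; it is not part of nutsflow.
    """
    # itertools.cycle keeps a copy of every element it has seen,
    # so memory use grows with the size of the input.
    return list(islice(cycle(iterable), n))


first_ten = take_cycled([1, 2, 3], 10)
print(first_ten)
```

The sketch also makes the memory cost visible: every input element has to
be stored before the second pass can begin.
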
Here is an example that cycles through 1, 2, 3 and takes the first 10 elements:

>>> [1, 2, 3] >> Cycle() >> Take(10) >> Collect()
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1]

Note that ``Cycle`` will consume large amounts of memory if the input
iterable is large.


Permutate
---------

``Permutate([r])`` returns successive permutations of length ``r`` of the
elements in the input iterable:

>>> [1, 2, 3] >> Permutate(2) >> Collect()
[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

Maybe a more interesting example: what is the number of distinct palindromes
that can be formed from the letters of a given string?

>>> IsPalindrom = nut_filter(lambda x: x == x[::-1])
>>> 'devoved' >> Permutate() >> IsPalindrom() >> Collect(set) >> Count()
6

If no permutation size ``r`` is specified, all full-length permutations are
generated (``n!`` for an input of ``n`` elements), and the computation will
not finish in any reasonable time unless ``n`` is small!


Combine
-------

``Combine(r)`` returns subsequences of length ``r`` of the elements from the
input iterable:

>>> [1, 2, 3] >> Combine(2) >> Collect()
[(1, 2), (1, 3), (2, 3)]

Note that ``Combine(r)`` returns a subset of ``Permutate(r)``: only those
permutations in which the elements appear in the same order as in the
input iterable.


Dedupe
------

A very common task is to remove all duplicates from a data set.
``Dedupe([key])`` performs this task and optionally takes a key function
that defines which elements are treated as equal. ``Dedupe()`` preserves
the order of the elements in the input. See the following example:

>>> [2, 3, 1, 1, 2, 4] >> Dedupe() >> Collect()
[2, 3, 1, 4]

More complex data often require a more sophisticated definition of
equality, which the key function provides. Here, two tuples are treated
as equal if their second elements match:

>>> data = [(1, 'a'), (2, 'a'), (3, 'b')]
>>> data >> Dedupe(lambda t: t[1]) >> Collect()
[(1, 'a'), (3, 'b')]

``Dedupe()`` memorizes all unique elements of the input iterable in a set
and can potentially consume large amounts of memory!
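
To make the memory behaviour concrete, here is a minimal plain-Python sketch
of order-preserving deduplication with an optional key function. It is an
illustration under the assumptions stated in the text (a set of seen keys,
first occurrence wins), not nutsflow's actual implementation; the function
name ``dedupe`` is hypothetical.

```python
def dedupe(iterable, key=None):
    """Yield elements in input order, skipping those whose key was seen.

    Illustrative sketch only; not the nutsflow implementation.
    """
    seen = set()  # grows by one entry per unique key
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


data = [(1, 'a'), (2, 'a'), (3, 'b')]
unique_by_letter = list(dedupe(data, key=lambda t: t[1]))
print(unique_by_letter)
```

Because ``seen`` holds one entry per unique key, memory use grows with the
number of distinct elements rather than with the total input size.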