Nuts basics

Flows

nuts-flow data pipelines are composed of nuts that are chained together with the >> operator. For instance, in the following data flow, Range generates the numbers from 0 to 4, the Square nut squares those numbers and the Collect nut collects the results in a list:

>>> from nutsflow import Range, Square, Collect
>>> Range(5) >> Square() >> Collect()
[0, 1, 4, 9, 16]
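The >> chaining above relies on ordinary Python operator overloading. The following is a minimal sketch, not nuts-flow's actual implementation, of how a class can join such a chain through the reflected operator `__rrshift__`. Python calls this method for `left >> right` when the left operand does not handle >> itself. `MiniSquare` and `MiniCollect` are hypothetical names used only for illustration.

```python
class MiniSquare:
    """Hypothetical nut-function stand-in: squares each element lazily."""

    def __rrshift__(self, iterable):
        # Called for `iterable >> MiniSquare()` when the left operand
        # (a range, list, string, generator, ...) has no __rshift__.
        return (x * x for x in iterable)


class MiniCollect:
    """Hypothetical sink stand-in: pulls all elements into a list."""

    def __rrshift__(self, iterable):
        return list(iterable)


print(range(5) >> MiniSquare() >> MiniCollect())  # [0, 1, 4, 9, 16]
```

Because `MiniSquare` returns a generator, no element is squared until `MiniCollect` pulls the data through the chain.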

The elements of a flow are typically processed one at a time. This avoids loading large amounts of data into memory and skips processing data that is never needed. For instance,

>>> from nutsflow import *
>>> Range(10000000) >> Square() >> Take(3) >> Collect()
[0, 1, 4]

works just fine and does not store 10 million integers in memory.
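The same one-element-at-a-time behavior can be reproduced with plain Python generators. In this sketch, `itertools.islice` plays the role of `Take(3)` and the hypothetical generator `square_all` plays the role of `Square()`. The `computed` list records which numbers were actually squared. This is an analogy that assumes the nuts behave like generators; it does not describe nuts-flow internals.

```python
from itertools import islice

computed = []  # records every value that actually gets squared


def square_all(numbers):
    # Generator: squares one element at a time, only when it is requested.
    for x in numbers:
        computed.append(x)
        yield x * x


# islice stops after three items, so the generator is advanced only three times.
first_three = list(islice(square_all(range(10_000_000)), 3))
print(first_three)  # [0, 1, 4]
print(computed)     # [0, 1, 2]
```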

Sources and Sinks

Every data flow starts with a source, which can be any iterable such as iterators, generators, iterable nuts or plain Python data structures (strings, lists, sets, dictionaries, …):

>>> [0, 1, 2, 3, 4] >> Square() >> Collect()
[0, 1, 4, 9, 16]
>>> "Macadamia" >> Take(4) >> Collect()
['M', 'a', 'c', 'a']
>>> range(5) >> Collect()
[0, 1, 2, 3, 4]

In addition to the usual Python data sources, nuts-flow provides its own sources, for example:

>>> Range(5) >> Collect()   # Range() == range()
[0, 1, 2, 3, 4]
>>> Repeat(1) >> Take(3) >> Collect()
[1, 1, 1]

Apart from a source, every data flow needs a sink at the end that pulls the data through the flow. Without a sink, the data flow does not process any data, because most nuts are lazy. For example,

>>> Range(5) >> Square()
<itertools.imap object at ...>

simply returns an iterator object; it neither generates the range of numbers nor computes their squares. Sinks take iterables as input and return a result of any type, or nothing at all:

>>> Range(5) >> Collect()
[0, 1, 2, 3, 4]
>>> Range(5) >> Sum()
10
>>> Range(5) >> Consume()  # returns nothing

Here the sinks are Collect(), Sum() and Consume().
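The effect of a missing sink can be illustrated with Python's built-in `map`, which is also lazy. In this sketch, `noisy_square` and `log` are hypothetical helpers that record when work actually happens. `list`, `sum` and `collections.deque(..., maxlen=0)` stand in for `Collect()`, `Sum()` and `Consume()`. This is an analogy, not how nuts-flow implements these sinks.

```python
from collections import deque

log = []  # records each element at the moment it is actually processed


def noisy_square(x):
    log.append(x)
    return x * x


flow = map(noisy_square, range(5))  # lazy: builds an iterator, computes nothing
print(log)  # []

collected = list(flow)  # sink like Collect(): pulls every element
print(collected)  # [0, 1, 4, 9, 16]
print(log)        # [0, 1, 2, 3, 4]

total = sum(range(5))  # sink like Sum()
print(total)  # 10

# Sink like Consume(): drain an iterator only for its side effects.
log.clear()
deque(map(noisy_square, range(3)), maxlen=0)
print(log)  # [0, 1, 2]
```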

Functions and Processors

Between sources and sinks, a data flow typically contains a sequence of nut functions or nut processors. Nut functions read from an iterator and return one new element for each element they process. Square is such a nut function.

Nut processors, on the other hand, can modulate the data flow and may return more or fewer elements than they read from the input. For instance, Pick(n) is a processor that returns only every n-th element of the input iterable:

>>> from nutsflow import Range, Pick, Collect
>>> Range(10) >> Pick(3) >> Collect()
[0, 3, 6, 9]
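The difference between the two kinds of nuts can be sketched with plain Python generators. The names `square_fn`, `pick_every` and `duplicate` are hypothetical. `square_fn` behaves like a nut function (exactly one output per input). `pick_every` (built on `itertools.islice` with a step) and `duplicate` behave like processors that emit fewer or more elements than they read.

```python
from itertools import islice


def square_fn(iterable):
    # Function-style step: exactly one output element per input element.
    for x in iterable:
        yield x * x


def pick_every(iterable, n):
    # Processor-style step: keeps only every n-th element (fewer outputs).
    return islice(iterable, 0, None, n)


def duplicate(iterable):
    # Processor-style step: emits each element twice (more outputs).
    for x in iterable:
        yield x
        yield x


print(list(pick_every(range(10), 3)))  # [0, 3, 6, 9]
print(list(square_fn(range(4))))       # [0, 1, 4, 9]
print(list(duplicate(range(3))))       # [0, 0, 1, 1, 2, 2]
```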

Note that nut functions can also be used as normal functions, but they must be called with an additional pair of parentheses:

>>> Square()(3)
9

Iterator depletion

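It is important to remember that nuts usually return iterators, which are depleted as they are consumed. Once an element has been read it is gone, so reusing the same iterator produces different results. In the following example, Take(2) always takes the first 2 elements that remain in its input: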

>>> from nutsflow import Range, Take, Collect
>>> numbers = Range(5)
>>> numbers >> Take(2) >> Collect()
[0, 1]
>>> numbers >> Take(2) >> Collect()
[2, 3]
>>> numbers >> Take(2) >> Collect()
[4]
>>> numbers >> Take(2) >> Collect()
[]
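Plain Python iterators deplete in exactly the same way, which makes the behavior easy to reproduce without nuts-flow. The sketch below uses `iter(range(5))` in place of `Range(5)` and `itertools.islice` in place of `Take(2)`. It then shows two common plain-Python workarounds: materializing the data into a list once, or recreating the iterator whenever it is needed (here via a hypothetical helper `fresh_numbers`).

```python
from itertools import islice

numbers = iter(range(5))  # one-shot iterator, standing in for Range(5)
print(list(islice(numbers, 2)))  # [0, 1]
print(list(islice(numbers, 2)))  # [2, 3]
print(list(islice(numbers, 2)))  # [4]
print(list(islice(numbers, 2)))  # []

# Workaround 1: materialize once, then slice the list as often as needed.
values = list(range(5))
print(values[:2], values[:2])  # [0, 1] [0, 1]


# Workaround 2: create a fresh iterator every time one is needed.
def fresh_numbers():
    return iter(range(5))


print(list(islice(fresh_numbers(), 2)))  # [0, 1]
print(list(islice(fresh_numbers(), 2)))  # [0, 1]
```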

New nuts

nuts-flow can easily be extended with new nuts (for details see Custom nuts):

>>> from nutsflow import nut_function, Range, Collect
>>> Triple = nut_function(lambda x: x * 3)
>>> Range(5) >> Triple() >> Collect()
[0, 3, 6, 9, 12]
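To make the idea behind `nut_function` more concrete, here is a simplified, hypothetical stand-in called `mini_nut_function`. It is not the nuts-flow implementation. It only assumes that a nut function should work both in a flow (via `__rrshift__`) and as a plain function call on one element (via `__call__`), with any extra constructor arguments forwarded to the wrapped function. `Triple` and `Scale` are illustrative names.

```python
def mini_nut_function(func):
    """Hypothetical, simplified stand-in for nuts-flow's nut_function.

    Returns a class whose instances can be used in a flow via `>>`
    or called directly on a single element.
    """

    class Nut:
        def __init__(self, *args, **kwargs):
            # Extra arguments are forwarded to func after the element.
            self.args = args
            self.kwargs = kwargs

        def __call__(self, x):
            # Direct use on one element, e.g. Triple()(4).
            return func(x, *self.args, **self.kwargs)

        def __rrshift__(self, iterable):
            # Use in a flow: lazily apply func to every incoming element.
            return (self(x) for x in iterable)

    return Nut


Triple = mini_nut_function(lambda x: x * 3)
Scale = mini_nut_function(lambda x, factor: x * factor)

print(list(range(5) >> Triple()))    # [0, 3, 6, 9, 12]
print(Triple()(4))                   # 12
print(list([1, 2, 3] >> Scale(10)))  # [10, 20, 30]
```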

or combined with plain Python functions like any other iterator:

>>> def Squares(n): return Range(n) >> Square()
>>> Squares(3) >> Collect()
[0, 1, 4]
>>> sum(Range(5) >> Square())
30

When implementing new nuts, or Python functions/classes that behave like nuts, their names should start with an uppercase letter. This makes it easy to distinguish standard functions from nuts:

>>> from nutsflow import Range, Sum
>>> Range(5) >> Sum()
10
>>> sum(Range(5))
10
>>> range(5) >> Sum()
10

Line breaks

Sometimes data flows get longer than the 79-character line limit recommended by the Python style guide PEP 8. In such cases, a flow can be wrapped in parentheses to allow line breaks:

>>> (Range(10) >> Pick(2) >> Square() >> Square() >>
... Take(3) >> Collect())
[0, 16, 256]

Alternatively, a flow can be broken into shorter pieces:

>>> squared = Range(10) >> Pick(2) >> Square() >> Square()
>>> squared >> Take(3) >> Collect()
[0, 16, 256]

Summary

nuts-flows are composed of nuts that are connected into flows via the >> operator. A data flow starts with a source, ends with a sink, and typically contains nut processors or nut functions in between:

source >> processor|function >> ... >> sink

Nut sources return iterators or iterables when called. Nut sinks take iterables as input and return results of any type. Nut functions transform the elements of a flow but do not change the number (or order) of the elements, while nut processors can modify the flow in any way.