Nuts basics¶
Flows¶
nuts-flow data pipelines are composed of nuts that
are chained together with the >>
operator. For instance, in the
following data flow, Range
generates number from 0 to 4, the Square
nut squares those numbers and the Collect
nut collects the results
in a list:
>>> from nutsflow import Range, Square, Collect
>>> Range(5) >> Square() >> Collect()
[0, 1, 4, 9, 16]
The data elements of a flow are typically processed element by element, avoiding loading large amounts of data into memory or processing data if not needed. For instance,
>>> from nutsflow import *
>>> Range(10000000) >> Square() >> Take(3) >> Collect()
[0, 1, 4]
works just fine and does not store 10 million integers in memory.
Sources and Sinks¶
Every data flow starts with a source, which can be any iterable such as iterators, generators, iterable nuts or plain Python data structures (string, lists, sets, dictionaries, …):
>>> [0, 1, 2, 3, 4] >> Square() >> Collect()
[0, 1, 4, 9, 16]
>>> "Macadamia" >> Take(4) >> Collect()
['M', 'a', 'c', 'a']
>>> range(5) >> Collect()
[0, 1, 3, 4, 4]
In addition to the usual Python data sources, nuts-flow has its own sources, e.g.
>>> Range(5) >> Collect() # Range() == range()
[0, 1, 3, 4, 4]
>>> Repeat(1) >> Take(3) >> Collect()
[1, 1, 1]
Apart from a source, every data flow needs a sink at the end that pulls the data. Without a sink the data flow does not process any data (most nuts are lazy). For example
>>> Range(5) >> Square()
<itertools.imap object at ...>
simply returns an iterator object but does neither create any ranged numbers nor computes the square. Sinks take iterables as input and return a result of any type or even nothing
>>> Range(5) >> Collect()
[0, 1, 2, 3, 4]
>>> Range(5) >> Sum()
10
>>> Range(5) >> Consume() # returns nothing
Here the sinks are Collect()
, Sum()
and Consume()
.
Functions and Processors¶
Between sources and sinks a data flow typically contains a sequence of
nut functions or nut processors. Nut functions read from an iterator
and for each processed element return a new element. Square
is such
a nut function.
Nut processors, on the other hand, can modulate the data flow and might return
more or less elements than read from the input. For instance, Pick(n)
is a processor that returns only every n-ths element from the input iterable
>>> from nutsflow import Range, Pick, Collect
>>> Range(10) >> Pick(3) >> Collect()
[0, 3, 6, 9]
Note that nut functions can be used as normal functions as well but must be called with additional brackets
>>> Square()(3)
9
Iterator depletion¶
It is important to remember that nuts usually return iterators
that will deplete when used multiple times. See the following example,
where Take(2)
always takes the first 2 elements from its input:
>>> from nutsflow import Range, Take, Collect
>>> numbers = Range(5)
>>> numbers >> Take(2) >> Collect()
[0, 1]
>>> numbers >> Take(2) >> Collect()
[2, 3]
>>> numbers >> Take(2) >> Collect()
[4]
>>> numbers >> Take(2) >> Collect()
[]
New nuts¶
nuts-flow can easily be extended with new nuts (for details see Custom nuts )
>>> Tripple = nut_function(lambda x: x * 3)
>>> Range(5) >> Tripple() >> Collect()
[0, 3, 6, 9, 12]
or combined with plain Python functions as any other iterator:
>>> def Squares(n): return Range(n) >> Square()
>>> Squares(3) >> Collect()
[0, 1, 4]
>>> sum(Range(5) >> Square())
30
When implementing new nuts, or Python functions/classes that behave like nuts, the name of the nut should start with an uppercase letter. This makes it easy to distiguish standard functions from nuts:
>>> from nutsflow import Range, Sum
>>> Range(5) >> Sum()
10
>>> sum(Range(5))
10
>>> range(5) >> Sum()
10
Line breaks¶
Sometimes data flows get longer than the 79 character limit that the Python style guide PEP 8 recommends. In such a case flows can be wrapped in brackets to allow for line breaks:
>>> (Range(10) >> Pick(2) >> Square() >> Square() >>
... Take(3) >> Collect())
[0, 16, 256]
Alternatively, a flow can be broken into shorter pieces:
>>> squared = Range(10) >> Pick(2) >> Square() >> Square()
>>> squared >> Take(3) >> Collect()
[0, 16, 256]
Summary¶
nuts-flows are composed of nuts that are connected to flows
via the >>
operator.
A data flow starts with a source, ends with a sink and
typically contains nut processors or nut functions inbetween:
source >> processor|function >> ... >> sink
nut sources return iterators or iterables when called. nut sinks take iterables as input and return results of any type. nut functions transform the elements of a flow but do not change the number (or order) of the elements, while nut processors can modify the flow in any way.