Custom nuts

>>> from nutsflow import *

Cheat sheet

A quick overview of how to create custom nuts.

nut_function

  • input: single element

  • output: single element

  • note: number of elements in the data flow does not change

@nut_function
def Inc(element, by):
    return element + by

>>> Range(3) >> Inc(2) >> Collect()
[2, 3, 4]

nut_processor

  • input: iterable

  • output: iterable (preferably a generator)

  • note: number of elements in the data flow may change

@nut_processor
def MyClone(iterable, n):  # more outputs than inputs
    for e in iterable:
        for _ in range(n):
            yield e        # generator!

>>> Range(3) >> MyClone(3) >> Collect()
[0, 0, 0, 1, 1, 1, 2, 2, 2]

@nut_processor
def MyPick(iterable, n):  # fewer outputs than inputs
    for i, e in enumerate(iterable):
        if i % n == 0:
            yield e

>>> Range(9) >> MyPick(3) >> Collect()
[0, 3, 6]

@nut_processor
def Odd(iterable):
    return (e for e in iterable if e % 2)   # return <generator>!

>>> Range(9) >> Odd() >> Collect()
[1, 3, 5, 7]

nut_filter

  • input: single element

  • output: boolean (True keeps the element, False drops it)

  • note: number of elements in the data flow may change

@nut_filter
def InInterval(element, a, b):
    return a <= element <= b

>>> Range(10) >> InInterval(3, 6) >> Collect()
[3, 4, 5, 6]

nut_source

  • input: None

  • output: iterable

  • note: no input, must be at start of data flow

@nut_source
def MyRange(n):
    return range(1, n+1)

>>> MyRange(3) >> Collect()
[1, 2, 3]

nut_sink

  • input: iterable

  • output: any

  • note: processes/collects all data, should be at end of flow

@nut_sink
def ToFmtList(iterable, fmt):
    return [fmt % e for e in iterable]

>>> Range(3) >> ToFmtList('%02d')
['00', '01', '02']

Basics & Examples

nuts-flow can easily be extended with custom nuts using wrappers, decorators or derived classes. To clarify the differences between the approaches let us start with a simple filter. First, import nutsflow

>>> from nutsflow import *

then define a lambda function that returns True for elements greater than five

>>> greater_than_5 = lambda x: x > 5

and finally filter numbers using Filter and the defined lambda predicate

>>> Range(10) >> Filter(greater_than_5) >> Collect()
[6, 7, 8, 9]

Alternatively, a custom filter nut can be created by wrapping the lambda function with nut_filter

>>> GreaterThan5 = nut_filter(lambda x: x > 5)

that operates the same way but can be directly used as a nut

>>> Range(10) >> GreaterThan5() >> Collect()
[6, 7, 8, 9]

Note the change from lowercase greater_than_5 to CamelCase GreaterThan5, which signals the change from a Python function to a nuts-flow nut. This convention is strongly recommended to avoid confusion about how to use a function or a nut. Nuts are generally written in CamelCase and invoked with brackets, while Python functions are written in lowercase and passed without brackets as parameters to nuts.

For instance, both of the following examples are invalid. Here greater_than_5 is mistaken for a nut and invoked with brackets instead of being passed as a value to Filter

>>> Range(10) >> Filter(greater_than_5()) >> Collect()
Traceback (most recent call last):
...
TypeError: <lambda>() takes exactly 1 argument (0 given)

Similarly, in the following example GreaterThan5 is used without brackets

>>> Range(10) >> GreaterThan5 >> Collect()
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for >>: 'Range' and 'type'

Wrappers such as nut_filter(...) are suitable for simple one-line functions with a single parameter but become less readable when additional parameters are required, e.g. filtering with a given threshold

GreaterThan = nut_filter(lambda x, threshold: x > threshold)
Range(10) >> GreaterThan(5) >> Collect()

In this case decorators are a better solution

@nut_filter
def GreaterThan(x, threshold):
    return x > threshold

Range(10) >> GreaterThan(5) >> Collect()

Invocation vs definition

Note that for wrappers and decorators the arguments differ depending on whether the nut is defined or invoked

definitions:

GreaterThan = nut_filter(lambda x, threshold: ...)
@nut_filter
def GreaterThan(x, threshold): ...

invocation:

x >> GreaterThan(threshold)

When invoked, the first argument of the nut (here x) appears as input on the left side of the >> operator and the remaining parameters appear in brackets.

In rare (more advanced) cases custom nuts can be implemented as classes derived from the relevant base classes (see base.py). Here is an example implementation of the GreaterThan nut as a class

class GreaterThan(Nut):
    def __init__(self, threshold):
        self.threshold = threshold

    def __rrshift__(self, iterable):  # >> operator
        for x in iterable:
            if x > self.threshold:
                yield x

However, decorators and wrappers are shortcuts to create nut classes and the preferred method to implement custom nuts.
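
To illustrate how a decorator can act as a shortcut for such a class, here is a simplified sketch in plain Python. It is not the nuts-flow implementation: SketchNut and sketch_nut_filter are hypothetical names introduced only for this illustration, and the real base classes in base.py may work differently.

```python
class SketchNut:
    """Hypothetical stand-in for a nut base class (not nutsflow.Nut).

    It only remembers the arguments given when the nut is invoked.
    """

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs


def sketch_nut_filter(func):
    """Hypothetical decorator that turns a predicate into a filter-like class."""

    class Wrapper(SketchNut):
        def __rrshift__(self, iterable):  # called for: iterable >> Wrapper(...)
            # The element comes from the left side of >>, the remaining
            # parameters come from the brackets of the invocation.
            return (x for x in iterable if func(x, *self.args, **self.kwargs))

    Wrapper.__name__ = func.__name__
    return Wrapper


@sketch_nut_filter
def GreaterThan(x, threshold):
    return x > threshold


result = list(range(10) >> GreaterThan(5))
print(result)
```

Because range does not define >> itself, Python falls back to the __rrshift__ method of the object on the right, which is what allows a plain iterable to feed the sketch.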

Nut types

nuts-flow provides six different types of wrappers/decorators

nut_source

Typical cases for custom nut sources are the reading of files in specific formats or wrappers around databases. Here are two toy examples, a wrapper and a decorator, for a nut that generates n even numbers. First the wrapper approach

>>> EvenNumbers = nut_source(lambda n: (2*x for x in range(n)))

and here the decorator version

@nut_source
def EvenNumbers(n):
    return (2*x for x in range(n))

Both can be used as follows

>>> EvenNumbers(4) >> Collect()
[0, 2, 4, 6]

nut_sink

Sinks receive an iterable and can return any result (not necessarily an iterable). The following example re-implements the Join sink that already exists in nuts-flow using a wrapper

>>> Join = nut_sink(lambda it, sep: sep.join(map(str, it)))
>>> Range(5) >> Join(':')
'0:1:2:3:4'

or using the decorator method

@nut_sink
def Join(iterable, sep):
    return sep.join(map(str, iterable))

Note that while Join is a sink, it returns an iterable (here a string) and can therefore serve as input to other nuts

>>> Range(5) >> Join(':') >> Count()
9

The general rule is, if a nut collects/aggregates data in memory or does not return an iterable result, it should be implemented as a sink (despite being able to be input to other nuts). On the other hand, if a nut processes data on-the-fly and returns an iterator it should not be a sink.
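
The distinction between on-the-fly processing and aggregation can be sketched with plain Python functions. This example deliberately avoids nuts-flow itself; doubled and total are hypothetical helpers that only mimic the behaviour of a processor and a sink.

```python
from itertools import count, islice


def doubled(iterable):
    """Processor-like: a lazy generator that yields one result per input."""
    for x in iterable:
        yield 2 * x


def total(iterable):
    """Sink-like: consumes the whole iterable and returns a single value."""
    return sum(iterable)


# A lazy step can run on an endless input because nothing is held in memory.
first_three = list(islice(doubled(count()), 3))

# An aggregating step needs a finite input and returns a non-iterable result.
grand_total = total(doubled(range(4)))

print(first_three, grand_total)
```

Under this analogy, doubled would be written as a processor (or a nut function) and total as a sink.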

nut_function

A nut function is a nut that is applied to each element in the data flow and returns a result for each element. Consequently, when a nut function is applied to a data flow the values of the elements change but not their number. The following example function multiplies each element of the data flow by n

>>> Times = nut_function(lambda x, n: x * n)

and here the same function via a decorator

@nut_function
def Times(x, n):
    return x * n

Usage is identical for both the wrapper and the decorator

>>> Range(5) >> Times(2) >> Collect()
[0, 2, 4, 6, 8]

nut_processor

A nut processor takes an iterable and returns an iterable, but the number of elements in the output iterable can differ from the input, which is not the case for a nut_function. If the number of elements doesn't change, both methods can be used but a nut_function will be simpler. For instance, here is the Times nut re-implemented as a processor:

>>> Times = nut_processor(lambda iterable, n: (x * n for x in iterable))

Processors are needed if the number of elements in the flow changes, e.g. here a processor nut that duplicates each element of the flow

@nut_processor
def Duplicate(iterable):
    for e in iterable:
        yield e
        yield e

>>> Range(5) >> Duplicate() >> Collect()
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

or, more generically, a processor that clones each element n times

@nut_processor
def Clone(iterable, n):
    for e in iterable:
        for _ in range(n):
            yield e

>>> Range(5) >> Clone(2) >> Collect()
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

Processors can be used to filter elements from a data flow but typically the filter nuts described next are more appropriate and easier to implement.

nut_filter

As described above, nut filters extract elements from a data flow. Here is a nut that extracts all numbers that are in a given interval

>>> InInterval = nut_filter(lambda x, a, b: a <= x <= b)

and the same filter implemented using the decorator

@nut_filter
def InInterval(x, a, b):
    return a <= x <= b

and how it is used

>>> Range(10) >> InInterval(3, 6) >> Collect()
[3, 4, 5, 6]

nut_filterfalse

Occasionally it is easier to implement a filter that extracts the elements that do not meet a given condition. The nut_filterfalse wrapper/decorator is available for this use case. For instance, the following nut keeps only the elements that are not equal to a given value

>>> Not = nut_filterfalse(lambda x, val: x == val)

or implemented via the decorator

@nut_filterfalse
def Not(x, val):
    return x == val

and a usage example

>>> [1, 2, 3, 4] >> Not(2) >> Collect()
[1, 3, 4]

nut_filterfalse is largely used to wrap existing predicate functions as nuts. For example, given a function isnull(x) we can simply write

IsValid = nut_filterfalse(isnull)

which is shorter and more readable than

IsValid = nut_filter(lambda x: not isnull(x))
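
The same trade-off exists in the Python standard library, where itertools.filterfalse is the counterpart of the built-in filter. The following sketch uses only the standard library, not nuts-flow, and the isnull predicate is a hypothetical stand-in for whatever existing function is being wrapped.

```python
from itertools import filterfalse


def isnull(x):
    """Hypothetical predicate: treats None as a null value."""
    return x is None


data = [1, None, 2, None, 3]

# Keep the elements for which the predicate is False ...
via_filterfalse = list(filterfalse(isnull, data))

# ... which matches filtering with the negated predicate.
via_filter = list(filter(lambda x: not isnull(x), data))

print(via_filterfalse, via_filter)
```

Both variants keep the same elements; the filterfalse form simply avoids the extra lambda.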