Prerequisites¶
nuts-flow is based on iterators and makes frequent use of lambda functions. If you are already familiar with these concepts go ahead and skip this section.
Lambda functions¶
Commonly functions are defined via the def
keyword and a function name,
e.g.:
def add(a, b):
return a + b
Lambda functions or so called anonymous functions are an alternative method
to define very short functions (without a name) that are typically used only once.
For instance, the add
function above can be written as follows
lambda a, b: a + b
Since functions are first class citizens in Python they can be assigned to variables and called by name as well
>>> add = lambda a, b: a + b
>>> add(1, 2)
3
The most common use case, however, is as a anonymous function for other
functions such as sorted
, max
or filter
. For example,
to extract numbers greater than 2 from a list we could write
>>> numbers = [1, 2, 3, 4]
>>> filter(lambda x: x > 2, numbers)
[3, 4]
nuts-flow has a special notation for even shorter function definitions, following the underscore notation from Scala. Using the underscore, the above filtering can be expressed even more succinctly as
>>> from nutsflow import _
>>> filter(_ > 2, numbers)
[3, 4]
The underscore essentially serves as a place holder for the numbers of the list.
Note that the underscore notation in nuts-flow is very limited and only
simple expression (e.g. _ + 1
, _ <= 3
, …) are supported. More details
can be found in Section Underscore syntax .
Iterators¶
Iterators are needed to process data that doesn’t fit in memory, e.g. lines of a very large file, permutations of a string, …, or even infinitely large data such as counters or random numbers.
A Python Iterator is any object that
provides a next
method, which returns elements when called and raises a
StopIteration
exception when depleted. Here an iterator that returns
even numbers up to a given maximum
>>> class Even():
... def __init__(self, maximum):
... self.counter = 0
... self.maximum = maximum
...
... def __iter__(self):
... return self
...
... def __next__(self):
... self.counter += 2
... if self.counter > self.maximum:
... raise StopIteration
... return self.counter
...
The __iter__
method make the iterator iterable and enables
its usage in for
loops, list comprehensions or functions
that take iterables
>>> even = Even(6)
>>> for e in even:
... print e
2
4
6
There are three important properties of iterators to keep in mind.
Firstly, an iterator is lazy. It doesn’t produce anything until asked.
There needs to be a consumer.
For instance, even = Even(100000)
creates the iterator but does not
create any numbers.
Secondly, an iterator has state and subsequent calls will advance its state. Thirdly, once an iterator is depleted it needs to be recreated to be used again
>>> even = Even(10)
>>> [e for e in even]
[2, 4, 6, 8, 10]
>>> [e for e in even]
[]
>>> even = Even(10)
>>> [e for e in even]
[2, 4, 6, 8, 10]
Iterators can be chained to build complex data processing pipelines that consume very little memory. Python’s itertools library provides many functions for this purpose. The following toy example uses itertools to extract the first three integers greater than five in the interval [0..8[
>>> from itertools import islice, ifilter
>>> list(islice(ifilter(lambda x: x > 5, xrange(8)), 3))
[6, 7]
nuts-flow is largely based on Python’s itertools but aims to
make the data flow more explict and readable by introducing
the >>
operator for chaining
>>> from nutsflow import Range, Filter, Take, Collect, _
>>> Range(8) >> Filter(_ > 5) >> Take(3) >> Collect()
[6, 7]