Prerequisites
=============
**nuts-flow** is based on *iterators* and makes frequent use of *lambda* functions.
If you are already familiar with these concepts go ahead and skip this section.
Lambda functions
----------------
Commonly functions are defined via the ``def`` keyword and a function name,
e.g.:
.. code:: pythonun
def add(a, b):
return a + b
*Lambda* functions or so called *anonymous* functions are an alternative method
to define very short functions (without a name) that are typically used only once.
For instance, the ``add`` function above can be written as follows
.. code:: python
lambda a, b: a + b
Since functions are first class citizens in Python they can be assigned
to variables and called by name as well
>>> add = lambda a, b: a + b
>>> add(1, 2)
3
The most common use case, however, is as a anonymous function for other
functions such as ``sorted``, ``max`` or ``filter``. For example,
to extract numbers greater than 2 from a list we could write
>>> numbers = [1, 2, 3, 4]
>>> filter(lambda x: x > 2, numbers)
[3, 4]
**nuts-flow** has a special notation for even shorter function definitions,
following the *underscore notation* from `Scala `_.
Using the underscore, the above filtering can be expressed
even more succinctly as
>>> from nutsflow import _
>>> filter(_ > 2, numbers)
[3, 4]
The underscore essentially serves as a place holder for the numbers of the list.
Note that the underscore notation in **nuts-flow** is very limited and only
simple expression (e.g. ``_ + 1``, ``_ <= 3``, ...) are supported. More details
can be found in Section :ref:`Underscore syntax` .
Iterators
---------
Iterators are needed to process data that doesn't fit in memory, e.g. lines of a
very large file, permutations of a string, ..., or even infinitely large data such
as counters or random numbers.
A Python `Iterator `_ is any object that
provides a ``next`` method, which returns elements when called and raises a
``StopIteration`` exception when depleted. Here an iterator that returns
even numbers up to a given maximum
>>> class Even():
... def __init__(self, maximum):
... self.counter = 0
... self.maximum = maximum
...
... def __iter__(self):
... return self
...
... def __next__(self):
... self.counter += 2
... if self.counter > self.maximum:
... raise StopIteration
... return self.counter
...
The ``__iter__`` method make the iterator *iterable* and enables
its usage in ``for`` loops, list comprehensions or functions
that take iterables
>>> even = Even(6)
>>> for e in even:
... print e
2
4
6
There are three important properties of iterators to keep in mind.
Firstly, an iterator is lazy. It doesn't produce anything until asked.
There needs to be a consumer.
For instance, ``even = Even(100000)`` creates the iterator but does not
create any numbers.
Secondly, an iterator has state and subsequent calls will advance its state.
Thirdly, once an iterator is depleted it needs to be recreated to be used
again
>>> even = Even(10)
>>> [e for e in even]
[2, 4, 6, 8, 10]
>>> [e for e in even]
[]
>>> even = Even(10)
>>> [e for e in even]
[2, 4, 6, 8, 10]
Iterators can be chained to build complex data processing pipelines
that consume very little memory. Python's
`itertools `_
library provides many functions for this purpose. The following toy
example uses itertools to extract the first three integers greater
than five in the interval [0..8[
>>> from itertools import islice, ifilter
>>> list(islice(ifilter(lambda x: x > 5, xrange(8)), 3))
[6, 7]
**nuts-flow** is largely based on Python’s itertools but aims to
make the data flow more explict and readable by introducing
the ``>>`` operator for chaining
>>> from nutsflow import Range, Filter, Take, Collect, _
>>> Range(8) >> Filter(_ > 5) >> Take(3) >> Collect()
[6, 7]