nutsflow package

Submodules

nutsflow.base module

class Nut(*args, **kwargs)[source]

Bases: object

Base class for all Nuts. Iterables or functions wrapped in Nuts can be chained using the ‘>>’ operator. The aim is code with an explicit data flow. See the following example using Python iterators versus Nuts:

>>> from six.moves import filter, range
>>> from itertools import islice
>>> list(islice(filter(lambda x: x > 5, range(10)), 3))
[6, 7, 8]
>>> from nutsflow import Range, Filter, Take, Collect, _
>>> Range(10) >> Filter(_ > 5) >> Take(3) >> Collect()
[6, 7, 8]
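Chaining relies on Python's reflected operators. For `a >> b`, if the left operand does not implement `__rshift__`, Python calls `b.__rrshift__(a)`. Below is a minimal sketch of the flow above in plain Python that relies only on that mechanism. The `Mini*` classes are hypothetical and are not nutsflow's implementation:

```python
from itertools import islice


class MiniNut:
    """Hypothetical stand-in for a Nut base class (not nutsflow's code)."""

    def __rrshift__(self, iterable):
        # Called for `iterable >> nut` because lists, ranges and generators
        # do not define __rshift__ themselves.
        raise NotImplementedError


class MiniFilter(MiniNut):
    def __init__(self, pred):
        self.pred = pred

    def __rrshift__(self, iterable):
        return (x for x in iterable if self.pred(x))  # lazy filtering


class MiniTake(MiniNut):
    def __init__(self, n):
        self.n = n

    def __rrshift__(self, iterable):
        return islice(iterable, self.n)  # lazily stop after n elements


class MiniCollect(MiniNut):
    def __rrshift__(self, iterable):
        # A sink: consumes the whole stream and returns a single value.
        return list(iterable)


result = range(10) >> MiniFilter(lambda x: x > 5) >> MiniTake(3) >> MiniCollect()
print(result)  # [6, 7, 8]
```

Each intermediate step returns a lazy iterator, so no element is computed until the sink pulls data through the chain.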
__call__(iterable)[source]

A Nut (processor) can be called as a function, e.g. to apply it to elements of an iterable that are themselves iterables.

Parameters

iterable (iterable) – Iterable to process.

Returns

Iterable

Return type

iterable

__init__(*args, **kwargs)[source]

Constructor. Nuts (and derived classes) can have arbitrary arguments.

Parameters
  • args (args) – Positional arguments.

  • kwargs (kwargs) – Keyword arguments.

__rrshift__(iterable)[source]

Chaining operator for Nuts. Needs to be overridden!

Takes an input iterable and produces some output iterable. If the number of elements in the input and the output iterable does not change, consider NutFunction instead.

Parameters

iterable (iterable) – Iterable to process.

Returns

Iterable

Return type

iterable

Raise

NotImplementedError if not implemented.

class NutFunction(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

Nut functions are mapped onto each element of the input iterable.

Example: Square is a Nut function

>>> from nutsflow import Square, Collect, _
>>> [1,2,3] >> Square() >> Collect()
[1, 4, 9]
__call__(element)[source]

Override this method to transform the elements of an iterable.

Parameters

element – Element the function is applied to.

Returns

A transformed element

Return type

any

Raise

NotImplementedError if not implemented.

__rrshift__(iterable)[source]

Map function onto iterable and return transformed iterable. Do not override!

Parameters

iterable – Iterable whose elements the function is applied to.

Returns

transformed iterable.

Return type

iterable
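Because a NutFunction only transforms single elements, its `__rrshift__` can be a fixed lazy map over `__call__`, and subclasses only override `__call__`. Below is a sketch of that division of labor, using the hypothetical classes `MiniNutFunction` and `MiniSquare` (not the library code):

```python
class MiniNutFunction:
    """Hypothetical sketch of a NutFunction-like base class."""

    def __call__(self, element):
        # Subclasses override this to transform a single element.
        raise NotImplementedError

    def __rrshift__(self, iterable):
        # Fixed behavior: lazily apply __call__ to every element.
        return map(self, iterable)


class MiniSquare(MiniNutFunction):
    def __call__(self, element):
        return element * element


squares = list([1, 2, 3] >> MiniSquare())
print(squares)          # [1, 4, 9]
print(MiniSquare()(4))  # 16, the nut can also be called as a plain function
```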

class NutSink(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

Sinks are nuts that typically consume the entire input stream.

Sinks are typically at the end of a flow and aggregate the flow to a single output, e.g. the sum of its elements. Derived classes need to override __rrshift__()!

__call__(iterable)[source]

Sinks can serve as functions applied to iterables within a flow.

Parameters

iterable – Sink takes iterable as input

Returns

Output of sink

Return type

any

class NutSource(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

Sources are nuts that have no input iterable but produce an output iterable.

__rrshift__(iterable)[source]

Raises an exception when called. Sources have no input! Do not override! Override __iter__() instead.

Parameters

iterable (iterable) – Iterable

Raise

SyntaxError if called.

nutsflow.common module

class Redirect(channel='STDOUT')[source]

Bases: object

Redirect stdout or stderr to string.

>>> with Redirect() as out:
...     print('test')
>>> print(out.getvalue())
test
>>> import sys
>>> with Redirect('STDERR') as out:
...     print('error', file=sys.stderr)
>>> print(out.getvalue())
error
__init__(channel='STDOUT')[source]

Create a redirect for the given channel, either 'STDOUT' or 'STDERR'.
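Output can be captured with the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr`. The hypothetical `redirect()` helper below sketches the behavior shown above using those functions. Redirect itself may be implemented differently:

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def redirect(channel='STDOUT'):
    """Hypothetical sketch: capture stdout or stderr in a StringIO."""
    out = io.StringIO()
    if channel == 'STDOUT':
        target = contextlib.redirect_stdout(out)
    elif channel == 'STDERR':
        target = contextlib.redirect_stderr(out)
    else:
        raise ValueError("channel must be 'STDOUT' or 'STDERR'")
    with target:  # sys.stdout / sys.stderr point to `out` inside this block
        yield out


with redirect() as out:
    print('test')
with redirect('STDERR') as err:
    print('error', file=sys.stderr)
```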

class StableRandom(seed=None)[source]

Bases: random.Random

A pseudo random number generator that is stable across Python 2.x and 3.x. Use this only for unit tests or doctests. This class is derived from random.Random and supports all methods of the base class.

>>> rand = StableRandom(0)
>>> rand.random()
0.5488135024320365
>>> rand.randint(1, 10)
6
>>> lst = [1, 2, 3, 4, 5]
>>> rand.shuffle(lst)
>>> lst
[1, 3, 2, 5, 4]
__init__(seed=None)[source]

Initialize random number generator.

Parameters

seed (None|int) – Seed. If None the system time is used.

gauss_next()[source]

Return next gaussian random number.

Returns

Random number sampled from gaussian distribution.

Return type

float

getstate()[source]

Return state of generator.

Returns

Index and Mersenne Twister array.

Return type

tuple

jumpahead(n)[source]

Set state of generator far away from current state.

Parameters

n (int) – Distance to jump.

random()[source]

Return the next random number in the half-open interval [0, 1).

seed(seed=None)[source]

Set seed.

Parameters

seed (None|int) – Seed. If None the system time is used.

setstate(state)[source]

Set state of generator.

Parameters

state (tuple) – State to set as produced by getstate()

class Timer(fmt='%M:%S')[source]

Bases: object

A simple timer with a resolution of a second.

import time
from nutsflow.common import Timer

t = Timer(fmt="Duration: %M:%S")
time.sleep(2)  # something that takes some time, here 2 seconds
print(t)  # --> "Duration: 00:02"

with Timer() as t:
    time.sleep(2)
print(t)  # --> "00:02"

__init__(fmt='%M:%S')[source]

Creates a timer with the given time string format.

Parameters

fmt (str) – Format for time string, see time.strftime for details.

start()[source]

Starts the timer.

Note that the construction of Timer() already starts the timer.

Returns

None

stop()[source]

Stops the timer.

Returns

None
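One way to turn an elapsed number of seconds into a string using a `time.strftime` format is to pass the duration through `time.gmtime`. The hypothetical `SimpleTimer` below sketches Timer's documented behavior using that approach. It is not nutsflow's code:

```python
import time


class SimpleTimer:
    """Hypothetical timer with a resolution of one second."""

    def __init__(self, fmt='%M:%S'):
        self.fmt = fmt
        self.start()  # construction starts the timer, as documented for Timer

    def start(self):
        self.begin = time.time()
        self.end = None

    def stop(self):
        self.end = time.time()

    def elapsed(self):
        end = self.end if self.end is not None else time.time()
        return int(end - self.begin)  # truncate to whole seconds

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc):
        self.stop()
        return False  # do not suppress exceptions

    def __str__(self):
        # gmtime() turns the duration into a struct_time that strftime can format.
        return time.strftime(self.fmt, time.gmtime(self.elapsed()))


print(time.strftime('%M:%S', time.gmtime(125)))  # 02:05
with SimpleTimer(fmt='Duration: %M:%S') as t:
    pass
print(t)  # Duration: 00:00
```

Because `gmtime()` interprets the duration as seconds since the epoch, the formatted value wraps around after 60 minutes with `%M:%S` and after 24 hours with `%H:%M:%S`.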

as_list(x)[source]

Return x as list.

If x is a single item, it is wrapped in a list; otherwise it is converted to a list, e.g. tuple => list.

Parameters

x (item or iterable) – Any item or iterable

Returns

list(x)

Return type

list

as_set(x)[source]

Return x as set.

If x is a single item, it is wrapped in a set; otherwise it is converted to a set, e.g. list => set.

Parameters

x (item or iterable) – Any item or iterable

Returns

set(x)

Return type

set

as_tuple(x)[source]

Return x as tuple.

If x is a single item, it is wrapped in a tuple; otherwise it is converted to a tuple, e.g. list => tuple.

Parameters

x (item or iterable) – Any item or iterable

Returns

tuple(x)

Return type

tuple

colfunc(key)[source]

Return function that extracts element from columns.

Used to create key functions when only column index or tuple of column indices is given. For instance:

>>> data = ['a3', 'c1', 'b2']
>>> sorted(data, key=colfunc(0))  # == sorted(data, key=lambda s: s[0])
['a3', 'b2', 'c1']
>>> sorted(data, key=colfunc(1))
['c1', 'b2', 'a3']
>>> list(map(colfunc((1,0)), data))
[['3', 'a'], ['1', 'c'], ['2', 'b']]
Parameters

key (int|tuple|function|None) – Column index, tuple of column indices, or function. If None, the identity function is returned.

Returns

Column extraction function.

Return type

function
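The examples above suggest a dispatch on the type of `key`: None, a callable, a tuple of indices, or a single index. Below is a sketch of that logic as a hypothetical `make_colfunc()`. It is inferred from the examples and not copied from nutsflow:

```python
def make_colfunc(key):
    """Hypothetical sketch: build a key function from an index, indices or a callable."""
    if key is None:
        return lambda x: x  # identity
    if callable(key):
        return key  # already a function
    if isinstance(key, (tuple, list)):
        return lambda x: [x[i] for i in key]  # several columns, as a list
    return lambda x: x[key]  # single column index


data = ['a3', 'c1', 'b2']
print(sorted(data, key=make_colfunc(1)))      # ['c1', 'b2', 'a3']
print(list(map(make_colfunc((1, 0)), data)))  # [['3', 'a'], ['1', 'c'], ['2', 'b']]
```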

console(*args, **kwargs)[source]

Print to stdout and flush.

Wrapper around Python’s print function that ensures flushing after each call.

>>> console('test')
test
Parameters
  • args – Arguments

  • kwargs – Key-Word arguments.

is_iterable(obj)[source]

Return True if the object is iterable but not a string.

Parameters

obj (object) – Any object

Returns

True if object is iterable but not a string.

Return type

bool
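The conversion helpers as_list(), as_set() and as_tuple() depend on telling single items apart from iterables, with strings counted as single items. Below is a sketch of that logic. `iterable_not_str()` and `to_list()` are hypothetical names, and using `iter()` as the iterability test is an assumption:

```python
def iterable_not_str(obj):
    """Hypothetical sketch of is_iterable(): True for non-string iterables."""
    if isinstance(obj, str):
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def to_list(x):
    """Hypothetical sketch of as_list(): wrap single items, convert iterables."""
    return list(x) if iterable_not_str(x) else [x]


print(to_list(3))       # [3]
print(to_list((1, 2)))  # [1, 2]
print(to_list('ab'))    # ['ab'], a string counts as a single item
```

as_set() and as_tuple() follow the same pattern with `set()` and `tuple()` in place of `list()`.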

isnan(x)[source]

Check if something is NaN.

>>> import numpy as np
>>> isnan(np.nan)
True
>>> isnan(0)
False
Parameters

x (object) – Any object

Returns

True if x is NaN

Return type

bool

istensor(x, attrs=['shape', 'dtype', 'min', 'max'])[source]

Return True if x has the attributes shape, dtype, min and max.

This is the case for NumPy arrays and PyTorch tensors.

>>> import numpy as np
>>> M = np.zeros((2,3))
>>> istensor(M)
True
>>> istensor([1,2,3])
False
Parameters
  • x (object) – Any object

  • attrs (list[str]) – Object attributes that ‘define’ a tensor.

Returns

True if x is some tensor object.

itemize(x)[source]

Extract item from a list/tuple with only one item.

>>> itemize([3])
3
>>> itemize([3, 2, 1])
[3, 2, 1]
>>> itemize([])
[]
Parameters

x (list|tuple) – An indexable collection

Returns

The item if the collection contains exactly one item, otherwise the collection itself.

Return type

object|list|tuple

print_type(data)[source]

Print type of (structured) data

Useful when printing structured data types that contain (large) NumPy matrices or PyTorch/Tensorflow tensors.

>>> import numpy as np
>>> from nutsflow import Consume, Take
>>> a = np.zeros((3, 4), dtype='uint8')
>>> data = [[a], (1.1, 2)]
>>> print_type(data)
[[<ndarray> 3x4:uint8], (<float> 1.1, <int> 2)]
>>> from collections import namedtuple
>>> Sample = namedtuple('Sample', 'x,y')
>>> data = Sample(a, 1)
>>> print_type(data)
Sample(x=<ndarray> 3x4:uint8, y=<int> 1)
Parameters

data (object) – Any data type.

Returns

Structured representation of the data type.

Return type

str

sec_to_hms(duration)[source]

Return hours, minutes and seconds for given duration.

>>> sec_to_hms('80')
(0, 1, 20)
Parameters

duration (int|str) – Duration in seconds. Can be int or string.

Returns

tuple (hours, minutes, seconds)

Return type

(int, int, int)

shapestr(array, with_dtype=False)[source]

Return string representation of array shape.

>>> import numpy as np
>>> a = np.zeros((3,4))
>>> shapestr(a)
'3x4'
>>> a = np.zeros((3,4), dtype='uint8')
>>> shapestr(a, True)
'3x4:uint8'
Parameters
  • array (ndarray) – Numpy array

  • with_dtype (bool) – Append dtype of array to shape string

Returns

Shape as string, e.g. shape (3, 4) becomes '3x4'

Return type

str

stype(obj)[source]

Return string representation of structured objects.

>>> import numpy as np
>>> a = np.zeros((3,4), dtype='uint8')
>>> b = np.zeros((1,2), dtype='float32')
>>> stype(a)
'<ndarray> 3x4:uint8'
>>> stype(b)
'<ndarray> 1x2:float32'
>>> stype([a, (b, b)])
'[<ndarray> 3x4:uint8, (<ndarray> 1x2:float32, <ndarray> 1x2:float32)]'
>>> stype([1, 2.0, [a], [b]])
'[<int> 1, <float> 2.0, [<ndarray> 3x4:uint8], [<ndarray> 1x2:float32]]'
>>> stype({'a':a, 'b':b, 'c':True})
'{a:<ndarray> 3x4:uint8, b:<ndarray> 1x2:float32, c:<bool> True}'
>>> from collections import namedtuple
>>> Sample = namedtuple('Sample', 'x,y')
>>> sample = Sample(a, 1)
>>> stype(sample)
'Sample(x=<ndarray> 3x4:uint8, y=<int> 1)'
Parameters

obj (object) – Any object

Returns

String representation of object where arrays are replaced by their shape and dtype descriptions

Return type

str

timestr(duration, fmt='{:d}:{:02d}:{:02d}')[source]

Return duration as formatted time string or empty string if no duration

>>> timestr('80')
'0:01:20'
Parameters
  • duration (int|str) – Duration in seconds. Can be int or string.

  • fmt (str) – Format for string, e.g. ‘{:d}:{:02d}:{:02d}’

Returns

duration as formatted time, e.g. ‘0:01:20’ or ‘’ if duration shorter than one second.

Return type

string
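Splitting a number of seconds into hours, minutes and seconds takes two applications of `divmod`. Below is a sketch of sec_to_hms() and timestr() as the hypothetical functions `split_hms()` and `format_duration()`, inferred from the documented behavior:

```python
def split_hms(duration):
    """Hypothetical sketch: duration in seconds (int or str) -> (h, m, s)."""
    minutes, seconds = divmod(int(duration), 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds


def format_duration(duration, fmt='{:d}:{:02d}:{:02d}'):
    """Hypothetical sketch: formatted duration, '' if shorter than one second."""
    if not duration or int(duration) < 1:
        return ''
    return fmt.format(*split_hms(duration))


print(split_hms('80'))        # (0, 1, 20)
print(format_duration('80'))  # 0:01:20
print(format_duration(3725))  # 1:02:05
```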

nutsflow.factory module

nut_filter(func)[source]

Decorator for Nut filters.

Also see nut_filterfalse(). Example on how to define a custom filter nut:

from nutsflow import Collect
from nutsflow.factory import nut_filter

@nut_filter
def Positive(x):
    return x > 0

[-1, 1, -2, 2] >> Positive() >> Collect()  # --> [1, 2]

@nut_filter
def GreaterThan(x, threshold):
    return x > threshold

[1, 2, 3, 4] >> GreaterThan(2) >> Collect()  # --> [3, 4]

Parameters

func (function) – Function to decorate. Must return boolean value.

Returns

Nut filter for given function

Return type

Nut

nut_filterfalse(func)[source]

Decorator for inverted Nut filters, which keep the elements for which the function returns False.

Also see nut_filter(). Example on how to define a custom filter-false nut:

from nutsflow import Collect
from nutsflow.factory import nut_filterfalse

@nut_filterfalse
def NotGreaterThan(x, threshold):
    return x > threshold

[1, 2, 3, 4] >> NotGreaterThan(2) >> Collect()  # --> [1, 2]

Parameters

func (function) – Function to decorate. Must return boolean value.

Returns

Nut filter for given function.

Return type

Nut

nut_function(func)[source]

Decorator for Nut functions.

Example on how to define a custom function nut:

from nutsflow import Collect
from nutsflow.factory import nut_function

@nut_function
def TimesN(x, n):
    return x * n

[1, 2, 3] >> TimesN(2) >> Collect()  # --> [2, 4, 6]

Parameters

func (function) – Function to decorate

Returns

Nut function for given function

Return type

NutFunction
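A decorator like nut_function() has to bind the extra arguments (here `n`) when the nut is constructed and apply the function to each element when data arrives via `>>`. Below is a simplified sketch of that idea. `function_nut` is a hypothetical name, and it returns a plain class, whereas nut_function() returns a NutFunction as documented above:

```python
def function_nut(func):
    """Hypothetical decorator sketch: turn func(x, *args) into a chainable nut class."""

    class FunctionNut:
        def __init__(self, *args, **kwargs):
            # Extra arguments are bound at construction time, e.g. TimesN(2).
            self.args, self.kwargs = args, kwargs

        def __call__(self, x):
            return func(x, *self.args, **self.kwargs)

        def __rrshift__(self, iterable):
            # Map the bound function lazily over the input stream.
            return map(self, iterable)

    FunctionNut.__name__ = func.__name__
    return FunctionNut


@function_nut
def TimesN(x, n):
    return x * n


result = list([1, 2, 3] >> TimesN(2))
print(result)  # [2, 4, 6]
```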

nut_processor(func, iterpos=0)[source]

Decorator for Nut processors.

Examples of how to define a custom processor nut. Note that a processor reads an iterable and must return an iterable/generator.

from nutsflow import Collect
from nutsflow.factory import nut_processor

@nut_processor
def Twice(iterable):
    for e in iterable:
        yield e
        yield e

[1, 2, 3] >> Twice() >> Collect()  # --> [1, 1, 2, 2, 3, 3]

@nut_processor
def Odd(iterable):
    return (e for e in iterable if e % 2)

[1, 2, 3, 4, 5] >> Odd() >> Collect()  # --> [1, 3, 5]

@nut_processor
def Clone(iterable, n):
    for e in iterable:
        for _ in range(n):
            yield e

[1, 2, 3] >> Clone(2) >> Collect()  # --> [1, 1, 2, 2, 3, 3]

Parameters
  • func (function) – Function to decorate

  • iterpos – Position of iterable in function arguments

Returns

Nut processor for given function

Return type

Nut
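The iterpos argument tells the decorator where to insert the input iterable among the function's positional arguments. Below is a sketch of that mechanism with a hypothetical `processor_nut()` helper (not nutsflow's code), applied to a function that takes the iterable as its second argument:

```python
def processor_nut(func, iterpos=0):
    """Hypothetical sketch: func receives the whole stream at position iterpos."""

    class ProcessorNut:
        def __init__(self, *args, **kwargs):
            self.args, self.kwargs = args, kwargs

        def __rrshift__(self, iterable):
            # Insert the input iterable into the positional arguments at iterpos.
            args = list(self.args)
            args.insert(iterpos, iterable)
            return func(*args, **self.kwargs)

    ProcessorNut.__name__ = func.__name__
    return ProcessorNut


def _clone(n, iterable):
    for e in iterable:
        for _ in range(n):
            yield e


# The iterable is the second parameter of _clone, hence iterpos=1.
Clone = processor_nut(_clone, iterpos=1)
print(list([1, 2, 3] >> Clone(2)))  # [1, 1, 2, 2, 3, 3]
```

Binding the extra arguments at construction time and the stream at `>>` time is what lets `Clone(2)` appear in a pipeline before any data is available.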

nut_sink(func, iterpos=0)[source]

Decorator for Nut sinks.

Example on how to define a custom sink nut:

from nutsflow.factory import nut_sink

@nut_sink
def ToList(iterable):
    return list(iterable)

range(5) >> ToList()  # --> [0, 1, 2, 3, 4]

@nut_sink
def MyCollect(iterable, container):
    return container(iterable)

range(5) >> MyCollect(tuple)  # --> (0, 1, 2, 3, 4)

@nut_sink
def MyProd(iterable):
    p = 1
    for e in iterable:
        p *= e
    return p

[1, 2, 3] >> MyProd()  # --> 6
Parameters
  • func (function) – Function to decorate

  • iterpos – Position of iterable in function arguments

Returns

Nut sink for given function

Return type

NutSink

nut_source(func)[source]

Decorator for Nut sources.

Example on how to define a custom source nut. Note that a source must return an iterable/generator and does not read any input.

from nutsflow import Collect
from nutsflow.factory import nut_source

@nut_source
def MyRange(start, end):
    return range(start, end)

MyRange(0, 5) >> Collect()  # --> [0, 1, 2, 3, 4]

@nut_source
def MyRange2(start, end):
    for i in range(start, end):
        yield i * 2

MyRange2(0, 5) >> Collect()  # --> [0, 2, 4, 6, 8]
Parameters

func (function) – Function to decorate

Returns

Nut source for given function

Return type

NutSource

nutsflow.function module

class Counter(name, filterfunc=<function Counter.<lambda>>, value=0)[source]

Bases: nutsflow.base.NutFunction

Increment counter depending on elements in iterable. Intended mostly for debugging and monitoring. Avoid for standard processing of data. The function has side-effects but is thread-safe.

__call__(x)[source]

Increment counter.

Parameters

x (object) – Element in iterable

Returns

Unchanged element

Return type

Any

__init__(name, filterfunc=<function Counter.<lambda>>, value=0)[source]

counter = Counter(name, filterfunc, value)
iterable >> counter

>>> from nutsflow import Consume
>>> counter = Counter('smallerthan3', lambda x: x < 3, 1)
>>> range(10) >> counter >> Consume()
>>> counter
smallerthan3 = 4
Parameters
  • name (str) – Name of the counter

  • filterfunc (func) – Filter function. Count only elements where func returns True.

  • value (int) – Initial counter value

reset(value=0)[source]

Reset counter to given value.

Parameters

value (int) – Reset value
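Counter is documented as thread-safe. A common way to achieve this is a lock around the increment. The hypothetical `LockedCounter` below uses that approach and is not nutsflow's implementation:

```python
import threading


class LockedCounter:
    """Hypothetical sketch of a thread-safe, filtering pass-through counter."""

    def __init__(self, name, filterfunc=lambda x: True, value=0):
        self.name = name
        self.filterfunc = filterfunc
        self.value = value
        self._lock = threading.Lock()

    def __call__(self, x):
        if self.filterfunc(x):
            with self._lock:  # guard the read-modify-write of value
                self.value += 1
        return x  # pass the element through unchanged

    def reset(self, value=0):
        with self._lock:
            self.value = value

    def __repr__(self):
        return '{} = {}'.format(self.name, self.value)


counter = LockedCounter('smallerthan3', lambda x: x < 3, 1)
threads = [threading.Thread(target=lambda: [counter(x) for x in range(10)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # smallerthan3 = 13 (initial 1 + 4 threads * 3 matches)
```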

Format(x, fmt)[source]

iterable >> Format(fmt)

Return input as formatted string. For format definition see: https://docs.python.org/2/library/string.html

>>> from nutsflow import Collect
>>> [1, 2, 3] >> Format('num:{}') >> Collect()
['num:1', 'num:2', 'num:3']
>>> [(1, 2), (3, 4)] >> Format('{0}:{1}') >> Collect()
['1:2', '3:4']
Parameters
  • iterable (iterable) – Any iterable

  • fmt (string) – Formatting string, e.g. ‘{:02d}’

Returns

Returns inputs as strings formatted as specified

Return type

str

Get(x, start, end=None, step=None)[source]

iterable >> Get(start, end, step)

Extract elements from iterable. Equivalent to slicing [start:end:step] but per element of the iterable.

>>> from nutsflow import Collect
>>> [(1, 2, 3), (4, 5, 6)] >> Get(1) >> Collect()
[2, 5]
>>> [(1, 2, 3), (4, 5, 6)] >> Get(0, 2) >> Collect()
[(1, 2), (4, 5)]
>>> [(1, 2, 3), (4, 5, 6)] >> Get(0, 3, 2) >> Collect()
[(1, 3), (4, 6)]
>>> [(1, 2, 3), (4, 5, 6)] >> Get(None) >> Collect()
[(1, 2, 3), (4, 5, 6)]
Parameters
  • iterable (iterable) – Any iterable

  • x (indexable) – Any indexable input

  • start (int) – Start index for columns to extract from x. If start is None, x is returned.

  • end (int) – End index (not inclusive)

  • step (int) – Step index (same as slicing)

Returns

Extracted elements

Return type

object|list

GetCols(x, *columns)[source]

iterable >> GetCols(*columns)

Extract elements in given order from x. Also useful to change the order of or clone elements in x.

>>> from nutsflow import Collect
>>> [(1, 2, 3), (4, 5, 6)] >> GetCols(1) >> Collect()
[(2,), (5,)]
>>> [[1, 2, 3], [4, 5, 6]] >> GetCols(2, 0) >> Collect()
[(3, 1), (6, 4)]
>>> [[1, 2, 3], [4, 5, 6]] >> GetCols((2, 0)) >> Collect()
[(3, 1), (6, 4)]
>>> [(1, 2, 3), (4, 5, 6)] >> GetCols(2, 1, 0) >> Collect()
[(3, 2, 1), (6, 5, 4)]
>>> [(1, 2, 3), (4, 5, 6)] >> GetCols(1, 1) >> Collect()
[(2, 2), (5, 5)]
Parameters
  • iterable (iterable) – Any iterable

  • x (indexable) – Any indexable input

  • columns (int|tuple|args) – Indices of elements/columns in x to extract or a tuple with these indices.

Returns

Extracted elements

Return type

tuple
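For each element, Get() behaves like indexing or slicing, and GetCols() picks several indices into a tuple. Below is a sketch of both as plain functions, inferred from the examples above. `get()` and `getcols()` are hypothetical names:

```python
def get(x, start, end=None, step=None):
    """Hypothetical sketch of the per-element slicing described for Get()."""
    if start is None:
        return x  # Get(None) passes the element through
    if end is None and step is None:
        return x[start]  # single element
    return x[start:end:step]


def getcols(x, *columns):
    """Hypothetical sketch of GetCols(): pick columns in the given order, as a tuple."""
    if len(columns) == 1 and isinstance(columns[0], tuple):
        columns = columns[0]  # GetCols((2, 0)) is the same as GetCols(2, 0)
    return tuple(x[i] for i in columns)


rows = [(1, 2, 3), (4, 5, 6)]
print([get(r, 0, 3, 2) for r in rows])     # [(1, 3), (4, 6)]
print([getcols(r, 2, 0) for r in rows])    # [(3, 1), (6, 4)]
print([getcols(r, (1, 1)) for r in rows])  # [(2, 2), (5, 5)]
```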

Identity(x)[source]

iterable >> Identity()

Pass iterable through. Output is identical to input.

>>> from nutsflow import Collect
>>> [1, 2, 3] >> Identity() >> Collect()
[1, 2, 3]
Parameters
  • iterable (iterable) – Any iterable

  • x (any) – Any input

Returns

Returns input unaltered

Return type

object

NOP(x, *args)[source]

iterable >> NOP(*args)

No Operation. Useful to skip nuts. Same as commenting a nut out or removing it from a pipeline.

>>> from nutsflow import Collect
>>> [1, 2, 3] >> NOP(Square()) >> Collect()
[1, 2, 3]
Parameters
  • iterable (iterable) – Any iterable

  • x (object) – Any object

  • args (args) – Additional args are ignored.

Returns

Returns input unaltered

Return type

object

class Print(fmtfunc=None, every_sec=0, every_n=0, filterfunc=<function Print.<lambda>>, end='\n')[source]

Bases: nutsflow.base.NutFunction

Print elements in iterable.

__call__(x)[source]

Return element x and potentially print its value

__init__(fmtfunc=None, every_sec=0, every_n=0, filterfunc=<function Print.<lambda>>, end='\n')[source]
iterable >> Print(fmtfunc=None, every_sec=0, every_n=0, filterfunc=lambda x: True)

Return the input unchanged and print its elements to the console.

>>> from nutsflow import Consume
>>> [1, 2] >> Print() >> Consume()
1
2
>>> range(10) >> Print(every_n=3) >> Consume()
2
5
8
>>> even = lambda x: x % 2 == 0
>>> [1, 2, 3, 4] >> Print(filterfunc=even) >> Consume()
2
4
>>> [{'val': 1}, {'val': 2}] >> Print('number={val}') >> Consume()
number=1
number=2
>>> [[1, 2], [3, 4]] >> Print('number={1}:{0}') >> Consume()
number=2:1
number=4:3
>>> myfmt = lambda x: 'char='+x.upper()
>>> ['a', 'b'] >> Print(myfmt) >> Consume()
char=A
char=B
>>> range(5) >> Print('.', end=' ') >> Consume()
. . . . .
Parameters
  • x (object) – Any input

  • fmtfunc (string|function) – Format string or function. fmtfunc is a standard Python str.format() string, see https://docs.python.org/2/library/string.html or a function that returns a string.

  • every_sec (float) – Print every given second, e.g. to print every 2.5 sec every_sec = 2.5

  • every_n (int) – Print every n-th call.

  • end (str) – Ending of text printed.

  • filterfunc (function) – Boolean function to filter print.

Returns

Returns input unaltered

Return type

object

Raise

ValueError if fmtfunc is not string or function

class PrintColType(cols=None)[source]

Bases: nutsflow.base.NutFunction

__call__(data)[source]

Print data info.

Parameters

data (any) – Any type of iterable

Returns

data unchanged

Return type

same as data

__init__(cols=None)[source]

iterable >> PrintColType()

Print type and other information for column data (tuples).

>>> import numpy as np
>>> from nutsflow import Consume
>>> data = [(np.zeros((10, 20, 3)), 1), ('text', 2), 3]
>>> data >> PrintColType() >> Consume()
item 0: <tuple>
  0: <ndarray> shape:10x20x3 dtype:float64 range:0.0..0.0
  1: <int> 1
item 1: <tuple>
  0: <str> text
  1: <int> 2
item 2: <int>
  0: <int> 3
>>> [(1, 2), (3, 4)] >> PrintColType(1) >> Consume()
item 0: <tuple>
  1: <int> 2
item 1: <tuple>
  1: <int> 4
>>> from collections import namedtuple
>>> Sample = namedtuple('Sample', 'x,y')
>>> a = np.zeros((3, 4), dtype='uint8')
>>> b = np.ones((1, 2), dtype='float32')
>>> data = [Sample(a, 1), Sample(b, 2)]
>>> data >> PrintColType() >> Consume()
item 0: <Sample>
  x: <ndarray> shape:3x4 dtype:uint8 range:0..0
  y: <int> 1
item 1: <Sample>
  x: <ndarray> shape:1x2 dtype:float32 range:1.0..1.0
  y: <int> 2
Parameters

cols (int|tuple|None) – Indices of columns to show info for. None means all columns. Can be a single index or a tuple of indices.

Returns

input data unchanged

Return type

same as input data

class PrintType(prefix='')[source]

Bases: nutsflow.base.NutFunction

__call__(data)[source]

Print data info.

Parameters

data (object) – Any object.

Returns

data unchanged

Return type

same as object

__init__(prefix='')[source]

iterable >> PrintType()

Print type and shape information for structured data. This is especially useful for data containing (large) NumPy arrays or PyTorch/TensorFlow tensors.

>>> import numpy as np
>>> from nutsflow import Consume, Take
>>> a = np.zeros((3, 4), dtype='uint8')
>>> b = np.zeros((1, 2), dtype='float32')
>>> data = [(a, b), 1.1, [[a], 2]]
>>> data >> PrintType() >> Consume()
(<ndarray> 3x4:uint8, <ndarray> 1x2:float32)
<float> 1.1
[[<ndarray> 3x4:uint8], <int> 2]
>>> data >> Take(1) >> PrintType('dtype:') >> Consume()
dtype: (<ndarray> 3x4:uint8, <ndarray> 1x2:float32)
>>> from collections import namedtuple
>>> Sample = namedtuple('Sample', 'x,y')
>>> data = [Sample(a, 1), Sample(b, 2)]
>>> data >> PrintType() >> Consume()
Sample(x=<ndarray> 3x4:uint8, y=<int> 1)
Sample(x=<ndarray> 1x2:float32, y=<int> 2)

Note that there is also a function print_type() that prints individual data elements instead of data streams.

>>> data = [{'mat':a}, 2]
>>> print_type(data)
[{mat:<ndarray> 3x4:uint8}, <int> 2]
Parameters

prefix (str) – Prefix text printed before type

Returns

input data unchanged

Return type

same as input data

Sleep(x, duration=1)[source]

iterable >> Sleep(duration)

Return the input unchanged, but sleep for the given duration on each element.

>>> from nutsflow import Collect
>>> [1, 2, 3] >> Sleep(0.1) >> Collect()
[1, 2, 3]
Parameters
  • iterable (iterable) – Any iterable

  • x (object) – Any input

  • duration (float) – Sleeping time in seconds.

Returns

Returns input unaltered

Return type

object

Square(x)[source]

iterable >> Square()

Return squared input.

>>> from nutsflow import Collect
>>> [1, 2, 3] >> Square() >> Collect()
[1, 4, 9]
Parameters
  • iterable (iterable) – Any iterable over numbers

  • x (number) – Any number

Returns

Squared number

Return type

number

nutsflow.iterfunction module

class PrefetchIterator(iterable, num_prefetch=1)[source]

Bases: threading.Thread, object

Wrap an iterable in an iterator that prefetches elements.

Typically used to fetch samples or batches while the GPU processes the current batch. This keeps the CPU busy pre-processing data instead of waiting for the GPU to finish the batch.

>>> from __future__ import print_function
>>> for i in PrefetchIterator(range(4)):
...    print(i)
0
1
2
3
__init__(iterable, num_prefetch=1)[source]

Constructor.

Parameters
  • iterable (iterable) – Iterable elements are fetched from.

  • num_prefetch (int) – Number of elements to pre-fetch.

run()[source]

Put elements in input iterable into queue.
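The prefetching idea can be sketched with a background thread filling a bounded `queue.Queue`. `prefetch()` is a hypothetical generator, not the PrefetchIterator class, and the sketch assumes the input can safely be iterated from another thread:

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input


def prefetch(iterable, num_prefetch=1):
    """Hypothetical sketch: produce elements in a background thread.

    The bounded queue lets the producer run at most num_prefetch
    elements ahead of the consumer.
    """
    buffer = queue.Queue(maxsize=num_prefetch)

    def producer():
        for element in iterable:
            buffer.put(element)  # blocks while the buffer is full
        buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        element = buffer.get()
        if element is _DONE:
            return
        yield element


print(list(prefetch(range(4))))  # [0, 1, 2, 3]
```

This sketch does not forward exceptions raised inside the producer thread. A robust version would need to pass them on to the consumer.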

chunked(iterable, n)[source]

Split iterable in chunks of size n, where each chunk is also an iterator.

for chunk in chunked(range(10), 3):
    for element in chunk:
        print(element)

>>> it = chunked(range(7), 2)
>>> list(map(tuple, it))
[(0, 1), (2, 3), (4, 5), (6,)]
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • n – Chunk size

Returns

Chunked iterable

Return type

Iterator over iterators
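A lazy chunker that yields iterators can be built from `itertools.islice` and `itertools.chain`. Below is a sketch with the hypothetical name `chunked_sketch()`:

```python
from itertools import chain, islice


def chunked_sketch(iterable, n):
    """Hypothetical sketch: lazily split iterable into chunks of size n."""
    it = iter(iterable)
    for first in it:
        # Re-attach the element used to detect a new chunk, then take n - 1 more.
        yield chain([first], islice(it, n - 1))


print(list(map(tuple, chunked_sketch(range(7), 2))))  # [(0, 1), (2, 3), (4, 5), (6,)]
```

All chunks in this sketch share one underlying iterator, so each chunk must be fully consumed before the next one is requested.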

consume(iterable, n=None)[source]

Consume n elements of the iterable.

>>> it = iter([1,2,3,4])
>>> consume(it, 2)
>>> next(it)
3

See https://docs.python.org/2/library/itertools.html

Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • n – Number of elements to consume. For n=None all are consumed.

flatmap(func, iterable)[source]

Map function to iterable and flatten.

>>> f = lambda n: str(n) * n
>>> list( flatmap(f, [1, 2, 3]) )
['1', '2', '2', '3', '3', '3']
>>> list( map(f, [1, 2, 3]) )  # map instead of flatmap
['1', '22', '333']
Parameters
  • func (function) – Function to map on iterable.

  • iterable (iterable) – Any iterable, e.g. list, range, …

Returns

Iterator of iterable elements transformed via func and flattened.

Return type

Iterator

flatten(iterable)[source]

Return flattened iterable.

>>> list(flatten([(1,2), (3,4,5)]))
[1, 2, 3, 4, 5]
Parameters

iterable (iterable) – Iterable over iterables.

Returns

Iterator over flattened elements of iterable

Return type

Iterator

interleave(*iterables)[source]

Return generator that interleaves the elements of the iterables.

>>> list(interleave(range(5), 'abc'))
[0, 'a', 1, 'b', 2, 'c', 3, 4]
>>> list(interleave('12', 'abc', '+-'))
['1', 'a', '+', '2', 'b', '-', 'c']
Parameters

iterables (iterable) – Collection of iterables, e.g. lists, range, …

Returns

Interleaved iterables.

Return type

iterator
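Interleaving is a round-robin over the input iterables that skips exhausted ones. Below is a sketch using `itertools.zip_longest` with a private sentinel. `interleave_sketch()` is a hypothetical name:

```python
from itertools import zip_longest

_MISSING = object()  # marks positions of already exhausted iterables


def interleave_sketch(*iterables):
    """Hypothetical sketch: yield one element from each iterable in turn."""
    for group in zip_longest(*iterables, fillvalue=_MISSING):
        for element in group:
            if element is not _MISSING:
                yield element


print(list(interleave_sketch(range(5), 'abc')))  # [0, 'a', 1, 'b', 2, 'c', 3, 4]
```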

length(iterable)[source]

Return number of elements in iterable. Consumes iterable!

>>> length(range(10))
10
Parameters

iterable (iterable) – Any iterable, e.g. list, range, …

Returns

Length of iterable.

Return type

int

nth(iterable, n, default=None)[source]

Return n-th element of iterable. Consumes iterable!

>>> nth(range(10), 2)
2
>>> nth(range(10), 100, default=-1)
-1

https://docs.python.org/2/library/itertools.html#itertools.islice

Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • n – Index of element to retrieve.

  • default – Value to return when iterator is depleted

Returns

nth element

Return type

Any or default value.

partition(iterable, pred)[source]

Split iterable into two partitions based on predicate function

>>> pred = lambda x: x < 6
>>> smaller, larger = partition(range(10), pred)
>>> list(smaller)
[0, 1, 2, 3, 4, 5]
>>> list(larger)
[6, 7, 8, 9]
Parameters
  • iterable – Any iterable, e.g. list, range, …

  • pred – Predicate function.

Returns

Partition iterators

Return type

Two iterators
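An iterator can be split into two lazy partitions with `itertools.tee` plus `filter` and `itertools.filterfalse`. Below is a sketch with the hypothetical name `partition_sketch()`. It returns the true partition first to match the example above; the recipe in the itertools documentation returns the false partition first:

```python
from itertools import filterfalse, tee


def partition_sketch(iterable, pred):
    """Hypothetical sketch: (elements where pred is True, elements where it is False)."""
    t1, t2 = tee(iterable)
    return filter(pred, t1), filterfalse(pred, t2)


smaller, larger = partition_sketch(range(10), lambda x: x < 6)
print(list(smaller), list(larger))  # [0, 1, 2, 3, 4, 5] [6, 7, 8, 9]
```

Because `tee` buffers elements, consuming one partition far ahead of the other keeps the skipped elements in memory. The predicate is evaluated twice per element.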

take(iterable, n)[source]

Return iterator over first n elements of given iterable.

>>> list(take(range(10), 3))
[0, 1, 2]

See: https://docs.python.org/2/library/itertools.html#itertools.islice

Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • n (int) – Number of elements to take

Returns

Iterator over first n elements

Return type

iterator

unique(iterable, key=None)[source]

Return only unique elements in iterable. Potentially high memory consumption!

>>> list(unique([2,3,1,1,2,4]))
[2, 3, 1, 4]
>>> ''.join(unique('this is a test'))
'this ae'
>>> data = [(1,'a'), (2,'a'), (3,'b')]
>>> list(unique(data, key=lambda t: t[1]))
[(1, 'a'), (3, 'b')]
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • key – Function used to compare for equality.

Returns

Iterator over unique elements.

Return type

Iterator
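The memory warning comes from remembering every key seen so far. Below is a sketch that stores seen keys in a set. `unique_sketch()` is a hypothetical name, and the sketch assumes keys are hashable:

```python
def unique_sketch(iterable, key=None):
    """Hypothetical sketch: yield elements whose key has not been seen before."""
    seen = set()  # grows with the number of distinct keys
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element  # first occurrence wins, order is preserved


print(list(unique_sketch([2, 3, 1, 1, 2, 4])))  # [2, 3, 1, 4]
data = [(1, 'a'), (2, 'a'), (3, 'b')]
print(list(unique_sketch(data, key=lambda t: t[1])))  # [(1, 'a'), (3, 'b')]
```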

nutsflow.processor module

Append(iterable, items)[source]

iterable >> Append(items)

Append item(s) to lists/tuples in iterable.

>>> [(1, 2), (3, 4)] >> Append('X') >> Collect()
[(1, 2, 'X'), (3, 4, 'X')]
>>> items = ['a', 'b']
>>> [(1, 2), (3, 4)] >> Append(items) >> Collect()
[(1, 2, 'a'), (3, 4, 'b')]
>>> items = [('a', 'b'), ('c', 'd')]
>>> [(1, 2), (3, 4)] >> Append(items) >> Collect()
[(1, 2, 'a', 'b'), (3, 4, 'c', 'd')]
>>> from nutsflow import Enumerate
>>> [(1, 2), (3, 4)] >> Append(Enumerate()) >> Collect()
[(1, 2, 0), (3, 4, 1)]
Parameters
  • iterable (iterable) – Any iterable over tuples or lists

  • items (iterable|object) – A single object or an iterable over objects.

Returns

iterator where items are appended to the iterable elements.

Return type

iterator over tuples

class Cache(cachepath=None, clearcache=True, pick=1)[source]

Bases: nutsflow.base.Nut

A very naive implementation of a disk cache. Pickles elements of iterable to file system and loads them the next time instead of recomputing.

__init__(cachepath=None, clearcache=True, pick=1)[source]

iterable >> Cache()

Cache elements of iterable to disk. Only worth it if elements of iterable are time-consuming to produce and can be loaded faster from disk.

The pick parameter allows efficient retrieval of a subset of elements from the cache, e.g. every second element (pick=2) or a random subset, e.g. 30% (pick=0.3). Note that the cache is completely filled with the iterable but only a subset is retrieved. This is more efficient than iterable >> Cache() >> Pick().

from nutsflow import Cache, Collect, Consume

# expensive_op and process(i) stand for user-defined nuts.
data = range(100)
with Cache() as cache:
    for i in range(10):
        data >> expensive_op >> cache >> process(i) >> Consume()

cache = Cache()
for _ in range(100):
    data >> expensive_op >> cache >> Collect()
cache.clear()

with Cache('path/to/mycache') as cache:
    for _ in range(100):
        data >> expensive_op >> cache >> Collect()

with Cache(pick=2) as cache:
    for _ in range(100):
        data >> expensive_op >> cache >> Collect()

Parameters
  • iterable (iterable) – Any iterable

  • cachepath (string) – Path to a folder that stores the cached objects. If the path does not exist it will be created. The path with all its contents will be deleted when the cache is deleted. For cachepath=None a temporary folder will be created. Path to this folder is available in cache.path.

  • clearcache (bool) – Clear left-over cache if it exists.

  • pick (int|float) – Return elements from the cache with probability pick if pick is a float, otherwise return every pick-th element (see Pick() nut for details).

Returns

Iterator over elements

Return type

iterator

__rrshift__(iterable)[source]

Return elements in iterable considering pick.

Parameters

iterable (iterable) – Any iterable

Returns

Generator over input iterable.

Return type

Generator

clear()[source]

Clear cache

Chunk(iterable, n, container=None)[source]

iterable >> Chunk(n, container=None)

Split iterable into chunks of size n, where each chunk is also an iterator if no container is provided. See also GroupBySorted(), ChunkWhen(), ChunkBy().

>>> from nutsflow import Range, Map, Print, Join, Consume, Collect
>>> Range(5) >> Chunk(2) >> Map(list) >> Print() >> Consume()
[0, 1]
[2, 3]
[4]

The code can be shortened by providing a container in Chunk():

>>> Range(5) >> Chunk(2, list) >> Print() >> Consume()
[0, 1]
[2, 3]
[4]
>>> Range(6) >> Chunk(3, Join('_')) >> Print() >> Consume()
0_1_2
3_4_5
>>> Range(6) >> Chunk(3, sum) >> Collect()
[3, 12]
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • n (int) – Chunk size

  • container (container) – Some container, e.g. list, set, dict that can be filled from an iterable

Returns

Chunked iterable

Return type

Iterator over iterators or containers

ChunkBy(iterable, func, container=None)[source]

iterable >> ChunkBy(func, container=None)

Chunk iterable and create a new chunk every time func changes its return value. See also GroupBySorted(), Chunk(), ChunkWhen().

>>> [1,1, 2, 3,3,3] >> ChunkBy(lambda x: x, tuple) >> Collect()
[(1, 1), (2,), (3, 3, 3)]
>>> [1,1, 2, 3,3,3] >> ChunkBy(lambda x: x < 3, tuple)  >> Collect()
[(1, 1, 2), (3, 3, 3)]
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • func (function) – Function the iterable is chunked by

  • container (container) – Some container, e.g. list, set, dict that can be filled from an iterable

Returns

Chunked iterable

Return type

Iterator over iterators or containers
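Starting a new chunk whenever func changes its value is what `itertools.groupby` does for consecutive elements. Below is a sketch of ChunkBy() built on it. `chunk_by_sketch()` is a hypothetical name:

```python
from itertools import groupby


def chunk_by_sketch(iterable, func, container=None):
    """Hypothetical sketch: new chunk whenever func's value changes between neighbors."""
    for _, group in groupby(iterable, key=func):
        yield group if container is None else container(group)


print(list(chunk_by_sketch([1, 1, 2, 3, 3, 3], lambda x: x, tuple)))
# [(1, 1), (2,), (3, 3, 3)]
print(list(chunk_by_sketch([1, 1, 2, 3, 3, 3], lambda x: x < 3, tuple)))
# [(1, 1, 2), (3, 3, 3)]
```

Without a container, each group iterator from `groupby` becomes invalid once the next group is requested, so it must be consumed first.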

class ChunkWhen(func, container=None)[source]

Bases: nutsflow.base.Nut

__init__(func, container=None)[source]

iterable >> ChunkWhen(func, container=None)

Chunk iterable and create a new chunk every time func returns True. See also GroupBySorted(), Chunk(), ChunkBy().

>>> from nutsflow import Map, Join, Collect
>>> func = lambda x: x == 1
>>> [1,2,1,3,1,4,5] >> ChunkWhen(func, tuple) >> Collect()
[(1, 2), (1, 3), (1, 4, 5)]
>>> func = lambda x: x == 1
>>> [1,2,1,3,1,4,5] >> ChunkWhen(func, sum) >> Collect()
[3, 4, 10]
>>> func = lambda x: x == '|'
>>> '0|12|345|6' >> ChunkWhen(func, Join()) >> Collect()
['0', '|12', '|345', '|6']
Parameters
  • func (function) – Boolean function that indicates chunks. New chunk is created if return value is True.

  • container (container) – Some container, e.g. list, set, dict that can be filled from an iterable

__rrshift__(iterable)[source]
Parameters

iterable (iterable) – Iterable to create chunks for.

Returns

Iterator over chunks, where each chunk is an iterator itself if no container is provided

Return type

iterator over iterators or containers
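The splitting rule can be sketched in plain Python as below. The name chunk_when is hypothetical, the sketch defaults to list as container (nutsflow's default of None yields iterators instead), and, consistent with the examples above, no empty chunk is emitted when func is True for the very first element.

```python
def chunk_when(iterable, func, container=list):
    """Start a new chunk every time func(element) is True."""
    current = []
    for x in iterable:
        if func(x) and current:  # func fired: close the running chunk
            yield container(current)
            current = []
        current.append(x)
    if current:  # emit the final, possibly partial, chunk
        yield container(current)
```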

Clone(iterable, n)[source]

iterable >> Clone(n)

Clones elements in the iterable n times.

>>> from nutsflow import Range, Collect, Join
>>> Range(4) >> Clone(2) >> Collect()
[0, 0, 1, 1, 2, 2, 3, 3]
>>> 'abc' >> Clone(3) >> Join()
'aaabbbccc'
Parameters
  • iterable (iterable) – Any iterable

  • n (int) – Number of clones

Returns

Generator over cloned elements in iterable

Return type

generator

Combine = <function combinations>

iterable >> Combine(r)

Return r-length subsequences of elements from the input iterable. See https://docs.python.org/2/library/itertools.html#itertools.combinations

>>> 'ABC' >> Combine(2) >> Collect()
[('A', 'B'), ('A', 'C'), ('B', 'C')]
>>> [1, 2, 3, 4] >> Combine(3) >> Collect()
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
Parameters
  • iterable (iterable) – Any iterable

  • r (int) – Length of combinations

Returns

Iterable over combinations

Return type

Iterator

Concat(iterable, *iterables)[source]

iterable >> Concat(*iterables)

Concatenate iterables.

>>> from nutsflow import Range, Collect
>>> Range(5) >> Concat('abc') >> Collect()
[0, 1, 2, 3, 4, 'a', 'b', 'c']
>>> '12' >> Concat('abcd', '+-') >> Collect()
['1', '2', 'a', 'b', 'c', 'd', '+', '-']
Parameters
  • iterable (iterable) – Any iterable

  • iterables (iterable) – Iterables to concatenate

Returns

Concatenated iterators

Return type

iterator

Cycle = <function cycle>

iterable >> Cycle()

Cycle through iterable indefinitely. Memory consumption can be high if the iterable is large, since a copy of its elements is kept!

>>> [1, 2] >> Cycle() >> Take(5) >> Collect()
[1, 2, 1, 2, 1]
Parameters

iterable (iterable) – Any iterable, e.g. list, range, …

Returns

Cycled input iterable

Return type

Iterator

Dedupe(iterable, key=None)

iterable >> Dedupe([key])

Return only unique elements in iterable. Can have very high memory consumption if iterable is long and many elements are unique!

>>> [2,3,1,1,2,4] >> Dedupe() >> Collect()
[2, 3, 1, 4]
>>> data = [(1,'a'), (2,'a'), (3,'b')]
>>> data >> Dedupe(key=lambda t: t[1]) >> Collect()
[(1, 'a'), (3, 'b')]
>>> data >> Dedupe(_[1]) >> Collect()
[(1, 'a'), (3, 'b')]
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • key (function|None) – Function whose return value is used to compare elements for equality. None compares the elements themselves.

Returns

Iterator over unique elements.

Return type

Iterator
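The memory warning above follows from how order-preserving deduplication works: every key seen so far has to be remembered. A minimal sketch, assuming hashable keys; the name dedupe is hypothetical and this is not nutsflow's implementation.

```python
def dedupe(iterable, key=None):
    """Yield elements whose key has not been seen before, preserving order."""
    seen = set()  # grows with the number of distinct keys
    for x in iterable:
        k = x if key is None else key(x)
        if k not in seen:
            seen.add(k)
            yield x
```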

Drop(iterable, n)[source]

iterable >> Drop(n)

Drop first n elements in iterable.

>>> [1, 2, 3, 4] >> Drop(2) >> Collect()
[3, 4]
Parameters
  • iterable (iterable) – Any iterable

  • n (int) – Number of elements to drop

Returns

Iterator without dropped elements

Return type

iterator

DropWhile(iterable, func)[source]

iterable >> DropWhile(func)

Skip elements in iterable while predicate function is True.

>>> from nutsflow import _
>>> [0, 1, 2, 3, 0] >> DropWhile(_ < 2) >> Collect()
[2, 3, 0]
Parameters
  • iterable (iterable) – Any iterable

  • func (function) – Predicate function.

Returns

Iterable

Return type

Iterator

Filter = <function filter>

iterable >> Filter(func)

Filter elements from iterable based on predicate function. See https://docs.python.org/2/library/itertools.html#itertools.ifilter

>>> [0, 1, 2, 3] >> Filter(_ < 2) >> Collect()
[0, 1]
Parameters
  • iterable (iterable) – Any iterable

  • func (function) – Predicate function. Element is removed if False.

Returns

Filtered iterable

Return type

Iterator

FilterCol(iterable, columns, func)[source]

iterable >> FilterCol(columns, func)

Filter elements from iterable based on predicate function and specified column(s).

>>> is_even = lambda n: n % 2 == 0
>>> [(0, 'e'), (1, 'o'), (2, 'e')] >> FilterCol(0, is_even) >> Collect()
[(0, 'e'), (2, 'e')]
Parameters
  • iterable (iterable) – Any iterable

  • columns (int|tuple) – Column or columns to extract from each element before passing it on to the predicate function.

  • func (function) – Predicate function. Element is removed if False.

Returns

Filtered iterable

Return type

Iterator

FilterFalse = <function filterfalse>

iterable >> FilterFalse(func)

Filter elements from iterable based on predicate function. Same as Filter but elements are removed (not kept) if predicate function returns True. See https://docs.python.org/2/library/itertools.html#itertools.ifilterfalse

>>> [0, 1, 2, 3] >> FilterFalse(_ >= 2) >> Collect()
[0, 1]
Parameters
  • iterable (iterable) – Any iterable

  • func (function) – Predicate function. Element is removed if True.

Returns

Filtered iterable

Return type

Iterator

FlatMap(func, iterable)

iterable >> FlatMap(func)

Map function on iterable and flatten. Equivalent to iterable >> Map(func) >> Flatten()

>>> [[0], [1], [2]] >> FlatMap(_) >> Collect()
[0, 1, 2]
>>> [[0], [1], [2]] >> FlatMap(_ * 2) >> Collect()
[0, 0, 1, 1, 2, 2]
Parameters
  • iterable (iterable) – Any iterable.

  • func (function) – Mapping function.

Returns

Mapped and flattened iterable

Return type

Iterator

Flatten(iterable)[source]

iterable >> Flatten()

Flatten the iterables within the iterable; non-iterable elements are passed through unchanged. Only one level is flattened. Chain Flatten() calls to flatten deeper structures.

>>> from nutsflow import Collect
>>> [(1, 2), (3, 4, 5), 6] >> Flatten() >> Collect()
[1, 2, 3, 4, 5, 6]
>>> [(1, (2)), (3, (4, 5)), 6] >> Flatten() >> Flatten() >> Collect()
[1, 2, 3, 4, 5, 6]
Parameters

iterable (iterable) – Any iterable.

Returns

Flattened iterable

Return type

Iterator

FlattenCol(iterable, cols)[source]

iterable >> FlattenCol(cols)

Flattens the specified columns of the tuples/iterables within the iterable. Only one level is flattened.

[([1, 2], [3, 4]), ([5, 6], [7, 8])] >> FlattenCol((0, 1)) >> [(1, 3), (2, 4), (5, 7), (6, 8)]

If a column contains a single element (instead of an iterable), it is wrapped into a repeater. This allows flattening iterable columns together with non-iterable columns, e.g.

[([1, 2], 3), (6, [7, 8])] >> FlattenCol((0, 1)) >> [(1, 3), (2, 3), (6, 7), (6, 8)]

>>> from nutsflow import Collect
>>> data = [([1, 2], [3, 4]), ([5, 6], [7, 8])]
>>> data >> FlattenCol(0) >> Collect()
[(1,), (2,), (5,), (6,)]
>>> data >> FlattenCol((0, 1)) >> Collect()
[(1, 3), (2, 4), (5, 7), (6, 8)]
>>> data >> FlattenCol((1, 0)) >> Collect()
[(3, 1), (4, 2), (7, 5), (8, 6)]
>>> data >> FlattenCol((1, 1, 0)) >> Collect()
[(3, 3, 1), (4, 4, 2), (7, 7, 5), (8, 8, 6)]
>>> data = [([1, 2], 3), (6, [7, 8])]
>>> data >> FlattenCol((0, 1)) >> Collect()
[(1, 3), (2, 3), (6, 7), (6, 8)]
Parameters
  • iterable (iterable) – Any iterable.

  • cols (int|tuple) – Column index or indices.

Returns

Flattened columns of iterable

Return type

generator
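The repeater trick described above can be sketched with itertools.repeat and zip. The name flatten_col is hypothetical and this is not nutsflow's code. Two assumptions of the sketch: strings count as iterable columns, and at least one selected column must be iterable, otherwise zip() over repeaters would never stop.

```python
from itertools import repeat


def flatten_col(iterable, cols):
    """Flatten the selected columns of each element by one level."""
    cols = (cols,) if isinstance(cols, int) else cols
    for element in iterable:
        columns = []
        for c in cols:
            value = element[c]
            # Scalars become endless repeaters so zip() pairs them with
            # every item of the iterable columns of the same element.
            columns.append(value if hasattr(value, '__iter__') else repeat(value))
        yield from zip(*columns)
```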

GroupBy(iterable, keycol=<function <lambda>>, nokey=False)[source]

iterable >> GroupBy(keycol=lambda x: x, nokey=False)

Group elements of iterable based on a column value of the element or the function value of keycol for the element. Elements of iterable do not need to be sorted, but GroupBy stores all elements in memory! If the iterable is sorted, use GroupBySorted() instead. See also Chunk(), ChunkWhen(), ChunkBy().

>>> from nutsflow import Sort
>>> [1, 2, 1, 1, 3] >> GroupBy() >> Sort()
[(1, [1, 1, 1]), (2, [2]), (3, [3])]
>>> [1, 2, 1, 1, 3] >> GroupBy(nokey=True) >> Sort()
[[1, 1, 1], [2], [3]]
>>> ['--', '+++', '**'] >> GroupBy(len) >> Sort()
[(2, ['--', '**']), (3, ['+++'])]
>>> ['a3', 'b2', 'c1'] >> GroupBy(1) >> Sort()
[('1', ['c1']), ('2', ['b2']), ('3', ['a3'])]
>>> [(1,3), (2,2), (3,1)] >> GroupBy(1, nokey=True) >> Sort()
[[(1, 3)], [(2, 2)], [(3, 1)]]
Parameters
  • iterable (iterable) – Any iterable

  • keycol (int|function) – Column index or key function.

  • nokey (bool) – True: results will not contain keys for groups, only the groups themselves.

Returns

Iterator over groups.

Return type

iterator
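Because the input need not be sorted, GroupBy cannot emit any group before it has seen the whole input, which is why all elements end up in memory. A dictionary-based sketch of that behavior follows; the name group_by is hypothetical, tuple keycol values are not handled, and this is not nutsflow's implementation.

```python
from collections import defaultdict
from operator import itemgetter


def group_by(iterable, keycol=lambda x: x, nokey=False):
    """Group elements by key; input order does not matter."""
    key = itemgetter(keycol) if isinstance(keycol, int) else keycol
    groups = defaultdict(list)  # every element is stored until the end
    for x in iterable:
        groups[key(x)].append(x)
    return iter(groups.values()) if nokey else iter(groups.items())
```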

GroupBySorted(iterable, keycol=<function <lambda>>, nokey=False)[source]

iterable >> GroupBySorted(keycol=lambda x: x, nokey=False)

Group elements of iterable based on a column value of the element or the function value of keycol for the element. Iterable needs to be sorted according to keycol! See https://docs.python.org/2/library/itertools.html#itertools.groupby for details. If iterable is not sorted, use GroupBy, but be aware that it stores all elements of the iterable in memory! See also Chunk(), ChunkWhen(), ChunkBy().

>>> from nutsflow import Collect, nut_sink
>>> @nut_sink
... def ViewResult(iterable):
...     return iterable >> Map(lambda t: (t[0], list(t[1]))) >> Collect()
>>> [1, 1, 1, 2, 3] >> GroupBySorted() >> ViewResult()
[(1, [1, 1, 1]), (2, [2]), (3, [3])]
>>> [1, 1, 1, 2, 3] >> GroupBySorted(nokey=True) >> Map(list) >> Collect()
[[1, 1, 1], [2], [3]]
>>> ['--', '**', '+++'] >> GroupBySorted(len) >> ViewResult()
[(2, ['--', '**']), (3, ['+++'])]
Parameters
  • iterable (iterable) – Any iterable

  • keycol (int|function) – Column index or key function.

  • nokey (bool) – True: results will not contain keys for groups, only the groups themselves.

Returns

Iterator over groups where values are iterators.

Return type

iterator

If(iterable, cond, if_nut, else_nut=<nutsflow.factory.nut_function.<locals>.Wrapper object>)[source]

iterable >> If(cond, if_nut[, else_nut])

Depending on condition cond, execute if_nut or else_nut. Useful for conditional flows.

>>> from nutsflow import Square, Collect
>>> [1, 2, 3] >> If(True, Square()) >> Collect()
[1, 4, 9]
>>> [1, 2, 3] >> If(False, Square(), Take(1)) >> Collect()
[1]
Parameters
  • iterable (iterable) – Any iterable

  • cond (bool) – Boolean conditional value.

  • if_nut (Nut) – Nut to be executed if cond == True

  • else_nut (Nut) – Nut to be executed if cond == False

Returns

Result of if_nut or else_nut

Return type

Any

Insert(iterable, index, items)[source]

iterable >> Insert(index, items)

Insert item(s) into lists/tuples in iterable.

>>> [(1, 2), (3, 4)] >> Insert(1, 'X') >> Collect()
[(1, 'X', 2), (3, 'X', 4)]
>>> items = ['a', 'b']
>>> [(1, 2), (3, 4)] >> Insert(2, items) >> Collect()
[(1, 2, 'a'), (3, 4, 'b')]
>>> items = [('a', 'b'), ('c', 'd')]
>>> [(1, 2), (3, 4)] >> Insert(1, items) >> Collect()
[(1, 'a', 'b', 2), (3, 'c', 'd', 4)]
>>> from nutsflow import Enumerate
>>> [(1, 2), (3, 4)] >> Insert(0, Enumerate()) >> Collect()
[(0, 1, 2), (1, 3, 4)]
Parameters
  • iterable (iterable) – Any iterable over tuples or lists

  • index (int) – Index at which position items are inserted.

  • items (iterable|object) – A single object or an iterable over objects.

Returns

iterator where items are inserted into the iterable elements.

Return type

iterator over tuples

Interleave(iterable, *iterables)[source]

iterable >> Interleave(*iterables)

Interleave elements of iterable with elements of the given iterables. Similar to iterable >> Zip(*iterables) >> Flatten(), but the longest iterable determines the length of the interleaved iterator.

>>> from nutsflow import Range, Collect
>>> Range(5) >> Interleave('abc') >> Collect()
[0, 'a', 1, 'b', 2, 'c', 3, 4]
>>> '12' >> Interleave('abcd', '+-') >> Collect()
['1', 'a', '+', '2', 'b', '-', 'c', 'd']
Parameters
  • iterable (iterable) – Any iterable

  • iterables (iterable) – Iterables to interleave

Returns

Iterator over interleaved elements.

Return type

iterator
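The difference from Zip followed by Flatten can be seen in a plain-Python sketch based on itertools.zip_longest with a private sentinel, so exhausted inputs are skipped rather than padded. The name interleave is hypothetical and this is not nutsflow's code.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that marks an exhausted input


def interleave(iterable, *iterables):
    """Round-robin over all inputs until the longest one is exhausted."""
    for group in zip_longest(iterable, *iterables, fillvalue=_MISSING):
        for x in group:
            if x is not _MISSING:
                yield x
```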

Map = <function map>

iterable >> Map(func, *iterables)

Map function on iterable. See https://docs.python.org/2/library/itertools.html#itertools.imap

>>> [0, 1, 2] >> Map(_ * 2) >> Collect()
[0, 2, 4]
>>> ['ab', 'cde'] >> Map(len) >> Collect()
[2, 3]

>>> [2, 3, 10] >> Map(pow, [5, 2, 3]) >> Collect()
[32, 9, 1000]

Parameters
  • iterable (iterable) – Any iterable

  • iterables (iterables) – Any iterables.

  • func (function) – Mapping function.

Returns

Mapped iterable

Return type

Iterator

MapCol(iterable, columns, func)[source]

iterable >> MapCol(columns, func)

Apply given function to given columns of elements in iterable.

>>> neg = lambda x: -x
>>> [(1, 2), (3, 4)] >> MapCol(0, neg) >> Collect()
[(-1, 2), (-3, 4)]
>>> [(1, 2), (3, 4)] >> MapCol(1, neg) >> Collect()
[(1, -2), (3, -4)]
>>> [(1, 2), (3, 4)] >> MapCol((0, 1), neg) >> Collect()
[(-1, -2), (-3, -4)]
Parameters
  • iterable (iterable of iterables) – Any iterable that contains iterables

  • columns (int|tuple of ints) – Column index or tuple of indexes

  • func (function) – Function to apply to elements

Returns

Iterator over tuples

Return type

iterator over tuples
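Column-wise mapping amounts to rebuilding each element and applying func only at the selected positions. A minimal sketch that, like the examples above, returns tuples; the name map_col is hypothetical and this is not nutsflow's implementation.

```python
def map_col(iterable, columns, func):
    """Apply func to the given column(s) of each element; return tuples."""
    cols = {columns} if isinstance(columns, int) else set(columns)
    for element in iterable:
        yield tuple(func(e) if i in cols else e for i, e in enumerate(element))
```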

MapMulti(iterable, *funcs)[source]

iterable >> MapMulti(*funcs)

Map multiple functions on iterable. For each function a separate iterable is returned. Can consume large amounts of memory when iterables are processed sequentially!

>>> from nutsflow import Collect, _
>>> nums, twos, greater2 = [1, 2, 3] >> MapMulti(_, _ * 2, _ > 2)
>>> nums >> Collect()
[1, 2, 3]
>>> twos >> Collect()
[2, 4, 6]
>>> greater2 >> Collect()
[False, False, True]
Parameters
  • iterable (iterable) – Any iterable

  • funcs (functions) – Functions to map

Returns

Iterators for each function

Return type

(iterator, ..)

class MapPar(func, chunksize=4)[source]

Bases: nutsflow.base.Nut

__init__(func, chunksize=4)[source]

iterable >> MapPar(func, chunksize=mp.cpu_count())

Map function in parallel. Order of iterable is preserved. Note that MapPar is of limited use since func must be picklable, and only top-level functions (not lambdas, for instance) are picklable. See https://docs.python.org/2/library/pickle.html

>>> from nutsflow import Collect
>>> [-1, -2, -3] >> MapPar(abs) >> Collect()
[1, 2, 3]
Parameters
  • iterable (iterable) – Any iterable

  • func (function) – Function to map

  • chunksize (int) – Number of parallel processes to use for mapping.

Returns

Iterator over mapped elements

Return type

iterator

__rrshift__(iterable)[source]

Chaining operator for Nuts. Needs to be overridden!

Takes an input iterable and produces some output iterable. If the number of elements in the input and the output iterable does not change consider NutFunction instead.

Parameters

iterable (iterable) – Iterable to process.

Returns

Iterable

Return type

iterable

Raise

NotImplementedError if not implemented.

Partition(iterable, pred)

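partition1, partition2 = iterable >> Partition(pred)

Split iterable into two partitions based on a predicate function: the first partition contains the elements for which pred is True, the second those for which it is False.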

>>> smaller, larger = Range(5) >> Partition(_ < 3)
>>> smaller >> Collect()
[0, 1, 2]
>>> larger >> Collect()
[3, 4]
Parameters
  • iterable – Any iterable, e.g. list, range, …

  • pred – Predicate function.

Returns

Partition iterators

Return type

Two iterators
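A two-way partition can be sketched with itertools.tee, filter and filterfalse. Because tee buffers elements that one branch has consumed and the other has not, exhausting one partition before touching the other may hold many elements in memory. The name partition is hypothetical; note that the itertools documentation recipe of the same name returns the two parts in the opposite order, whereas this sketch follows the order shown in the example above.

```python
from itertools import filterfalse, tee


def partition(iterable, pred):
    """Return (elements where pred is True, elements where pred is False)."""
    t1, t2 = tee(iterable)  # tee buffers items one branch has not consumed yet
    return filter(pred, t1), filterfalse(pred, t2)
```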

Permutate = <function permutations>

iterable >> Permutate([r])

Return successive r length permutations of elements in the iterable. See https://docs.python.org/2/library/itertools.html#itertools.permutations

>>> 'ABC' >> Permutate(2) >> Collect()
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
Parameters
  • iterable (iterable) – Any iterable

  • r (int) – Length of the generated permutations. If r is not specified or is None, then r defaults to the length of the iterable and all possible full-length permutations are generated.

Returns

Iterable over permutations

Return type

Iterator

Pick(iterable, p_n, rand=None)[source]

iterable >> Pick(p_n)

Pick every p_n-th element from the iterable if p_n is an integer, otherwise pick randomly with probability p_n.

>>> from nutsflow import Range, Collect
>>> from nutsflow.common import StableRandom
>>> [1, 2, 3, 4] >> Pick(0.0) >> Collect()
[]
>>> [1, 2, 3, 4] >> Pick(1.0) >> Collect()
[1, 2, 3, 4]
>>> import random as rnd
>>> Range(10) >> Pick(0.5, StableRandom(1)) >> Collect()
[0, 4, 5, 6, 8, 9]
>>> [1, 2, 3, 4] >> Pick(2) >> Collect()
[1, 3]
Parameters
  • iterable (iterable) – Any iterable

  • p_n (float|int) – Probability p in [0, 1] or integer n for every n-th element

  • rand (Random|None) – Random number generator. If None, random.Random() is used.

Returns

Iterator over picked elements.

Return type

iterator
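The two modes of Pick can be sketched as follows. The name pick is hypothetical and this is not nutsflow's code; in particular, the random mode here will not reproduce the StableRandom output shown above. The sketch keeps indices 0, n, 2n, ... for an integer n, which matches the [1, 3] example.

```python
import random


def pick(iterable, p_n, rand=None):
    """Keep every p_n-th element (int) or each element with probability p_n (float)."""
    if isinstance(p_n, int):
        for i, x in enumerate(iterable):
            if i % p_n == 0:  # indices 0, p_n, 2*p_n, ...
                yield x
    else:
        rand = random.Random() if rand is None else rand
        for x in iterable:
            if rand.random() < p_n:  # random() returns a value in [0, 1)
                yield x
```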

Prefetch(iterable, num_prefetch=1)[source]

iterable >> Prefetch(num_prefetch=1)

Prefetch elements from iterable. Typically used to keep the CPU busy while the GPU is crunching.

>>> from nutsflow import Take, Consume
>>> it = iter([1, 2, 3, 4])
>>> it >> Prefetch(1) >> Take(1) >> Consume()
>>> next(it)   
3
Parameters
  • iterable (iterable) – Any iterable

  • num_prefetch (int) – Number of elements to prefetch.

Returns

Iterator over input elements

Return type

iterator
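One common way to prefetch is a background thread that fills a bounded queue while the consumer works. The sketch below illustrates that idea only; the name prefetch is hypothetical, it is not nutsflow's implementation, and exceptions raised while reading the input are not propagated to the consumer (the stream simply ends).

```python
import threading
from queue import Queue

_DONE = object()  # sentinel signalling the end of the input


def prefetch(iterable, num_prefetch=1):
    """Read ahead in a background thread; at most num_prefetch items wait in the queue."""
    buffer = Queue(maxsize=num_prefetch)  # put() blocks while the queue is full

    def producer():
        try:
            for x in iterable:
                buffer.put(x)
        finally:
            buffer.put(_DONE)  # always unblock the consumer, even on errors

    threading.Thread(target=producer, daemon=True).start()
    while True:
        x = buffer.get()
        if x is _DONE:
            return
        yield x
```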

class PrintProgress(data, title='progress:', every_sec=10.0)[source]

Bases: nutsflow.base.Nut

__init__(data, title='progress:', every_sec=10.0)[source]

iterable >> PrintProgress(data, title='progress:', every_sec=10.0)

Print progress on iterable. Requires that the length of the iterable is known beforehand. Data are passed through unchanged. For long-running computations an estimated time of arrival (ETA) is printed as well.

range(10) >> PrintProgress(10, 'numbers:', 0) >> Consume()

Parameters
  • iterable (iterable) – Any iterable

  • data (int) – Number of elements in iterable, or the realized iterable itself. If data is an iterable, it must not be an iterator, since it would be consumed!

  • title (str) – Title of progress print out (prefix text)

  • every_sec (float) – Progress is printed every ‘every_sec’ seconds.

Returns

Iterator over input elements

Return type

iterator

__rrshift__(iterable)[source]

Chaining operator for Nuts. Needs to be overridden!

Takes an input iterable and produces some output iterable. If the number of elements in the input and the output iterable does not change consider NutFunction instead.

Parameters

iterable (iterable) – Iterable to process.

Returns

Iterable

Return type

iterable

Raise

NotImplementedError if not implemented.

Shuffle(iterable, buffersize, rand=None)[source]

iterable >> Shuffle(buffersize)

Perform (partial) random shuffle of the elements in the iterable. Elements of the iterable are stored in a buffer of the given size and shuffled within. If buffersize is smaller than the length of the iterable the shuffle is therefore partial in the sense that the ‘window’ of the shuffle is limited to buffersize. Note that for buffersize = 1 no shuffling occurs.

In the following example rand = StableRandom(0) is used to create a fixed sequence that is stable across Python versions 2.x and 3.x. Usually, this is not what you want. Use the default rand=None, which uses random.Random() instead.

>>> from nutsflow import Range, Collect
>>> from nutsflow.common import StableRandom
>>> Range(10) >> Shuffle(5, StableRandom(0)) >> Collect()
[4, 2, 3, 6, 7, 0, 1, 9, 5, 8]
>>> Range(10) >> Shuffle(1, StableRandom(0)) >> Collect()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Parameters
  • iterable (iterable) – Any iterable

  • buffersize (int) – Number of elements stored in shuffle buffer.

  • rand (Random|None) – Random number generator. If None, random.Random() is used.

Returns

Generator over shuffled elements

Return type

generator
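A buffered shuffle can be sketched like this: fill a buffer, and once it holds buffersize elements, emit a randomly chosen one. The name shuffle is hypothetical and nutsflow's exact algorithm may differ, so the sketch will not reproduce the StableRandom outputs above. It does reproduce the property that buffersize = 1 leaves the order unchanged.

```python
import random


def shuffle(iterable, buffersize, rand=None):
    """Partially shuffle iterable using a buffer of buffersize elements."""
    rand = random.Random() if rand is None else rand
    buffer = []
    for x in iterable:
        buffer.append(x)
        if len(buffer) >= buffersize:
            # Emit a random element of the full buffer (swap-and-pop is O(1)).
            i = rand.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    rand.shuffle(buffer)  # flush the remaining elements in random order
    yield from buffer
```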

Slice(iterable, start=None, *args, **kwargs)[source]

iterable >> Slice([start,] stop[, step])

Return slice of elements from iterable. See https://docs.python.org/2/library/itertools.html#itertools.islice

>>> from nutsflow import Collect
>>> [1, 2, 3, 4] >> Slice(2) >> Collect()
[1, 2]
>>> [1, 2, 3, 4] >> Slice(1, 3) >> Collect()
[2, 3]
>>> [1, 2, 3, 4] >> Slice(0, 4, 2) >> Collect()
[1, 3]
Parameters
  • iterable (iterable) – Any iterable

  • start (int) – Start index of slice.

  • stop (int) – End index of slice.

  • step (int) – Step size of slice.

Returns

Elements sliced from iterable

Return type

iterator

Take(iterable, n)[source]

iterable >> Take(n)

Return the first n elements of iterable.

>>> from nutsflow import Collect
>>> [1, 2, 3, 4] >> Take(2) >> Collect()
[1, 2]
Parameters
  • iterable (iterable) – Any iterable

  • n (int) – Number of elements to take

Returns

First n elements of iterable

Return type

iterator

TakeWhile = <function takewhile>

iterable >> TakeWhile(func)

Take elements from iterable while the predicate function is True. See https://docs.python.org/2/library/itertools.html#itertools.takewhile

>>> [0, 1, 2, 3, 0] >> TakeWhile(_ < 2) >> Collect()
[0, 1]
Parameters
  • iterable (iterable) – Any iterable

  • func (function) – Predicate function.

Returns

Iterable

Return type

Iterator

Tee(iterable, n=2, /)

iterable >> Tee([n=2])

Return n independent iterators from a single iterable. Can consume large amounts of memory if iterable is large and the tees are not processed in parallel. See https://docs.python.org/2/library/itertools.html#itertools.tee

>>> it1, it2  = [1, 2, 3] >> Tee(2)
>>> it1 >> Collect()
[1, 2, 3]
>>> it2 >> Collect()
[1, 2, 3]
Parameters
  • iterable (iterable) – Any iterable

  • n (int) – Number of iterators to return.

Returns

n iterators

Return type

(Iterator, ..)

Try(iterable, func, default='STDERR')[source]

iterable >> Try(func, default='STDERR')

Exception handling for (nut) functions. If the wrapped nut or function raises an exception, it is caught and handled according to default. By default, no output is returned for the failing element, and the exception together with the value causing it is printed to stderr. Alternatively, a default value (or a handler function) can be specified whose result is returned instead of the nut output if an exception occurs.

NOTE: In the following examples ‘STDOUT’ is used only to verify the error message within the doctest. In production code use the default value of ‘STDERR’.

>>> from nutsflow import Try, Collect, nut_function  
>>> [10, 2, 1] >> Try(lambda x : 10//x) >> Collect()
[1, 5, 10]
>>> [10, 0, 1] >> Try(lambda x : 10//x, 'STDOUT') >> Collect()
ERROR: 0 : integer division or modulo by zero
[1, 10]
>>> Div = nut_function(lambda x : 10//x)
>>> [10, 2, 1] >> Try(Div()) >> Collect()
[1, 5, 10]
>>> [10, 0, 1] >> Try(Div(), 'STDOUT') >> Collect()
ERROR: 0 : integer division or modulo by zero
[1, 10]
>>> [10, 0, 1] >> Try(Div(), -1) >> Collect()
[1, -1, 10]
>>> handlezero = lambda x, e: 'FAILED: '+str(x)
>>> [10, 0, 1] >> Try(Div(), handlezero) >> Collect()
[1, 'FAILED: 0', 10]
>>> handlezero = lambda x, e: str(e)
>>> [10, 0, 1] >> Try(Div(), handlezero) >> Collect()
[1, 'integer division or modulo by zero', 10]
Parameters
  • iterable (iterable) – Iterable the nut operates on.

  • func (function|NutFunction) – (Nut) function that is wrapped for exception handling. Can be a plain Python function/method as well.

  • default (Object) – Return value if an exception occurs. If default = ‘IGNORE’, no value is returned and no error is printed. If default = ‘STDERR’, no value is returned and the error is printed to stderr. If default = ‘STDOUT’, no value is returned and the error is printed to stdout. If default is a function that takes element x and exception e as parameters, its result is returned and no error is printed. Otherwise the default value itself is returned and no error is printed.

Returns

Iterator over input elements transformed by provided nut.

Return type

iterator
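The handling rules for default can be summarized in a plain-Python sketch. The name try_map is hypothetical and this is not nutsflow's implementation; the error line mimics the "ERROR: x : message" format shown in the examples above.

```python
import sys


def try_map(iterable, func, default='STDERR'):
    """Apply func to each element and handle exceptions according to default."""
    for x in iterable:
        try:
            result = func(x)
        except Exception as e:
            if isinstance(default, str) and default in ('IGNORE', 'STDERR', 'STDOUT'):
                if default != 'IGNORE':
                    stream = sys.stderr if default == 'STDERR' else sys.stdout
                    print('ERROR: %s : %s' % (x, e), file=stream)
                continue  # no value is emitted for the failing element
            # A callable default acts as a handler taking (element, exception).
            yield default(x, e) if callable(default) else default
            continue
        yield result
```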

Window(iterable, n=2)[source]

iterable >> Window(n)

Sliding window of size n over elements in iterable.

>>> [1, 2, 3, 4] >> Window() >> Collect()
[(1, 2), (2, 3), (3, 4)]
>>> [1, 2, 3, 4] >> Window(3) >> Collect()
[(1, 2, 3), (2, 3, 4)]
>>> 'test' >> Window(2) >> Map(''.join) >> Collect()
['te', 'es', 'st']
Parameters
  • iterable (iterable) – Any iterable

  • n (int) – Size of window

Returns

iterator with tuples of length n

Return type

iterator over tuples
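A sliding window over a one-pass iterator can be kept in a collections.deque with maxlen=n, which discards the oldest element automatically. The name window is hypothetical and this is not nutsflow's code; inputs shorter than n produce no windows.

```python
from collections import deque
from itertools import islice


def window(iterable, n=2):
    """Yield overlapping tuples of n consecutive elements."""
    it = iter(iterable)
    win = deque(islice(it, n), maxlen=n)  # first n elements
    if len(win) == n:
        yield tuple(win)
    for x in it:
        win.append(x)  # maxlen makes the deque drop its oldest element
        yield tuple(win)
```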

Zip(iterable, iterable2=None, *iterables)[source]

iterable >> Zip(*iterables)

Zip elements of iterable with elements of the given iterables. Zip finishes when the shortest iterable is exhausted. See https://docs.python.org/2/library/itertools.html#itertools.izip and https://docs.python.org/2/library/itertools.html#itertools.izip_longest

>>> from nutsflow import Collect
>>> [0, 1, 2] >> Zip('abc') >> Collect()
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> '12' >> Zip('abcd', '+-') >> Collect()
[('1', 'a', '+'), ('2', 'b', '-')]
Parameters
  • iterable (iterable) – Any iterable

  • iterables (iterable) – Iterables to zip

Returns

Zipped elements from iterables.

Return type

iterator over tuples

ZipWith(iterable, f, *iterables)[source]

iterable >> ZipWith(f, *iterables)

Zips the given iterables, unpacks them and applies the given function.

>>> add = lambda a, b: a + b
>>> [1, 2, 3] >> ZipWith(add, [2, 3, 4]) >> Collect()
[3, 5, 7]
Parameters
  • iterable (iterable) – Any iterable

  • iterables (iterable) – Any iterables

  • f (function) – Function to apply to zipped input iterables

Returns

iterator of result of f() applied to zipped iterables

Return type

iterator

nutsflow.sink module

ArgMax(iterable, key=None, default=None, retvalue=False)[source]

iterable >> ArgMax(key=None, default=None, retvalue=False)

Return index of first maximum element (and maximum) in input (transformed or extracted by key function).

>>> [1, 2, 0, 2] >> ArgMax()
1
>>> ['12', '1', '123'] >> ArgMax(key=len, retvalue=True)
(2, '123')
>>> ['12', '1', '123'] >> ArgMax(key=len)
2
>>> [] >> ArgMax(default=0)
0
>>> [] >> ArgMax(default=(None, 0), retvalue=True)
(None, 0)
>>> data = [(3, 10), (2, 20), (1, 30)]
>>> data >> ArgMax(key=0)
0
>>> data >> ArgMax(1)
2
Parameters
  • iterable (iterable) – Iterable over numbers

  • key (int|tuple|function|None) – Key function to extract or transform elements. None = identity function.

  • default (object) – Value returned if iterable is empty.

  • retvalue (bool) – If True, the index and the value of the maximum element are returned.

Returns

index of largest element according to key function and the largest element itself if retvalue==True

Return type

object | tuple
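The key handling and the "first maximum" rule can be illustrated with a short sketch. The name arg_max is hypothetical, tuple keys are not handled, and this is not nutsflow's implementation; a strict comparison is what keeps the first of several equal maxima.

```python
from operator import itemgetter


def arg_max(iterable, key=None, default=None, retvalue=False):
    """Index (and optionally value) of the first maximum element."""
    if key is None:
        keyfunc = lambda x: x  # identity
    elif isinstance(key, int):
        keyfunc = itemgetter(key)  # column index
    else:
        keyfunc = key
    best_i, best = None, None
    for i, x in enumerate(iterable):
        # Strict '>' keeps the first of several equal maxima.
        if best_i is None or keyfunc(x) > keyfunc(best):
            best_i, best = i, x
    if best_i is None:  # empty input
        return default
    return (best_i, best) if retvalue else best_i
```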

ArgMin(iterable, key=None, default=None, retvalue=False)[source]

iterable >> ArgMin(key=None, default=None, retvalue=False)

Return index of first minimum element (and minimum) in input (transformed or extracted by key function).

>>> [1, 2, 0, 2] >> ArgMin()
2
>>> ['12', '1', '123'] >> ArgMin(key=len, retvalue=True)
(1, '1')
>>> ['12', '1', '123'] >> ArgMin(key=len)
1
>>> [] >> ArgMin(default=0)
0
>>> [] >> ArgMin(default=(None, 0), retvalue=True)
(None, 0)
>>> data = [(3, 10), (2, 20), (1, 30)]
>>> data >> ArgMin(key=0)
2
>>> data >> ArgMin(1)
0
Parameters
  • iterable (iterable) – Iterable over numbers

  • key (int|tuple|function|None) – Key function to extract or transform elements. None = identity function.

  • default (object) – Value returned if iterable is empty.

  • retvalue (bool) – If True, the index and the value of the minimum element are returned.

Returns

index of smallest element according to key function and the smallest element itself if retvalue==True.

Return type

object | tuple

Collect(iterable, container=<class 'list'>)[source]

iterable >> Collect(container)

Collects all elements of the iterable input in the given container.

>>> range(5) >> Collect()
[0, 1, 2, 3, 4]
>>> [1, 2, 3, 2] >> Collect(set)  
{1, 2, 3}
>>> [('one', 1), ('two', 2)] >> Collect(dict)  
{'one': 1, 'two': 2}
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • container (container) – Some container, e.g. list, set, dict that can be filled from an iterable

Returns

Container

Return type

container

Consume(iterable, n=None)

iterable >> Consume(n=None)

Consume n elements of the iterable.

>>> [1,2,3] >> Print() >> Consume()   # Without Consume nothing happens!
1
2
3
>>> [1,2,3] >> Print() >> Consume(2)
1
2
Parameters
  • iterable (iterable) – Iterable

  • n (int) – Number of elements to consume. n = None means the whole iterable is consumed.

Count(iterable)

iterable >> Count()

Return the number of elements in the input iterable. This consumes the iterable!

>>> [0, 1, 2] >> Count()
3
Parameters

iterable (iterable) – Any iterable

Returns

Number of elements in iterable

Return type

int

CountValues(iterable, column=None, relative=False)[source]

iterable >> CountValues(column=None, relative=False)

Return dictionary with (relative) counts of the values in the input iterable.

>>> 'abaacc' >> CountValues()  
{'a': 3, 'b': 1, 'c': 2}
>>> 'aabaab' >> CountValues(relative=True)  
{'a': 1.0, 'b': 0.5}
>>> data = [('a', 'X'), ('b', 'Y'), ('a', 'Y')]
>>> data >> CountValues(column=0)  
{'a': 2, 'b': 1}
>>> data >> CountValues(column=1)  
{'Y': 2, 'X': 1}
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • column (int|None) – Column of the elements in iterable to extract values from. If column=None, the elements themselves are counted.

  • relative (bool) – True: return counts relative to the largest count (see the 'aabaab' example); otherwise absolute counts

Returns

Dictionary with (relative) counts for elements in iterable.

Return type

dict
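The counting can be sketched with collections.Counter. The relative mode below divides by the largest count; that rule is inferred from the 'aabaab' example (4 a's give 1.0, 2 b's give 0.5) and is an assumption about the general behavior. The name count_values is hypothetical and this is not nutsflow's code.

```python
from collections import Counter


def count_values(iterable, column=None, relative=False):
    """Count values, optionally from one column, optionally relative to the max count."""
    values = iterable if column is None else (e[column] for e in iterable)
    counts = Counter(values)
    if relative and counts:
        top = max(counts.values())  # assumption: normalize by the largest count
        return {k: c / top for k, c in counts.items()}
    return dict(counts)
```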

Head(iterable, n, container=<class 'list'>)[source]

iterable >> Head(n, container=list)

Collect first n elements of iterable in specified container.

>>> [1, 2, 3, 4] >> Head(2)
[1, 2]
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • n (int) – Number of elements to take.

  • container (container) – Container to collect elements in, e.g. list, set

Returns

Container with head elements

Return type

container

Join(iterable, separator='')[source]

iterable >> Join(separator='')

Same as Python’s sep.join(iterable): concatenates the elements in the iterable to a string using the given separator. Unlike Python’s sep.join(iterable), it also converts elements to strings automatically.

Parameters
  • iterable (iterable) – Any iterable

  • separator (string) – Separator string between elements.

Returns

String with the concatenated elements of iterable.

Return type

str
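The automatic string conversion is the practical difference from str.join, which raises a TypeError for non-string elements such as integers. A one-line sketch of that behavior (the name join is hypothetical; this is not nutsflow's code):

```python
def join(iterable, separator=''):
    """Like separator.join(iterable), but converts elements to str first."""
    return separator.join(str(e) for e in iterable)


print(join([1, 2, 3], ', '))  # 1, 2, 3
```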

Max(iterable, key=None, default=None)[source]

iterable >> Max(key=None, default=None)

Return maximum of inputs (transformed or extracted by key function).

>>> [1, 2, 3, 2] >> Max()
3
>>> ['1', '123', '12'] >> Max(key=len)
'123'
>>> [] >> Max(default=0)
0
>>> data = [(3, 10), (2, 20), (1, 30)]
>>> data >> Max(key=0)
(3, 10)
>>> data >> Max(1)
(1, 30)
Parameters
  • iterable (iterable) – Iterable over numbers

  • key (int|tuple|function|None) – Key function to extract or transform elements. None = identity function.

  • default (object) – Value returned if iterable is empty.

Returns

largest element according to key function

Return type

object

Mean(iterable, key=None, default=None)[source]

iterable >> Mean(key=None, default=None)

Return mean value of inputs (transformed or extracted by key function).

>>> [1, 2, 3] >> Mean()
2.0
>>> [] >> Mean(default=0)
0
>>> data = [(1, 10), (2, 20), (3, 30)]
>>> data >> Mean(key=0)
2.0
>>> data >> Mean(key=1)
20.0
Parameters
  • iterable (iterable) – Iterable over numbers

  • default (object) – Value returned if iterable is empty.

  • key (int|tuple|function|None) – Key function to extract elements.

Returns

Mean of numbers or default value

Return type

number

MeanStd(iterable, key=None, default=None, ddof=1)[source]

iterable >> MeanStd(key=None, default=None, ddof=1)

Return mean and standard deviation of inputs (transformed or extracted by key function). The standard deviation is computed with delta degrees of freedom ddof, which defaults to 1 (sample standard deviation).

>>> [1, 2, 3] >> MeanStd()
(2.0, 1.0)
>>> data = [(1, 10), (2, 20), (3, 30)]
>>> data >> MeanStd(key=0)
(2.0, 1.0)
>>> data >> MeanStd(1)
(20.0, 10.0)
Parameters
  • iterable (iterable) – Iterable over numbers

  • default (object) – Value returned if iterable is empty.

  • key (int|tuple|function|None) – Key function to extract elements.

  • ddof (int) – Delta degrees of freedom (should be 0 or 1)

Returns

Mean and standard deviation of numbers or default value

Return type

tuple (mean, std)
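The role of ddof can be seen from the usual formula std = sqrt(sum((x - mean)**2) / (n - ddof)). The plain-Python sketch below (hypothetical helper mean_std, key handling omitted) reproduces the [1, 2, 3] example: the squared deviations sum to 2, so ddof=1 gives sqrt(2/2) = 1.0 and ddof=0 gives sqrt(2/3). It does not guard against a single element with ddof=1 (division by zero); how nutsflow handles that case is not described here.

```python
import math


def mean_std(values, default=None, ddof=1):
    """Sketch of MeanStd: mean and standard deviation with delta degrees of freedom."""
    values = list(values)
    n = len(values)
    if n == 0:
        return default                      # mirrors the `default` parameter
    mean = sum(values) / n
    # Sum of squared deviations divided by (n - ddof):
    # ddof=1 -> sample std (default), ddof=0 -> population std.
    var = sum((x - mean) ** 2 for x in values) / (n - ddof)
    return mean, math.sqrt(var)


print(mean_std([1, 2, 3]))           # (2.0, 1.0)
print(mean_std([1, 2, 3], ddof=0))   # (2.0, 0.816...)
print(mean_std([], default=(0, 0)))  # (0, 0)
```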

Min(iterable, key=None, default=None)[source]

iterable >> Min(key=None, default=None)

Return minimum of inputs (transformed or extracted by key function).

>>> [1, 2, 3, 2] >> Min()
1
>>> ['1', '123', '12'] >> Min(key=len)
'1'
>>> [] >> Min(default=0)
0
>>> data = [(3, 10), (2, 20), (1, 30)]
>>> data >> Min(key=0)
(1, 30)
>>> data >> Min(1)
(3, 10)
Parameters
  • iterable (iterable) – Iterable over numbers

  • key (int|tuple|function|None) – Key function to extract or transform elements. None = identity function.

  • default (object) – Value returned if iterable is empty.

Returns

smallest element according to key function

Return type

object

Next()

iterable >> Next()

Return next element of iterable.

>>> [1,2,3] >> Next()
1
Parameters

iterable (iterable) – Any iterable

Returns

next element

Return type

any

Nth(iterable, n, default=None)

iterable >> Nth(n, default=None)

Return n-th element of iterable. This consumes the iterable!

>>> 'test' >> Nth(2)
's'
Parameters
  • iterable (iterable) – Any iterable

  • n (int) – Index of element in iterable to return

  • default (object) – Value returned if iterable has no n-th element.

Returns

n-th element

Return type

any
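Next and Nth can be understood through the standard library: Next behaves like next(iter(iterable)), and Nth like skipping n elements with itertools.islice and taking the following one. The sketch below (hypothetical helper nth, not nutsflow's code) also shows what "consumes the iterable" means for an iterator: the elements before and including the n-th are gone afterwards.

```python
from itertools import islice


def nth(iterable, n, default=None):
    """Sketch of Nth: element at index n, or default if there is none."""
    # islice skips the first n elements; next() takes the one after them.
    return next(islice(iterable, n, None), default)


print(nth('test', 2))              # s
it = iter(range(10))
print(nth(it, 2))                  # 2
print(next(it))                    # 3  (elements 0..2 were consumed)
print(nth([1, 2], 5, default=-1))  # -1
```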

Reduce()

iterable >> Reduce(func [, initializer])

Reduces the iterable using the given function. See https://docs.python.org/2/library/functions.html#reduce

>>> [1, 2, 3] >> Reduce(lambda a,b: a+b)
6
>>> [2] >> Reduce(lambda a,b: a*b, 1)
2
Parameters
  • iterable (iterable) – Any iterable

  • func (function) – Reduction function

  • initializer (any) – Optional start value of the reduction. Also returned if iterable is empty.

Returns

Result of reduction

Return type

any
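Reduce mirrors Python's reduce, which in Python 3 lives in functools. The snippet below calls functools.reduce directly (not nutsflow) to show that the fold runs from the left and what the initializer changes.

```python
from functools import reduce

# Reduce folds from the left: ((1 + 2) + 3)
print(reduce(lambda a, b: a + b, [1, 2, 3]))   # 6
# With an initializer the fold starts from it: (1 * 2)
print(reduce(lambda a, b: a * b, [2], 1))      # 2
# Order matters for non-commutative functions: (10 - 2) - 3
print(reduce(lambda a, b: a - b, [10, 2, 3]))  # 5
# An initializer makes an empty input safe; without one, reduce raises TypeError.
print(reduce(lambda a, b: a + b, [], 0))       # 0
```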

Sort(iterable, key=None, reverse=False)[source]

iterable >> Sort(key=None, reverse=False)

Sorts iterable with respect to key function or column index(es).

>>> [3, 1, 2] >> Sort()
[1, 2, 3]
>>> [3, 1, 2] >> Sort(reverse=True)
[3, 2, 1]
>>> [(1,'c'), (2,'b'), (3,'a')] >> Sort(1)
[(3, 'a'), (2, 'b'), (1, 'c')]
>>> ['a3', 'c1', 'b2'] >> Sort(key=lambda s: s[0])
['a3', 'b2', 'c1']
>>> ['a3', 'c1', 'b2'] >> Sort(key=0)
['a3', 'b2', 'c1']
>>> ['a3', 'c1', 'b2'] >> Sort(1)
['c1', 'b2', 'a3']
>>> ['a3', 'c1', 'b2'] >> Sort((1,0))
['c1', 'b2', 'a3']
Parameters
  • iterable (iterable) – Iterable

  • key (int|tuple|function|None) – Function to sort by, or column index(es) by which tuples/vectors/strings are sorted.

  • reverse (boolean) – If True, sort in descending order.

Returns

Sorted iterable

Return type

list
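Sort with a tuple of column indices orders by the first index and uses later indices only to break ties, as in the Sort((1,0)) example. A minimal sketch with Python's sorted, assuming an int key means x[key] and a tuple key means the tuple of those columns (sort_by is a hypothetical helper):

```python
def sort_by(iterable, key=None, reverse=False):
    """Hypothetical sketch of Sort; not nutsflow's own implementation."""
    if isinstance(key, int):
        keyfunc = lambda x: x[key]                    # one column index
    elif isinstance(key, tuple):
        keyfunc = lambda x: tuple(x[i] for i in key)  # several indices, in priority order
    else:
        keyfunc = key                                 # None (natural order) or a function
    return sorted(iterable, key=keyfunc, reverse=reverse)


print(sort_by(['a3', 'c1', 'b2'], (1, 0)))              # ['c1', 'b2', 'a3']
# Later indices only break ties of earlier ones:
print(sort_by([('b', 1), ('a', 1), ('a', 0)], (1, 0)))  # [('a', 0), ('a', 1), ('b', 1)]
```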

Sum(iterable, key=None)[source]

iterable >> Sum(key=None)

Return sum over inputs (transformed or extracted by key function).

>>> [1, 2, 3] >> Sum()
6
>>> [1, 2, 3] >> Sum(lambda x: x*x)
14
>>> data = [(1, 10), (2, 20), (3, 30)]
>>> data >> Sum(key=0)
6
>>> data >> Sum(key=1)
60
Parameters
  • iterable (iterable) – Iterable over numbers

  • key (int|tuple|function|None) – Key function to extract elements.

Returns

Sum of numbers

Return type

number

Tail(iterable, n, container=<class 'list'>)[source]

iterable >> Tail(n, container=list)

Collect last n elements of iterable in specified container. This consumes the iterable completely!

>>> [1, 2, 3, 4] >> Tail(2)
[3, 4]
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • n (int) – Number of elements to take.

  • container (container) – Container to collect elements in, e.g. list, set

Returns

Container with tail elements

Return type

container
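Because Tail must see the whole input before it knows which elements are last, it consumes the iterable completely. One memory-efficient way to do this in plain Python is a collections.deque with maxlen=n, which keeps only the newest n elements. This is a sketch (hypothetical helper tail), not necessarily how nutsflow implements Tail.

```python
from collections import deque


def tail(iterable, n, container=list):
    """Sketch of Tail: keep only the last n elements while consuming the iterable."""
    # A deque with maxlen=n discards older elements as new ones arrive,
    # so memory stays proportional to n, not to the input length.
    return container(deque(iterable, maxlen=n))


print(tail([1, 2, 3, 4], 2))    # [3, 4]
print(tail(range(10), 3, set))  # {7, 8, 9}
```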

Unzip(iterable, container=None)[source]

iterable >> Unzip(container=None)

Same as zip(*iterable) (izip in Python 2), but returns iterators if container is None; otherwise each unzipped sequence is collected in container.

>>> [(1, 2, 3), (4, 5, 6)] >> Unzip(tuple) >> Collect()
[(1, 4), (2, 5), (3, 6)]
Parameters
  • iterable (iterable) – Any iterable, e.g. list, range, …

  • container (container) – If not None, unzipped results are collected in the provided container, e.g. list, tuple, set

Returns

Unzipped iterable.

Return type

iterator over iterators
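The behaviour described for Unzip can be sketched with zip(*iterable) plus an optional map over the result (hypothetical helper unzip). Note that unpacking with * reads the whole input iterable up front, even though the columns themselves are produced lazily.

```python
def unzip(iterable, container=None):
    """Sketch of Unzip built on zip(*iterable)."""
    unzipped = zip(*iterable)        # lazy iterator over column tuples
    if container is None:
        return unzipped
    return map(container, unzipped)  # convert each column, still lazily


rows = [(1, 2, 3), (4, 5, 6)]
print(list(unzip(rows)))        # [(1, 4), (2, 5), (3, 6)]
print(list(unzip(rows, list)))  # [[1, 4], [2, 5], [3, 6]]
```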

class WriteCSV(filepath, cols=None, skipheader=0, flush=False, encoding=None, fmtfunc=<function WriteCSV.<lambda>>, **kwargs)[source]

Bases: nutsflow.base.NutSink

Write data to a CSV file using Python’s CSV writer. See: https://docs.python.org/2/library/csv.html

__init__(filepath, cols=None, skipheader=0, flush=False, encoding=None, fmtfunc=<function WriteCSV.<lambda>>, **kwargs)[source]

WriteCSV(filepath, cols, skipheader, flush, encoding, fmtfunc, **kwargs)

Write data in Comma Separated Values (CSV) format, or other delimited formats, to a file. Tab Separated Values (TSV) files can be written by specifying a different delimiter. Note that in the docstring source the delimiter is escaped as '\\t', but in code it should be written as '\t'. See unit tests.

Also see https://docs.python.org/2/library/csv.html and ReadCSV.

>>> import os
>>> filepath = 'tests/data/temp_out.csv'
>>> with WriteCSV(filepath) as writer:
...     range(10) >> writer
>>> os.remove(filepath)
>>> with WriteCSV(filepath, cols=(1,0)) as writer:
...     [(1,2), (3,4)] >> writer
>>> os.remove(filepath)
>>> filepath = 'tests/data/temp_out.tsv'
>>> with WriteCSV(filepath, delimiter='\t') as writer:
...     [[1,2], [3,4]] >> writer
>>> os.remove(filepath)
Parameters
  • filepath (string) – Path to file in CSV format.

  • cols (tuple) – Indices of the columns to write. If None all columns are written.

  • skipheader (int) – Number of header rows to skip.

  • flush (bool) – If True flush after every line written.

  • encoding (str) – Character encoding, e.g. “utf-8”. Ignored for Python 2.x!

  • fmtfunc (function) – Function to apply to the elements of each row.

  • kwargs (kwargs) – Keyword arguments for Python’s CSV writer. See https://docs.python.org/2/library/csv.html

__rrshift__(iterable)[source]

Write elements of iterable to file

close()[source]

Close writer
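The core of writing with column selection can be sketched with Python's csv.writer alone. This hypothetical write_csv writes to an in-memory io.StringIO instead of a file and leaves out skipheader, flush and encoding; it only illustrates how cols=(1, 0) reorders columns and how a writer keyword such as delimiter='\t' yields TSV output. When writing to a real file in Python 3, open it with newline='' as the csv documentation recommends.

```python
import csv
import io


def write_csv(rows, stream, cols=None, fmtfunc=lambda x: x, **kwargs):
    """Hypothetical sketch of WriteCSV's core loop using csv.writer."""
    writer = csv.writer(stream, **kwargs)  # kwargs such as delimiter='\t'
    for row in rows:
        if cols is not None:
            row = [row[i] for i in cols]   # select/reorder columns
        writer.writerow([fmtfunc(e) for e in row])


buf = io.StringIO()
write_csv([(1, 2), (3, 4)], buf, cols=(1, 0), delimiter='\t')
print(repr(buf.getvalue()))  # '2\t1\r\n4\t3\r\n'  (csv uses \r\n line endings by default)
```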

nutsflow.source module

Empty()[source]

Return empty iterable.

>>> from nutsflow import Collect
>>> Empty() >> Collect()
[]
Returns

Empty iterator

Return type

iterator

Enumerate(start=0[, step])[source]

Return increasing integers. See itertools.count

>>> from nutsflow import Take, Collect
>>> Enumerate() >> Take(3) >> Collect()
[0, 1, 2]
>>> Enumerate(1, 2) >> Take(3) >> Collect()
[1, 3, 5]
Parameters
  • start (int) – Start of integer sequence

  • step (int) – Step of sequence

Returns

Increasing integers.

Return type

iterable over int

Product(*iterables[, repeat])[source]

Return cartesian product of input iterables.

>>> from nutsflow import Collect
>>> Product([1, 2], [3, 4]) >> Collect()
[(1, 3), (1, 4), (2, 3), (2, 4)]
>>> Product('ab', range(3)) >> Collect()
[('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2)]
>>> Product([1, 2, 3], repeat=2) >> Collect()
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
Parameters
  • iterables (iterables) – Collections of iterables to create cartesian product from.

  • repeat (int) – Number of times the input iterables are repeated, e.g. Product([1,2], [1,2]) is equal to Product([1,2], repeat=2)

Returns

cartesian product

Return type

iterator over tuples
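The signature and examples of Product match itertools.product, so the repeat equivalence stated above can be checked with the standard library directly (this uses itertools, not nutsflow). Note that repeat applies to the whole sequence of input iterables, not only to a single one.

```python
from itertools import product

# Two explicit copies of the same iterable ...
explicit = list(product([1, 2], [1, 2]))
# ... give the same result as one iterable with repeat=2.
repeated = list(product([1, 2], repeat=2))
print(explicit == repeated)  # True
print(repeated)              # [(1, 1), (1, 2), (2, 1), (2, 2)]
# repeat applies to all inputs: same as product('ab', [0], 'ab', [0])
print(len(list(product('ab', [0], repeat=2))))  # 4
```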

class Range(*args, **kwargs)[source]

Bases: nutsflow.base.NutSource

Range of numbers. Similar to range() but returns an iterator that is exhausted after a single pass.

__init__(*args, **kwargs)[source]

Range(start [,end [, step]])

Return range of integers.

>>> from nutsflow import Collect
>>> Range(4) >> Collect()
[0, 1, 2, 3]
>>> Range(1, 5) >> Collect()
[1, 2, 3, 4]
Parameters
  • start (int) – Start of range.

  • end (int) – End of range. Not inclusive. Optional.

  • step (int) – Step size. Optional.

Returns

Range of integers.

Return type

iterable over int
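The difference between range() and an iterator that depletes can be shown with the standard library alone: a range object can be iterated repeatedly, whereas an iterator over it yields its elements only once. The second case corresponds to the behaviour described for Range.

```python
# range() objects can be iterated any number of times ...
r = range(4)
print(list(r), list(r))    # [0, 1, 2, 3] [0, 1, 2, 3]

# ... whereas an iterator is exhausted after one pass.
it = iter(range(4))
print(list(it), list(it))  # [0, 1, 2, 3] []
```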

class ReadCSV(filepath, columns=None, skipheader=0, fmtfunc=None, **kwargs)[source]

Bases: nutsflow.base.NutSource

Read data from a CSV file using Python’s CSV reader. See: https://docs.python.org/2/library/csv.html

__init__(filepath, columns=None, skipheader=0, fmtfunc=None, **kwargs)[source]

ReadCSV(filepath, columns, skipheader, fmtfunc, **kwargs)

Read data in Comma Separated Values (CSV) format from a file. See also WriteCSV. Tab Separated Values (TSV) files can be read by providing the corresponding delimiter. Note that in the docstring source the delimiter is escaped as '\\t', but in code it should be written as '\t'.

>>> from nutsflow import Collect
>>> filepath = 'tests/data/data.csv'
>>> with ReadCSV(filepath, skipheader=1) as reader:
...     reader >> Collect()
[('1', '2', '3'), ('4', '5', '6')]
>>> with ReadCSV(filepath, skipheader=1, fmtfunc=int) as reader:
...     reader >> Collect()
[(1, 2, 3), (4, 5, 6)]
>>> fmtfuncs=(int, str, float)
>>> with ReadCSV(filepath, skipheader=1, fmtfunc=fmtfuncs) as reader:
...     reader >> Collect()
[(1, '2', 3.0), (4, '5', 6.0)]
>>> with ReadCSV(filepath, (2, 1), 1, int) as reader:
...     reader >> Collect()
[(3, 2), (6, 5)]
>>> with ReadCSV(filepath, (2, 1), 1, (str,int)) as reader:
...     reader >> Collect()
[('3', 2), ('6', 5)]
>>> with ReadCSV(filepath, 2, 1, int) as reader:
...     reader >> Collect()
[3, 6]
>>> filepath = 'tests/data/data.tsv'
>>> with ReadCSV(filepath, skipheader=1, fmtfunc=int,
...                delimiter='\t') as reader:
...     reader >> Collect()
[(1, 2, 3), (4, 5, 6)]
Parameters
  • filepath (string) – Path to file in CSV format.

  • columns (tuple) – Indices of the columns to read. If None all columns are read.

  • skipheader (int) – Number of header lines to skip.

  • fmtfunc (tuple|function) – Function or functions to apply to the column elements of each row.

  • kwargs (kwargs) – Keyword arguments for Python’s CSV reader. See https://docs.python.org/2/library/csv.html

close()[source]

Close reader
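The interplay of columns, skipheader and fmtfunc in the examples above can be sketched with csv.reader. The hypothetical read_csv below reads from an io.StringIO whose content mirrors the header A,B,C and rows shown in the ReadNamedCSV examples. It assumes an int for columns yields single values, a tuple of functions is applied column by column, and a single function is applied to every column, as the doctests suggest; it is not nutsflow's implementation.

```python
import csv
import io


def read_csv(stream, columns=None, skipheader=0, fmtfunc=None, **kwargs):
    """Hypothetical sketch of ReadCSV's row handling with csv.reader."""
    reader = csv.reader(stream, **kwargs)
    for _ in range(skipheader):
        next(reader, None)                         # drop header lines
    for row in reader:
        if isinstance(columns, int):
            row = [row[columns]]                   # single column -> scalar below
        elif columns is not None:
            row = [row[i] for i in columns]        # select/reorder columns
        if isinstance(fmtfunc, tuple):
            row = [f(e) for f, e in zip(fmtfunc, row)]  # one function per column
        elif fmtfunc is not None:
            row = [fmtfunc(e) for e in row]        # same function for all columns
        yield row[0] if isinstance(columns, int) else tuple(row)


text = 'A,B,C\n1,2,3\n4,5,6\n'
print(list(read_csv(io.StringIO(text), skipheader=1, fmtfunc=int)))  # [(1, 2, 3), (4, 5, 6)]
print(list(read_csv(io.StringIO(text), (2, 1), 1, (str, int))))      # [('3', 2), ('6', 5)]
print(list(read_csv(io.StringIO(text), 2, 1, int)))                  # [3, 6]
```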

class ReadNamedCSV(filepath, colnames, fmtfunc, rowname, **kwargs)[source]

Bases: nutsflow.base.NutSource

Read data in Comma Separated Values (CSV) format from a CSV file with header names and return named tuples. Can also read Tab Separated Values (TSV) and other formats. See ReadCSV and WriteCSV.

>>> from nutsflow import Collect, Consume, Print
>>> filepath = 'tests/data/data.csv'
>>> with ReadNamedCSV(filepath) as reader:
...     reader >> Print() >> Consume()
Row(A='1', B='2', C='3')
Row(A='4', B='5', C='6')
>>> with ReadNamedCSV(filepath, rowname='Sample') as reader:
...     reader >> Print() >> Consume()
Sample(A='1', B='2', C='3')
Sample(A='4', B='5', C='6')
>>> with ReadNamedCSV(filepath, fmtfunc=int) as reader:
...     reader >> Collect()
[Row(A=1, B=2, C=3), Row(A=4, B=5, C=6)]
>>> fmtfuncs = (int, str, float)
>>> with ReadNamedCSV(filepath, fmtfunc=fmtfuncs) as reader:
...     reader >> Print() >> Consume()
Row(A=1, B='2', C=3.0)
Row(A=4, B='5', C=6.0)
>>> with ReadNamedCSV(filepath, colnames=('C', 'A'), fmtfunc=int) as reader:
...     reader >> Collect()
[Row(C=3, A=1), Row(C=6, A=4)]
>>> with ReadNamedCSV(filepath, ('A', 'C'), int, 'Sample') as reader:
...     reader >> Print() >> Consume()
Sample(A=1, C=3)
Sample(A=4, C=6)
Parameters
  • filepath (string) – Path to file in CSV format.

  • colnames (tuple) – Names of columns to read. If None all columns are read.

  • fmtfunc (tuple|function) – Function or functions to apply to the column elements of each row.

  • rowname (str) – Name of named tuples.

  • kwargs (kwargs) – Keyword arguments for Python’s CSV reader. See https://docs.python.org/2/library/csv.html

__init__(filepath, colnames=None, fmtfunc=None, rowname='Row', **kwargs)[source]

Constructor. See the class description above for the parameters filepath, colnames, fmtfunc, rowname and kwargs.

close()[source]

Close reader
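The named-tuple behaviour can be sketched with csv.reader and collections.namedtuple: the header line supplies the field names, colnames selects and orders columns by name, and rowname becomes the tuple's type name. read_named_csv is a hypothetical helper reading from an in-memory io.StringIO with the A,B,C data shown above, not nutsflow's code.

```python
import csv
import io
from collections import namedtuple


def read_named_csv(stream, colnames=None, fmtfunc=None, rowname='Row', **kwargs):
    """Hypothetical sketch of ReadNamedCSV: build named tuples from the header line."""
    reader = csv.reader(stream, **kwargs)
    header = next(reader)
    colnames = tuple(colnames) if colnames else tuple(header)
    indices = [header.index(name) for name in colnames]  # columns to keep, in order
    Row = namedtuple(rowname, colnames)
    funcs = fmtfunc if isinstance(fmtfunc, tuple) else [fmtfunc] * len(indices)
    for row in reader:
        values = [row[i] for i in indices]
        yield Row(*[f(v) if f else v for f, v in zip(funcs, values)])


text = 'A,B,C\n1,2,3\n4,5,6\n'
rows = list(read_named_csv(io.StringIO(text), colnames=('C', 'A'), fmtfunc=int))
print(rows)                   # [Row(C=3, A=1), Row(C=6, A=4)]
print(rows[0].A, rows[0][0])  # 1 3
```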

Repeat(obj, *args, **kwargs)[source]

Return given obj indefinitely.

>>> from nutsflow import Head, Collect
>>> Repeat(1) >> Head(3)
[1, 1, 1]
>>> from nutsflow.common import StableRandom
>>> rand = StableRandom(0)
>>> Repeat(rand.random) >> Head(3)
[0.5488135024320365, 0.5928446165269344, 0.715189365138111]
>>> rand = StableRandom(0)
>>> Repeat(rand.randint, 1, 6) >> Head(10)
[4, 4, 5, 6, 4, 6, 4, 6, 3, 4]
Parameters
  • obj (object|func) – Object/value to repeat. Obj can be function that is repeatedly called.

  • args (args) – Arguments passed on to obj if obj is callable

  • kwargs (kwargs) – Keyword args passed on to obj if obj is callable

Returns

Iterator of repeated objects

Return type

iterable over object
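Repeat covers two cases: a plain value is yielded forever, while a callable is called anew for each element with the given args and kwargs. A minimal generator sketch of that idea follows; repeat_obj and tick are hypothetical, and using callable() to tell the cases apart is an assumption (with this rule a callable object cannot be repeated as a value).

```python
from itertools import islice, repeat


def repeat_obj(obj, *args, **kwargs):
    """Hypothetical sketch of Repeat: yield obj forever, or call it forever if callable."""
    if callable(obj):
        while True:
            yield obj(*args, **kwargs)  # a fresh call for every element
    else:
        yield from repeat(obj)          # the same object every time


print(list(islice(repeat_obj(1), 3)))  # [1, 1, 1]

calls = []


def tick(step):
    """Hypothetical callable: records each call and returns a running total."""
    calls.append(step)
    return sum(calls)


print(list(islice(repeat_obj(tick, 2), 3)))  # [2, 4, 6]
```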

nutsflow.underscore module

Module contents