nutsml package

Submodules

nutsml.batcher module

class BuildBatch(batchsize, prefetch=1, verbose=False)[source]

Bases: nutsflow.base.Nut

Build batches for GPU-based neural network training.

__init__(batchsize, prefetch=1, verbose=False)[source]

iterable >> BuildBatch(batchsize, prefetch=1, verbose=False)

Take samples in iterable, extract specified columns, convert column data to numpy arrays of various types, and aggregate converted samples into a batch.

The following example uses the verbose flag to print the shape of the batches constructed. This is useful for development and debugging but should be disabled (verbose=False) in production.

>>> import numpy as np
>>> from nutsflow import Collect
>>> numbers = [4.1, 3.2, 1.1]
>>> images = [np.zeros((5, 3)), np.ones((5, 3)), np.ones((5, 3))]
>>> class_ids = [1, 2, 1]
>>> samples = list(zip(numbers, images, class_ids))
>>> build_batch = (BuildBatch(batchsize=2, verbose=True)
...                .input(0, 'number', 'float32')
...                .input(1, 'image', np.uint8, True)
...                .output(2, 'one_hot', np.uint8, 3))
>>> batches = samples >> build_batch >> Collect()
[[2:float32, 2x1x5x3:uint8], [2x3:uint8]]
[[1:float32, 1x1x5x3:uint8], [1x3:uint8]]

Sample columns can be ignored or reused. Assuming an autoencoder, one might wish to use the sample image as both input and output:

>>> build_batch = (BuildBatch(2, verbose=True)
...                .input(1, 'image', np.uint8, True)
...                .output(1, 'image', np.uint8, True))
>>> batches = samples >> build_batch >> Collect()  
[[2x1x5x3:uint8], [2x1x5x3:uint8]]
[[1x1x5x3:uint8], [1x1x5x3:uint8]]

A training batch is of the format [[inputs], [outputs]], e.g. in the first case above [[number, image], [one_hot]], where each of the columns is a Numpy array. If no output/target is provided, as in the prediction phase, the batch format is just [inputs].

>>> build_pred_batch = (BuildBatch(2, verbose=True)
...                     .input(1, 'image', 'uint8', True))
>>> batches = samples >> build_pred_batch >> Collect()  
[2x1x5x3:uint8]
[1x1x5x3:uint8]
Parameters:
  • batchsize (int) – Size of batch = number of rows in batch.
  • prefetch (int) – Number of batches to prefetch. This speeds up GPU-based training, since one batch is built on the CPU while another is processed on the GPU. Note: if verbose=True, prefetch is set to 0 to simplify debugging.
  • verbose (bool) – Print batch shape when True (this also sets prefetch=0).
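
The batch layouts described above ([[inputs], [outputs]] for training, [inputs] for prediction) can be illustrated with a minimal pure-Python sketch. The function name build_batches_sketch and the use of plain lists instead of numpy arrays are assumptions made for illustration; this is not the BuildBatch implementation.

```python
def build_batches_sketch(samples, batchsize, input_cols, output_cols=()):
    """Yield batches shaped [[inputs], [outputs]], or [inputs] if no outputs."""
    samples = list(samples)
    for start in range(0, len(samples), batchsize):
        rows = samples[start:start + batchsize]
        # One list per requested column, each holding one value per row.
        inputs = [[row[c] for row in rows] for c in input_cols]
        outputs = [[row[c] for row in rows] for c in output_cols]
        yield [inputs, outputs] if output_cols else inputs


# Hypothetical samples: (number, image placeholder, class id).
samples = [(4.1, 'img0', 1), (3.2, 'img1', 2), (1.1, 'img2', 1)]
train_batches = list(build_batches_sketch(samples, 2, (0, 1), (2,)))
pred_batches = list(build_batches_sketch(samples, 2, (1,)))
```

With three samples and a batch size of 2, the last batch holds a single row, which matches the shapes printed in the verbose examples.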
__rrshift__(iterable)[source]

Convert samples in iterable into mini-batches.

Structure of output depends on the fmt function used. If fmt is None, the output is a list of np.arrays.

Parameters:iterable (iterable) – Iterable over samples.
Returns:Mini-batches
Return type:list of np.array if fmt=None
by(col, name, *args, **kwargs)[source]

Specify and add batch columns to create batch.

DEPRECATED: Use input() and output() instead.

Parameters:
  • col (int) – column of the sample to extract and to create a batch column from.
  • name (string) – Name of the column function to apply to create a batch column, e.g. ‘image’. See the following functions for more details:
      ‘image’: nutsml.batcher.build_image_batch
      ‘number’: nutsml.batcher.build_number_batch
      ‘vector’: nutsml.batcher.build_vector_batch
      ‘tensor’: nutsml.batcher.build_tensor_batch
      ‘one_hot’: nutsml.batcher.build_one_hot_batch
  • args (args) – Arguments for column function, e.g. dtype
  • kwargs (kwargs) – Keyword arguments for column function
Returns:

instance of BuildBatch

Return type:

BuildBatch

input(col, name, *args, **kwargs)[source]

Specify and add input columns for the batch to create.

Parameters:
  • col (int) – column of the sample to extract and to create a batch input column from.
  • name (string) – Name of the column function to apply to create a batch column, e.g. ‘image’. See the following functions for more details:
      ‘image’: nutsml.batcher.build_image_batch
      ‘number’: nutsml.batcher.build_number_batch
      ‘vector’: nutsml.batcher.build_vector_batch
      ‘tensor’: nutsml.batcher.build_tensor_batch
      ‘one_hot’: nutsml.batcher.build_one_hot_batch
  • args (args) – Arguments for column function, e.g. dtype
  • kwargs (kwargs) – Keyword arguments for column function
Returns:

instance of BuildBatch

Return type:

BuildBatch

output(col, name, *args, **kwargs)[source]

Specify and add output columns for the batch to create.

Parameters:
  • col (int) – column of the sample to extract and to create a batch output column from.
  • name (string) – Name of the column function to apply to create a batch column, e.g. ‘image’. See the following functions for more details:
      ‘image’: nutsml.batcher.build_image_batch
      ‘number’: nutsml.batcher.build_number_batch
      ‘vector’: nutsml.batcher.build_vector_batch
      ‘tensor’: nutsml.batcher.build_tensor_batch
      ‘one_hot’: nutsml.batcher.build_one_hot_batch
  • args (args) – Arguments for column function, e.g. dtype
  • kwargs (kwargs) – Keyword arguments for column function
Returns:

instance of BuildBatch

Return type:

BuildBatch

class Mixup(*args, **kwargs)[source]

Bases: nutsflow.base.NutFunction

Mixup produces random interpolations between data and labels.

Usage: … >> BuildBatch() >> Mixup(0.1) >> network.train() >> …

Implementation based on the following paper: mixup: Beyond Empirical Risk Minimization https://arxiv.org/abs/1710.09412

Parameters:
  • batch (list) – Batch consisting of list of input data and list of output data, where data must be numeric, e.g. images and one-hot-encoded class labels that can be interpolated between.
  • alpha (float) – Control parameter for beta distribution the interpolation factors are sampled from. Range: [0,…,1] For alpha <= 0 no mixup is performed.
Returns:

__call__(element)
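
The interpolation that Mixup describes can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the nuts-ml implementation: the name mixup_sketch is hypothetical, data is represented as lists of numbers rather than numpy arrays, and a single interpolation factor is drawn per batch from Beta(alpha, alpha).

```python
import random


def mixup_sketch(inputs, targets, alpha, rand=None):
    """Blend each sample with a randomly chosen partner from the same batch."""
    rand = rand or random.Random()
    if alpha <= 0:
        # No mixup is performed for alpha <= 0.
        return inputs, targets
    lam = rand.betavariate(alpha, alpha)  # interpolation factor in [0, 1]
    partner = list(range(len(inputs)))
    rand.shuffle(partner)

    def blend(a, b):
        return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

    mixed_inputs = [blend(inputs[i], inputs[j]) for i, j in enumerate(partner)]
    mixed_targets = [blend(targets[i], targets[j]) for i, j in enumerate(partner)]
    return mixed_inputs, mixed_targets


inputs = [[0.0, 0.0], [1.0, 1.0]]
targets = [[1, 0], [0, 1]]  # one-hot encoded class labels
mixed_inputs, mixed_targets = mixup_sketch(inputs, targets, 0.1, random.Random(0))
```

Because inputs and targets are blended with the same factor, each mixed one-hot target still sums to one.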
build_image_batch(images, dtype, channelfirst=False)[source]

Return batch of images.

If images have no channel a channel axis is added. For channelfirst=True it will be added/moved to front otherwise the channel comes last. All images in batch will have a channel axis. Batch is of shape (n, c, h, w) or (n, h, w, c) depending on channelfirst, where n is the number of images in the batch.

>>> from nutsml.datautil import shapestr
>>> images = [np.zeros((2, 3)), np.ones((2, 3))]
>>> batch = build_image_batch(images, 'uint8', True)
>>> shapestr(batch)
'2x1x2x3'
>>> batch
array([[[[0, 0, 0],
         [0, 0, 0]]],


       [[[1, 1, 1],
         [1, 1, 1]]]], dtype=uint8)
Parameters:
  • images (numpy array) – Images to batch. Must be of shape (h,w,c) or (h,w). Gray-scale with channel is fine (h,w,1) and also alpha channel is fine (h,w,4).
  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’
  • channelfirst (bool) – If True, channel is added/moved to front.
Returns:

Image batch with shape (n, c, h, w) or (n, h, w, c).

Return type:

np.array

build_number_batch(numbers, dtype)[source]

Return numpy array with given dtype for given numbers.

>>> numbers = (1, 2, 3, 1)
>>> build_number_batch(numbers, 'uint8')
array([1, 2, 3, 1], dtype=uint8)
Parameters:
  • numbers (iterable of numbers) – Numbers to create batch from
  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’
Returns:

Numpy array for numbers

Return type:

numpy.array

build_one_hot_batch(class_ids, dtype, num_classes)[source]

Return one hot vectors for class ids.

>>> class_ids = [0, 1, 2, 1]
>>> build_one_hot_batch(class_ids, 'uint8', 3)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]], dtype=uint8)
Parameters:
  • class_ids (iterable) – Class indices in {0, …, num_classes-1}
  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’
  • num_classes (int) – Number of classes
Returns:

One hot vectors for class ids.

Return type:

numpy.array
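
One-hot encoding itself is simple enough to sketch without numpy. The helper name one_hot_sketch is hypothetical and the result is a list of lists rather than a numpy array with a dtype; it only illustrates the encoding used in the example above.

```python
def one_hot_sketch(class_ids, num_classes):
    """Return one row per class id with a single 1 at the class index."""
    rows = []
    for class_id in class_ids:
        row = [0] * num_classes
        row[class_id] = 1  # class ids must lie in {0, ..., num_classes-1}
        rows.append(row)
    return rows


one_hot = one_hot_sketch([0, 1, 2, 1], 3)
```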

build_tensor_batch(tensors, dtype, axes=None)[source]

Return batch of tensors.

>>> from nutsml.datautil import shapestr
>>> tensors = [np.zeros((2, 3)), np.ones((2, 3))]
>>> batch = build_tensor_batch(tensors, 'uint8')
>>> shapestr(batch)
'2x2x3'
>>> print(batch)
[[[0 0 0]
  [0 0 0]]

 [[1 1 1]
  [1 1 1]]]
>>> batch = build_tensor_batch(tensors, 'uint8', axes = (1, 0))
>>> shapestr(batch)
'2x3x2'
>>> print(batch)
[[[0 0]
  [0 0]
  [0 0]]

 [[1 1]
  [1 1]
  [1 1]]]
Parameters:
  • tensors (iterable) – Numpy tensors
  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’
  • axes (tuple|None) – axes order, e.g. to move a channel axis to the last position. (see numpy transpose for details)
Returns:

stack of tensors, with batch axis first.

Return type:

numpy.array

build_vector_batch(vectors, dtype)[source]

Return batch of vectors.

>>> from nutsml.datautil import shapestr
>>> vectors = [np.array([1,2,3]), np.array([2, 3, 4])]
>>> batch = build_vector_batch(vectors, 'uint8')
>>> shapestr(batch)
'2x3'
>>> batch
array([[1, 2, 3],
       [2, 3, 4]], dtype=uint8)
Parameters:
  • vectors (iterable) – Numpy row vectors
  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’
Returns:

vstack of vectors

Return type:

numpy.array

nutsml.booster module

class Boost(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

iterable >> Boost(batcher, network, rand=None)

Boost samples with high softmax probability for incorrect class. Expects one-hot encoded targets and softmax predictions for output.

NOTE: prefetching of batches must be disabled when using boosting!

network = Network()
build_batch = BuildBatch(BATCHSIZE, prefetch=0).input(…).output(…)
boost = Boost(build_batch, network)
samples >> … >> boost >> build_batch >> network.train() >> Consume()
Parameters:
  • iterable (iterable) – Iterable with samples.
  • batcher (nutsml.BuildBatch) – Batcher used for network training.
  • network (nutsml.Network) – Network used for prediction
  • rand (Random|None) – Random number generator used for down-sampling. If None, random.Random() is used.
Returns:

Generator over samples to boost

Return type:

generator

__rrshift__(iterable)
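
One plausible reading of "boost samples with high softmax probability for incorrect class" is to keep a sample with a probability that grows as the predicted probability of its true class shrinks. The sketch below implements that reading. The selection rule, the function name boost_sketch and the list-based data are assumptions for illustration, not the documented behaviour of Boost.

```python
import random


def boost_sketch(samples, targets, predictions, rand=None):
    """Yield samples that the (hypothetical) network predicts poorly."""
    rand = rand or random.Random()
    for sample, target, pred in zip(samples, targets, predictions):
        true_class = target.index(max(target))  # index of the 1 in the one-hot target
        p_correct = pred[true_class]            # softmax probability of the true class
        # Assumed rule: the lower p_correct, the more likely the sample is kept.
        if rand.random() > p_correct:
            yield sample


samples = ['easy', 'hard']
targets = [[1, 0], [0, 1]]
predictions = [[1.0, 0.0], [1.0, 0.0]]  # 'hard' is confidently misclassified
boosted = list(boost_sketch(samples, targets, predictions, random.Random(0)))
```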

nutsml.checkpoint module

class Checkpoint(create_net, parameters, checkpointspath='checkpoints')[source]

Bases: object

A factory for checkpoints to periodically save network weights and other hyper/configuration parameters.

Example usage:

def create_network(lr=0.01, momentum=0.9):
    model = Sequential()
    optimizer = opt.SGD(lr=lr, momentum=momentum)
    model.compile(optimizer=optimizer, metrics=['accuracy'])
    return KerasNetwork(model), optimizer

def parameters(network, optimizer):
    return dict(lr = optimizer.lr, momentum = optimizer.momentum)

def train_network():
    checkpoint = Checkpoint(create_network, parameters)
    network, optimizer = checkpoint.load()

    for epoch in range(EPOCHS):
        train_err = train_network()
        val_err = validate_network()

        if epoch % 10 == 0:  # Reduce learning rate every 10 epochs
            optimizer.lr /= 2

        checkpoint.save_best(val_err)

Checkpoints can also be saved under different names, e.g.

checkpoint.save_best(val_err, 'checkpoint'+str(epoch))

And specific checkpoints can be loaded:

network, config = checkpoint.load('checkpoint103')

If no checkpoint is specified the most recent one is loaded.

__init__(create_net, parameters, checkpointspath='checkpoints')[source]

Create checkpoint factory.

>>> def create_network(lr=0.1):
...     return 'MyNetwork', lr
>>> def parameters(network, lr):
...     return dict(lr = lr)
>>> checkpoint = Checkpoint(create_network, parameters)
>>> network, lr = checkpoint.load()
>>> network, lr
('MyNetwork', 0.1)
Parameters:
  • create_net (function) – Function that takes keyword parameters and returns a nuts-ml Network and any other values or objects needed to describe the state to be checkpointed. Note: parameters(*create_net()) must work!
  • parameters (function) – Function that takes the output of create_net() and returns a dictionary with parameters (the same ones that are used in create_net(…))
  • checkpointspath (string) – Path to folder that will contain checkpoint folders.
datapaths(checkpointname=None)[source]

Return paths to network weights, parameters and config files.

If no checkpoints exist under basedir, (None, None, None) is returned.

Parameters:checkpointname (str|None) – Name of checkpoint. If name is None the most recent checkpoint is used.
Returns:(weightspath, paramspath, configpath) or (None, None, None)
Return type:tuple
dirs()[source]

Return full paths to all checkpoint folders.

Returns:Paths to all folders under the basedir.
Return type:list
latest()[source]

Find most recently modified/created checkpoint folder.

Returns:Full path to checkpoint folder if it exists, otherwise None.
Return type:str | None
load(checkpointname=None)[source]

Create network, load weights and parameters.

Parameters:checkpointname (str|None) – Name of checkpoint to load. If None, the most recent checkpoint is used. If no checkpoint exists yet, the network will be created but no weights loaded, and the default configuration will be returned.
Returns:whatever self.create_net returns
Return type:object
save(checkpointname='checkpoint')[source]

Save network weights and parameters under the given name.

Parameters:checkpointname (str) – Name of checkpoint folder. Path will be self.basepath/checkpointname
Returns:path to checkpoint folder
Return type:str
save_best(score, checkpointname='checkpoint', isloss=False)[source]

Save best network weights and parameters under the given name.

Parameters:
  • score (float|int) – Some score indicating quality of network.
  • checkpointname (str) – Name of checkpoint folder.
  • isloss (bool) – If True, score is a loss and lower is better; otherwise higher is better.
Returns:

path to checkpoint folder

Return type:

str
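
The role of the isloss flag can be shown with a small sketch of the comparison a "save only the best" policy needs. The class BestScoreSketch is hypothetical and only tracks scores; it does not write any files and is not the Checkpoint implementation.

```python
class BestScoreSketch(object):
    """Track the best score seen so far, for losses or for quality scores."""

    def __init__(self, isloss=False):
        self.isloss = isloss
        self.best = None

    def is_best(self, score):
        """Return True (and remember the score) if it improves on the best."""
        if self.best is None:
            improved = True
        elif self.isloss:
            improved = score < self.best  # lower loss is better
        else:
            improved = score > self.best  # higher score is better
        if improved:
            self.best = score
        return improved


tracker = BestScoreSketch(isloss=True)
saved = [tracker.is_best(err) for err in (0.9, 0.5, 0.7, 0.4)]
```

In this sketch a checkpoint would be written for the first, second and fourth validation error, but not for the third, which is worse than the best loss seen so far.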

nutsml.common module

class CheckNaN(*args, **kwargs)[source]

Bases: nutsflow.base.NutFunction

Raise exception if data contains NaN.

Useful to stop training if the network doesn’t converge and the loss becomes NaN, e.g. samples >> network.train() >> CheckNaN() >> log >> Consume()

>>> from nutsflow import Collect
>>> [1, 2, 3] >> CheckNaN() >> Collect()
[1, 2, 3]
>>> import numpy as np
>>> [1, np.nan, 3] >> CheckNaN() >> Collect()
Traceback (most recent call last):
...
RuntimeError: NaN encountered: nan
Parameters:data – Items or iterables.
Returns:Return input data if they don’t contain NaNs
Return type:any
Raise:RuntimeError if data contains NaN.
__call__(element)
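
The check itself can be sketched with the standard library. The function check_nan_sketch is a hypothetical stand-in that handles plain floats and flat lists or tuples only; how CheckNaN treats numpy arrays or nested data is not shown here.

```python
import math


def check_nan_sketch(data):
    """Return data unchanged, or raise RuntimeError if it contains a NaN."""
    values = data if isinstance(data, (list, tuple)) else [data]
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            raise RuntimeError('NaN encountered: ' + str(data))
    return data


losses = [check_nan_sketch(loss) for loss in (0.5, 0.25, 0.125)]
```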
class ConvertLabel(column, labels, onehot=False)[source]

Bases: nutsflow.base.NutFunction

Convert string labels to integer class ids (or one-hot) and vice versa.

__call__(sample)[source]

Return the sample with its label replaced. If the element is a bare label rather than a sample (column is None), the label itself is converted.

__init__(column, labels, onehot=False)[source]

Convert string labels to integer class ids (or one-hot) and vice versa.

Also converts confidence vectors, e.g. softmax output or float values to class labels.

>>> from nutsflow import Collect
>>> labels = ['class0', 'class1', 'class2']
>>> convert = ConvertLabel(None, labels)
>>> [1, 0] >> convert >> Collect()
['class1', 'class0']
>>> ['class1', 'class0'] >> convert >> Collect()
[1, 0]
>>> [0.9, 0.4, 1.6] >> convert >> Collect()
['class1', 'class0', 'class2']
>>> [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]] >> convert >> Collect()
['class1', 'class0']
>>> convert = ConvertLabel(None, labels, onehot=True)
>>> ['class1', 'class0'] >> convert >> Collect()
[[0, 1, 0], [1, 0, 0]]
>>> convert = ConvertLabel(1, labels)
>>> [('data', 'class1'), ('data', 'class0')] >> convert >> Collect()
[('data', 1), ('data', 0)]
>>> [('data', 1), ('data', 2)] >> convert >> Collect()
[('data', 'class1'), ('data', 'class2')]
>>> [('data', 0.9)] >> convert >> Collect()
[('data', 'class1')]
>>> [('data', [0.1, 0.7, 0.2])] >> convert >> Collect()
[('data', 'class1')]
Parameters:
  • column (int) – Index of column in sample that contains label. If None process labels directly.
  • labels (list|tuple) – List of class labels (strings).
  • onehot (bool) – True: convert class labels to one-hot encoded vectors. False: convert to class index.
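
The conversion rules visible in the examples above (string to index, index to string, float rounded to the nearest index, confidence vector reduced to its arg-max) can be sketched as one function. The name convert_label_sketch is hypothetical, the column handling is omitted, and the rounding rule is inferred from the examples rather than taken from the implementation.

```python
def convert_label_sketch(label, labels, onehot=False):
    """Convert a single label following the rules seen in the examples."""
    if isinstance(label, str):
        index = labels.index(label)
        if onehot:
            return [1 if i == index else 0 for i in range(len(labels))]
        return index
    if isinstance(label, (list, tuple)):
        # Confidence vector, e.g. softmax output: pick the most likely class.
        return labels[label.index(max(label))]
    # Integer class id or float value: round to the nearest class index.
    return labels[int(round(label))]


labels = ['class0', 'class1', 'class2']
converted = [convert_label_sketch(x, labels) for x in (1, 'class1', 0.9, [0.1, 0.7, 0.2])]
```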
class PartitionByCol(*args, **kwargs)[source]

Bases: nutsflow.base.NutSink

Partition samples in iterables depending on column value.

>>> samples = [(1,1), (2,0), (2,4), (1,3), (3,0)]
>>> ones, twos = samples >> PartitionByCol(0, [1, 2])
>>> ones
[(1, 1), (1, 3)]
>>> twos
[(2, 0), (2, 4)]

Note that values does not need to contain all possible values. It is sufficient to provide the values for the partitions wanted.

Parameters:
  • iterable (iterable) – Iterable over samples
  • column (int) – Index of column to extract
  • values (list) – List of column values to create partitions for.
Returns:

tuple of partitions

Return type:

tuple

__rrshift__(iterable)
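
The partitioning can be sketched as a plain function. The name partition_by_col_sketch is hypothetical; samples whose column value is not listed in values are dropped, which matches the example above where (3, 0) appears in neither partition.

```python
def partition_by_col_sketch(samples, column, values):
    """Return one list of samples per requested column value."""
    partitions = dict((value, []) for value in values)
    for sample in samples:
        key = sample[column]
        if key in partitions:  # values not asked for are ignored
            partitions[key].append(sample)
    return tuple(partitions[value] for value in values)


samples = [(1, 1), (2, 0), (2, 4), (1, 3), (3, 0)]
ones, twos = partition_by_col_sketch(samples, 0, [1, 2])
```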
class SplitRandom(*args, **kwargs)[source]

Bases: nutsflow.base.NutSink

Randomly split iterable into partitions.

For the same input data the same split is created every time, and the split is stable across Python versions 2.x and 3.x. A random number generator can be provided to create varying splits.

>>> train, val = range(10) >> SplitRandom(ratio=0.7)
>>> train, val
([6, 3, 1, 7, 0, 2, 4], [5, 9, 8])
>>> range(10) >> SplitRandom(ratio=0.7)  # Same split again
[[6, 3, 1, 7, 0, 2, 4], [5, 9, 8]]
>>> train, val, test = range(10) >> SplitRandom(ratio=(0.6, 0.3, 0.1))
>>> train, val, test
([6, 1, 4, 0, 3, 2], [8, 7, 9], [5])
>>> data = zip('aabbccddee', range(10))
>>> same_letter = lambda t: t[0]
>>> train, val = data >> SplitRandom(ratio=0.6, constraint=same_letter)
>>> train
[('a', 1), ('a', 0), ('d', 7), ('b', 2), ('d', 6), ('b', 3)]
>>> val
[('c', 5), ('e', 8), ('e', 9), ('c', 4)]
Parameters:
  • iterable (iterable) – Iterable over anything. Will be consumed!
  • ratio (float|tuple) – Ratio of two partitions, e.g. a ratio of 0.7 means a 70%/30% split. Alternatively a list of ratios can be provided, e.g. ratio=(0.6, 0.3, 0.1). Note that ratios must sum up to one.
  • constraint (function|None) – Function that returns the key by which the elements of the iterable are grouped before partitioning. Useful to ensure that a partition contains related elements, e.g. left and right eye images are not scattered across partitions. Note that constraints take precedence over ratios.
  • rand (Random|None) – Random number generator. The default None ensures that the same split is created every time SplitRandom is called. This is important when continuing an interrupted training session or running the same training on machines with different Python versions. Note that Python’s random.Random(0) generates different numbers for Python 2.x and 3.x!
Returns:

partitions of iterable with sizes according to provided ratios.

Return type:

(list, list, ..)

__rrshift__(iterable)
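
How a constraint keeps related elements in the same partition can be sketched by splitting groups instead of individual elements. The function split_sketch is hypothetical, supports only a two-way split, and uses random.Random(seed), so its concrete splits differ from those of SplitRandom and are not guaranteed to be stable across Python versions.

```python
import random
from collections import OrderedDict


def split_sketch(items, ratio, constraint=None, seed=0):
    """Split items into two partitions, keeping constrained groups together."""
    groups = OrderedDict()
    for i, item in enumerate(items):
        # Without a constraint every element forms its own group.
        key = constraint(item) if constraint else i
        groups.setdefault(key, []).append(item)
    groups = list(groups.values())
    random.Random(seed).shuffle(groups)  # same seed, same split
    cut = int(round(ratio * len(groups)))

    def flatten(gs):
        return [x for g in gs for x in g]

    return flatten(groups[:cut]), flatten(groups[cut:])


data = list(zip('aabbccddee', range(10)))
same_letter = lambda t: t[0]
train, val = split_sketch(data, 0.6, same_letter)
```

Since whole groups are assigned to a partition, the partition sizes follow the ratio only as closely as the group sizes allow, which is why constraints take precedence over ratios.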

nutsml.config module

class Config(*args, **kwargs)[source]

Bases: dict

Dictionary that allows access via keys or attributes.

Used to store and access configuration data.

__init__(*args, **kwargs)[source]

Create dictionary.

>>> contact = Config({'name':'stefan', 'address':{'number':12}})
>>> contact['name']
'stefan'
>>> contact.name
'stefan'
>>> contact.address.number
12
>>> contact.surname = 'maetschke'
>>> contact.surname
'maetschke'
Parameters:
  • args (args) – See dict
  • kwargs (kwargs) – See dict
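
Attribute access on a dictionary can be sketched with __getattr__ and __setattr__. The class AttrDictSketch is a hypothetical illustration of the idea, not the Config class; it wraps nested dictionaries lazily on first attribute access.

```python
class AttrDictSketch(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        if isinstance(value, dict) and not isinstance(value, AttrDictSketch):
            # Wrap nested dictionaries so that chained access works.
            value = AttrDictSketch(value)
            self[name] = value
        return value

    def __setattr__(self, name, value):
        self[name] = value


contact = AttrDictSketch({'name': 'stefan', 'address': {'number': 12}})
contact.surname = 'maetschke'
```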
static isjson(filepath)[source]

Return true if filepath ends with ‘.json’.

Parameters:filepath (str) – Filepath
Returns:True if filepath points to a JSON file.
Return type:bool
load(filepath)[source]

Load configuration from file in JSON or YAML format.

>>> cfg = Config().load('tests/data/configuration.json')
>>> cfg.number
13
Parameters:filepath (str) – Path to JSON or YAML file.
Returns:returns loaded configuration.
Return type:Config
save(filepath)[source]

Save configuration to file in JSON or YAML format.

>>> cfg = Config({'number': 13, 'name': 'Stefan'})
>>> cfg.save('tests/data/configuration.yaml')
Parameters:filepath (str) – Filepath. Should end with ‘.json’ or ‘.yaml’
load_config(filename)[source]

Load configuration file in YAML format from locations in defined order.

The search order for the config file is: 1) user home dir 2) current dir 3) full path

Example file: ‘tests/data/config.yaml’
filepath : c:/Maet
imagesize : [100, 200]
>>> cfg = load_config('tests/data/config.yaml')
>>> cfg.filepath
'c:/Maet'
>>> cfg['imagesize']
[100, 200]
Parameters:filename – Name or full path of configuration file.
Returns:dictionary with config data. Note that config data can be accessed by key or attribute, e.g. cfg.filepath or cfg['filepath']
Return type:ConfigDict
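
The search order (user home dir, current dir, full path) can be sketched with os.path. The function find_config_sketch is hypothetical: it only locates the file and returns None if nothing is found, whereas the behaviour of load_config for a missing file is not specified here.

```python
import os
import tempfile


def find_config_sketch(filename):
    """Return the first existing candidate path, or None."""
    candidates = [
        os.path.join(os.path.expanduser('~'), filename),  # 1) user home dir
        os.path.join(os.getcwd(), filename),              # 2) current dir
        filename,                                         # 3) full path
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Usage with a temporary file addressed by its full path.
handle, path = tempfile.mkstemp(suffix='.yaml')
os.close(handle)
found = find_config_sketch(path)
missing = find_config_sketch('no_such_config_file_for_sketch.yaml')
os.remove(path)
```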

nutsml.datautil module

col_map(sample, columns, func, *args, **kwargs)[source]

Map function to given columns of sample and keep other columns

>>> sample = (1, 2, 3)
>>> add_n = lambda x, n: x + n
>>> col_map(sample, 1, add_n, 10)
(1, 12, 3)
>>> col_map(sample, (0, 2), add_n, 10)
(11, 2, 13)
Parameters:
  • sample (tuple|list) – Sample
  • columns (int|tuple) – Single or multiple column indices.
  • func (function) – Function to map
  • args (args) – Arguments passed on to function
  • kwargs (kwargs) – Keyword arguments passed on to function
Returns:

Sample where function has been applied to elements in the given columns.
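
A minimal sketch of the column mapping, assuming samples are tuples or lists and the result is returned as a tuple; col_map_sketch is a hypothetical name, not the library function.

```python
def col_map_sketch(sample, columns, func, *args, **kwargs):
    """Apply func to the given columns of sample and keep the others."""
    if isinstance(columns, int):
        columns = (columns,)  # allow a single column index
    return tuple(func(value, *args, **kwargs) if i in columns else value
                 for i, value in enumerate(sample))


add_n = lambda x, n: x + n
single = col_map_sketch((1, 2, 3), 1, add_n, 10)
multiple = col_map_sketch((1, 2, 3), (0, 2), add_n, 10)
```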

group_by(elements, keyfunc, ordered=False)[source]

Group elements using the given key function.

>>> is_odd = lambda x: bool(x % 2)
>>> numbers = [0, 1, 2, 3, 4]
>>> group_by(numbers, is_odd, True)
OrderedDict([(False, [0, 2, 4]), (True, [1, 3])])
Parameters:
  • elements (iterable) – Any iterable
  • keyfunc (function) – Function that returns key to group by
  • ordered (bool) – True: return OrderedDict else return dict
Returns:

dictionary with results of keyfunc as keys and the elements for that key as value

Return type:

dict|OrderedDict

group_samples(samples, labelcol, ordered=False)[source]

Return samples grouped by label and label counts.

>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]  
>>> groups, labelcnts = group_samples(samples, 1, True)
>>> groups
OrderedDict([(1, [('pos', 1), ('pos', 1)]), (0, [('neg', 0)])])
>>> labelcnts
Counter({1: 2, 0: 1})
Parameters:
  • samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol)
  • labelcol (int) – Index of label in sample
  • ordered (bool) – True: samples are kept in order when grouping.
Returns:

(groups, labelcnts) where groups is a dict containing samples grouped by label, and labelcnts is a Counter dict containing label frequencies.

Return type:

tuple(dict, Counter)

isnan(x)[source]

Check if something is NaN.

>>> import numpy as np
>>> isnan(np.nan)
True
>>> isnan(0)
False
Parameters:x (object) – Any object
Returns:True if x is NaN
Return type:bool
random_downsample(samples, labelcol, rand=None, ordered=False)[source]

Randomly down-sample samples.

Creates stratified samples by down-sampling larger classes to the size of the smallest class.

Note: The example shown below uses StableRandom(i) to create a deterministic sequence of randomly stratified samples. Usually it is sufficient to use the default (rand=None). Do NOT use rnd.Random(0) since this will generate the same subsample every time.

>>> from __future__ import print_function  
>>> from nutsflow.common import StableRandom
>>> samples = [('pos1', 1), ('pos2', 1), ('pos3', 1),
...            ('neg1', 0), ('neg2', 0)]
>>> for i in range(3):
...     print(random_downsample(samples, 1, StableRandom(i), True))
[('pos2', 1), ('pos3', 1), ('neg2', 0), ('neg1', 0)]
[('pos2', 1), ('pos3', 1), ('neg2', 0), ('neg1', 0)]
[('pos2', 1), ('pos1', 1), ('neg1', 0), ('neg2', 0)]
Parameters:
  • samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol). Labels can be any hashable type, e.g. int, str, bool
  • labelcol (int) – Index of label in sample
  • rand (Random|None) – Random number generator. If None, random.Random(None) is used.
  • ordered (bool) – True: samples are kept in order when downsampling.
Returns:

Stratified sample set.

Return type:

list of samples
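
Down-sampling to the size of the smallest class can be sketched as follows. The function downsample_sketch is hypothetical, uses random.Random rather than StableRandom, and does not reproduce the ordering or the exact selections shown in the example above.

```python
import random
from collections import defaultdict


def downsample_sketch(samples, labelcol, rand=None):
    """Return a class-balanced subset by shrinking larger classes."""
    rand = rand or random.Random()
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[labelcol]].append(sample)
    smallest = min(len(group) for group in groups.values())
    result = []
    for group in groups.values():
        # Draw without replacement so no sample is duplicated.
        result.extend(rand.sample(group, smallest))
    return result


samples = [('pos1', 1), ('pos2', 1), ('pos3', 1), ('neg1', 0), ('neg2', 0)]
balanced = downsample_sketch(samples, 1, random.Random(0))
```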

shapestr(array, with_dtype=False)[source]

Return string representation of array shape.

>>> import numpy as np
>>> a = np.zeros((3,4))
>>> shapestr(a)
'3x4'
>>> a = np.zeros((3,4), dtype='uint8')
>>> shapestr(a, True)
'3x4:uint8'
Parameters:
  • array (ndarray) – Numpy array
  • with_dtype (bool) – Append dtype of array to shape string
Returns:

Shape as string, e.g shape (3,4) becomes 3x4

Return type:

str

shuffle_sublists(sublists, rand)[source]

Shuffles the lists within a list but not the list itself.

>>> from nutsflow.common import StableRandom
>>> rand = StableRandom(0)
>>> sublists = [[1, 2, 3], [4, 5, 6, 7]]
>>> shuffle_sublists(sublists, rand)
>>> sublists
[[1, 3, 2], [4, 5, 7, 6]]
Parameters:
  • sublists – A list containing lists
  • rand (Random) – A random number generator.
upsample(samples, labelcol, rand=None)[source]

Up-sample sample set.

Creates stratified samples by up-sampling smaller classes to the size of the largest class.

Note: The example shown below uses rnd.Random(i) to create a deterministic sequence of randomly stratified samples. Usually it is sufficient to use the default (rand=None).

>>> from __future__ import print_function
>>> import random as rnd
>>> samples = [('pos1', 1), ('pos2', 1), ('neg1', 0)]
>>> for i in range(3):  
...     print(upsample(samples, 1, rand=rnd.Random(i)))
[('neg1', 0), ('neg1', 0), ('pos1', 1), ('pos2', 1)]
[('pos2', 1), ('neg1', 0), ('pos1', 1), ('neg1', 0)]
[('neg1', 0), ('neg1', 0), ('pos1', 1), ('pos2', 1)]
Parameters:
  • samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol). Labels can be any hashable type, e.g. int, str, bool
  • labelcol (int) – Index of label in sample
  • rand (Random|None) – Random number generator. If None, random.Random(None) is used.
Returns:

Stratified sample set.

Return type:

list of samples

nutsml.fileutil module

clear_folder(path)[source]

Remove all content (files and folders) within the specified folder.

Parameters:path (str) – Path of folder to clear.
create_filename(prefix='', ext='')[source]

Create a unique filename.

Parameters:
  • prefix (str) – Prefix to add to filename.
  • ext (str) – Extension to append to filename, e.g. ‘jpg’
Returns:

Unique filename.

Return type:

str

create_folders(path, mode=511)[source]

Create folder(s). Don’t fail if already existing.

See related functions delete_folders() and clear_folder().

Parameters:
  • path (str) – Path of folders to create, e.g. ‘foo/bar’
  • mode (int) – File creation mode, e.g. 0o777 (octal, equal to the default of 511)
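
The default mode=511 is the decimal form of the octal permission mask 0o777. The sketch below shows that relation and one way to create folders without failing if they already exist; create_folders_sketch is a hypothetical stand-in, not the library function.

```python
import os
import tempfile


def create_folders_sketch(path, mode=0o777):
    """Create path including parent folders; do nothing if it exists."""
    if not os.path.exists(path):
        os.makedirs(path, mode)


base = tempfile.mkdtemp()
target = os.path.join(base, 'foo', 'bar')
create_folders_sketch(target)
create_folders_sketch(target)  # second call is a no-op
```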
create_temp_filepath(prefix='', ext='', relative=True)[source]

Create a temporary folder under TEMP_FOLDER.

If the folder already exists do nothing. Return relative (default) or absolute path to a temp file with a unique name.

See related function create_filename().

Parameters:
  • prefix (str) – Prefix to add to filename.
  • ext (str) – Extension to append to filename, e.g. ‘jpg’
  • relative (bool) – True: return relative path, otherwise absolute path.
Returns:

Path to file with unique name in temp folder.

Return type:

str

delete_file(path)[source]

Remove file at given path. Don’t fail if non-existing.

Parameters:path (str) – Path to file to delete, e.g. ‘foo/bar/file.txt’
delete_folders(path)[source]

Remove folder and sub-folders. Don’t fail if non-existing or not empty.

Parameters:path (str) – Path of folders to delete, e.g. ‘foo/bar’
delete_temp_data()[source]

Remove TEMP_FOLDER and all its contents.

nutsml.imageutil module

add_channel(image, channelfirst)[source]

Add channel if missing and make first axis if requested.

>>> import numpy as np
>>> image = np.ones((10, 20))
>>> image = add_channel(image, True)
>>> shapestr(image)
'1x10x20'
Parameters:
  • image (ndarray) – RGB (h,w,3) or gray-scale image (h,w).
  • channelfirst (bool) – If True, make channel first axis
Returns:

Numpy array with channel (as first axis if channelfirst=True)

Return type:

numpy.array

annotation2coords(image, annotation)[source]

Convert geometric annotation in image to pixel coordinates.

For example, given a rectangular region annotated in an image as (‘rect’, ((x, y, w, h))) the function returns the coordinates of all pixels within this region as (row, col) position tuples.

The following annotation formats are supported:
  • (‘point’, ((x, y), … ))
  • (‘circle’, ((x, y, r), …))
  • (‘ellipse’, ((x, y, rx, ry, rot), …))
  • (‘rect’, ((x, y, w, h), …))
  • (‘polyline’, (((x, y), (x, y), …), …))

Annotation regions can exceed the image dimensions and will be clipped. Note that annotation is in x,y order while output is r,c (row, col).

>>> import numpy as np
>>> img = np.zeros((5, 5), dtype='uint8')
>>> anno = ('point', ((1, 1), (1, 2)))
>>> for rr, cc in annotation2coords(img, anno):
...     print(list(rr), list(cc))
[1] [1]
[2] [1]
Parameters:
  • image (ndarray) – Image
  • annotation (annotation) – Annotation of an image region such as point, circle, rect or polyline
Returns:

Coordinates of pixels within the (clipped) region.

Return type:

generator over tuples (row, col)
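
The switch from x,y annotation order to row, col output and the clipping to the image can be sketched for a single rectangle. The function rect_coords_sketch is hypothetical; it assumes that (x, y) is the upper left corner and that w and h are extents in pixels, and it returns a flat list of (row, col) tuples instead of the generator the library function returns.

```python
def rect_coords_sketch(image_shape, rect):
    """Return (row, col) tuples of all pixels of rect inside the image."""
    rows, cols = image_shape[:2]
    x, y, w, h = rect
    # Clip the rectangle to the image; x maps to columns, y to rows.
    r0, r1 = max(y, 0), min(y + h, rows)
    c0, c1 = max(x, 0), min(x + w, cols)
    return [(r, c) for r in range(r0, r1) for c in range(c0, c1)]


# A rectangle that sticks out of the right edge of a 5x5 image.
coords = rect_coords_sketch((5, 5), (3, 1, 4, 2))
```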

annotation2mask(image, annotations, pos=255)[source]

Convert geometric annotation to mask.

For annotation formats see: imageutil.annotation2coords

>>> import numpy as np
>>> img = np.zeros((3, 3), dtype='uint8')
>>> anno = ('point', ((0, 1), (2, 0)))
>>> annotation2mask(img, anno)
array([[  0,   0, 255],
       [255,   0,   0],
       [  0,   0,   0]], dtype=uint8)
Parameters:
  • annotation (annotation) – Annotation of an image region such as point, circle, rect or polyline
  • pos (int) – Value to write in mask for regions defined by annotation
  • image (numpy array) – Image annotation refers to. Returned mask will be of same size.
Returns:

Mask with annotation

Return type:

numpy array

annotation2pltpatch(annotation, **kwargs)[source]

Convert geometric annotation to matplotlib geometric objects (=patches)

For details regarding matplotlib patches see: http://matplotlib.org/api/patches_api.html For annotation formats see: imageutil.annotation2coords

Parameters:annotation (annotation) – Annotation of an image region such as point, circle, rect or polyline
Returns:matplotlib.patches
Return type:generator over matplotlib patches
arr_to_pil(image)[source]

Convert numpy array to PIL image.

>>> import numpy as np
>>> rgb_arr = np.ones((5, 4, 3), dtype='uint8')
>>> pil_img = arr_to_pil(rgb_arr)
>>> pil_img.size
(4, 5)
Parameters:image (ndarray) – Numpy array with dtype ‘uint8’ and dimensions (h,w,c) for RGB or (h,w) for gray-scale images.
Returns:PIL image
Return type:PIL.Image
centers_inside(centers, image, pshape)[source]

Filter center points of patches ensuring that patch is inside of image.

>>> centers = np.array([[1, 2], [0,1]])
>>> image = np.zeros((3, 4))
>>> centers_inside(centers, image, (3, 3)).astype('uint8')
array([[1, 2]], dtype=uint8)
Parameters:
  • centers (ndarray(n,2)) – Center points of patches.
  • image (ndarray(h,w)) – Image the patches should be inside.
  • pshape (tuple) – Patch shape of form (h,w)
Returns:

Patch centers where the patch is completely inside the image.

Return type:

ndarray of shape (n, 2)

change_brightness(image, brightness=1.0)[source]

Change brightness of image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> change_brightness(image, 0.5)
array([[127,   0,   0],
       [  0, 127,   0],
       [  0,   0, 127]], dtype=uint8)

See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Brightness

Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • brightness (float) – Brightness [0, 1]
Returns:

Image with changed brightness

Return type:

numpy array with range [0,255] and dtype ‘uint8’

change_color(image, color=1.0)[source]

Change color of image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> change_color(image, 0.5)
array([[255,   0,   0],
       [  0, 255,   0],
       [  0,   0, 255]], dtype=uint8)

See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Color

Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • color (float) – Color [0, 1]
Returns:

Image with changed color

Return type:

numpy array with range [0,255] and dtype ‘uint8’

change_contrast(image, contrast=1.0)[source]

Change contrast of image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> change_contrast(image, 0.5)
array([[170,  42,  42],
       [ 42, 170,  42],
       [ 42,  42, 170]], dtype=uint8)

See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Contrast

Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • contrast (float) – Contrast [0, 1]
Returns:

Image with changed contrast

Return type:

numpy array with range [0,255] and dtype ‘uint8’

change_sharpness(image, sharpness=1.0)[source]

Change sharpness of image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> change_sharpness(image, 0.5)
array([[255,   0,   0],
       [  0, 196,   0],
       [  0,   0, 255]], dtype=uint8)

See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Sharpness

Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • sharpness (float) – Sharpness [0, …]
Returns:

Image with changed sharpness

Return type:

numpy array with range [0,255] and dtype ‘uint8’

crop(image, x1, y1, x2, y2)[source]

Crop image.

>>> import numpy as np
>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> crop(image, 1, 2, 5, 5)
array([[ 9, 10, 11],
       [13, 14, 15]], dtype=uint8)
Parameters:
  • image (numpy array) – Numpy array.
  • x1 (int) – x-coordinate of left upper corner of crop (inclusive)
  • y1 (int) – y-coordinate of left upper corner of crop (inclusive)
  • x2 (int) – x-coordinate of right lower corner of crop (exclusive)
  • y2 (int) – y-coordinate of right lower corner of crop (exclusive)
Returns:

Cropped image

Return type:

numpy array
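
The coordinate convention above corresponds to ordinary NumPy slicing, as the following sketch illustrates. The name crop_sketch is hypothetical and the code is an assumption about how such a crop can be expressed, not the library's implementation.

```python
import numpy as np

def crop_sketch(image, x1, y1, x2, y2):
    # Rows are indexed by y, columns by x; the upper bounds are exclusive.
    return image[y1:y2, x1:x2]

image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
cropped = crop_sketch(image, 1, 2, 5, 5)
```

With slicing, bounds beyond the image border (x2 = 5 and y2 = 5 for a 4x4 image) are truncated to the image size, so the result has shape (2, 3).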

crop_center(image, w, h)[source]

Crop region with size w, h from center of image.

Note that the crop is specified via w, h and not via shape (h,w). Furthermore if the image or the crop region have even dimensions, coordinates are rounded down.

>>> import numpy as np
>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> crop_center(image, 3, 2)
array([[ 4,  5,  6],
       [ 8,  9, 10]], dtype=uint8)
Parameters:
  • image (numpy array) – Numpy array.
  • w (int) – Width of crop
  • h (int) – Height of crop
Returns:

Cropped image

Return type:

numpy array

Raises:

ValueError if image is smaller than crop region
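
The rounding behavior can be sketched with integer division. The function crop_center_sketch below is a hypothetical illustration that reproduces the documented example; the library's actual implementation may differ.

```python
import numpy as np

def crop_center_sketch(image, w, h):
    ih, iw = image.shape[:2]
    if iw < w or ih < h:
        raise ValueError('Image is smaller than crop region')
    # Integer division rounds the top-left corner down.
    x1 = (iw - w) // 2
    y1 = (ih - h) // 2
    return image[y1:y1 + h, x1:x1 + w]

image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
center = crop_center_sketch(image, 3, 2)
```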

crop_square(image)[source]

Crop image to square shape.

Crops symmetrically left and right or top and bottom to achieve aspect ratio of one and preserves the largest dimension.

Parameters:image (numpy array) – Numpy array.
Returns:Cropped image
Return type:numpy array
distort_elastic(image, smooth=10.0, scale=100.0, seed=0)[source]

Elastic distortion of images.

Channel axis in RGB images will not be distorted but grayscale or RGB images are both valid inputs. RGB and grayscale images will be distorted identically for the same seed.

Simard et al., “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Parameters:
  • image (ndarray) – Image of shape [h,w] or [h,w,c]
  • smooth (float) – Smooths the distortion.
  • scale (float) – Scales the distortion.
  • seed (int) – Seed for random number generator. Ensures that for the same seed images are distorted identically.
Returns:

Distorted image with same shape as input image.

Return type:

ndarray

enhance(image, func, *args, **kwargs)[source]

Enhance image using a PIL enhance function

See the following link for details on PIL enhance functions: http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html

>>> from PIL.ImageEnhance import Brightness
>>> image = np.ones((3,2), dtype='uint8')
>>> enhance(image, Brightness, 0.0)
array([[0, 0],
       [0, 0],
       [0, 0]], dtype=uint8)
Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • func (function) – PIL ImageEnhance function
  • args (args) – Argument list passed on to enhance function.
  • kwargs (kwargs) – Key-word arguments passed on to enhance function
Returns:

Enhanced image

Return type:

numpy array with range [0,255] and dtype ‘uint8’

extract_patch(image, pshape, r, c)[source]

Extract a patch of given shape, centered at r,c, from image.

Note that there is no checking if the patch region is inside the image.

>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> extract_patch(image, (2, 3), 2, 2)
array([[ 5,  6,  7],
       [ 9, 10, 11]], dtype=uint8)
Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’. Can be of shapes MxN, MxNxC.
  • pshape (tuple) – Shape of patch. Number of dimensions must match image.
  • r (int) – Row for center of patch
  • c (int) – Column for center of patch
Returns:

numpy array with shape pshape

Return type:

numpy array with range [0,255] and dtype ‘uint8’
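
The centering convention implied by the example above can be sketched as follows. This is a hypothetical helper (extract_patch_sketch) for 2D images that assumes the patch starts h//2 rows above and w//2 columns left of the center; it is not the library's code.

```python
import numpy as np

def extract_patch_sketch(image, pshape, r, c):
    h, w = pshape[:2]
    # Top-left corner of the patch; no check that it lies inside the image.
    r0, c0 = r - h // 2, c - w // 2
    return image[r0:r0 + h, c0:c0 + w]

image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
patch = extract_patch_sketch(image, (2, 3), 2, 2)
```

Because nothing is checked, a negative start index would be interpreted by NumPy relative to the end of the axis, which is one reason to filter centers beforehand.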

fliplr(image)[source]

Flip image left to right.

>>> image = np.reshape(np.arange(4, dtype='uint8'), (2,2))
>>> fliplr(image)
array([[1, 0],
       [3, 2]], dtype=uint8)
Parameters:image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
Returns:Flipped image
Return type:numpy array with range [0,255] and dtype ‘uint8’
flipud(image)[source]

Flip image up to down.

>>> image = np.reshape(np.arange(4, dtype='uint8'), (2,2))
>>> flipud(image)
array([[2, 3],
       [0, 1]], dtype=uint8)
Parameters:image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
Returns:Flipped image
Return type:numpy array with range [0,255] and dtype ‘uint8’
floatimg2uint8(image)[source]

Convert array with floats to ‘uint8’ and rescale from [0,1] to [0, 255].

Converts only if image.dtype != uint8.

>>> import numpy as np
>>> image = np.eye(10, 20, dtype=float)
>>> arr = floatimg2uint8(image)
>>> np.max(arr)
255
Parameters:image (numpy.array) – Numpy array with range [0,1]
Returns:Numpy array with range [0,255] and dtype ‘uint8’
Return type:numpy array
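
A minimal sketch of such a conversion is shown below. It assumes a scale factor of 255 and truncation on the cast to ‘uint8’; the name floatimg2uint8_sketch is hypothetical and the library may scale or round differently.

```python
import numpy as np

def floatimg2uint8_sketch(image):
    # Arrays that are already uint8 are passed through untouched.
    if image.dtype == np.uint8:
        return image
    # Assumed scaling: 0.0 maps to 0 and 1.0 maps to 255.
    return (image * 255).astype('uint8')

image = np.eye(10, 20, dtype=float)
arr = floatimg2uint8_sketch(image)
```
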
gray2rgb(image)[source]

Convert grayscale image to RGB image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> gray2rgb(image)
array([[[255, 255, 255],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255, 255],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255, 255]]], dtype=uint8)
Parameters:image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
Returns:RGB image
Return type:numpy array with range [0,255] and dtype ‘uint8’
identical(image)[source]

Return input image unchanged.

Parameters:image (numpy.array) – Should be a numpy array of an image.
Returns:Same as input
Return type:Same as input
load_image(filepath, as_grey=False, dtype='uint8', no_alpha=True)[source]

Load image as numpy array from given filepath.

Supported formats: gif, png, jpg, bmp, tif, npy

>>> img = load_image('tests/data/img_formats/nut_color.jpg')
>>> shapestr(img)
'213x320x3'
Parameters:
  • filepath (string) – Filepath to image file or numpy array.
  • as_grey (bool) –
Returns:

numpy array with shapes

  • (h, w) for grayscale or monochrome,
  • (h, w, 3) for RGB (3 color channels in last axis),
  • (h, w, 4) for RGBA (for no_alpha = False),
  • (h, w, 3) for RGBA (for no_alpha = True).

Pixel values are in range [0,255] for dtype = uint8.

Return type:

numpy ndarray

mask_choice(mask, value, n)[source]

Random selection of n points where mask has given value

>>> np.random.seed(1)   # ensure same random selection for doctest
>>> mask = np.eye(3, dtype='uint8')
>>> mask_choice(mask, 1, 2).tolist()
[[0, 0], [2, 2]]
Parameters:
  • mask (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • n (int) – Number of points to select. If n is larger than the points available only the available points will be returned.
Returns:

Array with x,y coordinates

Return type:

numpy array with shape nx2 where each row contains x, y

mask_where(mask, value)[source]

Return x,y coordinates where mask has specified value

>>> mask = np.eye(3, dtype='uint8')
>>> mask_where(mask, 1).tolist()
[[0, 0], [1, 1], [2, 2]]
Parameters:mask (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
Returns:Array with x,y coordinates
Return type:numpy array with shape Nx2 where each row contains x, y
normalize_histo(image, gamma=1.0)[source]

Perform histogram normalization on image.

Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • gamma (float) – Factor for gamma adjustment.
Returns:

Normalized image

Return type:

numpy array with range [0,255] and dtype ‘uint8’

occlude(image, x, y, w, h, color=0)[source]

Occlude image with a rectangular region.

Occludes an image region with dimensions w,h centered on x,y with the given color. Invalid x,y coordinates will be clipped to ensure that the complete occlusion rectangle is within the image.

>>> import numpy as np
>>> image = np.ones((4, 5)).astype('uint8')
>>> occlude(image, 2, 2, 2, 3)
array([[1, 1, 1, 1, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 0, 1, 1]], dtype=uint8)
>>> image = np.ones((4, 4)).astype('uint8')
>>> occlude(image, 0.5, 0.5, 0.5, 0.5)
array([[1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 1]], dtype=uint8)
Parameters:
  • image (numpy array) – Numpy array.
  • x (int|float) – x coordinate for center of occlusion region. Can be provided as fraction (float) of image width
  • y (int|float) – y coordinate for center of occlusion region. Can be provided as fraction (float) of image height
  • w (int|float) – width of occlusion region. Can be provided as fraction (float) of image width
  • h (int|float) – height of occlusion region. Can be provided as fraction (float) of image height
  • color (int|tuple) – gray-scale or RGB color of occlusion.
Returns:

Copy of input image with occluded region.

Return type:

numpy array
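
The handling of fractional arguments and the clipping can be sketched as follows. The function occlude_sketch and its inner helper to_pixels are hypothetical; the sketch assumes that floats are fractions of the image size, that the rectangle is not larger than the image, and that the rectangle starts w//2 columns left of and h//2 rows above the center.

```python
import numpy as np

def occlude_sketch(image, x, y, w, h, color=0):
    ih, iw = image.shape[:2]

    def to_pixels(value, size):
        # Floats are read as fractions of the image size, ints as pixels.
        return int(value * size) if isinstance(value, float) else value

    x, w = to_pixels(x, iw), to_pixels(w, iw)
    y, h = to_pixels(y, ih), to_pixels(h, ih)
    # Clip the top-left corner so the rectangle stays inside the image.
    x1 = min(max(x - w // 2, 0), iw - w)
    y1 = min(max(y - h // 2, 0), ih - h)
    occluded = image.copy()  # leave the input image untouched
    occluded[y1:y1 + h, x1:x1 + w] = color
    return occluded

by_pixels = occlude_sketch(np.ones((4, 5), dtype='uint8'), 2, 2, 2, 3)
by_fraction = occlude_sketch(np.ones((4, 4), dtype='uint8'), 0.5, 0.5, 0.5, 0.5)
```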

patch_iter(image, shape=(3, 3), stride=1)[source]

Extracts patches from images with given shape.

Patches are extracted in a regular grid with the given stride, starting in the left upper corner and then row-wise. Image can be gray-scale (no third channel dim) or color.

>>> import numpy as np
>>> img = np.reshape(np.arange(12), (3, 4))
>>> for p in patch_iter(img, (2, 2), 2):
...     print(p)
[[0 1]
 [4 5]]
[[2 3]
 [6 7]]
Parameters:
  • image (ndarray) – Numpy array of shape h,w,c or h,w.
  • shape (tuple) – Shape of patch (h,w)
  • stride (int) – Step size of grid patches are extracted from
Returns:

Iterator over patches

Return type:

Iterator
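
The grid traversal described above can be sketched as a generator. The name patch_iter_sketch is hypothetical; the sketch assumes that patches that would extend beyond the image border are skipped, which matches the documented example.

```python
import numpy as np

def patch_iter_sketch(image, shape=(3, 3), stride=1):
    ih, iw = image.shape[:2]
    ph, pw = shape
    # Row-wise traversal starting in the upper left corner.
    for r in range(0, ih - ph + 1, stride):
        for c in range(0, iw - pw + 1, stride):
            yield image[r:r + ph, c:c + pw]

img = np.reshape(np.arange(12), (3, 4))
patches = [p.tolist() for p in patch_iter_sketch(img, (2, 2), 2)]
```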

pil_to_arr(image)[source]

Convert PIL image to Numpy array.

>>> import numpy as np
>>> rgb_arr = np.ones((5, 4, 3), dtype='uint8')
>>> pil_img = arr_to_pil(rgb_arr)
>>> arr = pil_to_arr(pil_img)
>>> shapestr(arr)
'5x4x3'
Parameters:image (PIL.Image) – PIL image (RGB or grayscale)
Returns:Numpy array
Return type:numpy.array with dtype ‘uint8’
polyline2coords(points)[source]

Return row and column coordinates for a polyline.

>>> rr, cc = polyline2coords([(0, 0), (2, 2), (2, 4)])
>>> list(rr)
[0, 1, 2, 2, 3, 4]
>>> list(cc)
[0, 1, 2, 2, 2, 2]
Parameters:points (list of tuple) – Polyline in format [(x1,y1), (x2,y2), …]
Returns:tuple with row and column coordinates in numpy arrays
Return type:tuple of numpy array
rerange(image, old_min, old_max, new_min, new_max, dtype)[source]

Return image with values in new range.

Note: The default range of images is [0, 255] and most image processing functions expect this range and will fail otherwise. However, re-ranged images, e.g. in [-1, +1], are sometimes needed as input to neural networks.

>>> import numpy as np
>>> image = np.array([[0, 255], [255, 0]])
>>> rerange(image, 0, 255, -1, +1, 'float32')
array([[-1.,  1.],
       [ 1., -1.]], dtype=float32)
Parameters:
  • image (numpy.array) – Should be a numpy array of an image.
  • old_min (int|float) – Current minimum value of image, e.g. 0
  • old_max (int|float) – Current maximum value of image, e.g. 255
  • new_min (int|float) – New minimum, e.g. -1.0
  • new_max (int|float) – New maximum, e.g. +1.0
  • dtype (numpy datatype) – Data type of output image, e.g. ‘float32’ or np.uint8
Returns:

Image with values in new range.
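
The re-ranging is a linear mapping from [old_min, old_max] to [new_min, new_max]. The sketch below shows one way to write it; rerange_sketch is a hypothetical name and the intermediate cast to ‘float32’ is an assumption.

```python
import numpy as np

def rerange_sketch(image, old_min, old_max, new_min, new_max, dtype):
    image = image.astype('float32')
    old_range = old_max - old_min
    new_range = new_max - new_min
    # Shift to zero, scale to the new range, shift to the new minimum.
    scaled = (image - old_min) / old_range * new_range + new_min
    return scaled.astype(dtype)

image = np.array([[0, 255], [255, 0]])
reranged = rerange_sketch(image, 0, 255, -1, +1, 'float32')
```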

resize(image, w, h, **kwargs)[source]

Resize image.

Image can be up- or down-sized (using interpolation). For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.resize

>>> image = np.ones((10,5), dtype='uint8')
>>> resize(image, 4, 3)
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=uint8)
Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • w (int) – Width in pixels.
  • h (int) – Height in pixels.
  • kwargs (kwargs) – Keyword arguments for the underlying scikit-image resize function, e.g. order=1 for linear interpolation.
Returns:

Resized image

Return type:

numpy array with range [0,255] and dtype ‘uint8’

rgb2gray(image)[source]

Convert RGB image to grayscale image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> rgb2gray(image)
array([[255,   0,   0],
       [  0, 255,   0],
       [  0,   0, 255]], dtype=uint8)
Parameters:image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
Returns:grayscale image
Return type:numpy array with range [0,255] and dtype ‘uint8’
rotate(image, angle=0, **kwargs)[source]

Rotate image.

For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.rotate

For a smooth interpolation of images set ‘order=1’. To rotate masks use the default ‘order=0’.

>>> image = np.eye(3, dtype='uint8')
>>> rotate(image, 90)
array([[0, 0, 1],
       [0, 1, 0],
       [1, 0, 0]], dtype=uint8)
Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • angle (float) – Angle in degrees in counter-clockwise direction
  • kwargs (kwargs) – Keyword arguments for the underlying scikit-image rotate function, e.g. order=1 for linear interpolation.
Returns:

Rotated image

Return type:

numpy array with range [0,255] and dtype ‘uint8’

sample_labeled_patch_centers(mask, value, pshape, n, label)[source]

Randomly pick n points in mask where mask has given value and add label.

Same as imageutil.sample_mask but adds given label to each center

>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 1
>>> sample_labeled_patch_centers(mask, 1, (1, 1), 1, 0)
array([[1, 2, 0]], dtype=uint16)
Parameters:
  • mask (ndarray) – Mask
  • value (int) – Sample points in mask that have this value.
  • pshape (tuple) – Patch shape of form (h,w)
  • n (int) – Number of points to sample. If there are not enough points to sample from, a smaller number will be returned. If there are no points at all, np.empty((0, 2)) will be returned.
  • label (int) – Numeric label to append to each center point
Returns:

Center points of patches within the mask where the center point has the given mask value and the label

Return type:

ndarray of shape (n, 3)

sample_mask(mask, value, pshape, n)[source]

Randomly pick n points in mask where mask has given value.

Ensures that only points are picked that can be the center of a patch with shape pshape that lies inside the mask.

>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 1
>>> sample_mask(mask, 1, (1, 1), 1)
array([[1, 2]], dtype=uint16)
Parameters:
  • mask (ndarray) – Mask
  • value (int) – Sample points in mask that have this value.
  • pshape (tuple) – Patch shape of form (h,w)
  • n (int) – Number of points to sample. If there are not enough points to sample from, a smaller number will be returned. If there are no points at all, np.empty((0, 2)) will be returned.
Returns:

Center points of patches within the mask where the center point has the given mask value.

Return type:

ndarray of shape (n, 2)

sample_patch_centers(mask, pshape, npos, nneg, pos=255, neg=0)[source]

Sample positive and negative patch centers where mask value is pos or neg.

The sampling routine ensures that the patch is completely inside the mask.

>>> np.random.seed(0)   # just to ensure consistent doctest
>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 255
>>> sample_patch_centers(mask, (2, 2), 1, 1)
array([[1, 1, 0],
       [1, 2, 1]], dtype=uint16)
Parameters:
  • mask (ndarray) – Mask
  • pshape (tuple) – Patch shape of form (h,w)
  • npos (int) – Number of positives to sample.
  • nneg (int) – Number of negatives to sample.
  • pos (int) – Value for positive points in mask
  • neg (int) – Value for negative points in mask
Returns:

Center points of patches within the mask where the center point has the given mask value (pos, neg) and the label (1, 0)

Return type:

ndarray of shape (n, 3)

sample_pn_patches(image, mask, pshape, npos, nneg, pos=255, neg=0)[source]

Sample positive and negative patches where mask value is pos or neg.

The sampling routine ensures that the patch is completely inside the image and mask and that a patch at the same position is extracted from the image and the mask.

>>> np.random.seed(0)   # just to ensure consistent doctest
>>> mask = np.zeros((3, 4), dtype='uint8')
>>> img = np.reshape(np.arange(12, dtype='uint8'), (3, 4))
>>> mask[1, 2] = 255
>>> for ip, mp, l in sample_pn_patches(img, mask, (2, 2), 1, 1):
...     print(ip)
...     print(mp)
...     print(l)
[[0 1]
 [4 5]]
[[0 0]
 [0 0]]
0
[[1 2]
 [5 6]]
[[  0   0]
 [  0 255]]
1
Parameters:
  • mask (ndarray) – Mask
  • pshape (tuple) – Patch shape of form (h,w)
  • npos (int) – Number of positives to sample.
  • nneg (int) – Number of negatives to sample.
  • pos (int) – Value for positive points in mask
  • neg (int) – Value for negative points in mask
Returns:

Image and mask patches where the patch center point has the given mask value (pos, neg) and the label (1, 0)

Return type:

tuple(image_patch, mask_patch, label)

save_image(filepath, image)[source]

Save numpy array as image (or numpy array) to given filepath.

Supported formats: gif, png, jpg, bmp, tif, npy

Parameters:
  • filepath (string) – File path for image file. Extension determines image file format, e.g. .gif
  • image (numpy array) – Numpy array to save as image. Must be of shape (h,w) or (h,w,3) or (h,w,4)
set_default_order(kwargs)[source]

Set order parameter in kwargs for scikit-image functions.

Default order is 1, which performs a linear interpolation of pixel values when images are rotated, resized and sheared. This is fine for images but causes unwanted pixel values in masks. This function sets the default order to 0, which disables the interpolation.

Parameters:kwargs (kwargs) – Dictionary with keyword arguments.
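
A default of this kind is commonly implemented with dict.setdefault, as in the sketch below. The name set_default_order_sketch is hypothetical, and it is an assumption that an order given explicitly by the caller is left unchanged.

```python
def set_default_order_sketch(kwargs):
    # Only fill in 'order' when the caller has not chosen one explicitly.
    kwargs.setdefault('order', 0)

mask_kwargs = {}
set_default_order_sketch(mask_kwargs)

image_kwargs = {'order': 1}
set_default_order_sketch(image_kwargs)
```

After these calls mask_kwargs requests order 0 (no interpolation), while image_kwargs keeps the linear interpolation chosen by the caller.
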
shear(image, shear_factor, **kwargs)[source]

Shear image.

For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.AffineTransform

>>> image = np.eye(3, dtype='uint8')
>>> sheared = shear(image, 0.5)
Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • shear_factor (float) – Shear factor [0, 1]
  • kwargs (kwargs) – Keyword arguments for the underlying scikit-image warp function, e.g. order=1 for linear interpolation.
Returns:

Sheared image

Return type:

numpy array with range [0,255] and dtype ‘uint8’

translate(image, dx, dy, **kwargs)[source]

Shift image horizontally and vertically

>>> image = np.eye(3, dtype='uint8') * 255
>>> translate(image, 2, 1)
array([[  0,   0,   0],
       [  0,   0, 255],
       [  0,   0,   0]], dtype=uint8)
Parameters:
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
  • dx – horizontal translation in pixels
  • dy – vertical translation in pixels
  • kwargs (kwargs) – Keyword arguments for the underlying scikit-image rotate function, e.g. order=1 for linear interpolation.
Returns:

translated image

Return type:

numpy array with range [0,255] and dtype ‘uint8’
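
For whole-pixel shifts the effect can be sketched with array slicing. This is a simplified stand-in, not the library's implementation (which, according to the kwargs description, builds on scikit-image): translate_sketch and shift_windows are hypothetical names, shifts are assumed to be integers, and uncovered pixels are assumed to be filled with zeros.

```python
import numpy as np

def shift_windows(shift, size):
    # Source and destination index ranges along one axis.
    src = slice(max(0, -shift), max(0, min(size, size - shift)))
    dst = slice(max(0, shift), max(0, min(size, size + shift)))
    return src, dst

def translate_sketch(image, dx, dy):
    h, w = image.shape[:2]
    src_r, dst_r = shift_windows(dy, h)
    src_c, dst_c = shift_windows(dx, w)
    out = np.zeros_like(image)
    # Positive dx moves content to the right, positive dy moves it down.
    out[dst_r, dst_c] = image[src_r, src_c]
    return out

image = np.eye(3, dtype='uint8') * 255
shifted = translate_sketch(image, 2, 1)
```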

nutsml.logger module

class LogCols(filepath, cols=None, colnames=None, reset=True, delimiter=', ')[source]

Bases: nutsml.logger.LogToFile

__init__(filepath, cols=None, colnames=None, reset=True, delimiter=', ')[source]

Construct logger.

>>> from __future__ import print_function
>>> from nutsflow import Consume
>>> filepath = 'tests/data/temp_logfile.csv'
>>> data = [[1, 2], [3, 4]]
>>> with LogToFile(filepath) as logtofile:
...     data >> logtofile >> Consume()
>>> print(open(filepath).read())
1,2
3,4
>>> logtofile = LogToFile(filepath, cols=(1, 0), colnames=['a', 'b'])
>>> data >> logtofile >> Consume()
>>> print(open(filepath).read())
a,b
2,1
4,3

>>> logtofile.close()
>>> logtofile.delete()
Parameters:
  • filepath (string) – Path to file to write log to.
  • cols (int|tuple|None) – Indices of columns of input data to write. None: write all columns int: only write the single given column tuple: list of column indices
  • colnames (tuple|None) – Column names to write in first line. If None no colnames are written.
  • reset (bool) – If True the writing to the log file is reset if the logger is recreated. Otherwise log data is appended to the log file.
  • delimiter (str) – Delimiter for columns in log file.
class LogToFile(filepath, cols=None, colnames=None, reset=True, delimiter=', ')[source]

Bases: nutsflow.base.NutFunction

Log columns of data to file.

__call__(x)[source]

Log x

Parameters:x (any) – Any type of data. Special support for numpy arrays.
Returns:Return input unchanged
Return type:Same as input
__init__(filepath, cols=None, colnames=None, reset=True, delimiter=', ')[source]

Construct logger.

>>> from __future__ import print_function
>>> from nutsflow import Consume
>>> filepath = 'tests/data/temp_logfile.csv'
>>> data = [[1, 2], [3, 4]]
>>> with LogToFile(filepath) as logtofile:
...     data >> logtofile >> Consume()
>>> print(open(filepath).read())
1,2
3,4
>>> logtofile = LogToFile(filepath, cols=(1, 0), colnames=['a', 'b'])
>>> data >> logtofile >> Consume()
>>> print(open(filepath).read())
a,b
2,1
4,3

>>> logtofile.close()
>>> logtofile.delete()
Parameters:
  • filepath (string) – Path to file to write log to.
  • cols (int|tuple|None) – Indices of columns of input data to write. None: write all columns int: only write the single given column tuple: list of column indices
  • colnames (tuple|None) – Column names to write in first line. If None no colnames are written.
  • reset (bool) – If True the writing to the log file is reset if the logger is recreated. Otherwise log data is appended to the log file.
  • delimiter (str) – Delimiter for columns in log file.
close()[source]

Implementation of context manager API

delete()[source]

Delete log file

nutsml.network module

class EvalNut(*args, **kwargs)[source]

Bases: nutsflow.base.NutSink

batches >> EvalNut(network, metrics)

Create nut to evaluate network performance for given metrics. Returned when network.evaluate() is called.

Parameters:
  • batches (iterable over batches) – Batches to evaluate
  • network (nutsml.Network) –
  • metrics (list of functions) – List of functions that compute some metric, e.g. accuracy, F1, kappa-score. Each metric function must take vectors with true and predicted classes/probabilities and must compute the metric over the entire input (not per sample/mini-batch).
  • compute (function) – Function of the form f(metric, targets, preds) that computes the given metric (e.g. mean accuracy) for the given targets and predictions.
  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.
Returns:

Result(s) of evaluation, e.g. accuracy, precision, …

Return type:

float or tuple of floats if there is more than one metric
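
The requirement that metrics are computed over the entire input rather than per mini-batch can be sketched as follows. Everything here is hypothetical and simplified: evaluate_sketch, accuracy and predict_positive are illustrative names, and each batch is assumed to be a plain (inputs, targets) pair rather than the batch format used by the library.

```python
def evaluate_sketch(batches, predict_fn, metrics):
    """Apply each metric once to all targets and predictions."""
    targets, preds = [], []
    for inputs, batch_targets in batches:
        preds.extend(predict_fn(inputs))
        targets.extend(batch_targets)
    results = tuple(metric(targets, preds) for metric in metrics)
    # A single metric yields a float, several metrics yield a tuple.
    return results[0] if len(results) == 1 else results

def accuracy(y_true, y_pred):
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def predict_positive(inputs):
    # Toy stand-in for a network: class 1 for positive inputs.
    return [int(x > 0) for x in inputs]

batches = [([1.0, -2.0], [1, 0]), ([3.0, -1.0], [0, 0])]
acc = evaluate_sketch(batches, predict_positive, [accuracy])
```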

__rrshift__(iterable)
class KerasNetwork(model, weightspath='weights_keras_net.hd5')[source]

Bases: nutsml.network.Network

Wrapper for Keras models: https://keras.io/

__init__(model, weightspath='weights_keras_net.hd5')[source]

Construct wrapper around Keras model.

Parameters:
evaluate(metrics, predcol=None)[source]

Evaluate performance of network for given metrics

>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])  
Parameters:
  • metrics (list) – List of metrics. See EvalNut for details.
  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.
  • targetcol (int) – Index of batch column that contains targets.
Returns:

Result for each metric as a tuple or a single float if there is only one metric.

load_weights(weightspath=None)[source]

Load network weights.

network.load_weights()
Parameters:weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
predict(flatten=True)[source]

Get network predictions

>>> predictions = samples >> batcher >> network.predict() >> Collect()  
Parameters:flatten (bool) – True: return individual predictions instead of batches of predictions
Returns:Typically returns softmax class probabilities.
Return type:ndarray
print_layers()[source]

Print description of the network layers

save_weights(weightspath=None)[source]

Save network weights.

network.save_weights()
Parameters:weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
train(**kwargs)[source]

Train network

>>> train_losses = samples >> batcher >> network.train() >> Collect()  
Returns:Typically returns training loss per batch.
validate(**kwargs)[source]

Validate network

>>> val_losses = samples >> batcher >> network.validate() >> Collect()  
Returns:Typically returns validation loss per batch.
class LasagneNetwork(out_layer, train_fn, val_fn, pred_fn, weightspath='weights_lasagne_net.npz')[source]

Bases: nutsml.network.Network

Wrapper for Lasagne models: https://lasagne.readthedocs.io/en/latest/

__init__(out_layer, train_fn, val_fn, pred_fn, weightspath='weights_lasagne_net.npz')[source]

Construct wrapper around Lasagne network.

Parameters:
  • out_layer (Lasagne layer) – Output layer of Lasagne network.
  • train_fn (Theano function) – Training function
  • val_fn (Theano function) – Validation function
  • pred_fn (Theano function) – Prediction function
  • weightspath (string) – Filepath to save/load model weights.
evaluate(metrics, predcol=None, targetcol=-1)[source]

Evaluate performance of network for given metrics

>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])  
Parameters:
  • metrics (list) – List of metrics. See EvalNut for details.
  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.
  • targetcol (int) – Index of batch column that contains targets.
Returns:

Result for each metric as a tuple or a single float if there is only one metric.

load_weights(weightspath=None)[source]

Load network weights.

network.load_weights()
Parameters:weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
predict(flatten=True)[source]

Get network predictions

>>> predictions = samples >> batcher >> network.predict() >> Collect()  
Parameters:flatten (bool) – True: return individual predictions instead of batches of predictions
Returns:Typically returns softmax class probabilities.
Return type:ndarray
print_layers()[source]

Print description of the network layers

save_weights(weightspath=None)[source]

Save network weights.

network.save_weights()
Parameters:weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
train(**kwargs)[source]

Train network

>>> train_losses = samples >> batcher >> network.train() >> Collect()  
Returns:Typically returns training loss per batch.
validate(**kwargs)[source]

Validate network

>>> val_losses = samples >> batcher >> network.validate() >> Collect()  
Returns:Typically returns validation loss per batch.
class Network(weightspath)[source]

Bases: object

Abstract base class for networks. Allows wrapping existing network APIs such as Lasagne or Keras into an API that enables direct usage of the network as a Nut in a nuts flow.

__init__(weightspath)[source]

Constructs base wrapper for networks.

Parameters:weightspath (string) – Filepath where network weights are saved to and loaded from.
evaluate(metrics, predcol=None, targetcol=-1)[source]

Evaluate performance of network for given metrics

>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])  
Parameters:
  • metrics (list) – List of metrics. See EvalNut for details.
  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.
  • targetcol (int) – Index of batch column that contains targets.
Returns:

Result for each metric as a tuple or a single float if there is only one metric.

load_weights(weightspath=None)[source]

Load network weights.

network.load_weights()
Parameters:weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
predict(flatten=True)[source]

Get network predictions

>>> predictions = samples >> batcher >> network.predict() >> Collect()  
Parameters:flatten (bool) – True: return individual predictions instead of batches of predictions
Returns:Typically returns softmax class probabilities.
Return type:ndarray
print_layers()[source]

Print description of the network layers

save_best(score, isloss=True)[source]

Save weights of best network

Parameters:
  • score (float) – Score of the network, e.g. loss, accuracy
  • isloss (bool) – True means lower score is better, e.g. loss, and the network with the lower score is saved.
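
The comparison logic implied by isloss can be sketched with a small tracker. BestScoreTracker is a hypothetical class that only decides whether a score is a new best; the place where a real network would write its weights is marked by the returned flag.

```python
class BestScoreTracker:
    """Remember the best score seen so far."""

    def __init__(self):
        self.best_score = None

    def is_new_best(self, score, isloss=True):
        if self.best_score is None:
            better = True  # the first score is always the best so far
        elif isloss:
            better = score < self.best_score  # lower loss is better
        else:
            better = score > self.best_score  # higher accuracy is better
        if better:
            self.best_score = score
        return better

tracker = BestScoreTracker()
saved = [tracker.is_new_best(loss) for loss in [0.9, 0.5, 0.7]]
```

In this run the weights would be saved after the first and second epoch but not after the third, because a loss of 0.7 is worse than the best loss of 0.5.
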
save_weights(weightspath=None)[source]

Save network weights.

network.save_weights()
Parameters:weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
train()[source]

Train network

>>> train_losses = samples >> batcher >> network.train() >> Collect()  
Returns:Typically returns training loss per batch.
validate()[source]

Validate network

>>> val_losses = samples >> batcher >> network.validate() >> Collect()  
Returns:Typically returns validation loss per batch.
class PredictNut(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

batches >> PredictNut(func)

Create nut to perform network predictions.

Parameters:
  • batches (iterable over batches) – Batches to create predictions for.
  • func (function) – Prediction function
  • flatten (bool) – True: flatten output. Instead of returning batch of predictions return individual predictions
Returns:

Result(s) of prediction

Return type:

typically array with class probabilities (softmax vector)
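
The effect of the flatten flag can be sketched with a generator. The names predict_sketch and double are hypothetical, and a batch is simplified to a plain list of samples.

```python
def predict_sketch(batches, func, flatten=True):
    for batch in batches:
        preds = func(batch)
        if flatten:
            # Emit one prediction per sample.
            for pred in preds:
                yield pred
        else:
            # Emit the predictions of the whole batch as one item.
            yield preds

def double(batch):
    # Toy stand-in for a prediction function.
    return [2 * x for x in batch]

batches = [[1, 2], [3]]
flat = list(predict_sketch(batches, double, flatten=True))
per_batch = list(predict_sketch(batches, double, flatten=False))
```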

__rrshift__(iterable)
class TrainValNut(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

batches >> TrainValNut(func, **kwargs)

Create nut to train or validate a network.

Parameters:
  • batches (iterable over batches) – Batches to train/validate.
  • func (function) – Training or validation function of network.
  • kwargs (kwargs) – Keyword arguments passed on to function.
Returns:

Result(s) of training/validation function, e.g. loss, accuracy, …

Return type:

float or array/tuple of floats

__rrshift__(iterable)

nutsml.plotter module

class PlotLines(ycols, xcols=None, layout=(1, None), titles=None, every_sec=0, every_n=0, filterfunc=<function PlotLines.<lambda>>, figsize=None, filepath=None)[source]

Bases: nutsflow.base.NutFunction

Plot line graph for selected data columns.

__call__(data)[source]

Plot data

__init__(ycols, xcols=None, layout=(1, None), titles=None, every_sec=0, every_n=0, filterfunc=<function PlotLines.<lambda>>, figsize=None, filepath=None)[source]

iterable >> PlotLines(ycols) >> Consume()

>>> import os
>>> import numpy as np
>>> from nutsflow import Consume
>>> fp = 'tests/data/temp_plotter.png'
>>> xs = np.arange(0, 6.3, 1.2)
>>> ysin, ycos = np.sin(xs),  np.cos(xs)
>>> data = zip(xs, ysin, ycos)
>>> data >> PlotLines(1, 0, filepath=fp) >> Consume()
>>> list(ycos) >> PlotLines(0, filepath=fp) >> Consume()
>>> data >> PlotLines(ycols=(1,2), filepath=fp) >> Consume()
>>> ysin.tolist() >> PlotLines(ycols=None, filepath=fp) >> Consume()
>>> if os.path.exists(fp): os.remove(fp)
Parameters:
  • ycols (int|tuple|None) – Index or tuple of indices of the data columns that contain the y-data for the plot. If None, data is used directly.
  • xcols (int|tuple|function|iterable|None) – Index or tuple of indices of the data columns that contain the x-data for the plot. Alternatively, an iterator or a function that generates the x-data can be provided, e.g. xcols = itertools.count() or xcols = lambda: epoch. For xcols = None, itertools.count() is used.
  • layout (tuple) – Rows and columns of the plotter layout, e.g. a layout of (2,3) means that 6 plots in the data are arranged in 2 rows and 3 columns. The number of columns can be None and is then derived from ycols.
  • every_sec (float) – Plot every given second, e.g. to plot every 2.5 sec every_sec = 2.5
  • every_n (int) – Plot every n-th call.
  • filterfunc (function) – Boolean function to filter plot data.
  • figsize (tuple) – Figure size in inches.
  • filepath – Path to a file to draw the plot to. If provided, the plot will not appear on the screen.
Returns:

Returns input unaltered

Return type:

any
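
How every_n and every_sec limit the redraw rate can be sketched as follows. This is an illustration under assumptions, not the PlotLines code: PlotThrottle and its clock parameter are hypothetical, and treating both settings at 0 as "plot on every call" is an assumption.

```python
import time


class PlotThrottle:
    """Decide whether the current call should trigger a redraw."""

    def __init__(self, every_sec=0, every_n=0, clock=time.monotonic):
        self.every_sec = every_sec
        self.every_n = every_n
        self.clock = clock          # injectable for testing
        self.calls = 0
        self.last = clock()

    def should_plot(self):
        self.calls += 1
        if not (self.every_n or self.every_sec):
            return True             # assumption: no throttling configured
        if self.every_n and self.calls % self.every_n == 0:
            return True             # every n-th call
        if self.every_sec:
            now = self.clock()
            if now - self.last >= self.every_sec:
                self.last = now     # restart the waiting period
                return True
        return False


throttle = PlotThrottle(every_n=3)
decisions = [throttle.should_plot() for _ in range(6)]
```

Passing the clock as a parameter keeps the time-based branch testable without sleeping.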

reset()[source]

Reset plot data

nutsml.reader module

class DplyToList(*args, **kwargs)[source]

Bases: nutsflow.base.NutFunction

Convert DplyDataframe to list.

See: https://github.com/dodger487/dplython

>>> ReadPandas(fpath).dply() >> DplyToList() >> Collect()
Parameters: dplyframe (DplyDataframe) – Dataframe.
Returns: List of dataframe rows
Return type: list of tuples
__call__(element)
class ReadImage(*args, **kwargs)[source]

Bases: nutsflow.base.NutFunction

Load images for samples.

Loads images in jpg, gif, png, tif and bmp format. Images are returned as numpy arrays of shape (h, w, c) or (h, w) for color images or gray scale images respectively. See nutsml.imageutil.load_image for details.

Note that the loaded images replace the image file name or path in the sample. If the image file paths are provided directly (not as tuple samples), tuples with the loaded image are still returned.

>>> from nutsflow import Consume, Collect
>>> from nutsml import PrintColType
>>> images = ['tests/data/img_formats/nut_color.gif']
>>> images >> ReadImage(None) >> PrintColType() >> Consume()
item 0: <tuple>
  0: <ndarray> shape:213x320x3 dtype:uint8 range:0..255
>>> samples = [('tests/data/img_formats/nut_color.gif', 'class0')]
>>> img_samples = samples >> ReadImage(0) >> Collect()
>>> imagepath = 'tests/data/img_formats/*.jpg'
>>> samples = [(1, 'nut_color'), (2, 'nut_grayscale')]
>>> samples >> ReadImage(1, imagepath) >> PrintColType() >> Consume()
item 0: <tuple>
  0: <int> 1
  1: <ndarray> shape:213x320x3 dtype:uint8 range:0..248
item 1: <tuple>
  0: <int> 2
  1: <ndarray> shape:213x320 dtype:uint8 range:18..235
>>> pathfunc = lambda sample: 'tests/data/img_formats/{1}.jpg'.format(*sample)
>>> img_samples = samples >> ReadImage(1, pathfunc) >> Collect()
Parameters:
  • sample (tuple|list) – Sample, e.g. (‘nut_color’, 1)
  • columns (None|int|tuple) – Indices of columns in sample to be replaced by images (based on the image id in that column). If None, a flat sample is assumed and a tuple with the image is returned.
  • pathfunc (string|function|None) – File path with wildcard ‘*’, which is replaced by the image id provided in the sample, e.g. ‘tests/data/img_formats/*.jpg’ for sample (‘nut_grayscale’, 2) becomes ‘tests/data/img_formats/nut_grayscale.jpg’; or a function that computes the path to the image file from the sample, e.g. lambda sample: ‘tests/data/img_formats/{1}.jpg’.format(*sample); or None, in which case the image id is taken as the file path.
  • as_grey (bool) – If true, load as grayscale image.
  • dtype (dtype) – Numpy data type of the image.
Returns:

Sample with image ids replaced by image (=ndarray) of shape (h, w, c) or (h, w)

Return type:

tuple
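
The three forms of pathfunc described above (wildcard string, function, None) can be sketched like this. The helper resolve_image_path is hypothetical and only mirrors the documented behavior; it is not part of nuts-ml.

```python
def resolve_image_path(sample, column, pathfunc=None):
    """Return the file path for the image id stored at sample[column]."""
    image_id = sample[column]
    if pathfunc is None:
        # The image id is itself the file path.
        return image_id
    if callable(pathfunc):
        # A function computes the path from the whole sample.
        return pathfunc(sample)
    # Otherwise treat pathfunc as a template with a '*' wildcard.
    return pathfunc.replace('*', str(image_id))


sample = (1, 'nut_color')
by_wildcard = resolve_image_path(sample, 1, 'tests/data/img_formats/*.jpg')
by_function = resolve_image_path(
    sample, 1, lambda s: 'tests/data/img_formats/{1}.jpg'.format(*s))
```

Both calls resolve to the same path, which shows that the wildcard form is a shorthand for the most common path function.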

__call__(element)
class ReadLabelDirs(*args, **kwargs)[source]

Bases: nutsflow.base.NutSource

Read file paths from label directories.

Typically used when classification data is organized in folders, where the folder name represents the class label and the files in the folder the data samples (images, documents, …) for that class.

>>> from __future__ import print_function
>>> from nutsflow import Sort
>>> read = ReadLabelDirs('tests/data/labeldirs', '*.txt')
>>> samples = read >> Sort()
>>> for sample in samples:
...     print(sample)
...
('tests/data/labeldirs/0/test0.txt', '0')
('tests/data/labeldirs/1/test1.txt', '1')
('tests/data/labeldirs/1/test11.txt', '1')
Parameters:
  • basedir (string) – Path to folder that contains label directories.
  • filepattern (string) – Pattern for file paths to read from label directories, e.g. ‘*.jpg’, ‘*.txt’
  • exclude (string) – Pattern for label directories to exclude. Default is ‘_*’ which excludes all label folders prefixed with ‘_’.
Returns:

iterator over labeled file paths

Return type:

iterator
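
A rough sketch of the folder-per-label layout, using only the standard library. The function read_label_dirs is a hypothetical re-creation for illustration; the default filepattern='*' is an assumption, and the sorting is added only to make the result deterministic.

```python
import fnmatch
import glob
import os
import tempfile


def read_label_dirs(basedir, filepattern='*', exclude='_*'):
    """Yield (filepath, label) for files in the label folders of basedir."""
    for label in sorted(os.listdir(basedir)):
        labeldir = os.path.join(basedir, label)
        if not os.path.isdir(labeldir):
            continue
        if fnmatch.fnmatch(label, exclude):
            continue  # skip excluded folders such as '_tmp'
        pattern = os.path.join(labeldir, filepattern)
        for filepath in sorted(glob.glob(pattern)):
            yield filepath, label


# Build a small throwaway directory tree to try it out.
with tempfile.TemporaryDirectory() as basedir:
    files = [('0', 'test0.txt'), ('1', 'test1.txt'), ('_tmp', 'x.txt')]
    for label, name in files:
        os.makedirs(os.path.join(basedir, label), exist_ok=True)
        open(os.path.join(basedir, label, name), 'w').close()
    samples = [(os.path.basename(path), label)
               for path, label in read_label_dirs(basedir, '*.txt')]
```

The folder '_tmp' matches the exclude pattern and therefore contributes no samples.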

class ReadPandas(filepath, rows=None, columns=None, dropnan=True, replacenan=False, **kwargs)[source]

Bases: nutsflow.base.NutSource

Read data as Pandas table from file system.

__init__(filepath, rows=None, columns=None, dropnan=True, replacenan=False, **kwargs)[source]

Create reader for Pandas tables.

>>> from nutsflow import Collect
>>> ReadPandas('tests/data/pandas_table.csv') >> Collect()
[(1.0, 4.0), (3.0, 6.0)]

Note that samples.dataframe contains the original Pandas dataframe and any Pandas operations can be performed on it.

>>> samples = ReadPandas('tests/data/pandas_table.csv')
>>> samples.dataframe.head()
   col1  col2
0     1   4.0
1     2   NaN
2     3   6.0
>>> samples = ReadPandas('tests/data/pandas_table.csv')
>>> samples.dataframe.columns.values.tolist()
['col1', 'col2']
Parameters:
  • filepath (str) – Path to a table in CSV, TSV, XLSX or Pandas pickle format. The table format is picked depending on the file extension (e.g. .csv). Note that tables must have a header with the column names, or use kwarg header=None.
  • rows (str) – Rows to filter. Any Pandas filter expression. If rows = None, all rows of the table are returned.
  • columns (list) – List of names for the table columns to return. For columns = None, all columns are returned.
  • dropnan (bool) – If True, all rows that contain NaN are dropped.
  • replacenan (object) – If not False, all NaNs are replaced by the value of replacenan.
  • kwargs (kwargs) – Keyword arguments passed on to the Pandas methods for data reading, e.g. header=None. See pandas/pandas/io/parsers.py for details.
dply()[source]

Return dplyr frame for the read table.

dplyr is an R inspired wrapper to process Pandas tables in a flow-like manner. See https://github.com/dodger487/dplython and https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html for more details about dplyr.

dplyr and nuts-ml use the same syntax (>>) for chaining functions and integrate nicely with each other.

Returns: dplyr dataframe instead of Pandas dataframe.
Return type: DplyFrame
static isnull(value)[source]

Return true if value is NaN or None.

>>> import numpy as np
>>> ReadPandas.isnull(np.nan)
True
>>> ReadPandas.isnull(None)
True
>>> ReadPandas.isnull(0)
False
Parameters: value – Value to test
Returns: Return true for NaN or None values.
Return type: bool
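
How a null test of this kind interacts with the dropnan and replacenan options can be sketched in plain Python. The functions below are hypothetical and operate on row tuples rather than a Pandas dataframe; checking dropnan before replacenan is an assumption about the order.

```python
import math


def isnull(value):
    """True for None and for float NaN, False otherwise."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_rows(rows, dropnan=True, replacenan=False):
    """Drop rows that contain NaN/None, or replace those values."""
    for row in rows:
        if dropnan and any(isnull(v) for v in row):
            continue
        if replacenan is not False:
            row = tuple(replacenan if isnull(v) else v for v in row)
        yield tuple(row)


rows = [(1.0, 4.0), (2.0, float('nan')), (3.0, 6.0)]
dropped = list(clean_rows(rows))
replaced = list(clean_rows(rows, dropnan=False, replacenan=0.0))
```

Comparing replacenan against False with `is not` allows 0 or 0.0 as a legitimate replacement value.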

nutsml.stratify module

class CollectStratified(*args, **kwargs)[source]

Bases: nutsflow.base.NutSink

iterable >> CollectStratified(labelcol, mode='downrnd', container=list, rand=rnd.Random())

Collects samples in a container and stratifies them by either randomly down-sampling classes or up-sampling classes by duplicating samples.

>>> from nutsflow import Collect, Sort
>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]
>>> samples >> CollectStratified(1) >> Sort()
[('neg', 0), ('pos', 1)]
Parameters:
  • iterable (iterable over tuples) – Iterable of tuples where column labelcol contains a sample label that is used for stratification
  • labelcol (int) – Column of tuple/samples that contains the label
  • mode (string) – ‘downrnd’: randomly down-sample; ‘up’: up-sample
  • container (container) – Some container, e.g. list, set, dict that can be filled from an iterable
  • rand (Random|None) – Random number generator used for sampling. If None, random.Random() is used.
Returns:

Stratified samples

Return type:

List of tuples
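
The two modes can be sketched as follows. This is a simplified illustration, not the CollectStratified code: collect_stratified is a hypothetical function, it always returns a list, and up-sampling by cycling through each group is an assumption about how samples are duplicated.

```python
import random
from collections import defaultdict


def collect_stratified(samples, labelcol, mode='downrnd', rand=None):
    """Return a list with equally many samples per label."""
    rand = rand or random.Random()
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[labelcol]].append(sample)
    if not groups:
        return []
    sizes = [len(group) for group in groups.values()]
    if mode == 'downrnd':
        # Randomly reduce every class to the size of the smallest one.
        target = min(sizes)
        picked = [rand.sample(group, target) for group in groups.values()]
    elif mode == 'up':
        # Duplicate samples until every class has the size of the largest.
        target = max(sizes)
        picked = [[group[i % len(group)] for i in range(target)]
                  for group in groups.values()]
    else:
        raise ValueError('unknown mode: ' + mode)
    return [sample for group in picked for sample in group]


samples = [('pos', 1), ('pos', 1), ('neg', 0)]
down = sorted(collect_stratified(samples, 1))
up = sorted(collect_stratified(samples, 1, mode='up'))
```

Unlike an online approach, this variant has to hold all samples in memory because the class sizes are only known after the whole input has been read.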

__rrshift__(iterable)
class Stratify(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

iterable >> Stratify(labelcol, labeldist, rand=None)

Stratifies samples by randomly down-sampling according to the given label distribution. In detail: samples belonging to the class with the smallest number of samples are returned with probability one. Samples from other classes are randomly down-sampled to match the number of samples in the smallest class.

Note that in contrast to SplitRandom, which generates the same random split by default, Stratify generates different stratifications. Furthermore, while the down-sampling is random, the order of the samples remains the same!

While labeldist needs to be provided or computed upfront, the actual stratification occurs online and only one sample at a time is held in memory.

>>> from nutsflow import Collect, CountValues, Sort
>>> from nutsflow.common import StableRandom
>>> fix = StableRandom(1)  # Stable random numbers for doctest
>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]
>>> labeldist = samples >> CountValues(1)
>>> samples >> Stratify(1, labeldist, rand=fix) >> Sort()
[('neg', 0), ('pos', 1)]
Parameters:
  • iterable (iterable over tuples) – Iterable of tuples where column labelcol contains a sample label that is used for stratification
  • labelcol (int) – Column of tuple/samples that contains the label
  • labeldist (dict) – Dictionary with numbers of different labels, e.g. {‘good’:12, ‘bad’:27, ‘ugly’:3}
  • rand (Random|None) – Random number generator used for down-sampling. If None, random.Random() is used.
Returns:

Stratified samples

Return type:

Generator over tuples
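
Online down-sampling with a known label distribution can be sketched as a generator. The function stratify below is hypothetical; keeping a sample with probability (size of smallest class) / (size of its class) is one way to realize the description above and is assumed here rather than taken from the nuts-ml source.

```python
import random


def stratify(samples, labelcol, labeldist, rand=None):
    """Lazily down-sample so that all labels become about equally frequent."""
    rand = rand or random.Random()
    smallest = min(labeldist.values())
    for sample in samples:
        # Samples of the smallest class get probability 1.0.
        keep_prob = smallest / labeldist[sample[labelcol]]
        if rand.random() < keep_prob:
            yield sample


samples = [('bad', i) for i in range(9)] + [('good', i) for i in range(9, 12)]
labeldist = {'bad': 9, 'good': 3}
result = list(stratify(samples, 0, labeldist, rand=random.Random(1)))
```

Because each sample is decided on independently, only the current sample is held in memory and the input order is preserved. The resulting class sizes match only in expectation.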

__rrshift__(iterable)

nutsml.transformer module

class AugmentImage(imagecols, rand=None)[source]

Bases: nutsflow.base.Nut

Random augmentation of images in samples

__init__(imagecols, rand=None)[source]

samples >> AugmentImage(imagecols, rand=None)

Randomly augment images, e.g. changing contrast. See TransformImage for a full list of available augmentations. Every transformation can be used as an augmentation. Note that the same (random) augmentation is applied to all images specified in imagecols. This ensures that an image and its mask are randomly rotated by the same angle, for instance.

>>> augment_img = (AugmentImage(0)
...     .by('identical', 1.0)
...     .by('brightness', 0.5, [0.7, 1.3])
...     .by('contrast', 0.5, [0.7, 1.3])
...     .by('fliplr', 0.5)
...     .by('flipud', 0.5)
...     .by('occlude', 0.5, [0, 1], [0, 1],[0.1, 0.5], [0.1, 0.5])
...     .by('rotate', 0.5, [0, 360]))

See nutsml.transformer.TransformImage.by() for full list of available augmentations.

Note that each augmentation is applied independently. This is in contrast to transformations, which are applied in sequence and result in one image. Augmentations, on the other hand, are randomly applied and can result in many images. However, augmenters can be chained to achieve combinations of augmentations, e.g. contrast or brightness combined with rotation or shearing:

>>> augment1 = (AugmentImage(0)
...     .by('brightness', 0.5, [0.7, 1.3])
...     .by('contrast', 0.5, [0.7, 1.3]))
>>> augment2 = (AugmentImage(0)
...     .by('shear', 0.5, [0, 0.2])
...     .by('rotate', 0.5, [0, 360]))
>>> samples >> augment1 >> augment2 >> Consume()  
Parameters:
  • imagecols (int|tuple) – Indices of sample columns that contain images.
  • rand (Random|None) – Random number generator. If None, random.Random() is used.
__rrshift__(iterable)[source]

Apply augmentation to samples in iterable.

Parameters: iterable (iterable) – Samples
Returns: iterable with augmented samples
Return type: generator
by(name, prob, *ranges, **kwargs)[source]

Specify and add augmentation to be performed.

>>> augment_img = AugmentImage(0).by('rotate', 0.5, [0, 360])
Parameters:
  • name (string) – Name of the augmentation/transformation, e.g. ‘rotate’
  • prob (float|int) – If prob <= 1: probability [0,1] that the augmentation is applied. If prob > 1: number of times the augmentation is applied.
  • ranges (list of lists) – Lists with ranges for each argument of the augmentation, e.g. [0, 360] degrees, from which parameters are randomly sampled.
  • kwargs (kwargs) – Keyword arguments passed on to the augmentation.
Returns:

instance of AugmentImage

Return type:

AugmentImage
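
The two meanings of prob and the role of ranges can be sketched as follows. The function augmentation_params is hypothetical, and sampling every argument uniformly as a float is an assumption; the library may treat integer ranges differently.

```python
import random


def augmentation_params(prob, ranges, rand=None):
    """Yield one tuple of sampled arguments per application."""
    rand = rand or random.Random()
    if prob > 1:
        repeats = int(prob)                         # apply this many times
    else:
        repeats = 1 if rand.random() < prob else 0  # apply with probability
    for _ in range(repeats):
        # One random value per argument, drawn from its [low, high] range.
        yield tuple(rand.uniform(low, high) for low, high in ranges)


rand = random.Random(0)
angles = list(augmentation_params(3, [[0, 360]], rand))   # three rotations
never = list(augmentation_params(0, [[0, 360]], rand))    # no application
```

An augmentation without arguments, such as 'identical' with prob 1.0, corresponds to an empty ranges list and yields a single empty argument tuple.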

class ImageAnnotationToMask(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

samples >> ImageAnnotationToMask(imagecol, annocol)

Create mask for image annotation. Annotations have one of the following formats (see imageutil.annotation2coords for details):
  • (‘point’, ((x, y), … ))
  • (‘circle’, ((x, y, r), …))
  • (‘rect’, ((x, y, w, h), …))
  • (‘polyline’, (((x, y), (x, y), …), …))

>>> import numpy as np
>>> from nutsflow import Collect
>>> img = np.zeros((3, 3), dtype='uint8')
>>> anno = ('point', ((0, 1), (2, 0)))
>>> samples = [(img, anno)]
>>> masks = samples >> ImageAnnotationToMask(0, 1) >> Collect()
>>> print(masks[0][1])
[[  0   0 255]
 [255   0   0]
 [  0   0   0]]
Parameters:
  • iterable (iterable) – Samples with images and annotations
  • imagecol (int) – Index of sample column that contains the image
  • annocol (int) – Index of sample column that contains the annotation
Returns:

Iterator over samples where annotations are replaced by masks

Return type:

generator
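
The mapping from annotation coordinates to mask pixels can be sketched with nested lists. The function annotation_to_mask is hypothetical and covers only ‘point’ and ‘rect’; rendering rectangles as filled areas is an assumption.

```python
def annotation_to_mask(shape, annotation, value=255):
    """Return an h x w mask (nested lists) with annotated pixels set."""
    height, width = shape
    mask = [[0] * width for _ in range(height)]
    kind, geometries = annotation
    for geometry in geometries:
        if kind == 'point':
            x, y = geometry
            mask[y][x] = value        # x is the column, y is the row
        elif kind == 'rect':
            x, y, w, h = geometry
            for row in range(y, min(y + h, height)):
                for col in range(x, min(x + w, width)):
                    mask[row][col] = value
        else:
            raise ValueError('not covered by this sketch: ' + kind)
    return mask


mask = annotation_to_mask((3, 3), ('point', ((0, 1), (2, 0))))
# mask == [[0, 0, 255], [255, 0, 0], [0, 0, 0]]
```

The result reproduces the mask of the doctest above and makes the (x, y) versus (row, column) convention explicit.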

__rrshift__(iterable)
class ImageChannelMean(imagecol, filepath='image_channel_means.npy', means=None)[source]

Bases: nutsflow.base.NutFunction

Compute, save per-channel means over images and subtract from images.

__call__(sample)[source]

Subtract per-channel mean from images in samples.

    sub_mean = ImageChannelMean(imagecol, filepath='means.npy')
    samples >> sub_mean >> Consume()

    sub_mean = ImageChannelMean(imagecol, means=[197, 87, 101])
    samples >> sub_mean >> Consume()

Parameters: sample (tuple) – Sample that contains an image (at imagecol).
Returns: Sample with image where mean is subtracted. Note that image will not be of dtype uint8 and in range [0,255] anymore!
Return type: tuple
__init__(imagecol, filepath='image_channel_means.npy', means=None)[source]

samples >> ImageChannelMean(imagecol, filepath='image_channel_means.npy', means=None)

Construct ImageChannelMean nut.

Parameters:
  • imagecol (int) – Index of sample column that contains the image
  • filepath (string) – Path to file where mean values are saved and loaded from.
  • means (list|tuple) – Mean values can be provided directly. In this case filepath will be ignored and training is not necessary.
train()[source]

Compute per-channel mean over images in samples.

    sub_mean = ImageChannelMean(imagecol, filepath)
    samples >> sub_mean.train() >> Consume()

Returns: Input samples are returned unchanged
Return type: tuple
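
The train-then-apply pattern can be sketched without NumPy, using nested lists of shape h x w x c. The class ChannelMean is hypothetical; averaging over all pixels of all images (rather than averaging per-image means) is an assumption, and saving the means to a file is omitted.

```python
class ChannelMean:
    """Two-phase sketch: accumulate channel sums, then subtract the means."""

    def __init__(self, means=None):
        # Means given directly make the training phase unnecessary.
        self.means = list(means) if means is not None else None
        self._sums, self._count = None, 0

    def train(self, image):
        """Accumulate channel sums of one image and return it unchanged."""
        for row in image:
            for pixel in row:
                if self._sums is None:
                    self._sums = [0.0] * len(pixel)
                for channel, value in enumerate(pixel):
                    self._sums[channel] += value
                self._count += 1
        if self._count:
            self.means = [s / self._count for s in self._sums]
        return image

    def __call__(self, image):
        """Return a new image with the channel means subtracted."""
        return [[[v - m for v, m in zip(pixel, self.means)]
                 for pixel in row] for row in image]


image = [[[10, 20, 30], [30, 40, 50]]]     # 1 x 2 pixels, 3 channels
sub_mean = ChannelMean()
sub_mean.train(image)
centered = sub_mean(image)
```

The centered image contains negative floats, which illustrates why the result is no longer a uint8 image in the range [0,255].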
class ImageMean(imagecol, filepath='image_means.npy')[source]

Bases: nutsflow.base.NutFunction

Compute, save mean over images and subtract from images.

__call__(sample)[source]

Subtract mean from images in samples.

    sub_mean = ImageMean(imagecol, filepath)
    samples >> sub_mean >> Consume()

Parameters: sample (tuple) – Sample that contains an image (at imagecol).
Returns: Sample with image where mean is subtracted. Note that image will not be of dtype uint8 and in range [0,255] anymore!
Return type: tuple
__init__(imagecol, filepath='image_means.npy')[source]

samples >> ImageMean(imagecol, filepath='image_means.npy')

Construct ImageMean nut.

Parameters:
  • imagecol (int) – Index of sample column that contains the image
  • filepath (string) – Path to file where mean values are saved and loaded from.
train()[source]

Compute mean over images in samples.

    sub_mean = ImageMean(imagecol, filepath)
    samples >> sub_mean.train() >> Consume()

Returns: Input samples are returned unchanged
Return type: tuple
class ImagePatchesByAnnotation(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

samples >> ImagePatchesByAnnotation(imagecol, annocol, pshape, npos, nneg=lambda npos: npos, pos=255, neg=0, retlabel=True)

Randomly sample positive/negative patches from image based on annotation. See imageutil.annotation2coords for annotation format. A patch is positive if its center point is within the annotated region and is negative otherwise.

>>> import numpy as np
>>> np.random.seed(0)    # just to ensure stable doctest
>>> img = np.reshape(np.arange(25), (5, 5))
>>> anno = ('point', ((3, 2), (2, 3),))
>>> samples = [(img, anno)]
>>> getpatches = ImagePatchesByAnnotation(0, 1, (3, 3), 1, 1)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[12, 13, 14], [17, 18, 19], [22, 23, 24]] 0
[[11, 12, 13], [16, 17, 18], [21, 22, 23]] 1
[[7, 8, 9], [12, 13, 14], [17, 18, 19]] 1
Parameters:
  • iterable (iterable) – Samples with images
  • imagecol (int) – Index of sample column that contains the image
  • annocol (int) – Index of sample column that contains the annotation
  • pshape (tuple) – Shape of patch
  • npos (int) – Number of positive patches to sample
  • nneg (int|function) – Number of negative patches to sample or a function that returns the number of negatives based on the number of positives.
  • pos (int) – Mask value indicating positives
  • neg (int) – Mask value indicating negatives
  • retlabel (bool) – True return label, False return mask patch
Returns:

Iterator over samples where images are replaced by image patches and masks are replaced by labels [0,1] or mask patches

Return type:

generator

__rrshift__(iterable)
class ImagePatchesByMask(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

samples >> ImagePatchesByMask(imagecol, maskcol, pshape, npos, nneg=lambda npos: npos, pos=255, neg=0, retlabel=True)

Randomly sample positive/negative patches from image based on mask.

A patch is positive if its center point has the value ‘pos’ in the mask (corresponding to the input image) and is negative for the value ‘neg’. The mask must be of the same size as the image.

>>> import numpy as np
>>> np.random.seed(0)    # just to ensure stable doctest
>>> img = np.reshape(np.arange(25), (5, 5))
>>> mask = np.eye(5, dtype='uint8') * 255
>>> samples = [(img, mask)]
>>> getpatches = ImagePatchesByMask(0, 1, (3, 3), 2, 1)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[10, 11, 12], [15, 16, 17], [20, 21, 22]] 0
[[12, 13, 14], [17, 18, 19], [22, 23, 24]] 1
[[6, 7, 8], [11, 12, 13], [16, 17, 18]] 1
>>> np.random.seed(0)    # just to ensure stable doctest
>>> getmaskpatches = ImagePatchesByMask(0, 1, (3, 3), 1, 1, retlabel=False)
>>> for (p, m) in samples >> getmaskpatches:
...     print(p.tolist(), m.tolist())
Parameters:
  • iterable (iterable) – Samples with images
  • imagecol (int) – Index of sample column that contains the image
  • maskcol (int) – Index of sample column that contains the mask
  • pshape (tuple) – Shape of patch
  • npos (int) – Number of positive patches to sample
  • nneg (int|function) – Number of negative patches to sample or a function that returns the number of negatives based on the number of positives.
  • pos (int) – Mask value indicating positives
  • neg (int) – Mask value indicating negatives
  • retlabel (bool) – True return label, False return mask patch
Returns:

Iterator over samples where images are replaced by image patches and masks are replaced by labels [0,1] or mask patches

Return type:

generator
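
The selection of patch centers from a mask, including the two forms of nneg, can be sketched as follows. The function sample_patch_centers is hypothetical, works on nested lists, and ignores the question of whether a patch around a center fits inside the image. Passing the number of positives actually drawn to the nneg function is an assumption.

```python
import random


def sample_patch_centers(mask, npos, nneg=lambda npos: npos,
                         pos=255, neg=0, rand=None):
    """Return [(row, col, label)] for randomly chosen patch centers."""
    rand = rand or random.Random()
    positives = [(r, c) for r, row in enumerate(mask)
                 for c, v in enumerate(row) if v == pos]
    negatives = [(r, c) for r, row in enumerate(mask)
                 for c, v in enumerate(row) if v == neg]
    chosen_pos = rand.sample(positives, min(npos, len(positives)))
    # nneg is either a fixed number or derived from the positives drawn.
    n_negative = nneg(len(chosen_pos)) if callable(nneg) else nneg
    chosen_neg = rand.sample(negatives, min(n_negative, len(negatives)))
    return ([(r, c, 1) for r, c in chosen_pos] +
            [(r, c, 0) for r, c in chosen_neg])


mask = [[255 if r == c else 0 for c in range(5)] for r in range(5)]
centers = sample_patch_centers(mask, 2, 1, rand=random.Random(0))
balanced = sample_patch_centers(mask, 2, lambda n: 2 * n,
                                rand=random.Random(0))
```

A function for nneg keeps the ratio of negatives to positives fixed even when an image offers fewer positive pixels than requested.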

__rrshift__(iterable)
class RandomImagePatches(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

samples >> RandomImagePatches(imagecols, shape, npatches)

Extract patches at random locations from images.

>>> import numpy as np
>>> np.random.seed(0)    # just to ensure stable doctest
>>> img = np.reshape(np.arange(30), (5, 6))
>>> samples = [(img, 0)]
>>> getpatches = RandomImagePatches(0, (2, 3), 3)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[7, 8, 9], [13, 14, 15]] 0
[[8, 9, 10], [14, 15, 16]] 0
[[8, 9, 10], [14, 15, 16]] 0
Parameters:
  • iterable (iterable) – Samples with images
  • imagecols (int|tuple) – Indices of sample columns that contain images, where patches are extracted from. Images must be numpy arrays of shape h,w,c or h,w
  • shape (tuple) – Shape of patch (h,w)
  • npatches (int) – Number of patches to extract (per image)
Returns:

Iterator over samples where images are replaced by patches.

Return type:

generator
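
Extracting patches at random locations can be sketched with plain lists. The function random_patches is hypothetical, handles 2-D images only, and uses Python's random module instead of NumPy's generator, so its output will not match the doctest above.

```python
import random


def random_patches(image, shape, npatches, rand=None):
    """Yield npatches patches of the given (h, w) shape at random places."""
    rand = rand or random.Random()
    patch_h, patch_w = shape
    height, width = len(image), len(image[0])
    for _ in range(npatches):
        # Pick the top-left corner so that the patch fits into the image.
        top = rand.randint(0, height - patch_h)
        left = rand.randint(0, width - patch_w)
        yield [row[left:left + patch_w]
               for row in image[top:top + patch_h]]


image = [[r * 6 + c for c in range(6)] for r in range(5)]   # 5 x 6 image
patches = list(random_patches(image, (2, 3), 3, rand=random.Random(0)))
```

Since locations are drawn independently, the same patch can be returned more than once, as also happens in the doctest above.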

__rrshift__(iterable)
class RegularImagePatches(*args, **kwargs)[source]

Bases: nutsflow.base.Nut

samples >> RegularImagePatches(imagecols, shape, stride)

Extract patches in a regular grid from images.

>>> import numpy as np
>>> img = np.reshape(np.arange(12), (3, 4))
>>> samples = [(img, 0)]
>>> getpatches = RegularImagePatches(0, (2, 2), 2)
>>> for p in samples >> getpatches:
...     print(p)
(array([[0, 1],
       [4, 5]]), 0)
(array([[2, 3],
       [6, 7]]), 0)
Parameters:
  • iterable (iterable) – Samples with images
  • imagecols (int|tuple) – Indices of sample columns that contain images, where patches are extracted from. Images must be numpy arrays of shape h,w,c or h,w
  • shape (tuple) – Shape of patch (h,w)
  • stride (int) – Step size of the grid from which patches are extracted
Returns:

Iterator over samples where images are replaced by patches.

Return type:

generator
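
The grid traversal can be sketched with nested lists. The function regular_patches is hypothetical and handles 2-D images only; dropping patches that would extend beyond the image border is an assumption that is consistent with the doctest above.

```python
def regular_patches(image, shape, stride):
    """Yield (h, w) patches of a 2-D image on a regular grid."""
    patch_h, patch_w = shape
    height, width = len(image), len(image[0])
    # Only positions where the whole patch fits are visited.
    for top in range(0, height - patch_h + 1, stride):
        for left in range(0, width - patch_w + 1, stride):
            yield [row[left:left + patch_w]
                   for row in image[top:top + patch_h]]


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
patches = list(regular_patches(image, (2, 2), 2))
# patches == [[[0, 1], [4, 5]], [[2, 3], [6, 7]]]
```

With a 3 x 4 image, a 2 x 2 patch and a stride of 2, the last image row is never covered because a second row of patches would not fit.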

__rrshift__(iterable)
class TransformImage(imagecols)[source]

Bases: nutsflow.base.NutFunction

Transformation of images in samples.

__call__(sample)[source]

Apply transformation to sample.

Parameters: sample (tuple) – Sample
Returns: Transformed sample
Return type: tuple
__init__(imagecols)[source]

samples >> TransformImage(imagecols)

Images are expected to be numpy arrays of the shape (h, w, c) or (h, w) with a range of [0,255] and a dtype of uint8. Transformation should result in images with the same properties.

>>> transform = TransformImage(0).by('resize', 10, 20)
Parameters:
  • imagecols (int|tuple) – Indices of sample columns the transformation should be applied to. Can be a single index or a tuple of indices.
  • transspec (tuple) – Transformation specification. Either a tuple with the name of the transformation function or a tuple with the name, arguments and keyword arguments of the transformation function. The list of argument values and dictionaries provided in the transspec are simply passed on to the transformation function. See the relevant functions for details.
by(name, *args, **kwargs)[source]

Specify and add transformations to be performed.

>>> transform = TransformImage(0).by('resize', 10, 20).by('fliplr')
Available transformations:
rerange (old_min, old_max, new_min, new_max, dtype)
crop (x1, y1, x2, y2)
resize (w, h)
translate (dx, dy)
rotate (angle)
contrast (contrast)
sharpness (sharpness)
brightness (brightness)
color (color)
shear (shear_factor)
elastic (smooth, scale, seed)
occlude (x, y, w, h)
Parameters:
  • name (string) – Name of the transformation to apply, e.g. ‘resize’
  • args (args) – Arguments for the transformation, e.g. width and height for resize.
  • kwargs (kwargs) – Keyword arguments passed on to the transformation
Returns:

instance of TransformImage with added transformation

Return type:

TransformImage

classmethod register(name, transformation)[source]

Register new transformation function.

>>> brighter = lambda image, c: image * c
>>> TransformImage.register('brighter', brighter)
>>> transform = TransformImage(0).by('brighter', 1.5)
Parameters:
  • name (string) – Name of transformation
  • transformation (function) – Transformation function.
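
The combination of a name-to-function registry with chained by() calls can be sketched as follows. The class Transform and the registered functions are hypothetical and operate on nested lists; the sketch only mirrors the pattern, not the TransformImage implementation.

```python
class Transform:
    """Chainable transformer backed by a name -> function registry."""

    transformations = {}

    def __init__(self, cols):
        self.cols = cols if isinstance(cols, tuple) else (cols,)
        self.specs = []

    @classmethod
    def register(cls, name, function):
        cls.transformations[name] = function

    def by(self, name, *args, **kwargs):
        self.specs.append((name, args, kwargs))
        return self                 # returning self enables chaining

    def __call__(self, sample):
        sample = list(sample)
        for name, args, kwargs in self.specs:    # applied in sequence
            function = self.transformations[name]
            for col in self.cols:
                sample[col] = function(sample[col], *args, **kwargs)
        return tuple(sample)


Transform.register(
    'scale', lambda image, c: [[v * c for v in row] for row in image])
Transform.register('fliplr', lambda image: [row[::-1] for row in image])

transform = Transform(0).by('scale', 2).by('fliplr')
result = transform(([[1, 2], [3, 4]], 'label'))
# result == ([[4, 2], [8, 6]], 'label')
```

Storing the specification as (name, args, kwargs) and looking up the function only at call time means that a transformation registered later is still found.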
transformations = {'brightness': change_brightness, 'color': change_color, 'contrast': change_contrast, 'crop': crop, 'crop_center': crop_center, 'crop_square': crop_square, 'elastic': distort_elastic, 'fliplr': fliplr, 'flipud': flipud, 'gray2rgb': gray2rgb, 'identical': identical, 'normalize_histo': normalize_histo, 'occlude': occlude, 'rerange': rerange, 'resize': resize, 'rgb2gray': rgb2gray, 'rotate': rotate, 'sharpness': change_sharpness, 'shear': shear, 'translate': translate}
map_transform(sample, imagecols, spec)[source]

Map transformation function on columns of sample.

Parameters:
  • sample (tuple) – Sample with images
  • imagecols (int|tuple) – Indices of sample columns the transformation should be applied to. Can be a single index or a tuple of indices.
  • spec (tuple) – Transformation specification. Either a tuple with the name of the transformation function or a tuple with the name, arguments and keyword arguments of the transformation function.
Returns:

Sample with transformations applied. Columns not specified remain unchanged.

Return type:

tuple

nutsml.viewer module

class PrintColType(cols=None)[source]

Bases: nutsflow.base.NutFunction

__call__(data)[source]

Print data info.

Parameters: data (any) – Any type of iterable
Returns: data unchanged
Return type: same as data
__init__(cols=None)[source]

iterable >> PrintColType()

Print type and other information for columns in data.

>>> import numpy as np
>>> from nutsflow import Consume
>>> data = [(np.zeros((10, 20, 3)), 1), ('text', 2), 3]
>>> data >> PrintColType() >> Consume()
item 0: <tuple>
  0: <ndarray> shape:10x20x3 dtype:float64 range:0.0..0.0
  1: <int> 1
item 1: <tuple>
  0: <str> text
  1: <int> 2
item 2: <int>
  0: <int> 3
>>> [(1, 2), (3, 4)] >> PrintColType(1) >> Consume()
item 0: <tuple>
  1: <int> 2
item 1: <tuple>
  1: <int> 4
Parameters: cols (int|tuple|None) – Indices of columns to show info for. None means all columns. Can be a single index or a tuple of indices.
Returns: input data unchanged
Return type: same as data
class ViewImage(imgcols, layout=(1, None), figsize=None, pause=0.0001, **imargs)[source]

Bases: nutsflow.base.NutFunction

Display images in window.

__call__(data)[source]

View the images in data

Parameters: data (tuple) – Data with images at imgcols.
Returns: unchanged input data
Return type: tuple
__init__(imgcols, layout=(1, None), figsize=None, pause=0.0001, **imargs)[source]

iterable >> ViewImage(imgcols, layout=(1, None), figsize=None, pause=0.0001, **imargs)

Images should be numpy arrays in one of the following formats:
MxN - luminance (grayscale, float array only)
MxNx3 - RGB (float or uint8 array)
MxNx4 - RGBA (float or uint8 array)

Shapes with single-dimension axis are supported but not encouraged, e.g. MxNx1 will be converted to MxN.

See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow

>>> from nutsflow import Consume
>>> from nutsml import ReadImage
>>> imagepath = 'tests/data/img_formats/*.jpg'
>>> samples = [(1, 'nut_color'), (2, 'nut_grayscale')]
>>> read_image = ReadImage(1, imagepath)
>>> samples >> read_image >> ViewImage(1) >> Consume() 
Parameters:
  • imgcols (int|tuple|None) – Index or tuple of indices of data columns containing images (ndarray). Use None if images are provided directly, e.g. [img1, img2, …] >> ViewImage(None) >> Consume()
  • layout (tuple) – Rows and columns of the viewer layout, e.g. a layout of (2,3) means that 6 images in the data are arranged in 2 rows and 3 columns. The number of columns can be None and is then derived from imgcols.
  • figsize (tuple) – Figure size in inches.
  • pause (float) – Waiting time in seconds after each plot. Pressing a key skips the waiting time.
  • imargs (kwargs) – Keyword arguments passed on to matplotlib’s imshow() function. See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow
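
One plausible way to derive a missing number of columns in a layout is sketched below. The helper resolve_layout is hypothetical, and rounding up the number of items per row is an assumption, not the documented rule.

```python
import math


def resolve_layout(layout, nitems):
    """Fill in the number of columns if the layout leaves it as None."""
    rows, cols = layout
    if cols is None:
        # Enough columns to fit all items into the given number of rows.
        cols = int(math.ceil(nitems / rows))
    return rows, cols


single_row = resolve_layout((1, None), 3)
two_rows = resolve_layout((2, None), 5)
```

With the default layout (1, None) all images end up side by side in one row.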
class ViewImageAnnotation(imgcol, annocols, figsize=None, pause=0.0001, interpolation=None, **annoargs)[source]

Bases: nutsflow.base.NutFunction

Display images and annotation in window.

SHAPEPROP = {'edgecolor': 'y', 'facecolor': 'none', 'linewidth': 1}
TEXTPROP = {'backgroundcolor': (1, 1, 1, 0.5), 'edgecolor': 'k'}
__call__(data)[source]

View the image and its annotation

Parameters: data (tuple) – Data with image at imgcol and annotation at annocol.
Returns: unchanged input data
Return type: tuple
__init__(imgcol, annocols, figsize=None, pause=0.0001, interpolation=None, **annoargs)[source]

iterable >> ViewImageAnnotation(imgcol, annocols, figsize=None, pause=0.0001, interpolation=None, **annoargs)

Images must be numpy arrays in one of the following formats:
MxN - luminance (grayscale, float array only)
MxNx3 - RGB (float or uint8 array)
MxNx4 - RGBA (float or uint8 array)
See

Shapes with single-dimension axis are supported but not encouraged, e.g. MxNx1 will be converted to MxN.

Parameters:
  • imgcol (int) – Index of data column that contains the image
  • annocols (int|tuple) – Index or tuple of indices specifying the data column(s) that contain annotation (labels, or geometry)
  • figsize (tuple) – Figure size in inches.
  • pause (float) – Waiting time in seconds after each plot. Pressing a key skips the waiting time.
  • interpolation (string) – Interpolation for imshow, e.g. ‘nearest’, ‘bilinear’, ‘bicubic’. For details see http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow
  • annoargs (kwargs) – Keyword arguments for visual properties of annotation, e.g. edgecolor=’y’, linewidth=1

nutsml.writer module

class WriteImage(column, pathfunc, namefunc=None)[source]

Bases: nutsflow.base.NutFunction

Write images within samples.

__call__(sample)[source]

Return sample and write image within sample

__init__(column, pathfunc, namefunc=None)[source]

Write images within samples to file.

Writes jpg, gif, png, tif and bmp format depending on file extension. Images in samples are expected to be numpy arrays. See nutsml.imageutil.load_image for details.

Folders on output file path are created if missing.

>>> from nutsml import ReadImage
>>> from nutsflow import Collect, Get, GetCols, Consume, Unzip
>>> samples = [('nut_color', 1), ('nut_grayscale', 2)]
>>> inpath = 'tests/data/img_formats/*.bmp'
>>> img_samples = samples >> ReadImage(0, inpath) >> Collect()
>>> imagepath = 'tests/data/test_*.bmp'
>>> names = samples >> Get(0) >> Collect()
>>> img_samples >> WriteImage(0, imagepath, names) >> Consume()
>>> imagepath = 'tests/data/test_*.bmp'
>>> names = samples >> Get(0) >> Collect()
>>> images = img_samples >> Get(0)
>>> images >> WriteImage(None, imagepath, names) >> Consume()
>>> imagepath = 'tests/data/test_*.bmp'
>>> namefunc = lambda sample: sample[1]
>>> (samples >> GetCols(0,0,1) >> ReadImage(0, inpath) >>
... WriteImage(0, imagepath, namefunc) >> Consume())
Parameters:
  • column (int|None) – Column in sample that contains image or take sample itself if column is None.
  • pathfunc (str|function) – File path with wildcard ‘*’, which is replaced by the provided name, e.g. ‘tests/data/img_formats/*.jpg’ with names = [‘nut_grayscale’] becomes ‘tests/data/img_formats/nut_grayscale.jpg’; or a function that computes the path to the image file from sample and name, e.g. pathfunc=lambda sample, name: ‘tests/data/test_{}.jpg’.format(name)
  • namefunc (iterable|function|None) – Iterable over names to generate image paths from (its length must match the number of samples), or a function that computes the file name from a sample, e.g. namefunc=lambda sample: sample[0]. If None, Enumerate() is used.
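
How pathfunc and namefunc combine into an output path can be sketched as follows. The generator output_paths is hypothetical and writes nothing to disk; numbering from 0 when namefunc is None is an assumption about Enumerate().

```python
from itertools import count


def output_paths(samples, pathfunc, namefunc=None):
    """Yield the output file path for every sample."""
    if namefunc is None:
        names = count()             # assumed numbering: 0, 1, 2, ...
    elif callable(namefunc):
        names = None                # name is computed per sample below
    else:
        names = iter(namefunc)      # one provided name per sample
    for sample in samples:
        name = namefunc(sample) if names is None else next(names)
        if callable(pathfunc):
            yield pathfunc(sample, name)
        else:
            yield pathfunc.replace('*', str(name))


samples = [('nut_color', 1), ('nut_grayscale', 2)]
from_list = list(output_paths(samples, 'tests/data/test_*.bmp', ['a', 'b']))
from_func = list(output_paths(samples, 'tests/data/test_*.bmp',
                              lambda sample: sample[0]))
numbered = list(output_paths(samples, 'tests/data/test_*.bmp'))
```

Separating the name from the path template allows the same template to be reused with names taken from a list, from the sample, or from a running counter.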

Module contents