nutsml package

Submodules

nutsml.batcher module

class BuildBatch(batchsize, prefetch=1)[source]

Bases: nutsflow.base.Nut

Build batches for GPU-based neural network training.

__init__(batchsize, prefetch=1)[source]

iterable >> BuildBatch(batchsize, prefetch=1)

Take samples in iterable, extract specified columns, convert column data to numpy arrays of various types, aggregate converted samples into a batch.

The format of a batch is a list of lists: [[inputs], [outputs]] where inputs and outputs are Numpy arrays.

The following example uses PrintType() to print the shape of the batches constructed. This is useful for development and debugging but should be removed in production.

>>> from nutsflow import Collect, PrintType
>>> numbers = [4.1, 3.2, 1.1]
>>> images = [np.zeros((5, 3)), np.ones((5, 3)) , np.ones((5, 3))]
>>> class_ids = [1, 2, 1]
>>> samples = list(zip(numbers, images, class_ids))
>>> build_batch = (BuildBatch(batchsize=2)
...                .input(0, 'number', 'float32')
...                .input(1, 'image', np.uint8, True)
...                .output(2, 'one_hot', np.uint8, 3))
>>> batches = samples >> build_batch >> PrintType() >> Collect()
[[<ndarray> 2:float32, <ndarray> 2x1x5x3:uint8], [<ndarray> 2x3:uint8]]
[[<ndarray> 1:float32, <ndarray> 1x1x5x3:uint8], [<ndarray> 1x3:uint8]]

In the example above, we have multiple inputs and a single output, and the batch is of format [[number, image], [one_hot]], where each data element is a Numpy array with the shown shape and dtype.

Sample columns can be ignored or reused. Assuming an autoencoder, one might wish to reuse the sample image as both input and output:

>>> build_batch = (BuildBatch(2)
...                .input(1, 'image', np.uint8, True)
...                .output(1, 'image', np.uint8, True))
>>> batches = samples >> build_batch >> PrintType() >> Collect()
[[<ndarray> 2x1x5x3:uint8], [<ndarray> 2x1x5x3:uint8]]
[[<ndarray> 1x1x5x3:uint8], [<ndarray> 1x1x5x3:uint8]]

In the prediction phase no target outputs are needed. If the batch contains only inputs, the batch format is just [inputs].

>>> build_pred_batch = (BuildBatch(2)
...                     .input(1, 'image', 'uint8', True))
>>> batches = samples >> build_pred_batch >> PrintType() >> Collect()
[<ndarray> 2x1x5x3:uint8]
[<ndarray> 1x1x5x3:uint8]
Parameters
  • batchsize (int) – Size of batch = number of rows in batch.

  • prefetch (int) – Number of batches to prefetch. This speeds up GPU-based training, since one batch is built on the CPU while another is processed on the GPU. Note: if verbose=True, prefetch is set to 0 to simplify debugging.

  • verbose (bool) – Print batch shape when True (this also sets prefetch=0).
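
The prefetch parameter described above can be pictured with a small background-thread buffer. The following is only a sketch of the general idea, not the nuts-ml implementation; the name prefetch_sketch and its arguments are hypothetical.

```python
import queue
import threading


def prefetch_sketch(iterable, n=1):
    """Yield items while a background thread prepares up to n items ahead."""
    if n <= 0:  # no prefetching: plain pass-through, easier to debug
        yield from iterable
        return
    buffer = queue.Queue(maxsize=n)
    end = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for item in iterable:
                buffer.put(item)  # blocks while the buffer is full
        finally:
            buffer.put(end)  # always signal completion, even on error

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is end:
            return
        yield item


# Stand-in "batches": in real training each item would be a built batch.
batches = list(prefetch_sketch(([i, i + 1] for i in range(0, 6, 2)), n=1))
```

With n=0 the generator degenerates to a plain loop, which mirrors why disabling prefetching simplifies debugging.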

__rrshift__(iterable)[source]

Convert samples in iterable into mini-batches.

The structure of the output depends on the fmt function used. If fmt is None, the output is a list of np.arrays.

Parameters

iterable (iterable) – Iterable over samples.

Returns

Mini-batches

Return type

list of np.array if fmt=None

input(col, name, *args, **kwargs)[source]

Specify and add input columns for batch to create

Parameters
  • col (int) – column of the sample to extract and to create a batch input column from.

  • name (string) – Name of the column function to apply to create a batch column, e.g. ‘image’. See the following functions for more details:
      ‘image’: nutsml.batcher.build_image_batch
      ‘number’: nutsml.batcher.build_number_batch
      ‘vector’: nutsml.batcher.build_vector_batch
      ‘tensor’: nutsml.batcher.build_tensor_batch
      ‘one_hot’: nutsml.batcher.build_one_hot_batch

  • args (args) – Arguments for column function, e.g. dtype

  • kwargs (kwargs) – Keyword arguments for column function

Returns

instance of BuildBatch

Return type

BuildBatch

output(col, name, *args, **kwargs)[source]

Specify and add output columns for batch to create

Parameters
  • col (int) – column of the sample to extract and to create a batch output column from.

  • name (string) – Name of the column function to apply to create a batch column, e.g. ‘image’. See the following functions for more details:
      ‘image’: nutsml.batcher.build_image_batch
      ‘number’: nutsml.batcher.build_number_batch
      ‘vector’: nutsml.batcher.build_vector_batch
      ‘tensor’: nutsml.batcher.build_tensor_batch
      ‘one_hot’: nutsml.batcher.build_one_hot_batch

  • args (args) – Arguments for column function, e.g. dtype

  • kwargs (kwargs) – Keyword arguments for column function

Returns

instance of BuildBatch

Return type

BuildBatch

Mixup(batch, alpha)[source]

Mixup produces random interpolations between data and labels.

Usage: … >> BuildBatch() >> Mixup(0.1) >> network.train() >> …

Implementation based on the following paper: mixup: Beyond Empirical Risk Minimization https://arxiv.org/abs/1710.09412
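
As a rough illustration of the interpolation idea, the sketch below mixes a batch with a shuffled copy of itself using one factor drawn from a Beta(alpha, alpha) distribution. It is an assumption-laden sketch and not the nuts-ml code: the helper name mixup_sketch, the single per-batch factor and the optional rng argument are all hypothetical.

```python
import numpy as np


def mixup_sketch(inputs, targets, alpha, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch."""
    if alpha <= 0:  # no mixup requested
        return inputs, targets
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # one interpolation factor for the batch
    perm = rng.permutation(len(inputs))  # partner index for every sample
    mixed_inputs = lam * inputs + (1 - lam) * inputs[perm]
    mixed_targets = lam * targets + (1 - lam) * targets[perm]
    return mixed_inputs, mixed_targets


rng = np.random.default_rng(0)
images = rng.random((4, 5, 3))  # four fake 5x3 images
one_hot = np.eye(3)[[0, 1, 2, 1]]  # one-hot labels for three classes
mixed_images, mixed_labels = mixup_sketch(images, one_hot, 0.4, rng)
```

Because inputs and labels are blended with the same factor, each mixed label row still sums to one.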

Parameters
  • batch (list) – Batch consisting of list of input data and list of output data, where data must be numeric, e.g. images and one-hot-encoded class labels that can be interpolated between.

  • alpha (float) – Control parameter for beta distribution the interpolation factors are sampled from. Range: [0,…,1] For alpha <= 0 no mixup is performed.

Returns

build_image_batch(images, dtype, channelfirst=False)[source]

Return batch of images.

If images have no channel a channel axis is added. For channelfirst=True it will be added/moved to front otherwise the channel comes last. All images in batch will have a channel axis. Batch is of shape (n, c, h, w) or (n, h, w, c) depending on channelfirst, where n is the number of images in the batch.

>>> from nutsflow.common import shapestr
>>> images = [np.zeros((2, 3)), np.ones((2, 3))]
>>> batch = build_image_batch(images, 'uint8', True)
>>> shapestr(batch)
'2x1x2x3'
>>> batch
array([[[[0, 0, 0],
         [0, 0, 0]]],


       [[[1, 1, 1],
         [1, 1, 1]]]], dtype=uint8)
Parameters
  • images (numpy array) – Images to batch. Must be of shape (w,h,c) or (w,h). Gray-scale images with a channel axis (w,h,1) and images with an alpha channel (w,h,4) are also fine.

  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’

  • channelfirst (bool) – If True, channel is added/moved to front.

Returns

Image batch with shape (n, c, h, w) or (n, h, w, c).

Return type

np.array
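
The channel handling described for build_image_batch can be approximated in plain NumPy. This is a sketch under the assumption that a missing channel axis is appended last and then moved to the front on request; image_batch_sketch is a hypothetical name and not part of nuts-ml.

```python
import numpy as np


def image_batch_sketch(images, dtype, channelfirst=False):
    """Stack images into one array, making sure each has a channel axis."""
    batch = []
    for image in images:
        image = np.asarray(image)
        if image.ndim == 2:  # gray-scale without channel: add one
            image = image[:, :, np.newaxis]
        if channelfirst:  # (h, w, c) -> (c, h, w)
            image = np.moveaxis(image, -1, 0)
        batch.append(image)
    return np.stack(batch).astype(dtype)


images = [np.zeros((2, 3)), np.ones((2, 3))]
first = image_batch_sketch(images, 'uint8', channelfirst=True)
last = image_batch_sketch(images, 'uint8', channelfirst=False)
```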

build_number_batch(numbers, dtype)[source]

Return numpy array with given dtype for given numbers.

>>> numbers = (1, 2, 3, 1)
>>> build_number_batch(numbers, 'uint8')
array([1, 2, 3, 1], dtype=uint8)
Parameters
  • numbers (iterable of numbers) – Numbers to create batch from

  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’

Returns

Numpy array for numbers

Return type

numpy.array

build_one_hot_batch(class_ids, dtype, num_classes)[source]

Return one hot vectors for class ids.

>>> class_ids = [0, 1, 2, 1]
>>> build_one_hot_batch(class_ids, 'uint8', 3)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]], dtype=uint8)
Parameters
  • class_ids (iterable) – Class indices in {0, …, num_classes-1}

  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’

  • num_classes (int) – Number of classes

Returns

One hot vectors for class ids.

Return type

numpy.array

build_tensor_batch(tensors, dtype, axes=None, expand=None)[source]

Return batch of tensors.

>>> from nutsflow.common import shapestr
>>> tensors = [np.zeros((2, 3)), np.ones((2, 3))]
>>> batch = build_tensor_batch(tensors, 'uint8')
>>> shapestr(batch)
'2x2x3'
>>> print(batch)
[[[0 0 0]
  [0 0 0]]

 [[1 1 1]
  [1 1 1]]]
>>> batch = build_tensor_batch(tensors, 'uint8', expand=0)
>>> shapestr(batch)
'2x1x2x3'
>>> print(batch)
[[[[0 0 0]
   [0 0 0]]]

 [[[1 1 1]
   [1 1 1]]]]
>>> batch = build_tensor_batch(tensors, 'uint8', axes=(1, 0))
>>> shapestr(batch)
'2x3x2'
>>> print(batch)
[[[0 0]
  [0 0]
  [0 0]]

 [[1 1]
  [1 1]
  [1 1]]]
Parameters
  • tensors (iterable) – Numpy tensors

  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’

  • axes (tuple|None) – axes order, e.g. to move a channel axis to the last position. (see numpy transpose for details)

  • expand (int|None) – Add empty dimension at expand dimension. (see numpy expand_dims for details).

Returns

stack of tensors, with batch axis first.

Return type

numpy.array
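
The axes and expand parameters of build_tensor_batch map naturally onto numpy.transpose and numpy.expand_dims. The sketch below assumes both are applied per tensor before stacking (transpose first, then expand); that ordering and the name tensor_batch_sketch are assumptions, chosen so that the shapes agree with the examples above.

```python
import numpy as np


def tensor_batch_sketch(tensors, dtype, axes=None, expand=None):
    """Stack tensors along a new leading batch axis."""
    prepared = []
    for tensor in tensors:
        tensor = np.asarray(tensor)
        if axes is not None:  # reorder axes of each tensor
            tensor = np.transpose(tensor, axes)
        if expand is not None:  # insert an axis of length one
            tensor = np.expand_dims(tensor, expand)
        prepared.append(tensor)
    return np.stack(prepared).astype(dtype)


tensors = [np.zeros((2, 3)), np.ones((2, 3))]
plain = tensor_batch_sketch(tensors, 'uint8')
expanded = tensor_batch_sketch(tensors, 'uint8', expand=0)
transposed = tensor_batch_sketch(tensors, 'uint8', axes=(1, 0))
```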

build_vector_batch(vectors, dtype)[source]

Return batch of vectors.

>>> from nutsflow.common import shapestr
>>> vectors = [np.array([1,2,3]), np.array([2, 3, 4])]
>>> batch = build_vector_batch(vectors, 'uint8')
>>> shapestr(batch)
'2x3'
>>> batch
array([[1, 2, 3],
       [2, 3, 4]], dtype=uint8)
Parameters
  • vectors (iterable) – Numpy row vectors

  • dtype (numpy data type) – Data type of batch, e.g. ‘uint8’

Returns

vstack of vectors

Return type

numpy.array

nutsml.booster module

Boost(iterable, batcher, network, rand=None)[source]

iterable >> Boost(batcher, network, rand=None)

Boost samples with high softmax probability for incorrect class. Expects one-hot encoded targets and softmax predictions for output.

NOTE: prefetching of batches must be disabled when using boosting!

network = Network()
build_batch = BuildBatch(BATCHSIZE, prefetch=0).input(…).output(…)
boost = Boost(build_batch, network)
samples >> … >> boost >> build_batch >> network.train() >> Consume()
Parameters
  • iterable (iterable) – Iterable with samples.

  • batcher (nutsml.BuildBatch) – Batcher used for network training.

  • network (nutsml.Network) – Network used for prediction

  • rand (Random|None) – Random number generator used for down-sampling. If None, random.Random() is used.

Returns

Generator over samples to boost

Return type

generator
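
One way to picture boosting is to re-emit a sample with a probability that grows as the network's softmax confidence in the true class shrinks. The sketch below follows that reading; it is not the nuts-ml algorithm, and boost_sketch together with its arguments (precomputed targets and predictions instead of a batcher and network) is hypothetical.

```python
import random


def boost_sketch(samples, targets, predictions, rand=None):
    """Yield samples the (hypothetical) network currently gets wrong.

    targets are one-hot lists, predictions are softmax outputs.
    """
    rand = rand if rand is not None else random.Random()
    for sample, target, pred in zip(samples, targets, predictions):
        true_class = max(range(len(target)), key=target.__getitem__)
        p_wrong = 1.0 - pred[true_class]  # probability mass on wrong classes
        if rand.random() < p_wrong:  # keep hard samples more often
            yield sample


samples = ['easy', 'hard']
targets = [[1, 0], [0, 1]]
predictions = [[1.0, 0.0], [1.0, 0.0]]  # second sample is fully misclassified
boosted = list(boost_sketch(samples, targets, predictions, random.Random(0)))
```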

nutsml.checkpoint module

class Checkpoint(create_net, parameters, checkpointspath='checkpoints')[source]

Bases: object

A factory for checkpoints to periodically save network weights and other hyper/configuration parameters.

Example usage:

def create_network(lr=0.01, momentum=0.9):
    model = Sequential()
    optimizer = opt.SGD(lr=lr, momentum=momentum)
    model.compile(optimizer=optimizer, metrics=['accuracy'])
    return KerasNetwork(model), optimizer

def parameters(network, optimizer):
    return dict(lr=optimizer.lr, momentum=optimizer.momentum)

def train_network():
    checkpoint = Checkpoint(create_network, parameters)
    network, optimizer = checkpoint.load()

    for epoch in range(EPOCHS):
        train_err = train_network()
        val_err = validate_network()

        if epoch % 10 == 0:  # Reduce learning rate every 10 epochs
            optimizer.lr /= 2

        checkpoint.save_best(val_err)

Checkpoints can also be saved under different names, e.g.

checkpoint.save_best(val_err, 'checkpoint' + str(epoch))

And specific checkpoints can be loaded:

network, config = checkpoint.load('checkpoint103')

If no checkpoint is specified the most recent one is loaded.

__init__(create_net, parameters, checkpointspath='checkpoints')[source]

Create checkpoint factory.

>>> def create_network(lr=0.1):
...     return 'MyNetwork', lr
>>> def parameters(network, lr):
...     return dict(lr = lr)
>>> checkpoint = Checkpoint(create_network, parameters)
>>> network, lr = checkpoint.load()
>>> network, lr
('MyNetwork', 0.1)
Parameters
  • create_net (function) – Function that takes keyword parameters and returns a nuts-ml Network and any other values or objects needed to describe the state to be checkpointed. Note: parameters(*create_net()) must work!

  • parameters (function) – Function that takes the output of create_net() and returns a dictionary with parameters (the same ones that are used in create_net(…))

  • checkpointspath (string) – Path to folder that will contain checkpoint folders.

datapaths(checkpointname=None)[source]

Return paths to network weights, parameters and config files.

If no checkpoints exist under basedir, (None, None, None) is returned.

Parameters

checkpointname (str|None) – Name of checkpoint. If name is None the most recent checkpoint is used.

Returns

(weightspath, paramspath, configpath) or (None, None, None)

Return type

tuple

dirs()[source]

Return full paths to all checkpoint folders.

Returns

Paths to all folders under the basedir.

Return type

list

latest()[source]

Find most recently modified/created checkpoint folder.

Returns

Full path to checkpoint folder if it exists otherwise None.

Return type

str | None
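
Finding the most recently modified folder can be sketched with the standard library alone. The helper below is an illustration, not the nuts-ml code; latest_folder is a hypothetical name, and it uses the modification time reported by os.path.getmtime.

```python
import os
import tempfile


def latest_folder(basedir):
    """Return the most recently modified sub-folder of basedir, or None."""
    if not os.path.isdir(basedir):
        return None
    paths = [os.path.join(basedir, name) for name in os.listdir(basedir)]
    folders = [path for path in paths if os.path.isdir(path)]
    return max(folders, key=os.path.getmtime) if folders else None


with tempfile.TemporaryDirectory() as base:
    for mtime, name in [(100, 'checkpoint1'), (200, 'checkpoint2')]:
        path = os.path.join(base, name)
        os.mkdir(path)
        os.utime(path, (mtime, mtime))  # force distinct modification times
    newest = os.path.basename(latest_folder(base))
    empty_result = latest_folder(os.path.join(base, 'does_not_exist'))
```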

load(checkpointname=None)[source]

Create network, load weights and parameters.

Parameters

checkpointname (str|None) – Name of checkpoint to load. If None, the most recent checkpoint is used. If no checkpoint exists yet, the network will be created but no weights loaded, and the default configuration will be returned.

Returns

whatever self.create_net returns

Return type

object

save(checkpointname='checkpoint')[source]

Save network weights and parameters under the given name.

Parameters

checkpointname (str) – Name of checkpoint folder. Path will be self.basepath/checkpointname

Returns

path to checkpoint folder

Return type

str

save_best(score, checkpointname='checkpoint', isloss=False)[source]

Save best network weights and parameters under the given name.

Parameters
  • score (float|int) – Some score indicating quality of network.

  • checkpointname (str) – Name of checkpoint folder.

  • isloss (bool) – If True, score is a loss and lower is better; otherwise higher is better.

Returns

path to checkpoint folder

Return type

str
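
The role of isloss in save_best can be sketched as a small comparison rule: a checkpoint is only worth writing when the new score beats the best one seen so far. The class below is a hypothetical illustration of that rule and does not write any files.

```python
class BestScoreTracker:
    """Remember the best score seen so far (hypothetical helper)."""

    def __init__(self, isloss=False):
        self.isloss = isloss  # True: lower is better, False: higher is better
        self.best = None

    def is_improvement(self, score):
        """Return True and store score if it beats the best score so far."""
        if self.best is None:
            better = True
        elif self.isloss:
            better = score < self.best
        else:
            better = score > self.best
        if better:
            self.best = score
        return better


accuracy = BestScoreTracker(isloss=False)
acc_flags = [accuracy.is_improvement(s) for s in (0.5, 0.7, 0.6)]
loss = BestScoreTracker(isloss=True)
loss_flags = [loss.is_improvement(s) for s in (0.9, 1.2, 0.4)]
```

In a training loop, a validation error would be tracked with isloss=True, whereas an accuracy would use the default.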

nutsml.common module

CheckNaN(data)[source]

Raise exception if data contains NaN.

Useful to stop training if the network doesn’t converge and the loss function returns NaN. Example: samples >> network.train() >> CheckNaN() >> log >> Consume()

>>> from nutsflow import Collect
>>> [1, 2, 3] >> CheckNaN() >> Collect()
[1, 2, 3]
>>> import numpy as np
>>> [1, np.nan, 3] >> CheckNaN() >> Collect()
Traceback (most recent call last):
...
RuntimeError: NaN encountered: nan
Parameters

data – Items or iterables.

Returns

Return input data if it doesn’t contain NaN

Return type

any

Raise

RuntimeError if data contains NaN.

class ConvertLabel(column, labels, onehot=False)[source]

Bases: nutsflow.base.NutFunction

Convert string labels to integer class ids (or one-hot) and vice versa.

__call__(sample)[source]

Return the sample with the label in the configured column converted. If column is None, the input itself is treated as the label and converted directly.

__init__(column, labels, onehot=False)[source]

Convert string labels to integer class ids (or one-hot) and vice versa.

Also converts confidence vectors, e.g. softmax output or float values to class labels.

>>> from nutsflow import Collect
>>> labels = ['class0', 'class1', 'class2']
>>> convert = ConvertLabel(None, labels)
>>> [1, 0] >> convert >> Collect()
['class1', 'class0']
>>> ['class1', 'class0'] >> convert >> Collect()
[1, 0]
>>> [0.9, 0.4, 1.6] >> convert >> Collect()
['class1', 'class0', 'class2']
>>> [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]] >> convert >> Collect()
['class1', 'class0']
>>> convert = ConvertLabel(None, labels, onehot=True)
>>> ['class1', 'class0'] >> convert >> Collect()
[[0, 1, 0], [1, 0, 0]]
>>> convert = ConvertLabel(1, labels)
>>> [('data', 'class1'), ('data', 'class0')] >> convert >> Collect()
[('data', 1), ('data', 0)]
>>> [('data', 1), ('data', 2)] >> convert >> Collect()
[('data', 'class1'), ('data', 'class2')]
>>> [('data', 0.9)] >> convert >> Collect()
[('data', 'class1')]
>>> [('data', [0.1, 0.7, 0.2])] >> convert >> Collect()
[('data', 'class1')]
Parameters
  • column (int) – Index of column in sample that contains label. If None process labels directly.

  • labels (list|tuple) – List of class labels (strings).

  • onehot (bool) – True: convert class labels to one-hot encoded vectors. False: convert to class index.

PartitionByCol(iterable, column, values)[source]

Partition samples in iterables depending on column value.

>>> samples = [(1,1), (2,0), (2,4), (1,3), (3,0)]
>>> ones, twos = samples >> PartitionByCol(0, [1, 2])
>>> ones
[(1, 1), (1, 3)]
>>> twos
[(2, 0), (2, 4)]

Note that values does not need to contain all possible values. It is sufficient to provide the values for the partitions wanted.

Parameters
  • iterable (iterable) – Iterable over samples

  • column (int) – Index of column to extract

  • values (list) – List of column values to create partitions for.

Returns

tuple of partitions

Return type

tuple

SplitLeaveOneOut(iterable, keyfunc=None)[source]

Returns a leave-one-out split of the iterable.

Note that SplitLeaveOneOut consumes the entire input stream and returns a generator over the leave-one-out splits. The splits are deterministic and stable across Python versions 2.x and 3.x.

>>> from nutsflow.common import console  # just for printing
>>> samples = [1, 2, 3]
>>> for train, test in samples >> SplitLeaveOneOut():
...     console(train, '  ', test)
[2, 3]    [1]
[1, 3]    [2]
[1, 2]    [3]
>>> samples = [(1, 1), (2, 0), (2, 4), (1, 3), (3, 0)]
>>> splits = samples >> SplitLeaveOneOut(lambda x: x[0])
>>> for train, test in splits:
...     console(train, '   ', test)
[(2, 0), (2, 4), (3, 0)]     [(1, 1), (1, 3)]
[(1, 1), (1, 3), (3, 0)]     [(2, 0), (2, 4)]
[(1, 1), (1, 3), (2, 0), (2, 4)]     [(3, 0)]
Parameters
  • iterable (iterable) – Iterable over anything. Will be consumed!

  • keyfunc (function/None) – Function that returns value the split is based on. If None, the sample itself serves as key.

Returns

generator over leave-one-out train and test splits (train, test)

Return type

Generator[(list, list)]

SplitRandom(iterable, ratio=0.7, constraint=None, rand=None)[source]

Randomly split iterable into partitions.

For the same input data the same split is created every time, and the split is stable across Python versions 2.x and 3.x. A random number generator can be provided to create varying splits.

>>> train, val = range(10) >> SplitRandom(ratio=0.7)
>>> train, val
([6, 3, 1, 7, 0, 2, 4], [5, 9, 8])
>>> range(10) >> SplitRandom(ratio=0.7)  # Same split again
[[6, 3, 1, 7, 0, 2, 4], [5, 9, 8]]
>>> train, val, test = range(10) >> SplitRandom(ratio=(0.6, 0.3, 0.1))
>>> train, val, test
([6, 1, 4, 0, 3, 2], [8, 7, 9], [5])
>>> data = zip('aabbccddee', range(10))
>>> same_letter = lambda t: t[0]
>>> train, val = data >> SplitRandom(ratio=0.6, constraint=same_letter)
>>> sorted(train)
[('a', 0), ('a', 1), ('b', 2), ('b', 3), ('d', 6), ('d', 7)]
>>> sorted(val)
[('c', 4), ('c', 5), ('e', 8), ('e', 9)]
Parameters
  • iterable (iterable) – Iterable over anything. Will be consumed!

  • ratio (float|tuple) – Ratio of two partitions, e.g. a ratio of 0.7 means a 70%, 30% split. Alternatively a list of ratios can be provided, e.g. ratio=(0.6, 0.3, 0.1). Note that ratios must sum up to one and cannot be zero.

  • constraint (function|None) – Function that returns the key the elements of the iterable are grouped by before partitioning. Useful to ensure that a partition contains related elements, e.g. left and right eye images are not scattered across partitions. Note that constraints take precedence over ratios.

  • rand (Random|None) – Random number generator. The default None ensures that the same split is created every time SplitRandom is called. This is important when continuing an interrupted training session or running the same training on machines with different Python versions. Note that Python’s random.Random(0) generates different numbers for Python 2.x and 3.x!

Returns

partitions of iterable with sizes according to provided ratios.

Return type

(list, list, ..)
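
The effect of a constraint function can be sketched by splitting whole groups rather than individual elements, so related elements never end up in different partitions. The code below is a simplified two-way illustration and not the SplitRandom implementation: split_with_constraint is a hypothetical name, it applies the ratio to the number of groups, and it uses Python's random.Random, so it does not reproduce the splits shown above.

```python
import random
from collections import OrderedDict


def split_with_constraint(items, ratio, keyfunc, seed=0):
    """Split items in two so that items sharing a key stay together."""
    groups = OrderedDict()
    for item in items:
        groups.setdefault(keyfunc(item), []).append(item)
    keys = list(groups)
    random.Random(seed).shuffle(keys)  # fixed seed: same split every call
    n_first = int(round(ratio * len(keys)))  # ratio applied to groups
    first = [item for key in keys[:n_first] for item in groups[key]]
    second = [item for key in keys[n_first:] for item in groups[key]]
    return first, second


data = list(zip('aabbccddee', range(10)))
train, val = split_with_constraint(data, 0.6, lambda t: t[0])
train_letters = {letter for letter, _ in train}
val_letters = {letter for letter, _ in val}
```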

nutsml.config module

class Config(*args, **kwargs)[source]

Bases: dict

Dictionary that allows access via keys or attributes.

Used to store and access configuration data.

__init__(*args, **kwargs)[source]

Create dictionary.

>>> contact = Config({'name':'stefan', 'address':{'number':12}})
>>> contact['name']
'stefan'
>>> contact.name
'stefan'
>>> contact.address.number
12
>>> contact.surname = 'maetschke'
>>> contact.surname
'maetschke'
Parameters
  • args (args) – See dict

  • kwargs (kwargs) – See dict
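
Attribute access on a dictionary, as shown in the example above, can be built by forwarding attribute lookups to item lookups. The following is a minimal sketch of that technique, not the Config source code; the class name AttrDict is hypothetical.

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        for key, value in list(self.items()):
            if isinstance(value, dict):  # wrap nested dicts recursively
                self[key] = AttrDict(value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


contact = AttrDict({'name': 'stefan', 'address': {'number': 12}})
contact.surname = 'maetschke'
```

Raising AttributeError for unknown keys keeps built-ins such as hasattr and getattr with a default working as expected.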

static isjson(filepath)[source]

Return true if filepath ends with ‘.json’.

Parameters

filepath (str) – Filepath

Returns

True if filepath points to a JSON file.

Return type

bool

load(filepath)[source]

Load configuration from file in JSON or YAML format.

>>> cfg = Config().load('tests/data/configuration.json')
>>> cfg.number
13
Parameters

filepath (str) – Path to JSON or YAML file.

Returns

returns loaded configuration.

Return type

Config

save(filepath)[source]

Save configuration to file in JSON or YAML format.

>>> cfg = Config({'number': 13, 'name': 'Stefan'})
>>> cfg.save('tests/data/configuration.yaml')
Parameters

filepath (str) – Filepath. Should end with ‘.json’ or ‘.yaml’

load_config(filename)[source]

Load configuration file in YAML format from locations in defined order.

The search order for the config file is: 1) user home dir 2) current dir 3) full path

Example file: ‘tests/data/config.yaml’
filepath : c:/Maet
imagesize : [100, 200]
>>> cfg = load_config('tests/data/config.yaml')
>>> cfg.filepath
'c:/Maet'
>>> cfg['imagesize']
[100, 200]
Parameters

filename – Name or full path of configuration file.

Returns

dictionary with config data. Note that config data can be accessed by key or attribute, e.g. cfg.filepath or cfg[‘filepath’]

Return type

ConfigDict
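
The search order listed for load_config (user home dir, current dir, full path) can be sketched as a loop over candidate locations. This is an illustrative sketch only: find_config is a hypothetical helper that returns the first existing path and does not parse YAML.

```python
import os
import tempfile


def find_config(filename):
    """Return the first existing candidate path for filename, or None."""
    candidates = [
        os.path.join(os.path.expanduser('~'), filename),  # 1) user home dir
        os.path.join(os.getcwd(), filename),  # 2) current dir
        filename,  # 3) full path
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


with tempfile.TemporaryDirectory() as folder:
    full_path = os.path.join(folder, 'config.yaml')
    open(full_path, 'w').close()  # create an empty config file
    found_ok = find_config(full_path) == full_path
missing = find_config('no_such_config_file_8c1f2e.yaml')
```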

nutsml.datautil module

col_map(sample, columns, func, *args, **kwargs)[source]

Map function to given columns of sample and keep other columns

>>> sample = (1, 2, 3)
>>> add_n = lambda x, n: x + n
>>> col_map(sample, 1, add_n, 10)
(1, 12, 3)
>>> col_map(sample, (0, 2), add_n, 10)
(11, 2, 13)
Parameters
  • sample (tuple|list) – Sample

  • columns (int|tuple) – Single or multiple column indices.

  • func (function) – Function to map

  • args (args) – Arguments passed on to function

  • kwargs (kwargs) – Keyword arguments passed on to function

Returns

Sample where function has been applied to elements in the given columns.

group_by(elements, keyfunc, ordered=False)[source]

Group elements using the given key function.

>>> is_odd = lambda x: bool(x % 2)
>>> numbers = [0, 1, 2, 3, 4]
>>> group_by(numbers, is_odd, True)
OrderedDict([(False, [0, 2, 4]), (True, [1, 3])])
Parameters
  • elements (iterable) – Any iterable

  • keyfunc (function) – Function that returns key to group by

  • ordered (bool) – True: return OrderedDict else return dict

Returns

dictionary with results of keyfunc as keys and the elements for that key as value

Return type

dict|OrderedDict

group_samples(samples, labelcol, ordered=False)[source]

Return samples grouped by label and label counts.

>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]  
>>> groups, labelcnts = group_samples(samples, 1, True)
>>> groups
OrderedDict([(1, [('pos', 1), ('pos', 1)]), (0, [('neg', 0)])])
>>> labelcnts
Counter({1: 2, 0: 1})
Parameters
  • samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol)

  • labelcol (int) – Index of label in sample

  • ordered (bool) – True: samples are kept in order when grouping.

Returns

(groups, labelcnts) where groups is a dict containing samples grouped by label, and labelcnts is a Counter dict containing label frequencies.

Return type

tuple(dict, Counter)

random_downsample(samples, labelcol, rand=None, ordered=False)[source]

Randomly down-sample samples.

Creates stratified samples by down-sampling larger classes to the size of the smallest class.

Note: The example shown below uses StableRandom(i) to create a deterministic sequence of randomly stratified samples. Usually it is sufficient to use the default (rand=None). Do NOT use rnd.Random(0) since this will generate the same subsample every time.

>>> from __future__ import print_function  
>>> from nutsflow.common import StableRandom
>>> samples = [('pos1', 1), ('pos2', 1), ('pos3', 1),
...            ('neg1', 0), ('neg2', 0)]
>>> for i in range(3):
...     print(random_downsample(samples, 1, StableRandom(i), True))
[('pos2', 1), ('pos3', 1), ('neg2', 0), ('neg1', 0)]
[('pos2', 1), ('pos3', 1), ('neg2', 0), ('neg1', 0)]
[('pos2', 1), ('pos1', 1), ('neg1', 0), ('neg2', 0)]
Parameters
  • samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol). Labels can be any hashable type, e.g. int, str, bool

  • labelcol (int) – Index of label in sample

  • rand (Random|None) – Random number generator. If None, random.Random(None) is used.

  • ordered (bool) – True: samples are kept in order when downsampling.

Returns

Stratified sample set.

Return type

list of samples
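
Down-sampling to the size of the smallest class can be sketched with the standard library. The code below illustrates the idea only and is not the nuts-ml implementation; downsample_sketch is a hypothetical name and it uses random.Random rather than StableRandom, so its output differs from the example above.

```python
import random
from collections import Counter, defaultdict


def downsample_sketch(samples, labelcol, rand=None):
    """Randomly reduce every class to the size of the smallest class."""
    rand = rand if rand is not None else random.Random()
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[labelcol]].append(sample)
    if not groups:
        return []
    smallest = min(len(group) for group in groups.values())
    stratified = []
    for group in groups.values():
        stratified.extend(rand.sample(group, smallest))  # without replacement
    return stratified


samples = [('pos1', 1), ('pos2', 1), ('pos3', 1), ('neg1', 0), ('neg2', 0)]
stratified = downsample_sketch(samples, 1, random.Random(0))
label_counts = Counter(label for _, label in stratified)
```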

shuffle_sublists(sublists, rand)[source]

Shuffles the lists within a list but not the list itself.

>>> from nutsflow.common import StableRandom
>>> rand = StableRandom(0)
>>> sublists = [[1, 2, 3], [4, 5, 6, 7]]
>>> shuffle_sublists(sublists, rand)
>>> sublists
[[1, 3, 2], [4, 5, 7, 6]]
Parameters
  • sublists – A list containing lists

  • rand (Random) – A random number generator.

upsample(samples, labelcol, rand=None)[source]

Up-sample sample set.

Creates stratified samples by up-sampling smaller classes to the size of the largest class.

Note: The example shown below uses rnd.Random(i) to create a deterministic sequence of randomly stratified samples. Usually it is sufficient to use the default (rand=None).

>>> from __future__ import print_function
>>> import random as rnd
>>> samples = [('pos1', 1), ('pos2', 1), ('neg1', 0)]
>>> for i in range(3):  
...     print(upsample(samples, 1, rand=rnd.Random(i)))
[('neg1', 0), ('neg1', 0), ('pos1', 1), ('pos2', 1)]
[('pos2', 1), ('neg1', 0), ('pos1', 1), ('neg1', 0)]
[('neg1', 0), ('neg1', 0), ('pos1', 1), ('pos2', 1)]
Parameters
  • samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol). Labels can be any hashable type, e.g. int, str, bool

  • labelcol (int) – Index of label in sample

  • rand (Random|None) – Random number generator. If None, random.Random(None) is used.

Returns

Stratified sample set.

Return type

list of samples
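
Up-sampling works in the opposite direction: smaller classes are padded with randomly repeated samples until they match the largest class. The sketch below shows this idea under the assumption that repeats are drawn with replacement; upsample_sketch is a hypothetical name and the result order is not meant to match the example above.

```python
import random
from collections import Counter, defaultdict


def upsample_sketch(samples, labelcol, rand=None):
    """Pad every class with random repeats up to the largest class size."""
    rand = rand if rand is not None else random.Random()
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[labelcol]].append(sample)
    if not groups:
        return []
    largest = max(len(group) for group in groups.values())
    stratified = []
    for group in groups.values():
        missing = largest - len(group)
        extras = [rand.choice(group) for _ in range(missing)]
        stratified.extend(group + extras)
    return stratified


samples = [('pos1', 1), ('pos2', 1), ('neg1', 0)]
stratified = upsample_sketch(samples, 1, random.Random(0))
label_counts = Counter(label for _, label in stratified)
```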

nutsml.fileutil module

clear_folder(path)[source]

Remove all content (files and folders) within the specified folder.

Parameters

path (str) – Path of folder to clear.

create_filename(prefix='', ext='')[source]

Create a unique filename.

Parameters
  • prefix (str) – Prefix to add to filename.

  • ext (str) – Extension to append to filename, e.g. ‘jpg’

Returns

Unique filename.

Return type

str

create_folders(path, mode=511)[source]

Create folder(s). Don’t fail if already existing.

See related functions delete_folders() and clear_folder().

Parameters
  • path (str) – Path of folders to create, e.g. ‘foo/bar’

  • mode (int) – File creation mode, e.g. 0777 (octal), which equals the default of 511 in decimal.
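
Creating nested folders without failing on existing ones can be done with os.makedirs and exist_ok=True. The sketch below shows that standard-library approach; whether nuts-ml uses exactly this call is an assumption, and create_folders_sketch is a hypothetical name.

```python
import os
import tempfile


def create_folders_sketch(path, mode=0o777):
    """Create path including parent folders; do nothing if it exists."""
    os.makedirs(path, mode=mode, exist_ok=True)


with tempfile.TemporaryDirectory() as base:
    nested = os.path.join(base, 'foo', 'bar')
    create_folders_sketch(nested)
    create_folders_sketch(nested)  # second call does not raise
    created = os.path.isdir(nested)
```

Note that the effective permissions also depend on the process umask.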

create_temp_filepath(prefix='', ext='', relative=True)[source]

Create a temporary folder under TEMP_FOLDER.

If the folder already exists do nothing. Return relative (default) or absolute path to a temp file with a unique name.

See related function create_filename().

Parameters
  • prefix (str) – Prefix to add to filename.

  • ext (str) – Extension to append to filename, e.g. ‘jpg’

  • relative (bool) – True: return relative path, otherwise absolute path.

Returns

Path to file with unique name in temp folder.

Return type

str

delete_file(path)[source]

Remove file at given path. Don’t fail if non-existing.

Parameters

path (str) – Path to file to delete, e.g. ‘foo/bar/file.txt’

delete_folders(path)[source]

Remove folder and sub-folders. Don’t fail if non-existing or not empty.

Parameters

path (str) – Path of folders to delete, e.g. ‘foo/bar’

delete_temp_data()[source]

Remove TEMP_FOLDER and all its contents.

reader_filepath(sample, filename, pathfunc)[source]

Construct filepath from sample, filename and/or pathfunction.

Helper function used in ReadImage and ReadNumpy.

Parameters
  • sample (tuple|list) – E.g. (‘nut_color’, 1)

  • filename

  • pathfunc (string|function|None) – Filepath with wildcard ‘*’, which is replaced by the file id/name provided in the sample, e.g. ‘tests/data/img_formats/*.jpg’ for sample (‘nut_grayscale’, 2) will become ‘tests/data/img_formats/nut_grayscale.jpg’ or Function to compute path to image file from sample, e.g. lambda sample: ‘tests/data/img_formats/{1}.jpg’.format(*sample) or None, in this case the filename is taken as the filepath.
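
The three kinds of pathfunc described above (wildcard string, function, None) can be sketched as a simple dispatch. This is an illustration based only on that description, not the reader_filepath source; filepath_sketch is a hypothetical name.

```python
def filepath_sketch(sample, filename, pathfunc):
    """Build a file path from a sample, a file name and a path function."""
    if pathfunc is None:  # filename already is the file path
        return filename
    if isinstance(pathfunc, str):  # replace wildcard by the file name
        return pathfunc.replace('*', filename)
    return pathfunc(sample)  # let the function compute the path


sample = ('nut_grayscale', 2)
by_wildcard = filepath_sketch(sample, sample[0], 'tests/data/img_formats/*.jpg')
by_function = filepath_sketch(
    sample, sample[0],
    lambda s: 'tests/data/img_formats/{0}.jpg'.format(*s))
unchanged = filepath_sketch(sample, 'nut_grayscale.jpg', None)
```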

Returns

nutsml.imageutil module

add_channel(image, channelfirst)[source]

Add channel if missing and make first axis if requested.

>>> import numpy as np
>>> from nutsflow.common import shapestr
>>> image = np.ones((10, 20))
>>> image = add_channel(image, True)
>>> shapestr(image)
'1x10x20'
Parameters
  • image (ndarray) – RGB (h,w,3) or gray-scale image (h,w).

  • channelfirst (bool) – If True, make channel first axis

Returns

Numpy array with channel (as first axis if channelfirst=True)

Return type

numpy.array

annotation2coords(image, annotation)[source]

Convert geometric annotation in image to pixel coordinates.

For example, given a rectangular region annotated in an image as (‘rect’, ((x, y, w, h))) the function returns the coordinates of all pixels within this region as (row, col) position tuples.

The following annotation formats are supported:
(‘point’, ((x, y), … ))
(‘circle’, ((x, y, r), …))
(‘ellipse’, ((x, y, rx, ry, rot), …))
(‘rect’, ((x, y, w, h), …))
(‘polyline’, (((x, y), (x, y), …), …))

Annotation regions can exceed the image dimensions and will be clipped. Note that annotation is in x,y order while output is r,c (row, col).

>>> import numpy as np
>>> img = np.zeros((5, 5), dtype='uint8')
>>> anno = ('point', ((1, 1), (1, 2)))
>>> for rr, cc in annotation2coords(img, anno):
...     print(list(rr), list(cc))
[1] [1]
[2] [1]
Parameters
  • image (ndarray) – Image

  • annotation (annotation) – Annotation of an image region such as point, circle, rect or polyline

Returns

Coordinates of pixels within the (clipped) region.

Return type

generator over tuples (row, col)

annotation2mask(image, annotations, pos=255)[source]

Convert geometric annotation to mask.

For annotation formats see: imageutil.annotation2coords

>>> import numpy as np
>>> img = np.zeros((3, 3), dtype='uint8')
>>> anno = ('point', ((0, 1), (2, 0)))
>>> annotation2mask(img, anno)
array([[  0,   0, 255],
       [255,   0,   0],
       [  0,   0,   0]], dtype=uint8)
Parameters
  • annotation (annotation) – Annotation of an image region such as point, circle, rect or polyline

  • pos (int) – Value to write in mask for regions defined by annotation

  • image (numpy array) – Image the annotation refers to. Returned mask will be of same size.

Returns

Mask with annotation

Return type

numpy array

annotation2pltpatch(annotation, **kwargs)[source]

Convert geometric annotation to matplotlib geometric objects (=patches)

For details regarding matplotlib patches see: http://matplotlib.org/api/patches_api.html For annotation formats see: imageutil.annotation2coords

Parameters

annotation (annotation) – Annotation of an image region such as point, circle, rect or polyline

Returns

matplotlib.patches

Return type

generator over matplotlib patches

arr_to_pil(image)[source]

Convert numpy array to PIL image.

>>> import numpy as np
>>> rgb_arr = np.ones((5, 4, 3), dtype='uint8')
>>> pil_img = arr_to_pil(rgb_arr)
>>> pil_img.size
(4, 5)
Parameters

image (ndarray) – Numpy array with dtype ‘uint8’ and dimensions (h,w,c) for RGB or (h,w) for gray-scale images.

Returns

PIL image

Return type

PIL.Image

centers_inside(centers, image, pshape)[source]

Filter center points of patches ensuring that patch is inside of image.

>>> centers = np.array([[1, 2], [0,1]])
>>> image = np.zeros((3, 4))
>>> centers_inside(centers, image, (3, 3)).astype('uint8')
array([[1, 2]], dtype=uint8)
Parameters
  • centers (ndarray(n,2)) – Center points of patches.

  • image (ndarray(h,w)) – Image the patches should be inside.

  • pshape (tuple) – Patch shape of form (h,w)

Returns

Patch centers where the patch is completely inside the image.

Return type

ndarray of shape (n, 2)

change_brightness(image, brightness=1.0)[source]

Change brightness of image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> change_brightness(image, 0.5)
array([[127,   0,   0],
       [  0, 127,   0],
       [  0,   0, 127]], dtype=uint8)

See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Brightness

Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • brightness (float) – Brightness [0, 1]

Returns

Image with changed brightness

Return type

numpy array with range [0,255] and dtype ‘uint8’

change_color(image, color=1.0)[source]

Change color of image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> change_color(image, 0.5)
array([[255,   0,   0],
       [  0, 255,   0],
       [  0,   0, 255]], dtype=uint8)

See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Color

Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • color (float) – Color [0, 1]

Returns

Image with changed color

Return type

numpy array with range [0,255] and dtype ‘uint8’

change_contrast(image, contrast=1.0)[source]

Change contrast of image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> change_contrast(image, 0.5)
array([[170,  42,  42],
       [ 42, 170,  42],
       [ 42,  42, 170]], dtype=uint8)

See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Contrast

Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • contrast (float) – Contrast [0, 1]

Returns

Image with changed contrast

Return type

numpy array with range [0,255] and dtype ‘uint8’

change_sharpness(image, sharpness=1.0)[source]

Change sharpness of image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> change_sharpness(image, 0.5)
array([[255,   0,   0],
       [  0, 196,   0],
       [  0,   0, 255]], dtype=uint8)

See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Sharpness

Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • sharpness (float) – Sharpness [0, …]

Returns

Image with changed sharpness

Return type

numpy array with range [0,255] and dtype ‘uint8’

crop(image, x1, y1, x2, y2)[source]

Crop image.

>>> import numpy as np
>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> crop(image, 1, 2, 5, 5)
array([[ 9, 10, 11],
       [13, 14, 15]], dtype=uint8)
Parameters
  • image (numpy array) – Numpy array.

  • x1 (int) – x-coordinate of left upper corner of crop (inclusive)

  • y1 (int) – y-coordinate of left upper corner of crop (inclusive)

  • x2 (int) – x-coordinate of right lower corner of crop (exclusive)

  • y2 (int) – y-coordinate of right lower corner of crop (exclusive)

Returns

Cropped image

Return type

numpy array
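
The inclusive/exclusive corner convention maps directly onto NumPy slicing. The following sketch assumes the crop is a plain slice (the helper name crop_sketch is hypothetical); it also shows why the example above returns only three columns although x2 = 5 exceeds the image width of 4.

```python
import numpy as np

def crop_sketch(image, x1, y1, x2, y2):
    # Rows are indexed by y and columns by x. Slices that run past the
    # image border are silently clipped by NumPy.
    return image[y1:y2, x1:x2]

image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
cropped = crop_sketch(image, 1, 2, 5, 5)
print(cropped.tolist())  # [[9, 10, 11], [13, 14, 15]]
```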

crop_center(image, w, h)[source]

Crop region with size w, h from center of image.

Note that the crop is specified via w, h and not via shape (h,w). Furthermore if the image or the crop region have even dimensions, coordinates are rounded down.

>>> import numpy as np
>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> crop_center(image, 3, 2)
array([[ 4,  5,  6],
       [ 8,  9, 10]], dtype=uint8)
Parameters
  • image (numpy array) – Numpy array.

  • w (int) – Width of crop

  • h (int) – Height of crop

Returns

Cropped image

Return type

numpy array

Raises

ValueError if image is smaller than crop region
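
The rounding behavior can be illustrated with a small NumPy sketch. It assumes that the crop offset is computed with integer division, which reproduces the example above; the helper name crop_center_sketch is hypothetical and the real implementation may differ in detail.

```python
import numpy as np

def crop_center_sketch(image, w, h):
    ih, iw = image.shape[:2]
    if w > iw or h > ih:
        raise ValueError('Image is smaller than crop region')
    # Integer division rounds the offset down when the margin is odd.
    x0 = (iw - w) // 2
    y0 = (ih - h) // 2
    return image[y0:y0 + h, x0:x0 + w]

image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
center = crop_center_sketch(image, 3, 2)
print(center.tolist())  # [[4, 5, 6], [8, 9, 10]]
```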

crop_square(image)[source]

Crop image to square shape.

Crops symmetrically left and right or top and bottom to achieve an aspect ratio of one. Since cropping only removes pixels, the side length of the result is given by the smaller image dimension.

Parameters

image (numpy array) – Numpy array.

Returns

Cropped image

Return type

numpy array
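
A possible way to implement a symmetric square crop is sketched below. The helper name crop_square_sketch and the rounding of odd margins are assumptions; the sketch always returns a square whose side equals the smaller image dimension.

```python
import numpy as np

def crop_square_sketch(image):
    h, w = image.shape[:2]
    side = min(h, w)
    # Remove the surplus evenly from both sides of the longer axis.
    y0 = (h - side) // 2
    x0 = (w - side) // 2
    return image[y0:y0 + side, x0:x0 + side]

image = np.reshape(np.arange(24, dtype='uint8'), (4, 6))
square = crop_square_sketch(image)
print(square.shape)  # (4, 4)
```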

distort_elastic(image, smooth=10.0, scale=100.0, seed=0)[source]

Elastic distortion of images.

Channel axis in RGB images will not be distorted but grayscale or RGB images are both valid inputs. RGB and grayscale images will be distorted identically for the same seed.

Simard et al., “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Parameters
  • image (ndarray) – Image of shape [h,w] or [h,w,c]

  • smooth (float) – Smooths the distortion.

  • scale (float) – Scales the distortion.

  • seed (int) – Seed for random number generator. Ensures that for the same seed images are distorted identically.

Returns

Distorted image with same shape as input image.

Return type

ndarray

enhance(image, func, *args, **kwargs)[source]

Enhance image using a PIL enhance function

See the following link for details on PIL enhance functions: http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html

>>> from PIL.ImageEnhance import Brightness
>>> image = np.ones((3,2), dtype='uint8')
>>> enhance(image, Brightness, 0.0)
array([[0, 0],
       [0, 0],
       [0, 0]], dtype=uint8)
Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • func (function) – PIL ImageEnhance function

  • args (args) – Argument list passed on to enhance function.

  • kwargs (kwargs) – Key-word arguments passed on to enhance function

Returns

Enhanced image

Return type

numpy array with range [0,255] and dtype ‘uint8’

extract_edges(image, sigma)[source]

Extract edges using the Canny algorithm.

Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • sigma (float) – Standard deviation of the Gaussian filter.

Returns

Binary image with extracted edges

Return type

numpy array with range [0,255] and dtype ‘uint8’

extract_patch(image, pshape, r, c)[source]

Extract a patch of the given shape, centered at r,c, from image.

Note that there is no checking if the patch region is inside the image.

>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> extract_patch(image, (2, 3), 2, 2)
array([[ 5,  6,  7],
       [ 9, 10, 11]], dtype=uint8)
Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’. Can be of shape MxN or MxNxC.

  • pshape (tuple) – Shape of patch. The number of dimensions must match that of the image.

  • r (int) – Row for center of patch

  • c (int) – Column for center of patch

Returns

numpy array with shape pshape

Return type

numpy array with range [0,255] and dtype ‘uint8’
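
How a center and a shape translate into a slice can be sketched as follows. The offset convention (size // 2 before the center) is an assumption chosen to reproduce the example above, and extract_patch_sketch is a hypothetical name. Like the documented function, the sketch performs no bounds checking, so centers near the border may yield truncated or empty patches.

```python
import numpy as np

def extract_patch_sketch(image, pshape, r, c):
    h, w = pshape[:2]
    # Top-left corner of the patch relative to its center.
    r0, c0 = r - h // 2, c - w // 2
    return image[r0:r0 + h, c0:c0 + w]

image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
patch = extract_patch_sketch(image, (2, 3), 2, 2)
print(patch.tolist())  # [[5, 6, 7], [9, 10, 11]]
```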

fliplr(image)[source]

Flip image left to right.

>>> image = np.reshape(np.arange(4, dtype='uint8'), (2,2))
>>> fliplr(image)
array([[1, 0],
       [3, 2]], dtype=uint8)
Parameters

image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

Returns

Flipped image

Return type

numpy array with range [0,255] and dtype ‘uint8’

flipud(image)[source]

Flip image up to down.

>>> image = np.reshape(np.arange(4, dtype='uint8'), (2,2))
>>> flipud(image)
array([[2, 3],
       [0, 1]], dtype=uint8)
Parameters

image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

Returns

Flipped image

Return type

numpy array with range [0,255] and dtype ‘uint8’

floatimg2uint8(image)[source]

Convert array with floats to ‘uint8’ and rescale from [0,1] to [0, 255].

Converts only if image.dtype != uint8.

>>> import numpy as np
>>> image = np.eye(10, 20, dtype=float)
>>> arr = floatimg2uint8(image)
>>> np.max(arr)
255
Parameters

image (numpy.array) – Numpy array with range [0,1]

Returns

Numpy array with range [0,255] and dtype ‘uint8’

Return type

numpy array
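
The conversion can be sketched in a few lines of NumPy. The sketch assumes a simple multiplication by 255 followed by a cast (which truncates fractional values); the helper name floatimg2uint8_sketch is hypothetical and the library may round differently.

```python
import numpy as np

def floatimg2uint8_sketch(image):
    # Arrays that are already uint8 are passed through untouched.
    if image.dtype == np.uint8:
        return image
    return (image * 255).astype('uint8')

image = np.eye(10, 20, dtype=float)
arr = floatimg2uint8_sketch(image)
print(arr.dtype, arr.max())  # uint8 255
```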

gray2rgb(image)[source]

Convert grayscale image to RGB image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> gray2rgb(image)
array([[[255, 255, 255],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255, 255],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255, 255]]], dtype=uint8)
Parameters

image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

Returns

RGB image

Return type

numpy array with range [0,255] and dtype ‘uint8’

identical(image)[source]

Return input image unchanged.

Parameters

image (numpy.array) – Should be a numpy array of an image.

Returns

Same as input

Return type

Same as input

load_image(filepath, as_grey=False, dtype='uint8', no_alpha=True)[source]

Load image as numpy array from given filepath.

Supported formats: gif, png, jpg, bmp, tif, npy

>>> img = load_image('tests/data/img_formats/nut_color.jpg')
>>> shapestr(img)
'213x320x3'
Parameters
  • filepath (string) – Filepath to image file or numpy array.

  • as_grey (bool) –

Returns

numpy array with shape

  • (h, w) for grayscale or monochrome images,

  • (h, w, 3) for RGB (3 color channels in last axis),

  • (h, w, 4) for RGBA (for no_alpha = False),

  • (h, w, 3) for RGBA (for no_alpha = True).

Pixel values are in range [0,255] for dtype = uint8.

Return type

numpy ndarray

mask_choice(mask, value, n)[source]

Random selection of n points where mask has given value

>>> np.random.seed(1)   # ensure same random selection for doctest
>>> mask = np.eye(3, dtype='uint8')
>>> mask_choice(mask, 1, 2).tolist()
[[0, 0], [2, 2]]
Parameters
  • mask (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • n (int) – Number of points to select. If n is larger than the number of points available, only the available points are returned.

Returns

Array with x,y coordinates

Return type

numpy array with shape nx2 where each row contains x, y

mask_where(mask, value)[source]

Return x,y coordinates where mask has specified value

>>> mask = np.eye(3, dtype='uint8')
>>> mask_where(mask, 1).tolist()
[[0, 0], [1, 1], [2, 2]]
Parameters

mask (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

Returns

Array with x,y coordinates

Return type

numpy array with shape Nx2 where each row contains x, y

normalize_histo(image, gamma=1.0)[source]

Perform histogram normalization on image.

Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • gamma (float) – Factor for gamma adjustment.

Returns

Normalized image

Return type

numpy array with range [0,255] and dtype ‘uint8’

occlude(image, x, y, w, h, color=0)[source]

Occlude image with a rectangular region.

Occludes an image region with dimensions w,h centered on x,y with the given color. Invalid x,y coordinates will be clipped to ensure complete occlusion rectangle is within the image.

>>> import numpy as np
>>> image = np.ones((4, 5)).astype('uint8')
>>> occlude(image, 2, 2, 2, 3)
array([[1, 1, 1, 1, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 0, 1, 1]], dtype=uint8)
>>> image = np.ones((4, 4)).astype('uint8')
>>> occlude(image, 0.5, 0.5, 0.5, 0.5)
array([[1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 1]], dtype=uint8)
Parameters
  • image (numpy array) – Numpy array.

  • x (int|float) – x coordinate for center of occlusion region. Can be provided as fraction (float) of image width

  • y (int|float) – y coordinate for center of occlusion region. Can be provided as fraction (float) of image height

  • w (int|float) – width of occlusion region. Can be provided as fraction (float) of image width

  • h (int|float) – height of occlusion region. Can be provided as fraction (float) of image height

  • color (int|tuple) – gray-scale or RGB color of occlusion.

Returns

Copy of input image with occluded region.

Return type

numpy array
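
The handling of fractional coordinates and the clipping can be sketched as below. This is an illustration under assumptions, not the library code: floats are read as fractions of the image size, the rectangle starts size // 2 before its center, and the rectangle is assumed to be no larger than the image. The name occlude_sketch is hypothetical.

```python
import numpy as np

def occlude_sketch(image, x, y, w, h, color=0):
    ih, iw = image.shape[:2]

    def to_pixels(v, size):
        # Floats are interpreted as a fraction of the image width or height.
        return int(v * size) if isinstance(v, float) else v

    x, w = to_pixels(x, iw), to_pixels(w, iw)
    y, h = to_pixels(y, ih), to_pixels(h, ih)
    # Clip the top-left corner so the rectangle stays inside the image.
    x0 = min(max(x - w // 2, 0), iw - w)
    y0 = min(max(y - h // 2, 0), ih - h)
    out = image.copy()          # the input image is left unchanged
    out[y0:y0 + h, x0:x0 + w] = color
    return out

out_int = occlude_sketch(np.ones((4, 5), dtype='uint8'), 2, 2, 2, 3)
out_frac = occlude_sketch(np.ones((4, 4), dtype='uint8'), 0.5, 0.5, 0.5, 0.5)
print(out_frac.tolist())
```

Both calls reproduce the arrays shown in the examples above.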

patch_iter(image, shape=(3, 3), stride=1)[source]

Extracts patches from images with given shape.

Patches are extracted in a regular grid with the given stride, starting in the left upper corner and then row-wise. Image can be gray-scale (no third channel dim) or color.

>>> import numpy as np
>>> img = np.reshape(np.arange(12), (3, 4))
>>> for p in patch_iter(img, (2, 2), 2):
...     print(p)
[[0 1]
 [4 5]]
[[2 3]
 [6 7]]
Parameters
  • image (ndarray) – Numpy array of shape h,w,c or h,w.

  • shape (tuple) – Shape of patch (h,w)

  • stride (int) – Step size of grid patches are extracted from

Returns

Iterator over patches

Return type

Iterator
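
The grid traversal can be written as a short generator. The sketch below assumes that only patches lying completely inside the image are produced, which matches the example above where the incomplete last row is skipped; patch_iter_sketch is a hypothetical name.

```python
import numpy as np

def patch_iter_sketch(image, shape=(3, 3), stride=1):
    h, w = image.shape[:2]
    ph, pw = shape
    # Row-wise traversal starting in the upper left corner.
    for r in range(0, h - ph + 1, stride):
        for c in range(0, w - pw + 1, stride):
            yield image[r:r + ph, c:c + pw]

img = np.reshape(np.arange(12), (3, 4))
patches = [p.tolist() for p in patch_iter_sketch(img, (2, 2), 2)]
print(patches)  # [[[0, 1], [4, 5]], [[2, 3], [6, 7]]]
```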

pil_to_arr(image)[source]

Convert PIL image to Numpy array.

>>> import numpy as np
>>> rgb_arr = np.ones((5, 4, 3), dtype='uint8')
>>> pil_img = arr_to_pil(rgb_arr)
>>> arr = pil_to_arr(pil_img)
>>> shapestr(arr)
'5x4x3'
Parameters

image (PIL.Image) – PIL image (RGB or grayscale)

Returns

Numpy array

Return type

numpy.array with dtype ‘uint8’

polyline2coords(points)[source]

Return row and column coordinates for a polyline.

>>> rr, cc = polyline2coords([(0, 0), (2, 2), (2, 4)])
>>> list(rr)
[0, 1, 2, 2, 3, 4]
>>> list(cc)
[0, 1, 2, 2, 2, 2]
Parameters

points (list of tuple) – Polyline in format [(x1,y1), (x2,y2), …]

Returns

tuple with row and column coordinates in numpy arrays

Return type

tuple of numpy array

rerange(image, old_min, old_max, new_min, new_max, dtype)[source]

Return image with values in new range.

Note: The default range of images is [0, 255]; most image processing functions expect this range and will fail otherwise. However, images re-ranged to, e.g., [-1, +1] are sometimes needed as input to neural networks.

>>> import numpy as np
>>> image = np.array([[0, 255], [255, 0]])
>>> rerange(image, 0, 255, -1, +1, 'float32')
array([[-1.,  1.],
       [ 1., -1.]], dtype=float32)
Parameters
  • image (numpy.array) – Should be a numpy array of an image.

  • old_min (int|float) – Current minimum value of image, e.g. 0

  • old_max (int|float) – Current maximum value of image, e.g. 255

  • new_min (int|float) – New minimum, e.g. -1.0

  • new_max (int|float) – New maximum, e.g. +1.0

  • dtype (numpy datatype) – Data type of output image, e.g. ‘float32’ or np.uint8

Returns

Image with values in new range.
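
Re-ranging is a linear mapping: new = (old - old_min) / (old_max - old_min) * (new_max - new_min) + new_min. A minimal NumPy sketch of this formula follows; the helper name rerange_sketch and the intermediate float32 conversion are assumptions.

```python
import numpy as np

def rerange_sketch(image, old_min, old_max, new_min, new_max, dtype):
    image = image.astype('float32')
    old_range = old_max - old_min
    new_range = new_max - new_min
    # Shift to zero, scale to the new width, shift to the new minimum.
    scaled = (image - old_min) / old_range * new_range + new_min
    return scaled.astype(dtype)

image = np.array([[0, 255], [255, 0]])
ranged = rerange_sketch(image, 0, 255, -1, +1, 'float32')
print(ranged.tolist())  # [[-1.0, 1.0], [1.0, -1.0]]
```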

resize(image, w, h, anti_aliasing=False, **kwargs)[source]

Resize image.

Image can be up- or down-sized (using interpolation). For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.resize

>>> image = np.ones((10,5), dtype='uint8')
>>> resize(image, 4, 3)
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=uint8)
Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • w (int) – Width in pixels.

  • h (int) – Height in pixels.

  • anti_aliasing (bool) – Toggle anti aliasing.

  • kwargs (kwargs) – Keyword arguments for the underlying scikit-image resize function, e.g. order=1 for linear interpolation.

Returns

Resized image

Return type

numpy array with range [0,255] and dtype ‘uint8’

rgb2gray(image)[source]

Convert RGB image to grayscale image.

>>> image = np.eye(3, dtype='uint8') * 255
>>> rgb2gray(image)
array([[255,   0,   0],
       [  0, 255,   0],
       [  0,   0, 255]], dtype=uint8)
Parameters

image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

Returns

grayscale image

Return type

numpy array with range [0,255] and dtype ‘uint8’

rotate(image, angle=0, **kwargs)[source]

Rotate image.

For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.rotate

For a smooth interpolation of images set ‘order=1’. To rotate masks use the default ‘order=0’.

>>> image = np.eye(3, dtype='uint8')
>>> rotate(image, 90)
array([[0, 0, 1],
       [0, 1, 0],
       [1, 0, 0]], dtype=uint8)
Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • angle (float) – Angle in degrees in counter-clockwise direction

  • kwargs (kwargs) – Keyword arguments for the underlying scikit-image rotate function, e.g. order=1 for linear interpolation.

Returns

Rotated image

Return type

numpy array with range [0,255] and dtype ‘uint8’

sample_labeled_patch_centers(mask, value, pshape, n, label)[source]

Randomly pick n points in mask where mask has given value and add label.

Same as imageutil.sample_mask but adds given label to each center

>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 1
>>> sample_labeled_patch_centers(mask, 1, (1, 1), 1, 0)
array([[1, 2, 0]], dtype=uint16)
Parameters
  • mask (ndarray) – Mask

  • value (int) – Sample points in mask that have this value.

  • pshape (tuple) – Patch shape of form (h,w)

  • n (int) – Number of points to sample. If there are not enough points to sample from, a smaller number will be returned. If there are no points at all, np.empty((0, 2)) will be returned.

  • label (int) – Numeric label to append to each center point

Returns

Center points of patches within the mask where the center point has the given mask value and the label

Return type

ndarray of shape (n, 3)

sample_mask(mask, value, pshape, n)[source]

Randomly pick n points in mask where mask has given value.

Ensures that only points are picked that can be the center of a patch with shape pshape that lies inside the mask.

>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 1
>>> sample_mask(mask, 1, (1, 1), 1)
array([[1, 2]], dtype=uint16)
Parameters
  • mask (ndarray) – Mask

  • value (int) – Sample points in mask that have this value.

  • pshape (tuple) – Patch shape of form (h,w)

  • n (int) – Number of points to sample. If there are not enough points to sample from, a smaller number will be returned. If there are no points at all, np.empty((0, 2)) will be returned.

Returns

Center points of patches within the mask where the center point has the given mask value.

Return type

ndarray of shape (n, 2)
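
The sampling steps (find candidate points, drop those whose patch would leave the mask, draw at most n of them) can be sketched as follows. This is an illustration only: the name sample_mask_sketch, the rng parameter and the use of numpy.random.default_rng are assumptions, and centers are taken to be (row, column) pairs as in the example above.

```python
import numpy as np

def sample_mask_sketch(mask, value, pshape, n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    h, w = mask.shape[:2]
    ph, pw = pshape
    centers = np.argwhere(mask == value)      # one (row, col) pair per point
    # Keep only centers whose patch fits completely into the mask.
    top = centers[:, 0] - ph // 2
    left = centers[:, 1] - pw // 2
    inside = (top >= 0) & (left >= 0) & (top + ph <= h) & (left + pw <= w)
    centers = centers[inside]
    k = min(n, len(centers))                  # fewer points than requested
    if k == 0:
        return np.empty((0, 2), dtype='uint16')
    picked = rng.choice(len(centers), size=k, replace=False)
    return centers[picked].astype('uint16')

mask = np.zeros((3, 4))
mask[1, 2] = 1
sampled = sample_mask_sketch(mask, 1, (1, 1), 1)
print(sampled.tolist())  # [[1, 2]]
```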

sample_patch_centers(mask, pshape, npos, nneg, pos=255, neg=0)[source]

Sample positive and negative patch centers where mask value is pos or neg.

The sampling routine ensures that the patch is completely inside the mask.

>>> np.random.seed(0)   # just to ensure consistent doctest
>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 255
>>> sample_patch_centers(mask, (2, 2), 1, 1)
array([[1, 1, 0],
       [1, 2, 1]], dtype=uint16)
Parameters
  • mask (ndarray) – Mask

  • pshape (tuple) – Patch shape of form (h,w)

  • npos (int) – Number of positives to sample.

  • nneg (int) – Number of negatives to sample.

  • pos (int) – Value for positive points in mask

  • neg (int) – Value for negative points in mask

Returns

Center points of patches within the mask where the center point has the given mask value (pos, neg) and the label (1, 0)

Return type

ndarray of shape (n, 3)

sample_pn_patches(image, mask, pshape, npos, nneg, pos=255, neg=0)[source]

Sample positive and negative patches where mask value is pos or neg.

The sampling routine ensures that the patch is completely inside the image and mask and that a patch at the same position is extracted from the image and the mask.

>>> np.random.seed(0)   # just to ensure consistent doctest
>>> mask = np.zeros((3, 4), dtype='uint8')
>>> img = np.reshape(np.arange(12, dtype='uint8'), (3, 4))
>>> mask[1, 2] = 255
>>> for ip, mp, l in sample_pn_patches(img, mask, (2, 2), 1, 1):
...     print(ip)
...     print(mp)
...     print(l)
[[0 1]
 [4 5]]
[[0 0]
 [0 0]]
0
[[1 2]
 [5 6]]
[[  0   0]
 [  0 255]]
1
Parameters
  • mask (ndarray) – Mask

  • pshape (tuple) – Patch shape of form (h,w)

  • npos (int) – Number of positives to sample.

  • nneg (int) – Number of negatives to sample.

  • pos (int) – Value for positive points in mask

  • neg (int) – Value for negative points in mask

Returns

Image and mask patches where the patch center point has the given mask value (pos, neg) and the label (1, 0)

Return type

tuple(image_patch, mask_patch, label)

save_image(filepath, image)[source]

Save numpy array as image (or numpy array) to given filepath.

Supported formats: gif, png, jpg, bmp, tif, npy

Parameters
  • filepath (string) – File path for image file. Extension determines image file format, e.g. .gif

  • image (numpy array) – Numpy array to save as image. Must be of shape (h,w) or (h,w,3) or (h,w,4)

set_default_order(kwargs)[source]

Set order parameter in kwargs for scikit-image functions.

Default order is 1, which performs a linear interpolation of pixel values when images are rotated, resized and sheared. This is fine for images but causes unwanted pixel values in masks. This function sets the default order to 0, which disables the interpolation.

Parameters

kwargs (kwargs) – Dictionary with keyword arguments.
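
The effect can be pictured with dict.setdefault: an order is only filled in when the caller has not supplied one. The sketch below is an assumption about the behavior described above, not the library source; in particular, returning the dictionary is added here only to make the example easy to inspect.

```python
def set_default_order_sketch(kwargs):
    # Leaves an explicitly chosen 'order' untouched, otherwise uses 0.
    kwargs.setdefault('order', 0)
    return kwargs

mask_kwargs = set_default_order_sketch({})
image_kwargs = set_default_order_sketch({'order': 1})
print(mask_kwargs, image_kwargs)  # {'order': 0} {'order': 1}
```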

shear(image, shear_factor, **kwargs)[source]

Shear image.

For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.AffineTransform

>>> image = np.eye(3, dtype='uint8')
>>> sheared = shear(image, 0.5)
Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • shear_factor (float) – Shear factor [0, 1]

  • kwargs (kwargs) – Keyword arguments for the underlying scikit-image warp function, e.g. order=1 for linear interpolation.

Returns

Sheared image

Return type

numpy array with range [0,255] and dtype ‘uint8’

translate(image, dx, dy, **kwargs)[source]

Shift image horizontally and vertically

>>> image = np.eye(3, dtype='uint8') * 255
>>> translate(image, 2, 1)
array([[  0,   0,   0],
       [  0,   0, 255],
       [  0,   0,   0]], dtype=uint8)
Parameters
  • image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.

  • dx – horizontal translation in pixels

  • dy – vertical translation in pixels

  • kwargs (kwargs) – Keyword arguments for the underlying scikit-image warp function, e.g. order=1 for linear interpolation.

Returns

translated image

Return type

numpy array with range [0,255] and dtype ‘uint8’
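
For whole-pixel shifts the result can be reproduced with array slicing alone. The sketch below is a NumPy stand-in and does not use scikit-image; it assumes that positive dx moves content to the right, positive dy moves it down, and vacated pixels are filled with 0. The name translate_sketch is hypothetical.

```python
import numpy as np

def translate_sketch(image, dx, dy):
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    if abs(dx) >= w or abs(dy) >= h:
        return out                      # everything is shifted out of view
    # Matching source and destination windows for rows and columns.
    src_r = slice(max(0, -dy), min(h, h - dy))
    dst_r = slice(max(0, dy), min(h, h + dy))
    src_c = slice(max(0, -dx), min(w, w - dx))
    dst_c = slice(max(0, dx), min(w, w + dx))
    out[dst_r, dst_c] = image[src_r, src_c]
    return out

image = np.eye(3, dtype='uint8') * 255
shifted = translate_sketch(image, 2, 1)
print(shifted.tolist())  # [[0, 0, 0], [0, 0, 255], [0, 0, 0]]
```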

nutsml.logger module

class LogCols(filepath, cols=None, colnames=None, reset=True, delimiter=',')[source]

Bases: nutsml.logger.LogToFile

__init__(filepath, cols=None, colnames=None, reset=True, delimiter=',')[source]

Construct logger.

>>> from __future__ import print_function
>>> from nutsflow import Consume
>>> filepath = 'tests/data/temp_logfile.csv'
>>> data = [[1, 2], [3, 4]]
>>> with LogToFile(filepath) as logtofile:
...     data >> logtofile >> Consume()
>>> print(open(filepath).read())
1,2
3,4
>>> logtofile = LogToFile(filepath, cols=(1, 0), colnames=['a', 'b'])
>>> data >> logtofile >> Consume()
>>> print(open(filepath).read())
a,b
2,1
4,3

>>> logtofile.close()
>>> logtofile.delete()
Parameters
  • filepath (string) – Path to file to write log to.

  • cols (int|tuple|None) – Indices of columns of input data to write. None: write all columns; int: write only the single given column; tuple: write the columns with the given indices.

  • colnames (tuple|None) – Column names to write in first line. If None no colnames are written.

  • reset (bool) – If True the writing to the log file is reset if the logger is recreated. Otherwise log data is appended to the log file.

  • delimiter (str) – Delimiter for columns in log file.

class LogToFile(filepath, cols=None, colnames=None, reset=True, delimiter=',')[source]

Bases: nutsflow.base.NutFunction

Log columns of data to file.

__call__(x)[source]

Log x

Parameters

x (any) – Any type of data. Special support for numpy arrays.

Returns

Return input unchanged

Return type

Same as input

__init__(filepath, cols=None, colnames=None, reset=True, delimiter=',')[source]

Construct logger.

>>> from __future__ import print_function
>>> from nutsflow import Consume
>>> filepath = 'tests/data/temp_logfile.csv'
>>> data = [[1, 2], [3, 4]]
>>> with LogToFile(filepath) as logtofile:
...     data >> logtofile >> Consume()
>>> print(open(filepath).read())
1,2
3,4
>>> logtofile = LogToFile(filepath, cols=(1, 0), colnames=['a', 'b'])
>>> data >> logtofile >> Consume()
>>> print(open(filepath).read())
a,b
2,1
4,3

>>> logtofile.close()
>>> logtofile.delete()
Parameters
  • filepath (string) – Path to file to write log to.

  • cols (int|tuple|None) – Indices of columns of input data to write. None: write all columns; int: write only the single given column; tuple: write the columns with the given indices.

  • colnames (tuple|None) – Column names to write in first line. If None no colnames are written.

  • reset (bool) – If True the writing to the log file is reset if the logger is recreated. Otherwise log data is appended to the log file.

  • delimiter (str) – Delimiter for columns in log file.
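
The interplay of cols, colnames and delimiter can be illustrated with the standard csv module. The function log_columns below is a hypothetical stand-in written for this illustration; it is not part of nutsml and ignores the reset and context-manager behavior of the real logger.

```python
import csv
import os
import tempfile

def log_columns(filepath, rows, cols=None, colnames=None, delimiter=','):
    """Write selected columns of each row to a delimited text file."""
    if isinstance(cols, int):
        cols = (cols,)                      # single column index
    with open(filepath, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=delimiter)
        if colnames is not None:
            writer.writerow(colnames)       # header line
        for row in rows:
            selected = row if cols is None else [row[i] for i in cols]
            writer.writerow(selected)

filepath = os.path.join(tempfile.mkdtemp(), 'temp_logfile.csv')
log_columns(filepath, [[1, 2], [3, 4]], cols=(1, 0), colnames=['a', 'b'])
with open(filepath) as f:
    logged = f.read().splitlines()
print(logged)  # ['a,b', '2,1', '4,3']
```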

close()[source]

Implementation of context manager API

delete()[source]

Delete log file

nutsml.network module

EvalNut(batches, network, metrics, compute, predcol=None)[source]

batches >> EvalNut(network, metrics)

Create nut to evaluate network performance for given metrics. Returned when network.evaluate() is called.

Parameters
  • batches (iterable over batches) – Batches to evaluate

  • network (nutsml.Network) –

  • metrics (list of functions) – List of functions that compute some metric, e.g. accuracy, F1, kappa-score. Each metric function must take vectors with true and predicted classes/probabilities and must compute the metric over the entire input (not per sample/mini-batch).

  • compute (function) – Function of the form f(metric, targets, preds) that computes the given metric (e.g. mean accuracy) for the given targets and predictions.

  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.

Returns

Result(s) of evaluation, e.g. accuracy, precision, …

Return type

float or tuple of floats if there is more than one metric
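
The requirement that metrics are computed over the entire input, not per mini-batch, can be sketched in plain Python. Everything in this sketch is hypothetical: the function evaluate_sketch, the (inputs, targets) batch layout and the toy predictor are assumptions chosen for illustration and do not reflect the internals of EvalNut.

```python
def evaluate_sketch(batches, predict, metrics, compute):
    """Gather targets and predictions over all batches, then score once."""
    targets, preds = [], []
    for inputs, batch_targets in batches:
        preds.extend(predict(inputs))
        targets.extend(batch_targets)
    results = tuple(compute(m, targets, preds) for m in metrics)
    # A single metric yields a single float, several metrics a tuple.
    return results[0] if len(results) == 1 else results

def accuracy(y_true, y_pred):
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def predict(inputs):
    # Toy "network": class 1 for positive inputs, class 0 otherwise.
    return [int(x > 0) for x in inputs]

def compute(metric, targets, preds):
    return metric(targets, preds)

batches = [([-1.0, 2.0], [0, 1]), ([3.0, -4.0], [1, 1])]
acc = evaluate_sketch(batches, predict, [accuracy], compute)
print(acc)  # 0.75
```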

class KerasNetwork(model, weightspath='weights_keras_net.hd5')[source]

Bases: nutsml.network.Network

Wrapper for Keras models: https://keras.io/

__init__(model, weightspath='weights_keras_net.hd5')[source]

Construct wrapper around Keras model.

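Parameters
  • model (Keras model) – Keras model to wrap.

  • weightspath (string) – Filepath to save/load model weights.
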
evaluate(metrics, predcol=None)[source]

Evaluate performance of network for given metrics

>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])
Parameters
  • metrics (list) – List of metrics. See EvalNut for details.

  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.

  • targetcol (int) – Index of batch column that contains targets.

Returns

Result for each metric as a tuple or a single float if there is only one metric.

load_weights(weightspath=None)[source]

Load network weights.

network.load_weights()
Parameters

weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.

predict(flatten=True)[source]

Get network predictions

>>> predictions = samples >> batcher >> network.predict() >> Collect()  
Parameters

flatten (bool) – True: return individual predictions instead of batched prediction

Returns

Typically returns softmax class probabilities.

Return type

ndarray

print_layers()[source]

Print description of the network layers

save_weights(weightspath=None)[source]

Save network weights.

network.save_weights()
Parameters

weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.

train(**kwargs)[source]

Train network

>>> train_losses = samples >> batcher >> network.train() >> Collect() 
Returns

Typically returns training loss per batch.

validate(**kwargs)[source]

Validate network

>>> val_losses = samples >> batcher >> network.validate() >> Collect()  
Returns

Typically returns validation loss per batch.

class LasagneNetwork(out_layer, train_fn, val_fn, pred_fn, weightspath='weights_lasagne_net.npz')[source]

Bases: nutsml.network.Network

Wrapper for Lasagne models: https://lasagne.readthedocs.io/en/latest/

__init__(out_layer, train_fn, val_fn, pred_fn, weightspath='weights_lasagne_net.npz')[source]

Construct wrapper around Lasagne network.

Parameters
  • out_layer (Lasagne layer) – Output layer of Lasagne network.

  • train_fn (Theano function) – Training function

  • val_fn (Theano function) – Validation function

  • pred_fn (Theano function) – Prediction function

  • weightspath (string) – Filepath to save/load model weights.

evaluate(metrics, predcol=None)[source]

Evaluate performance of network for given metrics

>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])
Parameters
  • metrics (list) – List of metrics. See EvalNut for details.

  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.

  • targetcol (int) – Index of batch column that contains targets.

Returns

Result for each metric as a tuple or a single float if there is only one metric.

load_weights(weightspath=None)[source]

Load network weights.

network.load_weights()
Parameters

weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.

predict(flatten=True)[source]

Get network predictions

>>> predictions = samples >> batcher >> network.predict() >> Collect()  
Parameters

flatten (bool) – True: return individual predictions instead of batched prediction

Returns

Typically returns softmax class probabilities.

Return type

ndarray

print_layers()[source]

Print description of the network layers

save_weights(weightspath=None)[source]

Save network weights.

network.save_weights()
Parameters

weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.

train(**kwargs)[source]

Train network

>>> train_losses = samples >> batcher >> network.train() >> Collect() 
Returns

Typically returns training loss per batch.

validate(**kwargs)[source]

Validate network

>>> val_losses = samples >> batcher >> network.validate() >> Collect()  
Returns

Typically returns validation loss per batch.

class Network(weightspath)[source]

Bases: object

Abstract base class for networks. Allows wrapping existing network APIs such as Lasagne, Keras or Pytorch into an API that enables direct usage of the network as a Nut in a nuts flow.

__init__(weightspath)[source]

Constructs base wrapper for networks.

Parameters

weightspath (string) – Filepath where network weights are saved to and loaded from.

evaluate(metrics, predcol=None, targetcol=-1)[source]

Evaluate performance of network for given metrics

>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])
Parameters
  • metrics (list) – List of metrics. See EvalNut for details.

  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.

  • targetcol (int) – Index of batch column that contains targets.

Returns

Result for each metric as a tuple or a single float if there is only one metric.

load_weights(weightspath=None)[source]

Load network weights.

network.load_weights()
Parameters

weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.

predict(flatten=True)[source]

Get network predictions

>>> predictions = samples >> batcher >> network.predict() >> Collect()  
Parameters

flatten (bool) – True: return individual predictions instead of batched prediction

Returns

Typically returns softmax class probabilities.

Return type

ndarray

print_layers()[source]

Print description of the network layers

save_best(score, isloss=True)[source]

Save weights of best network

Parameters
  • score (float) – Score of the network, e.g. loss, accuracy

  • isloss (bool) – True means a lower score is better, e.g. for a loss, and the network with the lower score is saved.
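
The role of isloss can be sketched with a small tracker that remembers the best score seen so far. The class BestScoreTracker is hypothetical and records scores in a list instead of writing weights to disk; it is meant only to illustrate the comparison logic described above.

```python
class BestScoreTracker:
    def __init__(self):
        self.best_score = None
        self.saved = []   # stands in for writing weights to disk

    def save_best(self, score, isloss=True):
        if self.best_score is None:
            improved = True                      # first score always wins
        elif isloss:
            improved = score < self.best_score   # lower loss is better
        else:
            improved = score > self.best_score   # higher accuracy is better
        if improved:
            self.best_score = score
            self.saved.append(score)
        return improved

tracker = BestScoreTracker()
for loss in [0.9, 0.5, 0.7, 0.4]:
    tracker.save_best(loss, isloss=True)
print(tracker.saved)  # [0.9, 0.5, 0.4]
```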

save_weights(weightspath=None)[source]

Save network weights.

network.save_weights()
Parameters

weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.

train()[source]

Train network

>>> train_losses = samples >> batcher >> network.train() >> Collect() 
Returns

Typically returns training loss per batch.

validate()[source]

Validate network

>>> val_losses = samples >> batcher >> network.validate() >> Collect()  
Returns

Typically returns validation loss per batch.

PredictNut(batches, func, flatten=True)[source]

batches >> PredictNut(func)

Create nut to perform network predictions.

Parameters
  • batches (iterable over batches) – Batches to create predictions for.

  • func (function) – Prediction function

  • flatten (bool) – True: flatten output. Instead of returning batch of predictions return individual predictions

Returns

Result(s) of prediction

Return type

typically array with class probabilities (softmax vector)
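
The difference between flattened and batched output can be shown with a generator in plain Python. The function predict_sketch and the toy prediction function are hypothetical and only illustrate the flatten flag; they do not reproduce the internals of PredictNut.

```python
def predict_sketch(batches, func, flatten=True):
    for batch in batches:
        preds = func(batch)
        if flatten:
            # Emit one prediction per sample.
            for pred in preds:
                yield pred
        else:
            # Emit the predictions of the whole batch at once.
            yield preds

def double(batch):
    # Toy prediction function standing in for a network.
    return [2 * x for x in batch]

batches = [[1, 2], [3]]
flat = list(predict_sketch(batches, double, flatten=True))
batched = list(predict_sketch(batches, double, flatten=False))
print(flat)     # [2, 4, 6]
print(batched)  # [[2, 4], [6]]
```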

class PytorchNetwork(model, weightspath='weights_pytorch_net.pt')[source]

Bases: nutsml.network.Network

Wrapper for Pytorch models: https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html

__init__(model, weightspath='weights_pytorch_net.pt')[source]

Construct wrapper around Pytorch model.

Parameters
  • model (Pytorch model) – Pytorch model to wrap. The model needs to have three attributes: model.device, e.g. ‘cuda:0’ or ‘cpu’; model.optimizer, e.g. torch.optim.SGD; and model.losses, a (list of) loss function(s), e.g. F.cross_entropy.

  • weightspath (string) – Filepath to save/load model weights.

evaluate(metrics, predcol=None)[source]

Evaluate performance of network for given metrics

>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])  
Parameters
  • metrics (list) – List of metrics. See EvalNut for details.

  • predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.

  • targetcol (int) – Index of batch column that contains targets.

Returns

Result for each metric as a tuple or a single float if there is only one metric.

load_weights(weightspath=None)[source]

Load network weights.

network.load_weights()
Parameters

weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.

predict(flatten=True)[source]

Get network predictions

>>> predictions = samples >> batcher >> network.predict() >> Collect()  
Parameters

flatten (bool) – True: return individual predictions instead of batched prediction

Returns

Typically returns softmax class probabilities.

Return type

ndarray

print_layers(input_shape=None)[source]

Print network architecture (and layer dimensions).

Parameters

input_shape (tuple|None) – (C, H, W) or None. If None, layer dimensions and parameter numbers are not printed.

save_weights(weightspath=None)[source]

Save network weights.

network.save_weights()
Parameters

weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.

train(**kwargs)[source]

Train network

>>> train_losses = samples >> batcher >> network.train() >> Collect() 
Returns

Typically returns training loss per batch.

validate(**kwargs)[source]

Validate network

>>> val_losses = samples >> batcher >> network.validate() >> Collect()  
Returns

Typically returns validation loss per batch.

TrainValNut(batches, func, **kwargs)[source]

batches >> TrainValNut(func, **kwargs)

Create nut to train or validate a network.

Parameters
  • batches (iterable over batches) – Batches to train/validate.

  • func (function) – Training or validation function of network.

  • kwargs (kwargs) – Keyword arguments passed on to function.

Returns

Result(s) of training/validation function, e.g. loss, accuracy, …

Return type

float or array/tuple of floats

nutsml.plotter module

class PlotLines(ycols, xcols=None, layout=(1, None), titles=None, every_sec=0, every_n=0, filterfunc=<function PlotLines.<lambda>>, figsize=None, filepath=None)[source]

Bases: nutsflow.base.NutFunction

Plot line graph for selected data columns.

__call__(data)[source]

Plot data

__init__(ycols, xcols=None, layout=(1, None), titles=None, every_sec=0, every_n=0, filterfunc=<function PlotLines.<lambda>>, figsize=None, filepath=None)[source]

iterable >> PlotLines(ycols) >> Consume()

>>> import os
>>> import numpy as np
>>> from nutsflow import Consume
>>> fp = 'tests/data/temp_plotter.png'
>>> xs = np.arange(0, 6.3, 1.2)
>>> ysin, ycos = np.sin(xs),  np.cos(xs)
>>> data = list(zip(xs, ysin, ycos))  # list, so data can be reused below
>>> data >> PlotLines(1, 0, filepath=fp) >> Consume()
>>> list(ycos) >> PlotLines(0, filepath=fp) >> Consume()
>>> data >> PlotLines(ycols=(1,2), filepath=fp) >> Consume()
>>> ysin.tolist() >> PlotLines(ycols=None, filepath=fp) >> Consume()
>>> if os.path.exists(fp): os.remove(fp)
Parameters
  • ycols (int|tuple|None) – Index or tuple of indices of the data columns that contain the y-data for the plot. If None data is used directly.

  • xcols (int|tuple|function|iterable|None) – Index or tuple of indices of the data columns that contain the x-data for the plot. Alternatively an iterator or a function can be provided that generates the x-data for the plot, e.g. xcols = itertools.count() or xcols = lambda: epoch. For xcols==None, itertools.count() will be used.

  • layout (tuple) – Rows and columns of the plotter layout, e.g. a layout of (2,3) means that 6 plots in the data are arranged in 2 rows and 3 columns. The number of columns can be None and is then derived from ycols.

  • every_sec (float) – Plot every given number of seconds, e.g. every_sec = 2.5 plots every 2.5 seconds.

  • every_n (int) – Plot every n-th call.

  • filterfunc (function) – Boolean function to filter plot data.

  • figsize (tuple) – Figure size in inches.

  • filepath – Path to a file to draw plot to. If provided the plot will not appear on the screen.

Returns

Returns input unaltered

Return type

any
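The different kinds of values accepted for xcols can be pictured with the following sketch. It is an assumption-laden illustration, not the PlotLines code: make_x_getter is a hypothetical helper that turns each accepted kind of xcols into a function returning the x-value(s) for a data item.

```python
import itertools


def make_x_getter(xcols):
    """Return a function that maps a data item to its x-value(s)."""
    if xcols is None:
        xcols = itertools.count()            # default: running index
    if callable(xcols):
        return lambda data: xcols()          # function generates x-data
    if hasattr(xcols, '__next__'):
        return lambda data: next(xcols)      # iterator generates x-data
    if isinstance(xcols, int):
        return lambda data: data[xcols]      # single column index
    return lambda data: tuple(data[i] for i in xcols)  # tuple of indices


get_index = make_x_getter(None)
indices = [get_index(item) for item in ['a', 'b', 'c']]
first_col = make_x_getter(0)((5, 6))
two_cols = make_x_getter((0, 1))((5, 6, 7))
epoch = make_x_getter(lambda: 42)(None)
```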

reset()[source]

Reset plot data

nutsml.reader module

ReadImage(sample, columns, pathfunc=None, as_grey=False, dtype='uint8')[source]

Load images from filesystem for samples.

Loads images in jpg, gif, png, tif and bmp format. Images are returned as numpy arrays of shape (h, w, c) or (h, w) for color images or gray scale images respectively. See nutsml.imageutil.load_image for details.

Note that the loaded images replace the image file name|path in the sample. If the image file paths are provided directly (not as a tuple sample), tuples with the loaded image are still returned.

>>> from nutsflow import Consume, Collect
>>> from nutsml import PrintColType
>>> images = ['tests/data/img_formats/nut_color.gif']
>>> images >> ReadImage(None) >> PrintColType() >> Consume()
item 0: <tuple>
  0: <ndarray> shape:213x320x3 dtype:uint8 range:0..255
>>> samples = [('tests/data/img_formats/nut_color.gif', 'class0')]
>>> img_samples = samples >> ReadImage(0) >> Collect()
>>> imagepath = 'tests/data/img_formats/*.gif'
>>> samples = [(1, 'nut_color'), (2, 'nut_grayscale')]
>>> samples >> ReadImage(1, imagepath) >> PrintColType() >> Consume()
item 0: <tuple>
  0: <int> 1
  1: <ndarray> shape:213x320x3 dtype:uint8 range:0..255
item 1: <tuple>
  0: <int> 2
  1: <ndarray> shape:213x320 dtype:uint8 range:20..235
>>> pathfunc = lambda s: 'tests/data/img_formats/{1}.jpg'.format(*s)
>>> img_samples = samples >> ReadImage(1, pathfunc) >> Collect()
Parameters
  • sample (tuple|list) – Sample, e.g. (‘nut_color’, 1)

  • columns (None|int|tuple) – Indices of columns in sample to be replaced by image (based on the image id in that column). If None, a flat sample is assumed and a tuple with the image is returned.

  • pathfunc (string|function|None) – One of the following: a filepath with wildcard ‘*’, which is replaced by the image id provided in the sample, e.g. ‘tests/data/img_formats/*.jpg’ for sample (‘nut_grayscale’, 2) becomes ‘tests/data/img_formats/nut_grayscale.jpg’; a function that computes the path to the image file from the sample, e.g. lambda sample: ‘tests/data/img_formats/{1}.jpg’.format(*sample); or None, in which case the image id is taken as the filepath.

  • as_grey (bool) – If true, load as grayscale image.

  • dtype (dtype) – Numpy data type of the image.

Returns

Sample with image ids replaced by image (=ndarray) of shape (h, w, c) or (h, w)

Return type

tuple
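The three forms of pathfunc described above can be sketched as a small path-resolution function. This is a minimal illustration, assuming a hypothetical helper resolve_path; it does not load any image and is not the library's code.

```python
def resolve_path(sample, column, pathfunc=None):
    """Compute the file path for the id stored in sample[column]."""
    fileid = sample[column]
    if pathfunc is None:
        return fileid                              # id is the path itself
    if callable(pathfunc):
        return pathfunc(sample)                    # function of the sample
    return pathfunc.replace('*', str(fileid))      # wildcard template


by_wildcard = resolve_path(('nut_grayscale', 2), 0,
                           'tests/data/img_formats/*.jpg')
by_function = resolve_path(
    (1, 'nut_color'), 1,
    lambda s: 'tests/data/img_formats/{1}.jpg'.format(*s))
by_id = resolve_path(('tests/data/img_formats/nut_color.gif', 'class0'), 0)
```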

ReadLabelDirs(basedir, filepattern='*', exclude='_*')[source]

Read file paths from label directories.

Typically used when classification data is organized in folders, where the folder name represents the class label and the files in the folder the data samples (images, documents, …) for that class.

>>> from __future__ import print_function
>>> from nutsflow import Sort
>>> read = ReadLabelDirs('tests/data/labeldirs', '*.txt')
>>> samples = read >> Sort()
>>> for sample in samples:
...     print(sample)
...
('tests/data/labeldirs/0/test0.txt', '0')
('tests/data/labeldirs/1/test1.txt', '1')
('tests/data/labeldirs/1/test11.txt', '1')
Parameters
  • basedir (string) – Path to folder that contains label directories.

  • filepattern (string) – Pattern for filepaths to read from label directories, e.g. ‘*.jpg’, ‘*.txt’

  • exclude (string) – Pattern for label directories to exclude. Default is ‘_*’ which excludes all label folders prefixed with ‘_’.

Returns

iterator over labeled file paths

Return type

iterator
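The folder-per-class layout can be walked with the standard library alone. The following is a sketch under assumptions, not the ReadLabelDirs implementation: read_label_dirs is a hypothetical function, and the directory tree is created temporarily just for the demonstration.

```python
import fnmatch
import glob
import os
import tempfile


def read_label_dirs(basedir, filepattern='*', exclude='_*'):
    """Yield (filepath, label) for files found in label directories."""
    for label in sorted(os.listdir(basedir)):
        labeldir = os.path.join(basedir, label)
        if not os.path.isdir(labeldir):
            continue                      # only directories are labels
        if fnmatch.fnmatch(label, exclude):
            continue                      # skip excluded label folders
        pattern = os.path.join(labeldir, filepattern)
        for path in sorted(glob.glob(pattern)):
            yield path, label


# Build a small throw-away directory tree to try it out.
with tempfile.TemporaryDirectory() as basedir:
    for label, name in [('0', 'test0.txt'), ('1', 'test1.txt'),
                        ('_tmp', 'skip.txt'), ('1', 'notes.md')]:
        os.makedirs(os.path.join(basedir, label), exist_ok=True)
        open(os.path.join(basedir, label, name), 'w').close()
    found = [(os.path.basename(path), label)
             for path, label in read_label_dirs(basedir, '*.txt')]
```

The folder ‘_tmp’ is skipped by the exclude pattern, and ‘notes.md’ does not match the file pattern.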

ReadNumpy(sample, columns, pathfunc=None, allow_pickle=False)[source]

Load numpy arrays from filesystem.

Note that the loaded numpy array replaces the file name|path in the sample.

>>> from nutsflow import Consume, Collect, PrintType
>>> samples = ['tests/data/img_arrays/nut_color.jpg.npy']
>>> samples >> ReadNumpy(None) >> PrintType() >> Consume()
(<ndarray> 213x320x3:uint8)
>>> samples = [('tests/data/img_arrays/nut_color.jpg.npy', 'class0')]
>>> samples >> ReadNumpy(0) >> PrintType() >> Consume()
(<ndarray> 213x320x3:uint8, <str> class0)
>>> filepath = 'tests/data/img_arrays/*.jpg.npy'
>>> samples = [(1, 'nut_color'), (2, 'nut_grayscale')]
>>> samples >> ReadNumpy(1, filepath) >> PrintType() >> Consume()
(<int> 1, <ndarray> 213x320x3:uint8)
(<int> 2, <ndarray> 213x320:uint8)
>>> pathfunc = lambda s: 'tests/data/img_arrays/{1}.jpg.npy'.format(*s)
>>> samples >> ReadNumpy(1, pathfunc) >> PrintType() >> Consume()
(<int> 1, <ndarray> 213x320x3:uint8)
(<int> 2, <ndarray> 213x320:uint8)
Parameters
  • sample (tuple|list) – Sample, e.g. (‘nut_data’, 1)

  • columns (None|int|tuple) – Indices of columns in sample to be replaced by numpy array (based on the file id in that column). If None, a flat sample is assumed and a tuple with the numpy array is returned.

  • pathfunc (string|function|None) – One of the following: a filepath with wildcard ‘*’, which is replaced by the file id/name provided in the sample, e.g. ‘tests/data/img_arrays/*.jpg.npy’ for sample (‘nut_grayscale’, 2) becomes ‘tests/data/img_arrays/nut_grayscale.jpg.npy’; a function that computes the path to the numpy file from the sample, e.g. lambda sample: ‘tests/data/img_arrays/{1}.jpg.npy’.format(*sample); or None, in which case the file id/name is taken as the filepath.

  • allow_pickle (bool) – Allow loading pickled object arrays in npy files.

Returns

Sample with file ids/names replaced by numpy arrays.

Return type

tuple

class ReadPandas(filepath, rows=None, colnames=None, dropnan=True, replacenan=False, rowname='Row', **kwargs)[source]

Bases: nutsflow.base.NutSource

Read data as Pandas table from file system.

__init__(filepath, rows=None, colnames=None, dropnan=True, replacenan=False, rowname='Row', **kwargs)[source]

Create reader for Pandas tables.

The reader returns the table contents as an iterator over named tuples, where the column names are derived from the table columns. The order and selection of columns can be changed.

>>> from nutsflow import Collect, Consume, Print
>>> filepath = 'tests/data/pandas_table.csv'
>>> ReadPandas(filepath) >> Print() >> Consume()
Row(col1=1.0, col2=4.0)
Row(col1=3.0, col2=6.0)
>>> (ReadPandas(filepath, dropnan=False, rowname='Sample') >>
... Print() >> Consume())
Sample(col1=1.0, col2=4.0)
Sample(col1=2.0, col2=nan)
Sample(col1=3.0, col2=6.0)
>>> ReadPandas(filepath, replacenan=None) >> Print() >> Consume()
Row(col1=1.0, col2=4.0)
Row(col1=2.0, col2=None)
Row(col1=3.0, col2=6.0)
>>> colnames=['col2', 'col1']   # swap order
>>> ReadPandas(filepath, colnames=colnames) >> Print() >> Consume()
Row(col2=4.0, col1=1.0)
Row(col2=6.0, col1=3.0)
>>> ReadPandas(filepath, rows='col1 > 1', replacenan=0) >> Collect()
[Row(col1=2.0, col2=0), Row(col1=3.0, col2=6.0)]
Parameters
  • filepath (str) – Path to a table in CSV, TSV, XLSX or Pandas pickle format. Depending on file extension (e.g. .csv) the table format is picked. Note tables must have a header with the column names.

  • rows (str) – Rows to filter. Any Pandas filter expression. If rows = None all rows of the table are returned.

  • colnames (list) – List of names for the table columns to return. For colnames = None all columns are returned.

  • dropnan (bool) – If True all rows that contain NaN are dropped.

  • replacenan (object) – If not False all NaNs are replaced by the value of replacenan

  • rowname (str) – Name of the named tuples returned as rows.

  • kwargs (kwargs) – Keyword arguments passed on to the Pandas methods for data reading, e.g. header=None. See pandas/pandas/io/parsers.py for details.

static isnull(value)[source]

Return True if value is NaN or None.

>>> import numpy as np
>>> ReadPandas.isnull(np.nan)
True
>>> ReadPandas.isnull(None)
True
>>> ReadPandas.isnull(0)
False
Parameters

value – Value to test

Returns

Return true for NaN or None values.

Return type

bool
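How a null test like isnull interacts with the dropnan and replacenan options of ReadPandas can be sketched without Pandas. Everything here is hypothetical (the isnull and iter_rows functions, the in-memory table), and the ordering, where replacement takes precedence over dropping, is an assumption inferred from the ReadPandas examples, in which rows are kept once replacenan is given.

```python
import math
from collections import namedtuple


def isnull(value):
    """True for None and float NaN, False otherwise."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def iter_rows(table, colnames, dropnan=True, replacenan=False,
              rowname='Row'):
    """Yield table rows as named tuples with NaN handling."""
    Row = namedtuple(rowname, colnames)
    for values in table:
        if replacenan is not False:
            # Assumption: replacing takes precedence over dropping.
            values = [replacenan if isnull(v) else v for v in values]
        elif dropnan and any(isnull(v) for v in values):
            continue                      # drop rows that contain NaN
        yield Row(*values)


table = [(1.0, 4.0), (2.0, float('nan')), (3.0, 6.0)]
cols = ['col1', 'col2']
dropped = list(iter_rows(table, cols))
replaced = list(iter_rows(table, cols, replacenan=None))
kept = list(iter_rows(table, cols, dropnan=False, rowname='Sample'))
```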

nutsml.stratify module

CollectStratified(iterable, labelcol, mode='downrnd', container=<class 'list'>, rand=None)[source]
iterable >> CollectStratified(labelcol, mode='downrnd', container=list, rand=rnd.Random())

Collects samples in a container and stratifies them by either randomly down-sampling classes or up-sampling classes by duplicating samples.

>>> from nutsflow import Collect, Sort
>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]
>>> samples >> CollectStratified(1) >> Sort()
[('neg', 0), ('pos', 1)]
Parameters
  • iterable (iterable over tuples) – Iterable of tuples where column labelcol contains a sample label that is used for stratification

  • labelcol (int) – Column of tuple/samples that contains the label

  • mode (string) – ‘downrnd’: randomly down-sample; ‘up’: up-sample

  • container (container) – Some container, e.g. list, set, dict that can be filled from an iterable

  • rand (Random|None) – Random number generator used for sampling. If None, random.Random() is used.

Returns

Stratified samples

Return type

List of tuples
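The two modes can be sketched in plain Python. This is an illustration under assumptions rather than the library's code: collect_stratified is a hypothetical function, and the way samples are duplicated in ‘up’ mode (cyclic repetition up to the size of the largest class) is a guess.

```python
import random
from collections import defaultdict


def collect_stratified(samples, labelcol, mode='downrnd', rand=None):
    """Return a list of samples with equal numbers per label."""
    rand = rand or random.Random()
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[labelcol]].append(sample)
    if not groups:
        return []
    sizes = [len(group) for group in groups.values()]
    stratified = []
    if mode == 'downrnd':
        n = min(sizes)                    # size of the smallest class
        for group in groups.values():
            stratified.extend(rand.sample(group, n))
    elif mode == 'up':
        n = max(sizes)                    # size of the largest class
        for group in groups.values():
            # Assumption: duplicate samples cyclically to reach n.
            stratified.extend(group[i % len(group)] for i in range(n))
    else:
        raise ValueError('Unknown mode: ' + mode)
    return stratified


samples = [('pos', 1), ('pos', 1), ('neg', 0)]
down = sorted(collect_stratified(samples, 1, mode='downrnd'))
up = sorted(collect_stratified(samples, 1, mode='up'))
```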

Stratify(iterable, labelcol, labeldist, rand=None)[source]

iterable >> Stratify(labelcol, labeldist, rand=None)

Stratifies samples by randomly down-sampling according to the given label distribution. In detail: samples belonging to the class with the smallest number of samples are returned with probability one. Samples from other classes are randomly down-sampled to match the number of samples in the smallest class.

Note that in contrast to SplitRandom, which generates the same random split by default, Stratify generates different stratifications. Furthermore, while the down-sampling is random, the order of samples remains the same!

While labeldist needs to be provided or computed upfront, the actual stratification occurs online and only one sample at a time is stored in memory.

>>> from nutsflow import Collect, CountValues, Sort
>>> from nutsflow.common import StableRandom
>>> fix = StableRandom(1)  # Stable random numbers for doctest
>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]
>>> labeldist = samples >> CountValues(1)
>>> samples >> Stratify(1, labeldist, rand=fix) >> Sort()
[('neg', 0), ('pos', 1)]
Parameters
  • iterable (iterable over tuples) – Iterable of tuples where column labelcol contains a sample label that is used for stratification

  • labelcol (int) – Column of tuple/samples that contains the label

  • labeldist (dict) – Dictionary with numbers of different labels, e.g. {‘good’:12, ‘bad’:27, ‘ugly’:3}

  • rand (Random|None) – Random number generator used for down-sampling. If None, random.Random() is used.

Returns

Stratified samples

Return type

Generator over tuples
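The online down-sampling described above can be sketched as a generator that keeps each sample with a probability derived from labeldist. This is a minimal sketch, assuming a hypothetical function stratify; the resulting class sizes match only approximately, since each sample is kept or dropped independently.

```python
import random


def stratify(samples, labelcol, labeldist, rand=None):
    """Yield samples, randomly down-sampled towards the smallest class."""
    rand = rand or random.Random()
    smallest = min(labeldist.values())
    # Keep-probability per label; 1.0 for the smallest class.
    probs = {label: smallest / count for label, count in labeldist.items()}
    for sample in samples:
        # random() is in [0, 1), so probability 1.0 always keeps a sample.
        if rand.random() < probs[sample[labelcol]]:
            yield sample


samples = [('big', 0)] * 1000 + [('small', 1)] * 100
labeldist = {0: 1000, 1: 100}
result = list(stratify(samples, 1, labeldist, rand=random.Random(0)))
n_big = sum(1 for s in result if s[1] == 0)
n_small = sum(1 for s in result if s[1] == 1)
```

Only one sample is held at a time, and the relative order of the surviving samples is unchanged.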

nutsml.transformer module

class AugmentImage(imagecols, rand=None)[source]

Bases: nutsflow.base.Nut

Random augmentation of images in samples

__init__(imagecols, rand=None)[source]

samples >> AugmentImage(imagecols, rand=None)

Randomly augment images, e.g. changing contrast. See TransformImage for a full list of available augmentations. Every transformation can be used as an augmentation. Note that the same (random) augmentation is applied to all images specified in imagecols. This ensures that an image and its mask are randomly rotated by the same angle, for instance.

>>> augment_img = (AugmentImage(0)
...     .by('identical', 1.0)
...     .by('brightness', 0.5, [0.7, 1.3])
...     .by('contrast', 0.5, [0.7, 1.3])
...     .by('fliplr', 0.5)
...     .by('flipud', 0.5)
...     .by('occlude', 0.5, [0, 1], [0, 1],[0.1, 0.5], [0.1, 0.5])
...     .by('rotate', 0.5, [0, 360]))

See nutsml.transformer.TransformImage.by() for full list of available augmentations.

Note that each augmentation is applied independently. This is in contrast to transformations, which are applied in sequence and result in one image. Augmentations, on the other hand, are randomly applied and can result in many images. However, augmenters can be chained to achieve combinations of augmentations, e.g. contrast or brightness combined with rotation or shearing:

>>> augment1 = (AugmentImage(0)
...     .by('brightness', 0.5, [0.7, 1.3])
...     .by('contrast', 0.5, [0.7, 1.3]))
>>> augment2 = (AugmentImage(0)
...     .by('shear', 0.5, [0, 0.2])
...     .by('rotate', 0.5, [0, 360]))
>>> samples >> augment1 >> augment2 >> Consume()  
Parameters
  • imagecols (int|tuple) – Indices of sample columns that contain images.

  • rand (Random|None) – Random number generator. If None, random.Random() is used.

__rrshift__(iterable)[source]

Apply augmentation to samples in iterable.

Parameters

iterable (iterable) – Samples

Returns

iterable with augmented samples

Return type

generator

by(name, prob, *ranges, **kwargs)[source]

Specify and add augmentation to be performed.

>>> augment_img = AugmentImage(0).by('rotate', 0.5, [0, 360])
Parameters
  • name (string) – Name of the augmentation/transformation, e.g. ‘rotate’

  • prob (float|int) – If prob <= 1: probability [0,1] that the augmentation is applied. If prob > 1: number of times the augmentation is applied.

  • ranges (list of lists) – Lists with ranges for each argument of the augmentation, e.g. [0, 360] degrees, from which parameters are randomly sampled.

  • kwargs (kwargs) – Keyword arguments passed on to the augmentation.

Returns

instance of AugmentImage

Return type

AugmentImage
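The meaning of prob and of the ranges can be illustrated with a simplified sketch. It rests on several assumptions: augment is a hypothetical function, numbers stand in for images, each augmentation takes exactly one ranged argument, and parameters are drawn uniformly from their range.

```python
import random


def augment(samples, augmentations, rand=None):
    """augmentations: list of (func, prob, (low, high)) tuples."""
    rand = rand or random.Random()
    for sample in samples:
        # Each augmentation is applied independently to the input sample.
        for func, prob, (low, high) in augmentations:
            if prob > 1:
                repeats = int(prob)                 # number of applications
            else:
                repeats = 1 if rand.random() < prob else 0
            for _ in range(repeats):
                # Draw the augmentation parameter from its range.
                yield func(sample, rand.uniform(low, high))


def scale(value, factor):
    return value * factor


augmentations = [(scale, 3, (1.0, 1.0)),     # applied three times
                 (scale, 0, (2.0, 2.0)),     # never applied
                 (scale, 1.0, (0.5, 0.5))]   # always applied once
out = list(augment([10], augmentations))
```

One input sample yields four outputs here, which is why augmentation can produce many images per input image.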

ImageAnnotationToMask(iterable, imagecol, annocol)[source]

samples >> ImageAnnotationToMask(imagecol, annocol)

Create mask for image annotation. Annotations have one of the following formats (see imageutil.annotation2coords for details):
(‘point’, ((x, y), … ))
(‘circle’, ((x, y, r), …))
(‘rect’, ((x, y, w, h), …))
(‘polyline’, (((x, y), (x, y), …), …))

>>> import numpy as np
>>> from nutsflow import Collect
>>> img = np.zeros((3, 3), dtype='uint8')
>>> anno = ('point', ((0, 1), (2, 0)))
>>> samples = [(img, anno)]
>>> masks = samples >> ImageAnnotationToMask(0, 1) >> Collect()
>>> print(masks[0][1])
[[  0   0 255]
 [255   0   0]
 [  0   0   0]]
Parameters
  • iterable (iterable) – Samples with images and annotations

  • imagecol (int) – Index of sample column that contains the image

  • annocol (int) – Index of sample column that contains the annotation

Returns

Iterator over samples where annotations are replaced by masks

Return type

generator
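For ‘point’ annotations the conversion to a mask can be sketched as follows. This is an illustration only: points_to_mask is a hypothetical function, nested lists stand in for numpy arrays, and the other annotation kinds are not handled. As in the example above, x indexes columns and y indexes rows.

```python
def points_to_mask(shape, points, value=255):
    """Create a mask of the given (rows, cols) shape with points set."""
    rows, cols = shape
    mask = [[0] * cols for _ in range(rows)]
    for x, y in points:
        mask[y][x] = value      # x is the column, y is the row
    return mask


mask = points_to_mask((3, 3), ((0, 1), (2, 0)))
```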

class ImageChannelMean(imagecol, filepath='image_channel_means.npy', means=None)[source]

Bases: nutsflow.base.NutFunction

Compute, save per-channel means over images and subtract from images.

__call__(sample)[source]

Subtract per-channel mean from images in samples.

sub_mean = ImageChannelMean(imagecol, filepath='means.npy')
samples >> sub_mean >> Consume()

sub_mean = ImageChannelMean(imagecol, means=[197, 87, 101])
samples >> sub_mean >> Consume()

Parameters

sample (tuple) – Sample that contains an image (at imagecol).

Returns

Sample with image where mean is subtracted. Note that image will not be of dtype uint8 and in range [0,255] anymore!

Return type

tuple

__init__(imagecol, filepath='image_channel_means.npy', means=None)[source]
samples >> ImageChannelMean(imagecol, filepath='image_channel_means.npy', means=None)

Construct ImageChannelMean nut.

Parameters
  • imagecol (int) – Index of sample column that contains the image

  • filepath (string) – Path to file where mean values are saved and loaded from.

  • means (list|tuple) – Mean values can be provided directly. In this case filepath will be ignored and training is not necessary.

train()[source]

Compute per-channel mean over images in samples.

sub_mean = ImageChannelMean(imagecol, filepath)
samples >> sub_mean.train() >> Consume()

Returns

Input samples are returned unchanged

Return type

tuple

class ImageMean(imagecol, filepath='image_means.npy')[source]

Bases: nutsflow.base.NutFunction

Compute, save mean over images and subtract from images.

__call__(sample)[source]

Subtract mean from images in samples.

sub_mean = ImageMean(imagecol, filepath)
samples >> sub_mean >> Consume()

Parameters

sample (tuple) – Sample that contains an image (at imagecol).

Returns

Sample with image where mean is subtracted. Note that image will not be of dtype uint8 and in range [0,255] anymore!

Return type

tuple

__init__(imagecol, filepath='image_means.npy')[source]

samples >> ImageMean(imagecol, filepath='image_means.npy')

Construct ImageMean nut.

Parameters
  • imagecol (int) – Index of sample column that contains the image

  • filepath (string) – Path to file where mean values are saved and loaded from.

train()[source]

Compute mean over images in samples.

sub_mean = ImageMean(imagecol, filepath)
samples >> sub_mean.train() >> Consume()

Returns

Input samples are returned unchanged

Return type

tuple

ImagePatchesByAnnotation(iterable, imagecol, annocol, pshape, npos, nneg=<function <lambda>>, pos=255, neg=0, retlabel=True)[source]
samples >> ImagePatchesByAnnotation(imagecol, annocol, pshape, npos, nneg=lambda npos: npos, pos=255, neg=0, retlabel=True)

Randomly sample positive/negative patches from image based on annotation. See imageutil.annotation2coords for annotation format. A patch is positive if its center point is within the annotated region and is negative otherwise.

>>> import numpy as np
>>> np.random.seed(0)    # just to ensure stable doctest
>>> img = np.reshape(np.arange(25), (5, 5))
>>> anno = ('point', ((3, 2), (2, 3),))
>>> samples = [(img, anno)]
>>> getpatches = ImagePatchesByAnnotation(0, 1, (3, 3), 1, 1)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[12, 13, 14], [17, 18, 19], [22, 23, 24]] 0
[[11, 12, 13], [16, 17, 18], [21, 22, 23]] 1
[[7, 8, 9], [12, 13, 14], [17, 18, 19]] 1
Parameters
  • iterable (iterable) – Samples with images

  • imagecol (int) – Index of sample column that contains the image

  • annocol (int) – Index of sample column that contains the annotation

  • pshape (tuple) – Shape of patch

  • npos (int) – Number of positive patches to sample

  • nneg (int|function) – Number of negative patches to sample or a function that returns the number of negatives based on the number of positives.

  • pos (int) – Mask value indicating positives

  • neg (int) – Mask value indicating negatives

  • retlabel (bool) – True: return label; False: return mask patch

Returns

Iterator over samples where images are replaced by image patches and masks are replaced by labels [0,1] or mask patches

Return type

generator

ImagePatchesByMask(iterable, imagecol, maskcol, pshape, npos, nneg=<function <lambda>>, pos=255, neg=0, retlabel=True)[source]
samples >> ImagePatchesByMask(imagecol, maskcol, pshape, npos, nneg=lambda npos: npos, pos=255, neg=0, retlabel=True)

Randomly sample positive/negative patches from image based on mask.

A patch is positive if its center point has the value ‘pos’ in the mask (corresponding to the input image) and is negative for value ‘neg’. The mask must be of the same size as the image.

>>> import numpy as np
>>> np.random.seed(0)    # just to ensure stable doctest
>>> img = np.reshape(np.arange(25), (5, 5))
>>> mask = np.eye(5, dtype='uint8') * 255
>>> samples = [(img, mask)]
>>> getpatches = ImagePatchesByMask(0, 1, (3, 3), 2, 1)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[10, 11, 12], [15, 16, 17], [20, 21, 22]] 0
[[12, 13, 14], [17, 18, 19], [22, 23, 24]] 1
[[6, 7, 8], [11, 12, 13], [16, 17, 18]] 1
>>> np.random.seed(0)    # just to ensure stable doctest
>>> getpatches = ImagePatchesByMask(0, 1, (3, 3), 1, 1, retlabel=False)
>>> for (p, m) in samples >> getpatches:
...     print(p.tolist(), m.tolist())
Parameters
  • iterable (iterable) – Samples with images

  • imagecol (int) – Index of sample column that contains the image

  • maskcol (int) – Index of sample column that contains the mask

  • pshape (tuple) – Shape of patch

  • npos (int) – Number of positive patches to sample

  • nneg (int|function) – Number of negative patches to sample or a function that returns the number of negatives based on the number of positives.

  • pos (int) – Mask value indicating positives

  • neg (int) – Mask value indicating negatives

  • retlabel (bool) – True: return label; False: return mask patch

Returns

Iterator over samples where images are replaced by image patches and masks are replaced by labels [0,1] or mask patches

Return type

generator
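The rule for positive and negative patches can be sketched by sampling patch centers from the mask. This is a simplified illustration, not the library's algorithm: patch_centers is a hypothetical function, nested lists stand in for numpy arrays, and restricting centers to positions where the patch fits inside the image is an assumption.

```python
import random


def patch_centers(mask, pshape, npos, nneg, pos=255, neg=0, rand=None):
    """Yield (row, col, label) of randomly chosen patch centers."""
    rand = rand or random.Random()
    ph, pw = pshape
    rows, cols = len(mask), len(mask[0])
    # Assumption: only centers whose patch fits into the image are used.
    candidates = [(r, c)
                  for r in range(ph // 2, rows - (ph - 1) // 2)
                  for c in range(pw // 2, cols - (pw - 1) // 2)]
    positives = [(r, c) for r, c in candidates if mask[r][c] == pos]
    negatives = [(r, c) for r, c in candidates if mask[r][c] == neg]
    if callable(nneg):
        nneg = nneg(npos)   # number of negatives derived from positives
    for r, c in rand.sample(positives, min(npos, len(positives))):
        yield r, c, 1
    for r, c in rand.sample(negatives, min(nneg, len(negatives))):
        yield r, c, 0


# 5x5 mask with positives (255) on the diagonal.
mask = [[255 if r == c else 0 for c in range(5)] for r in range(5)]
centers = list(patch_centers(mask, (3, 3), 2, lambda npos: npos,
                             rand=random.Random(0)))
```

The image patch for a center (r, c) would then be cut out around that position; with retlabel=False the corresponding mask patch would be returned instead of the label.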

RandomImagePatches(iterable, imagecols, pshape, npatches)[source]

samples >> RandomImagePatches(imagecols, pshape, npatches)

Extract patches at random locations from images.

>>> import numpy as np
>>> np.random.seed(0)    # just to ensure stable doctest
>>> img = np.reshape(np.arange(30), (5, 6))
>>> samples = [(img, 0)]
>>> getpatches = RandomImagePatches(0, (2, 3), 3)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[7, 8, 9], [13, 14, 15]] 0
[[8, 9, 10], [14, 15, 16]] 0
[[8, 9, 10], [14, 15, 16]] 0
Parameters
  • iterable (iterable) – Samples with images

  • imagecols (int|tuple) – Indices of sample columns that contain images, where patches are extracted from. Images must be numpy arrays of shape h,w,c or h,w

  • pshape (tuple) – Shape of patch (h,w)

  • npatches (int) – Number of patches to extract (per image)

Returns

Iterator over samples where images are replaced by patches.

Return type

generator

RegularImagePatches(iterable, imagecols, pshape, stride)[source]

samples >> RegularImagePatches(imagecols, pshape, stride)

Extract patches in a regular grid from images.

>>> import numpy as np
>>> img = np.reshape(np.arange(12), (3, 4))
>>> samples = [(img, 0)]
>>> getpatches = RegularImagePatches(0, (2, 2), 2)
>>> for p in samples >> getpatches:
...     print(p)
(array([[0, 1],
       [4, 5]]), 0)
(array([[2, 3],
       [6, 7]]), 0)
Parameters
  • iterable (iterable) – Samples with images

  • imagecols (int|tuple) – Indices of sample columns that contain images, where patches are extracted from. Images must be numpy arrays of shape h,w,c or h,w

  • pshape (tuple) – Shape of patch (h,w)

  • stride (int) – Step size of the grid from which patches are extracted

Returns

Iterator over samples where images are replaced by patches.

Return type

generator
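Extraction on a regular grid amounts to two nested loops over the patch origins. The sketch below is an illustration with a hypothetical function regular_patches; nested lists stand in for numpy arrays, and patches that would extend beyond the image border are skipped, which matches the example above where a 3x4 image yields two 2x2 patches.

```python
def regular_patches(image, pshape, stride):
    """Yield patches of shape pshape taken at a regular grid."""
    ph, pw = pshape
    height, width = len(image), len(image[0])
    for r in range(0, height - ph + 1, stride):
        for c in range(0, width - pw + 1, stride):
            # Cut out rows r..r+ph-1 and columns c..c+pw-1.
            yield [row[c:c + pw] for row in image[r:r + ph]]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11]]
patches = list(regular_patches(image, (2, 2), 2))
```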

class TransformImage(imagecols)[source]

Bases: nutsflow.base.NutFunction

Transformation of images in samples.

__call__(sample)[source]

Apply transformation to sample.

Parameters

sample (tuple) – Sample

Returns

Transformed sample

Return type

tuple

__init__(imagecols)[source]

samples >> TransformImage(imagecols)

Images are expected to be numpy arrays of the shape (h, w, c) or (h, w) with a range of [0,255] and a dtype of uint8. Transformation should result in images with the same properties.

>>> transform = TransformImage(0).by('resize', 10, 20)
Parameters
  • imagecols (int|tuple) – Indices of sample columns the transformation should be applied to. Can be a single index or a tuple of indices.

  • transspec (tuple) – Transformation specification. Either a tuple with the name of the transformation function or a tuple with the name, arguments and keyword arguments of the transformation function. The list of argument values and dictionaries provided in the transspec are simply passed on to the transformation function. See the relevant functions for details.

by(name, *args, **kwargs)[source]

Specify and add transformations to be performed.

>>> transform = TransformImage(0).by('resize', 10, 20).by('fliplr')
Available transformations:
rerange (old_min, old_max, new_min, new_max, dtype)
crop (x1, y1, x2, y2)
resize (w, h)
translate (dx, dy)
rotate (angle)
contrast (contrast)
sharpness (sharpness)
brightness (brightness)
color (color)
edges (sigma)
shear (shear_factor)
elastic (smooth, scale, seed)
occlude (x, y, w, h)
Parameters
  • name (string) – Name of the transformation to apply, e.g. ‘resize’

  • args (args) – Arguments for the transformation, e.g. width and height for resize.

  • kwargs (kwargs) – Keyword arguments passed on to the transformation

Returns

instance of TransformImage with added transformation

Return type

TransformImage

classmethod register(name, transformation)[source]

Register new transformation function.

>>> brighter = lambda image, c: image * c
>>> TransformImage.register('brighter', brighter)
>>> transform = TransformImage(0).by('brighter', 1.5)
Parameters
  • name (string) – Name of transformation

  • transformation (function) – Transformation function.

transformations = {'brightness': <function change_brightness>, 'color': <function change_color>, 'contrast': <function change_contrast>, 'crop': <function crop>, 'crop_center': <function crop_center>, 'crop_square': <function crop_square>, 'edges': <function extract_edges>, 'elastic': <function distort_elastic>, 'fliplr': <function fliplr>, 'flipud': <function flipud>, 'gray2rgb': <function gray2rgb>, 'identical': <function identical>, 'normalize_histo': <function normalize_histo>, 'occlude': <function occlude>, 'rerange': <function rerange>, 'resize': <function resize>, 'rgb2gray': <function rgb2gray>, 'rotate': <function rotate>, 'sharpness': <function change_sharpness>, 'shear': <function shear>, 'translate': <function translate>}
map_transform(sample, imagecols, spec)[source]

Map transformation function on columns of sample.

Parameters
  • sample (tuple) – Sample with images

  • imagecols (int|tuple) – Indices of sample columns the transformation should be applied to. Can be a single index or a tuple of indices.

  • spec (tuple) – Transformation specification. Either a tuple with the name of the transformation function or a tuple with the name, arguments and keyword arguments of the transformation function.

Returns

Sample with transformations applied. Columns not specified remain unchanged.

Return type

tuple
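The interplay of register(), by() and the mapping over sample columns can be sketched with a small class. This is a simplified illustration, not the TransformImage code: the class Transform and the ‘scale’ and ‘reverse’ transformations are hypothetical, and lists of numbers stand in for images.

```python
class Transform:
    """Hypothetical, simplified stand-in for a chained transformer."""

    transformations = {}   # name -> function, shared by all instances

    def __init__(self, imagecols):
        if isinstance(imagecols, int):
            imagecols = (imagecols,)
        self.imagecols = tuple(imagecols)
        self.specs = []    # (name, args, kwargs) in order of application

    @classmethod
    def register(cls, name, func):
        cls.transformations[name] = func

    def by(self, name, *args, **kwargs):
        self.specs.append((name, args, kwargs))
        return self        # returning self allows chaining of by() calls

    def __call__(self, sample):
        sample = list(sample)
        for name, args, kwargs in self.specs:
            func = self.transformations[name]
            for col in self.imagecols:
                sample[col] = func(sample[col], *args, **kwargs)
        return tuple(sample)   # other columns remain unchanged


Transform.register('scale', lambda img, c: [v * c for v in img])
Transform.register('reverse', lambda img: img[::-1])
transform = Transform(0).by('scale', 2).by('reverse')
result = transform(([1, 2, 3], 'label'))
```

Transformations are applied in the order in which they were added, and only to the columns given at construction.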

nutsml.viewer module

class ViewImage(imgcols, layout=(1, None), figsize=None, pause=0.0001, axis_off=False, labels_off=False, titles=None, every_sec=0, every_n=0, **imargs)[source]

Bases: nutsflow.base.NutFunction

Display images in window.

__call__(data)[source]

View the images in data

Parameters

data (tuple) – Data with images at imgcols.

Returns

unchanged input data

Return type

tuple

__init__(imgcols, layout=(1, None), figsize=None, pause=0.0001, axis_off=False, labels_off=False, titles=None, every_sec=0, every_n=0, **imargs)[source]

iterable >> ViewImage(imgcols, layout=(1, None), figsize=None, **plotargs)

Images should be numpy arrays in one of the following formats:
MxN - luminance (grayscale, float array only)
MxNx3 - RGB (float or uint8 array)
MxNx4 - RGBA (float or uint8 array)

Shapes with single-dimension axis are supported but not encouraged, e.g. MxNx1 will be converted to MxN.

See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow

>>> from nutsflow import Consume
>>> from nutsml import ReadImage
>>> imagepath = 'tests/data/img_formats/*.jpg'
>>> samples = [(1, 'nut_color'), (2, 'nut_grayscale')]
>>> read_image = ReadImage(1, imagepath)
>>> samples >> read_image >> ViewImage(1) >> Consume() 
>>> view_gray = ViewImage(1, cmap='gray')
>>> samples >> read_image >> view_gray >> Consume() 
Parameters
  • imgcols (int|tuple|None) – Index or tuple of indices of data columns containing images (ndarray). Use None if images are provided directly, e.g. [img1, img2, …] >> ViewImage(None) >> Consume()

  • layout (tuple) – Rows and columns of the viewer layout., e.g. a layout of (2,3) means that 6 images in the data are arranged in 2 rows and 3 columns. Number of cols can be None is then derived from imgcols

  • figsize (tuple) – Figure size in inch.

  • pause (float) – Waiting time in seconds after each plot. Pressing a key skips the waiting time.

  • axis_off (bool) – Enable or disable display of figure axes.

  • labels_off (bool) – Enable or disable display of axes labels.

  • every_sec (float) – View only every given number of seconds, e.g. every_sec=2.5 to view every 2.5 seconds.

  • every_n (int) – View every n-th call.

  • imargs (kwargs) – Keyword arguments passed on to matplotlib’s imshow() function, e.g. cmap=’gray’. See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow
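
The layout, every_sec and every_n parameters describe rules that a short sketch may make easier to follow. The code below is not the ViewImage implementation; resolve_layout and Throttle are hypothetical helpers, and both the rounding rule for a missing column count and the way every_n and every_sec combine are assumptions. The case imgcols=None is not covered.

```python
import time


def resolve_layout(layout, imgcols):
    """Hypothetical helper: fill in a missing column count from imgcols."""
    n_images = 1 if isinstance(imgcols, int) else len(imgcols)
    rows, cols = layout
    if cols is None:
        # Assumed rule: spread the images evenly over the given rows.
        cols = -(-n_images // rows)  # ceiling division
    return rows, cols


class Throttle:
    """Hypothetical helper: decide whether the current call is displayed."""

    def __init__(self, every_sec=0, every_n=0, clock=time.monotonic):
        self.every_sec = every_sec
        self.every_n = every_n
        self.clock = clock  # injectable so the timing rule can be tested
        self.calls = 0
        self.last_shown = None

    def should_view(self):
        self.calls += 1
        # Skip all calls that are not the n-th one (0 disables the rule).
        if self.every_n and self.calls % self.every_n != 0:
            return False
        now = self.clock()
        # Skip calls arriving before every_sec seconds have passed.
        if self.every_sec and self.last_shown is not None:
            if now - self.last_shown < self.every_sec:
                return False
        self.last_shown = now
        return True
```

With these assumptions, resolve_layout((1, None), (0, 1, 2)) gives one row with three columns, and Throttle(every_n=3) lets every third call through.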

class ViewImageAnnotation(imgcol, annocols, figsize=None, pause=0.0001, interpolation=None, **annoargs)[source]

Bases: nutsflow.base.NutFunction

Display images and annotation in window.

SHAPEPROP = {'edgecolor': 'y', 'facecolor': 'none', 'linewidth': 1}
TEXTPROP = {'backgroundcolor': (1, 1, 1, 0.5), 'edgecolor': 'k'}
__call__(data)[source]

View the image and its annotation

Parameters

data (tuple) – Data with image at imgcol and annotation at annocols.

Returns

unchanged input data

Return type

tuple

__init__(imgcol, annocols, figsize=None, pause=0.0001, interpolation=None, **annoargs)[source]

iterable >> ViewImageAnnotation(imgcol, annocols, figsize=None, pause, interpolation, **annoargs)

Images must be numpy arrays in one of the following formats:
MxN - luminance (grayscale, float array only)
MxNx3 - RGB (float or uint8 array)
MxNx4 - RGBA (float or uint8 array)
See

Shapes with single-dimension axis are supported but not encouraged, e.g. MxNx1 will be converted to MxN.
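
The accepted image formats can be illustrated with a small shape check. This is a sketch only and not part of nutsml: image_kind is a hypothetical helper that works on plain shape tuples, drops single-dimension axes in the way described above, and does not check the dtype restriction for grayscale images.

```python
def image_kind(shape):
    """Hypothetical helper: classify an array shape by the formats listed above."""
    # Drop single-dimension axes, e.g. (M, N, 1) -> (M, N).
    squeezed = tuple(dim for dim in shape if dim != 1)
    if len(squeezed) == 2:
        return 'luminance'
    if len(squeezed) == 3 and squeezed[2] == 3:
        return 'RGB'
    if len(squeezed) == 3 and squeezed[2] == 4:
        return 'RGBA'
    raise ValueError('Unsupported image shape: {}'.format(shape))
```

For example, image_kind((5, 3, 1)) is treated like image_kind((5, 3)), while a shape such as (5, 3, 2) is rejected.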

Parameters
  • imgcol (int) – Index of data column that contains the image

  • annocols (int|tuple) – Index or tuple of indices specifying the data column(s) that contain annotation (labels, or geometry)

  • figsize (tuple) – Figure size in inches.

  • pause (float) – Waiting time in seconds after each plot. Pressing a key skips the waiting time.

  • interpolation (string) – Interpolation for imshow, e.g. ‘nearest’, ‘bilinear’, ‘bicubic’. For details see http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow

  • annoargs (kwargs) – Keyword arguments for visual properties of annotation, e.g. edgecolor=’y’, linewidth=1
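
One plausible reading of annoargs is that they override the class defaults in SHAPEPROP. The sketch below shows that reading with plain dictionaries; shape_properties is a hypothetical helper, and the override behavior is an assumption rather than something the documentation above states.

```python
# Default shape properties, copied from the class attribute listed above.
SHAPEPROP = {'edgecolor': 'y', 'facecolor': 'none', 'linewidth': 1}


def shape_properties(**annoargs):
    """Hypothetical helper: merge user keyword arguments over the defaults."""
    props = dict(SHAPEPROP)  # copy so the defaults stay untouched
    props.update(annoargs)   # assumed: user-supplied values take precedence
    return props
```

Under this assumption, shape_properties(edgecolor='r', linewidth=2) keeps facecolor='none' and replaces the other two values.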

nutsml.writer module

class WriteImage(column, pathfunc, namefunc=None)[source]

Bases: nutsflow.base.NutFunction

Write images within samples.

__call__(sample)[source]

Return sample and write image within sample

__init__(column, pathfunc, namefunc=None)[source]

Write images within samples to file.

Writes jpg, gif, png, tif and bmp format depending on file extension. Images in samples are expected to be numpy arrays. See nutsml.util.load_image for details.

Folders on output file path are created if missing.

>>> from nutsml import ReadImage
>>> from nutsflow import Collect, Get, GetCols, Consume, Unzip
>>> samples = [('nut_color', 1), ('nut_grayscale', 2)]
>>> inpath = 'tests/data/img_formats/*.bmp'
>>> img_samples = samples >> ReadImage(0, inpath) >> Collect()

>>> # Names given as an iterable, images taken from column 0
>>> imagepath = 'tests/data/test_*.bmp'
>>> names = samples >> Get(0) >> Collect()
>>> img_samples >> WriteImage(0, imagepath, names) >> Consume()

>>> # Images provided directly (column is None)
>>> imagepath = 'tests/data/test_*.bmp'
>>> names = samples >> Get(0) >> Collect()
>>> images = img_samples >> Get(0)
>>> images >> WriteImage(None, imagepath, names) >> Consume()

>>> # Names computed from each sample by a function
>>> imagepath = 'tests/data/test_*.bmp'
>>> namefunc = lambda sample: sample[1]
>>> (samples >> GetCols(0,0,1) >> ReadImage(0, inpath) >>
... WriteImage(0, imagepath, namefunc) >> Consume())
Parameters
  • column (int|None) – Column in sample that contains the image. If column is None, the sample itself is taken as the image.

  • pathfunc (str|function) – Filepath with wildcard ‘*’, which is replaced by the name provided via namefunc, e.g. ‘tests/data/img_formats/*.jpg’ for names = [‘nut_grayscale’] will become ‘tests/data/img_formats/nut_grayscale.jpg’, or a function to compute the path to the image file from sample and name, e.g. pathfunc=lambda sample, name: ‘tests/data/test_{}.jpg’.format(name)

  • namefunc (iterable|function|None) – Iterable over names to generate image paths from (its length must be the same as the number of samples), or a function to compute filenames from a sample, e.g. namefunc=lambda sample: sample[0]. If None, Enumerate() is used.
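
The rules for pathfunc and namefunc can be sketched in a few lines. The helpers resolve_path and resolve_names are hypothetical and only illustrate the behavior described in the parameter list; in particular, numbering from 0 when namefunc is None is an assumption about how Enumerate() is applied.

```python
def resolve_path(pathfunc, sample, name):
    """Hypothetical helper: compute the output path for one sample."""
    if callable(pathfunc):
        # A function receives the sample and its name.
        return pathfunc(sample, name)
    # A string has its wildcard replaced by the name.
    return pathfunc.replace('*', str(name))


def resolve_names(namefunc, samples):
    """Hypothetical helper: produce one name per sample."""
    if namefunc is None:
        # Assumed: samples are numbered starting at 0.
        return [index for index, _ in enumerate(samples)]
    if callable(namefunc):
        return [namefunc(sample) for sample in samples]
    # Otherwise namefunc is taken to be an iterable over names.
    return list(namefunc)
```

For instance, resolve_path('tests/data/img_formats/*.jpg', None, 'nut_grayscale') yields 'tests/data/img_formats/nut_grayscale.jpg', matching the example given for pathfunc.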

Module contents