nutsml package
Subpackages
- nutsml.examples package
- Subpackages
- nutsml.examples.autoencoder package
- nutsml.examples.cifar package
- Submodules
- nutsml.examples.cifar.cnn_classify module
- nutsml.examples.cifar.cnn_train module
- nutsml.examples.cifar.read_images module
- nutsml.examples.cifar.view_augmented_images module
- nutsml.examples.cifar.view_data module
- nutsml.examples.cifar.view_train_images module
- nutsml.examples.cifar.write_images module
- Module contents
- nutsml.examples.mnist package
- Submodules
- nutsml.examples.mnist.cnn_classify module
- nutsml.examples.mnist.cnn_train module
- nutsml.examples.mnist.mlp_classify module
- nutsml.examples.mnist.mlp_train module
- nutsml.examples.mnist.mlp_view_misclassified module
- nutsml.examples.mnist.read_images module
- nutsml.examples.mnist.view_train_images module
- nutsml.examples.mnist.write_images module
- Module contents
- Module contents
- Subpackages
Submodules
nutsml.batcher module
-
class BuildBatch(batchsize, prefetch=1)
Bases: nutsflow.base.Nut
Build batches for GPU-based neural network training.
-
__init__(batchsize, prefetch=1)
iterable >> BuildBatch(batchsize, prefetch=1)
Take samples in iterable, extract the specified columns, convert the column data to Numpy arrays of various types, and aggregate the converted samples into a batch.
The format of a batch is a list of lists: [[inputs], [outputs]] where inputs and outputs are Numpy arrays.
The following example uses PrintType() to print the shape of the batches constructed. This is useful for development and debugging but should be removed in production.
>>> import numpy as np
>>> from nutsflow import Collect, PrintType
>>> numbers = [4.1, 3.2, 1.1]
>>> images = [np.zeros((5, 3)), np.ones((5, 3)), np.ones((5, 3))]
>>> class_ids = [1, 2, 1]
>>> samples = list(zip(numbers, images, class_ids))
>>> build_batch = (BuildBatch(batchsize=2)
...                .input(0, 'number', 'float32')
...                .input(1, 'image', np.uint8, True)
...                .output(2, 'one_hot', np.uint8, 3))
>>> batches = samples >> build_batch >> PrintType() >> Collect()
[[<ndarray> 2:float32, <ndarray> 2x1x5x3:uint8], [<ndarray> 2x3:uint8]]
[[<ndarray> 1:float32, <ndarray> 1x1x5x3:uint8], [<ndarray> 1x3:uint8]]
In the example above there are multiple inputs and a single output, and the batch has the format [[number, image], [one_hot]], where each data element is a Numpy array with the shown shape and dtype.
Sample columns can be ignored or reused. For an autoencoder, for instance, one might wish to reuse the sample image as both input and output:
>>> build_batch = (BuildBatch(2)
...                .input(1, 'image', np.uint8, True)
...                .output(1, 'image', np.uint8, True))
>>> batches = samples >> build_batch >> PrintType() >> Collect()
[[<ndarray> 2x1x5x3:uint8], [<ndarray> 2x1x5x3:uint8]]
[[<ndarray> 1x1x5x3:uint8], [<ndarray> 1x1x5x3:uint8]]
In the prediction phase no target outputs are needed. If the batch contains only inputs, the batch format is just [inputs].
>>> build_pred_batch = (BuildBatch(2)
...                     .input(1, 'image', 'uint8', True))
>>> batches = samples >> build_pred_batch >> PrintType() >> Collect()
[<ndarray> 2x1x5x3:uint8]
[<ndarray> 1x1x5x3:uint8]
- Parameters
batchsize (int) – Size of batch = number of rows in batch.
prefetch (int) – Number of batches to prefetch. This speeds up GPU-based training, since one batch is built on the CPU while another is processed on the GPU. Note: if verbose=True, prefetch is set to 0 to simplify debugging.
verbose (bool) – Print the batch shape if True (also sets prefetch=0).
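The column-selection idea behind BuildBatch can be sketched in plain NumPy. The helper `sketch_batches` below is hypothetical and deliberately simplified: it stacks raw column values, ignores prefetching, and applies none of the column functions (no channel axis, no one-hot encoding), so it only illustrates the [[inputs], [outputs]] versus [inputs] batch format.

```python
import numpy as np


def sketch_batches(samples, batchsize, incols, outcols=()):
    """Hypothetical sketch: yield [[inputs], [outputs]], or [inputs] if no outputs."""
    samples = list(samples)
    for start in range(0, len(samples), batchsize):
        chunk = samples[start:start + batchsize]
        # Stack column c of every sample in the chunk into one array.
        inputs = [np.array([s[c] for s in chunk]) for c in incols]
        if not outcols:
            yield inputs
        else:
            outputs = [np.array([s[c] for s in chunk]) for c in outcols]
            yield [inputs, outputs]


numbers = [4.1, 3.2, 1.1]
images = [np.zeros((5, 3)), np.ones((5, 3)), np.ones((5, 3))]
class_ids = [1, 2, 1]
samples = list(zip(numbers, images, class_ids))

for inputs, outputs in sketch_batches(samples, 2, incols=(0, 1), outcols=(2,)):
    print([x.shape for x in inputs], [y.shape for y in outputs])
# [(2,), (2, 5, 3)] [(2,)]
# [(1,), (1, 5, 3)] [(1,)]
```

As with BuildBatch, the last batch is smaller when the number of samples is not a multiple of the batch size.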
-
__rrshift__(iterable)
Convert samples in iterable into mini-batches.
The structure of the output depends on the fmt function used. If fmt is None, the output is a list of np.arrays.
- Parameters
iterable (iterable) – Iterable over samples.
- Returns
Mini-batches
- Return type
list of np.array if fmt=None
-
input(col, name, *args, **kwargs)
Specify and add an input column for the batch to create.
- Parameters
col (int) – Column of the sample to extract and to create a batch input column from.
name (string) – Name of the column function to apply to create a batch column, e.g. 'image'. See the following functions for more details:
'image': nutsml.batcher.build_image_batch
'number': nutsml.batcher.build_number_batch
'vector': nutsml.batcher.build_vector_batch
'tensor': nutsml.batcher.build_tensor_batch
'one_hot': nutsml.batcher.build_one_hot_batch
args (args) – Arguments for column function, e.g. dtype
kwargs (kwargs) – Keyword arguments for column function
- Returns
instance of BuildBatch
- Return type
BuildBatch
-
output(col, name, *args, **kwargs)
Specify and add an output column for the batch to create.
- Parameters
col (int) – Column of the sample to extract and to create a batch output column from.
name (string) – Name of the column function to apply to create a batch column, e.g. 'image'. See the following functions for more details:
'image': nutsml.batcher.build_image_batch
'number': nutsml.batcher.build_number_batch
'vector': nutsml.batcher.build_vector_batch
'tensor': nutsml.batcher.build_tensor_batch
'one_hot': nutsml.batcher.build_one_hot_batch
args (args) – Arguments for column function, e.g. dtype
kwargs (kwargs) – Keyword arguments for column function
- Returns
instance of BuildBatch
- Return type
BuildBatch
-
-
Mixup(batch, alpha)
Mixup produces random interpolations between data and labels.
Usage: … >> BuildBatch() >> Mixup(0.1) >> network.train() >> …
The implementation is based on the following paper: mixup: Beyond Empirical Risk Minimization, https://arxiv.org/abs/1710.09412
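A minimal NumPy sketch of the mixup idea, assuming a batch of the form [[inputs], [outputs]] as described above. `mixup_sketch` is a hypothetical helper, not nutsml's Mixup: it draws one interpolation factor per sample from Beta(alpha, alpha) and pairs each sample with a randomly permuted partner, which is one common variant of the technique.

```python
import numpy as np


def mixup_sketch(batch, alpha, rng=None):
    """Interpolate inputs and outputs of a batch [[inputs], [outputs]]."""
    if alpha <= 0:
        return batch  # no mixup for alpha <= 0
    rng = np.random.default_rng() if rng is None else rng
    inputs, outputs = batch
    n = len(inputs[0])  # batch size
    lam = rng.beta(alpha, alpha, size=n)  # one interpolation factor per sample
    partner = rng.permutation(n)  # random partner for each sample

    def mix(data):
        data = np.asarray(data, dtype=float)
        # Reshape lam so that it broadcasts over all non-batch axes.
        w = lam.reshape((n,) + (1,) * (data.ndim - 1))
        return w * data + (1 - w) * data[partner]

    # Inputs and outputs share the same factors and partners, so labels match data.
    return [[mix(x) for x in inputs], [mix(y) for y in outputs]]


images = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
labels = np.array([[1, 0], [0, 1]])  # one-hot class labels
batch = [[images], [labels]]
mixed_inputs, mixed_outputs = mixup_sketch(batch, 0.2, np.random.default_rng(0))
```

Because the one-hot rows each sum to one and are combined with weights w and 1 - w, the mixed labels still sum to one per sample.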
- Parameters
batch (list) – Batch consisting of a list of input data and a list of output data, where the data must be numeric (e.g. images and one-hot-encoded class labels) so that it can be interpolated.
alpha (float) – Control parameter of the beta distribution from which the interpolation factors are sampled. Range: [0, 1]. For alpha <= 0 no mixup is performed.
- Returns
Batch with interpolated inputs and outputs.
-
build_image_batch(images, dtype, channelfirst=False)
Return batch of images.
If the images have no channel axis, one is added. For channelfirst=True the channel axis is added/moved to the front, otherwise it comes last. All images in the batch will have a channel axis. The batch has shape (n, c, h, w) or (n, h, w, c) depending on channelfirst, where n is the number of images in the batch.
>>> from nutsflow.common import shapestr
>>> images = [np.zeros((2, 3)), np.ones((2, 3))]
>>> batch = build_image_batch(images, 'uint8', True)
>>> shapestr(batch)
'2x1x2x3'
>>> batch
array([[[[0, 0, 0],
         [0, 0, 0]]],


       [[[1, 1, 1],
         [1, 1, 1]]]], dtype=uint8)
- Parameters
images (numpy array) – Images to batch. Must be of shape (h,w,c) or (h,w). Gray-scale images with a channel (h,w,1) and images with an alpha channel (h,w,4) are fine as well.
dtype (numpy data type) – Data type of batch, e.g. 'uint8'
channelfirst (bool) – If True, channel is added/moved to front.
- Returns
Image batch with shape (n, c, h, w) or (n, h, w, c).
- Return type
np.array
-
build_number_batch(numbers, dtype)
Return numpy array with given dtype for given numbers.
>>> numbers = (1, 2, 3, 1)
>>> build_number_batch(numbers, 'uint8')
array([1, 2, 3, 1], dtype=uint8)
- Parameters
numbers (iterable of numbers) – Numbers to create batch from
dtype (numpy data type) – Data type of batch, e.g. 'uint8'
- Returns
Numpy array for numbers
- Return type
numpy.array
-
build_one_hot_batch(class_ids, dtype, num_classes)
Return one-hot vectors for class ids.
>>> class_ids = [0, 1, 2, 1]
>>> build_one_hot_batch(class_ids, 'uint8', 3)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]], dtype=uint8)
- Parameters
class_ids (iterable) – Class indices in {0, …, num_classes-1}
dtype (numpy data type) – Data type of batch, e.g. 'uint8'
num_classes (int) – Number of classes
- Returns
One hot vectors for class ids.
- Return type
numpy.array
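One-hot encoding reduces to indexing rows of an identity matrix. The sketch below is a hypothetical stand-in (`one_hot_sketch`), not nutsml's code, but it produces the same array as the doctest above.

```python
import numpy as np


def one_hot_sketch(class_ids, dtype, num_classes):
    """Row i of the identity matrix is the one-hot vector for class i."""
    return np.eye(num_classes, dtype=dtype)[np.asarray(class_ids)]


print(one_hot_sketch([0, 1, 2, 1], 'uint8', 3))
```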
-
build_tensor_batch(tensors, dtype, axes=None, expand=None)
Return batch of tensors.
>>> from nutsflow.common import shapestr
>>> tensors = [np.zeros((2, 3)), np.ones((2, 3))]
>>> batch = build_tensor_batch(tensors, 'uint8')
>>> shapestr(batch)
'2x2x3'
>>> print(batch)
[[[0 0 0]
  [0 0 0]]

 [[1 1 1]
  [1 1 1]]]
>>> batch = build_tensor_batch(tensors, 'uint8', expand=0)
>>> shapestr(batch)
'2x1x2x3'
>>> print(batch)
[[[[0 0 0]
   [0 0 0]]]


 [[[1 1 1]
   [1 1 1]]]]
>>> batch = build_tensor_batch(tensors, 'uint8', axes=(1, 0))
>>> shapestr(batch)
'2x3x2'
>>> print(batch)
[[[0 0]
  [0 0]
  [0 0]]

 [[1 1]
  [1 1]
  [1 1]]]
- Parameters
tensors (iterable) – Numpy tensors
dtype (numpy data type) – Data type of batch, e.g. 'uint8'
axes (tuple|None) – Axes order applied to each tensor, e.g. to move a channel axis to the last position (see numpy transpose for details).
expand (int|None) – Position at which a new axis of length one is inserted into each tensor (see numpy expand_dims for details).
- Returns
stack of tensors, with batch axis first.
- Return type
numpy.array
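The shape changes in the examples above can be reproduced with numpy.transpose, numpy.expand_dims and numpy.stack. This is a hypothetical sketch (`tensor_batch_sketch`); applying the transpose before the expansion when both are given is an assumption, not taken from nutsml.

```python
import numpy as np


def tensor_batch_sketch(tensors, dtype, axes=None, expand=None):
    """Optionally transpose/expand each tensor, then stack along a new first axis."""
    prepared = []
    for tensor in tensors:
        tensor = np.asarray(tensor, dtype=dtype)
        if axes is not None:
            tensor = np.transpose(tensor, axes)  # reorder axes per tensor
        if expand is not None:
            tensor = np.expand_dims(tensor, expand)  # insert axis of length one
        prepared.append(tensor)
    return np.stack(prepared)  # batch axis first


tensors = [np.zeros((2, 3)), np.ones((2, 3))]
print(tensor_batch_sketch(tensors, 'uint8', axes=(1, 0)).shape)  # (2, 3, 2)
```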
-
build_vector_batch(vectors, dtype)
Return batch of vectors.
>>> from nutsflow.common import shapestr
>>> vectors = [np.array([1, 2, 3]), np.array([2, 3, 4])]
>>> batch = build_vector_batch(vectors, 'uint8')
>>> shapestr(batch)
'2x3'
>>> batch
array([[1, 2, 3],
       [2, 3, 4]], dtype=uint8)
- Parameters
vectors (iterable) – Numpy row vectors
dtype (numpy data type) – Data type of batch, e.g. 'uint8'
- Returns
vstack of vectors
- Return type
numpy.array
nutsml.booster module
-
Boost(iterable, batcher, network, rand=None)
iterable >> Boost(batcher, network, rand=None)
Boost samples with a high softmax probability for an incorrect class. Expects one-hot encoded targets and softmax predictions as output.
NOTE: prefetching of batches must be disabled when using boosting!
    network = Network()
    build_batch = BuildBatch(BATCHSIZE, prefetch=0).input(…).output(…)
    boost = Boost(build_batch, network)
    samples >> … ?>> boost >> build_batch >> network.train() >> Consume()
- Parameters
iterable (iterable) – Iterable with samples.
batcher (nutsml.BuildBatch) – Batcher used for network training.
network (nutsml.Network) – Network used for prediction
rand (Random|None) – Random number generator used for down-sampling. If None, random.Random() is used.
- Returns
Generator over samples to boost
- Return type
generator
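The selection idea can be sketched as follows. `boost_sketch` is hypothetical and is not how nutsml's Boost is implemented: it assumes one-hot targets and softmax predictions are already available per sample, and it keeps a sample with probability equal to the largest softmax score assigned to a wrong class. Using that score directly as a keep probability is an assumption.

```python
import random

import numpy as np


def boost_sketch(samples, targets, predictions, rand=None):
    """Keep samples that the network confidently assigns to a wrong class."""
    rand = rand or random.Random()
    boosted = []
    for sample, target, pred in zip(samples, targets, predictions):
        target = np.asarray(target)
        pred = np.asarray(pred, dtype=float)
        # Zero out the true class, then take the strongest wrong prediction.
        p_wrong = np.max(np.where(target == 1, 0.0, pred))
        if rand.random() < p_wrong:
            boosted.append(sample)
    return boosted


samples = ['a', 'b']
targets = [[1, 0], [0, 1]]
predictions = [[1.0, 0.0], [1.0, 0.0]]  # 'a' predicted correctly, 'b' wrongly
print(boost_sketch(samples, targets, predictions, random.Random(0)))  # ['b']
```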
nutsml.checkpoint module
-
class Checkpoint(create_net, parameters, checkpointspath='checkpoints')
Bases: object
A factory for checkpoints to periodically save network weights and other hyper/configuration parameters.
Example usage:
    def create_network(lr=0.01, momentum=0.9):
        model = Sequential()
        …
        optimizer = opt.SGD(lr=lr, momentum=momentum)
        model.compile(optimizer=optimizer, metrics=['accuracy'])
        return KerasNetwork(model), optimizer
    def parameters(network, optimizer):
        return dict(lr = optimizer.lr, momentum = optimizer.momentum)
    def train_network():
        checkpoint = Checkpoint(create_network, parameters)
        network, optimizer = checkpoint.load()
        for epoch in range(EPOCHS):
            train_err = train_epoch()  # placeholder: train for one epoch
            val_err = validate_network()
            if epoch % 10 == 0:  # Reduce learning rate every 10 epochs
                optimizer.lr /= 2
            checkpoint.save_best(val_err)
Checkpoints can also be saved under different names, e.g.
    checkpoint.save_best(val_err, 'checkpoint' + str(epoch))
And specific checkpoints can be loaded:
    network, optimizer = checkpoint.load('checkpoint103')
If no checkpoint is specified, the most recent one is loaded.
-
__init__(create_net, parameters, checkpointspath='checkpoints')
Create checkpoint factory.
>>> def create_network(lr=0.1):
...     return 'MyNetwork', lr
>>> def parameters(network, lr):
...     return dict(lr = lr)
>>> checkpoint = Checkpoint(create_network, parameters)
>>> network, lr = checkpoint.load()
>>> network, lr
('MyNetwork', 0.1)
- Parameters
create_net (function) – Function that takes keyword parameters and returns a nuts-ml Network and any other values or objects needed to describe the state to be checkpointed. Note: parameters(*create_net()) must work!
parameters (function) – Function that takes the output of create_net() and returns a dictionary with parameters (the same ones that are used in create_net(…)).
checkpointspath (string) – Path to folder that will contain checkpoint folders.
-
datapaths(checkpointname=None)
Return paths to network weights, parameters and config files.
If no checkpoints exist under basedir, (None, None, None) is returned.
- Parameters
checkpointname (str|None) – Name of checkpoint. If name is None the most recent checkpoint is used.
- Returns
(weightspath, paramspath, configpath) or (None, None, None)
- Return type
tuple
-
dirs()
Return full paths to all checkpoint folders.
- Returns
Paths to all folders under the basedir.
- Return type
-
latest()
Find the most recently modified/created checkpoint folder.
- Returns
Full path to checkpoint folder if it exists otherwise None.
- Return type
str | None
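Finding the most recent checkpoint folder amounts to taking the sub-folder with the largest modification time. This is a stdlib sketch with a hypothetical name (`latest_sketch`); that nutsml uses modification times exactly this way is an assumption.

```python
import os
import tempfile


def latest_sketch(basedir):
    """Return the most recently modified sub-folder of basedir, or None."""
    if not os.path.isdir(basedir):
        return None
    paths = [os.path.join(basedir, name) for name in os.listdir(basedir)]
    folders = [p for p in paths if os.path.isdir(p)]
    return max(folders, key=os.path.getmtime) if folders else None


# Usage: two fake checkpoint folders with explicit modification times.
base = tempfile.mkdtemp()
for name, mtime in [('checkpoint1', 1000), ('checkpoint2', 2000)]:
    folder = os.path.join(base, name)
    os.mkdir(folder)
    os.utime(folder, (mtime, mtime))  # set (atime, mtime)
print(latest_sketch(base))  # path ending in 'checkpoint2'
```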
-
load(checkpointname=None)
Create network, load weights and parameters.
- Parameters
checkpointname (str|None) – Name of checkpoint to load. If None, the most recent checkpoint is used. If no checkpoint exists yet, the network is created but no weights are loaded, and the default configuration is returned.
- Returns
whatever self.create_net returns
- Return type
-
save(checkpointname='checkpoint')
Save network weights and parameters under the given name.
-
nutsml.common module
-
CheckNaN(data)
Raise exception if data contains NaN.
Useful to stop training if the network doesn't converge and the loss function returns NaN. Example: samples >> network.train() >> CheckNaN() >> log >> Consume()
>>> from nutsflow import Collect
>>> [1, 2, 3] >> CheckNaN() >> Collect()
[1, 2, 3]
>>> import numpy as np
>>> [1, np.nan, 3] >> CheckNaN() >> Collect()
Traceback (most recent call last):
...
RuntimeError: NaN encountered: nan
- Parameters
data – Items or iterables.
- Returns
Return input data if it doesn’t contain NaN
- Return type
any
- Raises
RuntimeError if data contains NaN.
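A generator-based sketch of the same check, with a hypothetical name (`check_nan_sketch`). It inspects plain items and the elements of list/tuple items only; how deeply nutsml's CheckNaN inspects data is not stated here, so treat that as an assumption.

```python
import math


def check_nan_sketch(iterable):
    """Pass items through unchanged; raise RuntimeError on the first NaN."""
    for item in iterable:
        values = item if isinstance(item, (list, tuple)) else [item]
        for value in values:
            # numpy float64 is a subclass of float, so np.nan is caught too.
            if isinstance(value, float) and math.isnan(value):
                raise RuntimeError('NaN encountered: %s' % value)
        yield item


try:
    list(check_nan_sketch([1, float('nan'), 3]))
except RuntimeError as err:
    message = str(err)
print(message)  # NaN encountered: nan
```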
-
class ConvertLabel(column, labels, onehot=False)
Bases: nutsflow.base.NutFunction
Convert string labels to integer class ids (or one-hot) and vice versa.
-
__init__(column, labels, onehot=False)
Convert string labels to integer class ids (or one-hot) and vice versa.
Also converts confidence vectors (e.g. softmax output) or float values to class labels.
>>> from nutsflow import Collect
>>> labels = ['class0', 'class1', 'class2']
>>> convert = ConvertLabel(None, labels)
>>> [1, 0] >> convert >> Collect()
['class1', 'class0']
>>> ['class1', 'class0'] >> convert >> Collect()
[1, 0]
>>> [0.9, 0.4, 1.6] >> convert >> Collect()
['class1', 'class0', 'class2']
>>> [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]] >> convert >> Collect()
['class1', 'class0']
>>> convert = ConvertLabel(None, labels, onehot=True)
>>> ['class1', 'class0'] >> convert >> Collect()
[[0, 1, 0], [1, 0, 0]]
>>> convert = ConvertLabel(1, labels)
>>> [('data', 'class1'), ('data', 'class0')] >> convert >> Collect()
[('data', 1), ('data', 0)]
>>> [('data', 1), ('data', 2)] >> convert >> Collect()
[('data', 'class1'), ('data', 'class2')]
>>> [('data', 0.9)] >> convert >> Collect()
[('data', 'class1')]
>>> [('data', [0.1, 0.7, 0.2])] >> convert >> Collect()
[('data', 'class1')]
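The conversion rules shown in these examples can be sketched as a plain function on single values. `convert_label_sketch` is hypothetical, ignores the column argument, and is not nutsml's implementation; rounding floats with Python's round() (which rounds halves to even) is an assumption.

```python
def convert_label_sketch(value, labels, onehot=False):
    """Convert one value between labels, class ids, one-hot and confidences."""
    if isinstance(value, str):  # label -> class id (or one-hot vector)
        idx = labels.index(value)
        return [int(i == idx) for i in range(len(labels))] if onehot else idx
    if isinstance(value, (list, tuple)):  # confidence vector -> label of argmax
        return labels[max(range(len(value)), key=lambda i: value[i])]
    if isinstance(value, float):  # float -> nearest class id -> label
        return labels[int(round(value))]
    return labels[value]  # integer class id -> label


labels = ['class0', 'class1', 'class2']
print([convert_label_sketch(v, labels) for v in [0.9, 0.4, 1.6]])
# ['class1', 'class0', 'class2']
```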
-
-
PartitionByCol(iterable, column, values)
Partition samples in iterable depending on column value.
>>> samples = [(1,1), (2,0), (2,4), (1,3), (3,0)]
>>> ones, twos = samples >> PartitionByCol(0, [1, 2])
>>> ones
[(1, 1), (1, 3)]
>>> twos
[(2, 0), (2, 4)]
Note that values need not contain all possible column values; it is sufficient to provide the values for the wanted partitions.
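A minimal sketch of the partitioning, with a hypothetical name (`partition_by_col_sketch`): one list per requested value, while samples with any other value in the column (here the sample (3, 0)) are dropped.

```python
def partition_by_col_sketch(samples, column, values):
    """Return one list of samples per requested column value, in values order."""
    partitions = {value: [] for value in values}
    for sample in samples:
        key = sample[column]
        if key in partitions:  # ignore values that were not requested
            partitions[key].append(sample)
    return [partitions[value] for value in values]


samples = [(1, 1), (2, 0), (2, 4), (1, 3), (3, 0)]
ones, twos = partition_by_col_sketch(samples, 0, [1, 2])
print(ones, twos)  # [(1, 1), (1, 3)] [(2, 0), (2, 4)]
```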
-
SplitLeaveOneOut(iterable, keyfunc=None)
Return a leave-one-out split of the iterable.
Note that SplitLeaveOneOut consumes the entire input stream and returns a generator over the leave-one-out splits. The splits are deterministic and stable across Python versions 2.x and 3.x.
>>> from nutsflow.common import console  # just for printing
>>> samples = [1, 2, 3]
>>> for train, test in samples >> SplitLeaveOneOut():
...     console(train, ' ', test)
[2, 3] [1]
[1, 3] [2]
[1, 2] [3]
>>> samples = [(1, 1), (2, 0), (2, 4), (1, 3), (3, 0)]
>>> splits = samples >> SplitLeaveOneOut(lambda x: x[0])
>>> for train, test in splits:
...     console(train, ' ', test)
[(2, 0), (2, 4), (3, 0)] [(1, 1), (1, 3)]
[(1, 1), (1, 3), (3, 0)] [(2, 0), (2, 4)]
[(1, 1), (1, 3), (2, 0), (2, 4)] [(3, 0)]
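The splits in these examples can be reproduced by grouping samples by key and holding out one group at a time. This is a hypothetical sketch (`leave_one_out_sketch`); ordering the groups by first appearance is an assumption that happens to match the examples.

```python
from collections import OrderedDict


def leave_one_out_sketch(iterable, keyfunc=None):
    """Yield (train, test) pairs, holding out one key group at a time."""
    keyfunc = keyfunc or (lambda x: x)
    groups = OrderedDict()
    for sample in iterable:  # consumes the whole input
        groups.setdefault(keyfunc(sample), []).append(sample)
    for key, test in groups.items():
        train = [s for k, group in groups.items() if k != key for s in group]
        yield train, test


for train, test in leave_one_out_sketch([1, 2, 3]):
    print(train, test)  # [2, 3] [1], then [1, 3] [2], then [1, 2] [3]
```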
- Parameters
iterable (iterable) – Iterable over anything. Will be consumed!
keyfunc (function/None) – Function that returns value the split is based on. If None, the sample itself serves as key.
- Returns
generator over leave-one-out train and test splits (train, test)
- Return type
-
SplitRandom(iterable, ratio=0.7, constraint=None, rand=None)
Randomly split iterable into partitions.
For the same input data the same split is created every time, and the split is stable across Python versions 2.x and 3.x. A random number generator can be provided to create varying splits.
>>> train, val = range(10) >> SplitRandom(ratio=0.7)
>>> train, val
([6, 3, 1, 7, 0, 2, 4], [5, 9, 8])
>>> range(10) >> SplitRandom(ratio=0.7)  # Same split again
[[6, 3, 1, 7, 0, 2, 4], [5, 9, 8]]
>>> train, val, test = range(10) >> SplitRandom(ratio=(0.6, 0.3, 0.1))
>>> train, val, test
([6, 1, 4, 0, 3, 2], [8, 7, 9], [5])
>>> data = zip('aabbccddee', range(10))
>>> same_letter = lambda t: t[0]
>>> train, val = data >> SplitRandom(ratio=0.6, constraint=same_letter)
>>> sorted(train)
[('a', 0), ('a', 1), ('b', 2), ('b', 3), ('d', 6), ('d', 7)]
>>> sorted(val)
[('c', 4), ('c', 5), ('e', 8), ('e', 9)]
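The core of a ratio-based random split is a seeded shuffle followed by cuts at the cumulative ratios. This sketch (`split_random_sketch`, hypothetical) ignores the constraint parameter and does not reproduce nutsml's exact splits; also note that a seeded random.Random is not guaranteed to give the same shuffle across Python versions.

```python
import itertools
import random


def split_random_sketch(iterable, ratio=0.7, seed=0):
    """Shuffle with a seeded generator, then cut at the cumulative ratios."""
    ratios = (ratio, 1.0 - ratio) if isinstance(ratio, float) else tuple(ratio)
    items = list(iterable)  # consumes the iterable
    random.Random(seed).shuffle(items)
    n = len(items)
    bounds = [int(round(c * n)) for c in itertools.accumulate(ratios)]
    bounds[-1] = n  # guard against floating-point drift in the ratio sum
    starts = [0] + bounds[:-1]
    return [items[a:b] for a, b in zip(starts, bounds)]


train, val = split_random_sketch(range(10), ratio=0.7)
print(len(train), len(val))  # 7 3
```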
- Parameters
iterable (iterable) – Iterable over anything. Will be consumed!
ratio (float|tuple) – Ratio of two partitions, e.g. a ratio of 0.7 means a 70%/30% split. Alternatively a list of ratios can be provided, e.g. ratio=(0.6, 0.3, 0.1). Note that ratios must sum up to one and cannot be zero.
constraint (function|None) – Function that returns the key by which the elements of the iterable are grouped before partitioning. Useful to ensure that a partition contains related elements, e.g. that left and right eye images are not scattered across partitions. Note that constraints have precedence over ratios.
rand (Random|None) – Random number generator. The default None ensures that the same split is created every time SplitRandom is called. This is important when continuing an interrupted training session or running the same training on machines with different Python versions. Note that Python's random.Random(0) generates different numbers for Python 2.x and 3.x!
- Returns
partitions of iterable with sizes according to provided ratios.
- Return type
list of lists
nutsml.config module
-
class Config(*args, **kwargs)
Bases: dict
Dictionary that allows access via keys or attributes.
Used to store and access configuration data.
-
__init__(*args, **kwargs)
Create dictionary.
>>> contact = Config({'name':'stefan', 'address':{'number':12}})
>>> contact['name']
'stefan'
>>> contact.name
'stefan'
>>> contact.address.number
12
>>> contact.surname = 'maetschke'
>>> contact.surname
'maetschke'
- Parameters
args (args) – See dict
kwargs (kwargs) – See dict
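Attribute access on a dict can be sketched with __getattr__ and __setattr__. `AttrDict` is a hypothetical stand-in, not nutsml's Config; it converts nested dicts at construction time so that chained access such as contact.address.number works.

```python
class AttrDict(dict):
    """Hypothetical sketch: a dict whose keys can also be read and set as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        for key, value in list(self.items()):
            if isinstance(value, dict):
                self[key] = AttrDict(value)  # enables chained attribute access

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value  # store attributes as dictionary entries


contact = AttrDict({'name': 'stefan', 'address': {'number': 12}})
contact.surname = 'maetschke'
print(contact.name, contact.address.number, contact['surname'])  # stefan 12 maetschke
```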
-
-
load_config(filename)
Load configuration file in YAML format from locations in defined order.
The search order for the config file is: 1) user home dir, 2) current dir, 3) full path.
Example file 'tests/data/config.yaml':
    filepath : c:/Maet
    imagesize : [100, 200]
>>> cfg = load_config('tests/data/config.yaml')
>>> cfg.filepath
'c:/Maet'
>>> cfg['imagesize']
[100, 200]
- Parameters
filename – Name or full path of configuration file.
- Returns
Dictionary with config data. Note that config data can be accessed by key or attribute, e.g. cfg.filepath or cfg['filepath'].
- Return type
Config
nutsml.datautil module
-
col_map(sample, columns, func, *args, **kwargs)
Map function to given columns of sample and keep other columns.
>>> sample = (1, 2, 3)
>>> add_n = lambda x, n: x + n
>>> col_map(sample, 1, add_n, 10)
(1, 12, 3)
>>> col_map(sample, (0, 2), add_n, 10)
(11, 2, 13)
- Parameters
sample (tuple|list) – Sample
columns (int|tuple) – Single or multiple column indices.
func (function) – Function to map
args (args) – Arguments passed on to function
kwargs (kwargs) – Keyword arguments passed on to function
- Returns
Sample where function has been applied to elements in the given columns.
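A short sketch of the same behaviour, with a hypothetical name (`col_map_sketch`). Returning the same sequence type as the input (tuple or list) is an assumption.

```python
def col_map_sketch(sample, columns, func, *args, **kwargs):
    """Apply func to the selected columns and keep all other columns unchanged."""
    columns = (columns,) if isinstance(columns, int) else tuple(columns)
    mapped = [func(x, *args, **kwargs) if i in columns else x
              for i, x in enumerate(sample)]
    return type(sample)(mapped)  # preserve tuple vs. list


add_n = lambda x, n: x + n
print(col_map_sketch((1, 2, 3), (0, 2), add_n, 10))  # (11, 2, 13)
```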
-
group_by(elements, keyfunc, ordered=False)
Group elements using the given key function.
>>> is_odd = lambda x: bool(x % 2)
>>> numbers = [0, 1, 2, 3, 4]
>>> group_by(numbers, is_odd, True)
OrderedDict([(False, [0, 2, 4]), (True, [1, 3])])
- Parameters
elements (iterable) – Any iterable
keyfunc (function) – Function that returns key to group by
ordered (bool) – True: return OrderedDict else return dict
- Returns
dictionary with results of keyfunc as keys and the elements for that key as value
- Return type
dict|OrderedDict
-
group_samples(samples, labelcol, ordered=False)
Return samples grouped by label and label counts.
>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]
>>> groups, labelcnts = group_samples(samples, 1, True)
>>> groups
OrderedDict([(1, [('pos', 1), ('pos', 1)]), (0, [('neg', 0)])])
>>> labelcnts
Counter({1: 2, 0: 1})
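Grouping by label and counting can be sketched with OrderedDict.setdefault and collections.Counter. `group_samples_sketch` is a hypothetical helper that returns the same values as the example above.

```python
from collections import Counter, OrderedDict


def group_samples_sketch(samples, labelcol, ordered=False):
    """Group samples by the label in column labelcol and count the labels."""
    groups = OrderedDict() if ordered else {}
    for sample in samples:
        groups.setdefault(sample[labelcol], []).append(sample)
    labelcnts = Counter({label: len(group) for label, group in groups.items()})
    return groups, labelcnts


samples = [('pos', 1), ('pos', 1), ('neg', 0)]
groups, labelcnts = group_samples_sketch(samples, 1, True)
print(labelcnts)  # Counter({1: 2, 0: 1})
```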
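- Parameters
samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol).
labelcol (int) – Index of label in sample
ordered (bool) – True: return groups as OrderedDict else as dict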
- Returns
(groups, labelcnts) where groups is a dict containing samples grouped by label, and labelcnts is a Counter dict containing label frequencies.
- Return type
tuple
-
random_downsample(samples, labelcol, rand=None, ordered=False)
Randomly down-sample samples.
Creates stratified samples by down-sampling larger classes to the size of the smallest class.
Note: The example shown below uses StableRandom(i) to create a deterministic sequence of randomly stratified samples. Usually it is sufficient to use the default (rand=None). Do NOT use rnd.Random(0), since this will generate the same subsample every time.
>>> from __future__ import print_function
>>> from nutsflow.common import StableRandom
>>> samples = [('pos1', 1), ('pos2', 1), ('pos3', 1),
...            ('neg1', 0), ('neg2', 0)]
>>> for i in range(3):
...     print(random_downsample(samples, 1, StableRandom(i), True))
[('pos2', 1), ('pos3', 1), ('neg2', 0), ('neg1', 0)]
[('pos2', 1), ('pos3', 1), ('neg2', 0), ('neg1', 0)]
[('pos2', 1), ('pos1', 1), ('neg1', 0), ('neg2', 0)]
- Parameters
samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol). Labels can be any hashable type, e.g. int, str, bool
labelcol (int) – Index of label in sample
rand (Random|None) – Random number generator. If None, random.Random(None) is used.
ordered (bool) – True: samples are kept in order when downsampling.
- Returns
Stratified sample set.
- Return type
list of samples
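Down-sampling to the smallest class can be sketched with random.sample, which draws without replacement. `downsample_sketch` is hypothetical; its sampling order differs from nutsml's, and the fixed random.Random(0) below is used only to make the usage repeatable.

```python
import random
from collections import OrderedDict


def downsample_sketch(samples, labelcol, rand=None):
    """Down-sample every class to the size of the smallest class."""
    rand = rand or random.Random()
    groups = OrderedDict()
    for sample in samples:
        groups.setdefault(sample[labelcol], []).append(sample)
    n_min = min(len(group) for group in groups.values())
    stratified = []
    for group in groups.values():
        stratified.extend(rand.sample(group, n_min))  # without replacement
    return stratified


samples = [('pos1', 1), ('pos2', 1), ('pos3', 1), ('neg1', 0), ('neg2', 0)]
stratified = downsample_sketch(samples, 1, random.Random(0))
print(len(stratified))  # 4
```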
-
shuffle_sublists(sublists, rand)
Shuffle the lists within a list but not the list itself.
>>> from nutsflow.common import StableRandom
>>> rand = StableRandom(0)
>>> sublists = [[1, 2, 3], [4, 5, 6, 7]]
>>> shuffle_sublists(sublists, rand)
>>> sublists
[[1, 3, 2], [4, 5, 7, 6]]
- Parameters
sublists – A list containing lists
rand (Random) – A random number generator.
-
upsample(samples, labelcol, rand=None)
Up-sample sample set.
Creates stratified samples by up-sampling smaller classes to the size of the largest class.
Note: The example shown below uses rnd.Random(i) to create a deterministic sequence of randomly stratified samples. Usually it is sufficient to use the default (rand=None).
>>> from __future__ import print_function
>>> import random as rnd
>>> samples = [('pos1', 1), ('pos2', 1), ('neg1', 0)]
>>> for i in range(3):
...     print(upsample(samples, 1, rand=rnd.Random(i)))
[('neg1', 0), ('neg1', 0), ('pos1', 1), ('pos2', 1)]
[('pos2', 1), ('neg1', 0), ('pos1', 1), ('neg1', 0)]
[('neg1', 0), ('neg1', 0), ('pos1', 1), ('pos2', 1)]
- Parameters
samples (iterable) – Iterable of samples where each sample has a label at a fixed position (labelcol). Labels can be any hashable type, e.g. int, str, bool
labelcol (int) – Index of label in sample
rand (Random|None) – Random number generator. If None, random.Random(None) is used.
- Returns
Stratified sample set.
- Return type
list of samples
nutsml.fileutil module
-
clear_folder(path)
Remove all content (files and folders) within the specified folder.
- Parameters
path (str) – Path of folder to clear.
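Clearing a folder while keeping the folder itself can be sketched with os and shutil. `clear_folder_sketch` is hypothetical; treating symbolic links to folders as files (removing the link, not its target) is a design choice of this sketch.

```python
import os
import shutil
import tempfile


def clear_folder_sketch(path):
    """Delete every file and sub-folder inside path, but keep path itself."""
    for name in os.listdir(path):
        child = os.path.join(path, name)
        if os.path.isdir(child) and not os.path.islink(child):
            shutil.rmtree(child)  # remove sub-folder with all its content
        else:
            os.remove(child)  # remove file or link


base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, 'sub'))
with open(os.path.join(base, 'file.txt'), 'w') as f:
    f.write('data')
clear_folder_sketch(base)
print(os.listdir(base))  # []
```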
-
create_folders(path, mode=511)
Create folder(s). Don't fail if they already exist.
See related functions delete_folders() and clear_folder().
-
create_temp_filepath(prefix='', ext='', relative=True)
Create a temporary folder under TEMP_FOLDER. If the folder already exists, do nothing. Return a relative (default) or absolute path to a temp file with a unique name.
See related function create_filename().
-
delete_file(path)
Remove file at given path. Don't fail if non-existing.
- Parameters
path (str) – Path to file to delete, e.g. ‘foo/bar/file.txt’
-
delete_folders(path)
Remove folder and sub-folders. Don't fail if non-existing or not empty.
- Parameters
path (str) – Path of folders to delete, e.g. ‘foo/bar’
-
reader_filepath(sample, filename, pathfunc)
Construct filepath from sample, filename and/or pathfunc.
Helper function used in ReadImage and ReadNumpy.
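The three pathfunc cases described in the parameters below (wildcard string, function, None) can be sketched as follows. `reader_filepath_sketch` is hypothetical, and it assumes that filename is the file id/name already taken from the sample.

```python
def reader_filepath_sketch(sample, filename, pathfunc):
    """Build a filepath from a wildcard template, a path function, or filename."""
    if isinstance(pathfunc, str):
        return pathfunc.replace('*', filename)  # fill wildcard with file name
    if callable(pathfunc):
        return pathfunc(sample)  # compute path from the whole sample
    return filename  # None: filename already is the filepath


sample = ('nut_grayscale', 2)
print(reader_filepath_sketch(sample, 'nut_grayscale', 'tests/data/img_formats/*.jpg'))
# tests/data/img_formats/nut_grayscale.jpg
```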
- Parameters
sample (tuple|list) – E.g. (‘nut_color’, 1)
filename –
pathfunc (string|function|None) – One of: a filepath with wildcard '*', which is replaced by the file id/name provided in the sample, e.g. 'tests/data/img_formats/*.jpg' for sample ('nut_grayscale', 2) becomes 'tests/data/img_formats/nut_grayscale.jpg'; a function to compute the path to the image file from the sample, e.g. lambda sample: 'tests/data/img_formats/{1}.jpg'.format(*sample); or None, in which case the filename is taken as the filepath.
- Returns
Filepath
nutsml.imageutil module
-
add_channel(image, channelfirst)
Add channel if missing and make it the first axis if requested.
>>> import numpy as np
>>> image = np.ones((10, 20))
>>> image = add_channel(image, True)
>>> shapestr(image)
'1x10x20'
- Parameters
image (ndarray) – RGB (h,w,3) or gray-scale image (h,w).
channelfirst (bool) – If True, make channel first axis
- Returns
Numpy array with channel (as first axis if channelfirst=True)
- Return type
numpy.array
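A NumPy sketch of the channel handling, with a hypothetical name (`add_channel_sketch`). It assumes that 3-dimensional inputs are channel-last (h,w,c), as in the parameter description above.

```python
import numpy as np


def add_channel_sketch(image, channelfirst):
    """Add a trailing channel axis to (h,w) images; optionally move it first."""
    if image.ndim == 2:
        image = image[..., np.newaxis]  # (h, w) -> (h, w, 1)
    if channelfirst:
        image = np.moveaxis(image, -1, 0)  # (h, w, c) -> (c, h, w)
    return image


print(add_channel_sketch(np.ones((10, 20)), True).shape)  # (1, 10, 20)
```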
-
annotation2coords(image, annotation)
Convert geometric annotation in image to pixel coordinates.
For example, given a rectangular region annotated in an image as ('rect', ((x, y, w, h))), the function returns the coordinates of all pixels within this region as (row, col) position tuples.
The following annotation formats are supported:
- ('point', ((x, y), …))
- ('circle', ((x, y, r), …))
- ('ellipse', ((x, y, rx, ry, rot), …))
- ('rect', ((x, y, w, h), …))
- ('polyline', (((x, y), (x, y), …), …))
Annotation regions can exceed the image dimensions and will be clipped. Note that the annotation is in x,y order while the output is r,c (row, col).
>>> import numpy as np
>>> img = np.zeros((5, 5), dtype='uint8')
>>> anno = ('point', ((1, 1), (1, 2)))
>>> for rr, cc in annotation2coords(img, anno):
...     print(list(rr), list(cc))
[1] [1]
[2] [1]
- Parameters
image (ndarray) – Image
annotation (annotation) – Annotation of an image region such as point, circle, rect or polyline
- Returns
Coordinates of pixels within the (clipped) region.
- Return type
generator over tuples (row, col)
-
annotation2mask(image, annotations, pos=255)
Convert geometric annotation to mask.
For annotation formats see: imageutil.annotation2coords
>>> import numpy as np
>>> img = np.zeros((3, 3), dtype='uint8')
>>> anno = ('point', ((0, 1), (2, 0)))
>>> annotation2mask(img, anno)
array([[  0,   0, 255],
       [255,   0,   0],
       [  0,   0,   0]], dtype=uint8)
- Parameters
annotations (annotation) – Annotation of an image region such as point, circle, rect or polyline
pos (int) – Value to write in mask for regions defined by annotation
image (numpy array) – Image the annotation refers to. The returned mask will be of the same size.
- Returns
Mask with annotation
- Return type
numpy array
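For point annotations, building the mask only requires swapping (x, y) to (row, col) and clipping to the image. `point_mask_sketch` is hypothetical and handles the 'point' format only; it reproduces the doctest above.

```python
import numpy as np


def point_mask_sketch(image, annotation, pos=255):
    """Mask for ('point', ((x, y), ...)) annotations; other formats unsupported."""
    kind, points = annotation
    if kind != 'point':
        raise ValueError('this sketch only supports point annotations')
    mask = np.zeros(image.shape[:2], dtype=image.dtype)
    h, w = mask.shape
    for x, y in points:
        if 0 <= y < h and 0 <= x < w:  # clip points outside the image
            mask[y, x] = pos  # annotation is (x, y), array index is (row, col)
    return mask


img = np.zeros((3, 3), dtype='uint8')
print(point_mask_sketch(img, ('point', ((0, 1), (2, 0)))))
```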
-
annotation2pltpatch(annotation, **kwargs)
Convert geometric annotation to matplotlib geometric objects (=patches).
For details regarding matplotlib patches see: http://matplotlib.org/api/patches_api.html
For annotation formats see: imageutil.annotation2coords
- Parameters
annotation (annotation) – Annotation of an image region such as point, circle, rect or polyline
- Returns
matplotlib.patches
- Return type
generator over matplotlib patches
-
arr_to_pil(image)
Convert numpy array to PIL image.
>>> import numpy as np
>>> rgb_arr = np.ones((5, 4, 3), dtype='uint8')
>>> pil_img = arr_to_pil(rgb_arr)
>>> pil_img.size
(4, 5)
- Parameters
image (ndarray) – Numpy array with dtype ‘uint8’ and dimensions (h,w,c) for RGB or (h,w) for gray-scale images.
- Returns
PIL image
- Return type
PIL.Image
-
centers_inside(centers, image, pshape)
Filter center points of patches, ensuring that each patch is inside the image.
>>> centers = np.array([[1, 2], [0, 1]])
>>> image = np.zeros((3, 4))
>>> centers_inside(centers, image, (3, 3)).astype('uint8')
array([[1, 2]], dtype=uint8)
- Parameters
centers (ndarray(n,2)) – Center points of patches.
image (ndarray(h,w)) – Image the patches should be inside.
pshape (tuple) – Patch shape of form (h,w)
- Returns
Patch centers where the patch is completely inside the image.
- Return type
ndarray of shape (n, 2)
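A vectorized NumPy sketch of the filter, with a hypothetical name (`centers_inside_sketch`). How the center is placed for even patch sizes (top-left offset of ph // 2, pw // 2) is an assumption; for the odd patch size in the doctest it gives the same result.

```python
import numpy as np


def centers_inside_sketch(centers, image, pshape):
    """Keep centers (r, c) whose patch of shape (ph, pw) lies inside the image."""
    centers = np.asarray(centers)
    ph, pw = pshape
    h, w = image.shape[:2]
    top = centers[:, 0] - ph // 2  # first patch row
    left = centers[:, 1] - pw // 2  # first patch column
    inside = (top >= 0) & (left >= 0) & (top + ph <= h) & (left + pw <= w)
    return centers[inside]


print(centers_inside_sketch(np.array([[1, 2], [0, 1]]), np.zeros((3, 4)), (3, 3)))
# [[1 2]]
```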
-
change_brightness(image, brightness=1.0)
Change brightness of image.
>>> image = np.eye(3, dtype='uint8') * 255
>>> change_brightness(image, 0.5)
array([[127,   0,   0],
       [  0, 127,   0],
       [  0,   0, 127]], dtype=uint8)
See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Brightness
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype 'uint8'.
brightness (float) – Brightness [0, 1]
- Returns
Image with changed brightness
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
change_color(image, color=1.0)
Change color of image.
>>> image = np.eye(3, dtype='uint8') * 255
>>> change_color(image, 0.5)
array([[255,   0,   0],
       [  0, 255,   0],
       [  0,   0, 255]], dtype=uint8)
See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Color
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype 'uint8'.
color (float) – Color [0, 1]
- Returns
Image with changed color
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
change_contrast(image, contrast=1.0)
Change contrast of image.
>>> image = np.eye(3, dtype='uint8') * 255
>>> change_contrast(image, 0.5)
array([[170,  42,  42],
       [ 42, 170,  42],
       [ 42,  42, 170]], dtype=uint8)
See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Contrast
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype 'uint8'.
contrast (float) – Contrast [0, 1]
- Returns
Image with changed contrast
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
change_sharpness(image, sharpness=1.0)
Change sharpness of image.
>>> image = np.eye(3, dtype='uint8') * 255
>>> change_sharpness(image, 0.5)
array([[255,   0,   0],
       [  0, 196,   0],
       [  0,   0, 255]], dtype=uint8)
See http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html#PIL.ImageEnhance.Sharpness
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype 'uint8'.
sharpness (float) – Sharpness [0, …]
- Returns
Image with changed sharpness
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
crop(image, x1, y1, x2, y2)
Crop image.
>>> import numpy as np
>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> crop(image, 1, 2, 5, 5)
array([[ 9, 10, 11],
       [13, 14, 15]], dtype=uint8)
- Parameters
- Returns
Cropped image
- Return type
numpy array
-
crop_center(image, w, h)
Crop region with size w, h from center of image.
Note that the crop is specified via w, h and not via shape (h,w). Furthermore, if the image or the crop region has even dimensions, coordinates are rounded down.
>>> import numpy as np
>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> crop_center(image, 3, 2)
array([[ 4,  5,  6],
       [ 8,  9, 10]], dtype=uint8)
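Center cropping reduces to computing a rounded-down offset and slicing. `crop_center_sketch` is hypothetical; with the same input it returns the same region as the doctest above.

```python
import numpy as np


def crop_center_sketch(image, w, h):
    """Crop a w x h region from the image center, rounding offsets down."""
    ih, iw = image.shape[:2]
    x1, y1 = (iw - w) // 2, (ih - h) // 2  # top-left corner of the crop
    return image[y1:y1 + h, x1:x1 + w]


image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
print(crop_center_sketch(image, 3, 2))
```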
-
crop_square
(image)[source]¶ Crop image to square shape.
Crops symmetrically left and right or top and bottom to achieve an aspect ratio of one and preserves the smallest dimension.
- Parameters
image (numpy array) – Numpy array.
- Returns
Cropped image
- Return type
numpy array
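A square crop of an h x w image keeps min(h, w) pixels along both axes, so only the longer side is shortened. A minimal NumPy sketch, assuming that an odd remainder is handled by removing the extra pixel at the bottom or right; nutsml's exact rounding may differ, and `crop_square_sketch` is a hypothetical name.

```python
import numpy as np


def crop_square_sketch(image):
    """Hypothetical sketch: crop symmetrically to a square of side min(h, w)."""
    h, w = image.shape[:2]
    side = min(h, w)
    # Remove the excess evenly from both ends; odd remainders are rounded down.
    r0 = (h - side) // 2
    c0 = (w - side) // 2
    return image[r0:r0 + side, c0:c0 + side]


wide = np.reshape(np.arange(15), (3, 5))
tall = np.reshape(np.arange(12), (6, 2))
print(crop_square_sketch(wide).tolist())  # [[1, 2, 3], [6, 7, 8], [11, 12, 13]]
print(crop_square_sketch(tall).tolist())  # [[4, 5], [6, 7]]
```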
-
distort_elastic
(image, smooth=10.0, scale=100.0, seed=0)[source]¶ Elastic distortion of images.
Channel axis in RGB images will not be distorted but grayscale or RGB images are both valid inputs. RGB and grayscale images will be distorted identically for the same seed.
Simard et al., “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.
- Parameters
- Returns
Distorted image with same shape as input image.
- Return type
ndarray
-
enhance
(image, func, *args, **kwargs)[source]¶ Enhance image using a PIL enhance function
See the following link for details on PIL enhance functions: http://pillow.readthedocs.io/en/3.1.x/reference/ImageEnhance.html
>>> from PIL.ImageEnhance import Brightness
>>> image = np.ones((3,2), dtype='uint8')
>>> enhance(image, Brightness, 0.0)
array([[0, 0],
       [0, 0],
       [0, 0]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
func (function) – PIL ImageEnhance function
args (args) – Argument list passed on to enhance function.
kwargs (kwargs) – Keyword arguments passed on to enhance function.
- Returns
Enhanced image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
extract_edges
(image, sigma)[source]¶ Extract edges using the Canny algorithm.
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
sigma (float) – Standard deviation of the Gaussian filter.
- Returns
Binary image with extracted edges
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
extract_patch
(image, pshape, r, c)[source]¶ Extract a patch of shape pshape, centered at r,c, from image.
Note that there is no checking if the patch region is inside the image.
>>> image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
>>> extract_patch(image, (2, 3), 2, 2)
array([[ 5,  6,  7],
       [ 9, 10, 11]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array.
pshape (tuple) – Patch shape of form (h,w)
r (int) – Row of the patch center.
c (int) – Column of the patch center.
- Returns
numpy array with shape pshape
- Return type
numpy array with range [0,255] and dtype ‘uint8’
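The centering rule can be sketched as a NumPy slice. This is a minimal illustration, assuming the top-left corner is (r - h // 2, c - w // 2), which matches the doctest above; `extract_patch_sketch` is a hypothetical name, not nutsml's implementation.

```python
import numpy as np


def extract_patch_sketch(image, pshape, r, c):
    """Hypothetical sketch: patch of shape pshape=(h, w) centered at row r, column c."""
    h, w = pshape
    # Top-left corner; for even sizes the center (r, c) falls in the lower/right half.
    r0, c0 = r - h // 2, c - w // 2
    # As in the documented function there is no bounds checking: a negative r0 or c0
    # silently yields a wrong or empty patch instead of raising an error.
    return image[r0:r0 + h, c0:c0 + w]


image = np.reshape(np.arange(16, dtype='uint8'), (4, 4))
print(extract_patch_sketch(image, (2, 3), 2, 2).tolist())  # [[5, 6, 7], [9, 10, 11]]
```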
-
fliplr
(image)[source]¶ Flip image left to right.
>>> image = np.reshape(np.arange(4, dtype='uint8'), (2,2))
>>> fliplr(image)
array([[1, 0],
       [3, 2]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
- Returns
Flipped image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
flipud
(image)[source]¶ Flip image up to down.
>>> image = np.reshape(np.arange(4, dtype='uint8'), (2,2))
>>> flipud(image)
array([[2, 3],
       [0, 1]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
- Returns
Flipped image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
floatimg2uint8
(image)[source]¶ Convert array with floats to ‘uint8’ and rescale from [0,1] to [0,255].
Converts only if image.dtype != uint8.
>>> import numpy as np
>>> image = np.eye(10, 20, dtype=float)
>>> arr = floatimg2uint8(image)
>>> np.max(arr)
255
- Parameters
image (numpy.array) – Numpy array with range [0,1]
- Returns
Numpy array with range [0,255] and dtype ‘uint8’
- Return type
numpy array
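Scaling by 255 (not 256) is what maps 1.0 to the largest uint8 value; 1.0 * 256 would not fit into uint8. A minimal sketch, assuming values are truncated rather than rounded when cast; `floatimg2uint8_sketch` is a hypothetical name.

```python
import numpy as np


def floatimg2uint8_sketch(image):
    """Hypothetical sketch: map a float image in [0, 1] to uint8 in [0, 255]."""
    if image.dtype == np.uint8:
        return image  # already uint8, returned unchanged
    # astype truncates toward zero, e.g. 0.5 * 255 = 127.5 becomes 127.
    return (image * 255).astype('uint8')


arr = floatimg2uint8_sketch(np.eye(10, 20, dtype=float))
print(arr.dtype, arr.max())  # uint8 255
```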
-
gray2rgb
(image)[source]¶ Convert grayscale image to RGB image.
>>> image = np.eye(3, dtype='uint8') * 255
>>> gray2rgb(image)
array([[[255, 255, 255],
        [  0,   0,   0],
        [  0,   0,   0]],
<BLANKLINE>
       [[  0,   0,   0],
        [255, 255, 255],
        [  0,   0,   0]],
<BLANKLINE>
       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255, 255]]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
- Returns
RGB image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
identical
(image)[source]¶ Return input image unchanged.
- Parameters
image (numpy.array) – Should be a numpy array of an image.
- Returns
Same as input
- Return type
Same as input
-
load_image
(filepath, as_grey=False, dtype='uint8', no_alpha=True)[source]¶ Load image as numpy array from given filepath.
Supported formats: gif, png, jpg, bmp, tif, npy
>>> img = load_image('tests/data/img_formats/nut_color.jpg')
>>> shapestr(img)
'213x320x3'
- Parameters
filepath (string) – Filepath to image file or numpy array.
as_grey (bool) – If True, load as grayscale image.
dtype (dtype) – Numpy data type of the image.
no_alpha (bool) – If True, the alpha channel of RGBA images is removed.
- Returns
numpy array with shape (h, w) for grayscale or monochrome images, (h, w, 3) for RGB images (3 color channels in last axis), (h, w, 4) for RGBA images if no_alpha = False, and (h, w, 3) for RGBA images if no_alpha = True. Pixel values are in range [0,255] for dtype = uint8.
- Return type
numpy ndarray
-
mask_choice
(mask, value, n)[source]¶ Random selection of n points where mask has given value.
>>> np.random.seed(1)  # ensure same random selection for doctest
>>> mask = np.eye(3, dtype='uint8')
>>> mask_choice(mask, 1, 2).tolist()
[[0, 0], [2, 2]]
- Parameters
mask (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
value (int) – Mask value of the points to select.
n (int) – Number of points to select. If n is larger than the number of available points, only the available points will be returned.
- Returns
Array with x,y coordinates
- Return type
numpy array with shape nx2 where each row contains x, y
-
mask_where
(mask, value)[source]¶ Return x,y coordinates where mask has specified value.
>>> mask = np.eye(3, dtype='uint8')
>>> mask_where(mask, 1).tolist()
[[0, 0], [1, 1], [2, 2]]
- Parameters
mask (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
value (int) – Mask value of the points to return.
- Returns
Array with x,y coordinates
- Return type
numpy array with shape Nx2 where each row contains x, y
-
normalize_histo
(image, gamma=1.0)[source]¶ Perform histogram normalization on image.
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
gamma (float) – Factor for gamma adjustment.
- Returns
Normalized image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
occlude
(image, x, y, w, h, color=0)[source]¶ Occlude image with a rectangular region.
Occludes an image region with dimensions w,h centered on x,y with the given color. Invalid x,y coordinates will be clipped to ensure complete occlusion rectangle is within the image.
>>> import numpy as np
>>> image = np.ones((4, 5)).astype('uint8')
>>> occlude(image, 2, 2, 2, 3)
array([[1, 1, 1, 1, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 0, 1, 1]], dtype=uint8)
>>> image = np.ones((4, 4)).astype('uint8')
>>> occlude(image, 0.5, 0.5, 0.5, 0.5)
array([[1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 1]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array.
x (int|float) – x coordinate for center of occlusion region. Can be provided as fraction (float) of image width
y (int|float) – y coordinate for center of occlusion region. Can be provided as fraction (float) of image height
w (int|float) – width of occlusion region. Can be provided as fraction (float) of image width
h (int|float) – height of occlusion region. Can be provided as fraction (float) of image height
color (int|tuple) – gray-scale or RGB color of occlusion.
- Returns
Copy of input image with occluded region.
- Return type
numpy array
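The fraction handling and clipping described above can be sketched as follows. This is a minimal illustration, assuming that any float argument is a fraction of the image size (truncated to whole pixels) and that the rectangle's top-left corner is (x - w // 2, y - h // 2); with these assumptions it reproduces both doctests, but it is not nutsml's code and `occlude_sketch` is a hypothetical name.

```python
import numpy as np


def occlude_sketch(image, x, y, w, h, color=0):
    """Hypothetical sketch: fill a w x h rectangle centered on (x, y) with color."""
    ih, iw = image.shape[:2]

    def to_pixels(value, size):
        # Assumption: any float is read as a fraction of the image size.
        return int(value * size) if isinstance(value, float) else int(value)

    x, w = to_pixels(x, iw), to_pixels(w, iw)
    y, h = to_pixels(y, ih), to_pixels(h, ih)
    # Clip the top-left corner so the whole rectangle stays inside the image.
    x0 = min(max(x - w // 2, 0), iw - w)
    y0 = min(max(y - h // 2, 0), ih - h)
    result = image.copy()  # the input image is left untouched
    result[y0:y0 + h, x0:x0 + w] = color  # an RGB tuple broadcasts over channels
    return result


print(occlude_sketch(np.ones((4, 5), dtype='uint8'), 2, 2, 2, 3).tolist())
print(occlude_sketch(np.ones((4, 4), dtype='uint8'), 0.5, 0.5, 0.5, 0.5).tolist())
```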
-
patch_iter
(image, shape=(3, 3), stride=1)[source]¶ Extracts patches from images with given shape.
Patches are extracted in a regular grid with the given stride, starting in the upper left corner and then row-wise. Image can be gray-scale (no third channel dim) or color.
>>> import numpy as np
>>> img = np.reshape(np.arange(12), (3, 4))
>>> for p in patch_iter(img, (2, 2), 2):
...     print(p)
[[0 1]
 [4 5]]
[[2 3]
 [6 7]]
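The grid traversal described above can be sketched as a generator. This is a minimal sketch, assuming that patches which would extend past the image border are skipped (consistent with the doctest, where row 2 yields no patch); `patch_iter_sketch` is a hypothetical name.

```python
import numpy as np


def patch_iter_sketch(image, shape=(3, 3), stride=1):
    """Hypothetical sketch: yield patches on a regular grid, row by row."""
    h, w = shape
    ih, iw = image.shape[:2]
    # Only patches that fit completely inside the image are produced.
    for r in range(0, ih - h + 1, stride):
        for c in range(0, iw - w + 1, stride):
            yield image[r:r + h, c:c + w]  # works for gray-scale and color images


img = np.reshape(np.arange(12), (3, 4))
patches = [p.tolist() for p in patch_iter_sketch(img, (2, 2), 2)]
print(patches)  # [[[0, 1], [4, 5]], [[2, 3], [6, 7]]]
```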
-
pil_to_arr
(image)[source]¶ Convert PIL image to Numpy array.
>>> import numpy as np
>>> rgb_arr = np.ones((5, 4, 3), dtype='uint8')
>>> pil_img = arr_to_pil(rgb_arr)
>>> arr = pil_to_arr(pil_img)
>>> shapestr(arr)
'5x4x3'
- Parameters
image (PIL.Image) – PIL image (RGB or grayscale)
- Returns
Numpy array
- Return type
numpy.array with dtype ‘uint8’
-
polyline2coords
(points)[source]¶ Return row and column coordinates for a polyline.
>>> rr, cc = polyline2coords([(0, 0), (2, 2), (2, 4)])
>>> list(rr)
[0, 1, 2, 2, 3, 4]
>>> list(cc)
[0, 1, 2, 2, 2, 2]
- Parameters
points (list of tuple) – Polyline in format [(x1,y1), (x2,y2), …]
- Returns
tuple with row and column coordinates in numpy arrays
- Return type
tuple of numpy array
-
rerange
(image, old_min, old_max, new_min, new_max, dtype)[source]¶ Return image with values in new range.
Note: The default range of images is [0, 255] and most image processing functions expect this range and will fail otherwise. However, re-ranged images, e.g. [-1, +1], are sometimes needed as input to neural networks.
>>> import numpy as np
>>> image = np.array([[0, 255], [255, 0]])
>>> rerange(image, 0, 255, -1, +1, 'float32')
array([[-1.,  1.],
       [ 1., -1.]], dtype=float32)
- Parameters
image (numpy.array) – Should be a numpy array of an image.
old_min (int|float) – Current minimum value of image, e.g. 0
old_max (int|float) – Current maximum value of image, e.g. 255
new_min (int|float) – New minimum, e.g. -1.0
new_max (int|float) – New maximum, e.g. +1.0
dtype (numpy datatype) – Data type of output image, e.g. ‘float32’ or np.uint8
- Returns
Image with values in new range.
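Re-ranging is a linear map: a value v becomes (v - old_min) * (new_max - new_min) / (old_max - old_min) + new_min. A minimal sketch of that formula, assuming plain linear scaling without clipping; `rerange_sketch` is a hypothetical name, not nutsml's implementation.

```python
import numpy as np


def rerange_sketch(image, old_min, old_max, new_min, new_max, dtype):
    """Hypothetical sketch: linearly map [old_min, old_max] onto [new_min, new_max]."""
    image = np.asarray(image, dtype='float64')
    # Multiply before dividing so the end points map exactly, e.g. 255 * 2 / 255 == 2.
    scaled = (image - old_min) * (new_max - new_min) / (old_max - old_min)
    return (scaled + new_min).astype(dtype)


image = np.array([[0, 255], [255, 0]])
net_input = rerange_sketch(image, 0, 255, -1, +1, 'float32')
print(net_input.tolist())  # [[-1.0, 1.0], [1.0, -1.0]]
```

Mapping back with the ranges swapped restores the original values, which is handy for viewing network outputs as images.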
-
resize
(image, w, h, anti_aliasing=False, **kwargs)[source]¶ Resize image.
Image can be up- or down-sized (using interpolation). For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.resize
>>> image = np.ones((10,5), dtype='uint8')
>>> resize(image, 4, 3)
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=uint8)
- Parameters
- Returns
Resized image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
rgb2gray
(image)[source]¶ Convert RGB image to grayscale image.
>>> image = np.eye(3, dtype='uint8') * 255
>>> rgb2gray(image)
array([[255,   0,   0],
       [  0, 255,   0],
       [  0,   0, 255]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
- Returns
grayscale image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
rotate
(image, angle=0, **kwargs)[source]¶ Rotate image.
For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.rotate
For a smooth interpolation of images set ‘order=1’. To rotate masks use the default ‘order=0’.
>>> image = np.eye(3, dtype='uint8')
>>> rotate(image, 90)
array([[0, 0, 1],
       [0, 1, 0],
       [1, 0, 0]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
angle (float) – Angle in degrees in counter-clockwise direction
kwargs (kwargs) – Keyword arguments for the underlying scikit-image rotate function, e.g. order=1 for linear interpolation.
- Returns
Rotated image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
sample_labeled_patch_centers
(mask, value, pshape, n, label)[source]¶ Randomly pick n points in mask where mask has given value and add label.
Same as imageutil.sample_mask but adds given label to each center
>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 1
>>> sample_labeled_patch_centers(mask, 1, (1, 1), 1, 0)
array([[1, 2, 0]], dtype=uint16)
- Parameters
mask (ndarray) – Mask
value (int) – Sample points in mask that have this value.
pshape (tuple) – Patch shape of form (h,w)
n (int) – Number of points to sample. If there are not enough points to sample from, a smaller number will be returned. If there are no points at all, np.empty((0, 2)) will be returned.
label (int) – Numeric label to append to each center point
- Returns
Center points of patches within the mask where the center point has the given mask value and the label
- Return type
ndarray of shape (n, 3)
-
sample_mask
(mask, value, pshape, n)[source]¶ Randomly pick n points in mask where mask has given value.
Ensures that only points are picked that can be the center of a patch with shape pshape lying completely inside the mask.
>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 1
>>> sample_mask(mask, 1, (1, 1), 1)
array([[1, 2]], dtype=uint16)
- Parameters
mask (ndarray) – Mask
value (int) – Sample points in mask that have this value.
pshape (tuple) – Patch shape of form (h,w)
n (int) – Number of points to sample. If there are not enough points to sample from, a smaller number will be returned. If there are no points at all, np.empty((0, 2)) will be returned.
- Returns
Center points of patches within the mask where the center point has the given mask value.
- Return type
ndarray of shape (n, 2)
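The sampling constraint can be sketched with NumPy: find all points with the given value, keep those whose patch fits inside the mask, then draw without replacement. This is a minimal sketch, assuming the patch's top-left corner is (r - h // 2, c - w // 2) as in extract_patch, and using NumPy's Generator API (NumPy 1.17 or later) rather than the global np.random seed used in the doctests, so it will not reproduce nutsml's exact random picks. `sample_mask_sketch` and its `rng` parameter are hypothetical.

```python
import numpy as np


def sample_mask_sketch(mask, value, pshape, n, rng=None):
    """Hypothetical sketch: pick up to n (row, col) centers with mask == value."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = pshape
    mh, mw = mask.shape[:2]
    points = np.argwhere(mask == value)  # (row, col) pairs
    # Keep only centers whose whole patch lies inside the mask.
    r0 = points[:, 0] - h // 2
    c0 = points[:, 1] - w // 2
    inside = (r0 >= 0) & (c0 >= 0) & (r0 + h <= mh) & (c0 + w <= mw)
    points = points[inside]
    if len(points) == 0:
        return np.empty((0, 2), dtype='uint16')
    idx = rng.choice(len(points), size=min(n, len(points)), replace=False)
    return points[idx].astype('uint16')


mask = np.zeros((3, 4))
mask[1, 2] = 1
print(sample_mask_sketch(mask, 1, (1, 1), 1))  # [[1 2]]
```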
-
sample_patch_centers
(mask, pshape, npos, nneg, pos=255, neg=0)[source]¶ Sample positive and negative patch centers where mask value is pos or neg.
The sampling routine ensures that the patch is completely inside the mask.
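>>> np.random.seed(0)  # just to ensure consistent doctest
>>> mask = np.zeros((3, 4))
>>> mask[1, 2] = 255
>>> sample_patch_centers(mask, (2, 2), 1, 1)
array([[1, 1, 0],
       [1, 2, 1]], dtype=uint16)
- Parameters
mask (ndarray) – Mask
pshape (tuple) – Patch shape of form (h,w)
npos (int) – Number of positive patch centers to sample.
nneg (int) – Number of negative patch centers to sample.
pos (int) – Mask value of positive points.
neg (int) – Mask value of negative points.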
- Returns
Center points of patches within the mask where the center point has the given mask value (pos, neg) and the label (1, 0)
- Return type
ndarray of shape (n, 3)
-
sample_pn_patches
(image, mask, pshape, npos, nneg, pos=255, neg=0)[source]¶ Sample positive and negative patches where mask value is pos or neg.
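The sampling routine ensures that the patch is completely inside the image and mask and that a patch at the same position is extracted from the image and the mask.
>>> np.random.seed(0)  # just to ensure consistent doctest
>>> mask = np.zeros((3, 4), dtype='uint8')
>>> img = np.reshape(np.arange(12, dtype='uint8'), (3, 4))
>>> mask[1, 2] = 255
>>> for ip, mp, l in sample_pn_patches(img, mask, (2, 2), 1, 1):
...     print(ip)
...     print(mp)
...     print(l)
[[0 1]
 [4 5]]
[[0 0]
 [0 0]]
0
[[1 2]
 [5 6]]
[[  0   0]
 [  0 255]]
1
- Parameters
image (ndarray) – Image to extract patches from.
mask (ndarray) – Mask
pshape (tuple) – Patch shape of form (h,w)
npos (int) – Number of positive patches to sample.
nneg (int) – Number of negative patches to sample.
pos (int) – Mask value of positive points.
neg (int) – Mask value of negative points.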
- Returns
Image and mask patches where the patch center point has the given mask value (pos, neg) and the label (1, 0)
- Return type
tuple(image_patch, mask_patch, label)
-
save_image
(filepath, image)[source]¶ Save numpy array as image (or numpy array) to given filepath.
Supported formats: gif, png, jpg, bmp, tif, npy
- Parameters
filepath (string) – File path for image file. Extension determines image file format, e.g. .gif
image (numpy array) – Numpy array to save as image. Must be of shape (h,w) or (h,w,3) or (h,w,4).
-
set_default_order
(kwargs)[source]¶ Set order parameter in kwargs for scikit-image functions.
Default order is 1, which performs a linear interpolation of pixel values when images are rotated, resized and sheared. This is fine for images but causes unwanted pixel values in masks. This function sets the default order to 0, which disables the interpolation.
- Parameters
kwargs (kwargs) – Dictionary with keyword arguments.
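The behavior described above amounts to filling in 'order' only when the caller did not choose one. A minimal sketch, assuming the helper modifies the keyword dictionary in place; `set_default_order_sketch` and the wrapper `rotate_like` are hypothetical names used for illustration.

```python
def set_default_order_sketch(kwargs):
    """Hypothetical sketch: use nearest-neighbour (order=0) unless an order was given."""
    kwargs.setdefault('order', 0)  # modifies the dict in place


def rotate_like(image, angle=0, **kwargs):
    """Hypothetical wrapper showing where the helper would be called."""
    set_default_order_sketch(kwargs)
    return kwargs  # a real wrapper would forward kwargs to scikit-image here


print(rotate_like(None, 45))           # {'order': 0}
print(rotate_like(None, 45, order=1))  # {'order': 1}
```

Masks therefore keep their original label values by default, while images can still be interpolated smoothly by passing order=1 explicitly.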
-
shear
(image, shear_factor, **kwargs)[source]¶ Shear image.
For details see: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.AffineTransform
>>> image = np.eye(3, dtype='uint8')
>>> sheared = shear(image, 0.5)
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
shear_factor (float) – Shear factor [0, 1]
kwargs (kwargs) – Keyword arguments for the underlying scikit-image warp function, e.g. order=1 for linear interpolation.
- Returns
Sheared image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
-
translate
(image, dx, dy, **kwargs)[source]¶ Shift image horizontally and vertically
>>> image = np.eye(3, dtype='uint8') * 255
>>> translate(image, 2, 1)
array([[  0,   0,   0],
       [  0,   0, 255],
       [  0,   0,   0]], dtype=uint8)
- Parameters
image (numpy array) – Numpy array with range [0,255] and dtype ‘uint8’.
dx – horizontal translation in pixels
dy – vertical translation in pixels
kwargs (kwargs) – Keyword arguments for the underlying scikit-image warp function, e.g. order=1 for linear interpolation.
- Returns
translated image
- Return type
numpy array with range [0,255] and dtype ‘uint8’
nutsml.logger module¶
-
class
LogCols
(filepath, cols=None, colnames=None, reset=True, delimiter=',')[source]¶ Bases:
nutsml.logger.LogToFile
-
__init__
(filepath, cols=None, colnames=None, reset=True, delimiter=',')[source]¶ Construct logger.
>>> from __future__ import print_function
>>> from nutsflow import Consume
>>> filepath = 'tests/data/temp_logfile.csv'
>>> data = [[1, 2], [3, 4]]
>>> with LogToFile(filepath) as logtofile:
...     data >> logtofile >> Consume()
>>> print(open(filepath).read())
1,2
3,4
>>> logtofile = LogToFile(filepath, cols=(1, 0), colnames=['a', 'b'])
>>> data >> logtofile >> Consume()
>>> print(open(filepath).read())
a,b
2,1
4,3
>>> logtofile.close()
>>> logtofile.delete()
- Parameters
filepath (string) – Path to file to write log to.
cols (int|tuple|None) – Indices of columns of input data to write. None: write all columns; int: only write the single given column; tuple: list of column indices.
colnames (tuple|None) – Column names to write in first line. If None, no column names are written.
reset (bool) – If True, the log file is overwritten when the logger is recreated. Otherwise, log data is appended to the log file.
delimiter (str) – Delimiter for columns in log file.
-
-
class
LogToFile
(filepath, cols=None, colnames=None, reset=True, delimiter=',')[source]¶ Bases:
nutsflow.base.NutFunction
Log columns of data to file.
-
__call__
(x)[source]¶ Log x
- Parameters
x (any) – Any type of data. Special support for numpy arrays.
- Returns
Return input unchanged
- Return type
Same as input
-
__init__
(filepath, cols=None, colnames=None, reset=True, delimiter=',')[source]¶ Construct logger.
>>> from __future__ import print_function
>>> from nutsflow import Consume
>>> filepath = 'tests/data/temp_logfile.csv'
>>> data = [[1, 2], [3, 4]]
>>> with LogToFile(filepath) as logtofile:
...     data >> logtofile >> Consume()
>>> print(open(filepath).read())
1,2
3,4
>>> logtofile = LogToFile(filepath, cols=(1, 0), colnames=['a', 'b'])
>>> data >> logtofile >> Consume()
>>> print(open(filepath).read())
a,b
2,1
4,3
>>> logtofile.close()
>>> logtofile.delete()
- Parameters
filepath (string) – Path to file to write log to.
cols (int|tuple|None) – Indices of columns of input data to write. None: write all columns; int: only write the single given column; tuple: list of column indices.
colnames (tuple|None) – Column names to write in first line. If None, no column names are written.
reset (bool) – If True, the log file is overwritten when the logger is recreated. Otherwise, log data is appended to the log file.
delimiter (str) – Delimiter for columns in log file.
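The column selection described by cols and colnames can be sketched with the standard csv module. This is a minimal sketch, not nutsml's class: it handles a whole iterable at once instead of keeping the file open across calls (no close() or delete()), writes the header whenever colnames is given, and yields each row unchanged to mimic the pass-through behavior of __call__. `log_to_file_sketch` is a hypothetical name.

```python
import csv
import os
import tempfile


def log_to_file_sketch(filepath, rows, cols=None, colnames=None, reset=True, delimiter=','):
    """Hypothetical sketch: write selected columns of each row to a CSV log file."""
    mode = 'w' if reset else 'a'  # reset=True truncates, otherwise append
    with open(filepath, mode, newline='') as f:
        writer = csv.writer(f, delimiter=delimiter, lineterminator='\n')
        if colnames is not None:
            writer.writerow(colnames)
        for row in rows:
            if cols is None:
                selected = list(row)  # all columns
            elif isinstance(cols, int):
                selected = [row[cols]]  # a single column
            else:
                selected = [row[i] for i in cols]  # columns in the given order
            writer.writerow(selected)
            yield row  # pass the input through unchanged


filepath = os.path.join(tempfile.mkdtemp(), 'temp_logfile.csv')
data = [[1, 2], [3, 4]]
passed = list(log_to_file_sketch(filepath, data, cols=(1, 0), colnames=['a', 'b']))
with open(filepath) as f:
    content = f.read()
print(content)
```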
-
nutsml.network module¶
-
EvalNut
(batches, network, metrics, compute, predcol=None)[source]¶ batches >> EvalNut(network, metrics)
Create nut to evaluate network performance for given metrics. Returned when network.evaluate() is called.
- Parameters
batches (iterable over batches) – Batches to evaluate
network (nutsml.Network) – Network to evaluate.
metrics (list of functions) – List of functions that compute some metric, e.g. accuracy, F1, kappa-score. Each metric function must take vectors with true and predicted classes/probabilities and must compute the metric over the entire input (not per sample/mini-batch).
compute (function) – Function of the form f(metric, targets, preds) that computes the given metric (e.g. mean accuracy) for the given targets and predictions.
predcol (int|None) – Index of column in prediction to extract for evaluation. If None a single prediction output is expected.
- Returns
Result(s) of evaluation, e.g. accuracy, precision, …
- Return type
float or tuple of floats if there is more than one metric
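Computing each metric over the entire input matters because averaging per-batch values is wrong for metrics such as F1 or kappa, and even for accuracy when batch sizes differ. A minimal sketch of that idea, assuming each batch is a simple (inputs, targets) pair of lists rather than the [[inputs], [outputs]] format built by BuildBatch; `evaluate_sketch`, `accuracy` and `identity_predict` are hypothetical.

```python
def evaluate_sketch(batches, predict, metrics):
    """Hypothetical sketch: collect all targets and predictions, then apply each metric once."""
    targets, preds = [], []
    for inputs, outputs in batches:
        targets.extend(outputs)
        preds.extend(predict(inputs))
    results = tuple(metric(targets, preds) for metric in metrics)
    return results[0] if len(results) == 1 else results


def accuracy(targets, preds):
    """Hypothetical metric: fraction of matching labels."""
    return sum(t == p for t, p in zip(targets, preds)) / len(targets)


def identity_predict(inputs):
    """Hypothetical stand-in for a network that predicts its input."""
    return list(inputs)


batches = [([1, 0, 1], [1, 0, 0]), ([1], [1])]
print(evaluate_sketch(batches, identity_predict, [accuracy]))  # 0.75
```

Averaging the two per-batch accuracies instead would give (2/3 + 1) / 2, about 0.83, which overweights the single-sample batch.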
-
class
KerasNetwork
(model, weightspath='weights_keras_net.hd5')[source]¶ Bases:
nutsml.network.Network
Wrapper for Keras models: https://keras.io/
-
__init__
(model, weightspath='weights_keras_net.hd5')[source]¶ Construct wrapper around Keras model.
- Parameters
model (Keras model) – Keras model to wrap. See https://keras.io/models/sequential/ and https://keras.io/models/model/
weightspath (string) – Filepath to save/load model weights.
-
evaluate
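(metrics, predcol=None)[source]¶ Evaluate performance of network for given metrics
>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])
- Parameters
metrics (list of functions) – List of functions that compute some metric, e.g. accuracy or F1, over the entire input (see EvalNut).
predcol (int|None) – Index of column in prediction to extract for evaluation. If None, a single prediction output is expected.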
- Returns
Result for each metric as a tuple or a single float if there is only one metric.
-
load_weights
(weightspath=None)[source]¶ Load network weights.
network.load_weights()
- Parameters
weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
-
predict
(flatten=True)[source]¶ Get network predictions
>>> predictions = samples >> batcher >> network.predict() >> Collect()
- Parameters
flatten (bool) – True: return individual predictions instead of batched prediction
- Returns
Typically returns softmax class probabilities.
- Return type
ndarray
-
save_weights
(weightspath=None)[source]¶ Save network weights.
network.save_weights()
- Parameters
weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
-
-
class
LasagneNetwork
(out_layer, train_fn, val_fn, pred_fn, weightspath='weights_lasagne_net.npz')[source]¶ Bases:
nutsml.network.Network
Wrapper for Lasagne models: https://lasagne.readthedocs.io/en/latest/
-
__init__
(out_layer, train_fn, val_fn, pred_fn, weightspath='weights_lasagne_net.npz')[source]¶ Construct wrapper around Lasagne network.
- Parameters
out_layer (Lasagne layer) – Output layer of Lasagne network.
train_fn (Theano function) – Training function
val_fn (Theano function) – Validation function
pred_fn (Theano function) – Prediction function
weightspath (string) – Filepath to save/load model weights.
-
evaluate
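(metrics, predcol=None)[source]¶ Evaluate performance of network for given metrics
>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])
- Parameters
metrics (list of functions) – List of functions that compute some metric, e.g. accuracy or F1, over the entire input (see EvalNut).
predcol (int|None) – Index of column in prediction to extract for evaluation. If None, a single prediction output is expected.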
- Returns
Result for each metric as a tuple or a single float if there is only one metric.
-
load_weights
(weightspath=None)[source]¶ Load network weights.
network.load_weights()
- Parameters
weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
-
predict
(flatten=True)[source]¶ Get network predictions
>>> predictions = samples >> batcher >> network.predict() >> Collect()
- Parameters
flatten (bool) – True: return individual predictions instead of batched prediction
- Returns
Typically returns softmax class probabilities.
- Return type
ndarray
-
save_weights
(weightspath=None)[source]¶ Save network weights.
network.save_weights()
- Parameters
weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
-
-
class
Network
(weightspath)[source]¶ Bases:
object
Abstract base class for networks. Allows wrapping existing network APIs such as Lasagne, Keras or Pytorch into an API that enables direct usage of the network as a Nut in a nuts flow.
-
__init__
(weightspath)[source]¶ Constructs base wrapper for networks.
- Parameters
weightspath (string) – Filepath where network weights are saved to and loaded from.
-
evaluate
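(metrics, predcol=None, targetcol=-1)[source]¶ Evaluate performance of network for given metrics
>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])
- Parameters
metrics (list of functions) – List of functions that compute some metric, e.g. accuracy or F1, over the entire input (see EvalNut).
predcol (int|None) – Index of column in prediction to extract for evaluation. If None, a single prediction output is expected.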
- Returns
Result for each metric as a tuple or a single float if there is only one metric.
-
load_weights
(weightspath=None)[source]¶ Load network weights.
network.load_weights()
- Parameters
weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
-
predict
(flatten=True)[source]¶ Get network predictions
>>> predictions = samples >> batcher >> network.predict() >> Collect()
- Parameters
flatten (bool) – True: return individual predictions instead of batched prediction
- Returns
Typically returns softmax class probabilities.
- Return type
ndarray
-
save_weights
(weightspath=None)[source]¶ Save network weights.
network.save_weights()
- Parameters
weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
-
-
PredictNut
(batches, func, flatten=True)[source]¶ batches >> PredictNut(func)
Create nut to perform network predictions.
- Parameters
batches (iterable over batches) – Batches to create predictions for.
func (function) – Prediction function
flatten (bool) – True: flatten output, i.e. return individual predictions instead of a batch of predictions.
- Returns
Result(s) of prediction
- Return type
typically array with class probabilities (softmax vector)
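The effect of flatten can be sketched as a generator over batches. This is a minimal sketch, assuming the prediction function returns one prediction per sample in the batch; `predict_nut_sketch` and `double` are hypothetical names.

```python
def predict_nut_sketch(batches, func, flatten=True):
    """Hypothetical sketch: apply func to each batch and optionally un-batch the results."""
    for batch in batches:
        preds = func(batch)
        if flatten:
            yield from preds  # one prediction per sample
        else:
            yield preds  # one prediction array per batch


def double(batch):
    """Hypothetical stand-in for a network's prediction function."""
    return [2 * x for x in batch]


batches = [[1, 2], [3]]
print(list(predict_nut_sketch(batches, double)))                 # [2, 4, 6]
print(list(predict_nut_sketch(batches, double, flatten=False)))  # [[2, 4], [6]]
```

Flattening lets predictions be zipped directly with the original samples, independent of the batch size.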
-
class
PytorchNetwork
(model, weightspath='weights_pytorch_net.pt')[source]¶ Bases:
nutsml.network.Network
Wrapper for Pytorch models: https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html
-
__init__
(model, weightspath='weights_pytorch_net.pt')[source]¶ Construct wrapper around Pytorch model.
- Parameters
model (Pytorch model) – Pytorch model to wrap. The model needs to have three attributes: model.device, e.g. ‘cuda:0’ or ‘cpu’; model.optimizer, e.g. torch.optim.SGD; model.losses, a (list of) loss functions, e.g. F.cross_entropy.
weightspath (string) – Filepath to save/load model weights.
-
evaluate
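(metrics, predcol=None)[source]¶ Evaluate performance of network for given metrics
>>> acc, f1 = samples >> batcher >> network.evaluate([accuracy, f1_score])
- Parameters
metrics (list of functions) – List of functions that compute some metric, e.g. accuracy or F1, over the entire input (see EvalNut).
predcol (int|None) – Index of column in prediction to extract for evaluation. If None, a single prediction output is expected.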
- Returns
Result for each metric as a tuple or a single float if there is only one metric.
-
load_weights
(weightspath=None)[source]¶ Load network weights.
network.load_weights()
- Parameters
weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
-
predict
(flatten=True)[source]¶ Get network predictions
>>> predictions = samples >> batcher >> network.predict() >> Collect()
- Parameters
flatten (bool) – True: return individual predictions instead of batched prediction
- Returns
Typically returns softmax class probabilities.
- Return type
ndarray
-
print_layers
(input_shape=None)[source]¶ Print network architecture (and layer dimensions).
- Parameters
input_shape (tuple|None) – (C, H, W) or None. If None, layer dimensions and param numbers are not printed.
-
save_weights
(weightspath=None)[source]¶ Save network weights.
network.save_weights()
- Parameters
weightspath (string) – Path to network weights. self.weightspath is used if weightspath is None.
-
-
TrainValNut
(batches, func, **kwargs)[source]¶ batches >> TrainValNut(func, **kwargs)
Create nut to train or validate a network.
- Parameters
batches (iterable over batches) – Batches to train/validate.
func (function) – Training or validation function of network.
kwargs (kwargs) – Keyword arguments passed on to function.
- Returns
Result(s) of training/validation function, e.g. loss, accuracy, …
- Return type
float or array/tuple of floats
nutsml.plotter module¶
-
class
PlotLines
(ycols, xcols=None, layout=(1, None), titles=None, every_sec=0, every_n=0, filterfunc=<function PlotLines.<lambda>>, figsize=None, filepath=None)[source]¶ Bases:
nutsflow.base.NutFunction
Plot line graph for selected data columns.
-
__init__
(ycols, xcols=None, layout=(1, None), titles=None, every_sec=0, every_n=0, filterfunc=<function PlotLines.<lambda>>, figsize=None, filepath=None)[source]¶ iterable >> PlotLines(ycols) >> Consume()
>>> import os
>>> import numpy as np
>>> from nutsflow import Consume
>>> fp = 'tests/data/temp_plotter.png'
>>> xs = np.arange(0, 6.3, 1.2)
>>> ysin, ycos = np.sin(xs), np.cos(xs)
>>> data = list(zip(xs, ysin, ycos))
>>> data >> PlotLines(1, 0, filepath=fp) >> Consume()
>>> list(ycos) >> PlotLines(0, filepath=fp) >> Consume()
>>> data >> PlotLines(ycols=(1,2), filepath=fp) >> Consume()
>>> ysin.tolist() >> PlotLines(ycols=None, filepath=fp) >> Consume()
>>> if os.path.exists(fp): os.remove(fp)
- Parameters
ycols (int|tuple|None) – Index or tuple of indices of the data columns that contain the y-data for the plot. If None data is used directly.
xcols (int|tuple|function|iterable|None) – Index or tuple of indices of the data columns that contain the x-data for the plot. Alternatively an iterator or a function can be provided that generates the x-data for the plot, e.g. xcols = itertools.count() or xcols = lambda: epoch. For xcols==None, itertools.count() will be used.
layout (tuple) – Rows and columns of the plotter layout, e.g. a layout of (2,3) means that 6 plots are arranged in 2 rows and 3 columns. The number of columns can be None and is then derived from ycols.
every_sec (float) – Plot every given second, e.g. to plot every 2.5 sec every_sec = 2.5
every_n (int) – Plot every n-th call.
filterfunc (function) – Boolean function to filter plot data.
figsize (tuple) – Figure size in inches.
filepath – Path to a file to draw plot to. If provided the plot will not appear on the screen.
- Returns
Returns input unaltered
- Return type
any
-
nutsml.reader module¶
-
ReadImage
(sample, columns, pathfunc=None, as_grey=False, dtype='uint8')[source]¶ Load images from filesystem for samples.
Loads images in jpg, gif, png, tif and bmp format. Images are returned as numpy arrays of shape (h, w, c) or (h, w) for color images or gray scale images respectively. See nutsml.imageutil.load_image for details.
Note that the loaded images replace the image file name|path in the sample. If the image file paths are provided directly (not as tuple samples), tuples containing the loaded image are still returned.
>>> from nutsflow import Consume, Collect
>>> from nutsml import PrintColType
>>> images = ['tests/data/img_formats/nut_color.gif']
>>> images >> ReadImage(None) >> PrintColType() >> Consume()
item 0: <tuple>
  0: <ndarray> shape:213x320x3 dtype:uint8 range:0..255
>>> samples = [('tests/data/img_formats/nut_color.gif', 'class0')]
>>> img_samples = samples >> ReadImage(0) >> Collect()
>>> imagepath = 'tests/data/img_formats/*.gif'
>>> samples = [(1, 'nut_color'), (2, 'nut_grayscale')]
>>> samples >> ReadImage(1, imagepath) >> PrintColType() >> Consume()
item 0: <tuple>
  0: <int> 1
  1: <ndarray> shape:213x320x3 dtype:uint8 range:0..255
item 1: <tuple>
  0: <int> 2
  1: <ndarray> shape:213x320 dtype:uint8 range:20..235
>>> pathfunc = lambda s: 'tests/data/img_formats/{1}.jpg'.format(*s)
>>> img_samples = samples >> ReadImage(1, pathfunc) >> Collect()
- Parameters
sample (tuple|list) – Sample, e.g. (‘nut_color’, 1)
columns (None|int|tuple) – Indices of columns in sample to be replaced by image (based on image id in that column). If None, flat samples are assumed and a tuple with the image is returned.
pathfunc (string|function|None) – One of: a filepath with wildcard ‘*’, which is replaced by the image id provided in the sample, e.g. ‘tests/data/img_formats/*.jpg’ for sample (‘nut_grayscale’, 2) will become ‘tests/data/img_formats/nut_grayscale.jpg’; a function to compute the path to the image file from the sample, e.g. lambda sample: ‘tests/data/img_formats/{1}.jpg’.format(*sample); or None, in which case the image id is taken as the filepath.
as_grey (bool) – If True, load as grayscale image.
dtype (dtype) – Numpy data type of the image.
- Returns
Sample with image ids replaced by image (=ndarray) of shape (h, w, c) or (h, w)
- Return type
tuple
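The three accepted forms of pathfunc can be illustrated by how each one maps a sample to a file path. A minimal sketch, assuming a single image column and simple string replacement of the ‘*’ wildcard; `resolve_path_sketch` is a hypothetical helper, not part of nutsml, and it only computes paths without loading images.

```python
def resolve_path_sketch(sample, column, pathfunc=None):
    """Hypothetical sketch of how the three pathfunc variants map a sample to a file path."""
    image_id = sample[column]
    if pathfunc is None:
        return image_id  # the id itself is the path
    if isinstance(pathfunc, str):
        return pathfunc.replace('*', str(image_id))  # fill in the wildcard
    return pathfunc(sample)  # user-supplied function of the whole sample


sample = (2, 'nut_grayscale')
from_pattern = resolve_path_sketch(sample, 1, 'tests/data/img_formats/*.gif')
from_function = resolve_path_sketch(sample, 1, lambda s: 'tests/data/img_formats/{1}.jpg'.format(*s))
from_id = resolve_path_sketch(('tests/data/img_formats/nut_color.gif', 'class0'), 0)
print(from_pattern)   # tests/data/img_formats/nut_grayscale.gif
print(from_function)  # tests/data/img_formats/nut_grayscale.jpg
print(from_id)        # tests/data/img_formats/nut_color.gif
```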
-
ReadLabelDirs
(basedir, filepattern='*', exclude='_*')[source]¶ Read file paths from label directories.
Typically used when classification data is organized in folders, where the folder name represents the class label and the files in the folder the data samples (images, documents, …) for that class.
>>> from __future__ import print_function
>>> from nutsflow import Sort
>>> read = ReadLabelDirs('tests/data/labeldirs', '*.txt')
>>> samples = read >> Sort()
>>> for sample in samples:
...     print(sample)
...
('tests/data/labeldirs/0/test0.txt', '0')
('tests/data/labeldirs/1/test1.txt', '1')
('tests/data/labeldirs/1/test11.txt', '1')
- Parameters
basedir (string) – Path to folder that contains label directories.
filepattern (string) – Pattern for filepaths to read from label directories, e.g. ‘*.jpg’, ‘*.txt’
exclude (string) – Pattern for label directories to exclude. Default is ‘_*’ which excludes all label folders prefixed with ‘_’.
- Returns
iterator over labeled file paths
- Return type
iterator
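Reading label directories boils down to listing the sub-folders of basedir, skipping those that match exclude, and globbing each remaining folder with filepattern. A minimal standard-library sketch; the sorted order is an assumption made here for deterministic output (the doctest above sorts explicitly), and `read_label_dirs_sketch` is a hypothetical name.

```python
import fnmatch
import glob
import os
import tempfile


def read_label_dirs_sketch(basedir, filepattern='*', exclude='_*'):
    """Hypothetical sketch: yield (filepath, label) pairs from label sub-directories."""
    for label in sorted(os.listdir(basedir)):
        labeldir = os.path.join(basedir, label)
        # Skip plain files and directories matching the exclude pattern, e.g. '_tmp'.
        if not os.path.isdir(labeldir) or fnmatch.fnmatch(label, exclude):
            continue
        for filepath in sorted(glob.glob(os.path.join(labeldir, filepattern))):
            yield filepath, label


# Build a small throw-away directory tree to run the sketch on.
basedir = tempfile.mkdtemp()
for label, name in [('0', 'test0.txt'), ('1', 'test1.txt'), ('1', 'notes.md'), ('_tmp', 'x.txt')]:
    os.makedirs(os.path.join(basedir, label), exist_ok=True)
    open(os.path.join(basedir, label, name), 'w').close()

samples = list(read_label_dirs_sketch(basedir, '*.txt'))
print(samples)
```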
-
ReadNumpy
(sample, columns, pathfunc=None, allow_pickle=False)[source]¶ Load numpy arrays from filesystem.
Note that the loaded numpy arrays replace the file name|path in the sample.
>>> from nutsflow import Consume, Collect, PrintType
>>> samples = ['tests/data/img_arrays/nut_color.jpg.npy']
>>> samples >> ReadNumpy(None) >> PrintType() >> Consume()
(<ndarray> 213x320x3:uint8)
>>> samples = [('tests/data/img_arrays/nut_color.jpg.npy', 'class0')]
>>> samples >> ReadNumpy(0) >> PrintType() >> Consume()
(<ndarray> 213x320x3:uint8, <str> class0)
>>> filepath = 'tests/data/img_arrays/*.jpg.npy'
>>> samples = [(1, 'nut_color'), (2, 'nut_grayscale')]
>>> samples >> ReadNumpy(1, filepath) >> PrintType() >> Consume()
(<int> 1, <ndarray> 213x320x3:uint8)
(<int> 2, <ndarray> 213x320:uint8)
>>> pathfunc = lambda s: 'tests/data/img_arrays/{1}.jpg.npy'.format(*s)
>>> samples >> ReadNumpy(1, pathfunc) >> PrintType() >> Consume()
(<int> 1, <ndarray> 213x320x3:uint8)
(<int> 2, <ndarray> 213x320:uint8)
- Parameters
sample (tuple|list) – Sample, e.g. (‘nut_data’, 1)
columns (None|int|tuple) – Indices of columns in sample to be replaced by numpy array (based on file id in that column). If None, a flat sample is assumed and a tuple with the numpy array is returned.
pathfunc (string|function|None) – Filepath with wildcard ‘*’, which is replaced by the file id/name provided in the sample, e.g. ‘tests/data/img_arrays/*.jpg.npy’ for sample (‘nut_grayscale’, 2) will become ‘tests/data/img_arrays/nut_grayscale.jpg.npy’; or a function to compute the path to the numpy file from the sample, e.g. lambda sample: ‘tests/data/img_arrays/{1}.jpg.npy’.format(*sample); or None, in which case the file id/name is taken as the filepath.
allow_pickle (bool) – Allow loading pickled object arrays in npy files.
- Returns
Sample with file ids/names replaced by numpy arrays.
- Return type
tuple
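The three forms of pathfunc can be summarized in a few lines. This sketch uses a hypothetical helper, `resolve_path`, and only mirrors the documented rules (wildcard string, function, or None); it is not taken from the nutsml source.

```python
def resolve_path(fileid, sample, pathfunc=None):
    """Return the file path for fileid following the pathfunc rules above."""
    if pathfunc is None:
        return fileid  # the file id/name itself is the path
    if callable(pathfunc):
        return pathfunc(sample)  # a function computes the path from the sample
    return pathfunc.replace('*', str(fileid))  # wildcard '*' replaced by the id
```

The resolved path would then be passed to something like `numpy.load(path, allow_pickle=allow_pickle)`; how ReadNumpy actually calls it is an assumption here.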
-
class
ReadPandas
(filepath, rows=None, colnames=None, dropnan=True, replacenan=False, rowname='Row', **kwargs)[source]¶ Bases:
nutsflow.base.NutSource
Read data as Pandas table from file system.
-
__init__
(filepath, rows=None, colnames=None, dropnan=True, replacenan=False, rowname='Row', **kwargs)[source]¶ Create reader for Pandas tables.
The reader returns the table contents as an iterator over named tuples, where the column names are derived from the table columns. The order and selection of columns can be changed.
>>> from nutsflow import Collect, Consume, Print
>>> filepath = 'tests/data/pandas_table.csv'
>>> ReadPandas(filepath) >> Print() >> Consume()
Row(col1=1.0, col2=4.0)
Row(col1=3.0, col2=6.0)
>>> (ReadPandas(filepath, dropnan=False, rowname='Sample') >>
...  Print() >> Consume())
Sample(col1=1.0, col2=4.0)
Sample(col1=2.0, col2=nan)
Sample(col1=3.0, col2=6.0)
>>> ReadPandas(filepath, replacenan=None) >> Print() >> Consume()
Row(col1=1.0, col2=4.0)
Row(col1=2.0, col2=None)
Row(col1=3.0, col2=6.0)
>>> colnames = ['col2', 'col1']  # swap order
>>> ReadPandas(filepath, colnames=colnames) >> Print() >> Consume()
Row(col2=4.0, col1=1.0)
Row(col2=6.0, col1=3.0)
>>> ReadPandas(filepath, rows='col1 > 1', replacenan=0) >> Collect()
[Row(col1=2.0, col2=0), Row(col1=3.0, col2=6.0)]
- Parameters
filepath (str) – Path to a table in CSV, TSV, XLSX or Pandas pickle format. The table format is picked based on the file extension (e.g. .csv). Note that tables must have a header with the column names.
rows (str) – Rows to filter. Any Pandas filter expression. If rows = None, all rows of the table are returned.
colnames (list) – List of names for the table columns to return. For colnames = None, all columns are returned.
dropnan (bool) – If True, all rows that contain NaN are dropped.
replacenan (object) – If not False, all NaNs are replaced by the value of replacenan.
rowname (str) – Name of the named tuple returned as rows.
kwargs (kwargs) – Keyword arguments passed on to the Pandas methods for data reading, e.g. header=None. See pandas/pandas/io/parsers.py for details.
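The NaN handling and row construction can be sketched with the standard library alone. `to_named_rows` is a hypothetical helper that works on rows already parsed into dicts; it does not read files and does not evaluate the rows filter (a string such as 'col1 > 1' resembles pandas `DataFrame.query` syntax, but that is an assumption about ReadPandas). Following the doctests above, replacenan is applied before dropnan, so `replacenan=None` keeps the NaN row.

```python
import math
from collections import namedtuple


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def to_named_rows(table, colnames, dropnan=True, replacenan=False, rowname='Row'):
    """Convert dict rows into named tuples, mimicking dropnan/replacenan."""
    Row = namedtuple(rowname, colnames)
    for record in table:
        values = [record[c] for c in colnames]  # selection and order of columns
        if replacenan is not False:
            values = [replacenan if _is_nan(v) else v for v in values]
        if dropnan and any(_is_nan(v) for v in values):
            continue  # drop rows that still contain NaN
        yield Row(*values)
```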
-
nutsml.stratify module¶
-
CollectStratified
(iterable, labelcol, mode='downrnd', container=<class 'list'>, rand=None)[source]¶ iterable >> CollectStratified(labelcol, mode='downrnd', container=list, rand=rnd.Random())
Collects samples in a container and stratifies them by either randomly down-sampling classes or up-sampling classes by duplicating samples.
>>> from nutsflow import Collect, Sort
>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]
>>> samples >> CollectStratified(1) >> Sort()
[('neg', 0), ('pos', 1)]
- Parameters
iterable (iterable over tuples) – Iterable of tuples where column labelcol contains a sample label that is used for stratification.
labelcol (int) – Column of tuple/samples that contains the label.
mode (string) – ‘downrnd’: randomly down-sample; ‘up’: up-sample.
container (container) – Some container, e.g. list, set, dict that can be filled from an iterable
rand (Random|None) – Random number generator used for sampling. If None, random.Random() is used.
- Returns
Stratified samples
- Return type
List of tuples
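Both stratification modes can be illustrated with a short sketch. `collect_stratified` is hypothetical; in particular, the up-sampling strategy (repeat each class, then top up with random duplicates until it matches the largest class) is an assumption and may differ from nutsml.

```python
import random
from collections import defaultdict


def collect_stratified(samples, labelcol, mode='downrnd', rand=None):
    """Balance classes by random down-sampling ('downrnd') or duplication ('up')."""
    rand = rand if rand is not None else random.Random()
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[labelcol]].append(sample)
    sizes = [len(group) for group in groups.values()]
    result = []
    if mode == 'downrnd':
        n = min(sizes)
        for group in groups.values():
            result.extend(rand.sample(group, n))  # keep n random samples per class
    elif mode == 'up':
        n = max(sizes)
        for group in groups.values():
            # Repeat the whole group, then top up with random duplicates.
            result.extend(group * (n // len(group)))
            result.extend(rand.sample(group, n % len(group)))
    else:
        raise ValueError('Unknown mode: ' + mode)
    return result
```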
-
Stratify
(iterable, labelcol, labeldist, rand=None)[source]¶ iterable >> Stratify(labelcol, labeldist, rand=None)
Stratifies samples by randomly down-sampling according to the given label distribution. In detail: samples belonging to the class with the smallest number of samples are returned with probability one. Samples from other classes are randomly down-sampled to match the number of samples in the smallest class.
Note that in contrast to SplitRandom, which generates the same random split per default, Stratify generates different stratifications. Furthermore, while the downsampling is random the order of samples remains the same!
While labeldist needs to be provided or computed upfront, the actual stratification occurs online and only one sample at a time is stored in memory.
>>> from nutsflow import Collect, CountValues, Sort
>>> from nutsflow.common import StableRandom
>>> fix = StableRandom(1)  # Stable random numbers for doctest
>>> samples = [('pos', 1), ('pos', 1), ('neg', 0)]
>>> labeldist = samples >> CountValues(1)
>>> samples >> Stratify(1, labeldist, rand=fix) >> Sort()
[('neg', 0), ('pos', 1)]
- Parameters
iterable (iterable over tuples) – Iterable of tuples where column labelcol contains a sample label that is used for stratification.
labelcol (int) – Column of tuple/samples that contains the label.
labeldist (dict) – Dictionary with numbers of different labels, e.g. {‘good’:12, ‘bad’:27, ‘ugly’:3}
rand (Random|None) – Random number generator used for down-sampling. If None, random.Random() is used.
- Returns
Stratified samples
- Return type
Generator over tuples
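One way to get the online behaviour described above is to keep each sample with probability n_min / n_label, where n_min is the size of the smallest class. The sketch below assumes that rule; `stratify` is a hypothetical name. It keeps the smallest class completely, preserves sample order, and balances the other classes only in expectation, so the exact counts vary between runs.

```python
import random


def stratify(samples, labelcol, labeldist, rand=None):
    """Online down-sampling: keep each sample with probability n_min / n_label."""
    rand = rand if rand is not None else random.Random()
    n_min = min(labeldist.values())
    for sample in samples:
        # For the smallest class the ratio is 1.0, so its samples are always kept.
        if rand.random() < n_min / labeldist[sample[labelcol]]:
            yield sample
```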
nutsml.transformer module¶
-
class
AugmentImage
(imagecols, rand=None)[source]¶ Bases:
nutsflow.base.Nut
Random augmentation of images in samples
-
__init__
(imagecols, rand=None)[source]¶ samples >> AugmentImage(imagecols, rand=None)
Randomly augment images, e.g. changing contrast. See TransformImage for a full list of available augmentations. Every transformation can be used as an augmentation. Note that the same (random) augmentation is applied to all images specified in imagecols. This ensures that an image and its mask are randomly rotated by the same angle, for instance.
>>> augment_img = (AugmentImage(0)
...     .by('identical', 1.0)
...     .by('brightness', 0.5, [0.7, 1.3])
...     .by('contrast', 0.5, [0.7, 1.3])
...     .by('fliplr', 0.5)
...     .by('flipud', 0.5)
...     .by('occlude', 0.5, [0, 1], [0, 1], [0.1, 0.5], [0.1, 0.5])
...     .by('rotate', 0.5, [0, 360]))
See nutsml.transformer.TransformImage.by() for the full list of available augmentations. Note that each augmentation is applied independently. This is in contrast to transformations, which are applied in sequence and result in one image. Augmentations, on the other hand, are applied randomly and can result in many images. However, augmenters can be chained to combine augmentations, e.g. contrast or brightness combined with rotation or shearing:
>>> augment1 = (AugmentImage(0)
...     .by('brightness', 0.5, [0.7, 1.3])
...     .by('contrast', 0.5, [0.7, 1.3]))
>>> augment2 = (AugmentImage(0)
...     .by('shear', 0.5, [0, 0.2])
...     .by('rotate', 0.5, [0, 360]))
>>> samples >> augment1 >> augment2 >> Consume()
- Parameters
imagecols (int|tuple) – Indices of sample columns that contain images.
rand (Random|None) – Random number generator. If None, random.Random() is used.
-
__rrshift__
(iterable)[source]¶ Apply augmentation to samples in iterable.
- Parameters
iterable (iterable) – Samples
- Returns
iterable with augmented samples
- Return type
generator
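The independent application of augmentations and the two meanings of prob can be sketched as follows. `augment` is a hypothetical function that operates on plain lists. It assumes that one random value is drawn per argument range and shared by all images of a sample, as described for image/mask pairs above. With prob = 1.0 an augmentation is always applied, which is presumably why the example above includes 'identical' with 1.0, so that the unmodified image is emitted as well.

```python
import random


def augment(images, augmentations, rand=None):
    """Yield augmented copies; each augmentation is applied independently.

    augmentations: list of (function, prob, ranges), where prob <= 1 is the
    probability of applying it once and prob > 1 the number of applications.
    """
    rand = rand if rand is not None else random.Random()
    for func, prob, ranges in augmentations:
        if prob <= 1:
            n = 1 if rand.random() < prob else 0
        else:
            n = int(prob)
        for _ in range(n):
            # Draw one value per argument range; the same values are used
            # for all images of the sample (e.g. an image and its mask).
            args = [rand.uniform(lo, hi) for lo, hi in ranges]
            yield [func(img, *args) for img in images]
```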
-
by
(name, prob, *ranges, **kwargs)[source]¶ Specify and add augmentation to be performed.
>>> augment_img = AugmentImage(0).by('rotate', 0.5, [0, 360])
- Parameters
name (string) – Name of the augmentation/transformation, e.g. ‘rotate’
prob (float|int) – If prob <= 1: probability in [0, 1] that the augmentation is applied. If prob > 1: number of times the augmentation is applied.
ranges (list of lists) – Ranges for each argument of the augmentation, e.g. [0, 360] degrees, from which the argument values are randomly sampled.
kwargs (kwargs) – Keyword arguments passed on to the augmentation.
- Returns
instance of AugmentImage
- Return type
-
-
ImageAnnotationToMask
(iterable, imagecol, annocol)[source]¶ samples >> ImageAnnotationToMask(imagecol, annocol)
Create a mask for an image annotation. Annotations have one of the following formats (see imageutil.annotation2coords for details):
(‘point’, ((x, y), … ))
(‘circle’, ((x, y, r), …))
(‘rect’, ((x, y, w, h), …))
(‘polyline’, (((x, y), (x, y), …), …))
>>> import numpy as np
>>> from nutsflow import Collect
>>> img = np.zeros((3, 3), dtype='uint8')
>>> anno = ('point', ((0, 1), (2, 0)))
>>> samples = [(img, anno)]
>>> masks = samples >> ImageAnnotationToMask(0, 1) >> Collect()
>>> print(masks[0][1])
[[  0   0 255]
 [255   0   0]
 [  0   0   0]]
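The doctest above hinges on the coordinate convention: a point (x, y) sets column x of row y. A stdlib sketch for 'point' annotations only (`points_to_mask` is hypothetical; circles, rectangles and polylines are not handled) reproduces the same mask.

```python
def points_to_mask(height, width, points, value=255):
    """Return a mask (list of rows) with value at each annotated (x, y) point."""
    mask = [[0] * width for _ in range(height)]
    for x, y in points:
        mask[y][x] = value  # x is the column index, y is the row index
    return mask
```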
-
class
ImageChannelMean
(imagecol, filepath='image_channel_means.npy', means=None)[source]¶ Bases:
nutsflow.base.NutFunction
Compute and save per-channel means over images, and subtract them from images.
-
__call__
(sample)[source]¶ Subtract per-channel mean from images in samples.
sub_mean = ImageChannelMean(imagecol, filepath='means.npy')
samples >> sub_mean >> Consume()
sub_mean = ImageChannelMean(imagecol, means=[197, 87, 101])
samples >> sub_mean >> Consume()
-
__init__
(imagecol, filepath='image_channel_means.npy', means=None)[source]¶ samples >> ImageChannelMean(imagecol, filepath='image_channel_means.npy', means=None)
Construct ImageChannelMean nut.
- Parameters
imagecol (int) – Index of sample column that contains the image
filepath (string) – Path to the file where mean values are saved and loaded from.
means (list|tuple) – Mean values can be provided directly. In this case filepath will be ignored and training is not necessary.
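To make "per-channel mean" concrete, here is a stdlib sketch on images stored as nested lists of shape (h, w, c). The helper names are hypothetical and saving to filepath is omitted. With NumPy, the means of a stack of images with shape (n, h, w, c) would be `np.mean(images, axis=(0, 1, 2))`.

```python
def channel_means(images):
    """Per-channel means over a list of (h, w, c) images given as nested lists."""
    nchannels = len(images[0][0][0])
    sums, count = [0.0] * nchannels, 0
    for image in images:
        for row in image:
            for pixel in row:
                for c in range(nchannels):
                    sums[c] += pixel[c]
                count += 1
    return [s / count for s in sums]


def subtract_means(image, means):
    """Subtract the per-channel means from every pixel of one image."""
    return [[[v - m for v, m in zip(pixel, means)] for pixel in row] for row in image]
```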
-
-
class
ImageMean
(imagecol, filepath='image_means.npy')[source]¶ Bases:
nutsflow.base.NutFunction
Compute and save the mean over images, and subtract it from images.
-
__call__
(sample)[source]¶ Subtract mean from images in samples.
sub_mean = ImageMean(imagecol, filepath) samples >> sub_mean >> Consume()
-
-
ImagePatchesByAnnotation
(iterable, imagecol, annocol, pshape, npos, nneg=<function <lambda>>, pos=255, neg=0, retlabel=True)[source]¶ - samples >> ImagePatchesByAnnotation(imagecol, annocol, pshape, npos,
nneg=lambda npos: npos, pos=255, neg=0, retlabel=True)
Randomly sample positive/negative patches from image based on annotation. See imageutil.annotation2coords for annotation format. A patch is positive if its center point is within the annotated region and is negative otherwise.
>>> import numpy as np
>>> np.random.seed(0)  # just to ensure stable doctest
>>> img = np.reshape(np.arange(25), (5, 5))
>>> anno = ('point', ((3, 2), (2, 3),))
>>> samples = [(img, anno)]
>>> getpatches = ImagePatchesByAnnotation(0, 1, (3, 3), 1, 1)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[12, 13, 14], [17, 18, 19], [22, 23, 24]] 0
[[11, 12, 13], [16, 17, 18], [21, 22, 23]] 1
[[7, 8, 9], [12, 13, 14], [17, 18, 19]] 1
- Parameters
iterable (iterable) – Samples with images
imagecol (int) – Index of sample column that contains the image
annocol (int) – Index of sample column that contains the annotation
pshape (tuple) – Shape of patch
npos (int) – Number of positive patches to sample
nneg (int|function) – Number of negative patches to sample, or a function that returns the number of negatives based on the number of positives.
pos (int) – Mask value indicating positives
neg (int) – Mask value indicating negatives
retlabel (bool) – If True, return labels; if False, return mask patches.
- Returns
Iterator over samples where images are replaced by image patches and masks are replaced by labels [0,1] or mask patches
- Return type
generator
-
ImagePatchesByMask
(iterable, imagecol, maskcol, pshape, npos, nneg=<function <lambda>>, pos=255, neg=0, retlabel=True)[source]¶ - samples >> ImagePatchesByMask(imagecol, maskcol, pshape, npos,
nneg=lambda npos: npos, pos=255, neg=0, retlabel=True)
Randomly sample positive/negative patches from image based on mask.
A patch is positive if its center point has the value ‘pos’ in the mask (corresponding to the input image) and negative if it has the value ‘neg’. The mask must be of the same size as the image.
>>> import numpy as np
>>> np.random.seed(0)  # just to ensure stable doctest
>>> img = np.reshape(np.arange(25), (5, 5))
>>> mask = np.eye(5, dtype='uint8') * 255
>>> samples = [(img, mask)]
>>> getpatches = ImagePatchesByMask(0, 1, (3, 3), 2, 1)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[10, 11, 12], [15, 16, 17], [20, 21, 22]] 0
[[12, 13, 14], [17, 18, 19], [22, 23, 24]] 1
[[6, 7, 8], [11, 12, 13], [16, 17, 18]] 1
>>> np.random.seed(0)  # just to ensure stable doctest
>>> patches = ImagePatchesByMask(0, 1, (3, 3), 1, 1, retlabel=False)
>>> for (p, m) in samples >> patches:
...     print(p.shape, m.shape)
(3, 3) (3, 3)
(3, 3) (3, 3)
- Parameters
iterable (iterable) – Samples with images
imagecol (int) – Index of sample column that contains the image
maskcol (int) – Index of sample column that contains the mask
pshape (tuple) – Shape of patch
npos (int) – Number of positive patches to sample
nneg (int|function) – Number of negative patches to sample, or a function that returns the number of negatives based on the number of positives.
pos (int) – Mask value indicating positives
neg (int) – Mask value indicating negatives
retlabel (bool) – If True, return labels; if False, return mask patches.
- Returns
Iterator over samples where images are replaced by image patches and masks are replaced by labels [0,1] or mask patches
- Return type
generator
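The positive/negative rule can be sketched with plain lists. `patches_by_mask` is hypothetical and simplified: nneg is an int (not a function), only centres whose patch lies completely inside the image are considered, and negatives are emitted before positives. These are assumptions, not a description of how nutsml samples.

```python
import random


def patches_by_mask(image, mask, pshape, npos, nneg, pos=255, neg=0, rand=None):
    """Sample (patch, label) pairs; label 1 if the mask has pos at the centre."""
    rand = rand if rand is not None else random.Random()
    ph, pw = pshape
    h, w = len(image), len(image[0])
    # Only centres whose patch lies completely inside the image are valid.
    centres = [(r, c) for r in range(ph // 2, h - (ph - 1 - ph // 2))
               for c in range(pw // 2, w - (pw - 1 - pw // 2))]
    positives = [(r, c) for r, c in centres if mask[r][c] == pos]
    negatives = [(r, c) for r, c in centres if mask[r][c] == neg]

    def patch(r, c):
        top, left = r - ph // 2, c - pw // 2
        return [row[left:left + pw] for row in image[top:top + ph]]

    for label, pool, n in ((0, negatives, nneg), (1, positives, npos)):
        for r, c in rand.sample(pool, min(n, len(pool))):
            yield patch(r, c), label
```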
-
RandomImagePatches
(iterable, imagecols, pshape, npatches)[source]¶ samples >> RandomImagePatches(imagecols, pshape, npatches)
Extract patches at random locations from images.
>>> import numpy as np
>>> np.random.seed(0)  # just to ensure stable doctest
>>> img = np.reshape(np.arange(30), (5, 6))
>>> samples = [(img, 0)]
>>> getpatches = RandomImagePatches(0, (2, 3), 3)
>>> for (p, l) in samples >> getpatches:
...     print(p.tolist(), l)
[[7, 8, 9], [13, 14, 15]] 0
[[8, 9, 10], [14, 15, 16]] 0
[[8, 9, 10], [14, 15, 16]] 0
- Parameters
iterable (iterable) – Samples with images
imagecols (int|tuple) – Indices of sample columns that contain images
pshape (tuple) – Shape of patch
npatches (int) – Number of patches to extract
- Returns
Iterator over samples where images are replaced by patches.
- Return type
generator
-
RegularImagePatches
(iterable, imagecols, pshape, stride)[source]¶ samples >> RegularImagePatches(imagecols, pshape, stride)
Extract patches in a regular grid from images.
>>> import numpy as np
>>> img = np.reshape(np.arange(12), (3, 4))
>>> samples = [(img, 0)]
>>> getpatches = RegularImagePatches(0, (2, 2), 2)
>>> for p in samples >> getpatches:
...     print(p)
(array([[0, 1],
       [4, 5]]), 0)
(array([[2, 3],
       [6, 7]]), 0)
- Parameters
iterable (iterable) – Samples with images
imagecols (int|tuple) – Indices of sample columns that contain images
pshape (tuple) – Shape of patch
stride (int) – Step size between neighboring patches
- Returns
Iterator over samples where images are replaced by patches.
- Return type
generator
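The regular grid can be sketched with two ranges over the top-left corners. `regular_patches` is hypothetical; it assumes that patches which would extend beyond the image border are skipped, which is consistent with the doctest output above (no patch starts at row 2 of the 3x4 image).

```python
def regular_patches(image, pshape, stride):
    """Yield patches of shape pshape on a regular grid with the given stride."""
    ph, pw = pshape
    h, w = len(image), len(image[0])
    for top in range(0, h - ph + 1, stride):
        for left in range(0, w - pw + 1, stride):
            yield [row[left:left + pw] for row in image[top:top + ph]]
```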
-
class
TransformImage
(imagecols)[source]¶ Bases:
nutsflow.base.NutFunction
Transformation of images in samples.
-
__init__
(imagecols)[source]¶ samples >> TransformImage(imagecols)
Images are expected to be numpy arrays of shape (h, w, c) or (h, w) with a range of [0, 255] and a dtype of uint8. Transformations should result in images with the same properties.
>>> transform = TransformImage(0).by('resize', 10, 20)
- Parameters
imagecols (int|tuple) – Indices of sample columns the transformation should be applied to. Can be a single index or a tuple of indices.
transspec (tuple) – Transformation specification. Either a tuple with the name of the transformation function or a tuple with the name, arguments and keyword arguments of the transformation function. The list of argument values and dictionaries provided in the transspec are simply passed on to the transformation function. See the relevant functions for details.
-
by
(name, *args, **kwargs)[source]¶ Specify and add transformations to be performed.
>>> transform = TransformImage(0).by('resize', 10, 20).by('fliplr')
Available transformations:
- rerange(old_min, old_max, new_min, new_max, dtype)
- crop(x1, y1, x2, y2)
- crop_center(w, h)
- normalize_histo(gamma)
- resize(w, h)
- translate(dx, dy)
- rotate(angle)
- contrast(contrast)
- sharpness(sharpness)
- brightness(brightness)
- color(color)
- edges(sigma)
- shear(shear_factor)
- elastic(smooth, scale, seed)
- occlude(x, y, w, h)
- Parameters
name (string) – Name of the transformation to apply, e.g. ‘resize’
args (args) – Arguments for the transformation, e.g. width and height for resize.
kwargs (kwargs) – Keyword arguments passed on to the transformation
- Returns
instance of TransformImage with added transformation
- Return type
-
classmethod
register
(name, transformation)[source]¶ Register new transformation function.
>>> brighter = lambda image, c: image * c
>>> TransformImage.register('brighter', brighter)
>>> transform = TransformImage(0).by('brighter', 1.5)
- Parameters
name (string) – Name of transformation
transformation (function) – Transformation function.
-
transformations
= {'brightness': <function change_brightness>, 'color': <function change_color>, 'contrast': <function change_contrast>, 'crop': <function crop>, 'crop_center': <function crop_center>, 'crop_square': <function crop_square>, 'edges': <function extract_edges>, 'elastic': <function distort_elastic>, 'fliplr': <function fliplr>, 'flipud': <function flipud>, 'gray2rgb': <function gray2rgb>, 'identical': <function identical>, 'normalize_histo': <function normalize_histo>, 'occlude': <function occlude>, 'rerange': <function rerange>, 'resize': <function resize>, 'rgb2gray': <function rgb2gray>, 'rotate': <function rotate>, 'sharpness': <function change_sharpness>, 'shear': <function shear>, 'translate': <function translate>}¶
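The registry, the chaining of by() calls and the mapping over image columns can be sketched in one small class. `TransformSketch` is a hypothetical stand-in operating on nested lists, not TransformImage itself. It assumes transformations are applied in the order they were added, as described for TransformImage, and leaves non-image columns unchanged.

```python
class TransformSketch:
    """Hypothetical mini version of TransformImage: a registry plus a chain."""

    transformations = {
        'fliplr': lambda img: [row[::-1] for row in img],
        'flipud': lambda img: img[::-1],
    }

    def __init__(self, imagecols):
        self.imagecols = (imagecols,) if isinstance(imagecols, int) else imagecols
        self.specs = []

    @classmethod
    def register(cls, name, transformation):
        cls.transformations[name] = transformation

    def by(self, name, *args, **kwargs):
        self.specs.append((name, args, kwargs))
        return self  # returning self enables chaining .by(...).by(...)

    def __call__(self, sample):
        sample = list(sample)
        for col in self.imagecols:
            for name, args, kwargs in self.specs:  # applied in sequence
                func = self.transformations[name]
                sample[col] = func(sample[col], *args, **kwargs)
        return tuple(sample)
```

For example, `TransformSketch(0).by('fliplr')` mirrors each image in column 0 and passes the remaining columns through.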
-
-
map_transform
(sample, imagecols, spec)[source]¶ Map transformation function on columns of sample.
- Parameters
sample (tuple) – Sample with images
imagecols (int|tuple) – Indices of sample columns the transformation should be applied to. Can be a single index or a tuple of indices.
spec (tuple) – Transformation specification. Either a tuple with the name of the transformation function or a tuple with the name, arguments and keyword arguments of the transformation function.
- Returns
Sample with transformations applied. Columns not specified remain unchanged.
- Return type
nutsml.viewer module¶
-
class
ViewImage
(imgcols, layout=(1, None), figsize=None, pause=0.0001, axis_off=False, labels_off=False, titles=None, every_sec=0, every_n=0, **imargs)[source]¶ Bases:
nutsflow.base.NutFunction
Display images in window.
-
__init__
(imgcols, layout=(1, None), figsize=None, pause=0.0001, axis_off=False, labels_off=False, titles=None, every_sec=0, every_n=0, **imargs)[source]¶ iterable >> ViewImage(imgcols, layout=(1, None), figsize=None, **imargs)
Images should be numpy arrays in one of the following formats:
- MxN - luminance (grayscale, float array only)
- MxNx3 - RGB (float or uint8 array)
- MxNx4 - RGBA (float or uint8 array)
Shapes with a single-dimension axis are supported but not encouraged, e.g. MxNx1 will be converted to MxN.
See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow
>>> from nutsflow import Consume
>>> from nutsml import ReadImage
>>> imagepath = 'tests/data/img_formats/*.jpg'
>>> samples = [(1, 'nut_color'), (2, 'nut_grayscale')]
>>> read_image = ReadImage(1, imagepath)
>>> samples >> read_image >> ViewImage(1) >> Consume()
>>> view_gray = ViewImage(1, cmap='gray')
>>> samples >> read_image >> view_gray >> Consume()
- Parameters
imgcols (int|tuple|None) – Index or tuple of indices of data columns containing images (ndarray). Use None if images are provided directly, e.g. [img1, img2, …] >> ViewImage(None) >> Consume()
layout (tuple) – Rows and columns of the viewer layout, e.g. a layout of (2,3) means that 6 images in the data are arranged in 2 rows and 3 columns. The number of columns can be None, in which case it is derived from imgcols.
figsize (tuple) – Figure size in inches.
pause (float) – Waiting time in seconds after each plot. Pressing a key skips the waiting time.
axis_off (bool) – Enable or disable display of figure axes.
labels_off (bool) – Enable or disable display of axes labels.
every_sec (float) – View images only every every_sec seconds, e.g. every_sec = 2.5 to view every 2.5 seconds.
every_n (int) – View every n-th call.
imargs (kwargs) – Keyword arguments passed on to matplotlib’s imshow() function, e.g. cmap=’gray’. See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow
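The every_n and every_sec options amount to a throttle that decides, per call, whether an image is displayed. `Throttle` below is a hypothetical sketch; the clock is injectable for testing, and how ViewImage combines the two options, if both are set, is an assumption.

```python
import time


class Throttle:
    """Decide whether to show the current call, mimicking every_n/every_sec."""

    def __init__(self, every_n=0, every_sec=0, clock=time.monotonic):
        self.every_n, self.every_sec, self.clock = every_n, every_sec, clock
        self.calls, self.last = 0, None

    def __call__(self):
        self.calls += 1
        if self.every_n and self.calls % self.every_n:
            return False  # not the n-th call
        now = self.clock()
        if self.every_sec and self.last is not None and now - self.last < self.every_sec:
            return False  # too soon since the last display
        self.last = now
        return True
```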
-
-
class
ViewImageAnnotation
(imgcol, annocols, figsize=None, pause=0.0001, interpolation=None, **annoargs)[source]¶ Bases:
nutsflow.base.NutFunction
Display images and annotation in window.
-
SHAPEPROP
= {'edgecolor': 'y', 'facecolor': 'none', 'linewidth': 1}¶
-
TEXTPROP
= {'backgroundcolor': (1, 1, 1, 0.5), 'edgecolor': 'k'}¶
-
__init__
(imgcol, annocols, figsize=None, pause=0.0001, interpolation=None, **annoargs)[source]¶ iterable >> ViewImageAnnotation(imgcol, annocols, figsize=None, pause=0.0001, interpolation=None, **annoargs)
Images must be numpy arrays in one of the following formats:
- MxN - luminance (grayscale, float array only)
- MxNx3 - RGB (float or uint8 array)
- MxNx4 - RGBA (float or uint8 array)
See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow
Shapes with a single-dimension axis are supported but not encouraged, e.g. MxNx1 will be converted to MxN.
- Parameters
imgcol (int) – Index of data column that contains the image
annocols (int|tuple) – Index or tuple of indices specifying the data column(s) that contain annotation (labels, or geometry)
figsize (tuple) – Figure size in inches.
pause (float) – Waiting time in seconds after each plot. Pressing a key skips the waiting time.
interpolation (string) – Interpolation for imshow, e.g. ‘nearest’, ‘bilinear’, ‘bicubic’. For details see http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow
annoargs (kwargs) – Keyword arguments for visual properties of annotation, e.g. edgecolor=’y’, linewidth=1
-
nutsml.writer module¶
-
class
WriteImage
(column, pathfunc, namefunc=None)[source]¶ Bases:
nutsflow.base.NutFunction
Write images within samples.
-
__init__
(column, pathfunc, namefunc=None)[source]¶ Write images within samples to file.
Writes jpg, gif, png, tif and bmp format depending on file extension. Images in samples are expected to be numpy arrays. See nutsml.util.load_image for details.
Folders on output file path are created if missing.
>>> from nutsml import ReadImage
>>> from nutsflow import Collect, Get, GetCols, Consume, Unzip
>>> samples = [('nut_color', 1), ('nut_grayscale', 2)]
>>> inpath = 'tests/data/img_formats/*.bmp'
>>> img_samples = samples >> ReadImage(0, inpath) >> Collect()
>>> imagepath = 'tests/data/test_*.bmp'
>>> names = samples >> Get(0) >> Collect()
>>> img_samples >> WriteImage(0, imagepath, names) >> Consume()
>>> imagepath = 'tests/data/test_*.bmp'
>>> names = samples >> Get(0) >> Collect()
>>> images = img_samples >> Get(0)
>>> images >> WriteImage(None, imagepath, names) >> Consume()
>>> imagepath = 'tests/data/test_*.bmp'
>>> namefunc = lambda sample: sample[1]
>>> (samples >> GetCols(0, 0, 1) >> ReadImage(0, inpath) >>
...  WriteImage(0, imagepath, namefunc) >> Consume())
- Parameters
column (int|None) – Column in sample that contains image or take sample itself if column is None.
pathfunc (str|function) – Filepath with wildcard ‘*’, which is replaced by the provided name, e.g. ‘tests/data/img_formats/*.jpg’ for names = [‘nut_grayscale’] will become ‘tests/data/img_formats/nut_grayscale.jpg’; or a function to compute the path to the image file from sample and name, e.g. pathfunc=lambda sample, name: ‘tests/data/test_{}.jpg’.format(name)
namefunc (iterable|function|None) – Iterable over names to generate image paths from (it must have the same length as samples), or a function to compute filenames from a sample, e.g. namefunc=lambda sample: sample[0]. If None, Enumerate() is used.
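How pathfunc and namefunc combine into one output path per sample can be sketched as follows. `image_paths` is a hypothetical helper that only builds paths and writes nothing. It assumes that Enumerate() numbers samples from 0.

```python
def image_paths(samples, pathfunc, namefunc=None):
    """Yield one output path per sample following the pathfunc/namefunc rules."""
    names = iter(namefunc) if namefunc is not None and not callable(namefunc) else None
    for i, sample in enumerate(samples):
        if namefunc is None:
            name = i  # assumed to behave like Enumerate(): 0, 1, 2, ...
        elif callable(namefunc):
            name = namefunc(sample)
        else:
            name = next(names)  # iterable of names, one per sample
        if callable(pathfunc):
            yield pathfunc(sample, name)
        else:
            yield pathfunc.replace('*', str(name))
```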
-