Nestly is a small package designed to ease running software with combinatorial choices of parameters. It easily handles “cartesian products” of parameter choices, but it can do much more: arbitrary “backwards-looking” dependencies, in which the choices at one level depend on selections made at earlier levels, can also be used.
To find out more, look in the examples/ subdirectory.
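The difference between a plain cartesian product and a backwards-looking dependency can be sketched in plain Python. The `expand` function and its `(name, values)` level format below are hypothetical and are not nestly's implementation. They only mimic the idea that a level may be either a fixed iterable or a function of the control dictionary built so far.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple, Union

# Hypothetical level format: values are either a fixed iterable or a function
# of the control dictionary built so far (a "backwards-looking" dependency).
Values = Union[Iterable[Any], Callable[[Dict[str, Any]], Iterable[Any]]]


def expand(levels: List[Tuple[str, Values]]) -> List[Dict[str, Any]]:
    """Return one control dictionary per leaf of the nest."""
    controls: List[Dict[str, Any]] = [{}]
    for name, values in levels:
        # Materialize fixed iterables once so generators are not exhausted.
        fixed = None if callable(values) else list(values)
        next_controls = []
        for control in controls:
            # Callables see the choices already made at earlier levels.
            choices = values(control) if fixed is None else fixed
            for choice in choices:
                next_controls.append({**control, name: choice})
        controls = next_controls
    return controls


cartesian = expand([("a", [1, 2]), ("b", ["x", "y"])])
dependent = expand([
    ("n", [1, 2]),
    ("k", lambda c: range(c["n"])),  # k depends on the earlier choice of n
])

print(len(cartesian))  # 4
print(dependent)       # [{'n': 1, 'k': 0}, {'n': 2, 'k': 0}, {'n': 2, 'k': 1}]
```

With fixed iterables only, the number of leaves is the product of the level sizes; once a level is a function, each branch can have a different number of children.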
Contents:
nestly is a collection of functions designed to make running software with combinatorial choices of parameters easier.
Core functions for building nests.
Bases: object
Nests are used to build nested parameter selections, culminating in a directory structure representing choices made, and a JSON dictionary with all selections.
Build parameter combinations with Nest.add(), then create a nested directory structure with Nest.build().
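A minimal sketch of the kind of output described here, assuming a simple layout in which each selection becomes one directory level and each leaf receives a control.json holding all selections. `build_tree` is a hypothetical helper written with the standard library only; nestly's real directory names come from label_func, and options such as create_dir are not modeled.

```python
import itertools
import json
import os
import tempfile


def build_tree(base_dir, levels):
    """Create one directory per combination and write control.json in each leaf.

    Hypothetical helper: `levels` is a list of (name, values) pairs, and
    str(value) is used as the directory name for each selection.
    """
    names = [name for name, _ in levels]
    leaves = []
    for combo in itertools.product(*(values for _, values in levels)):
        control = dict(zip(names, combo))  # every selection on the path to this leaf
        leaf = os.path.join(base_dir, *(str(v) for v in combo))
        os.makedirs(leaf, exist_ok=True)
        with open(os.path.join(leaf, "control.json"), "w") as fh:
            json.dump(control, fh, indent=2)
        leaves.append(leaf)
    return leaves


with tempfile.TemporaryDirectory() as tmp:
    leaves = build_tree(tmp, [("strategy", ["exhaustive", "approximate"]),
                              ("run_count", [1, 10])])
    rel_first = os.path.relpath(leaves[0], tmp)
    with open(os.path.join(leaves[0], "control.json")) as fh:
        first_control = json.load(fh)

print(len(leaves))     # 4
print(rel_first)       # exhaustive/1 (with the platform's path separator)
print(first_control)   # {'strategy': 'exhaustive', 'run_count': 1}
```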
Add a level to the nest
Generate the names of all control files under base_dir
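A rough equivalent of walking base_dir for control files, using only os.walk. The helper name `find_control_files` and its `control_name` default of control.json are assumptions for illustration, not nestly's API.

```python
import os
import tempfile
from typing import Iterator


def find_control_files(base_dir: str,
                       control_name: str = "control.json") -> Iterator[str]:
    """Yield the path of every file named control_name below base_dir."""
    for dirpath, dirnames, filenames in os.walk(base_dir):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in order
        if control_name in filenames:
            yield os.path.join(dirpath, control_name)


with tempfile.TemporaryDirectory() as tmp:
    for sub in ("a", os.path.join("b", "c")):
        os.makedirs(os.path.join(tmp, sub))
        with open(os.path.join(tmp, sub, "control.json"), "w") as fh:
            fh.write("{}")
    found = [os.path.relpath(p, tmp) for p in find_control_files(tmp)]

print(found)  # ['a/control.json', 'b/c/control.json'] on POSIX
```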
Apply map_fn to the directories defined by control_iter
For each control file in control_iter, map_fn is called with the directory and control file contents as arguments.
Example:
>>> list(nest_map(['run1/control.json', 'run2/control.json'],
... lambda d, c: c['run_id']))
[1, 2]
Returns: A generator of the results of applying map_fn to elements in control_iter
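The behavior described for nest_map can be approximated as follows, assuming the directory passed to map_fn is the directory containing each control file. `map_controls` is a hypothetical stand-in that reproduces the doctest above with two temporary run directories.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, Iterable, Iterator


def map_controls(control_files: Iterable[str],
                 map_fn: Callable[[str, Dict[str, Any]], Any]) -> Iterator[Any]:
    """Lazily yield map_fn(directory, control_dict) for each control file."""
    for path in control_files:
        with open(path) as fh:
            control = json.load(fh)
        yield map_fn(os.path.dirname(path), control)


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for run_id in (1, 2):
        run_dir = os.path.join(tmp, "run%d" % run_id)
        os.makedirs(run_dir)
        path = os.path.join(run_dir, "control.json")
        with open(path, "w") as fh:
            json.dump({"run_id": run_id}, fh)
        paths.append(path)
    # Consume the generators while the files still exist.
    run_ids = list(map_controls(paths, lambda d, c: c["run_id"]))
    dir_names = list(map_controls(paths, lambda d, c: os.path.basename(d)))

print(run_ids)    # [1, 2]
print(dir_names)  # ['run1', 'run2']
```

Because the result is a generator, nothing is read until it is consumed, which is why the example calls list() inside the temporary-directory block.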
nestrun.py - run commands based on control dictionaries.
Bases: object
Metadata about a process run
‘Type’ for argparse: checks that a file exists but does not open it.
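An argparse `type` callable only has to return the converted value or raise argparse.ArgumentTypeError, so a check-but-don't-open file type can be sketched like this. `existing_file` is a hypothetical name, not necessarily the function nestrun defines.

```python
import argparse
import os
import tempfile


def existing_file(path: str) -> str:
    """argparse type: reject paths that do not exist, but leave them unopened."""
    if not os.path.exists(path):
        raise argparse.ArgumentTypeError("{0} does not exist".format(path))
    return path  # argparse stores the path string, not a file object


parser = argparse.ArgumentParser()
parser.add_argument("json_files", nargs="+", type=existing_file)

with tempfile.NamedTemporaryFile(suffix=".json") as tmp:
    args = parser.parse_args([tmp.name])
    parsed_ok = args.json_files == [tmp.name]

message = None
try:
    existing_file("no/such/control.json")
except argparse.ArgumentTypeError as err:
    message = str(err)

print(parsed_ok)  # True
print(message)    # no/such/control.json does not exist
```

Returning the path rather than an open handle avoids holding many file descriptors when a large number of control files is passed on the command line.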
Substitute template arguments in in_file from variables in d, and write the result to out_fobj.
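Assuming in_file is an open text stream and the template uses str.format fields, the substitution step reduces to a single format call. `render_template` and the `run_model` command in the template are hypothetical.

```python
import io
from typing import Any, Dict, TextIO


def render_template(in_file: TextIO, d: Dict[str, Any], out_fobj: TextIO) -> None:
    """Fill the {name} fields of the template text from d and write it out."""
    out_fobj.write(in_file.read().format(**d))


template = io.StringIO("#!/bin/sh\nrun_model --count {run_count} --input {input_file}\n")
out = io.StringIO()
render_template(template, {"run_count": 10, "input_file": "file1.txt"}, out)
print(out.getvalue())
```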
Aggregate results of nestly runs.
Execute delim action.
Parameters: arguments (parsed command-line arguments from main())
nestrun takes a command template and a list of control.json files with variables to substitute. Substitution is performed using the Python built-in str.format method. See the Python Formatter documentation for details on syntax, and examples/jsonrun/do_nestrun.sh for an example.
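Because substitution uses str.format, every {name} field must be a key of the control dictionary (a missing key raises KeyError), and literal braces in a shell command must be doubled. The sketch below builds one command per control dictionary; `build_commands`, the template, and the control values are illustrative assumptions, not nestrun internals.

```python
from typing import Any, Dict, List


def build_commands(template: str, controls: List[Dict[str, Any]]) -> List[str]:
    """Substitute each control dictionary into the template with str.format."""
    return [template.format(**control) for control in controls]


# {{ and }} are str.format escapes for literal braces, needed here for awk.
template = "awk '{{print $1}}' {input_file} > {strategy}_{run_count}.txt"
controls = [
    {"strategy": "exhaustive", "run_count": 10, "input_file": "file1.txt"},
    {"strategy": "approximate", "run_count": 100, "input_file": "file2.txt"},
]
commands = build_commands(template, controls)
for cmd in commands:
    print(cmd)
# awk '{print $1}' file1.txt > exhaustive_10.txt
# awk '{print $1}' file2.txt > approximate_100.txt
```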
usage: nestrun.py [-h] [-j N] [--template 'template text'] [--stop-on-error]
[--template-file FILE] [--save-cmd-file SAVECMD_FILE]
[--log-file LOG_FILE | --no-log] [--dry-run]
[--summary-file SUMMARY_FILE]
json_files [json_files ...]
nestrun - substitute values into a template and run commands in parallel.
positional arguments:
json_files Nestly control dictionaries
optional arguments:
-h, --help show this help message and exit
-j N, --processes N, --local N
Run a maximum of N processes in parallel locally
(default: 2)
--template 'template text'
Command-execution template, e.g. bash {infile}. By
default, nestrun executes the templatefile.
--stop-on-error Terminate remaining processes if any process returns
non-zero exit status (default: False)
--template-file FILE Command-execution template file path.
--save-cmd-file SAVECMD_FILE
Name of the file that will contain the command that
was executed.
--log-file LOG_FILE Name of the file that will contain output of the
executed command.
--no-log Don't create a log file
--dry-run Dry run mode, does not execute commands.
--summary-file SUMMARY_FILE
Write a summary of the run to the specified file
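The -j N and --stop-on-error options can be pictured with a thread pool that caps the number of concurrently running commands. This is a hypothetical sketch built on concurrent.futures and subprocess, not nestrun's implementation. In particular, on the first failure it only cancels commands that have not started yet, whereas nestrun's help text says remaining processes are terminated.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Optional


def run_all(commands: List[List[str]], processes: int = 2,
            stop_on_error: bool = False) -> List[Optional[int]]:
    """Run commands with at most `processes` at once; return their exit codes.

    Entries stay None for commands whose result was not collected (cancelled,
    or still running when a failure stopped the loop).
    """
    codes: List[Optional[int]] = [None] * len(commands)
    with ThreadPoolExecutor(max_workers=processes) as pool:
        futures = {pool.submit(subprocess.call, cmd): i
                   for i, cmd in enumerate(commands)}
        for future in as_completed(futures):
            codes[futures[future]] = future.result()
            if stop_on_error and future.result() != 0:
                for pending in futures:
                    pending.cancel()  # no effect on commands already running
                break
    return codes


ok = [sys.executable, "-c", "pass"]
fail = [sys.executable, "-c", "import sys; sys.exit(3)"]
results = run_all([ok, fail, ok], processes=2)
stopped = run_all([fail, ok, ok], processes=1, stop_on_error=True)
print(results)     # [0, 3, 0]
print(stopped[0])  # 3
```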
The nestagg command provides a mechanism for combining results of multiple runs. Currently, the only supported action is merging delimited files from a set of leaves, adding values from the control dictionary on each.
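As an illustration of what merging delimited files while adding control-dictionary values means, the sketch below reads one CSV per leaf and prepends the selected control keys as extra columns. `merge_delimited` is hypothetical; it assumes every leaf's file has the same columns and does not model the --missing-action or --exclude-keys options.

```python
import csv
import io
import json
import os
import tempfile
from typing import Iterable, List, Optional, TextIO


def merge_delimited(file_template: str, control_files: Iterable[str],
                    out_fobj: TextIO, keys: Optional[List[str]] = None,
                    separator: str = ",") -> None:
    """Concatenate one delimited file per leaf, prefixed with control values."""
    writer = None
    for control_path in control_files:
        with open(control_path) as fh:
            control = json.load(fh)
        selected = keys if keys is not None else sorted(control)
        # The file name is itself a template filled from the control dictionary.
        data_path = os.path.join(os.path.dirname(control_path),
                                 file_template.format(**control))
        with open(data_path, newline="") as fh:
            for row in csv.DictReader(fh, delimiter=separator):
                if writer is None:
                    # Header: selected control keys, then the file's own columns.
                    writer = csv.DictWriter(out_fobj, fieldnames=selected + list(row),
                                            delimiter=separator, lineterminator="\n")
                    writer.writeheader()
                merged = {k: control[k] for k in selected}
                merged.update(row)
                writer.writerow(merged)


with tempfile.TemporaryDirectory() as tmp:
    control_paths = []
    for run_id, score in ((1, "0.5"), (2, "0.7")):
        leaf = os.path.join(tmp, "run%d" % run_id)
        os.makedirs(leaf)
        with open(os.path.join(leaf, "control.json"), "w") as fh:
            json.dump({"run_id": run_id}, fh)
        with open(os.path.join(leaf, "%d.csv" % run_id), "w", newline="") as fh:
            fh.write("metric,value\nscore,%s\n" % score)
        control_paths.append(os.path.join(leaf, "control.json"))
    out = io.StringIO()
    merge_delimited("{run_id}.csv", control_paths, out)

print(out.getvalue())
# run_id,metric,value
# 1,score,0.5
# 2,score,0.7
```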
usage: nestagg.py delim [-h] [-k KEYS | -x EXCLUDE_KEYS] [-m {fail,warn}]
[-s SEPARATOR] [-t] [-o OUTPUT]
file_template control.json [control.json ...]
positional arguments:
file_template Template for the delimited file to read in each
directory [e.g. '{run_id}.csv']
control.json Control files
optional arguments:
-h, --help show this help message and exit
-k KEYS, --keys KEYS Comma separated list of keys from the JSON file to
include [default: all keys]
-x EXCLUDE_KEYS, --exclude-keys EXCLUDE_KEYS
Comma separated list of keys from the JSON file not to
include [default: None]
-m {fail,warn}, --missing-action {fail,warn}
Action to take when a file is missing [default: fail]
-s SEPARATOR, --separator SEPARATOR
Separator [default: ,]
-t, --tab Files are tab-separated
-o OUTPUT, --output OUTPUT
Output file [default: stdout]
The following simple combinatorial example is taken from examples/basic_nest/make_nest.py:
#!/usr/bin/env python

import glob
import math
import os
import os.path

from nestly import Nest

wd = os.getcwd()
input_dir = os.path.join(wd, 'inputs')

nest = Nest()

# Simplest case: Levels are added with a name and an iterable
nest.add('strategy', ('exhaustive', 'approximate'))

# Items can update the control dictionary
nest.add('run_count', [{'run_count': 10**i, 'function': 'pow'}
                       for i in range(3)], update=True)

# label_func is applied to each item to create a directory name
nest.add('input_file', glob.glob(os.path.join(input_dir, 'file*')),
         label_func=os.path.basename)

# Items can be added that don't generate directories
nest.add('base_dir', [os.getcwd()], create_dir=False)

# Any function taking one argument (control dictionary) and returning an
# iterable may also be used:
def log_run_count(c):
    run_count = c['run_count']
    return [math.log(run_count, 10)]
nest.add('run_count_log', log_run_count, create_dir=False)

nest.build('runs')
This is quite a bit more complicated, with lookups on previous values of the control dictionary:
#!/usr/bin/env python

import glob
import os
import os.path

from nestly import Nest, stripext

wd = os.getcwd()
startersdir = os.path.join(wd, "starters")
winedir = os.path.join(wd, "wine")
mainsdir = os.path.join(wd, "mains")

nest = Nest()

bn = os.path.basename

# start by mirroring the two directory levels in startersdir, and name those
# directories "ethnicity" and "dietary"
nest.add('ethnicity', glob.glob(os.path.join(startersdir, '*')),
         label_func=bn)

nest.add('dietary', lambda c: glob.glob(os.path.join(c['ethnicity'], '*')),
         label_func=bn)

## now get all of the starters
nest.add('starter', lambda c: glob.glob(os.path.join(c['dietary'], '*')),
         label_func=stripext)
## now get the corresponding mains
nest.add('main', lambda c: [os.path.join(mainsdir, bn(c['ethnicity']) + "_stirfry.txt")],
         label_func=stripext)

## get only the tasty wines
nest.add('wine', glob.glob(os.path.join(winedir, '*.tasty')),
         label_func=stripext)
## the wineglasses should be chosen by the wine choice, but we don't want to
## make a directory for those.
nest.add('wineglass', lambda c: [stripext(c['wine']) + ' wine glasses'],
         create_dir=False)

nest.build('runs')