Welcome to nestly’s documentation!
nestly is a small package designed to ease running software with
combinatorial choices of parameters. It can easily do so for “cartesian
products” of parameter choices, but it can do much more: arbitrary
“backwards-looking” dependencies, in which the values at one level depend on choices made at earlier levels, can also be used.
To find out more, check out the Examples.
Examples
Comparing two algorithms
This is a realistic example of using nestly to examine the performance of two algorithms. Source code to run it is available in examples/adcl/.
We will use the min_adcl_tree subcommand of the rppr tool from the pplacer suite, available from http://matsen.fhcrc.org/pplacer.
This tool chooses k representative leaves from a phylogenetic tree.
There are two implementations: the Full algorithm solves the problem
exactly, while the PAM algorithm uses a variation on the partitioning
around medoids heuristic to find a solution.
We’d like to compare the two algorithms on a variety of trees, using different values for k.
Making the nest
Setting up the comparison is demonstrated in 00make_nest.py, which builds
up combinations of (algorithm, tree, k):
#!/usr/bin/env python
# This example compares runtimes of two implementations of
# an algorithm to minimize the average distance to the closest leaf
# (Matsen et al., accepted to Systematic Biology).
#
# To run it, you'll need the `rppr` binary on your path, distributed as part of
# the pplacer suite. Source code, or binaries for OS X and 64-bit Linux are
# available from http://matsen.fhcrc.org/pplacer/.
#
# The `rppr min_adcl_tree` subcommand takes a tree, an algorithm name, and
# the number of leaves to keep.
#
# We wish to explore the runtime, over each tree, for various leaf counts.
import glob
from os.path import abspath
from nestly import Nest, stripext
# The `trees` directory contains 5 trees, each with 1000 leaves.
# We want to run each algorithm on all of them.
trees = [abspath(f) for f in glob.glob('trees/*.tre')]
n = Nest()
# We'll try both algorithms
n.add('algorithm', ['full', 'pam'])
# For every tree
n.add('tree', trees, label_func=stripext)
# Store the number of leaves - always 1000 here
n.add('n_leaves', [1000], create_dir=False)
# Now we vary the number of leaves to keep (k)
# Sample between 1 and the total number of leaves.
def k(c):
    n_leaves = c['n_leaves']
    return range(1, n_leaves, n_leaves // 10)
# Add `k` to the nest.
# This will call k with each combination of (algorithm, tree, n_leaves).
# Each value returned will be used as a possible value for `k`
n.add('k', k)
# Build the nest:
n.build('runs')
Running that:
$ ./00make_nest.py
creates a new directory, runs.
Within this directory are subdirectories for each algorithm:
runs/full
runs/pam
Each of these contains a directory for each tree used:
$ ls runs/pam
random001 random002 random003 random004 random005
Within each of these subdirectories are directories for each choice of k.
$ ls runs/pam/random001
1 101 201 301 401 501 601 701 801 901
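The directory names above follow directly from the k function in 00make_nest.py. As a quick check that needs only plain Python (nestly itself is not imported, and the control dictionary is written by hand for illustration), the same values can be reproduced for a 1000-leaf tree:

```python
# Reproduce the values of k generated in 00make_nest.py for a
# 1000-leaf tree, using only built-in Python.
def k(c):
    n_leaves = c['n_leaves']
    return range(1, n_leaves, n_leaves // 10)

# A hand-written control dictionary standing in for the one nestly passes in.
control = {'algorithm': 'pam', 'n_leaves': 1000}
k_values = list(k(control))
print(k_values)
```

Each of the ten values becomes one leaf directory under every (algorithm, tree) pair.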
These directories are leaves. There is a JSON file in each, containing the choices made. For example, runs/full/random003/401/control.json contains:
{
"algorithm": "full",
"tree": "/home/cmccoy/development/nestly/examples/adcl/trees/random003.tre",
"n_leaves": 1000,
"k": 401
}
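Because the control file is ordinary JSON, any downstream script can recover the choices with the standard json module. The sketch below writes a similar file to a temporary directory and reads it back; the temporary location and the shortened tree path are assumptions made only so the example runs anywhere.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a leaf directory; the tree path is shortened
# for illustration.
control = {"algorithm": "full", "tree": "trees/random003.tre",
           "n_leaves": 1000, "k": 401}

leaf_dir = tempfile.mkdtemp()
control_path = os.path.join(leaf_dir, "control.json")
with open(control_path, "w") as fp:
    json.dump(control, fp, indent=2)

# Any downstream script can recover the parameter choices like this.
with open(control_path) as fp:
    loaded = json.load(fp)

print(loaded["algorithm"], loaded["k"])
```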
Running the algorithm
The nestrun command-line tool allows you to run a command for each combination of parameters in a nest.
It substitutes the chosen parameters wherever their names appear in curly brackets, e.g. {algorithm}.
To see how long each run takes and how much memory it uses, we’ll use the short shell script time_rppr.sh:
#!/bin/sh
export TIME='elapsed,maxmem,exitstatus\n%e,%M,%x'
/usr/bin/time -o time.csv \
rppr min_adcl_tree --algorithm {algorithm} --leaves {k} {tree}
Note the placeholders for the parameters to be provided at runtime: k, tree, and algorithm.
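The nestrun documentation states that substitution is performed with Python's built-in str.format. The sketch below applies that method by hand to the command line from time_rppr.sh; the control values are illustrative, and exactly how nestrun invokes str.format internally is an assumption here.

```python
# The command line from time_rppr.sh, written as a Python template string.
template = "rppr min_adcl_tree --algorithm {algorithm} --leaves {k} {tree}"

# Values as they might appear in one leaf's control.json
# (the tree path is shortened here for illustration).
control = {"algorithm": "pam", "k": 401, "tree": "trees/random001.tre",
           "n_leaves": 1000}

# Keys that the template does not mention (n_leaves) are simply ignored.
command = template.format(**control)
print(command)
```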
Running a script like time_rppr.sh on every experiment within a nest in parallel is facilitated by the nestrun script distributed with nestly:
$ nestrun -j 4 --template-file time_rppr.sh -d runs
(this will take a while)
This command runs the shell script time_rppr.sh for each parameter choice, substituting the appropriate parameters.
The -j 4 flag indicates that up to 4 processes should be run in parallel.
Aggregating results
Now we have a little CSV file in each leaf directory, containing the running time, maximum memory use, and exit status:
|----------+--------+-------------|
| elapsed | maxmem | exitstatus |
|----------+--------+-------------|
| 17.78 | 471648 | 0 |
|----------+--------+-------------|
To analyze these en masse, we need to combine them and add information about the parameters used to generate them. The nestagg script does just this.
$ nestagg delim -d runs -o results.csv time.csv -k algorithm,k,tree
Here -d runs indicates the directory containing program runs; -o results.csv specifies where to write the output; time.csv gives the name of the file in each leaf directory; and -k algorithm,k,tree lists the parameters to add to each row of the CSV files.
Looking at results.csv:
|----------+---------+------------+-----------+---------------------------------------+------|
| elapsed | maxmem | exitstatus | algorithm | tree | k |
|----------+---------+------------+-----------+---------------------------------------+------|
| 17.04 | 941328 | 0 | full | .../examples/adcl/trees/random001.tre | 1 |
| 20.86 | 944336 | 0 | full | .../examples/adcl/trees/random001.tre | 101 |
| 31.75 | 944320 | 0 | full | .../examples/adcl/trees/random001.tre | 201 |
| 39.34 | 980048 | 0 | full | .../examples/adcl/trees/random001.tre | 301 |
| 37.84 | 1118960 | 0 | full | .../examples/adcl/trees/random001.tre | 401 |
| 42.15 | 1382000 | 0 | full | .../examples/adcl/trees/random001.tre | 501 |
etc
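The merge that produces a table like this can be sketched in plain Python. This is not nestagg's implementation: the aggregate function, the toy leaf directories, and the timing values written into them are all invented for illustration. It only shows the idea of reading the same file from every leaf and appending selected control keys to each row.

```python
import csv
import io
import json
import os
import tempfile


def aggregate(control_paths, file_name, keys, out):
    """Merge file_name from each leaf, appending the chosen control keys."""
    writer = None
    for control_path in control_paths:
        leaf_dir = os.path.dirname(control_path)
        with open(control_path) as fp:
            control = json.load(fp)
        with open(os.path.join(leaf_dir, file_name), newline='') as fp:
            for row in csv.DictReader(fp):
                for key in keys:
                    row[key] = control[key]
                if writer is None:
                    # Header: the CSV's own columns followed by the added keys.
                    writer = csv.DictWriter(out, fieldnames=list(row))
                    writer.writeheader()
                writer.writerow(row)


# Build two toy leaves so the sketch can run anywhere.
root = tempfile.mkdtemp()
paths = []
for k_value, elapsed in [(1, '17.04'), (101, '20.86')]:
    leaf = os.path.join(root, 'full', 'random001', str(k_value))
    os.makedirs(leaf)
    with open(os.path.join(leaf, 'control.json'), 'w') as fp:
        json.dump({'algorithm': 'full', 'k': k_value}, fp)
    with open(os.path.join(leaf, 'time.csv'), 'w') as fp:
        fp.write('elapsed,maxmem,exitstatus\n%s,941328,0\n' % elapsed)
    paths.append(os.path.join(leaf, 'control.json'))

buf = io.StringIO()
aggregate(paths, 'time.csv', ['algorithm', 'k'], buf)
print(buf.getvalue())
```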
Now we have something we can look at!
So: PAM is faster for large k, and always has lower maximum memory use.
(Figure generated by examples/adcl/03analyze.R)
Building Nests
Basic Nest
From examples/basic_nest/make_nest.py, this is a simple, combinatorial
example.
#!/usr/bin/env python
import glob
import math
import os
import os.path
from nestly import Nest
wd = os.getcwd()
input_dir = os.path.join(wd, 'inputs')
nest = Nest()
# Simplest case: Levels are added with a name and an iterable
nest.add('strategy', ('exhaustive', 'approximate'))
# Sometimes it's useful to add multiple keys to the nest in one operation, e.g.
# for grouping related data.
# This can be done by passing an iterable of dictionaries to the `Nest.add` call,
# each containing at least the named key, along with the `update=True` flag.
#
# Here, 'run_count' is the named key, and will be used to create a directory in the nest,
# and the value of 'power' will be added to each control dictionary as well.
nest.add('run_count', [{'run_count': 10**i, 'power': i}
for i in range(3)], update=True)
# label_func can be used to generate a meaningful name. Here, it strips all
# but the file name from the file path
nest.add('input_file', glob.glob(os.path.join(input_dir, 'file*')),
label_func=os.path.basename)
# Items can be added that don't generate directories
nest.add('base_dir', [os.getcwd()], create_dir=False)
# Any function taking one argument (control dictionary) and returning an
# iterable may also be used.
# This one just takes the logarithm of 'run_count'.
# Since the function only returns a single result, we don't create a new directory.
def log_run_count(c):
    run_count = c['run_count']
    return [math.log(run_count, 10)]
nest.add('run_count_log', log_run_count, create_dir=False)
nest.build('runs')
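To make the effect of update=True and of the single-valued log_run_count level concrete, the sketch below builds the corresponding control dictionaries by hand. It does not use nestly; the control.update(value) step is only a model of the behaviour described in the comments above, and the 'exhaustive' starting value is one arbitrary choice from the first level.

```python
import math

# The dictionaries passed to nest.add('run_count', ..., update=True) above.
values = [{'run_count': 10**i, 'power': i} for i in range(3)]


def log_run_count(c):
    return [math.log(c['run_count'], 10)]


controls = []
for value in values:
    control = {'strategy': 'exhaustive'}   # one choice from the earlier level
    control.update(value)                  # models what update=True is described as doing
    # log_run_count returns a single value, so no branching occurs here.
    control['run_count_log'] = log_run_count(control)[0]
    controls.append(control)

for control in controls:
    print(control)
```

Each resulting dictionary carries both run_count and power, even though only run_count names a directory level.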
This example is then run with the ../examples/basic_nest/run_example.sh script.
#!/bin/sh
set -e
set -u
set -x
# Build a nested directory structure
./make_nest.py
# Let's look at a sample control file:
cat runs/approximate/1/file1/control.json
# Run `echo.sh` using every control.json under the `runs` directory, 2
# processes at a time
nestrun --processes 2 --template-file echo.sh -d runs
# Merge the CSV files named '{strategy}.csv' (where strategy value is taken
# from the control file)
nestagg delim '{strategy}.csv' -d runs -o aggregated.csv
echo.sh is the simple template script that nestrun runs in each leaf directory; nestagg then merges the resulting CSV files into aggregated.csv:
#!/bin/sh
#
# Echo the value of two fake output variables: var1, which is always 13, and
# var2, which is 10 times the run_count.
echo "var1,var2
13,{run_count}0" > "{strategy}.csv"
Meal
This is a bit more complicated, with lookups on previous values of the control dictionary:
#!/usr/bin/env python
import glob
import os
import os.path
from nestly import Nest, stripext
wd = os.getcwd()
startersdir = os.path.join(wd, "starters")
winedir = os.path.join(wd, "wine")
mainsdir = os.path.join(wd, "mains")
nest = Nest()
bn = os.path.basename
# Start by mirroring the two directory levels in startersdir, and name those
# directories "ethnicity" and "dietary".
nest.add('ethnicity', glob.glob(os.path.join(startersdir, '*')),
label_func=bn)
# In the `dietary` key, the anonymous function `lambda ...` chooses as values
# the names of directories in the current `ethnicity` directory
nest.add('dietary', lambda c: glob.glob(os.path.join(c['ethnicity'], '*')),
label_func=bn)
## Now get all of the starters.
nest.add('starter', lambda c: glob.glob(os.path.join(c['dietary'], '*')),
label_func=stripext)
## Then get the corresponding mains.
nest.add('main', lambda c: [os.path.join(mainsdir, bn(c['ethnicity']) + "_stirfry.txt")],
label_func=stripext)
## Take only the tasty wines.
nest.add('wine', glob.glob(os.path.join(winedir, '*.tasty')),
label_func=stripext)
## The wineglasses should be chosen by the wine choice, but we don't want to
## make a directory for those.
nest.add('wineglass', lambda c: [stripext(c['wine']) + ' wine glasses'],
create_dir=False)
nest.build('runs')
SCons integration
This SConstruct file is an example of using nestly with the SCons build
system:
# -*- python -*-
#
# This example takes every file in the inputs directory and performs the
# following operations:
# * cuts out a column range from every line in the file; either 1-5 or 3-40
# * optionally filters out every line that has an "o" or "O"
# * runs wc on every such file
# * aggregate these together using the prep_tab.sh script
#
# Assuming that SCons is installed, you should be able to run this example by
# typing `scons` in this directory. That should build a series of things in the
# `build` directory. Because this is a build system, deleting a file or directory
# in the build directory and then running scons will simply rerun the needed parts.
from os.path import join
import os
from nestly.scons import SConsWrap
from nestly import Nest
env = Environment()
# Passing an argument to `alias_environment` allows building targets based on nest
# key.
# For example, the `counts` files described below can be built by invoking
# `scons counts`
nest = SConsWrap(Nest(), 'build', alias_environment=env)
# Add our aggregate targets, initializing collections that will get populated
# downstream. At the end of the pipeline, we will operate on these collections.
# `add_aggregate` takes a name, which will be the key used for accessing the
# collection. The `list` argument specifies that the collection will be a list.
nest.add_aggregate('count_agg', list)
nest.add_aggregate('cut_agg', list)
# Add a nest level with the name 'input_file' that takes the files in the inputs
# directory as its nestable list. Make its label function just the basename.
nest.add('input_file', [join('inputs', f) for f in os.listdir('inputs')],
label_func=os.path.basename)
# This nest level determines the column range we will cut out of the file.
nest.add('cut_range', ['1-5', '3-40'])
# This adds a nest item with the name 'cut' and makes an SCons target out of
# the result.
@nest.add_target()
def cut(outdir, c):
    cut, = Command(join(outdir, 'cut'),
                   c['input_file'],
                   'cut -c {0[cut_range]} <$SOURCE >$TARGET'.format(c))
    # Here we add this cut file to the cut_agg aggregate before returning
    c['cut_agg'].append(cut)
    return cut
# This nest level determines whether we remove the lines with o's.
nest.add('o_choice', ['remove_o', 'leave_o'])
@nest.add_target()
def o_choice(outdir, c):
    # If we leave the o lines, then we don't have to do anything.
    if c['o_choice'] == 'leave_o':
        return c['cut']
    # If we want to remove the o lines, then we have to make an SCons Command
    # that does so with sed.
    return Command(join(outdir, 'o_removed'),
                   c['cut'],
                   'sed "/[oO]/d" <$SOURCE >$TARGET')[0]
# Add a target for the word counts.
@nest.add_target()
def counts(outdir, c):
    counts, = Command(join(outdir, 'counts'),
                      c['o_choice'],
                      'wc <$SOURCE >$TARGET')
    # Add the resulting file to the count_agg collection
    c['count_agg'].append(counts)
    return counts
# Add a control dictionary with chosen values to each leaf directory
nest.add_controls(env)
# Before operating on our aggregate collections, we return back to the original
# nest level in which the aggregates were created by using the `pop` function to
# remove all of the later nest levels from the nest state, leaving only the
# collections.
nest.pop('input_file')
# Now, back at the initial nest level, we can operate on the populated aggregate
# collections. First, the counts:
@nest.add_target()
def all_counts(outdir, c):
    return Command(join(outdir, 'all_counts.tab'),
                   c['count_agg'],
                   './prep_tab.sh $SOURCES | column -t >$TARGET')
# Then the cuts:
@nest.add_target()
def all_cut(outdir, c):
    return Command(join(outdir, 'all_cut.txt'),
                   c['cut_agg'],
                   'cat $SOURCES >$TARGET')
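The action strings in this SConstruct mix two kinds of placeholder: {0[cut_range]} is filled in immediately by Python's str.format, while $SOURCE and $TARGET are left for SCons to expand at build time. The sketch below shows only the Python half; the control dictionary is hand-written and the input path is hypothetical.

```python
# A hand-written control dictionary for one leaf of the SConstruct example
# (the input path is hypothetical).
c = {'input_file': 'inputs/file1', 'cut_range': '3-40'}

# {0[cut_range]} looks up the 'cut_range' key of the first positional
# argument; $SOURCE and $TARGET contain no braces, so str.format leaves
# them alone for SCons to expand later.
action = 'cut -c {0[cut_range]} <$SOURCE >$TARGET'.format(c)
print(action)
```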
nestly Package
nestly is a collection of functions designed to make running software with combinatorial choices of parameters easier.
core Module
Core functions for building nests.
class nestly.core.Nest(control_name='control.json', indent=2, fail_on_clash=False, warn_on_clash=True, base_dict=None, include_outdir=True)
Bases: object
Nests are used to build nested parameter selections, culminating in a directory structure representing choices made, and a JSON dictionary with all selections.
Build parameter combinations with Nest.add(), then create a nested directory structure with Nest.build().
Parameters:
- control_name – Name of the JSON file to be created in each leaf
- indent – Indentation level in the JSON file
- fail_on_clash – Raise an error if a nest level attempts to overwrite a previous value
- warn_on_clash – Print a warning if a nest level attempts to overwrite a previous value
- base_dict – Base dictionary to start all control dictionaries from (default: {})
- include_outdir – If true, include an OUTDIR key in every control indicating the directory this control would be written to.
add(name, nestable, create_dir=True, update=False, label_func=<type 'str'>, template_subs=False)
Add a level to the nest
Parameters:
- name (string) – Name of the level. Forms the key in the output dictionary.
- nestable – Either an iterable object containing values, or a function which takes a single argument (the control dictionary) and returns an iterable object containing values
- create_dir (boolean) – Should a directory level be created for this nestable?
- update (boolean) – Should the control dictionary be updated with the results of each value returned by the nestable? Only valid for dictionary results; useful for updating multiple values. At a minimum, a key-value pair corresponding to name must be returned.
- label_func – Function to be called to convert each value to a directory label.
- template_subs (boolean) – Should the strings in / returned by nestable be treated as templates? If true, str.format is called with the current values of the control dictionary.
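The two kinds of nestable described above (a plain iterable, or a function of the control dictionary) can be illustrated with a small stand-alone model. The expand function below is hypothetical and is not how nestly is implemented; it only shows how a level's values may depend on choices made at earlier levels.

```python
def expand(levels):
    """Return one control dictionary per combination of level values.

    levels is a list of (name, nestable) pairs; a nestable is either an
    iterable of values or a function of the control dictionary so far.
    """
    controls = [{}]
    for name, nestable in levels:
        expanded = []
        for control in controls:
            values = nestable(control) if callable(nestable) else nestable
            for value in values:
                child = dict(control)
                child[name] = value
                expanded.append(child)
        controls = expanded
    return controls


levels = [
    ('algorithm', ['full', 'pam']),
    ('n_leaves', [10]),
    # A "backwards-looking" level: its values depend on n_leaves.
    ('k', lambda c: range(1, c['n_leaves'], c['n_leaves'] // 2)),
]
result = expand(levels)
for control in result:
    print(control)
```

With these inputs the model yields four control dictionaries: two algorithms times the two values of k derived from n_leaves.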
nestly.core.control_iter(base_dir, control_name='control.json')
Generate the names of all control files under base_dir
nestly.core.nest_map(control_iter, map_fn)
Apply map_fn to the directories defined by control_iter
For each control file in control_iter, map_fn is called with the directory and control file contents as arguments.
Example:
>>> list(nest_map(['run1/control.json', 'run2/control.json'],
...               lambda d, c: c['run_id']))
[1, 2]
Parameters:
- control_iter – Iterable of paths to JSON control files
- map_fn (function) – Function to run for each control file. It should accept two arguments: the directory of the control file and the json-decoded contents of the control file.
Returns: A generator of the results of applying map_fn to elements in control_iter
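The behaviour documented for control_iter and nest_map can be approximated with the standard library alone. The two functions below are sketches written from the descriptions above, with a _sketch suffix to make clear they are not the nestly functions themselves; the toy run1/run2 directories are created in a temporary location so the example is runnable.

```python
import json
import os
import tempfile


def control_iter_sketch(base_dir, control_name='control.json'):
    """Yield paths of all files named control_name under base_dir."""
    for dirpath, dirnames, filenames in os.walk(base_dir):
        dirnames.sort()  # make the traversal order deterministic
        if control_name in filenames:
            yield os.path.join(dirpath, control_name)


def nest_map_sketch(control_paths, map_fn):
    """Call map_fn(directory, decoded_control) for each control file."""
    for path in control_paths:
        with open(path) as fp:
            control = json.load(fp)
        yield map_fn(os.path.dirname(path), control)


# Two toy leaves, mirroring the run1/run2 example above.
base = tempfile.mkdtemp()
for run_id in (1, 2):
    leaf = os.path.join(base, 'run%d' % run_id)
    os.makedirs(leaf)
    with open(os.path.join(leaf, 'control.json'), 'w') as fp:
        json.dump({'run_id': run_id}, fp)

run_ids = list(nest_map_sketch(control_iter_sketch(base),
                               lambda d, c: c['run_id']))
print(run_ids)
```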
scons Module
SCons integration for nestly.
class nestly.scons.SConsEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)
Bases: json.encoder.JSONEncoder
JSON Encoder which handles SCons objects.
class nestly.scons.SConsWrap(nest, dest_dir='.', alias_environment=None)
Bases: object
A Nest wrapper to add SCons integration.
This class wraps a Nest in order to provide methods which are useful for using nestly with SCons.
A Nest passed to SConsWrap must have been created with include_outdir=True, which is the default.
Parameters:
- nest – A Nest object to wrap
- dest_dir – The base directory for all output directories.
- alias_environment – An optional SCons Environment object. If present, targets added via SConsWrap.add_target() will include an alias using the nest key.
add(name, nestable, **kw)
Adds a level to the nesting and creates a checkpoint that can be reverted to later for aggregation by calling SConsWrap.pop().
Parameters:
- name – Identifier for the nest level
- nestable – A nestable object - see Nest.add().
- kw – Additional parameters to pass to Nest.add().
add_aggregate(name, data_fac)
Add an aggregate target to this nest.
Since nests added after the aggregate can access the construct returned by the factory function, it can be mutated to provide additional values for use when the decorated function is called.
To do something with the aggregates, you must SConsWrap.pop() the nest levels created after the aggregate was added; you can then add any normal targets that take advantage of the targets collected in the data structure.
Parameters:
- name – Name for the target in the nest
- data_fac – a nullary factory function which will be called immediately for each of the current control dictionaries and stored in each dictionary with the given name as in SConsWrap.add_target().
add_controls(env, target_name='control', file_name='control.json', encoder_cls=<class 'nestly.scons.SConsEncoder'>)
Adds a target to build a control file at each of the current leaves.
Parameters:
- env – SCons Environment object
- target_name – Name for target in nest
- file_name – Name for output file.
add_nest(name=None, **kw)
A simple decorator which wraps nestly.core.Nest.add().
add_target(name=None)
Add an SCons target to this nest.
The function decorated will be immediately called with each of the output directories and current control dictionaries. Each result will be added to the respective control dictionary for later nests to access.
Parameters: name – Name for the target in the nest (default: function name).
add_target_with_env(environment, name=None)
Add an SCons target to this nest, with an SCons Environment
The function decorated will be immediately called with three arguments:
- environment: A clone of the SCons environment, with variables populated for all values in the control dictionary, plus a variable OUTDIR.
- outdir: The output directory
- control: The control dictionary
Each result will be added to the respective control dictionary for later nests to access.
Differs from SConsWrap.add_target() only by the addition of the Environment clone.
pop(name=None)
Reverts to the nest stage just before the corresponding call of SConsWrap.add_aggregate(). However, any aggregate collections which have been worked on will still be accessible, and can be operated on together after calling this method. If no name is passed, will revert to the last nest level.
Parameters: name – Name of the nest level to pop.
Subpackages
scripts Package
nestrun Module
nestrun.py - run commands based on control dictionaries.
class nestly.scripts.nestrun.NestlyProcess(command, working_dir, popen, log_name='log.txt')
Bases: object
Metadata about a process run
running_time
nestly.scripts.nestrun.extant_file(x)
‘Type’ for argparse - checks that file exists but does not open.
nestly.scripts.nestrun.sigint_handler(nlocal, write_this_summary, running_procs, signum, frame)
nestly.scripts.nestrun.template_subs_file(in_file, out_fobj, d)
Substitute template arguments in in_file from variables in d, write the result to out_fobj.
nestagg Module
Aggregate results of nestly runs.
nestly.scripts.nestagg.delim(arguments)
Execute delim action.
Parameters: arguments – Parsed command line arguments from main()
Command line tools
nestrun
nestrun takes a command template and a list of control.json files with variables to
substitute. Substitution is performed using the Python built-in str.format
method. See the Python Formatter documentation for details on syntax,
and examples/jsonrun/do_nestrun.sh for an example.
Signals
nestrun also handles some signals by default.
- SIGTERM: This tells nestrun to stop spawning jobs. All jobs that were already spawned will continue running.
- SIGINT: This tells nestrun to terminate if received twice. On the first SIGINT, nestrun will emit a warning message; on the second, it will terminate all jobs and then itself.
- SIGUSR1: This tells nestrun to immediately write a list of all currently-running processes and their working directories to stderr, then flush stderr.
Help
usage: nestrun.py [-h] [-j N] [--template 'template text'] [--stop-on-error]
[--template-file FILE] [--save-cmd-file SAVECMD_FILE]
[--log-file LOG_FILE | --no-log] [--dry-run]
[--summary-file SUMMARY_FILE] [-d DIR]
[control_files [control_files ...]]
nestrun - substitute values into a template and run commands in parallel.
optional arguments:
-h, --help show this help message and exit
-j N, --processes N, --local N
Run a maximum of N processes in parallel locally
(default: 2)
--template 'template text'
Command-execution template, e.g. bash {infile}. By
default, nestrun executes the templatefile.
--stop-on-error Terminate remaining processes if any process returns
non-zero exit status (default: False)
--template-file FILE Command-execution template file path.
--save-cmd-file SAVECMD_FILE
Name of the file that will contain the command that
was executed.
--log-file LOG_FILE Name of the file that will contain output of the
executed command.
--no-log Don't create a log file
--dry-run Dry run mode, does not execute commands.
--summary-file SUMMARY_FILE
Write a summary of the run to the specified file
Control files:
control_files Nestly control dictionaries
-d DIR, --directory DIR
Run on all control files under DIR. May be used in
place of specifying control files.
nestagg
The nestagg command provides a mechanism for combining results of multiple
runs, via a subcommand interface. Currently, the only supported action is
merging delimited files from a set of leaves, adding values from the control
dictionary on each. This is performed via nestagg delim.
Help
usage: nestagg.py delim [-h] [-k KEYS | -x EXCLUDE_KEYS] [-m {fail,warn}]
[-d DIR] [-s SEPARATOR] [-t] [-o OUTPUT]
file_template [control.json [control.json ...]]
positional arguments:
file_template Template for the delimited file to read in each
directory [e.g. '{run_id}.csv']
control.json Control files
optional arguments:
-h, --help show this help message and exit
-k KEYS, --keys KEYS Comma separated list of keys from the JSON file to
include [default: all keys]
-x EXCLUDE_KEYS, --exclude-keys EXCLUDE_KEYS
Comma separated list of keys from the JSON file not to
include [default: None]
-m {fail,warn}, --missing-action {fail,warn}
Action to take when a file is missing [default: fail]
-d DIR, --directory DIR
Run on all control files under DIR. May be used in
place of specifying control files.
-s SEPARATOR, --separator SEPARATOR
Separator [default: ,]
-t, --tab Files are tab-separated
-o OUTPUT, --output OUTPUT
Output file [default: stdout]
SCons integration
SCons is an excellent build tool (analogous to make). The nestly.scons
module is provided to make integrating nestly with SCons
easier. SConsWrap wraps a Nest object to provide
additional methods for adding nests. SCons is complex and is fully documented
on its website, so we do not describe it here. However, for the purposes of
this document, it suffices to know that dependencies are created when a
target function is called.
The basic idea is that when writing an SConstruct file (analogous to a
Makefile), these SConsWrap objects extend the usual nestly
functionality with build dependencies. Specifically, there are functions that
add targets to the nest. When SCons is invoked, these targets are identified
as dependencies and the needed code is run.
Typically, you will only need targets within some nest level to refer to things either in the same nest, or in parent nests. However, it is possible to operate on target collections which are not related in this way by using aggregate targets.
Constructing an SConsWrap
SConsWrap objects wrap and modify a Nest object. Each Nest object
needs to have been created with include_outdir=True, which is the default.
Optionally, a destination directory can be given to the SConsWrap, which
will be passed to Nest.iter():
>>> nest = SConsWrap(Nest(), dest_dir='build')
In this example, all the nests created by nest will go under the build
directory. Throughout the rest of this document, nest will refer to this
same SConsWrap instance.
Adding levels
Nest levels can still be added to the nest object:
>>> nest.add('level1', ['spam', 'eggs'])
SConsWrap also provides a convenience decorator, SConsWrap.add_nest(), for adding levels which use a function as their
nestable. The following examples are exactly equivalent:
@nest.add_nest('level2', label_func=str.strip)
def level2(c):
    return [' __' + c['level1'], c['level1'] + '__ ']

def level2(c):
    return [' __' + c['level1'], c['level1'] + '__ ']
nest.add('level2', level2, label_func=str.strip)
Another advantage to using the decorator is that the name parameter is optional; if it’s omitted, the name of the nest is taken from the name of the function. As a result, the following example is also equivalent:
@nest.add_nest(label_func=str.strip)
def level2(c):
    return [' __' + c['level1'], c['level1'] + '__ ']
Note
add_nest() must always be called before being applied as a
decorator. @nest.add_nest is not valid; the correct usage is
@nest.add_nest() if no other parameters are specified.
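The reason the call is required is that add_nest() is a decorator factory: calling it returns the actual decorator. The ToyWrap class below is a hypothetical stand-in written for this illustration, not SConsWrap itself, so what the real class does when the parentheses are omitted may differ in detail.

```python
class ToyWrap(object):
    """Hypothetical stand-in for SConsWrap that only records added levels."""

    def __init__(self):
        self.levels = []

    def add(self, name, nestable, **kw):
        self.levels.append((name, nestable, kw))

    def add_nest(self, name=None, **kw):
        # add_nest is a decorator *factory*: calling it returns the decorator.
        def deco(func):
            self.add(name or func.__name__, func, **kw)
            return func
        return deco


nest = ToyWrap()


@nest.add_nest(label_func=str.strip)
def level2(c):
    return [' __' + c['level1'], c['level1'] + '__ ']


print(nest.levels[0][0])


# Without the call, the function itself is bound to `name` and the bare
# inner decorator is returned; no level is registered.
@nest.add_nest
def level3(c):
    return [c['level1']]


print(len(nest.levels), level3.__name__)
```

In this model the second, uncalled use silently replaces level3 with the inner decorator and registers nothing, which is why the parentheses matter.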
Adding targets
The fundamental action of SCons integration is adding a target to a nest.
Adding a target is very much like adding a level in that it will add a key to
the control dictionary, except that it will not add any branching to a nest.
For example, successive calls to Nest.add() produce results like the following:
>>> nest.add('level1', ['A', 'B'])
>>> nest.add('level2', ['C', 'D'])
>>> pprint.pprint([c.items() for outdir, c in nest])
[[('OUTDIR', 'A/C'), ('level1', 'A'), ('level2', 'C')],
[('OUTDIR', 'A/D'), ('level1', 'A'), ('level2', 'D')],
[('OUTDIR', 'B/C'), ('level1', 'B'), ('level2', 'C')],
[('OUTDIR', 'B/D'), ('level1', 'B'), ('level2', 'D')]]
A crude illustration of how level1 and level2 relate:
# C .---- - -
# A .----------o level2
# | D '---- - -
# o----o level1
# | C .---- - -
# B '----------o level2
# D '---- - -
Calling add_target(), however, produces slightly different
results:
>>> nest.add('level1', ['A', 'B'])
>>> @nest.add_target()
... def target1(outdir, c):
...     return 't-{0[level1]}'.format(c)
...
>>> pprint.pprint([c.items() for outdir, c in nest])
[[('OUTDIR', 'A'), ('level1', 'A'), ('target1', 't-A')],
[('OUTDIR', 'B'), ('level1', 'B'), ('target1', 't-B')]]
And a similar illustration of how level1 and target1 relate:
# t-A
# A .----------o------ - -
# o----o level1 target1
# B '----------o------ - -
# t-B
add_target() does not increase the total number of control
dictionaries from 2; it only updates each existing control dictionary to add
the target1 key. This is effectively the same as calling
add() (or add_nest()) with a function
and returning an iterable of one item:
>>> nest.add('level1', ['A', 'B'])
>>> @nest.add_nest()
... def target1(c):
...     return ['t-{0[level1]}'.format(c)]
...
>>> pprint.pprint([c.items() for outdir, c in nest])
[[('OUTDIR', 'A/t-A'), ('level1', 'A'), ('target1', 't-A')],
[('OUTDIR', 'B/t-B'), ('level1', 'B'), ('target1', 't-B')]]
Astute readers might have noticed the key difference between the two: functions
decorated with add_target() have an additional parameter,
outdir. This allows targets to be built into the correct place in the
directory hierarchy.
The other notable difference is that the function decorated by
add_target() will be called exactly once with each control
dictionary. A function added with add() may be called
more than once with equal control dictionaries.
Like add_nest(), add_target() must always be
called, and optionally takes the name of the target as the first parameter. No
other parameters are accepted.
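The contrast between levels (which branch) and targets (which do not) can be modelled with plain dictionaries. The add_level and add_target functions below are hypothetical helpers written for this illustration; they are not the SConsWrap methods, and no SCons targets are created.

```python
import os


def add_level(controls, name, values):
    """Branch: every control dictionary gets one child per value."""
    children = []
    for c in controls:
        for v in values:
            child = dict(c)
            child[name] = v
            child['OUTDIR'] = os.path.join(c['OUTDIR'], str(v))
            children.append(child)
    return children


def add_target(controls, name, func):
    """No branching: store func's result under name in each dictionary."""
    for c in controls:
        c[name] = func(c['OUTDIR'], c)
    return controls


controls = add_level([{'OUTDIR': ''}], 'level1', ['A', 'B'])
controls = add_target(controls, 'target1',
                      lambda outdir, c: 't-{0[level1]}'.format(c))
for c in controls:
    print(sorted(c.items()))
```

The number of dictionaries stays at two after the target is added, and OUTDIR is unchanged, matching the add_target() output shown earlier.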
Adding aggregates
As mentioned in the introduction, often you only need targets within a given nest level to depend on things in the same nest level or parental nest levels. To get around this restriction, you can utilize nestly’s aggregate functionality.
Adding an aggregate target creates a collection (for each terminal node of the current nest state) which can be updated in downstream nest levels.
Once targets have been added to the aggregate collection, you can return to a previous nest level by using the pop() method and operate on the populated aggregate collection at that level.
For example, let’s say we have two nest levels, level1 and level2, which take the values [A, B] and [C, D] respectively.
Suppose we want to perform an operation for every unique combination of {level1, level2}, then aggregate the results grouped by values of level1:
>>> # Create the first nest level, and add an aggregate named "aggregate1"
>>> nest.add('level1', ['A', 'B'])
>>> nest.add_aggregate('aggregate1', list)
...
>>> # Next, add level2 and a target to level2
>>> nest.add('level2', ['C', 'D'])
>>> @nest.add_target()
... def some_target(outdir, c):
...     target = c['level1'] + c['level2']
...     # here we populate the aggregate
...     c['aggregate1'].append(target)
...     return target
...
>>> # Now the aggregates have been filled!
>>> # Note that the aggregate collection is shared among all descendents of
>>> # each `level1` value
>>> pprint.pprint([(c['level1'], c['level2'], c['aggregate1']) for outdir, c in nest])
[('A', 'C', ['AC', 'AD']),
('A', 'D', ['AC', 'AD']),
('B', 'C', ['BC', 'BD']),
('B', 'D', ['BC', 'BD'])]
>>>
>>> # However, if we try to build something from the aggregate collection now, we'd get 4 copies (one for
>>> # 'A/C', one for 'A/D', etc.).
>>> # To return to the nest state prior to adding `level2`, we pop it from the nest:
>>> nest.pop('level2')
>>> # Now when we access the aggregate collection, there are only two entries, one for A and one for B:
>>> pprint.pprint([(c['level1'], c['aggregate1']) for outdir, c in nest])
[('A', ['AC', 'AD']), ('B', ['BC', 'BD'])]
>>>
>>> # we can add targets using the aggregate collection!
>>> @nest.add_target()
... def operate_on_aggregate(outdir, c):
...     print('agg %s %s' % (c['level1'], c['aggregate1']))
...
agg A ['AC', 'AD']
agg B ['BC', 'BD']
As you can see above, aggregate targets are added using the add_aggregate() method.
The first argument to this method is used as a key for accessing the aggregate collection(s) from the control dictionary.
The second argument should be a factory function which will be called with no arguments and set as the initial value of the aggregate (typically a collection constructor like list or dict).
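A small nestly-free sketch shows why a factory is passed rather than a ready-made collection. The helper `attach_aggregate` is hypothetical and only mimics the behaviour described above; it is not how nestly is implemented.

```python
# Illustrative sketch only: plain Python, not nestly's implementation.
# `attach_aggregate` is a hypothetical helper.

def attach_aggregate(controls, name, factory):
    """Call `factory` once per control dict and store the result under `name`."""
    for control in controls:
        control[name] = factory()
    return controls

# A factory such as `list` gives every node its own, initially empty collection.
separate = attach_aggregate([{'level1': 'A'}, {'level1': 'B'}], 'aggregate1', list)
separate[0]['aggregate1'].append('AC')

# A "factory" that hands back one shared object mixes results across nodes.
shared_list = []
shared = attach_aggregate([{'level1': 'A'}, {'level1': 'B'}], 'aggregate1',
                          lambda: shared_list)
shared[0]['aggregate1'].append('AC')

print(separate[1]['aggregate1'])  # B's collection is untouched
print(shared[1]['aggregate1'])    # B sees A's entry
```

With `list` as the factory, the entry appended for A does not leak into B's collection, which matches the grouping by level1 described above.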
Prior to using the aggregate collection, any branching nest levels added after the aggregate should be removed using pop(), to prevent building identical targets.
This function, when passed the name of a nest level, returns the SConsWrap to the state just before that nest level was created.
The only modifications which remain are those on the aggregate collection, which retains any targets added to it within the removed nest levels.
Once back at the parental nest level, targets added to the aggregate can be operated on by any further targets added.
Note that for a level to be popped from the nest, it must have been added with nestly.scons.SConsWrap.add() rather than nestly.core.Nest.add().
Because the results of operations on aggregates are just regular targets at some ancestral nest level, these targets can be used as the sources to targets further downstream.
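The add, aggregate, pop sequence can be pictured with a toy model. The class `TinyWrap` below is a hypothetical stand-in written for illustration; it is not nestly's SConsWrap and assumes nothing about nestly's internals beyond the behaviour described in this section.

```python
# Toy model for illustration only; `TinyWrap` is hypothetical, not nestly code.

class TinyWrap:
    def __init__(self):
        self.controls = [{}]     # one control dict per terminal node
        self.checkpoints = {}    # level name -> state just before that level

    def add(self, name, values):
        # Remember the state before branching so that pop() can restore it.
        self.checkpoints[name] = self.controls
        self.controls = [dict(c, **{name: v})
                         for c in self.controls for v in values]

    def add_aggregate(self, name, factory):
        # One fresh collection per current terminal node.
        for c in self.controls:
            c[name] = factory()

    def pop(self, name):
        self.controls = self.checkpoints.pop(name)


wrap = TinyWrap()
wrap.add('level1', ['A', 'B'])
wrap.add_aggregate('aggregate1', list)
wrap.add('level2', ['C', 'D'])

# Child dicts are shallow copies, so they share their parent's list object.
for c in wrap.controls:
    c['aggregate1'].append(c['level1'] + c['level2'])

wrap.pop('level2')
for c in wrap.controls:
    print(c['level1'], c['aggregate1'])
```

After the pop, only the two level1 nodes remain, and each one still holds the entries appended by its level2 children.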
Note
nestly’s initial SCons aggregation functionality, added in version 0.4.0 and described in the nestly manuscript, involved registering aggregate functions before adding additional levels to the nest. This interface did not allow the user to use aggregate targets as sources of other targets downstream. The original aggregation functionality has since been removed in favor of that described above.
Calling commands from SCons¶
While the previous examples demonstrate how to use the various methods of SConsWrap, they do not show how to actually call commands using SCons. The easiest way is to define the various targets from within the SConstruct file:
from nestly.scons import SConsWrap
from nestly import Nest
import os
nest = Nest()
wrap = SConsWrap(nest, 'build')
# Add a nest for each of our input files.
nest.add('input_file', [os.path.join('inputs', f) for f in os.listdir('inputs')],
         label_func=os.path.basename)
# Each input will be transformed in each of these different ways.
nest.add('transformation', ['log', 'unit', 'asinh'])
# Targets are registered on the SConsWrap, not on the underlying Nest.
@wrap.add_target()
def transformed(outdir, c):
    # The template for the command to run.
    action = 'guppy mft --transform {0[transformation]} $SOURCE -o $TARGET'
    # Command (provided by SCons inside an SConstruct file) will return
    # the targets; we want the only item.
    outfile, = Command(
        source=c['input_file'],
        target=os.path.join(outdir, 'transformed.jplace'),
        action=action.format(c))
    return outfile
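The action template above relies on Python's `str.format` indexing into the control dictionary, while the `$SOURCE` and `$TARGET` placeholders are left untouched for SCons to expand later. A minimal sketch, using a hypothetical control dictionary with made-up values:

```python
# Hypothetical control dictionary; the values are invented for illustration.
control = {'input_file': 'inputs/sample1.jplace', 'transformation': 'log'}

action = 'guppy mft --transform {0[transformation]} $SOURCE -o $TARGET'

# {0[transformation]} looks up control['transformation']; the $-prefixed
# names contain no braces, so str.format leaves them as they are.
command_line = action.format(control)
print(command_line)
```

Only the nest parameter is substituted at this stage; the source and target paths are filled in by SCons when the command runs.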
A function name_targets()
is also provided for more easily naming the
targets of an SCons command:
# Assumes `from nestly.scons import name_targets` and the `wrap` object from above.
@wrap.add_target('target1')
@name_targets
def target1(outdir, c):
    return 'outfile1', 'outfile2', Command(
        source=c['input_file'],
        target=[os.path.join(outdir, 'outfile1'),
                os.path.join(outdir, 'outfile2')],
        action="transform $SOURCE $TARGETS")
In this case, target1
will be a dict resembling {'outfile1':
'build/outdir/outfile1', 'outfile2': 'build/outdir/outfile2'}
.
Note
name_targets()
does not preserve the name of the decorated function,
so the name of the target must be provided as a parameter to
add_target()
.
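The behaviour described above can be approximated in a few lines of plain Python. The decorator `name_targets_sketch` below is a hypothetical re-creation for illustration, not nestly's own code, and the list standing in for the result of an SCons Command call is made up.

```python
# Rough approximation for illustration; not copied from nestly's source.

def name_targets_sketch(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # All items but the last are names; the last item is the target list.
        names, targets = result[:-1], result[-1]
        return dict(zip(names, targets))
    return wrapper


@name_targets_sketch
def target1(outdir, c):
    # Stand-in for the nodes that an SCons Command call would return.
    built = [outdir + '/outfile1', outdir + '/outfile2']
    return 'outfile1', 'outfile2', built


named = target1('build/outdir', {})
print(named['outfile1'])
print(target1.__name__)
```

Because the plain wrapper replaces the decorated function, `target1.__name__` no longer reads `target1` in this sketch, which illustrates why the target name has to be passed to add_target() explicitly.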
A more involved, runnable example is in the examples/scons
directory.
Project Modules¶
Changes¶
0.6.1¶
- Fix bug wherein pop does not work on nest levels added with a function (GH-23).
0.6.0¶
- Add support for automatic alias creation in SConsWrap instances (GH-17).
0.5.0¶
- Add SConsWrap.add_target_with_env (GH-14)
- Completely revamped aggregation functionality (GH-15)
- Add SConsWrap.add_controls (GH-16)
0.4.0¶
- Add SIG{INT,TERM,USR1} handling to nestrun (GH-9)
- Add SCons integration via nestly.scons (GH-12)
- Support for walking a directory in nestagg (GH-13)
- Initial Python 3, PyPy support
- Add an OUTDIR key to nest control files
- Additional examples
0.3.0¶
- Add nestly.core.stripext
- New aggregation functionality: nestagg subcommand; nestly.core.nest_map
- Show tail of log file when nestrun fails (GH-10)
0.2.0¶
- Deprecated nestly.nestly
- New object-oriented API in nestly.core
- Updated examples