parkour.graph documentation

combine

(combine node cls)
(combine node var & args)
Add combine task to job node `node`, as implemented by `Reducer` class `cls`,
or Clojure var `var` and optional `args`.

config

(config node & steps)
Add arbitrary configuration steps to `node`, which may be either a single job
node or a vector of job nodes.

execute

(execute graph conf jname)
Execute Hadoop jobs for the job graph `graph`, which should be a job graph
leaf node or vector of leaf nodes.  Jobs are configured starting with base
configuration `conf` and named based on the string `jname`.  Returns a vector of
the distributed sequences produced by the job graph leaves.

fexecute

(fexecute graph conf jname)
As per `execute`, but require and return only a single result dseq;
almost `(comp first execute)`.
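
A minimal sketch of how the functions in this namespace might chain into one
job graph, modeled on a word-count job.  Everything outside `parkour.graph` is
an assumption rather than something documented here: the helper namespaces
`parkour.toolbox`, `parkour.io.text` and `parkour.io.seqf`, the
`keyvalgroups-r` reducer, the calling convention of the map task var, and the
input path are all illustrative and may differ between Parkour versions.

```clojure
(ns example.word-count
  (:require [clojure.string :as str]
            [clojure.core.reducers :as r]
            [parkour.graph :as pg]
            [parkour.toolbox :as ptb]      ; assumed helper namespace
            [parkour.io.text :as text]     ; assumed text-file dseq
            [parkour.io.seqf :as seqf])    ; assumed sequence-file dsink
  (:import [org.apache.hadoop.io Text LongWritable]))

(defn word-count-m
  "Map task: split each line into words and emit [word 1] pairs.
  Assumes the task is handed the input lines as a reducible collection."
  [lines]
  (->> lines
       (r/mapcat #(str/split % #"\s+"))
       (r/map (fn [word] [word 1]))))

(defn word-count
  "Build and run a single-job graph over the dseq `lines`."
  [conf lines]
  (-> (pg/input lines)                               ; :input-stage node
      (pg/map #'word-count-m)                        ; map task from a var
      (pg/partition [Text LongWritable])             ; map-output key & value classes
      (pg/combine #'ptb/keyvalgroups-r #'+)          ; var plus optional args
      (pg/reduce #'ptb/keyvalgroups-r #'+)
      (pg/output (seqf/dsink [Text LongWritable]))   ; write to a dsink
      (pg/fexecute conf "word-count")))              ; single result dseq

(comment
  ;; Hypothetical invocation; needs a Hadoop configuration and input file.
  (into {} (word-count conf (text/dseq "input/lines.txt"))))
```

Using `fexecute` here rather than `execute` reflects that the graph has a
single leaf, so only one result dseq is expected.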

input

(input node)
Return a fresh `:input`-stage job graph node consuming from the provided dseq
`node`.  If instead provided an `:input`-stage node or vector of such nodes,
acts as the identity function.

map

(map node cls)
(map node var & args)
Add map task to job node `node`, as implemented by `Mapper` class `cls`, or
Clojure var `var` and optional `args`.

node-fn

(node-fn node conf jname)
Return a function for executing the job defined by the job node `node`, using
base configuration `conf` and job name `jname`.

node-job

(node-job node conf jname)
Hadoop `Job` for job node `node`, starting with base configuration `conf`
and named `jname`.

output

(output node dsink)
(output node & named-dsinks)
Add output task to job node `node` for writing to `dsink` or named-output
name-dsink pairs `named-dsinks`.  Yields either a new `:input`-stage node
reading from the written output or a vector of such nodes.

partition

(partition node step)
(partition node step cls)
(partition node step var & args)
Add partition task to the provided job node `node`, as configured by `step`
and optionally implemented by either `Partitioner` class `cls` or Clojure var
`var` and optional `args`.  The `node` may be either a single job node or a vector
of job nodes to co-group.  The `step` may be either a configuration step or a
vector of the two map-output key & value classes.

reduce

(reduce node cls)
(reduce node var & args)
Add reduce task to job node `node`, as implemented by `Reducer` class `cls`,
or Clojure var `var` and optional `args`.

run-graph

(run-graph runner graph outputs)
Execute the data-flow graph described by `graph` using the function executor
`runner`.  Each key in `graph` identifies a particular entry.  Each value is a
tuple of `(inputs, f)`, where `inputs` is a sequence of other `graph` keys and
`f` is a function calculating that entry's result given `inputs`.  Returns a
vector of the result entries for the keys in the collection `outputs`.
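
The shape of `graph` can be illustrated with plain Clojure data.  The sketch
below does not call `run-graph` and ignores `runner` entirely: `eval-graph` is
a hypothetical sequential stand-in, and it assumes each `f` is applied to the
results of its `inputs` as positional arguments, which the description above
does not confirm.

```clojure
(defn eval-graph
  "Hypothetical sequential evaluator, not the library implementation.
  Resolves each requested key by first resolving its inputs, keeping
  already-computed results in a map so each entry is computed once."
  [graph outputs]
  (letfn [(resolve-key [results k]
            (if (contains? results k)
              results
              (let [[inputs f] (get graph k)
                    ;; Resolve dependencies before computing this entry.
                    results (reduce resolve-key results inputs)]
                (assoc results k (apply f (map results inputs))))))]
    (let [results (reduce resolve-key {} outputs)]
      (mapv results outputs))))

;; Each key identifies an entry; each value is an [inputs f] tuple.
(def graph
  {:a      [[]       (fn [] 1)]
   :b      [[]       (fn [] 2)]
   :sum    [[:a :b]  (fn [a b] (+ a b))]
   :double [[:sum]   (fn [s] (* 2 s))]})

(eval-graph graph [:sum :double])
;; => [3 6]
```

The returned vector follows the order of the keys in `outputs`, matching the
description of the return value above.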

run-job

(run-job job)
Run `job` and wait synchronously for it to complete.  Kills the job on
exceptions or JVM shutdown.  Unlike the `Job#waitForCompletion()` method, does
not swallow `InterruptedException`.

shuffle

(shuffle [ckey] [[ckey cval]])
Base shuffle configuration; sets map output key & value types to
the classes `ckey` and `cval` respectively.

sink

(sink node dsink)
(sink node & named-dsinks)
Deprecated alias for `output`.

source

(source dseq)
Deprecated alias for `input`.