parkour.graph documentation
combine
(combine node cls)
(combine node var & args)
Add combine task to job node `node`, as implemented by `Reducer` class `cls`,
or Clojure var `var` and optional `args`.
config
(config node & steps)
Add arbitrary configuration steps to `node`, which may be either a single job
node or a vector of job nodes.
execute
(execute graph conf jname)
Execute Hadoop jobs for the job graph `graph`, which should be a job graph
leaf node or vector of leaf nodes. Jobs are configured starting with base
configuration `conf` and named based on the string `jname`. Returns a vector of
the distributed sequences produced by the job graph leaves.
fexecute
(fexecute graph conf jname)
As per `execute`, but require and return only a single result desq;
almost `(comp first execute)`.
map
(map node cls)
(map node var & args)
Add map task to job node `node`, as implemented by `Mapper` class `cls`, or
Clojure var `var` and optional `args`.
node-fn
(node-fn node conf jname)
Return a function for executing the job defined by the job node `node`, using
base configuration `conf` and job name `jname`.
node-job
(node-job node conf jname)
Hadoop `Job` for job node `node`, starting with base configuration `conf`
and named `jname`.
output
(output node dsink)
(output node & named-dsinks)
Add output task to job node `node` for writing to `dsink` or named-output
name-dsink pairs `named-dsinks`. Yields either a new `:input`-stage node
reading from the written output or a vector of such nodes.
partition
(partition node step)
(partition node step cls)
(partition node step var & args)
Add partition task to the provided job node `node`, as configured by `step`
and optionally implemented by either Partitioner class `cls` or Clojure var
`var` & optional `args`. The `node` may be either a single job node or a vector
of job nodes to co-group. The `step` may be either a configuration step or a
vector of the two map-output key & value classes.
reduce
(reduce node cls)
(reduce node var & args)
Add reduce task to job node `node`, as implemented by `Reducer` class `cls`,
or Clojure var `var` and optional `args`.
run-graph
(run-graph runner graph outputs)
Execute the data-flow graph described by `graph` using the function executor
`runner`. Each key in `graph` identifies a particular entry. Each value is a
tuple of `(inputs, f)`, where `inputs` is a sequence of other `graph` keys and
`f` is a function calculating that entry's result given `inputs`. Returns a
vector of the result entries for the keys in the collection `outputs`.
run-job
(run-job job)
Run `job` and wait synchronously for it to complete. Kills the job on
exceptions or JVM shutdown. Unlike the `Job#waitForCompletion()` method, does
not swallow `InterruptedException`.
shuffle
(shuffle [ckey] [[ckey cval]])
Base shuffle configuration; sets map output key & value types to
the classes `ckey` and `cval` respectively.
sink
(sink node dsink)
(sink node & named-dsinks)
Deprecated alias for `output`.
source
(source dseq)
Deprecated alias for `input`.