parkour.mapreduce documentation
*context*
The task context. Only bound during the dynamic scope of a task.
(collfn v)
Task function adapter for collection-function-like functions. The adapted
function `v` should accept conf-provided arguments followed by the (unwrapped)
input tuple source, and should return a reducible collection of output tuples.
If `v` has metadata for the `::mr/source-as` or `::mr/sink-as` keys, the
function input and/or output will be re-shaped as specified via the associated
metadata value.
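As an illustration, a function suitable for adaptation by `collfn` might look like the sketch below. The name `word-count-mapper` and the sample input are hypothetical. The metadata key is written out in full as `:parkour.mapreduce/source-as` so that the snippet loads without Parkour on the classpath, and a plain vector stands in for the task input.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Hypothetical task function: with the `:vals` source shape it would
;; receive only the input values (here, lines of text) and return a
;; reducible collection of [word 1] output tuples.
(defn word-count-mapper
  {:parkour.mapreduce/source-as :vals}
  [lines]
  (->> lines
       (r/mapcat #(str/split % #"\s+"))
       (r/map (fn [word] [word 1]))))

;; Outside Hadoop, a plain vector can stand in for the input source.
(into [] (word-count-mapper ["a b" "b"]))
;; => [["a" 1] ["b" 1] ["b" 1]]
```

Because the function only consumes and produces reducible collections, it can be exercised at the REPL before it is wired into a job.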
(combiner! conf var & args)
As per `reducer!`, but allocate and configure for the Hadoop combine step,
which may impact e.g. output types.
(contextfn v)
Task function adapter for functions accessing the job context. The adapted
function `v` should accept a configuration followed by any conf-provided
arguments, and should return a function. The returned function should accept
the job context and an (unwrapped) input tuple source, and should return a
reducible collection of output tuples.
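The two-stage shape described for `contextfn` can be sketched as follows. The name `tagging-task` and its `tag` argument are hypothetical, the elements of the input are treated as opaque values, and `nil` placeholders stand in for the configuration and job context so the function can be called outside Hadoop.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical function in the shape `contextfn` expects: the outer
;; function takes the configuration plus conf-provided arguments, and the
;; returned function takes the job context and the input tuple source.
(defn tagging-task
  [conf tag]
  (fn [context input]
    ;; `context` is unused in this sketch. Each input element is paired
    ;; with the tag to form a [tag element] output tuple.
    (r/map (fn [x] [tag x]) input)))

;; Exercised with nil placeholders for `conf` and `context`.
(into [] ((tagging-task nil "row") nil [1 2 3]))
;; => [["row" 1] ["row" 2] ["row" 3]]
```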
(counters-map counters)
Translate job `counters` into a nested Clojure map of strings to counts.
(job conf)
Return new Hadoop `Job` instance, optionally initialized with configuration
`conf`.
(keygroups context)
Produce distinct keys from the tuples in `context`. Deprecated.
(keykeygroups context)
Produce pairs of distinct grouping keys and associated sequences of specific
keys from the tuples in `context`. Deprecated.
(keykeyvalgroups context)
Produce pairs of distinct grouping keys and associated sequences of specific
keys and values from the tuples in `context`. Deprecated.
(keys context)
Produce keys only from the tuples in `context`. Deprecated.
(keysgroups context)
Produce sequences of specific keys associated with distinct grouping keys
from the tuples in `context`. Deprecated.
(keyvalgroups context)
Produce pairs of distinct grouping keys and associated sequences of values from
the tuples in `context`. Deprecated.
(keyvals context)
Produce pairs of keys and values from the tuples in `context`. Deprecated.
(local-runner? conf)
True iff `conf` specifies the local job runner.
(mapper! conf var & args)
Allocate and return a new mapper class for `conf` as invoking `var`.
Prior to use, the function referenced by `var` will be transformed by the
function specified as the value of `var`'s `::mr/adapter` metadata, defaulting
to `parkour.mapreduce/collfn`. During task-setup, the transformed function will
be invoked with the job `Configuration` and any provided `args` (which must be
EDN-serializable); it should return a function of one argument, which will be
invoked with the task context to execute the task.
See also: `collfn`, `contextfn`.
(partfn v)
Partitioner function adapter for value-based partitioners. The adapted
function `v` should accept a configuration followed by any conf-provided
arguments, and should return a function. The returned function should accept
an (unwrapped) tuple key, (unwrapped) tuple value, and a partition-count; it
should return an integer modulo the partition-count, and should optionally be
primitive-hinted as `OOLL`.
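A function in the shape `partfn` adapts might look like the sketch below. The name `bucket-partitioner` and its `bucket-size` argument are hypothetical, and the keys are assumed to be integers. The returned function is hinted so that it takes two objects and a primitive long and returns a primitive long.

```clojure
;; Hypothetical function in the shape `partfn` expects: it takes the
;; configuration plus conf-provided arguments and returns the partition
;; function.
(defn bucket-partitioner
  [conf bucket-size]
  (let [size (long bucket-size)]
    ;; Primitive-hinted: two Objects and a long in, a long out.
    (fn ^long [key value ^long nparts]
      ;; Keys in the same bucket of `size` consecutive integers share a
      ;; partition; `mod` keeps the result within [0, nparts).
      (mod (quot (long key) size) nparts))))

;; Exercised with a nil placeholder for `conf`.
((bucket-partitioner nil 10) 25 nil 4)
;; => 2
```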
(partitioner! conf var & args)
Allocate and return a new partitioner class for `conf` as invoking `var`.
Prior to use, the function referenced by `var` will be transformed by the
function specified as the value of `var`'s `::mr/adapter` metadata, defaulting
to `(comp parkour.mapreduce/partfn constantly)`. During task-setup, the
transformed function will be invoked with the job `Configuration` and any
provided `args` (which must be EDN-serializable); it should return a function of
three arguments: a raw map-output key, a raw map-output value, and an integral
reduce-task count. That function will be called for each map-output tuple, must
return an integral value mod the reduce-task count, and must be primitive-hinted
as `OOLL`.
See also: `partfn`.
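With the default adapter, the var passed to `partitioner!` refers directly to the three-argument partition function. The sketch below uses a hypothetical name, `first-char-partition`, and assumes that calling `str` on the raw key yields its text. It runs without Parkour, and the final comment shows where the documented call would fit.

```clojure
;; Hypothetical partition function for use with the default adapter: it
;; takes a map-output key, a map-output value, and the reduce-task count,
;; and is primitive-hinted as OOLL.
(defn first-char-partition
  ^long [key value ^long nparts]
  (let [s (str key)]
    (if (empty? s)
      0
      ;; Partition on the code of the first character of the key.
      (mod (int (.charAt s 0)) nparts))))

(first-char-partition "a" nil 4)
;; => 1

;; With Parkour loaded, the class would be allocated per the signature
;; above, i.e. (partitioner! conf #'first-char-partition).
```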
(recseqfn v)
Input format record-reader creation function adapter for input formats
implemented in terms of seqs. The adapted function `v` should accept an input
split and a task context, and should return a value which is `seq`able,
`count`able, and optionally `Closeable`.
(reducer! conf var & args)
Allocate and return a new reducer class for `conf` as invoking `var`.
Prior to use, the function referenced by `var` will be transformed by the
function specified as the value of `var`'s `::mr/adapter` metadata, defaulting
to `parkour.mapreduce/collfn`. During task-setup, the transformed function will
be invoked with the job `Configuration` and any provided `args` (which must be
EDN-serializable); it should return a function of one argument, which will be
invoked with the task context to execute the task.
See also: `collfn`, `contextfn`.
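A var suitable for `reducer!` under the default `collfn` adapter might refer to a function like the one sketched below. The name `sum-reducer` is hypothetical, the metadata key is written out in full so the snippet loads without Parkour, and a plain vector of `[key values]` pairs stands in for the `:keyvalgroups` source shape.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical reduce-side task function: with the `:keyvalgroups` shape
;; it would receive pairs of a grouping key and that key's values, and it
;; returns one [key total] tuple per group.
(defn sum-reducer
  {:parkour.mapreduce/source-as :keyvalgroups}
  [groups]
  (r/map (fn [[word counts]] [word (reduce + 0 counts)]) groups))

;; Exercised outside Hadoop with plain data.
(into [] (sum-reducer [["a" [1 1]] ["b" [1]]]))
;; => [["a" 2] ["b" 1]]
```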
(set-combiner job cls)
Set the combiner class for `job` to `cls`.
(set-mapper job cls)
Set the mapper class for `job` to `cls`.
(set-partitioner job cls)
Set the partitioner class for `job` to `cls`.
(set-reducer job cls)
Set the reducer class for `job` to `cls`.
(sink coll)
(sink sink coll)
Emit all tuples from `coll` to `sink`, or to `*context*` if not provided.
(sink-as kind coll)
Annotate `coll` as containing values to sink as `kind`. The `kind` may
either be a sinking function of two arguments (a sink and a collection) or a
keyword indicating a built-in sinking function. Supported keywords are `:none`,
`:keys`, `:vals`, and `:keyvals`.
(source-as kind source)
Shape `source` to the collection shape `kind`. The `kind` may either be a
source-shaping function of one argument or a keyword indicating a built-in
source-shaping function. Supported keywords are: `:keys`, `:vals`, `:keyvals`,
`:keygroups`, `:valgroups`, `:keyvalgroups`, `:keykeyvalgroups`,
`:keykeygroups`, and `:keysgroups`.
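To make the simpler shapes concrete, the sketch below derives them from plain Clojure data. It is an illustration of the collection shapes only, not Parkour's implementation: the sample tuples are hypothetical and assumed to be already sorted by key, and the three shapes that distinguish grouping keys from specific keys are omitted.

```clojure
;; Hypothetical key-sorted input tuples.
(def keyvals [["a" 1] ["a" 2] ["b" 3]])

;; Runs of adjacent tuples sharing the same key.
(def groups (partition-by first keyvals))

;; The data each shape would present for the tuples above.
(def shapes
  {:keys         (mapv first keyvals)
   :vals         (mapv second keyvals)
   :keyvals      keyvals
   :keygroups    (mapv ffirst groups)
   :valgroups    (mapv #(mapv second %) groups)
   :keyvalgroups (mapv (fn [g] [(ffirst g) (mapv second g)]) groups)})

(:keyvalgroups shapes)
;; => [["a" [1 2]] ["b" [3]]]
```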
(tac conf)
(tac conf id)
Return a new TaskAttemptContext instance using provided configuration `conf`
and, if given, task attempt ID `id`.
Atom holding any exception thrown during local execution. Intended only for
internal use within Parkour.
(valgroups context)
Produce sequences of values associated with distinct grouping keys from the
tuples in `context`. Deprecated.
(vals context)
Produce values only from the tuples in `context`. Deprecated.
(wrap-sink sink)
(wrap-sink ckey cval sink)
Return a new tuple sink which wraps any sunk keys and values which are not
already of the correct type, then sinks them to `sink`. Keys and values are
wrapped as the types `ckey` and `cval` respectively, which should be compatible
with the key and value types of `sink`. Where they are not compatible, the
corresponding type of `sink` will be used instead.