parkour.mapreduce documentation

*context*

The task context.  Only bound during the dynamic scope of a task.

collfn

(collfn v)
Task function adapter for collection-function-like functions.  The adapted
function `v` should accept conf-provided arguments followed by the (unwrapped)
input tuple source, and should return a reducible collection of output tuples.
If `v` has metadata for the `::mr/source-as` or `::mr/sink-as` keys, the
function input and/or output will be re-shaped as specified via the associated
metadata value.
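
A minimal sketch of a collfn-style task function, loosely following Parkour's
word-count example; the metadata reshapes the input to bare values and the
output to key/value pairs:

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str]
         '[parkour.mapreduce :as mr])

;; Split each input line into words, emitting each word with a count of 1.
(defn word-count-m
  {::mr/source-as :vals, ::mr/sink-as :keyvals}
  [coll]
  (->> coll
       (r/mapcat #(str/split % #"\s+"))
       (r/map #(vector % 1))))
```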

combiner!

(combiner! conf var & args)
As per `reducer!`, but allocate and configure for the Hadoop combine step,
which may impact e.g. output types.

contextfn

(contextfn v)
Task function adapter for functions accessing the job context.  The adapted
function `v` should accept a configuration followed by any conf-provided
arguments, and should return a function.  The returned function should accept
the job context and an (unwrapped) input tuple source, and should return a
reducible collection of output tuples.
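
As a hedged sketch, a contextfn-style task which increments a Hadoop counter
per input tuple; the counter group and name are illustrative:

```clojure
(require '[clojure.core.reducers :as r]
         '[parkour.mapreduce :as mr])

;; Select the contextfn adapter via metadata; close over the conf and
;; return the (context, input) -> output-collection task function.
(defn counting-m
  {::mr/adapter mr/contextfn}
  [conf]
  (fn [context input]
    (->> (mr/source-as :keyvals input)
         (r/map (fn [[k v]]
                  (.increment (.getCounter context "app" "tuples") 1)
                  [k v])))))
```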

counters-map

(counters-map counters)
Translate job `counters` into a nested Clojure map of strings to counts.
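
A usage sketch, assuming a completed Hadoop `Job` instance:

```clojure
(require '[parkour.mapreduce :as mr])

;; Nested map of counter-group name to {counter-name count}
(mr/counters-map (.getCounters job))
```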

input-format!

(input-format! conf svar sargs rvar rargs)
Allocate and return a new input format class for `conf` which invokes `svar`
to generate input splits and `rvar` to generate record readers.

During local job initialization, the function referenced by `svar` will be
invoked with the job context followed by any provided `sargs` (which must be
EDN-serializable); it should return a sequence of input split data.  Any values
in the returned sequence which are not `InputSplit`s will be wrapped in
`EdnInputSplit`s and must be EDN-serializable; for such values, the
`::mr/length` and `::mr/locations` keys may provide the split byte-size and
node locations respectively.

Prior to use, the function referenced by `rvar` will be transformed by the
function specified as the value of `rvar`'s `::mr/adapter` metadata,
defaulting to `parkour.mapreduce/recseqfn`.  During remote task-setup, the
transformed function will be invoked with the task input split and task context
followed by any provided `rargs`; it should return a `RecordReader` generating
the task input data.

See also: `recseqfn`.
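
An end-to-end sketch: EDN split data dividing a numeric range, plus a
`recseqfn`-compatible reader function.  How the EDN data is recovered from
the runtime split is an assumption here; check the `EdnInputSplit`
implementation:

```clojure
(require '[parkour.mapreduce :as mr])

;; Split generation: invoked with the job context plus `sargs`; returns
;; EDN data maps, with ::mr/length as the split-size hint.
(defn range-splits
  [context n nsplits]
  (let [size (quot n nsplits)]
    (for [i (range nsplits)]
      {:start (* i size)
       :end   (min n (* (inc i) size))
       ::mr/length size})))

;; Record reading, via the default `recseqfn` adapter: invoked with the
;; input split and task context plus `rargs`; returns a seqable,
;; countable collection.  ASSUMPTION: the EDN split data is recovered by
;; deref'ing the split.
(defn range-reader
  [split context]
  (let [{:keys [start end]} @split]
    (vec (range start end))))

(mr/input-format! conf #'range-splits [1000 4] #'range-reader [])
```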

job

(job)(job conf)
Return a new Hadoop `Job` instance, optionally initialized with configuration
`conf`.

keygroups

(keygroups context)
Produce distinct keys from the tuples in `context`.  Deprecated.

keykeygroups

(keykeygroups context)
Produce pairs of distinct grouping keys and associated sequences of specific
keys from the tuples in `context`.  Deprecated.

keykeyvalgroups

(keykeyvalgroups context)
Produce pairs of distinct grouping keys and associated sequences of specific
keys and values from the tuples in `context`.  Deprecated.

keys

(keys context)
Produce keys only from the tuples in `context`.  Deprecated.

keysgroups

(keysgroups context)
Produce sequences of specific keys associated with distinct grouping keys
from the tuples in `context`.  Deprecated.

keyvalgroups

(keyvalgroups context)
Produce pairs of distinct grouping keys and associated sequences of values from
the tuples in `context`.  Deprecated.

keyvals

(keyvals context)
Produce pairs of keys and values from the tuples in `context`.  Deprecated.

local-runner?

(local-runner? conf)
True iff `conf` specifies the local job runner.

mapper!

(mapper! conf var & args)
Allocate and return a new mapper class for `conf` which invokes `var`.

Prior to use, the function referenced by `var` will be transformed by the
function specified as the value of `var`'s `::mr/adapter` metadata, defaulting
to `parkour.mapreduce/collfn`.  During task-setup, the transformed function will
be invoked with the job `Configuration` and any provided `args` (which must be
EDN-serializable); it should return a function of one argument, which will be
invoked with the task context to execute the task.

See also: `collfn`, `contextfn`.
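
A wiring sketch, reusing `word-count-m` from the `collfn` example above and
`word-count-r` from the `reducer!` entry below; passing the `Job` itself
where a `conf` is expected assumes Parkour's configuration functions accept
any configuration-bearing object:

```clojure
(let [job (mr/job conf)]
  (doto job
    (mr/set-mapper   (mr/mapper!   job #'word-count-m))
    (mr/set-combiner (mr/combiner! job #'word-count-r))
    (mr/set-reducer  (mr/reducer!  job #'word-count-r))))
```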

partfn

(partfn v)
Partitioner function adapter for value-based partitioners.  The adapted
function `v` should accept a configuration followed by any conf-provided
arguments, and should return a function.  The returned function should accept
an (unwrapped) tuple key, an (unwrapped) tuple value, and a partition-count;
it should return an integer mod the partition-count, and may optionally be
primitive-hinted as `OOLL`.
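
A minimal sketch of a partfn-adapted var, selected via `::mr/adapter`
metadata; it receives the conf and returns the `OOLL`-hinted partitioning
function:

```clojure
(require '[parkour.mapreduce :as mr])

;; Route tuples to reducers by hash of the (unwrapped) key.
(defn hash-part
  {::mr/adapter mr/partfn}
  [conf]
  (fn ^long [k v ^long nparts]
    (mod (hash k) nparts)))
```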

partitioner!

(partitioner! conf var & args)
Allocate and return a new partitioner class for `conf` which invokes `var`.

Prior to use, the function referenced by `var` will be transformed by the
function specified as the value of `var`'s `::mr/adapter` metadata, defaulting
to `(comp parkour.mapreduce/partfn constantly)`.  During task-setup, the
transformed function will be invoked with the job `Configuration` and any
provided `args` (which must be EDN-serializable); it should return a function of
three arguments: a raw map-output key, a raw map-output value, and an integral
reduce-task count.  That function will be called for each map-output tuple;
it must return an integral value mod the reduce-task count, and may optionally
be primitive-hinted as `OOLL`.

See also: `partfn`.
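
With the default adapter, the var itself is the bare three-argument function;
a sketch, assuming a `job` built via `mr/job`:

```clojure
;; Bare `OOLL` partitioner fn; assumes non-empty string keys.
(defn first-char-part ^long [k v ^long nparts]
  (mod (long (first (str k))) nparts))

(mr/set-partitioner job (mr/partitioner! job #'first-char-part))
```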

recseqfn

(recseqfn v)
Input format record-reader creation function adapter for input formats
implemented in terms of seqs.  The adapted function `v` should accept an input
split and a task context, and should return a value which is `seq`able,
`count`able, and optionally `Closeable`.
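
A minimal sketch of a compatible reader function; the fixed local path is
hypothetical:

```clojure
(require '[clojure.java.io :as io])

;; Eagerly read lines: a vector is both seqable and countable, and no
;; Closeable is needed since the reader does not outlive this call.
(defn slurped-lines
  [split context]
  (with-open [rdr (io/reader "/tmp/example.txt")]
    (vec (line-seq rdr))))
```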

reducer!

(reducer! conf var & args)
Allocate and return a new reducer class for `conf` which invokes `var`.

Prior to use, the function referenced by `var` will be transformed by the
function specified as the value of `var`'s `::mr/adapter` metadata, defaulting
to `parkour.mapreduce/collfn`.  During task-setup, the transformed function will
be invoked with the job `Configuration` and any provided `args` (which must be
EDN-serializable); it should return a function of one argument, which will be
invoked with the task context to execute the task.

See also: `collfn`, `contextfn`.
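
The matching reduce-side sketch for the `collfn` word-count example, usable
for both the combine and reduce steps:

```clojure
(require '[clojure.core.reducers :as r]
         '[parkour.mapreduce :as mr])

;; Sum the per-word counts for each distinct word.
(defn word-count-r
  {::mr/source-as :keyvalgroups, ::mr/sink-as :keyvals}
  [coll]
  (r/map (fn [[word counts]]
           [word (reduce + counts)])
         coll))
```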

set-combiner

(set-combiner job cls)
Set the combiner class for `job` to `cls`.

set-mapper

(set-mapper job cls)
Set the mapper class for `job` to `cls`.

set-partitioner

(set-partitioner job cls)
Set the partitioner class for `job` to `cls`.

set-reducer

(set-reducer job cls)
Set the reducer class for `job` to `cls`.

sink

(sink coll)(sink sink coll)
Emit all tuples from `coll` to `sink`, or to `*context*` if not provided.

sink-as

(sink-as kind coll)
Annotate `coll` as containing values to sink as `kind`.  The `kind` may
either be a sinking function of two arguments (a sink and a collection) or a
keyword indicating a built-in sinking function.  Supported keywords are `:none`,
`:keys`, `:vals`, and `:keyvals`.
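
For example, annotating a collection to be sunk as bare values, then emitting
it to the bound task context:

```clojure
(require '[parkour.mapreduce :as mr])

;; Within a task, where `*context*` is bound
(mr/sink (mr/sink-as :vals ["apple" "pear" "quince"]))
```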

source-as

(source-as kind source)
Shape `source` to the collection shape `kind`.  The `kind` may either be a
source-shaping function of one argument or a keyword indicating a built-in
source-shaping function.  Supported keywords are: `:keys`, `:vals`, `:keyvals`,
`:keygroups`, `:valgroups`, `:keyvalgroups`, `:keykeyvalgroups`,
`:keykeygroups`, and `:keysgroups`.
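
For example, reshaping the bound task context into grouped key/value form
during a reduce task; the result is a reducible collection:

```clojure
(require '[clojure.core.reducers :as r]
         '[parkour.mapreduce :as mr])

(->> (mr/source-as :keyvalgroups mr/*context*)
     (r/map (fn [[k vs]] [k (reduce + vs)])))
```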

tac

(tac conf)(tac conf taid)
Return a new `TaskAttemptContext` instance using the provided configuration
`conf` and task-attempt ID `taid`.

task-ex

Atom holding any exception thrown during local execution.  Intended only for
internal use within Parkour.

valgroups

(valgroups context)
Produce sequences of values associated with distinct grouping keys from the
tuples in `context`.  Deprecated.

vals

(vals context)
Produce values only from the tuples in `context`.  Deprecated.

wrap-sink

(wrap-sink sink)(wrap-sink ckey cval sink)
Return a new tuple sink which wraps keys and values as the types `ckey` and
`cval` respectively, which should be compatible with the key and value types
of `sink`; where they are not compatible, the sink's own types are used
instead.  The returned sink wraps any sunk keys and values not already of the
correct type, then sinks them to `sink`.  With only `sink` provided, the
sink's own key and value types are used.
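
A hedged sketch, wrapping the bound task context so that plain strings and
longs are converted to Hadoop writables before being sunk; assumes the
context's key/value types are compatible with `Text`/`LongWritable`:

```clojure
(require '[parkour.mapreduce :as mr])
(import '[org.apache.hadoop.io Text LongWritable])

(let [sink (mr/wrap-sink Text LongWritable mr/*context*)]
  (mr/sink sink [["apple" 1] ["pear" 2]]))
```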