parkour.io.avro documentation

dseq

(dseq schemas & paths)
Distributed sequence of Avro input read from `paths`.  `schemas` is a vector
whose elements are applied as the schema arguments to `set-input`.
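
As a rough illustration of the signature above, the sketch below builds a dseq over a single Avro file and pulls a few records into a vector at the REPL. The path "data/words.avro" is hypothetical, and the assumption that a dseq can be reduced locally (here via `clojure.core.reducers`) comes from Parkour's general dseq abstraction rather than from this page.

```clojure
(require '[clojure.core.reducers :as r]
         '[parkour.io.avro :as mra])

;; Hypothetical input file; `:default` asks for the writer schema as-is.
(def words (mra/dseq [:default] "data/words.avro"))

;; Assumption: dseqs are reducible, so a small sample can be realized locally.
(->> words
     (r/take 5)
     (into []))
```

Passing a single-element vector such as `[:default]` corresponds to the one-schema arity of `set-input`; a two-element vector would correspond to the key and value schema arity.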

dsink

(dsink schemas)
(dsink schemas path)
Distributed sink for writing Avro data files at `path`, or at a transient path
if `path` is not provided.  Configures output schemas by applying `set-output`
to the elements of `schemas`.

dval

(dval value)
(dval schema value)
Avro distributed value, serialized using `schema` if provided or a plain
EDN-in-Avro schema if not.

set-grouping

(set-grouping job gs)
Configure `job` combine & reduce phases to group keys via schema `gs`,
which should be encoding-compatible with the map-output key schema.

set-input

(set-input job ks)
(set-input job ks vs)
Configure `job` for Avro input with keys or keyvals using expected
schemas `ks` and `vs`.  A schema may be `:default` to use the input's
writer schema directly.

set-map-output

(set-map-output job ks)
(set-map-output job ks vs)
Configure `job` map output to produce Avro with key schema `ks` and
optional value schema `vs`.

set-output

(set-output job ks)
(set-output job ks vs)
(set-output job ks vs gs)
Configure `job` output to produce Avro with key schema `ks` and
optional value schema `vs`.  Configures job output format to match
when the output format has not been otherwise explicitly specified.

shuffle

(shuffle [ks])
(shuffle [ks vs])
(shuffle [ks vs gs])
Configuration step for Avro shuffle, with key schema `ks`, optional value
schema `vs`, and optional grouping schema `gs`.
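
To show where `shuffle` and `dsink` might sit in a job, here is a minimal word-count sketch modeled on Parkour's usual job-graph style. Everything outside `parkour.io.avro` (the `parkour.graph` and `parkour.mapreduce` calls, the `::mr/source-as` and `::mr/sink-as` metadata, `text/dseq`) is an assumption about the rest of the library and is not described on this page; the namespace, function names, and paths are hypothetical.

```clojure
(ns example.word-count-avro
  (:require [clojure.string :as str]
            [clojure.core.reducers :as r]
            [parkour.mapreduce :as mr]
            [parkour.graph :as pg]
            [parkour.io.text :as text]
            [parkour.io.avro :as mra]))

(defn word-count-m
  "Map task: split each line into words and emit [word 1] pairs."
  {::mr/source-as :vals, ::mr/sink-as :keyvals}
  [lines]
  (->> lines
       (r/mapcat #(str/split % #"\s+"))
       (r/map (fn [word] [word 1]))))

(defn word-count-r
  "Combine/reduce task: sum the counts grouped under each word."
  {::mr/source-as :keyvalgroups, ::mr/sink-as :keyvals}
  [groups]
  (r/map (fn [[word counts]] [word (reduce + 0 counts)]) groups))

(defn word-count
  "Build and run the job graph; `in-path` and `out-path` are hypothetical."
  [conf in-path out-path]
  (-> (pg/input (text/dseq in-path))
      (pg/map #'word-count-m)
      ;; Avro shuffle: string keys, long values, no separate grouping schema.
      (pg/partition (mra/shuffle [:string :long]))
      (pg/combine #'word-count-r)
      (pg/reduce #'word-count-r)
      ;; Avro output with the same key and value schemas.
      (pg/output (mra/dsink [:string :long] out-path))
      (pg/execute conf "word-count")))
```

Note that `shuffle` takes its schemas as a single vector argument, so the call is `(mra/shuffle [:string :long])` rather than `(mra/shuffle :string :long)`.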

task

(task f)
Returns a function which calls `f` with an `unwrap`ed tuple source
and expects `f` to return a `reduce`able object.  Avro-wraps and sinks
all tuples from the resulting `reduce`able.

wrap-sink

(wrap-sink context)
Wrap task context for sinking Avro output.