parkour.io.avro documentation
dseq
(dseq schemas & paths)
Distributed sequence of Avro input, applying the vector of `schemas` as per
the arguments to `set-input`, and reading from `paths`.
dsink
(dsink schemas)
(dsink schemas path)
Distributed sink for writing Avro data files at `path`, or a transient path
if not provided. Configures output schemas as per application of `set-output`
to `schemas`.
dval
(dval value)
(dval schema value)
Avro distributed value, serialized using `schema` if provided or a plain
EDN-in-Avro schema if not.
set-grouping
(set-grouping job gs)
Configure `job` combine & reduce phases to group keys via schema `gs`,
which should be encoding-compatible with the map-output key schema.
set-map-output
(set-map-output job ks)
(set-map-output job ks vs)
Configure `job` map output to produce Avro with key schema `ks` and
optional value schema `vs`.
set-output
(set-output job ks)
(set-output job ks vs)
(set-output job ks vs gs)
Configure `job` output to produce Avro with key schema `ks` and
optional value schema `vs`. Configures job output format to match
when the output format has not been otherwise explicitly specified.
shuffle
(shuffle [ks])
(shuffle [ks vs])
(shuffle [ks vs gs])
Configuration step for Avro shuffle, with key schema `ks`, optional value
schema `vs`, and optional grouping schema `gs`.
task
(task f)
Returns a function which calls `f` with an `unwrap`ed tuple source
and expects `f` to return a `reduce`able object. Avro-wraps and sinks
all tuples from the resulting `reduce`able.
wrap-sink
(wrap-sink context)
Wrap task context for sinking Avro output.