KijiExpress is a set of tools for defining data-processing MapReduce jobs quickly and expressively, particularly for data stored in Kiji tables.

KijiExpress jobs are written in the Scala programming language, which is more concise than Java while still giving you access to Java libraries and tools. KijiExpress also includes the Scalding library, a Twitter-sponsored open-source library for authoring flows of analytics-focused MapReduce jobs, so you can build complex MapReduce data pipelines. Finally, KijiExpress is integrated with Avro, which gives you access to complex records and data types in your data transformation pipelines.

The core functionality of KijiExpress currently provides developers with tools for manipulating data in pipelines. These tools offer considerably more flexibility than writing MapReduce jobs directly in Java.

Using this Document

The first section of this document describes how to set up your environment.

The sections that follow cover data concepts in Kiji, introduce Scala and Scalding, and walk through an example KijiExpress job that demonstrates and explains the Scalding concepts in action.

If you are already familiar with Scala and Scalding, you can skip directly to the later sections, which outline functionality specific to KijiExpress: the sections on KijiExpress sources and on data flow operations in KijiExpress. The final section explains how to run KijiExpress jobs.

Useful External References

Other Kiji References