Motivation

While Hadoop’s job tracker provides detailed information about jobs that have been run in the cluster, it is not a persistent store for that information. Kiji therefore records the history of jobs in a job_history table within each Kiji instance. For every job, this history includes an XML dump of the full job configuration, the job's start and end times, and all of its job counters.

Setup

The job_history table is installed in a particular instance as soon as the first MapReduce job is run in that instance.

You can verify that the table was installed properly with the kiji ls command:

kiji ls kiji://.env/default/job_history

Jobs that extend KijiMapReduceJob automatically record their metadata in the job_history table.

Classes Overview

The JobHistoryKijiTable class is the main class that provides access to the job_history table. It currently supports recording and retrieving job metadata. This is a framework-audience class and is subject to change between minor versions.

Using the API

The JobHistoryKijiTable class exposes two methods for retrieving the recorded metadata: getJobDetails(String jobId), which looks up a single job by its jobId, and getJobScanner(), which scans over all recorded jobs.
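To make the split between the two retrieval calls concrete, here is a minimal sketch in Java (the language KijiMR is written in) of an in-memory stand-in for the job_history table. It is hypothetical throughout: JobHistorySketch, JobRecord, recordJob, durationMs, and the Optional and Iterator return types are invented for illustration and are not the real JobHistoryKijiTable signatures. It assumes only what this section states: each row is keyed by jobId and holds the configuration XML, the start and end times, and the job counters.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory model of the job_history table; NOT the real
// JobHistoryKijiTable API. Names, types, and signatures are illustrative only.
public class JobHistorySketch {

  // One row: the per-job fields the job_history table is described as recording.
  static final class JobRecord {
    final String jobId;               // row key (the table's EntityId)
    final String configXml;           // XML dump of the full job configuration
    final long startTimeMs;           // job start time
    final long endTimeMs;             // job end time
    final Map<String, Long> counters; // job counters, by name

    JobRecord(String jobId, String configXml, long startTimeMs, long endTimeMs,
        Map<String, Long> counters) {
      this.jobId = jobId;
      this.configXml = configXml;
      this.startTimeMs = startTimeMs;
      this.endTimeMs = endTimeMs;
      this.counters = Collections.unmodifiableMap(new TreeMap<>(counters));
    }

    long durationMs() {
      return endTimeMs - startTimeMs;
    }
  }

  // Rows keyed by jobId; in this sketch a TreeMap keeps scans in sorted jobId order.
  private final Map<String, JobRecord> rows = new TreeMap<>();

  // Hypothetical stand-in for the recording that KijiMapReduceJob does automatically.
  void recordJob(JobRecord record) {
    rows.put(record.jobId, record);
  }

  // Analogue of getJobDetails(String jobId): a point lookup by row key.
  Optional<JobRecord> getJobDetails(String jobId) {
    return Optional.ofNullable(rows.get(jobId));
  }

  // Analogue of getJobScanner(): iterates over every recorded job.
  Iterator<JobRecord> getJobScanner() {
    return Collections.unmodifiableCollection(rows.values()).iterator();
  }

  public static void main(String[] args) {
    JobHistorySketch history = new JobHistorySketch();

    // Toy values for illustration; the second jobId is hypothetical.
    Map<String, Long> counters = new LinkedHashMap<>();
    counters.put("MAP_INPUT_RECORDS", 42L);
    history.recordJob(new JobRecord("job_20130221123621875_0001",
        "<configuration/>", 1_000L, 5_000L, counters));
    history.recordJob(new JobRecord("job_20130221123621875_0002",
        "<configuration/>", 6_000L, 7_500L, Collections.emptyMap()));

    // Look up a single job by its jobId.
    history.getJobDetails("job_20130221123621875_0001").ifPresent(job ->
        System.out.println(job.jobId + " ran for " + job.durationMs() + " ms"));

    // Scan over all recorded jobs.
    for (Iterator<JobRecord> it = history.getJobScanner(); it.hasNext(); ) {
      JobRecord job = it.next();
      System.out.println(job.jobId + " counters=" + job.counters);
    }
  }
}
```

The design point the sketch tries to capture is that a lookup by jobId is a single-row read, while the scanner walks every row. The sorted scan order here is a property of TreeMap and is not necessarily the order in which the real table returns rows.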

Example

Under the hood, job_history is an ordinary Kiji table, so it can be inspected with the kiji ls, kiji scan, and kiji get tools. Its rows are keyed by jobId: the EntityId of each row is the job's jobId. For example, to list all of the jobIds that have been recorded:

kiji scan kiji://.env/default/job_history/info:jobId

There is also a kiji job-history tool, which displays the job history data in a more human-readable format:

kiji job-history --kiji=kiji://.env/default/

To look up the job data for an individual job with jobId ‘job_20130221123621875_0001’, try:

kiji job-history --kiji=kiji://.env/default --job-id=job_20130221123621875_0001