While Hadoop’s job tracker provides detailed information about jobs that have been run in the cluster, it is not a persistent data store for such information. Kiji tracks all historical jobs in a job_history table within every instance. This information includes an XML dump of the full job configuration, start times, end times, and all job counters.
The job_history table is installed in a particular instance as soon as the first MapReduce job is run in that instance.
You can verify that the table was installed properly using the ls command:
kiji ls kiji://.env/default/job_history
Jobs that extend KijiMapReduceJob will automatically record metadata to the job_history table.
The JobHistoryKijiTable class is the main class responsible for providing access to the job_history table. Currently it provides the ability to record and retrieve job metadata. This is a framework-audience class and is subject to change between minor versions.
Using the API
The JobHistoryKijiTable class surfaces the calls getJobDetails(String jobId) and getJobScanner() for retrieving the recorded metadata.
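Because the recorded metadata includes an XML dump of the job configuration, a common follow-up step is turning that dump into key/value pairs. The following is a minimal sketch, not part of the Kiji API. It assumes the dump uses Hadoop's standard configuration layout (property elements with name and value children) and that the XML string has already been read from the job_history row. The JobConfigDump class name and the sample property values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: parses a Hadoop-style configuration XML dump into a map. */
class JobConfigDump {

  /** Returns the properties in document order, keyed by property name. */
  static Map<String, String> parse(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

    Map<String, String> properties = new LinkedHashMap<>();
    NodeList nodes = doc.getElementsByTagName("property");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element property = (Element) nodes.item(i);
      String name = childText(property, "name");
      if (name != null) {
        // A property without a value element is stored with a null value.
        properties.put(name, childText(property, "value"));
      }
    }
    return properties;
  }

  /** Text of the first descendant element with the given tag, or null if absent. */
  private static String childText(Element parent, String tag) {
    NodeList matches = parent.getElementsByTagName(tag);
    return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
  }

  public static void main(String[] args) throws Exception {
    // Sample dump with made-up values, standing in for the recorded configuration.
    String sample = "<configuration>"
        + "<property><name>mapred.job.name</name><value>example-job</value></property>"
        + "<property><name>mapred.reduce.tasks</name><value>4</value></property>"
        + "</configuration>";
    parse(sample).forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

The sketch uses only the JDK's built-in XML parser, so it does not depend on Hadoop or Kiji being on the classpath.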
The job_history table is a Kiji table under the hood, and can thus be inspected using the kiji scan and kiji get tools. The EntityId associated with the job_history table is the jobId. For example, to look at all of the jobIds that have been recorded:
kiji scan kiji://.env/default/job_history/info:jobId
There is also a kiji job-history tool, which displays the job history data in a more human-readable format:
kiji job-history --kiji=kiji://.env/default/
To look up the job data for an individual job with jobId ‘job_20130221123621875_0001’, try:
kiji job-history --kiji=kiji://.env/default --job-id=job_20130221123621875_0001
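When the tool needs to be invoked from a program rather than typed by hand, the command line can be assembled from its parts. The following is a minimal sketch that uses only the two flags shown above (--kiji and --job-id). The JobHistoryCommand class is a hypothetical helper, not part of Kiji, and actually launching the process assumes the kiji script is on the PATH.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the arguments for the kiji job-history tool. */
class JobHistoryCommand {

  /**
   * Builds the argument list for the given instance URI.
   * A null or empty jobId leaves out --job-id, matching the list-everything form.
   */
  static List<String> build(String instanceUri, String jobId) {
    List<String> command = new ArrayList<>();
    command.add("kiji");
    command.add("job-history");
    command.add("--kiji=" + instanceUri);
    if (jobId != null && !jobId.isEmpty()) {
      command.add("--job-id=" + jobId);
    }
    return command;
  }

  public static void main(String[] args) {
    List<String> command = build("kiji://.env/default", "job_20130221123621875_0001");
    // Print the command instead of running it, so the sketch works without Kiji installed.
    System.out.println(String.join(" ", command));
    // To run it for real (assumes kiji is on the PATH):
    // new ProcessBuilder(command).inheritIO().start().waitFor();
  }
}
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if the list is later handed to ProcessBuilder.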