Install Kiji BentoBox

If you don't have a working environment yet, install the standalone Kiji BentoBox in three quick steps!

Start a Kiji Cluster

  • If you plan to use a BentoBox, run the following command to set BentoBox-related environment variables and start the Bento cluster:
cd <path/to/bento>
source bin/kiji-env.sh
bento start

After the BentoBox starts, it displays a list of useful ports for cluster webapps and services. The MapReduce JobTracker webapp (http://localhost:50030) in particular will be useful for this tutorial.

  • If you are running Kiji without a BentoBox, there are a few things you'll need to do to make sure your environment behaves the same way as a BentoBox:

Starting Kiji in Non-BentoBox Systems

  1. Make sure HDFS is installed and started.
  2. Make sure MapReduce is installed, that HADOOP_HOME is set to the root of your MapReduce distribution, and that MapReduce is started.
  3. Make sure HBase is installed, that HBASE_HOME is set to the root of your HBase distribution, and that HBase is started.
  4. Export KIJI_HOME to the root of your Kiji distribution.
  5. Export PATH=${PATH}:${KIJI_HOME}/bin.
  6. Export EXPRESS_HOME to the root of your KijiExpress distribution.
  7. Export PATH=${PATH}:${EXPRESS_HOME}/bin.
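If you want to confirm your shell matches the checklist above, a quick sanity check like the following can help. The install paths shown are placeholder assumptions; substitute your own.

```shell
# Sanity-check sketch for the checklist above. The paths below are
# placeholder assumptions, not real install locations -- substitute yours.
export HADOOP_HOME=/opt/hadoop            # step 2 (assumed path)
export HBASE_HOME=/opt/hbase              # step 3 (assumed path)
export KIJI_HOME=/opt/kiji                # step 4 (assumed path)
export EXPRESS_HOME=/opt/kiji-express     # step 6 (assumed path)
export PATH=${PATH}:${KIJI_HOME}/bin:${EXPRESS_HOME}/bin   # steps 5 and 7

# Report which of the required variables are set.
for var in HADOOP_HOME HBASE_HOME KIJI_HOME EXPRESS_HOME; do
  eval "val=\${$var}"
  if [ -n "$val" ]; then
    echo "OK: $var=$val"
  else
    echo "MISSING: $var"
  fi
done
```

If any line prints MISSING, revisit the corresponding step before continuing.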

Whenever the tutorial refers to the BentoBox, remember that you'll need to manage your Kiji cluster yourself accordingly.

Set Tutorial-Specific Environment Variables

  • Define an environment variable named KIJI that holds a Kiji URI to the Kiji instance we'll use during this tutorial:
export KIJI=kiji://.env/kiji_express_music
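As a reminder of how the URI above is put together (the breakdown in the comments is a paraphrase of Kiji URI conventions):

```shell
# The Kiji URI set above has three parts:
#   kiji://              the URI scheme
#   .env                 shorthand for the cluster configured in your
#                        environment (e.g. the BentoBox defaults)
#   kiji_express_music   the name of the Kiji instance
export KIJI=kiji://.env/kiji_express_music
echo "${KIJI}"
```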

The code for this tutorial is located in the ${KIJI_HOME}/examples/express-music/ directory. Commands in this tutorial will depend on this location.

  • Set a variable for the tutorial location:
export MUSIC_EXPRESS_HOME=${KIJI_HOME}/examples/express-music

Install Kiji

  • Install your Kiji instance:
kiji install --kiji=${KIJI}

Create Tables

The file music-schema.ddl defines table layouts that are used in this tutorial:

  • Create the Kiji music tables whose layouts are described in music-schema.ddl:
${KIJI_HOME}/schema-shell/bin/kiji-schema-shell --kiji=${KIJI} --file=${MUSIC_EXPRESS_HOME}/music-schema.ddl

This command uses kiji-schema-shell to create the tables using the KijiSchema DDL, which makes specifying table layouts easy. See the KijiSchema DDL Shell reference for more information on the KijiSchema DDL.

  • Verify the Kiji music tables were correctly created:
kiji ls ${KIJI}

You should see the newly-created songs and users tables:

kiji://localhost:2181/kiji_express_music/songs
kiji://localhost:2181/kiji_express_music/users

Upload Data to HDFS

HDFS stands for Hadoop Distributed File System. If you are running the BentoBox, HDFS runs on your machine as a layer atop your native filesystem. This tutorial demonstrates loading data from HDFS into Kiji tables, a typical first step when creating KijiExpress applications.

  • Upload the data set to HDFS:
hadoop fs -mkdir express-tutorial
hadoop fs -copyFromLocal ${MUSIC_EXPRESS_HOME}/example_data/*.json express-tutorial/

You're now ready for the next step, Importing Data.

Kiji Administration Quick Reference

Here are some of the Kiji commands introduced on this page and a few more useful ones:

  • Start a BentoBox Cluster:
cd <path/to/bento>
source bin/kiji-env.sh
bento start
  • Stop your BentoBox Cluster:
bento stop
  • Install a Kiji instance:
kiji install --kiji=<URI/of/instance>

The URI takes the form:

kiji://.env/<instance name>
  • Run a KijiExpress job:

To run a KijiExpress job, you invoke a command of the following form:

express job \
    [--libjars <list of JAR files, separated by colon>] \
    [--hdfs] \
    <job JAR file> <job class> [job-specific options]

The --hdfs option tells KijiExpress to run the job on the Hadoop cluster rather than in Cascading's local environment. The --libjars option specifies additional JAR files needed to run the job, separated by colons.
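For instance, an invocation of that form could be assembled as below. Every name here (the JAR file, class, and library JARs) is an illustrative assumption, not something from this tutorial; note that the --libjars entries are colon-separated.

```shell
# Assemble a hypothetical express invocation. All names are illustrative
# assumptions -- substitute your real job JAR, class, and dependencies.
JOB_JAR="my-express-job.jar"             # assumed job JAR file
JOB_CLASS="com.example.MyExpressJob"     # assumed job class
LIBJARS="lib/dep-a.jar:lib/dep-b.jar"    # extra JARs, colon-separated

CMD="express job --libjars ${LIBJARS} --hdfs ${JOB_JAR} ${JOB_CLASS}"
echo "${CMD}"
```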