In this section of the tutorial, we will import metadata about songs into the Kiji table songs, and import data about when users have listened to songs into the Kiji table users.

Stock Importers

The user data can be imported using a stock Kiji bulk importer with the command:

kiji bulk-import
-Dkiji.import.text.input.descriptor.path=express-tutorial/song-plays-import-descriptor.json
--importer=org.kiji.mapreduce.lib.bulkimport.JSONBulkImporter
--output="format=kiji table=${KIJI}/users nsplits=1"     --input="format=text
file=express-tutorial/song-plays.json"     --lib=${LIBS_DIR}

Custom Importers in KijiExpress

If Kiji’s stock bulk importers don’t fit your use case, you can also write import jobs in KijiExpress. This is what we’ve done to import the song metadata into the songs table.

The source code for this importer is at the bottom of this page, for interested readers. The syntax will be explained more in-depth in the next section.

KijiExpress programs or scripts can be run using the express command. Here, we’ll demonstrate how to run the the song metadata importer as a precompiled job contained in a jar file:

express job --libjars "${MUSIC_EXPRESS_HOME}/lib/*" \
    ${MUSIC_EXPRESS_HOME}/lib/kiji-express-music-0.4.0.jar \
    org.kiji.express.music.SongMetadataImporter \
    --input express-tutorial/song-metadata.json \
    --table-uri ${KIJI}/songs --hdfs

Verify Output

After running the importer, you can verify that the Kiji table songs contains the imported data using the kiji scan command.

kiji scan ${KIJI}/songs --max-rows=5

You should see something like:

Scanning kiji table: kiji://localhost:2181/kiji_express_music/songs/
entity-id=['song-32'] [1365548283995] info:metadata
                                 {"song_name": "song name-32", "artist_name": "artist-2", "album_name": "album-0", "genre": "genre1.0", "tempo": 120, "duration": 180}

entity-id=['song-49'] [1365548285203] info:metadata
                                 {"song_name": "song name-49", "artist_name": "artist-3", "album_name": "album-1", "genre": "genre4.0", "tempo": 150, "duration": 180}

entity-id=['song-36'] [1365548284255] info:metadata
                                 {"song_name": "song name-36", "artist_name": "artist-2", "album_name": "album-0", "genre": "genre1.0", "tempo": 90, "duration": 0}

entity-id=['song-10'] [1365548282517] info:metadata
                                 {"song_name": "song name-10", "artist_name": "artist-1", "album_name": "album-0", "genre": "genre5.0", "tempo": 160, "duration": 240}

entity-id=['song-8'] [1365548282382] info:metadata
                                 {"song_name": "song name-8", "artist_name": "artist-1", "album_name": "album-1", "genre": "genre5.0", "tempo": 140, "duration": 180}

We can also use the kiji scan command to verify the users table import was successful.

kiji scan ${KIJI}/users --max-rows=2 --max-versions=5

You should see something like:

entity-id=['user-28'] [1325739120000] info:track_plays
                                 song-25
entity-id=['user-28'] [1325739060000] info:track_plays
                                 song-23
entity-id=['user-28'] [1325738940000] info:track_plays
                                 song-25
entity-id=['user-28'] [1325738760000] info:track_plays
                                 song-28

entity-id=['user-2'] [1325736420000] info:track_plays
                                 song-4
entity-id=['user-2'] [1325736180000] info:track_plays
                                 song-3
entity-id=['user-2'] [1325735940000] info:track_plays
                                 song-4
entity-id=['user-2'] [1325735760000] info:track_plays
                                 song-28
entity-id=['user-2'] [1325735520000] info:track_plays
                                 song-0

Now that you’ve imported your data, we are ready to start analyzing it! The source code for the song metadata importer is included below in case you are curious. We will go over the syntax of writing your own jobs in more detail in following sections.

(Optional) Source Code for Scalding Importer

The data is formatted with a JSON record on each line. Each record corresponds to a song, and provides the following metadata for the song:

  • song id
  • song name
  • artist name
  • album name
  • genre
  • tempo
  • duration

The info:metadata column of the table contains an Avro record containing this relevant song metadata.

The importer looks like this:

SongMetadataImporter.scala