This page is deprecated.

For a more up-to-date look at how to derive data using Producers, see the Music recommendation tutorial. This section is preserved for your reference, but the APIs referenced herein are deprecated and may be removed in a future release of KijiSchema.

Later sections of this tutorial rely on you having run the commands in this section. However, you should not model your future MapReduce analyses on the code in AddressFieldExtractor.

Your friends have been terribly disorganized about giving you their contact details. Being the perfectionist you are, you would like to know, at any given point, how many friends you have in a certain zip code... because obviously, such questions need answering.

We'll show you a way to decompose your contacts' addresses into their street address, city, state, and zip code (the derived columns) to make it easier for you to get this information quickly.
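To see why a separate zip column helps, here is a minimal sketch in plain Java, independent of the Kiji APIs. It assumes the zip codes have already been read out of the derived column into a list; the class name ZipCounter and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the phonebook example.
final class ZipCounter {
    // Groups the zip codes and counts how many contacts share each one.
    // A TreeMap keeps the result sorted by zip code.
    static Map<String, Long> countByZip(List<String> zips) {
        return zips.stream()
                .collect(Collectors.groupingBy(zip -> zip, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample data: one zip code per contact.
        List<String> zips = Arrays.asList("99999", "94110", "99999");
        System.out.println(countByZip(zips)); // prints {94110=1, 99999=2}
    }
}
```

With the zip code stored on its own, the count is a single grouping step; without it, every query would first have to unpack the full address record.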

AddressFieldExtractor.java

The run function begins by creating an HBase configuration and configuring the MapReduce job. Note that we need to ship the jars that the map task depends on. Here's how we do this:

GenericTableMapReduceUtil.addAllDependencyJars(job);
DistributedCacheJars.addJarsToDistributedCache(job,
    new File(System.getenv("KIJI_HOME"), "lib"));
job.setUserClassesTakesPrecedence(true);

The AddressMapper extends Hadoop's Mapper class. The map function runs once per row of the Kiji table. It extracts the address field from each row as follows:

final Address address = row.getMostRecentValue(Fields.INFO_FAMILY, Fields.ADDRESS);

Address is the same Avro type you read about on the Phonebook Importer page. The JSON description for it can be found at $KIJI_HOME/examples/phonebook/src/main/avro/Address.avsc. More information about Avro types can be found in the Apache Avro documentation.

We decompose the address and write its individual fields into columns of the derived family using mWriter.put(...). For example, the zip code can be extracted from the Address object and written as follows:

mWriter.put(entityId, Fields.DERIVED_FAMILY, Fields.ZIP, address.getZip());
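The decomposition step can be sketched without Kiji or Hadoop on the classpath. The SimpleAddress class below is a hypothetical stand-in for the Avro-generated Address record, and the qualifier names (addr1, city, state, zip) are taken from the scan output in the Verify section. The real extractor writes each value with mWriter.put(...) rather than collecting them in a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Avro-generated Address record.
// The real class is generated from Address.avsc and may have other fields.
final class SimpleAddress {
    final String addr1;
    final String city;
    final String state;
    final String zip;

    SimpleAddress(String addr1, String city, String state, String zip) {
        this.addr1 = addr1;
        this.city = city;
        this.state = state;
        this.zip = zip;
    }
}

// Hypothetical helper that mirrors what the mapper does for one row.
final class AddressDecomposer {
    // Returns one entry per derived column: qualifier -> value.
    // LinkedHashMap preserves the order in which the fields are added.
    static Map<String, String> decompose(SimpleAddress address) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("addr1", address.addr1);
        cells.put("city", address.city);
        cells.put("state", address.state);
        cells.put("zip", address.zip);
        return cells;
    }

    public static void main(String[] args) {
        SimpleAddress address =
                new SimpleAddress("1600 Pennsylvania Ave", "Washington", "DC", "99999");
        // Print each derived cell in family:qualifier form.
        decompose(address).forEach(
                (qualifier, value) -> System.out.println("derived:" + qualifier + " = " + value));
    }
}
```

Each map entry corresponds to one mWriter.put(...) call in the mapper, with the map key playing the role of the column qualifier.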

Running the Example

We assume that you have already imported the contacts from $KIJI_HOME/examples/phonebook/input-data.txt into the phonebook Kiji table by this point. You can execute this example using the kiji jar command with the class name:

$KIJI_HOME/bin/kiji jar \
    $KIJI_HOME/examples/phonebook/lib/kiji-phonebook-1.1.5.jar \
    org.kiji.examples.phonebook.AddressFieldExtractor

Verify

You can use the following command to see if your contacts' address data was successfully extracted:

$KIJI_HOME/bin/kiji scan kiji://.env/default/phonebook/derived
Scanning kiji table: kiji://localhost:2181/default/phonebook/derived/
entity-id=['John,Doe'] [1384236064962] derived:addr1
                                 1600 Pennsylvania Ave
entity-id=['John,Doe'] [1384236064964] derived:city
                                 Washington
entity-id=['John,Doe'] [1384236064965] derived:state
                                 DC
entity-id=['John,Doe'] [1384236064967] derived:zip
                                 99999

...