This page is deprecated.
For a more up-to-date look at how to derive data using Producers, see the Music recommendation tutorial. This section is preserved for your reference, but the APIs referenced herein are deprecated and may be removed in a future release of KijiSchema.
Later sections of this tutorial rely on your having run the commands in this section, but you should not model future MapReduce analyses on the code in AddressFieldExtractor.
Your friends have been terribly disorganized about giving you their contact details. Being the perfectionist you are, you would like to know, at any given point, how many friends you have in a certain zip code… because obviously, such questions need answering.
We’ll show you a way to decompose your contacts’ addresses into their street address, city, state, and zip code (the derived columns) to make it easier for you to get this information quickly.
The run function begins by creating an HBase configuration, and configuring the MapReduce task. Note that we need to ship certain jars that we depend on during the map task. Here's how we do this:
GenericTableMapReduceUtil.addAllDependencyJars(job);
DistributedCacheJars.addJarsToDistributedCache(job, new File(System.getenv("KIJI_HOME"), "lib"));
job.setUserClassesTakesPrecedence(true);
The AddressMapper extends Hadoop’s Mapper class. The map function is run per row of the Kiji table. It extracts the address field from each row as follows:
Address address = row.getMostRecentValue(Fields.INFO_FAMILY, Fields.ADDRESS);
Address is the same Avro type you read about on the Phonebook Importer page. The JSON description for it can be found at $KIJI_HOME/examples/phonebook/src/main/avro/Address.avsc. More information about Avro types can be found here.
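The "most recent value" idea behind getMostRecentValue can be illustrated without any Kiji classes. The sketch below is plain Java, not the Kiji API: it assumes a cell is a set of versions keyed by timestamp and treats the entry with the largest timestamp as the most recent. The class name MostRecentValueSketch, the method mostRecent, and the sample values are all hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class MostRecentValueSketch {

    /**
     * Returns the value stored under the largest timestamp,
     * or null if the cell has no versions at all.
     * (Stand-in for a versioned cell; not the Kiji API.)
     */
    public static String mostRecent(NavigableMap<Long, String> versions) {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        // Two made-up versions of the same cell, keyed by timestamp.
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(1000L, "old address");
        versions.put(2000L, "new address");

        System.out.println(mostRecent(versions));
    }
}
```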
We decompose and write the individual fields into a derived column using writer.put(...). For example, the zip code can be extracted from the Address object and written as follows:
writer.put(entityId, Fields.DERIVED_FAMILY, Fields.ZIP, address.getZip());
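To make the decomposition step concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Kiji or Avro APIs: SimpleAddress is a hypothetical stand-in for the generated Address class, a map stands in for the table writer, and all field types are assumed to be strings. The column names (derived:addr1, derived:city, derived:state, derived:zip) are the ones that appear in the sample output in this section.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AddressDecomposeSketch {

    /** Hypothetical stand-in for the Avro-generated Address record. */
    public static final class SimpleAddress {
        final String addr1;
        final String city;
        final String state;
        final String zip;

        public SimpleAddress(String addr1, String city, String state, String zip) {
            this.addr1 = addr1;
            this.city = city;
            this.state = state;
            this.zip = zip;
        }
    }

    /**
     * Splits one address into one cell per field, keyed by "family:qualifier".
     * In the real mapper each of these would be a separate writer.put(...) call.
     */
    public static Map<String, String> decompose(SimpleAddress address) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("derived:addr1", address.addr1);
        cells.put("derived:city", address.city);
        cells.put("derived:state", address.state);
        cells.put("derived:zip", address.zip);
        return cells;
    }

    public static void main(String[] args) {
        SimpleAddress address =
            new SimpleAddress("1600 Pennsylvania Ave", "Washington", "DC", "99999");
        // Print one line per derived cell.
        for (Map.Entry<String, String> cell : decompose(address).entrySet()) {
            System.out.println(cell.getKey() + " " + cell.getValue());
        }
    }
}
```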
Running the Example
We assume that you have already imported the contacts from $KIJI_HOME/examples/phonebook/input-data.txt into the phonebook Kiji table by this point. You can execute this example using the kiji jar command with the class name:
$KIJI_HOME/bin/kiji jar \
    $KIJI_HOME/examples/phonebook/lib/kiji-phonebook-1.0.0-rc4.jar \
    org.kiji.examples.phonebook.AddressFieldExtractor
You can use the following command to see if your contacts’ address data was successfully extracted:
$KIJI_HOME/bin/kiji ls --kiji=kiji://.env/default/phonebook --columns=derived
Scanning kiji table: kiji://localhost:2181/default/phonebook/
U\x1EP\xC1\xF2c$7\xCC\xBA\xCB\x16\x10\x0F\x11\xDB  derived:addr1 1600 Pennsylvania Ave
U\x1EP\xC1\xF2c$7\xCC\xBA\xCB\x16\x10\x0F\x11\xDB  derived:city Washington
U\x1EP\xC1\xF2c$7\xCC\xBA\xCB\x16\x10\x0F\x11\xDB  derived:state DC
U\x1EP\xC1\xF2c$7\xCC\xBA\xCB\x16\x10\x0F\x11\xDB  derived:zip 99999
...
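With the zip code in its own derived column, the question from the start of this section (how many friends live in a given zip code) reduces to a group-and-count over that column. The sketch below shows the counting step only, in plain Java: the zip values are passed in as a list rather than read from a Kiji table, the class and method names are hypothetical, and the sample zip codes are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ZipCountSketch {

    /** Counts how many times each zip code appears in the input. */
    public static Map<String, Integer> countByZip(List<String> zips) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String zip : zips) {
            // Start at 1 for a new zip, otherwise add 1 to the running count.
            counts.merge(zip, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up values standing in for the derived:zip column of three rows.
        List<String> zips = Arrays.asList("99999", "94110", "99999");
        System.out.println(countByZip(zips));
    }
}
```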