Derive Data
This page is deprecated.
For a more up-to-date look at how to derive data using Producers, see the Music recommendation tutorial. This section is preserved for your reference, but the APIs referenced herein are deprecated and may be removed in a future release of KijiSchema.
Note that later sections of this tutorial rely on you having run the commands in this section. However, you should not model your future MapReduce analyses on the code in AddressFieldExtractor.
Your friends have been terribly disorganized about giving you their contact details. Being the perfectionist you are, you would like to be able to, at any given point, know how many friends you have in a certain zip code... because obviously, such questions need answering.
We'll show you a way to decompose your contacts’ addresses into their street address, city, and zip code (the derived columns) to make it easier for you to get this information quickly.
AddressFieldExtractor.java
The run method begins by creating an HBase configuration and configuring the MapReduce job. Note that we need to ship certain jars that the map task depends on. Here's how we do this:
GenericTableMapReduceUtil.addAllDependencyJars(job);
DistributedCacheJars.addJarsToDistributedCache(job,
    new File(System.getenv("KIJI_HOME"), "lib"));
job.setUserClassesTakesPrecedence(true);
The AddressMapper extends Hadoop's Mapper class. The map function runs once per row of the Kiji table. It extracts the address field from each row as follows:
final Address address = row.getMostRecentValue(Fields.INFO_FAMILY, Fields.ADDRESS);
Address is the same Avro type you read about on the Phonebook Importer page. The JSON description for it can be found at $KIJI_HOME/examples/phonebook/src/main/avro/Address.avsc. More information about Avro types can be found in the Apache Avro documentation.
We decompose and write the individual fields into a derived column using mWriter.put(...). For example, the zip code can be extracted from the Address object and written as follows:
mWriter.put(entityId, Fields.DERIVED_FAMILY, Fields.ZIP, address.getZip());
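The decomposition step can be illustrated without a cluster. The following is a minimal, standalone sketch, not the code in AddressFieldExtractor: the Address class here is a hypothetical stand-in for the Avro-generated record, its fields are simplified to strings (the real schema in Address.avsc may use other types), and deriveColumns is an illustrative helper name. The column qualifiers are the ones that appear in the scan output in the Verify section.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AddressDecomposition {

  /** Hypothetical stand-in for the Avro-generated Address record (assumption). */
  static final class Address {
    private final String addr1;
    private final String city;
    private final String state;
    private final String zip;

    Address(String addr1, String city, String state, String zip) {
      this.addr1 = addr1;
      this.city = city;
      this.state = state;
      this.zip = zip;
    }

    String getAddr1() { return addr1; }
    String getCity() { return city; }
    String getState() { return state; }
    String getZip() { return zip; }
  }

  /**
   * Splits one address into (qualifier, value) pairs for the derived family.
   * In the real mapper, each pair would be one mWriter.put(...) call.
   */
  static Map<String, String> deriveColumns(Address address) {
    // LinkedHashMap keeps the qualifiers in insertion order.
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("addr1", address.getAddr1());
    columns.put("city", address.getCity());
    columns.put("state", address.getState());
    columns.put("zip", address.getZip());
    return columns;
  }

  public static void main(String[] args) {
    Address address =
        new Address("1600 Pennsylvania Ave", "Washington", "DC", "99999");
    // Print each derived cell in family:qualifier form.
    for (Map.Entry<String, String> cell : deriveColumns(address).entrySet()) {
      System.out.println("derived:" + cell.getKey() + " = " + cell.getValue());
    }
  }
}
```

Each map entry corresponds to one cell written to the derived family for a given entity ID.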
Running the Example
We assume that you have already imported the contacts from $KIJI_HOME/examples/phonebook/input-data.txt into the phonebook Kiji table by this point.
You can execute this example using the kiji jar command with the class name:
$KIJI_HOME/bin/kiji jar \
$KIJI_HOME/examples/phonebook/lib/kiji-phonebook-1.1.3.jar \
org.kiji.examples.phonebook.AddressFieldExtractor
Verify
You can use the following command to see if your contacts' address data was successfully extracted:
$KIJI_HOME/bin/kiji scan kiji://.env/default/phonebook/derived
Scanning kiji table: kiji://localhost:2181/default/phonebook/derived/
entity-id=['John,Doe'] [1384236064962] derived:addr1
1600 Pennsylvania Ave
entity-id=['John,Doe'] [1384236064964] derived:city
Washington
entity-id=['John,Doe'] [1384236064965] derived:state
DC
entity-id=['John,Doe'] [1384236064967] derived:zip
99999
...
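Once the zip code lives in its own column, the question from the start of this page (how many friends live in a given zip code) reduces to a simple tally. The sketch below is a standalone illustration that assumes the derived:zip values have already been read into a list; the class name, the method name, and the sample values are made up for the example, and it does not use any Kiji API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ZipCodeTally {

  /** Counts how many times each zip code occurs in the given list. */
  static Map<String, Integer> countByZip(List<String> zips) {
    // TreeMap keeps the zip codes sorted for stable, readable output.
    Map<String, Integer> counts = new TreeMap<>();
    for (String zip : zips) {
      // Start at 1 for a new zip code, otherwise add 1 to the existing count.
      counts.merge(zip, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Made-up sample values standing in for derived:zip cells.
    List<String> zips = Arrays.asList("99999", "94110", "99999");
    for (Map.Entry<String, Integer> entry : countByZip(zips).entrySet()) {
      System.out.println(entry.getKey() + ": " + entry.getValue());
    }
  }
}
```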