All data stored in Kiji cells is serialized and deserialized using Apache Avro. Each logical unit of data in Avro has a type. The type of a datum is called a schema. A schema may be a simple primitive such as an integer or string, or it may be a composition of other schemas such as an array or record.

Avro data can be serialized and deserialized by several programming languages into types appropriate for that language. In Java, for example, data with an Avro INT schema is manifested as a java.lang.Integer object. A MAP schema is manifested as a java.util.Map. The full mapping from Avro schemas to Java types can be found in the Avro documentation.
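
For example, here is a minimal sketch that uses the core Avro Java API to construct two such schemas programmatically; the printed JSON is Avro's standard schema notation:

    // A minimal sketch using the core Avro Java API. Data with the INT schema
    // is manifested in Java as java.lang.Integer; data with the MAP schema
    // as a java.util.Map.
    import org.apache.avro.Schema;

    public class SchemaExample {
      public static void main(String[] args) {
        final Schema intSchema = Schema.create(Schema.Type.INT);
        final Schema mapSchema = Schema.createMap(Schema.create(Schema.Type.LONG));

        System.out.println(intSchema); // prints: "int"
        System.out.println(mapSchema); // prints: {"type":"map","values":"long"}
      }
    }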

Using Avro with KijiRowData

When implementing a gatherer's gather() method, or a producer or bulk importer's produce() method, use the KijiRowData object to read data from the current row of the Kiji table. Avro deserialization is handled for you; a call to getValue() or getMostRecentValue() automatically returns the type specified in the table layout. For example, to read an Avro string from the most recent value of the info:name column, call KijiRowData.getMostRecentValue("info", "name"). The value is returned as a java.lang.CharSequence. If you read a cell with a complex compound schema, KijiSchema returns the corresponding Avro-generated Java object type.
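
Putting this together, a minimal sketch of a producer reading both kinds of values might look like the following; the info:address column, the generated Address record class, and the elided KijiProducer boilerplate are illustrative assumptions:

    // A minimal sketch, not a definitive implementation: reading typed values
    // from KijiRowData inside a producer's produce() method.
    public class MyProducer extends KijiProducer {
      // ... getDataRequest() and getOutputColumn() omitted ...

      @Override
      public void produce(KijiRowData input, ProducerContext context) throws IOException {
        // Avro string cells are returned as java.lang.CharSequence.
        final CharSequence name = input.getMostRecentValue("info", "name");

        // Cells with a record schema come back as the generated Avro type.
        // (Assumes a column info:address with the Address record schema.)
        final Address address = input.getMostRecentValue("info", "address");
        // ...
      }
    }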

To write typed data into a Kiji cell from your producer or bulk importer's produce() method, use the context passed into produce(). The put() method is overloaded to accept a variety of Java types, including primitives and Avro types. Serialization is handled for you, so you can pass a complex Avro object directly to put(). For example, to write an instance of a custom Address record (a complex Avro type):

    // Identify the row to write to by its entity ID.
    final EntityId user = table.getEntityId("Abraham Lincoln");

    // Build the Avro-generated Address record.
    final Address addr = new Address();
    addr.setAddr1("1600 Pennsylvania Avenue");
    addr.setCity("Washington");
    addr.setState("DC");
    addr.setZip("20500");

    // put() serializes the record using the column's registered schema.
    context.put(user, "info", "address", addr);

Note that the type of the value passed to put() must be compatible with the schema registered for the column in the Kiji table layout.

Using Avro in MapReduce

You may find it useful to pass Avro data between your mappers and reducers. Jobs run by Kiji can use Avro data for MapReduce keys and values. To use Avro data as your gatherer, mapper, or reducer's output key, use the org.apache.avro.mapred.AvroKey class. You must also specify the writer schema for your key by implementing the org.kiji.mapreduce.AvroKeyWriter interface. For example, to output an Integer key from a gatherer:

    public class MyAvroGatherer
        extends KijiGatherer<AvroKey<Integer>, Text>
        implements AvroKeyWriter {
      // ...

      @Override
      protected void gather(KijiRowData input, GathererContext<AvroKey<Integer>, Text> context)
          throws IOException, InterruptedException {
        // ...
        context.write(new AvroKey<Integer>(5), new Text("myvalue"));
      }

      @Override
      public Schema getAvroKeyWriterSchema(Configuration conf) throws IOException {
        // Avro's schema type for a 32-bit integer is Schema.Type.INT.
        return Schema.create(Schema.Type.INT);
      }
    }

Likewise, an org.apache.avro.mapred.AvroValue may be used for Avro data as the output value; implement the AvroValueWriter interface to specify the writer schema. To use Avro data as your bulk importer, mapper, or reducer's input key or value, wrap it in an AvroKey (or AvroValue for values) and implement AvroKeyReader (or AvroValueReader) to specify the reader schema.
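
For the reading side, a reducer consuming the AvroKey<Integer> output of the gatherer above might look like the following minimal sketch; the KijiReducer type parameters are illustrative, and the getAvroKeyReaderSchema() signature is assumed by analogy with AvroKeyWriter:

    // A minimal sketch, not a definitive implementation: a reducer consuming
    // the AvroKey<Integer> keys emitted by the gatherer above. The reader
    // schema must match the writer schema the gatherer registered.
    public class MyAvroReducer
        extends KijiReducer<AvroKey<Integer>, Text, Text, Text>
        implements AvroKeyReader {
      // ...

      @Override
      public Schema getAvroKeyReaderSchema(Configuration conf) throws IOException {
        // Signature assumed by analogy with AvroKeyWriter.getAvroKeyWriterSchema().
        return Schema.create(Schema.Type.INT);
      }
    }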