Accessing Data
The KijiTableReader class provides a get(...) method to read typed data from a Kiji table row. The row is addressed by its EntityId (which can be retrieved from the KijiTable instance using the getEntityId() method). Specify the desired cells from the row with a KijiDataRequest. See the KijiDataRequest documentation for details.
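For instance, a data request can limit how many versions of each cell are returned. A minimal sketch, assuming the withMaxVersions(...) option on ColumnsDef:
final KijiDataRequest dataRequest = KijiDataRequest.builder()
    .addColumns(ColumnsDef.create()
        .withMaxVersions(3)  // return up to three versions per cell (assumed option)
        .add("some_family", "some_qualifier"))
    .build();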
In general, Kiji and KijiTable instances should only be opened once over the life of an application (EntityIdFactory instances should also be reused). KijiTablePool can be used to maintain a pool of opened KijiTable objects for reuse.
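A sketch of pooled access, assuming KijiTablePool exposes a newBuilder(kiji) factory and that release() on a pooled table returns it to the pool:
// Build a pool of table connections backed by an already-opened Kiji instance (assumed API):
final KijiTablePool pool = KijiTablePool.newBuilder(kiji).build();
try {
  final KijiTable table = pool.get("table_name");
  try {
    // Use the table...
  } finally {
    // Return the table to the pool:
    table.release();
  }
} finally {
  pool.close();
}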
To initially open a KijiTable:
// URI for Kiji instance « kiji_instance_name » in your default HBase instance:
final KijiURI kijiURI = KijiURI.newBuilder().withInstanceName("kiji_instance_name").build();
final Kiji kiji = Kiji.Factory.open(kijiURI);
try {
  final KijiTable table = kiji.openTable("table_name");
  try {
    // Use the opened table:
    // …
  } finally {
    // Always release the table you open:
    table.release();
  }
} finally {
  // Always release the Kiji instances you open:
  kiji.release();
}
To read from an existing KijiTable, create a KijiDataRequest specifying the columns of data to return. Then, query for the desired EntityId using a KijiTableReader. You can get a KijiTableReader for a KijiTable using the openTableReader() method. For example:
final KijiTableReader reader = table.openTableReader();
try {
  // Select which columns you want to read:
  final KijiDataRequest dataRequest = KijiDataRequest.builder()
      .addColumns(ColumnsDef.create().add("some_family", "some_qualifier"))
      .build();
  final EntityId entityId = table.getEntityId("your-row");
  final KijiRowData rowData = reader.get(entityId, dataRequest);
  // Use the row:
  // …
} finally {
  // Always close the reader you open:
  reader.close();
}
The KijiTableReader also implements a bulkGet(...) method for retrieving data for a list of EntityIds. This is more efficient than a series of calls to get(...) because it uses a single RPC instead of one for each get.
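A minimal sketch, reusing the reader and dataRequest from the example above (the row keys below are placeholders):
final List<EntityId> entityIds = Arrays.asList(
    table.getEntityId("row-1"),
    table.getEntityId("row-2"));
// One RPC returns a KijiRowData per requested entity ID, in the same order:
final List<KijiRowData> rows = reader.bulkGet(entityIds, dataRequest);
for (KijiRowData row : rows) {
  // Process each row:
  // …
}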
Row scanners
If you need to process a range of rows, you may use a KijiRowScanner:
final KijiTableReader reader = table.openTableReader();
try {
  final KijiDataRequest dataRequest = KijiDataRequest.builder()
      .addColumns(ColumnsDef.create().add("family", "qualifier"))
      .build();
  final KijiScannerOptions scanOptions = new KijiScannerOptions()
      .setStartRow(table.getEntityId("the-start-row"))
      .setStopRow(table.getEntityId("the-stop-row"));
  final KijiRowScanner scanner = reader.getScanner(dataRequest, scanOptions);
  try {
    // Scan over the requested row range, in order:
    for (KijiRowData row : scanner) {
      // Process the row:
      // …
    }
  } finally {
    // Always close scanners:
    scanner.close();
  }
} finally {
  // Always close table readers:
  reader.close();
}
Modifying Data
The KijiTableWriter class provides a put(...) method to write or update cells in a Kiji table. The cell is addressed by its entity ID, column family, column qualifier, and timestamp. You can get a KijiTableWriter for a KijiTable using the openTableWriter() method.
final KijiTableWriter writer = table.openTableWriter();
try {
  // Write a string cell named "a_family:some_qualifier" to the row "the-row":
  final long timestamp = System.currentTimeMillis();
  final EntityId eid = table.getEntityId("the-row");
  writer.put(eid, "a_family", "some_qualifier", timestamp, "Some value!");
  writer.flush();
} finally {
  // Always close the writers you open:
  writer.close();
}
Note: the type of the value being written to the cell must match the type of the column declared in the table layout.
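For example, given a hypothetical column "a_family:a_long_qualifier" declared with a schema of type "long", a put must supply a long value:
// "a_family:a_long_qualifier" is a hypothetical column declared with schema "long":
writer.put(eid, "a_family", "a_long_qualifier", timestamp, 42L);      // OK: long matches the layout
// writer.put(eid, "a_family", "a_long_qualifier", timestamp, "42"); // fails: string does not match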
Atomic Modifications
The AtomicKijiPutter class provides the ability to perform atomic operations on Kiji tables. The atomic putter uses a begin, put, commit workflow to construct transactions that are executed atomically by the underlying HBase table. Addressing puts with the atomic putter is very similar to KijiTableWriter, except that you do not need to specify an entity ID for every put, because all puts in a transaction must target the same row. When your transaction is ready, write it using commit().
KijiTable table = ...
AtomicKijiPutter putter = table.getWriterFactory().openAtomicWriter();
try {
  // Begin a transaction on the specified row:
  putter.begin(entityId);
  // Accumulate a set of puts to write atomically:
  putter.put(family, qualifier, "value");
  putter.put(family, qualifier2, "value2");
  putter.put(family2, qualifier3, "value3");
  // More puts...
  // Write all puts atomically:
  putter.commit();
} finally {
  putter.close();
}
If you want to ensure that the table has not been modified while you accumulated puts, you can use checkAndCommit(family, qualifier, value) instead of commit(). This writes your puts and returns true if the value in the specified cell matches the value to check; otherwise it returns false and your puts remain buffered so that you may attempt a new check.
KijiTable table = ...
final AtomicKijiPutter putter = table.getWriterFactory().openAtomicWriter();
final KijiTableReader reader = table.openTableReader();
final KijiDataRequest request = KijiDataRequest.create("meta", "modifications");
final long currentModificationCount =
    reader.get(entityId, request).getMostRecentValue("meta", "modifications");
try {
  // Begin a transaction on the specified row:
  putter.begin(entityId);
  // Accumulate a set of puts to write atomically:
  putter.put(family, qualifier, "value");
  // Increment a cell indicating the number of times the row has been modified.
  putter.put("meta", "modifications", currentModificationCount + 1L);
  // More puts...
  // Ensure the row has not been modified while preparing the transaction.
  if (putter.checkAndCommit("meta", "modifications", currentModificationCount)) {
    LOG.info("Write successful.");
  } else {
    LOG.info("Write failed.");
  }
} finally {
  putter.close();
}
Batched Modifications
When working with high-traffic tables, minimizing remote procedure calls is critical. The KijiBufferedWriter class provides local buffering to reduce RPCs and improve table performance. Using the buffered writer is very similar to using the standard KijiTableWriter, with a few small differences. The buffered writer does not write put or delete operations until the buffer becomes full or the user calls flush or close. The size of the local buffer can be managed using setBufferSize. The buffered writer cannot buffer increment operations, because an increment requires an immediate connection to a live table.
KijiTable table = ...
KijiBufferedWriter bufferedWriter = table.getWriterFactory().openBufferedWriter();
// Set the buffer size to one megabyte:
bufferedWriter.setBufferSize(1024L * 1024L);
try {
  // Accumulate puts and deletes in the local buffer.
  bufferedWriter.put(entityId, family, qualifier, timestamp, "value");
  bufferedWriter.deleteCell(entityId, family, qualifier);
  // Buffered operations may be written at any time: when the buffer becomes full
  // or on a call to flush() on this writer.
  // Manually flush to ensure writes are committed:
  bufferedWriter.flush();
} finally {
  // Always close writers.
  bufferedWriter.close();
}
Counters
Incrementing a counter value stored in a Kiji cell would normally require a "read-modify-write" transaction using a client-side row lock. Since row locks can cause contention, Kiji exposes a feature of HBase to do this more efficiently by pushing the work to the server side. To increment a counter value in a Kiji cell, the column must be declared with a schema of type "counter". See Managing Data for details on how to declare a counter in your table layout.
Columns containing counters may be accessed like other columns; counters are exposed as long integers. In particular, the counter value may be retrieved using KijiTableReader.get(...) and written using KijiTableWriter.put(...).
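For example, reading a counter's current value as a long (a sketch reusing a reader opened as shown earlier; the column name matches the increment example below):
final KijiDataRequest counterRequest =
    KijiDataRequest.create("a_family", "some_counter_qualifier");
// Counter cells deserialize to long values:
final long count = reader.get(table.getEntityId("the-row"), counterRequest)
    .getMostRecentValue("a_family", "some_counter_qualifier");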
In addition, the KijiTableWriter class provides a method to atomically increment counter values.
final KijiTableWriter writer = table.openTableWriter();
try {
  // Incrementing the counter type column "a_family:some_counter_qualifier" by 2:
  final EntityId eid = table.getEntityId("the-row");
  writer.increment(eid, "a_family", "some_counter_qualifier", 2);
  writer.flush();
} finally {
  // Always close the writer you open:
  writer.close();
}
MapReduce
Deprecation Warning
This section refers to classes in the org.kiji.schema.mapreduce package that may be removed in the future. Please see the KijiMR Userguide for information on using MapReduce with Kiji.
The KijiTableInputFormat provides the necessary functionality to read from a Kiji table in a MapReduce job. To configure a job to read from a Kiji table, use KijiTableInputFormat's static setOptions method. For example:
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf);

// * Setup jars to ship to the Hadoop cluster.
job.setJarByClass(YourClassHere.class);
GenericTableMapReduceUtil.addAllDependencyJars(job);
DistributedCacheJars.addJarsToDistributedCache(job,
    new File(System.getenv("KIJI_HOME"), "lib"));
job.setUserClassesTakesPrecedence(true);
// *

KijiDataRequest request = new KijiDataRequest()
    .addColumn(new KijiDataRequest.Column("your-family", "your-qualifier"));

// Setup the InputFormat.
KijiTableInputFormat.setOptions(job, "your-kiji-instance-name", "the-table-name", request);
job.setInputFormatClass(KijiTableInputFormat.class);
The code contained within the "// *" markers is responsible for shipping Kiji resources to the DistributedCache, so that all nodes within your Hadoop cluster have access to Kiji dependencies.
KijiTableInputFormat outputs keys of type EntityId and values of type KijiRowData. This data can be accessed from within a mapper:
@Override
public void map(EntityId entityId, KijiRowData row, Context context) {
  // ...
}
To write to a Kiji table from a MapReduce job, you should use KijiTableWriter as before. You should also set your OutputFormat class to NullOutputFormat, so MapReduce doesn't expect to create a directory full of text files on your behalf. To configure a job to write to a Kiji table, refer to the following example:
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf);

// Setup jars to ship to the Hadoop cluster.
job.setJarByClass(YourClassHere.class);
GenericTableMapReduceUtil.addAllDependencyJars(job);
DistributedCacheJars.addJarsToDistributedCache(job,
    new File(System.getenv("KIJI_HOME"), "lib"));
job.setUserClassesTakesPrecedence(true);

// Setup the OutputFormat.
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(NullWritable.class);
job.setOutputFormatClass(NullOutputFormat.class);
And then, from within a Mapper:
public class MyMapper extends Mapper<LongWritable, Text, NullWritable, KijiOutput> {
  private KijiTableWriter writer;
  private Kiji kiji;
  private KijiTable table;

  @Override
  public void setup(Context context) throws IOException {
    // Open a KijiTable for generating EntityIds.
    kiji = Kiji.open("your-kiji-instance-name");
    table = kiji.openTable("the-table-name");
    // Create a KijiTableWriter that writes to a MapReduce context.
    writer = table.openTableWriter();
  }

  @Override
  public void map(LongWritable key, Text value, Context context) throws IOException {
    // ...
    writer.put(table.getEntityId("your-row"), "your-family", "your-qualifier", value.toString());
  }

  @Override
  public void cleanup(Context context) throws IOException {
    writer.close();
    table.release();
    kiji.release();
  }
}