JpmmlScoreFunction.java

JpmmlScoreFunction

The JpmmlScoreFunction provides a way to deploy a PMML-compliant model that was trained outside the Kiji ecosystem to the Kiji model repository. Once the model is deployed, the Kiji scoring server calculates scores with it. Internally, this score function uses the JPMML library to parse and evaluate PMML models.

A deployed JpmmlScoreFunction expects the following of the table it is attached to:

  • One column contains a record with one field per predictor MiningField.
  • One column contains a record that will hold the output from the model (one record field per predicted/output field).
  • The types of the fields in both the predictor record and the result record match the types declared in the PMML DataDictionary.
  • If the output column is set up with STRICT schema validation, the schema that the PMML model will produce is registered as a writer schema (you may also want to register the predictor column's expected reader schema).
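
To make the record expectations above concrete, the following minimal sketch builds an Avro record schema (as JSON) with one field per model field. The class name RecordSchemaSketch, the record name, and the field names are hypothetical and chosen only for illustration; this is not code from Kiji or JPMML, and a real deployment would derive the fields from the model's MiningFields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Kiji or JPMML. */
final class RecordSchemaSketch {
  private RecordSchemaSketch() { }

  /**
   * Builds an Avro record schema as a JSON string, with one field per entry
   * of {@code fields} (field name mapped to a primitive Avro type name).
   */
  static String recordSchema(String recordName, Map<String, String> fields) {
    StringBuilder json = new StringBuilder();
    json.append("{\"type\":\"record\",\"name\":\"").append(recordName)
        .append("\",\"fields\":[");
    String separator = "";
    for (Map.Entry<String, String> field : fields.entrySet()) {
      json.append(separator)
          .append("{\"name\":\"").append(field.getKey())
          .append("\",\"type\":\"").append(field.getValue()).append("\"}");
      separator = ",";
    }
    return json.append("]}").toString();
  }

  public static void main(String[] args) {
    // Assumed predictor fields; a real model defines its own MiningFields.
    Map<String, String> predictors = new LinkedHashMap<>();
    predictors.put("sepal_length", "double");
    predictors.put("sepal_width", "double");
    System.out.println(recordSchema("MyModelPredictor", predictors));
  }
}
```

A schema produced this way is the kind of schema that would need to be registered on a column that uses STRICT validation.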

The following additional constraints must be met to use the JpmmlScoreFunction:

PMML-Avro Data Type Mapping

The JpmmlScoreFunction attempts to convert PMML data types into the corresponding Avro data types (if specified). Collections are currently not supported. If the type of a field is not specified, a best-effort conversion takes place.

PMML Data Type                  Avro Data Type   Notes
STRING                          string
INTEGER                         long             PMML has no "long" type.
FLOAT                           float
DOUBLE                          double
BOOLEAN                         boolean
DATE/TIME/DATE_TIME             string           Formatted using ISO8601.
DATE_DAYS_SINCE_0               int              JPMML only provides these as integers (applies to all int rows).
DATE_DAYS_SINCE_1960            int
DATE_DAYS_SINCE_1970            int
DATE_DAYS_SINCE_1980            int
TIME_SECONDS                    int
DATE_TIME_SECONDS_SINCE_0       int
DATE_TIME_SECONDS_SINCE_1960    int
DATE_TIME_SECONDS_SINCE_1970    int
DATE_TIME_SECONDS_SINCE_1980    int
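
The mapping in the table above can be expressed as a simple lookup. The sketch below is an illustration only: the class PmmlAvroTypeSketch and the method avroTypeFor are hypothetical names, the PMML types are keyed by the names used in the table, and this is not the conversion code that JpmmlScoreFunction itself uses.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup mirroring the mapping table; not Kiji's implementation. */
final class PmmlAvroTypeSketch {
  private static final Map<String, String> PMML_TO_AVRO = buildMapping();

  private PmmlAvroTypeSketch() { }

  private static Map<String, String> buildMapping() {
    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("STRING", "string");
    mapping.put("INTEGER", "long"); // PMML has no "long" type.
    mapping.put("FLOAT", "float");
    mapping.put("DOUBLE", "double");
    mapping.put("BOOLEAN", "boolean");
    // Calendar types are written as ISO8601-formatted strings.
    for (String type : new String[] {"DATE", "TIME", "DATE_TIME"}) {
      mapping.put(type, "string");
    }
    // Offset-based date/time types are only available as integers.
    for (String type : new String[] {
        "DATE_DAYS_SINCE_0", "DATE_DAYS_SINCE_1960",
        "DATE_DAYS_SINCE_1970", "DATE_DAYS_SINCE_1980",
        "TIME_SECONDS",
        "DATE_TIME_SECONDS_SINCE_0", "DATE_TIME_SECONDS_SINCE_1960",
        "DATE_TIME_SECONDS_SINCE_1970", "DATE_TIME_SECONDS_SINCE_1980"}) {
      mapping.put(type, "int");
    }
    return Collections.unmodifiableMap(mapping);
  }

  /** Returns the Avro type name for a PMML type name, or throws if unmapped. */
  static String avroTypeFor(String pmmlType) {
    String avroType = PMML_TO_AVRO.get(pmmlType);
    if (avroType == null) {
      throw new IllegalArgumentException("No Avro mapping for PMML type: " + pmmlType);
    }
    return avroType;
  }

  public static void main(String[] args) {
    System.out.println("INTEGER -> " + avroTypeFor("INTEGER"));
    System.out.println("TIME_SECONDS -> " + avroTypeFor("TIME_SECONDS"));
  }
}
```

The sketch rejects unknown types, whereas the text above describes a best-effort conversion when a field's type is not specified; that fallback is not modeled here.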

Deploying a JpmmlScoreFunction with the 'model-repo pmml' tool

  1. Place the PMML XML file in a location accessible to your account on HDFS or your local filesystem.

  2. Ensure that the field names in the PMML XML file are valid Avro field names and that your model's name is of the form: artifact.model-version.

  3. Generate a model container descriptor by running the model-repo pmml tool against the PMML XML file with the following syntax:

     kiji model-repo pmml \
         --table=kiji://my/kiji/table \
         --model-file=file:///path/to/pmml/xml/file.xml \
         --model-name=nameOfModelInPmmlFile \
         --model-version=0.0.1 \
         --predictor-column=model:modelpredictor \
         --result-column=model:modelresult \
         --result-record-name=MyModelResult \
         --model-container=/path/to/write/model-container.json
    
  4. Create an empty jar. Deployment currently requires a jar file, but the jar needs no code of its own because JpmmlScoreFunction already lives in the kiji-scoring jar:

     touch empty-file
     jar cf /path/to/empty-jar.jar empty-file
     rm empty-file
    
  5. Deploy the generated model container. The --deps-resolver flag must be specified here even though it is not used (KIJIREPO-47):

     kiji model-repo deploy nameOfModelInPmmlFile /path/to/empty-jar.jar \
         --kiji=kiji://my-model-repo/instance \
         --deps-resolver=maven \
         --production-ready=true \
         --model-container=/path/to/written/model-container.json \
         --message="Initial deployment of JPMML based model."
    
  6. Attach the JpmmlScoreFunction to the result column for the model (requires an active scoring server):

     # Freshness policy may need to be different depending on model.
     # This command will fail if the model's name is not of the form: artifact.model-version.
     kiji model-repo fresh-model \
         kiji://my-model-repo/instance \
         nameOfModelInPmmlFile \
         org.kiji.scoring.lib.AlwaysFreshen
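
Step 2 above requires that PMML field names be valid Avro field names. A minimal sketch of such a check follows, assuming the Avro naming rule that a name starts with a letter or underscore and continues with letters, digits, or underscores. The class AvroNameCheckSketch and the sample field names are hypothetical; the sketch does not parse a PMML file and does not check the model-name form.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-deployment check; not part of Kiji or JPMML. */
final class AvroNameCheckSketch {
  // Avro names start with [A-Za-z_] and continue with [A-Za-z0-9_].
  private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  private AvroNameCheckSketch() { }

  /** Returns true if {@code name} is usable as an Avro field name. */
  static boolean isValidAvroName(String name) {
    return name != null && AVRO_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    // Assumed sample names; names containing '.' or starting with a digit fail.
    for (String name : new String[] {"sepal_length", "Sepal.Length", "2nd_field"}) {
      System.out.println(name + " -> " + isValidAvroName(name));
    }
  }
}
```

Field names that fail this check would need to be renamed in the PMML file before generating the model container.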
    

After running through these steps, your model should be active. To validate that your newly deployed model is functioning:

# List active ScoreFunctions in the scoring server by model repo name and version.
curl <scoring-server-hostname>:<scoring-server-port>/admin/list