PMML ScoreFunction
JpmmlScoreFunction.java
JpmmlScoreFunction
The JpmmlScoreFunction lets you deploy a PMML-compliant model that was trained outside the Kiji ecosystem to the Kiji model repository. The Kiji scoring server then calculates scores with it. Internally, this score function uses the JPMML library to parse and evaluate PMML models.
A deployed JpmmlScoreFunction expects that the table it is attached to has:
- One column containing a record with one field per predictor MiningField.
- One column containing a record that will be the output from the model (one record field per predicted/output field).
- The types of the fields in both the predictor record and the result record must match the types declared in the PMML DataDictionary.
- If the output column is set up with STRICT schema validation, the schema that the PMML model will produce must be registered as a writer schema (you may also want to register the predictor column's expected reader schema).
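Before deploying, it can help to check that a column's record schema lines up with the model. The Java sketch below compares the field names and Avro types you expect (derived by hand from the PMML DataDictionary) with those declared in a record schema. It is a hypothetical helper, not part of kiji-scoring. Types are passed as plain Avro type names so that no Avro or JPMML dependency is needed. Extra fields are reported as errors, which is a stricter reading of "one field per predictor MiningField" than the requirement above states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-deployment check (not a kiji-scoring API): does a record
 * declare exactly one field per expected PMML field, with the expected Avro type?
 */
final class RecordFieldCheck {
  private RecordFieldCheck() {}

  /**
   * @param expected field name -> Avro type derived from the PMML DataDictionary
   * @param actual   field name -> Avro type declared in the column's record schema
   * @return human-readable mismatches; empty if the record lines up with the model
   */
  static List<String> mismatches(Map<String, String> expected, Map<String, String> actual) {
    List<String> problems = new ArrayList<>();
    // Every expected field must be present with the expected type.
    for (Map.Entry<String, String> e : expected.entrySet()) {
      String actualType = actual.get(e.getKey());
      if (actualType == null) {
        problems.add("missing field: " + e.getKey());
      } else if (!actualType.equals(e.getValue())) {
        problems.add("type mismatch for " + e.getKey()
            + ": expected " + e.getValue() + ", found " + actualType);
      }
    }
    // Assumption: fields the model does not know about are treated as errors.
    for (String name : actual.keySet()) {
      if (!expected.containsKey(name)) {
        problems.add("unexpected field: " + name);
      }
    }
    return problems;
  }
}
```

For example, a predictor record declaring `age` as `int` would be flagged if the DataDictionary declares it as an INTEGER, because INTEGER maps to Avro `long`.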
The following additional constraints must be met to use the JpmmlScoreFunction:
- The model type must be supported by JPMML (see the README at https://github.com/jpmml/jpmml-evaluator).
- A trained model must already exist as a PMML XML file (example).
- Model must not use PMML extensions.
- Model must not operate on sets (association rules models).
PMML-Avro Data Type Mapping
When a field's type is specified, the JpmmlScoreFunction converts its PMML data type into the corresponding Avro data type listed below. Collections are currently not supported. If a field's type is not specified, a best-effort conversion is performed.
PMML Data Type | Avro Data Type | Notes |
STRING | string | |
INTEGER | long | PMML has no "long" type |
FLOAT | float | |
DOUBLE | double | |
BOOLEAN | boolean | |
DATE/TIME/DATE_TIME | string | Formatted using ISO8601 |
DATE_DAYS_SINCE_0, DATE_DAYS_SINCE_1960, DATE_DAYS_SINCE_1970, DATE_DAYS_SINCE_1980, TIME_SECONDS, DATE_TIME_SECONDS_SINCE_0, DATE_TIME_SECONDS_SINCE_1960, DATE_TIME_SECONDS_SINCE_1970, DATE_TIME_SECONDS_SINCE_1980 | int | JPMML only provides these as integers. |
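The table above can be read as a lookup from PMML type name to Avro primitive type. The following Java sketch encodes it as a plain map. It is not the JpmmlScoreFunction implementation. Keying on type-name strings (instead of JPMML's own data type enum) is an assumption made to keep the example dependency-free. Types missing from the table, such as collections, are rejected here, and the best-effort handling of unspecified types is not modeled.

```java
import java.util.Map;

/** Sketch of the PMML-to-Avro type table; PMML types are given by name. */
final class PmmlAvroTypeMapping {
  private static final Map<String, String> AVRO_TYPES = Map.ofEntries(
      Map.entry("STRING", "string"),
      Map.entry("INTEGER", "long"), // PMML has no separate "long" type
      Map.entry("FLOAT", "float"),
      Map.entry("DOUBLE", "double"),
      Map.entry("BOOLEAN", "boolean"),
      // Calendar values are carried as ISO 8601 strings.
      Map.entry("DATE", "string"),
      Map.entry("TIME", "string"),
      Map.entry("DATE_TIME", "string"),
      // Offset-based temporal types are provided by JPMML as integers.
      Map.entry("DATE_DAYS_SINCE_0", "int"),
      Map.entry("DATE_DAYS_SINCE_1960", "int"),
      Map.entry("DATE_DAYS_SINCE_1970", "int"),
      Map.entry("DATE_DAYS_SINCE_1980", "int"),
      Map.entry("TIME_SECONDS", "int"),
      Map.entry("DATE_TIME_SECONDS_SINCE_0", "int"),
      Map.entry("DATE_TIME_SECONDS_SINCE_1960", "int"),
      Map.entry("DATE_TIME_SECONDS_SINCE_1970", "int"),
      Map.entry("DATE_TIME_SECONDS_SINCE_1980", "int"));

  private PmmlAvroTypeMapping() {}

  /** Returns the Avro primitive type for a PMML data type, or throws if it is not in the table. */
  static String avroTypeFor(String pmmlType) {
    String avroType = AVRO_TYPES.get(pmmlType);
    if (avroType == null) {
      throw new IllegalArgumentException("Unsupported PMML data type: " + pmmlType);
    }
    return avroType;
  }
}
```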
Deploying a JpmmlScoreFunction with the 'model-repo pmml' tool
Place the PMML XML file in a location your account can access, either on HDFS or on your local filesystem.
Ensure that the field names in the PMML XML file are valid Avro field names and that your model's name has the form artifact.model-version.
Generate a model container descriptor by running the model-repo pmml tool against the PMML XML file:
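The Avro specification requires names to start with a letter or underscore and to contain only letters, digits, and underscores after that. PMML field names such as "sepal length" therefore need renaming. The Java check below assumes the field names have already been extracted from the PMML file (parsing is not shown). It covers only field names, not the model-name form.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Checks names against the Avro naming rule: [A-Za-z_][A-Za-z0-9_]* */
final class AvroNameCheck {
  private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  private AvroNameCheck() {}

  /** True if the whole name matches the Avro naming rule. */
  static boolean isValidAvroName(String name) {
    return AVRO_NAME.matcher(name).matches();
  }

  /** Returns the names that would have to be renamed before deployment, in input order. */
  static List<String> invalidNames(List<String> fieldNames) {
    return fieldNames.stream()
        .filter(n -> !isValidAvroName(n))
        .collect(Collectors.toList());
  }
}
```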
    kiji model-repo pmml \
        --table=kiji://my/kiji/table \
        --model-file=file:///path/to/pmml/xml/file.xml \
        --model-name=nameOfModelInPmmlFile \
        --model-version=0.0.1 \
        --predictor-column=model:modelpredictor \
        --result-column=model:modelresult \
        --result-record-name=MyModelResult \
        --model-container=/path/to/write/model-container.json
Create an empty jar. Deployment currently requires a jar file, but the JpmmlScoreFunction itself lives in the kiji-scoring jar, so an empty jar is sufficient:
    touch empty-file
    jar cf /path/to/empty-jar.jar empty-file
    rm empty-file
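If the deployment is scripted from Java instead of a shell, you can write a comparable empty jar with the standard java.util.jar API. This is a sketch only. Unlike the commands above, the jar it produces contains a manifest and no placeholder entry. It also assumes, without having verified it, that `kiji model-repo deploy` accepts such a jar.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Writes a jar whose only entry is META-INF/MANIFEST.MF. */
final class EmptyJar {
  private EmptyJar() {}

  static void write(Path jarPath) throws IOException {
    Manifest manifest = new Manifest();
    // Set Manifest-Version so the manifest has its standard header line.
    manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // The JarOutputStream constructor writes the manifest entry; no other entries are added.
    try (OutputStream out = Files.newOutputStream(jarPath);
         JarOutputStream jar = new JarOutputStream(out, manifest)) {
      jar.flush();
    }
  }
}
```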
Deploy the generated model container. The --deps-resolver flag must be specified here even though it is not used (KIJIREPO-47):
    kiji model-repo deploy nameOfModelInPmmlFile /path/to/empty-jar.jar \
        --kiji=kiji://my-model-repo/instance \
        --deps-resolver=maven \
        --production-ready=true \
        --model-container=/path/to/written/model-container.json \
        --message="Initial deployment of JPMML based model."
Attach the JpmmlScoreFunction to the result column for the model (requires an active scoring server):
    # The freshness policy may need to differ depending on the model.
    # This command will fail if the model's name is not of the form: artifact.model-version.
    kiji model-repo fresh-model \
        kiji://my-model-repo/instance \
        nameOfModelInPmmlFile \
        org.kiji.scoring.lib.AlwaysFreshen
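As its name suggests, AlwaysFreshen re-runs the score function on every read. That is the simplest choice, but it is not always the cheapest. To illustrate why a different policy may suit some models, the sketch below shows two policies behind a hypothetical FreshnessCheck interface. FreshnessCheck, AlwaysFreshenSketch and MaxAgeFreshness are illustrative names only. They are not the KijiScoring KijiFreshnessPolicy API.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical freshness check; NOT the KijiFreshnessPolicy API. */
interface FreshnessCheck {
  /** Returns true if the stored score can be served without re-running the model. */
  boolean isFresh(Instant lastScored, Instant now);
}

/** Mirrors the intent of AlwaysFreshen: every read triggers re-scoring. */
final class AlwaysFreshenSketch implements FreshnessCheck {
  @Override
  public boolean isFresh(Instant lastScored, Instant now) {
    return false;
  }
}

/** Alternative: re-score only when the stored result is missing or older than maxAge. */
final class MaxAgeFreshness implements FreshnessCheck {
  private final Duration maxAge;

  MaxAgeFreshness(Duration maxAge) {
    this.maxAge = maxAge;
  }

  @Override
  public boolean isFresh(Instant lastScored, Instant now) {
    // A null timestamp means the row has never been scored.
    return lastScored != null && Duration.between(lastScored, now).compareTo(maxAge) <= 0;
  }
}
```

A max-age style policy trades some staleness for fewer model evaluations. Always re-scoring guarantees up-to-date results at the cost of running the model on every read.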
After these steps, your model should be deployed and active. To verify that your newly deployed model is working:
# List active ScoreFunctions in the scoring server by model repo name and version.
curl <scoring.server.hostname>:<scoring-server-port>/admin/list
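You can make the same check from Java with the JDK's java.net.http client (Java 11 or later). The sketch assumes the endpoint is served over plain HTTP without authentication, as the curl command suggests. It returns the response body unparsed because the response format is not documented here. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper for the scoring server's /admin/list endpoint. */
final class ScoringServerAdmin {
  private ScoringServerAdmin() {}

  /** Builds the same URL the curl command requests (plain HTTP assumed). */
  static URI adminListUri(String host, int port) {
    return URI.create("http://" + host + ":" + port + "/admin/list");
  }

  /** Fetches the list of active ScoreFunctions and returns the raw response body. */
  static String listActiveScoreFunctions(String host, int port)
      throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(adminListUri(host, port)).GET().build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IOException("Unexpected status " + response.statusCode() + " from " + request.uri());
    }
    return response.body();
  }
}
```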