I'm pretty new to the SAP world and I'm trying to work with the R Server set up with SAP HANA Studio (HANA Studio version 2.3.8, R Server version 3.4.0).
My tasks are:
Here is a small example of an RLANG procedure for training and saving the model on HANA:
PROCEDURE "PA"."RF_TRAIN" (
IN data "PA"."IRIS",
OUT modelOut "PA"."TRAIN_MODEL"
)
LANGUAGE RLANG
SQL SECURITY INVOKER
DEFAULT SCHEMA "PA"
AS
BEGIN
require(randomForest)
require(dplyr)
require(pmml)
# iris <- as.data.frame(data)
data(iris)
iris <- iris %>% mutate(y = factor(ifelse(Species == "setosa", 1, 0)))
model <- randomForest(y~Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, iris,
importance = TRUE,
ntree = 500)
modelOut <- as.data.frame(pmml(model))
END;
(Please don't be confused that I'm not using my input data for model training; this is not a real example.)
Here is what a table holding the model on SAP HANA should look like:
In this example the training works, but I'm not sure how to save the randomForest object in the SAP HANA database, or how to convert the randomForest object into something like the one in the picture.
Would appreciate any help :)
If you plan to use R Server for your predictions, you can store your randomForest model as a BLOB object in SAP HANA.
Following the SAP HANA R Integration Guide, you need to:
1. Add a BLOB attribute to your table "PA"."TRAIN_MODEL".
2. Serialize your model before writing it to your table.
3. Unserialize your model when calling the predict procedure.
Which would give, in your R script:
require(randomForest)
require(dplyr)
require(pmml)
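# Serialize each R object to a raw vector and wrap the result in a
# one-column data frame, so it can be written to a BLOB column.
generateRobjColumn <- function(...){
  result <- as.data.frame(cbind(
    lapply(
      list(...),
      function(x) if (is.null(x)) NULL else serialize(x, NULL)
    )
  ))
  names(result) <- NULL
  names(result[[1]]) <- NULL
  result
}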
# iris <- as.data.frame(data)
data(iris)
iris <- iris %>% mutate(y = factor(ifelse(Species == "setosa", 1, 0)))
model <- randomForest(y~Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, iris,
importance = TRUE,
ntree = 500)
modelOut <- data.frame(ID = 1, MODEL = generateRobjColumn(pmml(model)))
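Note that you don't actually need to use pmml if you plan to re-use the model as is. In that case, pass model straight to generateRobjColumn, since predict() in the procedure below expects the randomForest object itself rather than its PMML representation.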
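To make the serialize/unserialize step concrete, here is a minimal round-trip sketch that runs in plain R, outside HANA. It assumes nothing about SAP HANA itself, and it swaps in a base R lm model as a stand-in for randomForest so that it has no package dependencies; the variable names (fit, blob, restored) are made up for illustration.

```r
# Stand-in model: base R lm instead of randomForest (assumption, to keep
# the sketch free of package dependencies).
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

# serialize() with a NULL connection returns a raw vector, which is the
# kind of value a BLOB column can hold.
blob <- serialize(fit, NULL)

# unserialize() rebuilds the model object from the raw vector.
restored <- unserialize(blob)

# The restored model should predict exactly like the original one.
p_original <- predict(fit, newdata = head(iris))
p_restored <- predict(restored, newdata = head(iris))
roundtrip_ok <- isTRUE(all.equal(p_original, p_restored))
```

The same round trip should apply to a randomForest object, as long as the randomForest package is loaded in the session that calls predict() on the restored model.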
In another procedure, you will need to pass in this table and unserialize your model for prediction.
CREATE PROCEDURE "PA"."RF_PREDICT" (IN data "PA"."IRIS", IN modelOut "PA"."TRAIN_MODEL", OUT result "PA"."PRED")
LANGUAGE RLANG AS
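BEGIN
  require(randomForest)  # load the package so predict() dispatches to the randomForest method
  rfModel <- unserialize(modelOut$MODEL[[1]])
  result <- predict(rfModel, newdata = data) # or whatever steps you need for prediction
  # result must end up as a data frame matching "PA"."PRED"
END;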
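Since the output parameter of the procedure is a table, the predictions need to end up in a data frame. The sketch below shows one way to do that in plain R. The helper name to_result_table and the columns ID and PREDICTION are hypothetical, because the structure of "PA"."PRED" is not given here, and a base R lm model again stands in for the unserialized randomForest model.

```r
# Hypothetical helper: wrap predictions in a data frame with one row per
# input row. Column names ID and PREDICTION are assumptions.
to_result_table <- function(model, newdata) {
  pred <- predict(model, newdata = newdata)
  # Factor predictions (classification) are stored as text,
  # numeric predictions (regression) as numbers.
  value <- if (is.factor(pred)) as.character(pred) else as.numeric(pred)
  data.frame(
    ID = seq_len(nrow(newdata)),
    PREDICTION = value,
    stringsAsFactors = FALSE
  )
}

# Stand-in for the unserialized model (assumption: lm instead of randomForest).
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
res <- to_result_table(fit, head(iris))
```

Inside the RLANG procedure, the equivalent would be to assign such a data frame to result, with column names and types adjusted to whatever "PA"."PRED" actually defines.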