amazon-web-services amazon-kinesis amazon-kinesis-analytics

Issue with AWS Kinesis SQL - Random Cut Forest algorithm


I have this code in an AWS Kinesis application:

CREATE OR REPLACE STREAM "OUT_FILE" (
        "fechaTS"              timestamp,
        "celda"                varchar(25),
        "Field1"               DOUBLE,
        "Field2"               DOUBLE,
        "ANOMALY_SCORE"        DOUBLE,
        "ANOMALY_EXPLANATION"  varchar(1024)
        );

CREATE OR REPLACE PUMP "PMP_OUT" AS
   INSERT INTO "OUT_FILE"
      SELECT STREAM 
        "fechaTS",
        "celda",
        "Field1",
        "Field2",
        "ANOMALY_SCORE",
        "ANOMALY_EXPLANATION"
      FROM TABLE(RANDOM_CUT_FOREST_WITH_EXPLANATION(
                 CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
                 300,    -- numberOfTrees
                 512,    -- subSampleSize
                 8064,   -- timeDecay
                 4,      -- shingleSize
                 true))  -- withDirectionality
  WHERE "celda" = 'CELLNUMBER';

I expect the usual output: an anomaly score calculated for each input record.

Instead, I get this error message:

Number of numeric attributes should be less than or equal to 30 (Please check the documentation to know the supported numeric SQL types)

I am feeding only 2 numeric attributes into the model. Also, according to the documentation, the supported numeric SQL types are DOUBLE, INTEGER, FLOAT, TINYINT, SMALLINT, REAL, and BIGINT, and my fields are DOUBLE (I have also tried FLOAT).
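For illustration, the count in the error message can be modeled as the number of columns in the stream handed to RCF whose type is one of the numeric types listed above. The Python sketch below is a hypothetical model, not AWS code. `numeric_attribute_count` and both column maps are assumptions, and the second map imagines Field1 and Field2 arriving as VARCHAR. The same fields count as 2 or as 0 depending only on how they are typed.

```python
from typing import Mapping

# Numeric SQL types listed in the documentation quoted above.
NUMERIC_TYPES = {"DOUBLE", "INTEGER", "FLOAT", "TINYINT", "SMALLINT", "REAL", "BIGINT"}


def base_type(sql_type: str) -> str:
    """Normalize a SQL type name, e.g. 'varchar(25)' -> 'VARCHAR'."""
    return sql_type.split("(", 1)[0].strip().upper()


def numeric_attribute_count(columns: Mapping[str, str]) -> int:
    """Count columns (name -> SQL type) whose type is an accepted numeric type."""
    return sum(1 for t in columns.values() if base_type(t) in NUMERIC_TYPES)


# The question's fields, typed as in its CREATE STREAM (ANOMALY_* columns omitted).
as_declared = {"fechaTS": "TIMESTAMP", "celda": "VARCHAR(25)",
               "Field1": "DOUBLE", "Field2": "DOUBLE"}

# Hypothetical input stream in which Field1 and Field2 were mapped as text.
as_text = {"fechaTS": "TIMESTAMP", "celda": "VARCHAR(25)",
           "Field1": "VARCHAR(16)", "Field2": "VARCHAR(16)"}

print(numeric_attribute_count(as_declared))  # 2
print(numeric_attribute_count(as_text))      # 0
```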

What am I doing wrong?


Solution

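  • The solution is to define the fields as DOUBLE (or another supported numeric type) in the application's input schema; declaring them as DOUBLE in the SQL is not enough. The CREATE STREAM statement only sets the types of the output stream "OUT_FILE". RANDOM_CUT_FOREST_WITH_EXPLANATION works on the columns of the input stream passed through CURSOR(...), "SOURCE_SQL_STREAM_001", and those types come from the input schema.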

    I used a JSON definition like this one, and it worked:

    {"ApplicationName": "<myAppName>",
     "Inputs": [{
       "InputSchema": {
                "RecordColumns": [{"Mapping": "fechaTS", "Name": "fechaTS", "SqlType": "timestamp"},
                                  {"Mapping": "celda","Name": "celda","SqlType": "varchar(25)"},
                                  {"Mapping": "Field1","Name": "Field1","SqlType": "DOUBLE"},
                                  {"Mapping": "Field2","Name": "Field2","SqlType": "DOUBLE"},
                                  {"Mapping": "Field3","Name": "Field3","SqlType": "DOUBLE"}],
                "RecordFormat": {"MappingParameters": {"JSONMappingParameters": {"RecordRowPath": "$"}},
                                 "RecordFormatType": "JSON"}
                },
        "KinesisStreamsInput": {"ResourceARN": "<myInputARN>", "RoleARN": "<myRoleARN>"},
        "NamePrefix": "<myNamePrefix>"
        }]
      }
    

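    Kinesis Analytics names the in-application input stream by appending _001 to NamePrefix. A NamePrefix of SOURCE_SQL_STREAM therefore produces the "SOURCE_SQL_STREAM_001" stream that the SQL in the question reads from.

    To create the application from this definition, save the JSON as myJson.json and run: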

    aws kinesisanalytics create-application --cli-input-json file://myJson.json
    

    The AWS Command Line Interface (CLI) must already be installed and configured.
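A local check can catch a mistyped schema before the application is created. It reads the request and lists the RecordColumns whose SqlType is numeric. This is a minimal Python sketch that assumes the request layout above (Inputs, then InputSchema, then RecordColumns). The helper names, the trimmed sample request, and the lower bound of one numeric column are assumptions for illustration. None of them are part of the AWS CLI or API.

```python
import json

# Numeric SQL types listed in the question; other types are not counted.
NUMERIC_TYPES = {"DOUBLE", "INTEGER", "FLOAT", "TINYINT", "SMALLINT", "REAL", "BIGINT"}
MAX_NUMERIC = 30  # upper bound quoted in the error message


def numeric_columns(request: dict) -> list[str]:
    """Names of RecordColumns, across all Inputs, that have a numeric SqlType."""
    names = []
    for inp in request.get("Inputs", []):
        for col in inp["InputSchema"]["RecordColumns"]:
            base = col["SqlType"].split("(", 1)[0].strip().upper()
            if base in NUMERIC_TYPES:
                names.append(col["Name"])
    return names


def check_request(request: dict) -> list[str]:
    """Raise ValueError unless 1..MAX_NUMERIC numeric columns are declared.

    The lower bound of 1 is an assumption (RCF has nothing to score without it).
    """
    names = numeric_columns(request)
    if not 1 <= len(names) <= MAX_NUMERIC:
        raise ValueError(f"{len(names)} numeric input columns: {names}")
    return names


def check_file(path: str) -> list[str]:
    """Load a --cli-input-json file such as myJson.json and check it."""
    with open(path) as f:
        return check_request(json.load(f))


# Trimmed copy of the request above (placeholders kept as-is).
example = {
    "ApplicationName": "<myAppName>",
    "Inputs": [{"InputSchema": {"RecordColumns": [
        {"Mapping": "fechaTS", "Name": "fechaTS", "SqlType": "timestamp"},
        {"Mapping": "celda", "Name": "celda", "SqlType": "varchar(25)"},
        {"Mapping": "Field1", "Name": "Field1", "SqlType": "DOUBLE"},
        {"Mapping": "Field2", "Name": "Field2", "SqlType": "DOUBLE"},
        {"Mapping": "Field3", "Name": "Field3", "SqlType": "DOUBLE"},
    ]}}],
}
print(check_request(example))  # ['Field1', 'Field2', 'Field3']
```

Call `check_file("myJson.json")` before the `aws kinesisanalytics create-application` command. It raises an error if, for example, the numeric fields were left as VARCHAR in the input schema.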