Aerospike: how to bulk load a list of integers into a bin?

I'm trying to use the Aerospike bulk loader to seed a cluster with data from a tab-separated file.

The source data looks like this:

set key segments
segment 123 10,20,30,40,50
segment 234 40,50,60,70

The third column, 'segments', contains a comma-separated list of integers.
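Here is a small Java sketch of the record shape each row is meant to produce. The class SegmentRow and its parse method are hypothetical and not part of aerospike-loader. The sketch assumes columns are separated by whitespace, which covers both spaces and tabs. It stores list items as Long on the assumption that 64-bit Aerospike integers are the target type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of aerospike-loader): shows how one data row
// should map to a record with a set name, an integer key and a list of integers.
public class SegmentRow {
    final String set;
    final long key;
    final List<Long> segments;

    SegmentRow(String set, long key, List<Long> segments) {
        this.set = set;
        this.key = key;
        this.segments = segments;
    }

    // Assumes whitespace-separated columns (spaces or tabs) and a third
    // column holding a comma-separated list of integers.
    static SegmentRow parse(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + line);
        }
        List<Long> segments = new ArrayList<>();
        for (String item : cols[2].split(",")) {
            segments.add(Long.parseLong(item.trim()));
        }
        return new SegmentRow(cols[0], Long.parseLong(cols[1]), segments);
    }

    public static void main(String[] args) {
        SegmentRow row = parse("segment 123 10,20,30,40,50");
        // prints: segment 123 [10, 20, 30, 40, 50]
        System.out.println(row.set + " " + row.key + " " + row.segments);
    }
}
```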

I created a JSON template:

{
  "version" : "1.0",
  "input_type" : "csv",
  "csv_style": { "delimiter": " " , "n_columns_datafile": 3, "ignore_first_line": true},

  "key": {"column_name":"key", "type": "integer"},

  "set": { "column_name":"set" , "type": "string"},

  "binlist": [
    {"name": "segments",
      "value": {"column_name": "segments", "type": "list"}
    }
  ]
}

... and ran the loader:

java -cp aerospike-load-1.1-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c template.json data.tsv

When I query the records in aql, they seem to be a list of strings:

aql> select * from test
+--------------------------------+
| segments                       |
+--------------------------------+
| ["10", "20", "30", "40", "50"] |
| ["40", "50", "60", "70"]       |
+--------------------------------+

The data I'm trying to store is a list of integers. Is there an easy way to convert the values stored in this bin to a list of integers (possibly with a Lua UDF), or is there a tweak I can make to the bulk loader template?

Update:

I attempted to solve this by creating a Lua UDF to convert the list from strings to integers:

function convert_segment_list_to_integers(rec)
    for i=1, table.maxn(rec['segments']) do
        rec['segments'][i] = math.floor(tonumber(rec['segments'][i]))
    end
    aerospike:update(rec)
end

... registered it:

aql> register module 'convert_segment_list_to_integers.lua'

... and then tried executing against my set:

aql> execute convert_segment_list_to_integers.convert_segment_list_to_integers() on test.segment

I enabled more verbose logging and noticed that the UDF throws an error. Apparently it expects a table but was passed userdata:

Dec 04 2015 23:23:34 GMT: DEBUG (udf): (udf_rw.c:send_result:527) FAILURE when calling convert_segment_list_to_integers convert_segment_list_to_integers ...rospike/usr/udf/lua/convert_segment_list_to_integers.lua:2: bad argument #1 to 'maxn' (table expected, got userdata)
Dec 04 2015 23:23:34 GMT: DEBUG (udf): (udf_rw.c:send_udf_failure:407) Non-special LDT or General UDF Error(...rospike/usr/udf/lua/convert_segment_list_to_integers.lua:2: bad argument #1 to 'maxn' (table expected, got userdata))

It seems that table.maxn can't be applied to a userdata object.

Can you see what needs to be done to fix this?

Solution

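Aerospike lists are passed to Lua as userdata rather than as Lua tables, which is why table.maxn fails. To convert your lists of string values to lists of integer values, iterate with list.iterator() and build a new list, as in the following record UDF: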

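function convert_segment_list_to_integers(rec)
        -- new, empty Aerospike list to hold the converted values
        local list_with_ints = list()
        for value in list.iterator(rec['segments']) do
                -- parse each string item and store it as an integer
                local int_value = math.floor(tonumber(value))
                list.append(list_with_ints, int_value)
        end
        -- replace the bin with the converted list and persist the record
        rec['segments'] = list_with_ints
        aerospike:update(rec)
end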
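If you would rather do the conversion client-side, the per-element logic of the UDF above translates to Java roughly as follows. This is a minimal sketch. SegmentConverter.toIntegerList is a hypothetical helper, and reading and writing the record with the Aerospike Java client is left out. Like math.floor(tonumber(value)), it parses each item as a floating-point number and floors it. That means values beyond 2^53 can lose precision, and a non-numeric item makes it fail, here with a NumberFormatException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side equivalent of the Lua UDF's per-element logic.
public class SegmentConverter {

    static List<Long> toIntegerList(List<String> values) {
        List<Long> result = new ArrayList<>(values.size());
        for (String value : values) {
            // Mirror math.floor(tonumber(value)): parse as a double, floor it,
            // then narrow to a 64-bit integer.
            result.add((long) Math.floor(Double.parseDouble(value.trim())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("10", "20", "30", "40", "50");
        System.out.println(toIntegerList(stored)); // prints [10, 20, 30, 40, 50]
    }
}
```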

When you edit your existing Lua module, make sure to re-run register module 'convert_segment_list_to_integers.lua' in aql so the server picks up the new version.

The root cause is in the aerospike-loader tool: it always stores list items as strings, as you can see in the following Java code:

case LIST:
    /*
     * Assumptions
     * 1. Items are separated by a comma ','
     * 2. Item value will be a string
     * 3. List will be in double quotes
     * 
     * No support for nested maps or nested lists
     * 
     */
    List<String> list = new ArrayList<String>();
    String[] listValues = binRawText.split(Constants.LIST_DELEMITER, -1);
    if (listValues.length > 0) {
        for (String value : listValues) {
            list.add(value.trim());
        }
        bin = Bin.asList(binColumn.getBinNameHeader(), list);
    } else {
        bin = null;
        log.error("Error: Cannot parse to a list: " + binRawText);
    }
    break;

Source on GitHub: http://git.io/vRAQW

If you prefer, you can modify this code and recompile it so that list items are always parsed as integers. Change lines 266 and 270 (the List<String> declaration and the list.add call) to something like this (untested):

List<Integer> list = new ArrayList<Integer>();  // line 266
list.add(Integer.parseInt(value.trim()));       // line 270
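A slightly more forgiving variant of that change is sketched below. It has not been tested against the loader itself. ListBinParser.parseListItems is a hypothetical helper, and it assumes Constants.LIST_DELEMITER is ",", as the comment in the loader source suggests. The helper returns Long values when every item parses as an integer and keeps the original string behaviour otherwise, so a bin like a,b still loads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the loader's LIST branch: split on ',' like the
// original, but return Longs when every item is an integer and fall back to
// the original List<String> otherwise.
public class ListBinParser {

    static List<?> parseListItems(String binRawText) {
        String[] listValues = binRawText.split(",", -1); // assumed LIST_DELEMITER
        List<String> strings = new ArrayList<>(listValues.length);
        List<Long> longs = new ArrayList<>(listValues.length);
        boolean allIntegers = true;
        for (String value : listValues) {
            String item = value.trim();
            strings.add(item);
            if (allIntegers) {
                try {
                    longs.add(Long.parseLong(item));
                } catch (NumberFormatException e) {
                    // e.g. "abc", or "" produced by a trailing comma
                    allIntegers = false;
                }
            }
        }
        return allIntegers ? longs : strings;
    }

    public static void main(String[] args) {
        System.out.println(parseListItems("10,20,30,40,50")); // prints [10, 20, 30, 40, 50]
        System.out.println(parseListItems("a,b"));            // prints [a, b]
    }
}
```

Inside the LIST case you would then call something like bin = Bin.asList(binColumn.getBinNameHeader(), parseListItems(binRawText));. Long.parseLong is used instead of Integer.parseInt because Aerospike integers are 64-bit, so values above 2,147,483,647 would otherwise throw.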