
Aerospike: how to bulk load a list of integers into a bin?


I'm trying to use the Aerospike bulk loader to seed a cluster with data from a tab-separated file.

The source data looks like this:

set key segments
segment 123 10,20,30,40,50
segment 234 40,50,60,70

The third column, 'segments', contains a comma separated list of integers.

I created a JSON template:

{
  "version" : "1.0",
  "input_type" : "csv",
  "csv_style": { "delimiter": " " , "n_columns_datafile": 3, "ignore_first_line": true},

  "key": {"column_name":"key", "type": "integer"},

  "set": { "column_name":"set" , "type": "string"},

  "binlist": [
    {"name": "segments",
      "value": {"column_name": "segments", "type": "list"}
    }
  ]
}

... and ran the loader:

java -cp aerospike-load-1.1-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c template.json data.tsv

When I query the records in aql, they seem to be a list of strings:

aql> select * from test
+--------------------------------+
| segments                       |
+--------------------------------+
| ["10", "20", "30", "40", "50"] |
| ["40", "50", "60", "70"]       |
+--------------------------------+

The data I'm trying to store is a list of integers. Is there an easy way to convert the objects stored in this bin to a list of integers (possibly a Lua UDF) or perhaps there's a tweak that can be made to the bulk loader template?

Update:

I attempted to solve this by creating a Lua UDF to convert the list from strings to integers:

function convert_segment_list_to_integers(rec)
    for i=1, table.maxn(rec['segments']) do
        rec['segments'][i] = math.floor(tonumber(rec['segments'][i]))
    end
    aerospike:update(rec)
end

... registered it:

aql> register module 'convert_segment_list_to_integers.lua'

... and then tried executing against my set:

aql> execute convert_segment_list_to_integers.convert_segment_list_to_integers() on test.segment

I enabled more verbose logging and noticed that the UDF is throwing an error. Apparently, it expects a table but was passed userdata:

Dec 04 2015 23:23:34 GMT: DEBUG (udf): (udf_rw.c:send_result:527) FAILURE when calling convert_segment_list_to_integers convert_segment_list_to_integers ...rospike/usr/udf/lua/convert_segment_list_to_integers.lua:2: bad argument #1 to 'maxn' (table expected, got userdata)
Dec 04 2015 23:23:34 GMT: DEBUG (udf): (udf_rw.c:send_udf_failure:407) Non-special LDT or General UDF Error(...rospike/usr/udf/lua/convert_segment_list_to_integers.lua:2: bad argument #1 to 'maxn' (table expected, got userdata))

It seems that maxn can't be applied to a userdata object.

Can you see what needs to be done to fix this?


Solution

  • To convert your lists of strings to lists of integers, you can run the following record UDF:

    function convert_segment_list_to_integers(rec)
            local list_with_ints = list()
            for value in list.iterator(rec['segments']) do
                    local int_value = math.floor(tonumber(value))
                    list.append(list_with_ints, int_value)
            end
            rec['segments'] = list_with_ints
            aerospike:update(rec)
    end
    

    After editing your existing Lua module, make sure to re-run register module 'convert_segment_list_to_integers.lua'.

    The root cause is in the aerospike-loader tool itself: it always treats list items as strings, as the following Java code shows:

    case LIST:
        /*
         * Assumptions
         * 1. Items are separated by a colon ','
         * 2. Item value will be a string
         * 3. List will be in double quotes
         * 
         * No support for nested maps or nested lists
         * 
         */
        List<String> list = new ArrayList<String>();
        String[] listValues = binRawText.split(Constants.LIST_DELEMITER, -1);
        if (listValues.length > 0) {
            for (String value : listValues) {
                list.add(value.trim());
            }
            bin = Bin.asList(binColumn.getBinNameHeader(), list);
        } else {
            bin = null;
            log.error("Error: Cannot parse to a list: " + binRawText);
        }
        break;
    

    Source on Github: http://git.io/vRAQW

    If you prefer, you can modify this code and recompile the loader so that it always parses list values as integers. Change lines 266 and 270 to something like this (untested):

    List<Integer> list = new ArrayList<Integer>(); 
    list.add(Integer.parseInt(value.trim()));
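
The integer-parsing change suggested above can be tried in isolation before rebuilding the loader. The following is a minimal sketch, not code from aerospike-loader: the class name IntListParser and the method parse are hypothetical, and the comma delimiter is hard-coded on the assumption that it matches the loader's list delimiter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of aerospike-loader.
final class IntListParser {
    private IntListParser() {}

    // Splits the raw column text on commas (assumed to match the loader's
    // list delimiter), trims each item, and parses it as an integer.
    // Integer.parseInt throws NumberFormatException for empty or non-numeric items.
    static List<Integer> parse(String binRawText) {
        List<Integer> values = new ArrayList<>();
        for (String item : binRawText.split(",", -1)) {
            values.add(Integer.parseInt(item.trim()));
        }
        return values;
    }

    public static void main(String[] args) {
        // One "segments" value from the sample data file.
        System.out.println(parse("10,20,30,40,50")); // prints [10, 20, 30, 40, 50]
    }
}
```

Note that, unlike the string version, this fails fast on malformed input: an empty column or a stray non-numeric item raises NumberFormatException, so a patched loader would probably want to catch that and log the offending row.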