sqlhana

Elusive Error "JDBC: [339]: invalid number: SQL error" in HANA SQL


My goal is selecting values using a WHERE clause on HANA SQL.

My_Table contains a column PartnerID of type NVARCHAR that stores an integer number.

The following works fine:

SELECT * FROM "My_Schema"."My_Table" WHERE 10 < TO_INTEGER("PartnerID");

giving the expected result.

However

SELECT * FROM "My_Schema"."My_Table" WHERE 10 > TO_INTEGER("PartnerID");

throws the following error:

JDBC: [339]: invalid number: SQL error

How is it possible that < works but the > operator fails?


Solution

  • This seemingly inconsistent behaviour is the result of how the HANA column store works.

    Strictly speaking, the result should be the same for both cases, but database systems are also just leaky abstractions on files sometimes.

    Now the mechanics of this behaviour is as follows: In the column store the data for each column, e.g. your "PartnerID" is stored in its own set of data structures.

    At first (right after a record has been created or updated) the information is held in what is called the "delta store" which is pretty much storing the column values as-is. This allows for relatively quick inserts/updates. After a while, HANA will take the data from the "delta store" and fit it into the actual column data structures ("main store").

    The most important characteristic of the column "main store" is that all values of a column are stored in a dictionary and assigned an internal value-ID. (the neat thing is that the size of this value-ID can change according to how many different values there are in a column. Fewer distinct values require fewer bits as keys --> less memory required --> faster scan speed)

    The actual use of a specific value is then marked in the column-value-vector: every table row corresponds to a specific point in this vector and the value-ID at this point determines which value is present in the row. So far, so easy. (In case this isn't easy/obvious/simple, there are plenty of really good materials available that explain how column stores and specifically the HANA one work.)

    Now, when HANA has to find a rows that match a condition in the WHERE clause, it needs to find out which value-IDs too look for in the value-vector. For something like 10 < TO_INTEGER("PartnerID") it needs to look into the dictionary and find entries larger than 10. I haven't mentioned it before, but the column dictionary is always sorted. This allows for binary search strategies.

    So, HANA can readily go and probe the dictionary until it finds the entry '10' and converts it to 10. Due to the sorted dictionary it can now just take any value larger than that, convert it and return the values.

    This works, as long as all the values from '10' upwards can be correctly converted to integers.

    For the opposite case, imagine the values below 10 are not convertible to integers. In this case, HANA will again find number 10 in the dictionary and now probes values lower than that and will try to convert those values. This fails obviously.

    So, this is the simple case for when this can happen. If the delta store contains entries than cannot be converted, this will also lead to the conversion error - in that case for both statements, however.

    There had been plenty of discussions in the past about how to gracefully handle such conversion issues. Search terms like HANA + IS_NUMERIC should yield the corresponding web pages.

    As the original question was "how could this happen in HANA?" I leave this answer at this point.