sqlsnowflake-cloud-data-platformsubstring

SQL string numeric value range comparison


I have a string like A18.30 and I want to compare if this falls under these 2 min and max values: A17.0 and A19.0.

  1. A18.30 => A17.0 between A19.0 [correct]
  2. P1A.5 => P1.0 between A2.0 [In-correct]

Expectation is first letter needs to compared and remaining numeric digits, if it falls into the range to be compared.

I have tried with substring, but I want to know if there is a better way to do the comparison with regex. I use snowflake db.


Solution

  • It looks like the values that are compared are sort of "versions" which requires "natural sort". It can be achieved with SQL:

    SELECT value, lower_limit, upper_limit,
      TRANSFORM(REGEXP_EXTRACT_ALL(value,'(\\d+|[^\\d]+)'),e->IFF(e RLIKE '\\d+',e::INT,e)) 
      BETWEEN TRANSFORM(REGEXP_EXTRACT_ALL(lower_limit,'(\\d+|[^\\d]+)'),e->IFF(e RLIKE '\\d+',e::INT,e))
          AND TRANSFORM(REGEXP_EXTRACT_ALL(upper_limit,'(\\d+|[^\\d]+)'),e->IFF(e RLIKE '\\d+',e::INT,e)) 
      AS is_value_between_lower_upper_limit
    FROM VALUES ('A18.30','A17.0','A19.0'),
                ('P1A.5', 'P1.0', 'A2.0') AS s(value, lower_limit, upper_limit);
    

    Output:

    enter image description here

    More at: How to sort "version" strings with SQL in Snowflake?


    Alternatively by using UDF:

    create or replace function natural_sort(str TEXT)
    returns array
    language python
    runtime_version = '3.11'
    packages = ('natsort')
    handler = 'sort'
    as
    $$
    import natsort
    def sort(str):
      return list(natsort.natsort_key(str))
    $$;
    
    SELECT value, lower_limit, upper_limit,
     natural_sort(value) BETWEEN natural_sort(lower_limit) 
                         AND natural_sort(upper_limit) AS is_value_between_lower_upper_limit
    FROM VALUES ('A18.30','A17.0','A19.0'),
                ('P1A.5', 'P1.0', 'A2.0') AS s(value, lower_limit, upper_limit);
    

    Output:

    enter image description here