xml-parsingsnowflake-cloud-data-platform

Snowflake XML varchar without removing trailing zero


I'm attempting to utilize GETXML and cast its output to VARCHAR, encountering various types of values, such as: 240.0, 102.4, 255.0 And textual data like MOZILLA/5.0 (WINDOWS; U; WINDOWS NT 6.0; PL; RV:1.8.1.14) GECKO/20080404 FIREFOX/2.0.0.14 WEBMONEY ADVISOR

the result for '240.0', for instance, is '240'. How can I retain the decimal point and get '240.0'? I can't use the NUMBER type since the values can also be VARCHAR, and the number of decimal points varies across the data types.

An example of the XML structure I'm working with is:

<DeviceInfo>
  <AdditionalInfo>
    <BrowserVersion>240.0</BrowserVersion>
  </AdditionalInfo>
</DeviceInfo>

In my query:

    SELECT
      GET(XMLGET(xmldata, 'BrowserVersion'), '$')::VARCHAR AS BrowserVersion
    FROM CLAIM;

I except to get 240.0 instead 240

Thanks a lot


Solution

  • This should probably be a support case for Snowflake to address the formatting of the value as a number. It's probably happening during the parse, before the casting to varchar so specifying that won't matter.

    As a workaround, rather than getting the value you can get the outer XML including the tag. That will preserve the formatting. From there, you can use a regular expression to strip away the XML tags.

    create or replace table claim as select parse_xml(
    '
    <DeviceInfo>
      <AdditionalInfo>
        <BrowserVersion>240.0</BrowserVersion>
      </AdditionalInfo>
    </DeviceInfo>
    ') as xmldata;
    
    
    select regexp_replace(XMLGET(XMLGET(xmldata, 'AdditionalInfo'), 'BrowserVersion')::varchar, '<[^>]+>') as BROWSER_VERSION
    from claim;
    

    This will output 240.0, preserving the trailing zero after the decimal.