snowflake-cloud-data-platform

Snowflake: getting value from inner tags of xml file


I am trying to import xml file into snowflake database table. I created a table and XML file format. The XML file format is created using following code:

CREATE OR REPLACE FILE FORMAT LAND_XML.PUBLIC.XML_FILE_FORMAT 
TYPE = 'XML' 
COMPRESSION = 'AUTO' 
PRESERVE_SPACE = FALSE 
STRIP_OUTER_ELEMENT = TRUE 
DISABLE_SNOWFLAKE_DATA = FALSE 
DISABLE_AUTO_CONVERT = FALSE 
IGNORE_UTF8_ERRORS = FALSE; 

The XML file looks as follow:

<?xml version="1.0" encoding="utf-8" ?>
<NoticeOfChange version="1.0.0" application_version="v0.0.1-5780-g16dbd00e9"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:noNamespaceSchemaLocation="https://prod.notices.govt/notices/">
    <ProducedBy>
        <Name>Land Department</Name>
        <Contact>
            <Name>Technical Support</Name>
            <Phone>0800 xxx xxx</Phone>
            <Email>customersupport@land.govt</Email>
        </Contact>
    </ProducedBy>
    <Notices>
    <Notice>
            <NoticeId>577</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   <Notice>
            <NoticeId>578</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
    <Notice>
            <NoticeId>579</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   <Notice>
            <NoticeId>580</NoticeId>
            <NoticeType>NoticeOfChange</NoticeType>
            <Description>Notification of change of ownership of rating unit</Description>
            <Statutory>Under Local Government (Rating) Act 2020</Statutory>
   </Notice>
   </Notices>
</NoticeOfChange>

When I imported XML file into snowflake database table, it showed only two rows (instead of expected 4 rows of notices). The output is shown in the following picture: enter image description here

The current file format split the XML file into two rows based on <produced by> and <notices> tag. However, I am not interested in <produced by> tag (want to discard it) and want to convert no of notices present in <notices> tag into separate row of the table. Based on limited knowledge, I could not modify the file format into desired output I want. Any help would be appreciated ?


Solution

  • the root object is NoticeOfChange so flattening on that would give you the objects of that, ProducedBy and Notices, as you note, you just want the latter, so flatten that sub object.. via xmlget(d.xml, 'Notices'):"$"

    So using this CTE just for the data..

    WITH data_table AS (
        SELECT PARSE_XML('<?xml version="1.0" encoding="utf-8" ?>
    <NoticeOfChange version="1.0.0" application_version="v0.0.1-5780-g16dbd00e9"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:noNamespaceSchemaLocation="https://prod.notices.govt/notices/">
        <ProducedBy>
            <Name>Land Department</Name>
            <Contact>
                <Name>Technical Support</Name>
                <Phone>0800 xxx xxx</Phone>
                <Email>customersupport@land.govt</Email>
            </Contact>
        </ProducedBy>
        <Notices>
        <Notice>
                <NoticeId>577</NoticeId>
                <NoticeType>NoticeOfChange</NoticeType>
                <Description>Notification of change of ownership of rating unit</Description>
                <Statutory>Under Local Government (Rating) Act 2020</Statutory>
       </Notice>
       <Notice>
                <NoticeId>578</NoticeId>
                <NoticeType>NoticeOfChange</NoticeType>
                <Description>Notification of change of ownership of rating unit</Description>
                <Statutory>Under Local Government (Rating) Act 2020</Statutory>
       </Notice>
        <Notice>
                <NoticeId>579</NoticeId>
                <NoticeType>NoticeOfChange</NoticeType>
                <Description>Notification of change of ownership of rating unit</Description>
                <Statutory>Under Local Government (Rating) Act 2020</Statutory>
       </Notice>
       <Notice>
                <NoticeId>580</NoticeId>
                <NoticeType>NoticeOfChange</NoticeType>
                <Description>Notification of change of ownership of rating unit</Description>
                <Statutory>Under Local Government (Rating) Act 2020</Statutory>
       </Notice>
       </Notices>
    </NoticeOfChange>') as xml
    )
    

    Accessing the Notices is done via:

    SELECT 
        f.value as notice
    FROM data_table AS d
        ,lateral flatten(input=>xmlget(d.xml, 'Notices'):"$")f;
    

    giving:

    NOTICE
    <Notice> <NoticeId>577</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice>
    <Notice> <NoticeId>578</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice>
    <Notice> <NoticeId>579</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice>
    <Notice> <NoticeId>580</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice>

    At which point to can store or access the individual parts as you need.