I am trying to import xml file into snowflake database table. I created a table and XML file format. The XML file format is created using following code:
CREATE OR REPLACE FILE FORMAT LAND_XML.PUBLIC.XML_FILE_FORMAT
TYPE = 'XML'
COMPRESSION = 'AUTO'
PRESERVE_SPACE = FALSE
STRIP_OUTER_ELEMENT = TRUE
DISABLE_SNOWFLAKE_DATA = FALSE
DISABLE_AUTO_CONVERT = FALSE
IGNORE_UTF8_ERRORS = FALSE;
The XML file looks as follow:
<?xml version="1.0" encoding="utf-8" ?>
<NoticeOfChange version="1.0.0" application_version="v0.0.1-5780-g16dbd00e9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="https://prod.notices.govt/notices/">
<ProducedBy>
<Name>Land Department</Name>
<Contact>
<Name>Technical Support</Name>
<Phone>0800 xxx xxx</Phone>
<Email>customersupport@land.govt</Email>
</Contact>
</ProducedBy>
<Notices>
<Notice>
<NoticeId>577</NoticeId>
<NoticeType>NoticeOfChange</NoticeType>
<Description>Notification of change of ownership of rating unit</Description>
<Statutory>Under Local Government (Rating) Act 2020</Statutory>
</Notice>
<Notice>
<NoticeId>578</NoticeId>
<NoticeType>NoticeOfChange</NoticeType>
<Description>Notification of change of ownership of rating unit</Description>
<Statutory>Under Local Government (Rating) Act 2020</Statutory>
</Notice>
<Notice>
<NoticeId>579</NoticeId>
<NoticeType>NoticeOfChange</NoticeType>
<Description>Notification of change of ownership of rating unit</Description>
<Statutory>Under Local Government (Rating) Act 2020</Statutory>
</Notice>
<Notice>
<NoticeId>580</NoticeId>
<NoticeType>NoticeOfChange</NoticeType>
<Description>Notification of change of ownership of rating unit</Description>
<Statutory>Under Local Government (Rating) Act 2020</Statutory>
</Notice>
</Notices>
</NoticeOfChange>
When I imported XML file into snowflake database table, it showed only two rows (instead of expected 4 rows of notices). The output is shown in the following picture:
The current file format split the XML file into two rows based on <produced by>
and <notices>
tag. However, I am not interested in <produced by>
tag (want to discard it) and want to convert no of notices present in <notices>
tag into separate row of the table. Based on limited knowledge, I could not modify the file format into desired output I want. Any help would be appreciated ?
the root object is NoticeOfChange
so flattening on that would give you the objects of that, ProducedBy
and Notices
, as you note, you just want the latter, so flatten that sub object.. via xmlget(d.xml, 'Notices'):"$"
So using this CTE just for the data..
WITH data_table AS (
SELECT PARSE_XML('<?xml version="1.0" encoding="utf-8" ?>
<NoticeOfChange version="1.0.0" application_version="v0.0.1-5780-g16dbd00e9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="https://prod.notices.govt/notices/">
<ProducedBy>
<Name>Land Department</Name>
<Contact>
<Name>Technical Support</Name>
<Phone>0800 xxx xxx</Phone>
<Email>customersupport@land.govt</Email>
</Contact>
</ProducedBy>
<Notices>
<Notice>
<NoticeId>577</NoticeId>
<NoticeType>NoticeOfChange</NoticeType>
<Description>Notification of change of ownership of rating unit</Description>
<Statutory>Under Local Government (Rating) Act 2020</Statutory>
</Notice>
<Notice>
<NoticeId>578</NoticeId>
<NoticeType>NoticeOfChange</NoticeType>
<Description>Notification of change of ownership of rating unit</Description>
<Statutory>Under Local Government (Rating) Act 2020</Statutory>
</Notice>
<Notice>
<NoticeId>579</NoticeId>
<NoticeType>NoticeOfChange</NoticeType>
<Description>Notification of change of ownership of rating unit</Description>
<Statutory>Under Local Government (Rating) Act 2020</Statutory>
</Notice>
<Notice>
<NoticeId>580</NoticeId>
<NoticeType>NoticeOfChange</NoticeType>
<Description>Notification of change of ownership of rating unit</Description>
<Statutory>Under Local Government (Rating) Act 2020</Statutory>
</Notice>
</Notices>
</NoticeOfChange>') as xml
)
Accessing the Notices is done via:
SELECT
f.value as notice
FROM data_table AS d
,lateral flatten(input=>xmlget(d.xml, 'Notices'):"$")f;
giving:
NOTICE |
---|
<Notice> <NoticeId>577</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice> |
<Notice> <NoticeId>578</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice> |
<Notice> <NoticeId>579</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice> |
<Notice> <NoticeId>580</NoticeId> <NoticeType>NoticeOfChange</NoticeType> <Description>Notification of change of ownership of rating unit</Description> <Statutory>Under Local Government (Rating) Act 2020</Statutory> </Notice> |
At which point to can store or access the individual parts as you need.