sql-serverxmlddldmlsqlxml

Create new tables & columns from XML data in SQL columns


I have a table in a SQL Server 2019 database with 5 XML typed columns and one key. This data is in CLIXML format. It is of course marketing information that is not within my control, and the rub -- I cannot know up-front the possible element names. Based on how they intend to consume the data, the elements need to be converted from XML and 'land' on new child tables.

What I need to achieve:

  1. Convert the XML data on the base table to new child tables (5 -- one for each column of XML)
  2. Use the lead_id as the PK on the new tables
  3. Create a trigger on INSERT on the base table, to dynamically examine the XML inserted, determine if the column(s) exist on the related child tables, and if NOT then ALTER the table to include the element as type nvarchar(MAX) NULLABLE and INSERT the data on the child tables.

Recall -- The element names within the xml columns are NOT known up-front, and are not consistent at the column level, each row has to be examined to determine if the column exists (on the yet-to-be-created child table).

Here is the basic structure of the base table that includes the 5 XML type columns:

CREATE TABLE [testCompany].[Marketing_FC_Leads_Objects_CLIXML]
(
    [lead_id] [int] NOT NULL,
    [lead_analysis] [xml] NULL,
    [additional_fields] [xml] NULL,
    [field_mappings] [xml] NULL,
    [customer_journey] [xml] NULL,
    [mapped_fields] [xml] NULL
)

Here is an example of a single attribute that exists in one of those rows, which should, in this case target the not-yet-existent mapped_fields TABLE, and would need to include the lead_id from the base table as the PK on the child table, and then ALL the variable number of elements exploded into a new table.

<Objs xmlns="http://schemas.microsoft.com/powershell/2004/04" Version="1.1.0.1">
  <Obj RefId="0">
    <TN RefId="0">
      <T>Selected.System.Management.Automation.PSNoteProperty</T>
      <T>System.Management.Automation.PSCustomObject</T>
      <T>System.Object</T>
    </TN>
    <MS>
      <S N="Name">mapped_fields</S>
      <Obj N="Value" RefId="1">
        <TN RefId="1">
          <T>System.Management.Automation.PSCustomObject</T>
          <T>System.Object</T>
        </TN>
        <MS>
          <S N="Caller ZIP Code">37207</S>
          <S N="Contact City">Nashville</S>
          <S N="Contact State">TN</S>
          <S N="Caller Country">US</S>
          <S N="New Client?">YES</S>
          <S N="Category of Interest">Not Set</S>
          <S N="Contact Name">Jordan Smith</S>
          <S N="Contact Phone">+15551231234</S>
        </MS>
      </Obj>
    </MS>
  </Obj>
</Objs>

To add further clarity to the above, the elements that would be added to a new mapped_fields table in the above example are: Caller ZIP Code, Contact City, Contact State, Caller Country, New Client?, Category of Interest, Contact Name, and Contact Phone.

I am completely lost on this, and have been told to 'make it work'.


Solution

  • Please try the following conceptual example as a starting point.

    As it was already suggested in the comments there is no need to create actual tables.

    Instead you can create views, one view per XML column, and shred XML on the fly.

    SQL

    -- DDL and sample data population, start
    DECLARE @Marketing_FC_Leads_Objects_CLIXML TABLE (
        [lead_id] [int] PRIMARY KEY NOT NULL,
        [lead_analysis] [xml] NULL,
        [additional_fields] [xml] NULL,
        [field_mappings] [xml] NULL,
        [customer_journey] [xml] NULL,
        [mapped_fields] [xml] NULL
    );
    INSERT @Marketing_FC_Leads_Objects_CLIXML
    (
        lead_id,
        lead_analysis,
        additional_fields,
        field_mappings,
        customer_journey,
        mapped_fields
    )
    VALUES
    (   1,    -- lead_id - int
        NULL, -- lead_analysis - xml
        NULL, -- additional_fields - xml
        N'<Objs xmlns="http://schemas.microsoft.com/powershell/2004/04" Version="1.1.0.1">
            <Obj RefId="0">
                <TN RefId="0">
                    <T>Selected.System.Management.Automation.PSNoteProperty</T>
                    <T>System.Management.Automation.PSCustomObject</T>
                    <T>System.Object</T>
                </TN>
                <MS>
                    <S N="Name">mapped_fields</S>
                    <Obj N="Value" RefId="1">
                        <TN RefId="1">
                            <T>System.Management.Automation.PSCustomObject</T>
                            <T>System.Object</T>
                        </TN>
                        <MS>
                            <S N="Caller ZIP Code">37207</S>
                            <S N="Contact City">Nashville</S>
                            <S N="Contact State">TN</S>
                            <S N="Caller Country">US</S>
                            <S N="New Client?">YES</S>
                            <S N="Category of Interest">Not Set</S>
                            <S N="Contact Name">Jordan Smith</S>
                            <S N="Contact Phone">+15551231234</S>
                        </MS>
                    </Obj>
                </MS>
            </Obj>
        </Objs>', -- field_mappings - xml
        NULL, -- customer_journey - xml
        NULL  -- mapped_fields - xml
        );
    -- DDL and sample data population, end
    
    ;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/powershell/2004/04')
    SELECT lead_id
        , c.value('(S[@N="Caller ZIP Code"]/text())[1]', 'CHAR(5)') AS [Caller ZIP Code]
        , c.value('(S[@N="Contact City"]/text())[1]', 'VARCHAR(100)') AS [Contact City]
        , c.value('(S[@N="Contact State"]/text())[1]', 'CHAR(2)') AS [Contact State]
        , c.value('(S[@N="Caller Country"]/text())[1]', 'CHAR(2)') AS [Caller Country]
        , c.value('(S[@N="New Client?"]/text())[1]', 'VARCHAR(10)') AS [New Client]
        , c.value('(S[@N="Category of Interest"]/text())[1]', 'VARCHAR(100)') AS [New Client]
        , c.value('(S[@N="Contact Name"]/text())[1]', 'VARCHAR(100)') AS [Contact Name]
        , c.value('(S[@N="Contact Phone"]/text())[1]', 'VARCHAR(20)') AS [Contact Phone]
    FROM @Marketing_FC_Leads_Objects_CLIXML
    CROSS APPLY field_mappings.nodes('/Objs/Obj/MS/Obj/MS') AS t(c);
    

    Output

    lead_id Caller ZIP Code Contact City Contact State Caller Country New Client New Client Contact Name Contact Phone
    1 37207 Nashville TN US YES Not Set Jordan Smith +15551231234