sql-serverxml-parsingxquery-sql

Parse XML - Retrieve the Portion Between the Double Quotes


I have the following XML that is in an XML column in SQL Server. I am able to retrieve the data between the tags and list it in table format using the code at the bottom. I can retrieve the values between all the tags except for the one I have in bold below that is in double quotes. I can get the value X just fine but I need to get the 6 that is in between the double quotes in this part: <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>

WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT ID, FilingYear, FilingPeriod, FilingType, [FileName]

, Organization501c3Ind = c.value('(//Organization501c3Ind/text())[1]','varchar(MAX)')
, Organization501cInd = c.value('(//Organization501cInd/text())[1]','varchar(MAX)')
, Organization501cTypeTxt = c.value('(//Organization501cTypeTxt/text())[1]','varchar(MAX)')


FROM Form990
    CROSS APPLY XMLData.nodes('//Return') AS t(c)
    CROSS APPLY XMLData.nodes('//Return/ReturnHeader/Filer') AS t2(c2)

XML:

<ReturnData documentCnt="2">
    <IRS990 documentId="IRS990-01" referenceDocumentId="IRS990ScheduleO-01" referenceDocumentName="IRS990ScheduleO ReasonableCauseExplanation" softwareId="19009670">
      <PrincipalOfficerNm>CAREY BAKER</PrincipalOfficerNm>
      <USAddress>
        <AddressLine1Txt>PO BOX 11275</AddressLine1Txt>
        <CityNm>TALLAHASSEE</CityNm>
        <StateAbbreviationCd>FL</StateAbbreviationCd>
        <ZIPCd>32302</ZIPCd>
      </USAddress>
      <GrossReceiptsAmt>104241</GrossReceiptsAmt>
      <GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
      <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>

Thoughts?


Solution

  • Without a minimal reproducible example by the OP, shooting from the hip.

    SQL

    -- DDL and sample data population, start
    DECLARE @Form990 TABLE (ID INT IDENTITY PRIMARY KEY, XMLData XML);
    INSERT INTO @Form990(XMLData) VALUES
    (N'<Return xmlns="http://www.irs.gov/efile" returnVersion="2019v5.1">
        <ReturnData documentCnt="2">
            <IRS990 documentId="IRS990-01" referenceDocumentId="IRS990ScheduleO-01" referenceDocumentName="IRS990ScheduleO ReasonableCauseExplanation" softwareId="19009670">
                <PrincipalOfficerNm>CAREY BAKER</PrincipalOfficerNm>
                <USAddress>
                    <AddressLine1Txt>PO BOX 11275</AddressLine1Txt>
                    <CityNm>TALLAHASSEE</CityNm>
                    <StateAbbreviationCd>FL</StateAbbreviationCd>
                    <ZIPCd>32302</ZIPCd>
                </USAddress>
                <GrossReceiptsAmt>104241</GrossReceiptsAmt>
                <GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
                <Organization501c3Ind>X</Organization501c3Ind>
                <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
            </IRS990>
        </ReturnData>
    </Return>');
    -- DDL and sample data population, end
    
    WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
    SELECT -- ID, FilingYear, FilingPeriod, FilingType, [FileName]
        Organization501c3Ind = c.value('(Organization501c3Ind/text())[1]','varchar(MAX)')
        , Organization501cInd = c.value('(Organization501cInd/text())[1]','varchar(MAX)')
        , Organization501cTypeTxt = c.value('(Organization501cInd/@organization501cTypeTxt)[1]','varchar(MAX)')
    FROM @Form990
        CROSS APPLY XMLData.nodes('/Return/ReturnData/IRS990') AS t(c)
    

    Output

    +----------------------+---------------------+-------------------------+
    | Organization501c3Ind | Organization501cInd | Organization501cTypeTxt |
    +----------------------+---------------------+-------------------------+
    | X                    | X                   |                       6 |
    +----------------------+---------------------+-------------------------+