Tags: sql, json, xml, normalization, structured-data

When can I save JSON or XML data in an SQL table?


When using SQL or MySQL (or any relational DB for that matter), I understand that saving the data in regular columns is better for indexing and other purposes...

The thing is, loading and saving JSON data is sometimes a lot simpler, and it makes development easier.

Are there any "golden rules" for saving raw JSON data in the DB?

Is it absolutely a bad practice to do so?


Solution

  • The main question is: what are you going to do with this data?

    JSON (like XML) is great for data exchange, compact storage and generically defined structures, but it cannot participate in the typical actions you run within your RDBMS. In most cases it is better to transfer your JSON data into normal tables and re-create the JSON when you need it.
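
    As a sketch of that round trip on SQL Server 2016+ (the document and all names are made up for illustration): OPENJSON shreds a JSON document into normal rows, and FOR JSON re-creates the JSON only when a caller needs it.

    ```sql
    --Hypothetical example: JSON -> rows via OPENJSON with an explicit schema
    DECLARE @json NVARCHAR(MAX) = N'[{"PersonID":1,"Name":"Mickey Mouse"},
                                     {"PersonID":2,"Name":"Donald Duck"}]';

    SELECT PersonID, Name
    FROM OPENJSON(@json)
         WITH (PersonID INT '$.PersonID', Name NVARCHAR(100) '$.Name');

    --rows -> JSON: re-create the document only on the way out
    SELECT PersonID, Name
    FROM (VALUES (1, N'Mickey Mouse'), (2, N'Donald Duck')) AS p(PersonID, Name)
    FOR JSON PATH;
    ```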

    XML / JSON and 1.NF

    The first rule of normalisation dictates never storing more than one piece of information in a single column. You see a column "PersonName" with a value like "Mickey Mouse"? You point to this and cry: Change that immediately!

    What about XML or JSON? Are these types breaking 1.NF? Well, yes and no... 

    It is perfectly okay to store a complete structure as one piece of information if it actually is one piece of information. You get a SOAP response and want to store it because you might need it for future reference (but you will not use this data for your own processes)? Just store it as is!
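
    A sketch of that "store it as is" case (the table is hypothetical): the whole response is one piece of information, written once and only ever read back whole.

    ```sql
    --Hypothetical audit table: the response is never queried internally
    CREATE TABLE SoapResponseLog
    (
        ID         INT IDENTITY PRIMARY KEY,
        ReceivedAt DATETIME2(0) NOT NULL DEFAULT SYSDATETIME(),
        Response   XML NOT NULL
    );

    INSERT INTO SoapResponseLog(Response)
    VALUES(N'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body/></soap:Envelope>');
    ```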

    Now imagine a complex structure (XML or JSON) representing a person (with their address, further details...). Now you put this into one column as PersonInCharge. Is this wrong? Shouldn't this rather live in properly designed related tables with a foreign key reference instead of the XML/JSON? Especially if the same person might occur in many different rows, it is definitely wrong to use the XML/JSON approach.
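
    A minimal sketch of the properly designed alternative (hypothetical tables): the person becomes one row, referenced by a foreign key, instead of being copied into many XML/JSON documents.

    ```sql
    --Hypothetical normalized design instead of a PersonInCharge XML/JSON column
    CREATE TABLE Person
    (
        PersonID INT IDENTITY PRIMARY KEY,
        Name     NVARCHAR(100) NOT NULL
    );

    CREATE TABLE [Order]
    (
        OrderID        INT IDENTITY PRIMARY KEY,
        PersonInCharge INT NOT NULL REFERENCES Person(PersonID)
    );
    --the same person in many orders is now one row, not many JSON copies
    ```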

    But now imagine the need to store historical data. You want to persist the person's data for a given moment in time. Some days later the person tells you a new address? No problem! The old address lives in an XML/JSON if you ever need it...
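
    A sketch of that historical case (hypothetical table): current data stays in normalized columns, while superseded states are frozen as JSON snapshots you will only ever read back whole.

    ```sql
    --Hypothetical history table: snapshots are kept, not queried piecewise
    CREATE TABLE PersonHistory
    (
        HistoryID  INT IDENTITY PRIMARY KEY,
        PersonID   INT NOT NULL,
        ValidUntil DATETIME2(0) NOT NULL,
        Snapshot   NVARCHAR(MAX) NOT NULL CHECK(ISJSON(Snapshot) = 1)
    );

    INSERT INTO PersonHistory(PersonID, ValidUntil, Snapshot)
    VALUES(1, '2023-01-01', N'{"Street":"Old Road 1","City":"Duckburg"}');
    ```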

    Conclusion: If you store the data just to keep it, it's okay. If this data is a unique portion, it's okay...
    But if you need the internal parts regularly or if this would mean redundant duplicate storage it's not okay...

    Physical storage

    The following is for SQL Server and might be different on other RDBMSs.

    XML is not stored as the text you see, but as a hierarchy tree. Querying this performs astonishingly well! This structure is not parsed at string level!
    JSON in SQL Server (2016+) lives in a string and must be parsed. There is no real native JSON type (like there is a native XML type). This might come later, but for now I'd assume that JSON will not be as performant as XML on SQL Server (see section UPDATE 2). Any need to read a value out of the JSON will need a hell of a lot of hidden string-method calls...
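
    To illustrate (the document and values are made up): every JSON_VALUE call has to parse the whole string again, because there is no pre-parsed tree to navigate.

    ```sql
    DECLARE @j NVARCHAR(MAX) = N'{"Person":{"Name":"Mickey Mouse","City":"Duckburg"}}';

    SELECT JSON_VALUE(@j, '$.Person.Name') AS PersonName,  --parses the string
           JSON_VALUE(@j, '$.Person.City') AS PersonCity;  --parses it again
    ```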

    What does this mean for you?

    Your lovable DB artist :-D knows that storing JSON as is goes against common principles of RDBMSs.

    There are some workarounds (depending on the RDBMS you are using), but most of them don't work the way you'd like them to...

    The answer to your question in short

    YES, if you store the data just to keep it as a whole and do not need its internal parts within your own processes.

    NO, if you need to read or update the internal parts regularly, or if storing it would mean redundant duplicate data.

    You might start with the JSON within a string column or as a BLOB and change this to physical tables when you need to. My magic crystal ball tells me this might be tomorrow :-D
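
    As a sketch of a middle ground on SQL Server 2016+ (table and column names are made up): keep the raw JSON, but expose a frequently queried value as an indexed computed column, and shred the whole document with OPENJSON once you really need physical tables.

    ```sql
    --Hypothetical: the raw JSON stays, one hot value becomes indexable
    CREATE TABLE OrderDoc
    (
        ID       INT IDENTITY PRIMARY KEY,
        Doc      NVARCHAR(MAX) NOT NULL CHECK(ISJSON(Doc) = 1),
        --computed from the JSON, cast small enough to fit into an index key
        Customer AS CAST(JSON_VALUE(Doc, '$.Customer') AS NVARCHAR(200))
    );

    CREATE INDEX IX_OrderDoc_Customer ON OrderDoc(Customer);
    ```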

    UPDATE

    Find some ideas about performance and disc space here: https://stackoverflow.com/a/47408528/5089204

    UPDATE 2: More about performance...

    The following addresses JSON and XML support in SQL-Server 2016

    User @mike123 pointed to an article on an official Microsoft blog which seems to prove, in an experiment, that querying JSON is 10x faster than querying XML in SQL Server.

    Some thoughts about and cross-checks against that "experiment" led me to the following code, which shows a more realistic comparison:

    The final result clearly shows that JSON is slower than XML (not by that much, about 1.5x, on a still very simple example).

    The test code

    USE master;
    GO
    --create a clean database
    CREATE DATABASE TestJsonXml;
    GO
    USE TestJsonXml;
    GO
    --create tables
    CREATE TABLE TestTbl1(ID INT IDENTITY,SomeXml XML);
    CREATE TABLE TestTbl2(ID INT IDENTITY,SomeJson NVARCHAR(MAX));
    CREATE TABLE Target1(SomeString NVARCHAR(MAX));
    CREATE TABLE Target2(SomeString NVARCHAR(MAX));
    CREATE TABLE Times(Test VARCHAR(10),Diff INT);
    GO
    --insert 10000 XMLs into TestTbl1
    WITH Tally AS(SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))*2 AS Nmbr FROM master..spt_values AS v1 CROSS APPLY master..spt_values AS v2)
    INSERT INTO TestTbl1(SomeXml)
    SELECT 
    N'<Root>
        <Products>
        <ProductDescription>
            <Features>
                <Maintenance>' + CAST(Nmbr AS NVARCHAR(10)) + ' year parts and labor extended maintenance is available</Maintenance>
                <Warranty>1 year parts and labor</Warranty>
            </Features>
            <ProductID>' + CAST(Nmbr AS NVARCHAR(10)) + '</ProductID>
            <ProductName>Road Bike</ProductName>
        </ProductDescription>
        <ProductDescription>
            <Features>
                <Maintenance>' + CAST(Nmbr + 1 AS NVARCHAR(10)) + ' blah</Maintenance>
                <Warranty>1 year parts and labor</Warranty>
            </Features>
            <ProductID>' + CAST(Nmbr + 1 AS NVARCHAR(10)) + '</ProductID>
            <ProductName>Cross Bike</ProductName>
        </ProductDescription>
        </Products>
    </Root>'
    FROM Tally;
    
    --insert 10000 JSONs into TestTbl2
    WITH Tally AS(SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))*2 AS Nmbr FROM master..spt_values AS v1 CROSS APPLY master..spt_values AS v2)
    INSERT INTO TestTbl2(SomeJson)
    SELECT 
    N'{
        "Root": {
            "Products": {
                "ProductDescription": [
                    {
                        "Features": {
                            "Maintenance": "' + CAST(Nmbr AS NVARCHAR(10)) + ' year parts and labor extended maintenance is available",
                            "Warranty": "1 year parts and labor"
                        },
                        "ProductID": "' + CAST(Nmbr AS NVARCHAR(10)) + '",
                        "ProductName": "Road Bike"
                    },
                    {
                        "Features": {
                            "Maintenance": "' + CAST(Nmbr + 1 AS NVARCHAR(10)) + ' blah",
                            "Warranty": "1 year parts and labor"
                        },
                        "ProductID": "' + CAST(Nmbr + 1 AS NVARCHAR(10)) + '",
                        "ProductName": "Cross Bike"
                    }
                ]
            }
        }
    }'
    FROM Tally;
    GO
    
    --Do some initial action to avoid first-call-bias
    INSERT INTO Target1(SomeString)
    SELECT SomeXml.value('(/Root/Products/ProductDescription/Features/Maintenance/text())[1]', 'nvarchar(4000)')
    FROM TestTbl1;
    INSERT INTO Target2(SomeString)
    SELECT JSON_VALUE(SomeJson, N'$.Root.Products.ProductDescription[0].Features.Maintenance')
    FROM TestTbl2;
    GO
    
    --Start the test
    DECLARE @StartDt DATETIME2(7), @EndXml DATETIME2(7), @EndJson DATETIME2(7);
    
    --Read all ProductNames of the second product and insert them to Target1
    SET @StartDt = SYSDATETIME();
    INSERT INTO Target1(SomeString)
    SELECT SomeXml.value('(/Root/Products/ProductDescription/ProductName/text())[2]', 'nvarchar(4000)')
    FROM TestTbl1
    ORDER BY NEWID();
    --remember the time spent
    INSERT INTO Times(Test,Diff)
    SELECT 'xml',DATEDIFF(millisecond,@StartDt,SYSDATETIME());
    
    --Same with JSON into Target2
    SET @StartDt = SYSDATETIME();
    INSERT INTO Target2(SomeString)
    SELECT JSON_VALUE(SomeJson, N'$.Root.Products.ProductDescription[1].ProductName')
    FROM TestTbl2
    ORDER BY NEWID();
    --remember the time spent
    INSERT INTO Times(Test,Diff)
    SELECT 'json',DATEDIFF(millisecond,@StartDt,SYSDATETIME());
    
    GO 10 --do the block above 10 times
    
    --Show the result
    SELECT Test,SUM(Diff) AS SumTime, COUNT(Diff) AS CountTime
    FROM Times
    GROUP BY Test;
    GO
    --clean up
    USE master;
    GO
    DROP DATABASE TestJsonXml;
    GO
    

    The result (SQL Server 2016 Express on an Acer Aspire v17 Nitro, Intel i7, 8GB RAM)

    Test    SumTime (ms)
    --------------------
    json    2706
    xml     1604