sqlsql-servert-sqldatabase-design

Why (and how) to split column using master..spt_values?


Subquestioning the answer to question "Split one column into multiple rows" which I re-wrote here as [ 1 ].

What is the (meaning of) Type = 'P' and why to use undocumented master..spt_values for splitting a column? What is the benefit of it?


[ 1 ]

CREATE TABLE dbo.Table1 
(
    Col1        CHAR(1),
    Col2        CHAR(1),
    Col3        CHAR(1),
    Col4        VARCHAR(50)
)
GO

INSERT INTO dbo.Table1 VALUES ('A','B','C','1,2,3')
GO
INSERT INTO dbo.Table1 VALUES ('D','E','F','6,7,8,9')
GO


SELECT
    T.col1, RIGHT(LEFT(T.col4,Number-1),
    CHARINDEX(',',REVERSE(LEFT(','+T.col4,Number-1))))
FROM
    master..spt_values,
    table1 T
WHERE
    Type = 'P' AND Number BETWEEN 1 AND LEN(T.col4)+1 AND
    (SUBSTRING(T.col4,Number,1) = ','
    -- OR SUBSTRING(T.col4,Number,1)  = '') --this does not work correctly anyway

Related question:


Solution

  • Purpose

    Why use undocumented master..spt-values

    Sybase, and therefore its bastard son MS SQL, provide various features and functions for the product, that is implemented in system procedures (as opposed to the binaries like sqlserver, which are started as a service). These system procedures procedures are written in SQL code and named sp_%. Except for some secret internals, they have the same limitations and needs as any other SQL code. They are part of the Sybase ASE or SQL Server product. As such, they are not required to document it; and the internal bits cannot be reasonably labelled as "undocumented".

    master..spt_values contains all the various bits and pieces that the said system procedures need, in an SQL table, to produce the various reports. The sp means system procedure; spt means tables for system procedures; and of course values is the content.

    Lookup Tables

    What is the (meaning of) Type = 'P'

    People often describe spt_values as "de-normalised", but that is the incorrect term. The correct term is folded, or packed. It is 26 or so logical Lookup tables, each beautifully Normalised, folded into one physical table, with a Type column to differentiate the logical tables.

    Now in a normal database, that would be a gross error (just look at the answers for "one lookup table or many"). But in a server catalogue, it is desirable, it replaces 26 physical tables.

    There is therefore just one purpose for spt_values, to contain 26 folded, otherwise separate, Reference tables, and one Projection table.

    Expansion

    The ordinary use of spt_values then, is as an ordinary Lookup or Reference or ENUM table. First, the Lookup values:

    SELECT *                    -- list Genders
        FROM Gender
    

    It is used in the same way that Person has a GenderCode that needs to be expanded (very expanded, these freaky days):

    SELECT  P.*,                -- list Person
            G.Name              -- expand GenderCode to Name
        FROM Person P
        JOIN Gender G
            ON P.GenderCode = G.GenderCode
    

    Eg. sp_lock produces a report of active locks, displaying lock types as string names. But master..syslocks contains lock types as numbers, it does not contain those names; and if it did, it would be a badly denormalised table ! If you execute the query (Sybase ASE code, you will have to convert):

    SELECT *                    -- list LockTypes
        FROM master..spt_values 
        WHERE type = "L"
    

    you will notice 66 LockType numbers and names in the Lookup table. That allows sp_lock to execute simple code like Person::Gender above:

    SELECT  spid,               -- list Active Locks
            DB_NAME(dbid),
            OBJECT_NAME(id, dbid),
            v.name,             -- expand lock name
            page,
            row
    FROM master..syslocks   L,
         master..spt_values LT
    WHERE L.type = LT.number    -- 
    AND   type = "L"            -- LockType Lookup table
    ORDER by 1, 2, 3, 4, 5, 6   -- such that perusal is easy
    

    Projection

    What is the (meaning of) Type = 'P' ?

    What is Projection and how is it used ?

    Say, for example, instead of the active locks produced by the query above, you wanted a list of all 66 LockTypes, showing the number of active locks (or Null). You don't need a cursor, or a WHILE loop. We could Project the LockType Lookup table, through the count of active locks:

    SELECT  LT.name,            -- list LockTypes
            [Count] = (         -- with count
        SELECT COUNT(*)
            FROM master..syslocks
            WHERE type = LT.number
                )
        FROM master..spt_values LT
        WHERE type = "L"
    

    There are several methods, that is just one. Another method is to use a Derived Table instead of the Subquery. But you still need the Projection.

    That is typically what spt_values is used for, either Expansion or Projection. Now that you know it is there, you can use it too. It is safe (in the master database) and used by virtually all the system procedures, which means the system procedures cannot run without it.

    for splitting a column?

    Ah, you do not understand the "Split one CSV column into multiple rows" code.

    What is the benefit of it?

    I think I have answered that. If you did not have it, every system procedure that requires a list of Numbers would have to CREATE a temp table; and INSERT the rows into it; before running its code. Of course, not having to perform those steps, makes the system procedures much faster.

    Now, when you need to perform a Projection, eg. calendar dates in the future, or whatever, you can use spt_values, instead of having to create your own temp table each time (or create your own private permanent table and maintain it).