sql-serverdependencies

Extracting join information from views/stored procs in SQL Server


I have a very large code base to go through and document. At the moment I'm documenting joins by finding them in stored procedures / views and writing down the details. It's extremely time-consuming.

I'm pretty sure there isn't but just in case, is there a way of easily getting join information from within SQL modules (views/stored products/functions/whatever).

I'm aware of sys.dm_sql_referenced_entities but this tells you what's used, not how.

The alternative is to write a SQL parser and all that goes with it, but I don't have time to do that, nor to take an off-the-shelf one and modify that.


Solution

  • You can query the system tables of the database for object definitions.

    SELECT DISTINCT [obj].[name] AS [Object_Name], [obj].[type], [obj].[type_desc], [mod].[definition]
    FROM sys.sql_modules AS [mod]
    INNER JOIN sys.objects AS [obj]
             ON [mod].[object_id] = [obj].[object_id]
    WHERE [mod].definition Like '%JOIN%'
    AND [obj].[type] = 'P' OR [obj].[type] = 'V'
    

    This searches for simply the written word JOIN, if this for instance is used in a comment, is in a columnname or anything else it will also find it. But this should be a good start to know which procedures to check (instead of going through each one manually).

    In the above query

    [obj].[type] = 'P' OR [obj].[type] = 'V'

    Makes it only look at (SQL) Stored procedures and Views. You could omit this for more potential definitions. Msdn Link for Sys objects

    Now to try and get join information from these definitions. It's going to be finnicky, and if someone knows a better method by all means.

    But what you could do is scan the procedure definitions. In my DB i have the following procedure for illustrative purposes:

    CREATE PROCEDURE dbo.getFooBar AS
    BEGIN
        SELECT * 
        FROM dbo.ADDRESSES AS ADR
        INNER JOIN dbo.CLIENTS AS CLT
            ON ADR.ClientId = CLT.Id
        WHERE ADR.City IS NOT NULL
    
        SELECT *
        FROM dbo.Clients AS CLT
        LEFT JOIN dbo.Users AS USR
            ON CLT.Id = USR.ClientId
        WHERE USR.Email IS NOT NULL
    END
    

    Now this is a fairly standard query, where you have a FROM block, which is followed by JOINS, and then a WHERE. We can use this knowledge to scan the definition column in our system table.

    Because this is SQL Server 2014 (or pre 2016, where the SQL Server function String_Split does not exist yet) we tweak our own for this purpose. Base taken from here and then modified.

    IF OBJECT_ID('[dbo].[SPLIT_STRING]','IF') IS NULL BEGIN
        EXEC ('CREATE FUNCTION [dbo].[SPLIT_STRING] () RETURNS TABLE AS RETURN SELECT 1 X') 
    END
    GO
    ALTER FUNCTION [dbo].[SPLIT_STRING]
    (
        @string    nvarchar(MAX), 
        @separator nvarchar(MAX)
    )
    RETURNS TABLE WITH SCHEMABINDING 
    AS RETURN
       WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
            Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
            T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
            Delim(Pos) AS (SELECT t.N FROM T WHERE (SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0)),
            Separated(value) AS (SELECT SUBSTRING(@string, d.Pos, LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL)) - d.Pos - LEN(@separator)) 
                                   FROM Delim d
                                  WHERE @string IS NOT NULL)
           SELECT s.value
             FROM Separated s
            WHERE s.value <> @separator
    GO
    

    This then allows us to scan through the definitions, scan for definitions that contain the word JOIN. Split those on the FROM (should be before JOIN). And then till the WHERE (or any other keyword). And then filter those results again to only look at definitions that once again have the JOIN keyword.

    SELECT [SUBQUERY].[Object_Name], [SUBQUERY].[Type], [split2].[value] AS DefinitionWithJoin FROM
    (SELECT DISTINCT [obj].[name] AS [Object_Name], [obj].[type], [mod].[definition]
    FROM sys.sql_modules AS [mod]
    INNER JOIN sys.objects AS [obj]
             ON [mod].[object_id] = [obj].[object_id]
    WHERE [mod].definition Like '%JOIN%'
    AND [obj].[type] = 'P' OR [obj].[type] = 'V') AS SUBQUERY
    CROSS APPLY dbo.[SPLIT_STRING](SUBQUERY.[definition],'FROM') AS SPLIT1
    CROSS APPLY dbo.[SPLIT_STRING](SPLIT1.value, 'WHERE') AS SPLIT2
    WHERE SPLIT2.value LIKE '%JOIN%'
    

    Essentially dissecting the object definition for phrases. You could change the statement in the SPLIT2 Cross Apply to be something else. Note that this is a decent performance hog, due to the various string splits, scanning text definitions.

    But in my results I will get:

    objName Type DefintionWithJoin
    getFooBar P FROM dbo.ADDRESSES AS ADR INNER JOIN dbo.CLIENTS AS CLT ON ADR.ClientId = CLT.Id
    getFooBar P FROM dbo.Clients AS CLT LEFT JOIN dbo.Users AS USR ON CLT.Id = USR.ClientId

    Now this approach might need some tweaking based on keywords. And the first query that finds all objects that have at least 1 JOIN statements can be used to double check your results. But the above queries work on my local test SQL Server 2014 DB containing ~200 stored procedures.