sql-serverpivotautodesk

Filtering Autodesk Vault vertical data, getting the newest record for each drawing


Using Microsoft SQL Server Express Edition (64-bit) 10.0.550.0

I'm trying to extract data from an Autodesk Vault server. The SQL involved to get to the required data is too advanced for my current level of knowledge, so I'm trying to lay a puzzle using bits from Google and StackOverflow as pieces. Using this excellent answer I was able to transpose the vertical data into a manageable horizontal format.

The Autodesk Vault database stores information about CAD drawings (among other things). The main vertical table dbo.Property holds information about all the different revisions of each CAD drawing. The problem I'm currently facing is that I'm getting too much data. I just want the data from the latest revision of each CAD drawing.

Here's my SQL so far

select 
    CreateDate,
    EntityID,
    PartNumber,
    CategoryName,
    [Subject],
    Title
from 
(
    select 
        EntityID,
        CreateDate,
        [53] as PartNumber,
        [28] as CategoryName,
        [42] as [Subject],
        [43] as Title
    from 
    (
        select
            p.Value, 
            p.PropertyDefID,
            p.EntityID,
            e.CreateDate
        from dbo.Property as p
        inner join dbo.Entity as e on p.EntityID = e.EntityId
        where p.PropertyDefID in(28, 42, 43, 53)
        and e.EntityClassID = 8
    ) t1
    pivot 
    (
        max(Value)
        for PropertyDefID in([28], [42], [43], [53])
    ) t2
) t3
where PartNumber is not null
and PartNumber != ''
and CategoryName = 'Drawing'
-- (1) additional condition
order by PartNumber, CreateDate desc

Where dbo.Property.Value is of sql_variant datatype. The query above results in a data set similar to this:

CreateDate | EntityID | PartNumber | CategoryName | Subject | Title 
---------------------------------------------------------------------
2016-01-01 |    59046 |      10001 | Drawing      | Xxxxx   | Yyyyy
2016-05-01 |    60137 |      10001 | Drawing      | Xxxxx   | Yyyyy
2016-08-01 |    62518 |      10001 | Drawing      | Xxxx    | Yyyyyy
2016-12-16 |    63007 |      10001 | Drawing      | Xxxxxx  | Yyyyyy
2016-01-01 |    45776 |      10002 | Drawing      | Zzzzz   | NULL  
2016-11-01 |    65011 |      10002 | Drawing      | Zzzzzz  | NULL  
...
(about 23000 rows)

The problem that I have is that I'm getting all revisions for each drawing. In the example above I only want the latest revision for PartNumber=10001 dated '2016-12-16' etc.

I have also looked at this answer on how to group and select rows where one of the columns has a max value, but I just can't seem to figure out how to combine the two. I tried adding the following snippet to the commented line in the above query, but it fails on many different levels.

and (PartNumber, CreateDate) in 
(
    select PartNumber, max(CreateDate)
    from t3
    group by PartNumber 
)

The reason I'm tagging this question "pivot", although the pivoting is already done, is that I suspect that the pivoting is what's causing me trouble. I just haven't been able to wrap my head around this pivoting stuff yet, and my SQL optimization skills are seriously lacking. Maybe the filtering should be done at an inner level?


Solution

  • Drawing inspiration from the comment provided by @Strawberry, I kept working and tweaking until I got something that seems to work. I had to use a PIVOT inside a PIVOT for it all to work.

    Edit: At first I used views, but then the prerequisites changed as I had to work with a read-only database user. Fortunately, I was still allowed to create temporary tables.

    This is the final result.

    if object_id('tempdb.dbo.#Properties', 'U') is not null
        drop table #Properties
    
    create table #Properties 
    (
        PartNumber  nvarchar(max),
        [Subject]   nvarchar(max),
        Title       nvarchar(max),
        CreateDate  datetime
    )
    
    insert into #Properties
    (
        PartNumber,
        [Subject],
        Title,
        CreateDate
    )
    select 
        convert(nvarchar(max), PartNumber),
        convert(nvarchar(max), [Subject]), 
        convert(nvarchar(max), Title),
        convert(datetime, CreateDate)
    from 
    (
        select 
            EntityID,
            CreateDate,
            [53] as PartNumber,
            [42] as [Subject],
            [43] as Title
        from 
        (
            select
                p.Value, 
                p.PropertyDefID,
                p.EntityID,
                e.CreateDate
            from dbo.Property as p
            inner join dbo.Entity as e on p.EntityID = e.EntityId
            where p.PropertyDefID in (42, 43, 53)
            and e.EntityClassID = 8
            and p.EntityID in
            (
                select 
                    max(EntityID) as MaxEntityID
                from 
                (
                    select 
                        EntityID,
                        [28] as CategoryName,
                        [53] as PartNumber
                    from
                    (
                        select
                            p.Value,
                            p.EntityID,
                            p.PropertyDefID
                        from dbo.Property as p
                        inner join dbo.Entity as e on p.EntityID = e.EntityId
                        where p.PropertyDefID in (28, 53)
                        and e.EntityClassID = 8 -- FileIteration
                    ) as t1
                    pivot
                    (
                        max(Value)
                        for PropertyDefID in ([28], [53])
                    ) as t2
                ) as t3
                where CategoryName = 'Drawing'
                group by PartNumber
            )
        ) as t4
        pivot 
        (
            max(Value)
            for PropertyDefID in ([42], [43], [53])
        ) as t5
    ) as t6
    where PartNumber is not null
    and PartNumber != ''
    order by PartNumber
    
    select * from #Properties;
    -- search conditions goes here
    

    I had to change the suggested join to a where x in(y) because the join was insanely slow (I terminated the query after four minutes). Now the resulting data set (which takes ~2 seconds to produce) looks promising:

    PartNumber | Subject | Title  | CreateDate       | ...
    -----------------------------------------------------------------------
    100000     | Xxxxxx  | Yyyyyy | 2015-08-17 09-10 | ...
    100001     | Zzzzzz  | NULL   | 2015-09-02 15-23 | ...
    ...
    (about 8900 rows)
    

    No more old revisions in the set.