sqlsql-server

Get an hour record for each group of records


I need to get rows for each hour in the day on each type of part that has any data. I have tried several attempts but seem to only be able to get the HOUR column to show each hour but it will not show the location. I have the data I need in a query and I have tried to join it to a table of Hours by the Hour Time stamp of the records to get each group of parts to have a row for each hour in the day but still nothing seems to work

I have tried the following to get the Hours columns to show for each part regardless of data in the hour

DECLARE @Date DATETIME = GETDATE()
--Gets an hour field for each hour in day 12AM - 11PM
;WITH Dates AS
(
    SELECT FORMAT(DATEADD(HOUR, -1, @Date),'h tt') [Hour], 
           DATEADD(HOUR,-1,@Date) [Date], 
           1 Num
    UNION ALL
    SELECT FORMAT(DATEADD(HOUR, -1, [Date]),'h tt'), 
           DATEADD(HOUR,-1,[Date]), 
           Num+1
    FROM Dates
    WHERE Num <= 23
)
, PickedData AS
(SELECT [Hour], x.group_area, x.HC, x.HCSum, x.picked_user, x.pickedDateTime, x.rate, x.reaches, x.ShiftOutput
    FROM Dates
    LEFT OUTER JOIN (PICKEDDATA that includes an hour stamp with other columns) x
    ON x.hourTimestamp = [Dates].Hour
 )

SELECT group_area, PD.HC, PD.reaches, PD.Hour
FROM PickedData PD
JOIN Dates D
    ON D.[Hour] = PD.[Hour]
ORDER BY PD.pickedDateTime

What I am getting:

part Hour
NULL 1 PM
NULL 11 AM
NULL 8 AM
NULL 7 AM
NULL 6 AM
NULL 5 AM
NULL 4 AM
NULL 3 AM
NULL 2 AM
NULL 1 AM
NULL 12 AM
NULL 11 PM
NULL 10 PM
NULL 9 PM
NULL 8 PM
NULL 7 PM
NULL 6 PM
NULL 5 PM
NULL 4 PM
NULL 2 PM
GOATPEN 9 AM
LOSTT 10 AM
FILK 3 PM

For example what I would want to get for 6AM to 5PM

PARTS

part Column1 Column2 Column3 Column4 Col5 Hour
Goat pen 11 8 2 1500 6 AM
Goat pen 8 9 1 500 17 7 AM
Goat pen 8 5 5 4 17 8 AM
Goat pen 0 1 53 4 19 9 AM
Goat pen 0 2 6 5 19 10 AM
Goat pen 0 0 0 11 AM
Goat pen 12 PM
Goat pen 1 PM
Goat pen 2 PM
Goat pen 3 PM
Goat pen 4 PM
Goat pen 5 PM
Lost tal 9 4 2 200 0 6 AM
Lost tal 8 9 1 29 90 7 AM
Lost tal 8 5 5 4 90 8 AM
Lost tal 0 4 53 4 19 9 AM
Lost tal 0 2 6 5 19 10 AM
Lost tal 0 0 0 11 AM
Lost tal 12 PM
Lost tal 1 PM
Lost tal 2 PM
Lost tal 3 PM
Lost tal 2 2 2 4 PM
Lost tal 5 PM

Solution

  • The general solution to this kind of problem is to start by making a projection of all possible Part/Hour combinations. A CROSS JOIN is commonly used, and the projection can be described in a CTE. Then take that projection and LEFT JOIN back to the raw data to fill in the details.

    So it will be something like this:

    WITH Hours As (
       --  <short query to get all possible hours>
       -- (based on exmaple returns 12 rows with values from '6 AM' to '5 PM')
    )
    , Parts As (
       --  <short query to get key for all possible parts>
       -- (based on example returns 2 rows, for LOSTT, and GOAT PEN)
    )
    , PartHours As (
        SELECTS Parts.Key, Hours.Hour
        FROM Hours
        CROSS JOIN Parts
    )
    , PickedData As (
       -- base data here
             -- needs to include a column to match for both the Part and the Hour values 
             -- ideally there will only be one row for each part/hour pair
             -- you may find it useful to do this earlier if you need it to produce the Parts our Hours pieces
    )
    SELECT ph.Part,
        <pd columns>,
        ph.Hour
    FROM PartHours ph
    LEFT JOIN PickedData pd ON pd.Hour = ph.Hour AND pd.Part = ph.Part
    ORDER BY ph.Part, ph.Hour
    

    If more than one row from the source data could match, you can use aggregation, an APPLY lateral join, or windowing to roll them up.


    Finally, I see a few expressions like this:

     CONVERT(DATE,pck_dtl2.picked_datetime) = FORMAT (GETDATE(), 'yyyy-MM-dd')
    

    Assuming picked_datetime is a reasonable data type (and, btw, varchar is not reasonable for columns with datetime in the name), the expression can be many times more efficient like this:

    pck_dtl2.picked_datetime >= cast(getdate() as date)
        and pck_dtl2.picked_datetime < cast(dateadd(day, 1, getdate()) as date)
    

    There are several reasons why that is both faster and more accurate. But the most important thing to understand is, thanks to cultural/internationalization issues, converting date (or number) types to and from strings types is not the quick or simple operation we often expect. Instead, it tends to be slow and error prone: something to avoid. That is, what you have is about the worst possible way to do this is in common use.

    I further see the next expression looks like this:

    DATEPART(HOUR,pck_dtl2.picked_datetime) >= 6 --@StartTime  
    

    Which we can roll into the same check as the prior suggestion like this:

    pck_dtl2.picked_datetime >= dateadd(hour, 6, cast(cast(getdate() as date) as datetime))
        and pck_dtl2.picked_datetime < cast(dateadd(day, 1, getdate()) as date)
    

    The most recent SQL Server 2022 can also simplify this via the DATETRUNC() function.

    As the query is currently written, you have to convert these values to string for every row in the source set, even rows you will not need (which, again, is even slower than you'd expect), and you will not be able to use indexes for the comparisons, which cuts to the core of database performance. The code I suggest here fixes BOTH of those issues.

    The tl;dr version is, whenever you have a comparison with a column on one side and a common value like getdate() or 6 on the other, but you need to make an adjustment so the sides match, you want to find a way to make the adjustment to the common value, rather than the column. This reduces the number of computations from one per row to one TOTAL, and preserves your ability to use indexes with the column.