jsonsql-servercross-apply

Count on CROSS APPLY with JSON Data


I have the following DB structure:

ID, Datetime, JsonData

Within JsonData there is a field called party which is "," delimited and another field called Source.

I'm trying to run a simple select where I get the data grouped by (Source, Time)...

Datetime, Source, SourceCount, CountParty

12/12/2021, 1, 4, 7

12/12/2021, 2, 3, 5

I'm trying to run the following SQL, however, I'm struggling to get CountParty. I always tend to get SourceCount. Can anyone please shed some light on what Im doing wrong?

I always end up with

12/12/2021, 1, 4, 4

12/12/2021, 2, 3, 3

Thanks

select
json_value(JsonData,'$.Info.source') as Source
, Party.value
,count(json_value(JsonData,'$.Info.Party'))
,  count(*)

from test
CROSS APPLY 
    STRING_SPLIT(json_value(JsonData,'$.Info.Party'), ',') Party
group by json_value(JsonData,'$.Info.Source'), value
order by [Source]

{ "cID": "CID1","Info": {"Party": "A,B,C","Source" : "1"}}
{ "cID": "CID2","Info": {"Party": "A, C","Source" : "2" }}
{ "cID": "CID3","Info": {"Party": "B, C","Source" : "2" }}
{ "cID": "CID4","Info": {"Party": "B","Source" : "1" }}
{ "cID": "CID5","Info": {"Party": "C,A","Source" : "1" }}
{ "cID": "CID6","Info": {"Party": "A","Source" : "1" }}
{ "cID": "CID7","Info": {"Party": "C","Source" : "2" }}


select


json_value(JsonData,'$.Info.source') as Source

, Party.value

,count(json_value(JsonData,'$.Info.Party'))

,  count(*)


from test

CROSS APPLY 

    STRING_SPLIT(json_value(JsonData,'$.Info.Party'), ',') Party

group by json_value(JsonData,'$.Info.Source'), value

order by [Source]

Solution

  • Your problem is that you because you are grouping by value as well, which refers to the value from STRING_SPLIT. So you get a new group for each individual split party value. You should remove that from the grouping. Instead, you can get the count of Party values in a subquery, then SUM that.

    Note also, that count(json_value(JsonData,'$.Info.Party')) does not count distinct values, it counts how many non-null values there are, which probably wasn't your intention.

    You can also use OPENJSON to pull out all the values at once, instead of using JSON_VALUE again and again.

    SELECT
      t.Datetime,
      j.Source,
      COUNT(*) SourceCount,
      SUM(p.PartyCount) PartyCount
    FROM test t
    CROSS APPLY
      OPENJSON(t.JsonData, '$.Info')
      WITH (
        Party nvarchar(500),
        Source nvarchar(100)
      ) j
    CROSS APPLY (
        SELECT COUNT(*) PartyCount
        FROM STRING_SPLIT(j.Party, ',')
    ) p
    group by
      t.Datetime,
      j.Source
    order by
      j.Source;