azureazure-cosmosdbazure-cosmosdb-sqlapi

Select first item in each GROUP BY in Cosmos


I have below json structure

  json1
  {
   name : Raju
   Location : bangloor
   _ts: 1705505722
  }
  json2
  {
   name : Ramu
   Location : bangloor
   _ts: 1705505725
  }
  json3
  {
   name : Ram
   Location : bangloor
   _ts: 1705505729
  }
  json4
  {
   name : sam
   Location : kerala
   _ts: 1705505830
  }
  json5
  {
   name : samual
   Location : kerala
   _ts: 1705505835
  }
  json6
  {
   name : ramya
   Location : kerala
   _ts: 1705505838
  }

I need to group by each item by location and take the latest json(based on created date with full json content) from each group

Expected Output:

[{
   name : Ram
   Location : bangloor
   _ts: 1705505729
 },
 {
  name : ramya
  Location : kerala
  _ts: 1705505838
}]

I have tried below query but it is not working

SELECT *
from c
JOIN (
 SELECT c1.Location, MAX(c1._ts) AS MaxTimestamp
 FROM c c1
 GROUP BY c1.Location) maxTimestamps
WHERE c.Location = maxTimestamps.Location 
AND c._ts = maxTimestamps.maxTimestamp

I think the non-correlated subqueries are not supported directly in Cosmos DB.Can we able to do this in a single query with self join.


Solution

  • As Martin Smith suggested in comments, you can concatenate location along with timestamp, the same approach is used in the below answer:

    SELECT c.Location,
    MAX(CONCAT(TimestampToDateTime(c._ts*1000),' ',c.name)) 
    AS timeName 
    FROM c WHERE c.Location IN ('kerala', 'bangloor') GROUP  BY c.Location
    

    CONCAT() function is used in the above query to concatenate location with the respected timestamp and MAX() function is used to retrieve maximum value from concatenated value.

    Output:

    [
        {
            "Location": "kerala",
            "timeName": "2024-01-22T05:22:10.0000000Z ramya"
        },
        {
            "Location": "bangloor",
            "timeName": "2024-01-22T05:19:07.0000000Z Ram"
        }
    ]