azure-cosmosdbazure-cosmosdb-sqlapi

How to formulate query in CosmosDb with pairwise constraints?


Within a container we store some items with the following base and derived classes (simplified):

public abstract class MyItemBase
{
    public Guid Id { get; set; }
    public string PartitionKey { get; set; } = "some-customer-id-room";
    public Guid RevisionsId { get; set; }
    public int Revision { get; set; }
    public DateTime CreatedAt { get; set; }
    public Guid CreatedBy { get; set; }
    public bool IsDeleted { get; set; }
}

public class MyRoom : MyItemBase
{
    public string Name { get; set; }
    public int Capacity { get; set; }
}

Do monitor the lifetime of (in this case) a room, we create for every change a new item within Cosmos. To know, which items belong together to a lifetime and in which order, all items of one room share the same RevisionsId and the Revision will be increased by one. Also is it possible to delete a room and revive it by switching the IsDeleted flag.

Here a possible lifeline of a single room:

var sharedRevisionsId = Guid.NewGuid();
var roomChanges = new List<MyRoom>
{
    new() { Id = Guid.NewGuid(), RevisionsId = sharedRevisionsId, Revision = 1, CreatedAt = new DateTime(2024, 2, 16, 17, 0, 0), CreatedBy = Guid.NewGuid(), IsDeleted = false, Name = "Main Hall", Capacity = 100 },
    new() { Id = Guid.NewGuid(), RevisionsId = sharedRevisionsId, Revision = 2, CreatedAt = new DateTime(2024, 2, 17, 13, 0, 0), CreatedBy = Guid.NewGuid(), IsDeleted = false, Name = "Main Hall", Capacity = 90 },
    new() { Id = Guid.NewGuid(), RevisionsId = sharedRevisionsId, Revision = 3, CreatedAt = new DateTime(2024, 2, 18, 11, 0, 0), CreatedBy = Guid.NewGuid(), IsDeleted = true, Name = "Main Hall", Capacity = 90 },
    new() { Id = Guid.NewGuid(), RevisionsId = sharedRevisionsId, Revision = 4, CreatedAt = new DateTime(2024, 2, 19, 19, 0, 0), CreatedBy = Guid.NewGuid(), IsDeleted = false, Name = "Main Hall", Capacity = 90 },
    new() { Id = Guid.NewGuid(), RevisionsId = sharedRevisionsId, Revision = 5, CreatedAt = new DateTime(2024, 2, 20, 20, 0, 0), CreatedBy = Guid.NewGuid(), IsDeleted = true, Name = "Main Hall", Capacity = 90 },
};

Within our partition key we have multiple rooms like these and some are alive and some are deleted in their highest revision. To support an undelete functionality in such an environment, we have to find all currently deleted items within that partition key. To do so, I came up with the following query:

SELECT
  root.revisionsId,
  MAX(root.revision) as revision
FROM
  root
WHERE
  root.isDeleted
GROUP BY
  root.revisionsId

This query will return all items that have the IsDeleted flag set to true somewhere in their lifeline, but this must not be the highest revision, cause potentially it has already been revived. Due to this fact, we send a second query, where we take all the candidates out of this first query and ask for their highest revision regardless of the deletion state:

SELECT
  root.revisionsId,
  MAX(root.revision) as revision
FROM
  root
WHERE
  ARRAY_CONTAINS(@revisionsIds, root.revisionsId)
GROUP BY
  root.revisionsId

By using simple LINQ-to-object on the client side, we can then find out, which items are currently in a deletion state:

var deletedItems = deletedCandidates
    .Join(highestRevisions,
        deleted => deleted.RevisionsId,
        highest => highest.RevisionsId,
        (deleted, highest) => (deleted, highest))
    .Where(pair => pair.deleted.Revision == pair.highest.Revision)
    .Select(pair => pair.highest)
    .ToList();

Remark: If you know about anyway on how we can accomplish the above finding in one query directly in Cosmos I would be glad to hear it.

Till this point everything looks quite okay, but now comes my problem: The list of deleted items contains whole objects, but due to the used queries, their properties are only partially filled. The only properties that are meaningful set are RevisionsId and Revision, but in a next step I need at least the Id of these objects or (even better) their whole content.

What we have at this point is a list of elements and we know from each element two things, their RevisionsId, the highest Revision and that IsDeleted is true.

How can I formulate a query against Cosmos, that returns all elements, where these multiple pairs of two properties match?

Some not working ideas are

SELECT
  *
FROM
  root
WHERE
  ARRAY_CONTAINS(
    [{"revisionsId":"abc...", "revision":5}, {"revisionsId":"123...", "revision":13}],
    { root.revisionsId, root.revision }
  )
SELECT
  *
FROM
  root
WHERE
  ARRAY_CONTAINS(
    ["abc...", "123..."],
    root.revisionsId
  )
ORDER BY root.revision desc

So any idea on how to search for multiple elements (in this case rooms) where multiple constraints (in this case RevisionsId and Revision) must be met for each item individually? This question sounds so simple, but I really can't find a good way by the given model.


Solution

  • According to this documentation the syntax for matching the object is like below.

    ARRAY_CONTAINS([<list_of_object>], <object_to_check>)
    

    What you have given is

    ARRAY_CONTAINS(
        [{"revisionsId":"abc...", "revision":5}, {"revisionsId":"123...", "revision":13}],
        { root.revisionsId, root.revision }
      )
    

    { root.revisionsId, root.revision } is not a valid object to check.

    Here, is the correct way of giving.

    ARRAY_CONTAINS(
        [{"revisionsId":"abc...", "revision":5}, {"revisionsId":"123...", "revision":13}],
        { "revisionsId":root.revisionsId, "revision":root.revision }
      )
    

    Here is the output on sample data.

    query

    SELECT
      root.revisionsId,
      root.revision
    FROM
      root
    WHERE
      ARRAY_CONTAINS(
        [{"revisionsId":"e326de52-1a84-42d9-bc01-7613a42dc96f", "revision":1}, {"revisionsId":"e326de52-1a84-42d9-bc01-7613a42dc96f", "revision":2}],
        { "revisionsId":root.revisionsId, "revision":root.revision }
      )
    

    Output:

    [
        {
            "revisionsId": "e326de52-1a84-42d9-bc01-7613a42dc96f",
            "revision": 1
        },
        {
            "revisionsId": "e326de52-1a84-42d9-bc01-7613a42dc96f",
            "revision": 2
        }
    ]