I am searching for a solution in the Javascript API for CosmosDB, where you can perform an INNER/OUTER JOIN between two document collections.
I have been unsuccessful.
From my understanding, Javascript Stored Procedures run within a collection, and cannot access/reference data in another collection.
If the above is true, where does this leave our application's datasource that has been designed in a relational way? If Business requires a immediate query, to collect pthe following data: All agreements/contracts that has been migrated to a new product offering, within a specific region, for a given time frame. How would I go about this query, if there are about 5 collections containing all infromation related to this query?
Any guidance?
UPDATE
Customer
{
"id": "d02e6668-ce24-455d-b241-32835bb2dcb5",
"Name": "Test User One",
"Surname": "Test"
}
Agreement
{
"id": "ee1094bd-16f4-45ec-9f5e-7ecd91d4e729",
"CustomerId": "d02e6668-ce24-455d-b241-32835bb2dcb5"
"RetailProductVersionInstance":
[
{
"id": "8ce31e7c-7b1a-4221-89a3-449ae4fd6622",
"RetailProductVersionId": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"AgreementInstanceUser": {
"FirstName": "Luke",
"LastName": "Pothier",
"AgreementUserTypeId": ""
},
"AgreementInstanceMSISDN": {
"IsoCountryDialingCode": null,
"PhoneNumber": "0839263922",
"NetworkOperatorId": "30303728-9983-47f9-a494-1de853d66254"
},
"RetailProductVersionInstanceState": "IN USE",
"IsPrimaryRetailProduct": true,
"RetailProductVersionInstancePhysicalItems": [
{
"id": "f8090aba-f06b-4233-9f9e-eb2567a20afe",
"PhysicalItemId": "75f64ab3-81d2-f600-6acb-d37da216846f",
"RetailProductVersionInstancePhysicalItemNumbers": [
{
"id": "9905058b-8369-4a64-b9a5-e17e28750fba",
"PhysicalItemNumberTypeId": "39226b5a-429b-4634-bbce-2213974e5bab",
"PhysicalItemNumberValue": "KJDS959405"
},
{
"id": "1fe09dd2-fb8a-49b3-99e6-8c51df10adb1",
"PhysicalItemNumberTypeId": "960a1750-64be-4333-9a7f-c8da419d670a",
"PhysicalItemNumberValue": "DJDJ94943"
}
],
"RetailProductVersionInstancePhysicalItemState": "IN USE",
"DateCreatedUtc": "2018-11-21T13:55:00Z",
"DateUpdatedUtc": "2020-11-21T13:55:00Z"
}
]
}
]
}
RetailProduct
{
"id": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"FriendlyName": "Data-Package 100GB",
"WholeSaleProductId": "d054dae5-173d-478b-bb0e-7516e6a24476"
}
WholeSaleProduct:
{
"id": "d054dae5-173d-478b-bb0e-7516e6a24476",
"ProductName": "Data 100",
"ProviderLiabilities": []
}
Above, I have added some sample documentation.
Relationships:
How, would I write a Javascript Stored Procedure, in CosmosDB, to perform joins between these 4 collections?
Short answer is that you cannot perform joins between different collections via SQL in Cosmos DB.
Generally, the solution to this type of question is multiple queries or different schema. In your scenario, if you can denormalize your schema into one collection without duplicating data, then it is easy.
If you provide your schemas, it'd be possible to provide a more comprehensive answer.
-- Edit 1 --
Stored Procedures are only good candidates for operations that require multiple operations on the same collection + partition key. This makes them good for bulk insert/delete/update, transactions (which need at least a read and a write), and a few other things. They aren't good for CPU intensive things, but rather things that would normally be IO bound by network latency. They aren't possible to use for cross partition or cross collection scenarios. In those cases, you must perform the operations exclusively from the remote client.
In your case, it's a fairly straightforward 2 + 2N
separate reads, where N
is the number of products. You need to read the agreement first. Then you can look up the customer and the product records in parallel, and then you can look up the wholesale record last, so you should have a latency of 3s + C
, where s
is the average duration of a given read request and C
is some constant CPU time to perform the join/issue the request/etc.
It's worth considering whether you can consolidate RetailProduct and WholeSale product into a single record where Wholesale contains all the RetailProducts in an array, or as separate documents, partitioned by the wholesale id, with a well known id that contained the Wholesale product info in a separate document. That would reduce your latency by 1 third. If you go with the partitioning by wholesale id idea, you could write 1 query for any records that shared a wholesale id, so you'd get 2 + log(N)
reads, but the same effective latency. For that strategy, you'd store a composite index of "wholesaleid+productid" in the agreement. One issue to worry about is that it duplicates the wholesale+product relationship, but as long as that relationship doesn't change, I don't think there is anything to worry about and it provides a good optimization for info lookup.