I'm looking for a way to query nested arrays in ArangoDB.
The JSON structure I have is:
{
"uid": "bykwwla4prqi",
"category": "party",
"notBefore": "2016-04-19T08:43:35.388+01:00",
"notAfter": "9999-12-31T23:59:59.999+01:00",
"version": 1.0,
"aspects": [
"participant"
],
"description": [
{ "value": "User Homer Simpson, main actor in 'The Simpsons'", "lang": "en"}
],
"properties": [
{
"property": [
"urn:project:domain:attribute:surname"
],
"values": [
"Simpson"
]
},
{
"property": [
"urn:project:domain:attribute:givennames"
],
"values": [
"Homer",
"Jay"
]
}
]
}
I tried to use a query like the following to find all parties having a given name 'Jay':
FOR r IN resource
FILTER "urn:project:domain:attribute:givennames" IN r.properties[*].targets[*]
AND "Jay" IN r.properties[*].values[*]
RETURN r
but unfortunately it does not work - it returns an empty array. If I use a '1' instead of '*' for the properties array it works. But the array of the properties has no fixed structure.
Does anybody have an idea how to solve this?
Thanks a lot!
You can inspect what the filter does using a simple trick: you RETURN
the actual filter condition:
db._query(`FOR r IN resource RETURN r.properties[*].property[*]`).toArray()
[
[
[
"urn:project:domain:attribute:surname"
],
[
"urn:project:domain:attribute:givennames"
]
]
]
which makes it pretty clear whats going on. The IN
operator can only work on one dimensional arrays. You could work around this by using FLATTEN()
to remove the sub layers:
db._query(`FOR r IN resource RETURN FLATTEN(r.properties[*].property[*])`).toArray()
[
[
"urn:project:domain:attribute:surname",
"urn:project:domain:attribute:givennames"
]
]
However, while your documents are valid json (I guess its converted from xml?) you should alter the structure as one would do it in json:
"properties" : {
"urn:project:domain:attribute:surname":[
"Simpson"
],
"urn:project:domain:attribute:givennames": [
"Homer",
"Jay"
]
}
Since the FILTER
combination you specify would also find any other Jay
(not only those found in givennames
) and the usage of FLATTEN()
will prohibit using indices in your filter statement. You don't want to use queries that can't use indices on reasonably sized collections for performance reasons.
In Contrast you can use an array index on givennames
with the above document layout:
db.resource.ensureIndex({type: "hash",
fields:
["properties.urn:project:domain:attribute:givennames[*]"]
})
Now doublecheck the explain for the query:
db._explain("FOR r IN resource FILTER 'Jay' IN " +
"r.properties.`urn:project:domain:attribute:givennames` RETURN r")
...
6 IndexNode 1 - FOR r IN resource /* hash index scan */
...
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash resource false false 100.00 % \
[ `properties.urn:project:domain:attribute:givennames[*]` ] \
("Jay" in r.`properties`.`urn:project:domain:attribute:givennames`)
that its using the index.