arraysjsonnestedarangodb

Query nested arrays in ArangoDB


I'm looking for a way to query nested arrays in ArangoDB.

The JSON structure I have is:

{
  "uid": "bykwwla4prqi",
  "category": "party",
  "notBefore": "2016-04-19T08:43:35.388+01:00",
  "notAfter": "9999-12-31T23:59:59.999+01:00",
  "version": 1.0,
  "aspects": [
    "participant"
  ],
  "description": [
    { "value": "User Homer Simpson, main actor in 'The Simpsons'", "lang": "en"}
  ],
  "properties": [
    {
      "property": [
        "urn:project:domain:attribute:surname"
      ],
      "values": [
        "Simpson"
      ]
    },
    {
      "property": [
        "urn:project:domain:attribute:givennames"
      ],
      "values": [
        "Homer",
        "Jay"
      ]
    }
  ]
}

I tried to use a query like the following to find all parties having a given name 'Jay':

FOR r IN resource
FILTER "urn:project:domain:attribute:givennames" IN r.properties[*].targets[*]
   AND "Jay" IN r.properties[*].values[*]
RETURN r

but unfortunately it does not work - it returns an empty array. If I use a '1' instead of '*' for the properties array it works. But the array of the properties has no fixed structure.

Does anybody have an idea how to solve this?

Thanks a lot!


Solution

  • You can inspect what the filter does using a simple trick: you RETURN the actual filter condition:

    db._query(`FOR r IN resource  RETURN r.properties[*].property[*]`).toArray()
    [ 
      [ 
        [ 
          "urn:project:domain:attribute:surname" 
        ], 
        [ 
          "urn:project:domain:attribute:givennames" 
        ] 
      ] 
    ]
    

    which makes it pretty clear whats going on. The IN operator can only work on one dimensional arrays. You could work around this by using FLATTEN() to remove the sub layers:

    db._query(`FOR r IN resource  RETURN FLATTEN(r.properties[*].property[*])`).toArray()
    [ 
      [ 
        "urn:project:domain:attribute:surname", 
        "urn:project:domain:attribute:givennames" 
      ] 
    ]
    

    However, while your documents are valid json (I guess its converted from xml?) you should alter the structure as one would do it in json:

    "properties" : {
      "urn:project:domain:attribute:surname":[
          "Simpson"
      ],
      "urn:project:domain:attribute:givennames": [
          "Homer",
          "Jay"
      ]
    }
    

    Since the FILTER combination you specify would also find any other Jay (not only those found in givennames) and the usage of FLATTEN() will prohibit using indices in your filter statement. You don't want to use queries that can't use indices on reasonably sized collections for performance reasons.

    In Contrast you can use an array index on givennames with the above document layout:

    db.resource.ensureIndex({type: "hash",
                            fields:
          ["properties.urn:project:domain:attribute:givennames[*]"]
        })
    

    Now doublecheck the explain for the query:

    db._explain("FOR r IN resource FILTER 'Jay' IN " + 
                "r.properties.`urn:project:domain:attribute:givennames` RETURN r")
    ...
      6   IndexNode            1     - FOR r IN resource   /* hash index scan */
    ...
    Indexes used:
    By   Type   Collection   Unique   Sparse   Selectivity   Fields           Ranges
      6   hash   resource     false    false       100.00 % \
           [ `properties.urn:project:domain:attribute:givennames[*]` ] \
           ("Jay" in r.`properties`.`urn:project:domain:attribute:givennames`)
    

    that its using the index.