arrays, yql, vespa

Vespa search query (on array) gives hits even after removing the element from array


I am querying Vespa to check whether a particular userId is present in an array of userIds:

http://localhost:8080/search/?yql=select * from sources doc where userIds contains 'user1';
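For anyone issuing this query from a script rather than a browser, here is a minimal sketch of building the same URL with the YQL string percent-encoded. The helper name `build_search_url` is made up for illustration, and the host and port are simply the ones from the question. Note that the user id is interpolated naively, so this is not safe for untrusted input.

```python
import urllib.parse


def build_search_url(user_id, base="http://localhost:8080/search/"):
    """Return the search URL from the question with the YQL query encoded.

    `build_search_url` is a hypothetical helper, not part of any Vespa client.
    """
    # Same YQL as in the question; user_id is inserted without escaping.
    yql = f"select * from sources doc where userIds contains '{user_id}';"
    # urlencode takes care of spaces, quotes, '*' and ';' in the query string.
    return base + "?" + urllib.parse.urlencode({"yql": yql})


print(build_search_url("user1"))
```
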

Search Definition:

search doc {
    document doc {
        field userIds type array<string> {
            indexing : index | summary
        }
        field doctype type string {
            indexing : summary
        }
    }
}

Sample Response:

{
"children": [{
        "id": "id:doc:doc::0",
        "fields": {
            "userIds": ["user1", "user2", "user3"],
            "doctype": "type1"
        }
    },
    {
        "id": "id:doc:doc::1",
        "fields": {
            "userIds": ["user1", "user3"],
            "doctype": "type2"
        }
    }
]}

When I remove an element ("user1") from the array, I still get the document as a hit, even though the element has been successfully removed from the array.

Update API:

PUT http://localhost:8080/document/v1/doc/doc/docid/0
{
"update": "id:doc:doc::0",
"fields": {
    "userIds[0]": {
        "remove": 0
    }
}
}

GET http://localhost:8080/document/v1/doc/doc/docid/0
{"fields": {
        "userIds": ["user2", "user3"],
        "doctype": "type1"
    }
}

Even after the above userIds field is updated, the same query

http://localhost:8080/search/?yql=select * from sources doc where userIds contains 'user1';

gives the response,

{"children": [{
    "id": "id:doc:doc::0",
    "fields": {
        "userIds": ["user2", "user3"],
        "doctype": "type1"
    }
},
{
    "id": "id:doc:doc::1",
    "fields": {
        "userIds": ["user1", "user3"],
        "doctype": "type2"
    }
}]}

In the above response, there is no "user1" in the userIds array of "id:doc:doc::0", but the query still returns it as a hit. Please help.

Edit-1: Note that when I assign a new array with the element removed, it works correctly:

PUT http://localhost:8080/document/v1/doc/doc/docid/0
{
"update": "id:doc:doc::0",
"fields": {
    "userIds": {
        "assign": ["user2", "user3"]
    }
}
}

The above update gives the expected hits for the query. However, because I call the Update API from within a Searcher, I see a large response-time lag: a new array object has to be created and assigned to the userIds field, and the array can grow to about 50000 elements.

Please tell me why the remove operation is failing. I need it to improve query performance.

Edit-2: The following syntax, which names the element to be removed, updates the array correctly. Thanks to @Jo's comment.

PUT http://localhost:8080/document/v1/doc/doc/docid/0
{
"update": "id:doc:doc::0",
"fields": {
    "userIds": {
        "remove": ["user1"]
    }
}
}

Note that the above syntax removes all the occurrences of the element specified.


Solution

  • (Summary of the discussion above to provide an answer for the record)

    Removing array elements by index is not supported; use remove by value instead:

    {
        "update": "id:doc:doc::0",
        "fields": {
            "userIds": {
                "remove": ["user1"]
            }
        }
    }
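
As a rough sketch of how a client could build and send the remove-by-value update above, the following Python uses only the standard library. The function names `build_remove_update` and `send_update` are hypothetical, and the URL and HTTP method are copied from the question rather than from Vespa's documentation, so check them against your Vespa version.

```python
import json
import urllib.request


def build_remove_update(doc_id, field, values):
    """Build an update body that removes `values` from an array field by value."""
    return {"update": doc_id, "fields": {field: {"remove": list(values)}}}


def send_update(url, body):
    """Send the update body as JSON; needs a running Vespa instance.

    PUT mirrors the request shown in the question (an assumption, not verified here).
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")


body = build_remove_update("id:doc:doc::0", "userIds", ["user1"])
print(json.dumps(body, indent=2))
# Uncomment when a Vespa instance is listening on this address:
# send_update("http://localhost:8080/document/v1/doc/doc/docid/0", body)
```

Only the value to remove travels over the wire, so the payload stays small however large the stored array is. That is the main advantage over assigning a whole new array of about 50000 elements.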