
Optimise conditional queries in Azure Cognitive Search


We ran into an unusual scenario while using Azure Search on one of our projects. Our client wants to respect users' privacy, so we have a feature that lets a user restrict searches on any PII data. If a user has opted for Privacy, they can be found only by UserId; otherwise they can be found by Name, Phone, City, UserId, etc.

JSON for a user who has opted for Privacy:

{
"Id": "<Any GUID>",
"Name": "John Smith", //searchable
"Phone": "9987887856", //searchable
"OtherInfo": "some info", //non-searchable
"Address" : {}, //searchable
"Privacy" : "yes", //searchable
"UserId": "XXX1234", //searchable
...
}

JSON for a user who has not opted for Privacy:

{
"Id": "<Any GUID>",
"Name": "Tom Smith", //searchable
"Phone": "7997887856", //searchable
"OtherInfo": "some info", //non-searchable
"Address" : {}, //searchable
"Privacy" : "no", //searchable
"UserId": "XXX1234", //searchable
...
}

Our search service takes any searchText as input and returns all records that match it in any searchable field. Given the scenario above:

  1. Records with "Privacy" set to "yes" must be removed from the results if searchText does not match UserId.
  2. If searchText matches UserId, the record is included in the results.
  3. If "Privacy" is set to "no" and searchText matches any searchable field, the record is included in the results.
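The three rules above can be sketched as a predicate over a single document. This is only an illustration of the intended behaviour, not how Azure Search evaluates a query: the `matches` helper is a hypothetical stand-in that uses a case-insensitive substring test, and the field list is an assumption based on the sample documents (the nested Address object is left out for brevity).

```python
# Fields assumed searchable, taken from the sample documents above.
SEARCHABLE_FIELDS = ("Name", "Phone", "UserId")


def matches(value, search_text):
    """Simplified stand-in for a search match: case-insensitive substring test."""
    return search_text.lower() in str(value).lower()


def is_visible(doc, search_text):
    """Apply the three rules to one document."""
    if doc.get("Privacy") == "yes":
        # Rules 1 and 2: privacy-opted users are found by UserId only.
        return matches(doc.get("UserId", ""), search_text)
    # Rule 3: everyone else is found through any searchable field.
    return any(matches(doc.get(f, ""), search_text) for f in SEARCHABLE_FIELDS)


john = {"Name": "John Smith", "Phone": "9987887856", "Privacy": "yes", "UserId": "XXX1234"}
tom = {"Name": "Tom Smith", "Phone": "7997887856", "Privacy": "no", "UserId": "XXX1234"}

print(is_visible(john, "Smith"))    # False
print(is_visible(john, "XXX1234"))  # True
print(is_visible(tom, "Smith"))     # True
```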

So we went with Lucene query syntax to enforce this at query time, which results in a very long query, as shown below. Let us assume searchText = "abc":

((Name: abc OR Phone: abc OR UserId: abc ...) AND Privacy: no) OR 
((UserId: abc ) AND Privacy: yes)

We do it this way because we show paginated results, i.e. we fetch data in batches (1 - 10, 11 - 20 and so on), so each query returns the top 10 records along with the total result count.
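For illustration, the query and paging described above could be assembled roughly as follows. This is a sketch under assumptions: the field list is limited to the fields visible in the sample documents, the function names are hypothetical, and the dictionary mirrors the `search`, `queryType`, `top`, `skip` and `count` parameters of the Search Documents REST request body. It only builds the payload and does not call the service or escape special characters in the input.

```python
import json

# Fields assumed searchable, taken from the sample documents above.
PII_FIELDS = ["Name", "Phone", "UserId"]


def build_conditional_query(search_text):
    """Build the two-branch Lucene query from the question."""
    any_field = " OR ".join(f"{field}:{search_text}" for field in PII_FIELDS)
    return f"(({any_field}) AND Privacy:no) OR ((UserId:{search_text}) AND Privacy:yes)"


def build_page_request(search_text, page, page_size=10):
    """Request body for one page of results (page numbers start at 1)."""
    return {
        "search": build_conditional_query(search_text),
        "queryType": "full",              # enable full Lucene syntax
        "top": page_size,                 # records per page
        "skip": (page - 1) * page_size,   # records to skip before this page
        "count": True,                    # also return the total match count
    }


print(json.dumps(build_page_request("abc", page=2), indent=2))
```

Because the privacy condition is part of the query itself, the total count and every page already exclude the restricted records, so no filtering is needed after the results come back.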

Is there a more optimised approach? Or does the Azure Search service offer any built-in mechanism for conditional queries?


Solution

  • If I understand your requirement correctly, it can be solved quite easily. You decide in your data model which properties are searchable and which are not, so you don't need to construct a complicated query that repeats the end user's input for every property. And you don't need to do any batching or post-processing of results.

    If searchText is your user's input, you can use this:

    (*searchText* AND Privacy:false)
    

    This will search all searchable fields, but it will only return records that allow searching on PII data. With the sample documents above, where Privacy holds "yes" or "no", the clause would be Privacy:no.

    You also need users to be able to search by UserId in all records, regardless of each record's privacy setting. To support this, extend the query to:

    (*searchText* AND Privacy:false) OR (UserId:*searchText*)
    

    This allows users to search all fields in records where Privacy is false, and to search only the UserId field in all other records. This query pattern covers all of your requirements with a single optimized query.
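A minimal sketch of how this query pattern might be built from raw user input is shown below. The function names and the `open_value` parameter are hypothetical; `open_value` defaults to the `false` used in the answer and can be set to `no` to match the sample documents in the question. The sketch searches for the plain term, so the `*` markers around searchText are left out, and it backslash-escapes Lucene special characters so that user input cannot change the structure of the query.

```python
import re

# Characters with special meaning in Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(text):
    """Backslash-escape Lucene special characters in user input."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def build_privacy_query(search_text, open_value="false"):
    """All fields for non-private records, UserId only for the rest."""
    term = escape_lucene(search_text)
    return f"({term} AND Privacy:{open_value}) OR (UserId:{term})"


print(build_privacy_query("abc"))
# (abc AND Privacy:false) OR (UserId:abc)
print(build_privacy_query("a:b", open_value="no"))
# (a\:b AND Privacy:no) OR (UserId:a\:b)
```

The first clause has no field prefix, so it runs against every searchable field, which is what removes the need to repeat the input for each property.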