javaspring-bootelasticsearchspring-data-jpaspring-data-elasticsearch

Strange behavior of Spring Data Elasticsearch with a Wildcard query using a '?'


I created a Spring Boot application using spring-data-elasticsearch 5.1.1. When I execute my query on the database, I get the correct result, but not with the API, and I can't figure out why.

I have an Elasticsearch database that has an index "users" configured like this:

{
    "mappings": {
        "properties": {
            "id": {
                "type": "long"
            },
            "country": {
                "type": "keyword"
            },
            "firstname": {
                "type": "keyword"
            },
            "lastname": {
                "type": "keyword"
            }
    }
}

When I execute the query: GET http://localhost:9200/_sql?

{
    "query": "SELECT * FROM users WHERE firstname LIKE 'Micka_l' and country = 'FR' LIMIT 100"
}

I get the user Mickael.

And when I modify the query:

{
    "query": "SELECT * FROM users WHERE firstname LIKE 'Mick_l' and country = 'FR' LIMIT 100"
}

I don't get any results because I removed a letter in 'Mick_l'.

By executing the query: GET http://localhost:9200/_sql/translate, I retrieved the generated query to use in my Spring Boot Service:

    public List<UserDTO> searchUsers(SearchDTO searchDTO) {
        // Equals 'FR'
        String country = searchDTO.getCountry();
        // Equals 'Micka?l' ou 'Mick?l'
        String firstname = searchDTO.getFirstname();

        StringQuery queryString = new StringQuery("{\"bool\": {\"must\": [{\"term\": {\"country\": " +
                "{\"value\": \"" + country + "\"}}},{\"wildcard\": {\"firstname\": {\"wildcard\": \"" + firstname + "\"," +
                "\"boost\": 1.0}}}],\"boost\": 1.0}}");

        SearchHits<User> searchHits = searchOperations.search(queryString, User.class);

        List<User> userList = searchHits.getSearchHits().stream().map(SearchHit::getContent).collect(Collectors.toList());

        return userList.stream().map((user) -> UserMapper.MAPPER.mapToDTO(user)).collect(Collectors.toList());
    }

}

This query allows me to search for users by their first name like this:

POST http://localhost:8080/api/user/search

{
    "country": "FRA",
    "firstname": "Micka_l"
}

Returns: Mickael

But:

{
    "country": "FRA",
    "firstname": "Mick_l"
}

Also returns: Mickael, whereas I need the search to be case-sensitive. Why does the "?" behave like a "*"?

I have tried several approaches with JPA:

List<User> findByCountryAndFirstnameLike(country, firstname)

And also with @Query:

@Query("{\"bool\": {\"must\": [{\"term\": {\"country\": \"{\"value\": \" ?0 \"}}},{\"wildcard\": {\"firstname\": {\"wildcard\": \" ?1 \","\"boost\": 1.0}}}],\"boost\": 1.0}}")
List<User> findByCountryAndFirstnameLike(country, firstname)

The behavior is the same.

Have you ever tested this behavior?

I tried one last approach with NativeSearchQueryBuilder(), but I can't finish the configuration:

    public List<UserDTO> searchUsers(SearchDTO searchDTO) {

        String country = searchDTO.getCountry();
        String firstname = searchDTO.getFirstname();

        TermQuery termQuery = new TermQuery.Builder().field("country").value(country).build();
        WildcardQuery wildcardQuery = new WildcardQuery.Builder().value(firstname).caseInsensitive(true).boost(1.0F).build();

// Doesn't Work:
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(new BoolQuery.Builder()
                        .must(t -> t.term(termQuery))
                        .must(w -> w.wildcard(wildcardQuery))
                ).build();


        SearchHits<User> searchHits = searchOperations.search(searchQuery, User.class);

        List<User> userList = searchHits.getSearchHits().stream().map(SearchHit::getContent).collect(Collectors.toList());

        return userList.stream().map((user) -> UserMapper.MAPPER.mapToDTO(user)).collect(Collectors.toList());
    }

}

I'm not sure if this approach can resolve the unexpected behavior when using a "?".


Solution

  • First of all, your first approach to generating elasticsearch queries is opening you for code injection attacks. I hope you are not doing something like this in production code. Anyway, there is a lot in your question - SQL, JQuery, Spring Data, it seems to be all over the place. So let's start with the backend. In elasticsearch, everything works as expected:

    DELETE test
    
    PUT test
    {
      "mappings": {
        "properties": {
          "id": {
            "type": "long"
          },
          "country": {
            "type": "keyword"
          },
          "firstname": {
            "type": "keyword"
          },
          "lastname": {
            "type": "keyword"
          }
        }
      }
    }
    
    POST test/_bulk?refresh
    {"index":{}}
    {"firstname":"Mickael", "lastname": "Carreira", "country": "PT"}
    {"index":{}}
    {"firstname":"Mickael", "lastname": "Landreau", "country": "FR"}
    
    // Returns one record
    GET test/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "country": {
                  "value": "FR"
                }
              }
            },
            {
              "wildcard": {
                "firstname": {
                  "wildcard": "Micka?l",
                  "boost": 1
                }
              }
            }
          ],
          "boost": 1
        }
      }
    }
    
    // Returns no records
    GET test/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "country": {
                  "value": "FR"
                }
              }
            },
            {
              "wildcard": {
                "firstname": {
                  "wildcard": "Mick?l",
                  "boost": 1
                }
              }
            }
          ],
          "boost": 1
        }
      }
    }
    

    Let's now figure out which query your code generates and which part doesn't work.

    BTW, "boost": 1 doesn't do anything so we can simplify it to

    // Returns one record
    GET test/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "country": {
                  "value": "FR"
                }
              }
            },
            {
              "wildcard": {
                "firstname": {
                  "wildcard": "Micka?l"
                }
              }
            }
          ]
        }
      }
    }