Hello everyone, I need to write a Cypher query for the scenario below.
Given a list of strings, count the number of nodes in the graph where the Levenshtein similarity between the node's name property and a string from the list is above a certain threshold.
I was able to write the query for a single string, but I am not sure how to write it for multiple strings, e.g. ['string 1', 'string 2', 'string 3'].
MATCH (n:Node)
UNWIND (n.name) as name_lst
RETURN SUM(toInteger(apoc.text.levenshteinSimilarity(name_lst, 'string 1') > 0.6))
Any thoughts on how to adapt the query above to handle multiple strings?
There is no need to UNWIND n.name as name_lst; you can pass n.name directly to the APOC function.
If any string in the list ['string 1', 'string 2', 'string 3'] has a Levenshtein similarity greater than 0.6 to n.name, the ANY predicate returns true, and converting true to an integer gives 1 (false gives 0).
Thus, the sum of all the 1s in the result is the number of nodes whose name property has a similarity greater than 0.6 to at least one string in the list.
MATCH (n:Node)
RETURN SUM(toInteger(ANY(s IN ['string 1', 'string 2', 'string 3']
    WHERE apoc.text.levenshteinSimilarity(n.name, s) > 0.6)))
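As a possible variation, the same ANY predicate can be moved into a WHERE clause so that only matching nodes are kept and then counted, which should give the same number as the SUM(toInteger(...)) form. The sketch below also assumes the list and the threshold are passed as query parameters; the names $strings and $threshold and the alias matchingNodes are illustrative and not part of the original question.

```cypher
// Assumed parameters supplied by the caller (hypothetical names):
//   $strings   -> e.g. ['string 1', 'string 2', 'string 3']
//   $threshold -> e.g. 0.6
MATCH (n:Node)
// Keep a node if its name is similar enough to at least one string in the list
WHERE ANY(s IN $strings
    WHERE apoc.text.levenshteinSimilarity(n.name, s) > $threshold)
// Each node is counted once, even if it matches several strings
RETURN count(n) AS matchingNodes
```

In Neo4j Browser the parameters could be set beforehand with something like :param strings => ['string 1', 'string 2', 'string 3'] and :param threshold => 0.6.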