neo4j cypher

How to write a Cypher query to find nodes tagged by specific tags?


DB structure:

(:Event)-[:IS_TAGGED_BY]->(:EventTag {value})

Comments for the structure:


I need to write a query that returns only those events that are tagged by a specific set of tags.

There are two variations of such a query, which return:

  1. Events tagged by at least one of the specified tags (let's call it findTaggedByAny).
  2. Events tagged by all of the specified tags (let's call it findTaggedByAll).

I can write the findTaggedByAny query:

MATCH (et:EventTag)--(e:Event) WHERE et.value in {0} RETURN e

Here, {0} is where the query parameter, containing a set of tag values, is substituted.

So, after substitution, the query looks like this:

MATCH (et:EventTag)--(e:Event) WHERE et.value in ["tag1", "tag2"] RETURN e

However, I am having difficulty implementing the findTaggedByAll query, which should take the same parameter and return events tagged by all tags from the set. It does not matter whether an event is also tagged by other tags.


Solution

  • [UPDATED]

    1. If you want to get events that are connected to all tags in the DB, you can do an efficient "degree-ness" check, like this (assuming that an event is only connected at most once to a specific tag, and the IS_TAGGED_BY relationship is only used to connect events to tags):

      MATCH (t:EventTag)
      WITH COUNT(t) AS numTags
      MATCH (e:Event)
      WHERE SIZE((e)-[:IS_TAGGED_BY]->()) = numTags
      RETURN e;
      
    2. If, instead, you want to get events that are tagged by any tag in a tagList parameter:

      MATCH (e:Event)-[:IS_TAGGED_BY]->(t:EventTag)
      WHERE t.value IN $tagList
      RETURN e;
      
    3. If, instead, you want to get events that are tagged by all tags in a tagList parameter:

      MATCH (e:Event)-[:IS_TAGGED_BY]->(t:EventTag)
      WITH e, COLLECT(t.value) AS tagValues
      // Every requested tag must be among the event's tags; extra tags on the event are allowed.
      WHERE ALL(v IN $tagList WHERE v IN tagValues)
      RETURN e;
      

      Also, if it is relatively rare for an event to have that many tags, this longer query may actually be faster (by doing a degree-ness check before actually looking at the tags):

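      MATCH (e:Event)
      // Cheap pre-filter: an event with fewer tags than requested cannot match.
      WHERE SIZE((e)-[:IS_TAGGED_BY]->()) >= SIZE($tagList)
      MATCH (e)-[:IS_TAGGED_BY]->(t:EventTag)
      WITH e, COLLECT(t.value) AS tagValues
      // Every requested tag must be among the event's tags; extra tags on the event are allowed.
      WHERE ALL(v IN $tagList WHERE v IN tagValues)
      RETURN e;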
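
A common alternative for the "tagged by all" case, not taken from the answer above, is to filter on the requested tags first and then count how many distinct ones each event matched. This is only a sketch: it assumes $tagList contains no duplicate values, and the alias matchedCount is purely illustrative.

```cypher
// Sketch of findTaggedByAll by counting matched tags.
// Assumption: $tagList has no duplicate values.
MATCH (e:Event)-[:IS_TAGGED_BY]->(t:EventTag)
WHERE t.value IN $tagList
// DISTINCT guards against an event reaching the same tag value more than once.
WITH e, COUNT(DISTINCT t.value) AS matchedCount
// The event qualifies only if it matched every requested tag.
WHERE matchedCount = SIZE($tagList)
RETURN e;
```

Because the filter on t.value comes first, this form may be able to use an index on EventTag.value if one exists. The two forms also differ on an empty $tagList: this one returns nothing, whereas the ALL(...) predicate is true for an empty list and so returns every event that has at least one tag.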