faunadb

How to query by multiple conditions in faunadb?


I try to improve my understanding of FaunaDB.

I have a collection that contains records like:

{
  "ref": Ref(Collection("regions"), "261442015390073344"),
  "ts": 1587576285055000,
  "data": {
    "name": "italy",
    "attributes": {
      "amenities": {
        "camping": 1,
        "swimming": 7,
        "hiking": 3,
        "culture": 7,
        "nightlife": 10,
        "budget": 6
      }
    }
  }
}

I would like to query in a flexible way by different attributes like:

I created an index containing all attributes, but I don't know how to do greater equals filtering in an index that contains multiple terms.

My fallback would be to create an index for each attribute and use Intersection to get the records that are in all subqueries that I want to check, but this feels somehow wrong:

The query: budget >= 6 AND camping >=8 would be:

Index:
{
  name: "all_regions_by_all_attributes",
  unique: false,
  serialized: true,
  source: "regions",
  terms: [],
  values: [
    {
      field: ["data", "attributes", "amenities", "culture"]
    },
    {
      field: ["data", "attributes", "amenities", "hiking"]
    },
    {
      field: ["data", "attributes", "amenities", "swimming"]
    },
    {
      field: ["data", "attributes", "amenities", "budget"]
    },
    {
      field: ["data", "attributes", "amenities", "nightlife"]
    },
    {
      field: ["data", "attributes", "amenities", "camping"]
    },
    {
      field: ["ref"]
    }
  ]
}

Query:

Map(
  Paginate(
    Intersection(
      Range(Match(Index("all_regions_by_all_attributes")), [0, 0, 0, 6, 0, 8], [10, 10, 10, 10, 10, 10]),
    )

  ),
  Lambda(
    ["culture", "hiking", "swimming", "budget", "nightlife", "camping", "ref"],
    Get(Var("ref"))
  )
)

This approach has the following disadvantages:

Is it possible to store all values in this kind of index that would contain all the data? I know I can just add more values to the index and access them. But this would mean I have to create a new index as soon as we add more fields to the entity. But maybe this is a common thing.

thanks in advance


Solution

  • There are a couple of misconceptions that I think are leading you astray. The most important one: Match(Index($x)) generates a set reference, which is an ordered set of tuples. The tuples correspond to the array of fields that are present in the values section of an index. By default this will just be a one-tuple containing a reference to a document in the collection selected by the index. Range operates on a set reference and knows nothing about the terms used to the select the returned set ref. So how do we compose the query?

    Starting from first principles. Lets imagine we just had this stuff in memory. If we had a set of (attribute, scores) ordered by attribute, score then taking only those where attribute == $attribute would get us close, and then filtering by score > $score would get us what we wanted. This corresponds exactly to a range query over scores with attributes as terms, assuming we modeled the attribute value pairs as documents. We can also embed pointers back to the location so we can retrieve that as well in the same query. Enough chatter, lets do it:

    First stop: our collections.

    jnr> CreateCollection({name: "place_attribute"})
    {
      ref: Collection("place_attribute"),
      ts: 1588528443250000,
      history_days: 30,
      name: 'place_attribute'
    }
    jnr> CreateCollection({name: "place"})
    {
      ref: Collection("place"),
      ts: 1588528453350000,
      history_days: 30,
      name: 'place'
    }
    

    Next up some data. We'll chose a couple of places and give them some attributes.

    jnr> Create(Collection("place"), {data: {"name": "mullion"}})
    jnr> Create(Collection("place"), {data: {"name": "church cove"}})
    jnr> Create(Collection("place_attribute"), {data: {"attribute": "swimming", "score": 3, "place": Ref(Collection("place"), 264525084639625739)}})
    jnr> Create(Collection("place_attribute"), {data: {"attribute": "hiking", "score": 1, "place": Ref(Collection("place"), 264525084639625739)}}) 
    jnr> Create(Collection("place_attribute"), {data: {"attribute": "hiking", "score": 7, "place": Ref(Collection("place"), 264525091487875586)}})
    

    Now for the more interesting part. The index.

    jnr> CreateIndex({name: "attr_score", source: Collection("place_attribute"), terms:[{"field":["data", "attribute"]}], values:[{"field": ["data", "score"]}, {"field": ["data", "place"]}]})
    {
      ref: Index("attr_score"),
      ts: 1588529816460000,
      active: true,
      serialized: true,
      name: 'attr_score',
      source: Collection("place_attribute"),
      terms: [ { field: [ 'data', 'attribute' ] } ],
      values: [ { field: [ 'data', 'score' ] }, { field: [ 'data', 'place' ] } ],
      partitions: 1
    }
    

    Ok. A simple query. Who has Hiking?

    jnr> Paginate(Match(Index("attr_score"), "hiking"))
    {
      data: [
        [ 1, Ref(Collection("place"), "264525084639625730") ],
        [ 7, Ref(Collection("place"), "264525091487875600") ]
      ]
    }
    

    Without too much imagination one could sneak a Get call into that to pull the place out.

    What about only hiking with a score over 5? We have an ordered set of tuples, so just supplying the first component (the score) is enough to get us what we want.

    jnr> Paginate(Range(Match(Index("attr_score"), "hiking"), [5], null))
    { data: [ [ 7, Ref(Collection("place"), "264525091487875600") ] ] }
    
    

    What about a compound condition? Hiking under 5 and swimming (any score). This is where things take a bit of a turn. We want to model conjunction, which in fauna means intersecting sets. The problem we have is that up until now we have been using an index that returns the score as well as the place ref. For intersection to work we need just the refs. Time for a sleight of hand:

    jnr> Get(Index("doc_by_doc"))
    {
      ref: Index("doc_by_doc"),
      ts: 1588530936380000,
      active: true,
      serialized: true,
      name: 'doc_by_doc',
      source: Collection("place"),
      terms: [ { field: [ 'ref' ] } ],
      partitions: 1
    }
    

    What's the point of such an index you ask? Well we can use it to drop any data we like from any index and be left with just the refs via join. This gives us the place refs with a hiking score less than 5 (the empty array sorts before anything, so works as a placeholder for a lower bound).

    jnr> Paginate(Join(Range(Match(Index("attr_score"), "hiking"), [], [5]), Lambda(["s", "p"], Match(Index("doc_by_doc"), Var("p")))))
    { data: [ Ref(Collection("place"), "264525084639625739") ] }
    

    So finally the piece de resistance: all places with swimming and (hiking < 5):

    jnr> Let({
    ...   hiking: Join(Range(Match(Index("attr_score"), "hiking"), [], [5]), Lambda(["s", "p"], Match(Index("doc_by_doc"), Var("p")))),
    ...   swimming: Join(Match(Index("attr_score"), "swimming"), Lambda(["s", "p"], Match(Index("doc_by_doc"), Var("p"))))
    ... },
    ... Map(Paginate(Intersection(Var("hiking"), Var("swimming"))), Lambda("ref", Get(Var("ref"))))
    ... )
    {
      data: [
        {
          ref: Ref(Collection("place"), "264525084639625739"),
          ts: 1588529629270000,
          data: { name: 'mullion' }
        }
      ]
    }
    

    Tada. This could be neatened up a lot with a couple of udfs, exercise left to the reader. Conditions involving or can be managed with union in much the same way.