google-app-engine google-cloud-datastore low-level-api

Datastore efficiency, low level API


Every Cloud Datastore query computes its results using one or more indexes, which contain entity keys in a sequence specified by the index's properties and, optionally, the entity's ancestors. The indexes are updated incrementally to reflect any changes the application makes to its entities, so that the correct results of all queries are available with no further computation needed.
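To make the quoted description concrete, here is a toy, in-memory model of a single-property index, written in plain Java rather than against the App Engine SDK. It only illustrates the idea of "keys kept in a sequence specified by the index's properties". The class and method names are hypothetical, and it is not a description of how Datastore is actually implemented. Because the index rows are kept sorted by (property value, key), an equality filter becomes a scan over one contiguous range, with no per-entity work at query time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model (hypothetical names, not the Datastore implementation) of a single-property index. */
final class ToyPropertyIndex {
    // Rows sorted by "propertyValue\u0000entityKey"; the map value is the entity key.
    // \u0000 sorts before every other character, so all rows for one value form one run.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    /** Called on every write, so the index is maintained incrementally. */
    void put(String propertyValue, String entityKey) {
        rows.put(propertyValue + '\u0000' + entityKey, entityKey);
    }

    /** Equality filter: a single range scan over the pre-sorted rows. */
    List<String> keysWhereEquals(String propertyValue) {
        String from = propertyValue + '\u0000';
        String to = propertyValue + '\u0001';
        return new ArrayList<>(rows.subMap(from, true, to, false).values());
    }

    public static void main(String[] args) {
        ToyPropertyIndex byLayer = new ToyPropertyIndex();
        byLayer.put("roads", "Point:17");
        byLayer.put("rivers", "Point:3");
        byLayer.put("roads", "Point:5");
        System.out.println(byLayer.keysWhereEquals("roads")); // [Point:17, Point:5]
    }
}
```

Within one property value the rows come back ordered by key (here compared as strings), which is why "Point:17" precedes "Point:5".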

Generally, I would like to know if

Map<Key, Entity> entities = datastore.get(listOfKeys); // listOfKeys is a List<Key>

is faster or slower than a query with the index file prepared (with the same results).

Query q = new Query("Kind").setFilter(someFilter);

My current problem:

My data consists of Layers and Points. Each point belongs to exactly one layer and has an id that is unique within that layer. I could load the points in several ways:

1) Give points a "layer name" property and query with a filter. Here I am not sure whether the datastore would have the results prepared, because the layer name changes dynamically.

2) Use only keys. The layer would have to store point ids.

KeyFactory.createKey("Layer", "layer name");
KeyFactory.createKey("Point", "layer name"+"x"+"point id");
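Option 2 can be sketched as a toy model in plain Java (hypothetical names; a `Map` stands in for the datastore, and no App Engine classes are used). It derives every point key from the layer name and the stored point ids, then looks them all up in one batch. As far as I know, the real batch `get` likewise leaves keys with no entity out of the returned map, and the toy mirrors that.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical names) of option 2: derive every point key, then batch-look them up. */
final class ToyKeyLookup {
    /** The key-name scheme from the question: layer name + "x" + point id. */
    static String pointKeyName(String layerName, String pointId) {
        return layerName + "x" + pointId;
    }

    /** Stand-in for a batch get: keys with no stored entity are omitted, not mapped to null. */
    static Map<String, String> batchGet(Map<String, String> store, String layerName,
                                        List<String> pointIds) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String id : pointIds) {
            String keyName = pointKeyName(layerName, id);
            String entity = store.get(keyName);
            if (entity != null) {
                found.put(keyName, entity);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> store = Map.of("roadsx1", "(10,20)", "roadsx2", "(30,40)");
        // Option 2 only works if the layer entity stores this id list.
        List<String> idsStoredOnLayer = List.of("1", "2", "3");
        System.out.println(batchGet(store, "roads", idsStoredOnLayer)); // {roadsx1=(10,20), roadsx2=(30,40)}
    }
}
```

Two costs show up even in the toy: the layer has to carry the full id list, and plain concatenation can collide. For example, layer "a" with point "1x2" and layer "ax1" with point "2" both produce the key name "ax1x2". Using the layer key as a parent, rather than concatenating strings, avoids that ambiguity.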

3) Use queries without filters. I don't actually need the general kind "Point" and could be more specific: the kind would be ("layer name"+"point id"). What are the costs of creating more kinds? Could this be the fastest way?

Can you actually find out how the datastore works in detail?


Solution

  • faster or slower than a query with the index file prepared (with the same results).

    Fundamentally, a query and a get by key are not guaranteed to return the same results.

    Non-ancestor queries are eventually consistent, while getting data by key is strongly consistent.
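    The difference can be pictured with a toy model in plain Java. It is a deliberate simplification with hypothetical names, not a description of Datastore internals. A write updates the entity table immediately but only queues its index update. A lookup by key reads the entity table and sees the write at once, while a query that reads only the index misses it until the pending updates are applied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model (hypothetical, deliberately simplified) of get-by-key vs. an index-backed query. */
final class ToyConsistency {
    private final Map<String, String> entities = new HashMap<>();    // point key -> layer name
    private final Map<String, List<String>> index = new HashMap<>(); // layer name -> point keys
    private final Queue<String[]> pendingIndexUpdates = new ArrayDeque<>();

    /** A write is visible to get() at once; its index update is only queued. */
    void put(String key, String layer) {
        entities.put(key, layer);
        pendingIndexUpdates.add(new String[] {key, layer});
    }

    /** Lookup by key reads the entity table directly, so it always sees the latest write. */
    String get(String key) {
        return entities.get(key);
    }

    /** A query reads only the index, so it can lag behind recent writes. */
    List<String> queryByLayer(String layer) {
        return new ArrayList<>(index.getOrDefault(layer, List.of()));
    }

    /** Simulates the background work that eventually brings the index up to date. */
    void applyPendingIndexUpdates() {
        String[] update;
        while ((update = pendingIndexUpdates.poll()) != null) {
            String key = update[0];
            // Drop any older index row for this key before adding the new one.
            index.values().forEach(keys -> keys.remove(key));
            index.computeIfAbsent(update[1], layer -> new ArrayList<>()).add(key);
        }
    }

    public static void main(String[] args) {
        ToyConsistency store = new ToyConsistency();
        store.put("Point:1", "roads");
        System.out.println(store.get("Point:1"));        // roads
        System.out.println(store.queryByLayer("roads")); // [] (index not caught up yet)
        store.applyPendingIndexUpdates();
        System.out.println(store.queryByLayer("roads")); // [Point:1]
    }
}
```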

    Your first challenge, before optimizing for speed, is probably ensuring that you're showing the correct data.

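    The docs explain eventual vs. strong consistency well. It sounds like you have the option of using an ancestor query, which can be strongly consistent. I would also strongly recommend against using the layer name, which is dynamic, as the entity's key name: a key cannot be changed after the entity is created, so renaming a layer would mean re-creating it, along with every point whose key embeds the old name, under new keys. This will cause you an excessive amount of grief.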

    Edit: In the interests of being specifically helpful, one option for a working solution based on your description would be:

    1. Give each layer a unique id (probably a UUID) and store the name as a property
    2. Use the layer key as the parent key of each point entity
    3. Use an ancestor query when fetching points for a layer (which is strongly consistent)
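    The three steps above can be sketched as a toy in-memory model in plain Java. `Key` here is a hypothetical simplified class, not the App Engine `Key`, and the "ancestor query" is a linear scan, whereas the real datastore serves ancestor queries from an index. The sketch shows that point ids only need to be unique per layer and that renaming a layer touches no keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

/** Toy model (hypothetical names) of UUID layer keys, parent keys and ancestor lookup. */
final class ToyAncestorLayout {
    /** Simplified key: kind + name + optional parent, loosely mirroring a key path. */
    static final class Key {
        final String kind;
        final String name;
        final Key parent; // null for a root entity such as a layer

        Key(String kind, String name, Key parent) {
            this.kind = kind;
            this.name = name;
            this.parent = parent;
        }

        /** True if ancestor is this key or appears anywhere in its parent chain. */
        boolean hasAncestor(Key ancestor) {
            for (Key k = this; k != null; k = k.parent) {
                if (k.equals(ancestor)) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return kind.equals(other.kind) && name.equals(other.name)
                    && Objects.equals(parent, other.parent);
        }

        @Override
        public int hashCode() {
            return Objects.hash(kind, name, parent);
        }

        @Override
        public String toString() {
            return (parent == null ? "" : parent + "/") + kind + ":" + name;
        }
    }

    // Stand-in for the datastore: key -> one string "property" (layer name or point data).
    final Map<Key, String> entities = new LinkedHashMap<>();

    /** Step 1: the layer key uses a random UUID; the mutable display name is only a value. */
    Key createLayer(String displayName) {
        Key layerKey = new Key("Layer", UUID.randomUUID().toString(), null);
        entities.put(layerKey, displayName);
        return layerKey;
    }

    /** Step 2: each point key has the layer key as its parent. */
    Key addPoint(Key layerKey, String pointId, String data) {
        Key pointKey = new Key("Point", pointId, layerKey);
        entities.put(pointKey, data);
        return pointKey;
    }

    /** Step 3: toy "ancestor query" over all keys (a real query would use an index). */
    List<Key> pointsOf(Key layerKey) {
        List<Key> result = new ArrayList<>();
        for (Key k : entities.keySet()) {
            if (k.kind.equals("Point") && k.hasAncestor(layerKey)) {
                result.add(k);
            }
        }
        return result;
    }

    /** Renaming changes one value; no key, and therefore no point, is touched. */
    void renameLayer(Key layerKey, String newName) {
        entities.put(layerKey, newName);
    }

    public static void main(String[] args) {
        ToyAncestorLayout db = new ToyAncestorLayout();
        Key roads = db.createLayer("roads");
        db.addPoint(roads, "1", "(10,20)");
        db.renameLayer(roads, "highways");
        System.out.println(db.pointsOf(roads).size() + " point(s) in " + db.entities.get(roads));
    }
}
```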

    An alternative is to store the points as embedded entities within a single entity for the whole layer; whether that fits depends on what you're trying to achieve, and on how many points a layer holds, since a single entity is limited to about 1 MiB.