Tags: python, google-cloud-datastore, app-engine-ndb, gcloud-python, google-cloud-python

Query google datastore by key in gcloud api


I'm trying to query for some data using the gcloud API that I just discovered. I'd like to query for a KeyProperty. e.g.:

from google.appengine.ext import ndb

class User(ndb.Model):
    email = ndb.StringProperty()

class Data(ndb.Model):
    user = ndb.KeyProperty('User')
    data = ndb.JsonProperty()

In GAE, I can query this pretty easily assuming I have a user's key:

user = User.query(User.email == 'me@domain.com').get()
data_records = Data.query(Data.user == user.key).fetch()

I'd like to do something similar using gcloud:

from gcloud import datastore

client = datastore.Client(project='my-project-id')
user_qry = client.query(kind='User')
user_qry.add_filter('email', '=', 'me@domain.com')
users = list(user_qry.fetch())
user = users[0]

data_qry = client.query(kind='Data')
data_qry.add_filter('user', '=', user.key)  # This doesn't work ...
results = list(data_qry.fetch())  # results = []

Looking at the documentation for add_filter, it doesn't appear that Entity.key is a supported type:

value (int, str, bool, float, NoneType, datetime.datetime) – The value to filter on.

Is it possible to add filters for key properties?


I've done a bit more sleuthing to try to figure out what is really going on here. I'm not sure it helps me understand the issue yet, but maybe it will be useful to someone else.

I've mocked out the underlying calls in the respective libraries to record the protocol buffers that are being serialized and sent to the server. For GAE, it appears to be Batch.create_async in the datastore_query module.

For gcloud, it is the datastore.Client.connection.run_query method. Looking at the resulting protocol buffers (anonymized), I see:

gcloud query pb.

kind {
  name: "Data"
}
filter {
  composite_filter {
    operator: AND
    filter {
      property_filter {
        property {
          name: "user"
        }
        operator: EQUAL
        value {
          key_value {
            partition_id {
              dataset_id: "s~app-id"
            }
            path_element {
              kind: "User"
              name: "user_string_id"
            }
          }
        }
      }
    }
  }
}

GAE query pb.

kind: "Data"
Filter {
  op: 5
  property <
    name: "User"
    value <
      ReferenceValue {
        app: "s~app-id"
        PathElement {
          type: "User"
          name: "user_string_id"
        }
      }
    >
    multiple: false
  >
}

The two libraries are using different versions of the proto as far as I can tell, but the data being passed looks very similar...


Solution

  • This is a subtle bug with your use of the ndb library:

    All ndb properties accept a single positional argument that specifies the property's name in Datastore

    Looking at your model definition, you'll see user = ndb.KeyProperty('User'). This isn't actually saying that the user property is a key of a User entity, but that it should be stored in Datastore under the property name User. You can verify this in your GAE protocol buffer query, where the property name is (case-sensitive) User.

    If you want to limit the key to a single kind, you need to specify it using the kind option.

    user = ndb.KeyProperty(kind="User") 
    

    The KeyProperty also supports:

    user = ndb.KeyProperty(User)   # User is a class here, not a string
    

    Here is a description of all the magic.
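The dispatch described above can be sketched in plain Python. This is a toy illustration of how a single positional argument may be interpreted, not ndb's actual implementation (the real KeyProperty handles more combinations):

```python
# Toy sketch of ndb.KeyProperty's positional-argument "magic"
# (illustration only, not the real implementation).
class ToyKeyProperty:
    def __init__(self, arg=None, name=None, kind=None):
        # A positional class is treated as the kind; a positional
        # string is treated as the Datastore property name.
        if isinstance(arg, type):
            kind = arg.__name__
        elif isinstance(arg, str):
            name = arg
        self.name = name
        self.kind = kind

class User:
    pass

# KeyProperty('User') -> a property *named* 'User', no kind restriction.
p1 = ToyKeyProperty('User')
assert p1.name == 'User' and p1.kind is None

# KeyProperty(kind='User') -> kind restricted to User, default name.
p2 = ToyKeyProperty(kind='User')
assert p2.name is None and p2.kind == 'User'

# KeyProperty(User) -> passing the class also restricts the kind.
p3 = ToyKeyProperty(User)
assert p3.kind == 'User'
```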

    As it is now, your gcloud query filters on the wrong-cased property name (user instead of User) and should be:

    data_qry = client.query(kind='Data')
    data_qry.add_filter('User', '=', user.key)
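The empty result can be reproduced with a toy in-memory filter: Datastore property names are case sensitive, so filtering on user never matches entities stored under User. The entities below are hypothetical, just to illustrate the mismatch:

```python
# Toy in-memory "datastore": entities whose key property was stored
# under the name 'User' (the effect of KeyProperty('User')).
entities = [
    {'User': ('User', 'user_string_id'), 'data': {'a': 1}},
    {'User': ('User', 'other_user'), 'data': {'b': 2}},
]

def add_filter(records, prop, value):
    # Property names are matched case-sensitively, as in Datastore.
    return [e for e in records if e.get(prop) == value]

target = ('User', 'user_string_id')
assert add_filter(entities, 'user', target) == []      # wrong case: no results
assert len(add_filter(entities, 'User', target)) == 1  # correct case: one match
```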