pythongoogle-app-engineapp-engine-ndbappstats

GAE Appstats RPC Timeline graph shows long delays with complex NDB queries


I have attached the Appstats from my production app Search page below. The page takes ~45 seconds to load the results by AJAX. There are around 100 entities. The query is as shown below:

qry_1 = X.query(ndb.AND(X.active_status=="active", X.property_3==input_3, X.property_4==input_4, X.property_5==input_5, X.property_6.IN(input_6_list), X.property_20.IN(input_20_list))) 
record_list = qry_1.fetch() 

# input_6_list contains ~5 string items 
# input_20_list contains ~5 string items 

I am not able to figure out:
why are there empty spaces shown in the graph below between the RPC calls.. what do they signify.. how can I prevent them as they make my website unusable
why is the query taking so long to complete

Appstats Timeline

The Call Trace points to the following line in my code:

record_list = qry_1.fetch() 

Call Trace line showing it:

  <path[0]>/main.py:6332 post() 

Solution

  • The situation has not really changed since your previous question. Your data looks very relational. The model is such that it forces queries to perform large numbers of index lookups to satisfy AND and IN operations. In short, the current model will never scale. It must be completely restructured.

    Create your new structure by starting at the query - what is the fastest (in other words simplest) way to just output the required data with the minimum number of query parameters and index lookups? You could concatenate all your old query parameters (properties), hash the result, and end up with a single key to look up in a single index. Fast. Then work backwards from there to storing your records such that every combination of properties hashes to a unique key and a corresponding result record. You must accept redundancy (storing the same result multiple times for different property combinations) but that is the NoSQL way, not Third Normal Form.

    There are some other potentially useful suggestions for re-structuring your data in these StackOverflow questions: Storing song, artist and album data and How to store document structure.

    Second answer: If you are inflexible on your data model, replace Datastore with Cloud SQL because that is much faster with complex queries on small data sets.