python google-app-engine cron google-cloud-datastore gql

How to move all attributes in a datastore with value True to another datastore?


Is there a way of moving all attributes within a model with a value set to True to another model? I am writing in Python and have the following:

class crimechecker(webapp.RequestHandler):
    def get(self):
        # Classify every article that has not been checked yet (crime is still None)
        articles = Article.all().filter('crime = ', None)
        for article in articles:
            body = article.body
            crime = False
            # Flag the article if any trigger word appears in its body
            for word in triggers:
                if word in body:
                    crime = True
                    break
            article.crime = crime
            article.put()

Then a separate cron job runs, and each crime story is added to Story along with its location. But the stories are not appearing in the Story model.

class place(webapp.RequestHandler):
    def post(self):
        # Check for any article which was classified as "TRUE" therefore it is a crime document
        crimes = Article.all().filter('crime = ', True)
        for crimestory in crimes:
            if Story.all().filter('title = ', crimestory.title).count() == 0:
                #Yahoo Placemaker key
                p = placemaker('HSnG9pPV34EUBcexz.tDYuSrZ8Hnp.LowswI7TxreF8sXrdpVyVIKB4uPGXBYOA9VjjF1Ca42ipd_KhdJsKYjI5cXRo0eJM-')
                #Encoding for symbols and euro signs etc.
                print p.find_places(crimestory.body.encode('utf-8'))
                for place in p.places:
                    splitted = place.name.split()
                    #Check for locations within Ireland (IE)
                    if 'IE' in splitted:
                        story = Story(long=place.centroid.longitude, lat=place.centroid.latitude, link=crimestory.link, loc_name=place.name, title=crimestory.title, date=crimestory.date).put()
                        logging.info(story)

I have two models, Article and Story. All articles are stored in the Article model, and any article with crime = True should also get an entry in the Story model. For some reason the stories are not being moved. The cron job is running and the logs show no errors. Can I do this task from my dashboard? I have queried both models:

SELECT * FROM Article ORDER BY date DESC

The Article model has stories from today's date (May 2nd).

Story has articles from April 19th and no more since then. Can I query the models and say move all entities with crime set to true to the Story model?


Solution

  • I don't see anything obviously wrong with your code. Since we've already established that there are no errors, my advice would be to add more logging.info calls higher up to see which if statement is evaluating to false, or which for loop is iterating over an empty set. Also, have you confirmed that crimechecker is successfully setting new crime stories to True? It doesn't sound like you've determined which of your two cron jobs is at fault.
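    As a rough illustration of that logging advice, the sketch below walks the same branches as the place handler and counts where articles drop out. It is not the asker's code: articles are plain dicts, existing_titles stands in for the Story title query, and find_places is a hypothetical stand-in for the Placemaker call that returns place-name strings.

```python
import logging


def trace_story_creation(crime_articles, existing_titles, find_places):
    """Count how many articles reach each branch of the story-creation loop."""
    counts = {"articles": 0, "already_stored": 0, "no_irish_place": 0, "stored": 0}
    for article in crime_articles:
        counts["articles"] += 1
        title = article["title"]
        # Branch 1: a Story with this title already exists, so nothing is added.
        if title in existing_titles:
            counts["already_stored"] += 1
            logging.info("skip %r: a Story with this title already exists", title)
            continue
        # Branch 2: the place lookup returned nothing located in Ireland (IE).
        places = find_places(article["body"])
        irish = [name for name in places if "IE" in name.split()]
        logging.info("%r: %d places found, %d in IE", title, len(places), len(irish))
        if not irish:
            counts["no_irish_place"] += 1
            continue
        # Branch 3: one Story per Irish place, as in the original loop.
        counts["stored"] += len(irish)
    logging.info("summary: %s", counts)
    return counts
```

    If the "articles" count is zero, the problem is upstream in crimechecker; if everything lands in "already_stored" or "no_irish_place", the second job is running but filtering every article out.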

    More fundamentally, I think you should re-consider the basic design of this task. Let's classify it as 3 steps:

    1. User creates a new Article entity with a title and body
    2. If the body contains certain keywords, flag it as containing a crime.
    3. If it contains a crime, create a corresponding Story entity.

    Why not just do all of this work in one handler when the article is saved? Why break it out into three distinct parts? Besides making the operation complex, your 2nd cron job is also inefficient at scale. You're fetching every crime story since the beginning of time and doing one query for each to see if there's a corresponding story. If you accumulate any significant number of articles, this won't complete in time.
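    A minimal sketch of the single-pass idea, assuming the article is available as a plain dict at save time. The function names are hypothetical, find_places again stands in for the Placemaker lookup (here returning dicts with name, lat and long), and the returned dicts stand in for Story entities that would then be put().

```python
def contains_crime(body, triggers):
    """Return True if any trigger word appears in the article body."""
    return any(word in body for word in triggers)


def process_new_article(article, triggers, find_places):
    """Classify one article and build its Story records in the same pass."""
    article["crime"] = contains_crime(article["body"], triggers)
    if not article["crime"]:
        return []  # not a crime article, so no place lookup is needed
    stories = []
    for place in find_places(article["body"]):
        # Keep only locations within Ireland (IE), as in the original handler.
        if "IE" in place["name"].split():
            stories.append({
                "title": article["title"],
                "link": article["link"],
                "date": article["date"],
                "loc_name": place["name"],
                "lat": place["lat"],
                "long": place["long"],
            })
    return stories
```

    Because each article is handled exactly once, at the moment it is saved, there is no need to re-scan old articles or to query Story by title to avoid duplicates.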

    If you're worried about the performance impact of doing all of these tasks when the article is first saved, use the task queue. When the article is first saved, create one task to scan it for crime keywords. If the keywords are found, create one task to store the corresponding story entity. Pass the entity key around in the task parameters so you don't have to query for anything.
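    The task chain described above can be sketched with in-memory stand-ins, so the flow is visible without an App Engine environment. Everything here is an assumption for illustration: a dict plays the datastore, a deque plays the task queue, TRIGGERS is a made-up keyword list, and the place lookup is left out. On App Engine the enqueue step would roughly correspond to taskqueue.add(url=..., params={'key': ...}), with each task handler loading the entity via db.get on that key.

```python
from collections import deque

TRIGGERS = ["robbery", "assault"]  # hypothetical keyword list

datastore = {}        # key -> entity dict, stand-in for the datastore
task_queue = deque()  # stand-in for the App Engine task queue


def enqueue(handler, key):
    """Queue a task that carries only the entity key."""
    task_queue.append((handler, key))


def save_article(key, title, body):
    """Store the article, then schedule a keyword scan for it."""
    datastore[key] = {"kind": "Article", "title": title, "body": body, "crime": None}
    enqueue(scan_for_crime, key)


def scan_for_crime(key):
    """Task 1: load the article by key and flag it; chain task 2 if it is a crime."""
    article = datastore[key]
    article["crime"] = any(word in article["body"] for word in TRIGGERS)
    if article["crime"]:
        enqueue(create_story, key)


def create_story(key):
    """Task 2: load the article by key and store the matching Story."""
    article = datastore[key]
    # Deriving the Story key from the article key makes a retried task overwrite
    # the same Story instead of creating a duplicate.
    datastore["story:" + key] = {"kind": "Story", "title": article["title"]}


def run_pending_tasks():
    """Drain the queue, as the task queue service would do in the background."""
    while task_queue:
        handler, key = task_queue.popleft()
        handler(key)
```

    Each task touches a single entity fetched by key, so the work per article stays constant no matter how many articles have accumulated.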