python google-app-engine webapp2

Limiting the number of requests from any given IP address


I am working on a Google App Engine project (python/webapp2), and I am a little concerned about people abusing or spamming the service with a large number of requests. To guard against this, my idea is to limit the number of requests allowed per IP address in any given hour for certain parts of the application. My current plan is as follows:

On each request I will:

  1. grab the IP address from the header
  2. store this IP address in the Datastore with a timestamp
  3. delete any IP address entities that are over an hour old
  4. count the number of Datastore entities with that IP address
  5. disallow access if there are more than a given limit

My question is this:
Is this the best way to go about this? I am only a beginner here, and I imagine there is quite a bit of overhead in doing it this way. This also seems like a common task that might already have a better solution. Is there a less resource-intensive way to do this?


Solution

  • In the past, I've done this with memcache, which is much faster, especially since you only really care about approximate limits (approximate because memcache can be flushed by the system, might not be shared by all instances, etc.). You can even use it to expire keys for you. Something like this (which assumes self is a webapp2 request handler, and you've imported GAE's memcache library):

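    import logging
    from google.appengine.api import memcache

    # Inside a webapp2.RequestHandler method (self is the handler).
    memcache_key = 'request-count-' + self.request.remote_addr

    count = memcache.get(memcache_key)

    # Reject once the client has used up its quota for the current window.
    if count is not None and count >= MAX_REQUESTS:
        logging.warning("Remote user has %d requests; rejecting.", count)
        self.error(429)
        return

    # incr() returns None when the key does not exist yet.
    count = memcache.incr(memcache_key)
    if count is None:
        # First request in this window: create the counter with an expiry.
        memcache.add(memcache_key, 1, time=WINDOW_IN_SECONDS)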
    

    This creates a per-IP counter that rejects a client after about MAX_REQUESTS requests within WINDOW_IN_SECONDS, then resets when the key expires. In other words, it is a fixed window rather than a sliding one: the count returns to zero each time period.
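
To see the reset behaviour without deploying anything, here is a minimal sketch that imitates the counter with a plain dictionary instead of memcache. The names `FixedWindowLimiter` and `FakeClock` and the example IP addresses are hypothetical, made up for illustration; none of this is App Engine API, and unlike memcache the dictionary lives in one process and is never flushed.

```python
import time


class FixedWindowLimiter(object):
    """In-memory stand-in for the memcache counter (local experiments only)."""

    def __init__(self, max_requests, window_seconds, clock=time.time):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._counters = {}  # ip -> (count, expires_at)

    def allow(self, ip):
        """Return True if the request is allowed, False if it should get a 429."""
        now = self.clock()
        count, expires_at = self._counters.get(ip, (0, 0))
        if now >= expires_at:
            # First request, or the old window expired: start a fresh window.
            count, expires_at = 0, now + self.window_seconds
        if count >= self.max_requests:
            return False
        # Count the request; the expiry is NOT extended, so the window is fixed.
        self._counters[ip] = (count + 1, expires_at)
        return True


class FakeClock(object):
    """Manually advanced clock so the demo needs no real waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
limiter = FixedWindowLimiter(max_requests=3, window_seconds=3600, clock=clock)

# Five requests in the same window: three allowed, then rejections.
results = [limiter.allow('203.0.113.7') for _ in range(5)]

# Jump to the end of the window: the count starts again from zero.
clock.now = 3600.0
after_reset = limiter.allow('203.0.113.7')
```

Because the expiry is set only when the counter is created, a client that sends its requests just before and just after the reset can briefly get close to twice the limit. That is the trade-off of a fixed window compared with a sliding one.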