amazon-web-servicesamazon-s3amazon-cloudwatch-events

Realtime-ness of S3 event notification


I am interested in traffic lifecycle (i.e. when the objects were created and deleted) of objects. One approach is to perform periodic scan of the bucket and track explicitly the lastModifiedTime and perform a diff with previous scan result to identify objects deleted.

Another alternate I was considering was to enable S3 event notifications. However, the data in notification does not contain lastModifiedTime for the object. Can the eventTime be used as proxy instead? Is there a guarantee how quickly the event is sent ? In my case, it is acceptable if delivery of the event is delayed; as long as eventTime is not significantly later that modificationTime of object

Also, any other alternatives to capture lifecycle of s3 objects?


Solution

  • Yeah, the eventTime is a pretty good approximation of the lastModifiedTime of an object. One caveat here is the definition of lastModifiedTime is

    Object creation date or the last modified date, whichever is the latest.

    So in order to use eventTime as an approximation, you probably need a trigger that covers all the events where an object is either created or modified. Regarding to your question of how quickly the event is sent, here is a quote from the S3 documentation:

    Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.

    If you want the accurate lastModifiedTime, you need to do a headObject operation for each object.

    Your first periodic pull approach could work, but be careful don't do it naively if you have millions of objects. I mean don't use listObjects and do it in a while loop. This doesn't scale at all and listObjects API is pretty expensive. If you only need to do this traffic analysis once a day or once a week, I recommend using S3 inventory. The lastModifiedTime is included in the inventory report. [ref]