I am interested in tracking the lifecycle of objects (i.e. when they were created and deleted).
One approach is to perform a periodic scan of the bucket, explicitly track the lastModifiedTime of each object, and diff against the previous scan result to identify deleted objects.
Another alternative I was considering is to enable S3 event notifications. However, the notification data does not contain the lastModifiedTime of the object. Can the eventTime be used as a proxy instead? Is there a guarantee on how quickly the event is sent? In my case, it is acceptable if delivery of the event is delayed, as long as eventTime is not significantly later than the modification time of the object.
Also, are there any other alternatives for capturing the lifecycle of S3 objects?
Yeah, the eventTime is a pretty good approximation of the lastModifiedTime of an object. One caveat here is that lastModifiedTime is defined as:
Object creation date or the last modified date, whichever is the latest.
So in order to use eventTime as an approximation, you probably need a trigger that covers all the events where an object is either created or modified. Regarding your question of how quickly the event is sent, here is a quote from the S3 documentation:
Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.
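As a rough illustration of using eventTime this way, here is a minimal sketch that pulls the key, a coarse event kind, and the eventTime out of one notification message. The function name lifecycle_records and the "created_or_modified" / "deleted" labels are made up for this example. It assumes the standard notification JSON layout (Records, eventName, eventTime, s3.object.key) and that the body has already been received from wherever the notification is delivered (SQS, SNS, or Lambda).

```python
import json
from datetime import datetime
from urllib.parse import unquote_plus


def lifecycle_records(notification_body: str):
    """Yield (key, kind, event_time) tuples from one S3 notification message."""
    message = json.loads(notification_body)
    for record in message.get("Records", []):
        name = record["eventName"]  # e.g. "ObjectCreated:Put"
        if name.startswith("ObjectCreated:"):
            kind = "created_or_modified"  # hypothetical label for this sketch
        elif name.startswith("ObjectRemoved:"):
            kind = "deleted"  # hypothetical label for this sketch
        else:
            continue  # skip events that are not about object lifecycle
        # Object keys are URL-encoded in the notification payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        # eventTime is an ISO-8601 UTC timestamp ending in "Z".
        event_time = datetime.fromisoformat(
            record["eventTime"].replace("Z", "+00:00")
        )
        yield key, kind, event_time
```

Because notifications are delivered at least once, the consumer should tolerate seeing the same record more than once, for example by keeping only the latest event_time per key.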
If you want the accurate lastModifiedTime, you need to do a headObject operation for each object.
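For reference, a minimal sketch of that lookup with boto3, where the operation is exposed as head_object and the timestamp comes back under the LastModified key of the response. The helper name last_modified is made up for this example, and the client is passed in so the caller controls credentials and region.

```python
def last_modified(s3_client, bucket: str, key: str):
    """Return the object's LastModified timestamp using a HEAD request."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["LastModified"]  # timezone-aware datetime in boto3


# Usage (assumes boto3 is installed and credentials are configured;
# the bucket and key names are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   print(last_modified(s3, "my-bucket", "path/to/object"))
```

Note that this is one request per object, and it fails for objects that have already been deleted, so it only helps for creation and modification times.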
Your first approach, the periodic pull, could work, but be careful not to do it naively if you have millions of objects. I mean, don't call listObjects in a while loop. This doesn't scale at all, and the listObjects API is pretty expensive. If you only need to do this analysis once a day or once a week, I recommend using S3 Inventory. The lastModifiedTime is included in the inventory report. [ref]
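To make the scan-and-diff idea concrete on top of inventory reports, here is a minimal sketch that compares two report snapshots. It assumes CSV-format reports that have already been downloaded and decompressed, that the column order is taken from the fileSchema string in the report's manifest.json, that LastModifiedDate was included when the inventory was configured, and that keys are URL-encoded in the CSV. The function names load_snapshot and diff_snapshots are made up for this example.

```python
import csv
import io
from urllib.parse import unquote_plus


def load_snapshot(csv_text: str, file_schema: str) -> dict:
    """Map object key -> LastModifiedDate string for one inventory CSV file."""
    # file_schema looks like "Bucket, Key, Size, LastModifiedDate".
    columns = [c.strip() for c in file_schema.split(",")]
    key_idx = columns.index("Key")
    modified_idx = columns.index("LastModifiedDate")
    snapshot = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            # Assumption: keys are URL-encoded in the CSV report.
            snapshot[unquote_plus(row[key_idx])] = row[modified_idx]
    return snapshot


def diff_snapshots(previous: dict, current: dict) -> dict:
    """Classify keys as created, modified, or deleted between two snapshots."""
    created = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    modified = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"created": created, "modified": modified, "deleted": deleted}
```

With this approach a deletion is only known to have happened somewhere between the two reports, so the time resolution for deletes is the inventory frequency.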