amazon-web-services amazon-s3 s3-object-tagging s3-lifecycle-policy

S3 Lifecycle Policy Delete All Objects WITHOUT A Certain Tag Value


While reading over this S3 Lifecycle Policy document, I saw that it's possible to delete S3 objects tagged with a particular key=value pair, e.g.:

<LifecycleConfiguration>
    <Rule>
        <Filter>
           <Tag>
              <Key>key</Key>
              <Value>value</Value>
           </Tag>
        </Filter>
        transition/expiration actions.
        ...
    </Rule>
</LifecycleConfiguration>

But is it possible to create a similar rule that deletes any object that does NOT have a given key=value pair? For example, any time my object is accessed I could update its tag with the current date, e.g., object-last-accessed=07-26-2019. Then I could create a Lambda function that, each day, deletes the current S3 Lifecycle policy and creates a new one containing a tag for each of the last 30 days. My lifecycle policy would then automatically delete any object that has not been accessed in the last 30 days: anything last accessed more than 30 days ago would have a date value older than any value in the lifecycle policy, and hence it would get deleted.

Here's an example of what I desire (note that I added the desired field <exclude>):

<LifecycleConfiguration>
    <Rule>
        <Filter>
           <exclude>
              <Tag>
                 <Key>last-accessed</Key>
                 <Value>07-30-2019</Value>
              </Tag>
              ...
              <Tag>
                 <Key>last-accessed</Key>
                 <Value>07-01-2019</Value>
              </Tag>
           </exclude>
        </Filter>
        transition/expiration actions.
        ...
    </Rule>
</LifecycleConfiguration>

Is something like my made-up <exclude> element possible? I want to delete any S3 object that has not been accessed in the last 30 days (which is different from an object that is older than 30 days).


Solution

  • From what I understand, this is possible but via a different mechanism.

    My solution is to take a slightly different approach: set a tag on every object and then alter that tag as needed. In your case, when the object is created, set object-last-accessed to "default", either through an S3 event trigger that invokes a Lambda function or at the time the object is written to S3.

    When the object is accessed, then update the tag value to the current date.

    If you already have a bucket full of objects, you can use S3 Batch Operations to set the tag to the current date and use that as a baseline from which to assume the files were last accessed.

    https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectTagging.html
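
The tag update on access could be sketched roughly as follows in Python. This is only an illustration: `build_last_accessed_tagging` and `touch_object` are hypothetical helper names, and `s3_client` is assumed to be a boto3 S3 client. Since PutObjectTagging replaces the object's entire tag set, the sketch reads the existing tags first and carries the other ones over.

```python
from datetime import datetime, timezone

TAG_KEY = "object-last-accessed"  # tag name used in the answer above


def build_last_accessed_tagging(existing_tags, today=None):
    """Return a Tagging payload with the last-accessed tag set to a date.

    existing_tags is a list of {"Key": ..., "Value": ...} dicts. Tags other
    than TAG_KEY are kept, because PutObjectTagging overwrites the whole set.
    """
    today = today or datetime.now(timezone.utc).date()
    kept = [tag for tag in existing_tags if tag["Key"] != TAG_KEY]
    kept.append({"Key": TAG_KEY, "Value": today.isoformat()})
    return {"TagSet": kept}


def touch_object(s3_client, bucket, key):
    """Hypothetical helper: refresh the last-accessed tag on one object.

    s3_client is assumed to be boto3.client("s3").
    """
    current = s3_client.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    tagging = build_last_accessed_tagging(current)
    s3_client.put_object_tagging(Bucket=bucket, Key=key, Tagging=tagging)
    return tagging
```

Where `touch_object` gets called from (the application code that reads the object, a Lambda function, etc.) depends on how your objects are accessed.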

    Now set the lifecycle rule to remove objects with a tag of "default" after 10 days (or whatever you want). Add additional rules, one per date value, that remove objects tagged with a given date 10 days after that date. You will need to update the lifecycle rules periodically, but a bucket's lifecycle configuration can hold up to 1,000 rules. This doc gives details of the format of a rule: https://docs.aws.amazon.com/AmazonS3/latest/API/API_LifecycleRule.html I'd suggest something like this:

    <LifecycleConfiguration>
        <Rule>
            <ID>LastAccessed Default Rule</ID>
            <Filter>
                <Tag>
                    <Key>object-last-accessed</Key>
                    <Value>default</Value>
                </Tag>
            </Filter>
            <Status>Enabled</Status>
            <Expiration>
                <Days>10</Days> 
            </Expiration>
        </Rule>
        <Rule>
            <ID>Last Accessed 2020-05-19 Rule</ID>
            <Filter>
                <Tag>
                    <Key>object-last-accessed</Key>
                    <Value>2020-05-19</Value>
                </Tag>
            </Filter>
            <Status>Enabled</Status>
            <Expiration>
                <Date>2020-05-29</Date> 
            </Expiration>
        </Rule>
    </LifecycleConfiguration>
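
Since the date rules have to be regenerated periodically, here is a rough Python sketch of building the same configuration programmatically. The function name `build_lifecycle_rules` and the parameters `days_back` and `retention_days` are my own assumptions for illustration; the dict layout is meant to mirror what boto3's `put_bucket_lifecycle_configuration` takes as its `LifecycleConfiguration` argument.

```python
from datetime import date, datetime, timedelta, timezone

TAG_KEY = "object-last-accessed"  # tag name used in the rules above


def build_lifecycle_rules(today, days_back=30, retention_days=10):
    """Build one 'default' rule plus one rule per recent access date.

    days_back and retention_days are illustrative parameters, not AWS names.
    """
    rules = [{
        "ID": "LastAccessed Default Rule",
        "Filter": {"Tag": {"Key": TAG_KEY, "Value": "default"}},
        "Status": "Enabled",
        "Expiration": {"Days": retention_days},
    }]
    for offset in range(days_back):
        accessed = today - timedelta(days=offset)
        expires = accessed + timedelta(days=retention_days)
        rules.append({
            "ID": f"Last Accessed {accessed.isoformat()} Rule",
            "Filter": {"Tag": {"Key": TAG_KEY, "Value": accessed.isoformat()}},
            "Status": "Enabled",
            # Expiration dates are expressed as midnight UTC.
            "Expiration": {
                "Date": datetime(expires.year, expires.month, expires.day,
                                 tzinfo=timezone.utc)
            },
        })
    return {"Rules": rules}


if __name__ == "__main__":
    config = build_lifecycle_rules(date(2020, 5, 19), days_back=2)
    for rule in config["Rules"]:
        print(rule["ID"])
```

A scheduled Lambda could pass the result to `s3_client.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`. Keep in mind that this call replaces the bucket's existing lifecycle configuration, so any unrelated rules would need to be included in the list as well, and the total has to stay within the 1,000-rule limit.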