ruby-on-rails · amazon-s3 · digital-ocean · rails-activestorage

Migrating ActiveStorage attachments from local storage to S3


I have a Rails app currently using ActiveStorage to attach many images to a few different models via has_many_attached :images. The app is beginning to outgrow storing these images in the local ./storage directory, so I've begun migrating to an S3-compatible service.

New records with attached images now save to the new S3 service, and their images are recalled correctly when viewing those records. I used s3cmd to sync all files from the local directory to the new S3 service, which appeared to complete successfully, then renamed the ./storage directory to ./storage.bak for testing.

All preexisting records now fail to fetch their associated images from the S3 service, though. I've issued an ActiveStorage::Blob.update_all(service_name: 'digitalocean_s3') from the console, which did update every blob's service_name, but the images still fail to load. Interestingly, when clicking a link to one of these images, DigitalOcean (host of the S3-compatible bucket) displays the following (0's and x's replaced by me):

<Error>
 <Code>NoSuchKey</Code>
 <Message/>
 <BucketName>attachments-production</BucketName>
 <RequestId>tx000000000000000000000-0000000000-00000000-nyc3d</RequestId>
 <HostId>xxxxxxxx-nyc3d-nyc3-xxxx</HostId>
</Error>

I'm guessing I need to alter something else about these Blobs, but what? What am I missing?
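
For reference, a quick console check along these lines (a rough sketch; digitalocean_s3 is the service name I use in config/storage.yml, and exist? simply asks the configured service whether an object with that key is present) confirms what the blobs now point at:

    # Rough sketch, run from `rails console`; assumes the bucket service is
    # registered as :digitalocean_s3 in config/storage.yml.
    blob = ActiveStorage::Blob.first
    blob.service_name              # => "digitalocean_s3" after the update_all above
    blob.key                       # => e.g. "vaj4us85jdi33jdiw48"
    blob.service.exist?(blob.key)  # => false for these old blobs, matching the NoSuchKey error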


Solution

  • Actually, the placement of the attachment files differs between the local ./storage directory and an S3 bucket. As stated in my reply above, when stored locally, ActiveStorage nests each file in a two-level subdirectory structure derived from the first characters of its blob key, like this:

    ./storage/va/j4/vaj4us85jdi33jdiw48

    But, when configured to use an S3 bucket, it stores all files directly in the root of the bucket with no simulated directory structure, like this:

    /vaj4us85jdi33jdiw48

    So the trick was to get all of these 500,000+ attachments out of many thousands of subdirectories and into the root of the S3 bucket. I used s3cmd in conjunction with find to accomplish this via the one-liner below:

    find /app_dir/storage -type f ! -path "*/variants/*" -exec bash -c 's3cmd sync --preserve "{}" "s3://bucket-name/$(basename "{}")"' \;

    This finds every file, regardless of subdirectory, then runs an s3cmd sync for each one individually and drops it into the root of the bucket. I'm choosing to exclude the ./storage/variants directory here; I'm fine with those variants being regenerated later.

    There are probably more efficient ways of doing this, but I was able to copy 300 GB of attachments in about 60 hours using this method, which was fast enough for this project.
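
    For completeness, a Ruby-only alternative would be to walk the blobs and push each file from the old local tree to the bucket through ActiveStorage's own service layer. This is just a rough sketch of that idea, not what I actually ran; the ./storage.bak path and the :digitalocean_s3 service name are the ones mentioned above:

        # Rough sketch, run from `rails console`.
        # Assumes the old files live under ./storage.bak and the bucket is
        # configured as :digitalocean_s3 in config/storage.yml.
        configs = Rails.configuration.active_storage.service_configurations
        s3 = ActiveStorage::Service.configure(:digitalocean_s3, configs)

        ActiveStorage::Blob.find_each do |blob|
          # The disk service nests each file under the first two character
          # pairs of its key, e.g. va/j4/vaj4us85jdi33jdiw48.
          path = Rails.root.join("storage.bak", blob.key[0..1], blob.key[2..3], blob.key)
          next unless File.exist?(path)

          File.open(path, "rb") do |file|
            # Service#upload verifies the upload against the blob's stored MD5 checksum.
            s3.upload(blob.key, file, checksum: blob.checksum)
          end
        end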