
How to sync new ActiveStorage mirrors?


Starting with ActiveStorage, you can now define mirrors for storing your files:

local:
  service: Disk
  root: <%= Rails.root.join("storage") %>

amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: mybucket

mirror:
  service: Mirror
  primary: local
  mirrors:
    - amazon
    - another_mirror

If you add a mirror at a later point in time, you have to take care of copying all existing files, e.g. from "local" to "amazon" or "another_mirror".

  1. Is there a convenient method to keep the files in sync?
  2. Or is there a method to run a validation that checks whether all files are available on each service?

Solution

  • I have a couple of solutions that might work for you, one for Rails <= 6.0 and one for Rails >= 6.1:

    Firstly, you need to iterate through your ActiveStorage blobs:

    # find_each loads the blobs in batches (1,000 by default) instead of all at once
    ActiveStorage::Blob.find_each do |blob|
      # work with blob
    end
    

    then...

    1. Rails <= 6.0

      You will need the blob's key, checksum, and the local file on disk.

      local_file = ActiveStorage::Blob.service.primary.path_for blob.key
      
      # I'm picking the first mirror as an example,
      # but you can select a specific mirror if you want
      mirror = blob.service.mirrors.first
      
      # The block form closes the file once the upload has finished
      File.open(local_file, 'rb') do |file|
        mirror.upload blob.key, file, checksum: blob.checksum
      end
      

      You may also want to skip files that already exist on the mirror:

      mirror = blob.service.mirrors.first
      
      # If the file doesn't exist on the mirror, upload it
      unless mirror.exist? blob.key
        # Upload file to mirror
      end
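
      The same exist? check can also answer the second question (is every file available on every service?). The sketch below is plain Ruby so it runs without Rails: InMemoryService and missing_keys_report are hypothetical names, and the service only models exist?. In a real app the keys would come from the blobs table and the services would be the primary and mirrors of the Mirror service; keep in mind that each exist? call against S3 is a network request.

      ```ruby
      # Hypothetical in-memory stand-in for an ActiveStorage service; only exist? is modelled.
      class InMemoryService
        attr_reader :name

        def initialize(name, keys)
          @name = name
          @keys = keys
        end

        def exist?(key)
          @keys.include?(key)
        end
      end

      # Hypothetical helper: maps each service name to the keys it is missing,
      # leaving out services that already hold every key.
      def missing_keys_report(keys, services)
        services.each_with_object({}) do |service, report|
          missing = keys.reject { |key| service.exist?(key) }
          report[service.name] = missing unless missing.empty?
        end
      end

      keys = %w[a b c]
      services = [
        InMemoryService.new("local", %w[a b c]),
        InMemoryService.new("amazon", %w[a]),
        InMemoryService.new("another_mirror", %w[a b c])
      ]

      report = missing_keys_report(keys, services)
      p report
      ```

      An empty hash means every service holds every key.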
      

      Putting it together, a rake task might look like:

      # lib/tasks/active_storage.rake
      
      namespace :active_storage do
      
        desc 'Ensures all files are mirrored'
        task mirror_all: [:environment] do
      
          # Iterate through each blob (in batches)
          ActiveStorage::Blob.find_each do |blob|
      
            # We assume the primary storage is local
            local_file = ActiveStorage::Blob.service.primary.path_for blob.key
      
            # Iterate through each mirror
            blob.service.mirrors.each do |mirror|
      
              # If the file already exists on the mirror, skip it
              next if mirror.exist? blob.key
      
              # Otherwise upload it, closing the file afterwards
              File.open(local_file, 'rb') do |file|
                mirror.upload blob.key, file, checksum: blob.checksum
              end
            end
          end
        end
      end
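
      To try the logic of this task without Rails, here is a plain-Ruby sketch. TmpDiskService and InMemoryMirror are hypothetical stand-ins (the real DiskService nests keys in sub-folders, this one does not), the blobs are plain hashes, and the mirror rejects a body whose MD5 doesn't match, loosely mimicking the integrity check real services perform. It also shows that the task is idempotent: a second run uploads nothing, because every mirror already has every key.

      ```ruby
      require "digest/md5"
      require "tmpdir"

      # Hypothetical stand-in for a DiskService: each file lives at root/<key>.
      class TmpDiskService
        def initialize(root)
          @root = root
        end

        def path_for(key)
          File.join(@root, key)
        end

        def write(key, data)
          File.binwrite(path_for(key), data)
        end
      end

      # Hypothetical in-memory mirror that checks the Base64 MD5 on upload.
      class InMemoryMirror
        attr_reader :files, :upload_count

        def initialize
          @files = {}
          @upload_count = 0
        end

        def exist?(key)
          @files.key?(key)
        end

        def upload(key, io, checksum:)
          data = io.read
          raise ArgumentError, "integrity error for #{key}" unless Digest::MD5.base64digest(data) == checksum
          @files[key] = data
          @upload_count += 1
        end
      end

      # Same shape as the rake task: upload the local file to every mirror missing it.
      def mirror_all(blobs, primary, mirrors)
        blobs.each do |blob|
          mirrors.each do |mirror|
            next if mirror.exist?(blob[:key])

            File.open(primary.path_for(blob[:key]), "rb") do |file|
              mirror.upload(blob[:key], file, checksum: blob[:checksum])
            end
          end
        end
      end

      mirrors = Dir.mktmpdir do |root|
        primary = TmpDiskService.new(root)
        blobs = { "k1" => "hello", "k2" => "world" }.map do |key, data|
          primary.write(key, data)
          { key: key, checksum: Digest::MD5.base64digest(data) }
        end

        targets = [InMemoryMirror.new, InMemoryMirror.new]
        mirror_all(blobs, primary, targets)
        mirror_all(blobs, primary, targets) # second run finds everything and uploads nothing
        targets
      end

      p mirrors.map(&:upload_count)
      ```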
      

      You may run into a situation, as @Rystraum mentioned, where you need to mirror from somewhere other than the local disk. In that case, the rake task could look like this:

      # lib/tasks/active_storage.rake
      
      namespace :active_storage do
      
        desc 'Ensures all files are mirrored'
        task mirror_all: [:environment] do
      
          # All services in our Rails configuration
          all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]
      
          # Iterate through each blob (in batches)
          ActiveStorage::Blob.find_each do |blob|
      
            # Select services where the file exists
            services = all_services.select { |service| service.exist? blob.key }
      
            # Skip blob if the file doesn't exist anywhere
            next if services.empty?
      
            # Select services where the file doesn't exist; skip if there are none
            mirrors = all_services - services
            next if mirrors.empty?
      
            # Prefer a DiskService as the source, since it needs no download
            disk = services.find { |service| service.is_a? ActiveStorage::Service::DiskService }
      
            if disk
              # Upload the local file to the mirrors, rewinding before each upload
              File.open(disk.path_for(blob.key), 'rb') do |local_file|
                mirrors.each do |mirror|
                  local_file.rewind
                  mirror.upload blob.key, local_file, checksum: blob.checksum
                end
              end
            else
              # No local file: download a remote copy and upload it to the mirrors (thanks @Rystraum)
              services.first.open blob.key, checksum: blob.checksum do |temp_file|
                mirrors.each do |mirror|
                  temp_file.rewind
                  mirror.upload blob.key, temp_file, checksum: blob.checksum
                end
              end
            end
          end
        end
      end
      

      While the first rake task answers the OP's question, the latter is much more versatile:

      • It can be used with any combination of services
      • A DiskService is not required
      • Uploads from a DiskService are prioritized, so no download is needed when a local copy exists
      • Avoids extra exist? calls, since it is called only once per service per blob
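
      The download-then-upload path of the second rake task can be modelled in plain Ruby. In this sketch InMemoryService and sync are hypothetical names; open yields a StringIO as a stand-in for the tempfile a real service yields, and upload reads from the IO's current position, as a real upload would. That is why the IO is rewound before each upload: without it, the second target would receive an empty body and fail the checksum check.

      ```ruby
      require "digest/md5"
      require "stringio"

      # Hypothetical in-memory stand-in for an ActiveStorage service (not a Rails class).
      class InMemoryService
        attr_reader :name, :files

        def initialize(name, files = {})
          @name = name
          @files = files
        end

        def exist?(key)
          @files.key?(key)
        end

        # Reads the IO from its current position and verifies the Base64 MD5
        def upload(key, io, checksum:)
          data = io.read
          raise ArgumentError, "integrity error for #{key} on #{name}" unless Digest::MD5.base64digest(data) == checksum
          @files[key] = data
        end

        # Yields an IO positioned at the start of the stored bytes
        def open(key, checksum:)
          yield StringIO.new(@files.fetch(key))
        end
      end

      # Hypothetical helper: copies each blob from the first service holding it
      # to every service that is missing it.
      def sync(blobs, all_services)
        blobs.each do |blob|
          sources = all_services.select { |service| service.exist?(blob[:key]) }
          next if sources.empty? # lost everywhere: nothing to copy from

          targets = all_services - sources
          next if targets.empty?

          sources.first.open(blob[:key], checksum: blob[:checksum]) do |io|
            targets.each do |target|
              io.rewind # the previous upload left the IO at EOF
              target.upload(blob[:key], io, checksum: blob[:checksum])
            end
          end
        end
      end

      data = "only on amazon"
      blobs = [{ key: "k1", checksum: Digest::MD5.base64digest(data) }]
      local = InMemoryService.new("local")
      amazon = InMemoryService.new("amazon", { "k1" => data })
      another = InMemoryService.new("another_mirror")

      sync(blobs, [local, amazon, another])
      p [local, amazon, another].map { |service| service.exist?("k1") }
      ```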
    2. Rails >= 6.1

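      It's super easy: just call this on each blob. It enqueues an ActiveStorage::MirrorJob that copies the file from the primary service to every mirror that doesn't have it yet: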

      blob.mirror_later
      

      Wrapping it up as a rake task looks like:

      # lib/tasks/active_storage.rake
      
      namespace :active_storage do
      
        desc 'Ensures all files are mirrored'
        task mirror_all: [:environment] do
          ActiveStorage::Blob.find_each do |blob|
            blob.mirror_later
          end
        end
      end
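
      Since mirror_later only enqueues work, you may still want to validate the result afterwards (the second question). One option, sketched below in plain Ruby, is to download each copy and compare its digest with blob.checksum, which ActiveStorage by default stores as a Base64-encoded MD5 digest. InMemoryService and verify are hypothetical; download(key) borrows the name of the real service method but here simply returns a string. Downloading every file is expensive, so treat this as an occasional audit rather than a routine step.

      ```ruby
      require "digest/md5"

      # Hypothetical in-memory stand-in for a service holding key => bytes.
      class InMemoryService
        attr_reader :name

        def initialize(name, files)
          @name = name
          @files = files
        end

        def exist?(key)
          @files.key?(key)
        end

        def download(key)
          @files.fetch(key)
        end
      end

      # Hypothetical helper: returns [service_name, key, problem] for every copy that
      # is missing or whose Base64 MD5 digest differs from the recorded checksum.
      def verify(blobs, services)
        services.flat_map do |service|
          problems = blobs.map do |blob|
            if !service.exist?(blob[:key])
              [service.name, blob[:key], :missing]
            elsif Digest::MD5.base64digest(service.download(blob[:key])) != blob[:checksum]
              [service.name, blob[:key], :checksum_mismatch]
            end
          end
          problems.compact
        end
      end

      blobs = [
        { key: "k1", checksum: Digest::MD5.base64digest("hello") },
        { key: "k2", checksum: Digest::MD5.base64digest("world") }
      ]
      services = [
        InMemoryService.new("local", { "k1" => "hello", "k2" => "world" }),
        InMemoryService.new("amazon", { "k1" => "hellO" })
      ]

      problems = verify(blobs, services)
      p problems
      ```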