Tags: ruby-on-rails, amazon-s3, zip, aws-sdk-ruby

S3 Upload/Download Timeout Issues


I'm trying to build a KMZ file of geotagged images, using S3 for file storage and the AWS SDK for access, from a Ruby on Rails app on Heroku.

I'm running the file processes when the "project" view loads, but the instance methods I've written to access S3 and process the files take about 40s to complete, resulting in a 504 timeout error.

I've already looked into zipping on S3 itself without a local download, but that doesn't appear feasible. Is there a better way to approach this download/upload process to speed it up, or, failing that, a better place to run it to avoid the timeout?

The methods are called from the show action in the projects controller:

  # GET /projects/1
  # GET /projects/1.json
  def show
   @pictures = @project.pictures.all
   @project.generate_kml
   @project.download_project
   @project.generate_kmz
  end

The instance methods in full detail:

    def generate_kml
        content = []
        content.push('<?xml version="1.0" encoding="UTF-8"?>')
        content.push('<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">')
        content.push('<Document>')
        content.push("<name>#{self.id}.kmz</name>")
        #cycles through each picture in db for the current project
        self.pictures.each do |pic|
            pic_title = pic.image.to_s.split('/').last
            content.push('<Placemark>')
            content.push("<name>#{pic_title}</name>")
            content.push('<description>')
            content.push('<![CDATA[')
            line = '<img style="max-width:1000px;" src="' + '' + pic_title + '">' 
            content.push(line)
            content.push(']]>')
            content.push('</description>')
            content.push('<Point>')
            content.push("<coordinates>-#{pic.long},#{pic.lat}</coordinates>")
            content.push('</Point>')
            content.push('</Placemark>')
        end
        content.push('</Document>')
        content.push('</kml>')
        #pushes upload to S3 folder; the KML is sent as a string, so no temp file
        #(and no unrewound file handle) is involved
        s3 = Aws::S3::Resource.new
        obj = s3.bucket(ENV['S3_BUCKET']).object("uploads/#{self.id}/doc.kml")
        obj.put(body: content.join("\n") + "\n")
    end

    def generate_kmz
        #create
        directory_to_zip = "/tmp/#{self.id}"
        output_file = "/tmp/kmz_directory/#{self.id}.kmz"
        zf = ZipFileGenerator.new(directory_to_zip, output_file)
        zf.write()
        #send to S3
        s3 = Aws::S3::Resource.new
        obj = s3.bucket(ENV['S3_BUCKET']).object("uploads/kmz_directory/" + "#{self.id}.kmz")
        obj.upload_file("/tmp/kmz_directory/#{self.id}.kmz")
    end

    def download_project
        #tmp cleanup    
        #FileUtils.rm_r '/tmp'

        #delete target directory if exists
        if Dir.exist?("/tmp/#{self.id}") 
            FileUtils.remove_dir("/tmp/#{self.id}")
        end

        #create kmz_dir if needed
        FileUtils.mkdir_p "/tmp/kmz_directory"

        #create target dir
        FileUtils.mkdir "/tmp/#{self.id}" 

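        #download pics (and doc.kml); the trailing slash in the prefix keeps
        #project 1 from also matching uploads/10/..., uploads/11/..., etc.
        s3 = Aws::S3::Resource.new
        s3.bucket(ENV['S3_BUCKET']).objects(prefix: "uploads/#{self.id}/").each do |object|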
            #get file name
            full_key = object.key
            file_name = full_key.to_s.split('/').last
            #save to /tmp
            object.get(response_target: "/tmp/#{self.id}/#{file_name}")
        end

    end

Solution

  • Heroku times out any web request that takes longer than 30 seconds, so a 40-second action can never finish inside the request. Long-running work like this is normally moved to a worker dyno using something like Sidekiq or Delayed Job. Your web client can then poll the ProjectsController#show action every couple of seconds, and once the file is ready the action can render a page with a link to the KMZ file in the S3 bucket.
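
As a rough illustration of that approach, the three calls from the show action could be moved into a job object along these lines. This is only a sketch: KmzExportJob, the kmz_status attribute and its "processing"/"ready"/"failed" values are hypothetical names, not something from the original app, and the class is written as plain Ruby so it can be read on its own. In a Rails app it would presumably inherit from ApplicationJob (or include the Sidekiq or Delayed Job equivalent) and be enqueued rather than called directly.

```ruby
# Hypothetical job wrapping the slow S3 work so it can run on a worker dyno.
# Assumes the project responds to generate_kml, download_project and
# generate_kmz (as in the question) plus a kmz_status= writer, which is
# an invented attribute used here to report progress to the polling client.
class KmzExportJob
  def perform(project)
    project.kmz_status = "processing"
    project.generate_kml      # upload doc.kml to S3
    project.download_project  # pull the pictures and doc.kml into /tmp
    project.generate_kmz      # zip and upload the .kmz
    project.kmz_status = "ready"
  rescue StandardError
    # Record the failure so the client can stop polling, then re-raise so
    # the job backend can log or retry it.
    project.kmz_status = "failed"
    raise
  end
end
```

With something like this in place, the show action would only enqueue the job (for example with perform_later under Active Job) and render immediately. Each poll then checks the status and shows the download link once it reads "ready".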