Tags: ruby-on-rails, ruby-on-rails-6, rails-activestorage, prawn, combine-pdf

Validate PDF is stampable - Rails, Prawn, CombinePDF


I work at a company where we upload a large number of PDFs, which we later stamp using Prawn. Occasionally a PDF uploads and saves fine, but stamping it later fails, and our managers have to re-convert the file and re-enter a bunch of data.

So we're looking for a way to validate PDFs before they're attached, to make sure they can be stamped later, or to convert them to a PDF format we know will work with Prawn.

I have two questions:

  1. Is there anything wrong with our stamping code? (posted below)
  2. Is there any way to do that sort of validation? For example:
    • converting the file to a Prawn document before uploading
    • converting it and attempting some trivial operation before uploading
    • any other solutions
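One way to attempt the validation from question 2 without Prawn at all: the Solution at the end traced the failures to streams whose declared `/Length` is wrong, so a pre-upload check can compare each declared length against where `endstream` actually appears. This is a minimal, hypothetical heuristic in plain Ruby (the names `STREAM_HEADER`, `stream_length_mismatches` and `stampable_candidate?` are made up, not part of CombinePDF or Prawn); it skips indirect lengths such as `/Length 12 0 R` and is no substitute for actually parsing the file.

```ruby
# Hypothetical pre-upload heuristic (not part of CombinePDF or Prawn): find
# streams whose direct /Length value does not end right before "endstream".
# Indirect lengths such as "/Length 12 0 R" are skipped, not checked.
STREAM_HEADER = %r{/Length\s+(\d+)\b(?!\s+\d+\s+R)(?:(?!endobj).)*?>>\s*stream\r?\n}m

def stream_length_mismatches(pdf_bytes)
  data = pdf_bytes.b # binary copy, so string offsets are byte offsets
  mismatches = []
  pos = 0
  while (match = STREAM_HEADER.match(data, pos))
    declared = match[1].to_i
    data_start = match.end(0) # first byte of the stream data
    # A correct length lands on optional whitespace followed by "endstream".
    tail = data.byteslice(data_start + declared, 32)
    if tail&.match?(/\A\s*endstream/)
      pos = data_start + declared
    else
      mismatches << { offset: data_start, declared_length: declared }
      pos = data_start
    end
  end
  mismatches
end

# True when no mismatched stream lengths were found.
def stampable_candidate?(pdf_bytes)
  stream_length_mismatches(pdf_bytes).empty?
end
```

In a model this could back a custom validation on the uploaded bytes before `attach`. A simpler (and more faithful) alternative is to call `CombinePDF.load` on the upload inside a `rescue` and reject files that raise, since that exercises the same parser the stamping code uses.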
Here is our stamping code:

    begin
      paid_stamp_pdf_file = Tempfile.new(['paid', '.pdf'])

      paid_text = "PAID #{Date.parse(paid_on_date).strftime('%m/%d/%Y')} - $#{'%.2f' % amount}#{payment_method_text}"
      text =
        if is_paid_by_trust? && submitted_to_trust_date.present?
          "Submitted to Trust - #{submitted_to_trust_date.strftime('%m/%d/%Y')}\n#{paid_text}"
        else
          paid_text
        end

      # Render the stamp text as a single-page PDF
      Prawn::Document.generate(paid_stamp_pdf_file.path) do |pdf|
        pdf.transparent(0.6) do
          pdf.fill_color 'ff0000'
          pdf.text text, size: 30, style: :bold, align: :center, valign: :center
        end
      end

      # Stamp "PAID" onto every page of the statement
      paid_stamp = CombinePDF.load(paid_stamp_pdf_file.path).pages[0]

      # ActiveStorage::Blob#open downloads to a Tempfile, so #path always exists
      # (URI.open yields a StringIO, which has no #path, for small files)
      account_statement_file.open do |statement_file|
        pdf = CombinePDF.load(statement_file.path)
        pdf.pages.each { |page| page << paid_stamp }
        pdf.save(statement_file.path)

        file_name = account_statement_file.filename.to_s
        # No `return` inside the transaction: Rails 6.1 deprecates exiting
        # a transaction block that way
        ActiveRecord::Base.transaction do
          # purge deletes the stored file immediately; a rollback won't restore it
          account_statement_file.purge
          account_statement_file.attach(io: File.open(statement_file.path), filename: file_name, content_type: 'application/pdf')
          update!(is_paid: true, paid_date: paid_on_date, marked_paid_by_user_id: user.id)
        end
      end
      true
    rescue StandardError => e # not Exception, which would also swallow interrupts and exits
      Rails.logger.error("Failed to mark statement ID #{id}: #{e.class}: #{e.message}")
      false
    ensure
      paid_stamp_pdf_file&.close! # close and delete the stamp Tempfile
    end
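A related pitfall when a file is fetched with `URI.open`: open-uri yields a `Tempfile` for responses larger than 10 KB but a `StringIO` for smaller ones, and `StringIO` has no `path`, so code that calls `.path` on the yielded object breaks for small files. A minimal sketch of a hypothetical helper (`with_local_path` is a made-up name) that always hands back a real file path:

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: yields a filesystem path for an IO-like object.
# Objects that already have an existing #path (File, Tempfile) are used as-is;
# anything else (e.g. a StringIO from open-uri) is copied to a temp file first.
def with_local_path(io)
  if io.respond_to?(:path) && io.path && File.exist?(io.path)
    yield io.path
  else
    Tempfile.create(["download", ".pdf"]) do |tmp|
      tmp.binmode
      io.rewind if io.respond_to?(:rewind)
      IO.copy_stream(io, tmp)
      tmp.flush
      yield tmp.path # the temp file is deleted when this block returns
    end
  end
end
```

Usage would look like `URI.open(url) { |remote| with_local_path(remote) { |path| CombinePDF.load(path) } }`. In a Rails 6 app, calling `open` on the attachment (`ActiveStorage::Blob#open`) sidesteps the issue, since it always yields a `Tempfile`.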

Any help is greatly appreciated!

Versions:

  • Ruby 2.7.2
  • Rails 6.1.1
  • prawn 2.4.0
  • combine_pdf 1.0.21

Edit:

I was able to reproduce the error by loading the file from its URL. The same error occurs when parsing a downloaded copy of the file.


Solution

  • For anyone else who runs into this: CombinePDF reads each stream only up to the length the file declares for it (the stream's `/Length` entry), but some files report the wrong length, so parsing fails with `RangeError: index out of range`. Adding the workaround from the issue linked below, and then using the relaxed option it adds, fixed the problem for me. Hopefully it gets merged into the gem itself soon.

    https://github.com/boazsegev/combine_pdf/issues/191
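The error message itself fits this explanation: Ruby's `StringScanner#pos=` raises `RangeError: index out of range` when asked to move past the end of its string, which is what happens if a reader trusts an oversized `/Length`. The sketch below is a simplified, hypothetical stream reader (`read_stream` and `MalformedStream` are made-up names, NOT CombinePDF's implementation, and the `relaxed:` flag only illustrates the idea rather than reproducing the option from the linked issue): strict mode jumps by the declared length, while relaxed mode falls back to searching for `endstream`.

```ruby
require "strscan"

# Raised when a declared /Length does not end at "endstream".
class MalformedStream < StandardError; end

# Hypothetical, simplified stream reader (NOT CombinePDF's implementation).
# data_start is the byte offset just after the "stream" keyword's line break.
def read_stream(data, data_start, declared_length, relaxed: false)
  scanner = StringScanner.new(data)
  begin
    # Strict: trust /Length and jump forward. StringScanner#pos= raises
    # RangeError ("index out of range") if that lands past the end of data.
    scanner.pos = data_start + declared_length
    unless scanner.match?(/\s*endstream/)
      raise MalformedStream, "declared /Length does not end at endstream"
    end
    data.byteslice(data_start, declared_length)
  rescue RangeError, MalformedStream
    raise unless relaxed # strict mode re-raises the original error
    # Relaxed: ignore /Length and scan ahead for the endstream keyword.
    scanner.pos = data_start
    body = scanner.scan_until(/(?=\r?\nendstream)/)
    raise MalformedStream, "no endstream found" if body.nil?
    body
  end
end
```

With a stream whose data is `hello` but whose dictionary claims `/Length 50`, strict mode raises the same `RangeError` seen in the question, while relaxed mode still recovers `hello`. The trade-off is that relaxed parsing can mis-split a stream whose binary data happens to contain `endstream`.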