ruby  paper-trail-gem  redaction

How to redact information from Paper Trail's versions?


For the EU's GDPR compliance (user privacy), we need to redact personally identifiable information from the versions of our records. I've come up with something that seems to work, but I figure I should ask if there's an established way to do this.

class User < ActiveRecord::Base
  has_paper_trail
end

user = User.create! name: 'Josh'
# update! replaces update_attributes, which was deprecated in Rails 6.0 and removed in 6.1
user.update! name: 'Josh2'
user.update! name: 'Josh3'
user.destroy!

def self.get_data
  PaperTrail::Version.order(:id).where(item_id: 1).map { |ver| [ver.event, ver.object, ver.object_changes] }
end

# =====  BEFORE  =====
get_data
# => [["create", nil, {"id"=>[nil, 1], "name"=>[nil, "Josh"]}],
#     ["update", {"id"=>1, "name"=>"Josh"}, {"name"=>["Josh", "Josh2"]}],
#     ["update", {"id"=>1, "name"=>"Josh2"}, {"name"=>["Josh2", "Josh3"]}],
#     ["destroy", {"id"=>1, "name"=>"Josh3"}, nil]]

PaperTrail::Version.where_object_changes(name: 'Josh').each do |ver|
  ver.object['name'] = 'REDACTED' if ver.object && ver.object['name'] == 'Josh'
  if oc = ver.object_changes
    oc['name'] = oc['name'].map { |name| name == 'Josh' ? 'REDACTED' : name }
    ver.object_changes = oc
  end
  ver.save!
end

# =====  AFTER  =====
get_data
# => [["create", nil, {"id"=>[nil, 1], "name"=>[nil, "REDACTED"]}],
#     ["update",
#      {"id"=>1, "name"=>"REDACTED"},
#      {"name"=>["REDACTED", "Josh2"]}],
#     ["update", {"id"=>1, "name"=>"Josh2"}, {"name"=>["Josh2", "Josh3"]}],
#     ["destroy", {"id"=>1, "name"=>"Josh3"}, nil]]

UPDATE: Actually, I'm going to need to scope the record by an association, as well, so my example isn't sufficient.


Solution

  • For the EU's GDPR compliance (user privacy), we need to redact personally identifiable information from the versions of our records. I've come up with something that seems to work, but I figure I should ask if there's an established way to do this.

    No, as of today, 2018-05-30, there is no built-in feature or documented solution for GDPR redaction.

    PaperTrail provides many ways to iterate over and query the records in the versions table. where_object_changes is one such feature, but it generates some pretty complicated SQL.

    where_object_changes(name: 'Joan')
    
    SELECT "versions".*
    FROM "versions"
    WHERE .. ("versions"."object_changes" LIKE '%
    name:
    - Joan
    %' OR "versions"."object_changes" LIKE '%
    name:
    -%
    - Joan
    %')
    

    You may, justifiably, have concerns about the correctness of this query. In fact, as of PaperTrail 9.0.0, using where_object_changes to read YAML from a text column raises an error to that effect. Reading JSON from a text column or from a json/jsonb column is still allowed.

    Anyway, if I've succeeded in making you wary of such complicated SQL, then you should choose a simpler approach, perhaps iterating over all of the version records for that user (user.versions.find_each).
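
As a rough sketch of that simpler approach: since every row in user.versions already belongs to the one user, there is no need to match on the old value at all, and every non-nil value of the sensitive fields can be overwritten. The helpers below (redact_object, redact_object_changes, and the REDACTED constant) are hypothetical names, not part of PaperTrail. They assume object and object_changes come back as hashes, as they do in the question (for example, with json/jsonb columns); with text columns you would have to deserialize and re-serialize the values yourself.

```ruby
# Hypothetical helpers, not part of PaperTrail. They return redacted copies
# of the hashes stored in a version's `object` and `object_changes` columns.
REDACTED = 'REDACTED'

# `object` looks like {"id"=>1, "name"=>"Josh"} or is nil (create events).
def redact_object(object, fields)
  return nil if object.nil?

  object.to_h do |key, value|
    [key, fields.include?(key) && !value.nil? ? REDACTED : value]
  end
end

# `changes` looks like {"name"=>["Josh", "Josh2"]} or is nil (destroy events).
def redact_object_changes(changes, fields)
  return nil if changes.nil?

  changes.to_h do |key, pair|
    next [key, pair] unless fields.include?(key)

    # Keep nil as nil so "attribute was unset" is still distinguishable.
    [key, pair.map { |value| value.nil? ? nil : REDACTED }]
  end
end

# Plain-hash demonstration, runnable without Rails:
p redact_object({ 'id' => 1, 'name' => 'Josh' }, %w[name])
p redact_object_changes({ 'id' => [nil, 1], 'name' => [nil, 'Josh'] }, %w[name])

# Inside a Rails app, applying the helpers might look like this (untested
# sketch, assuming hash-valued columns as described above):
#
#   user.versions.find_each do |ver|
#     ver.object         = redact_object(ver.object, %w[name])
#     ver.object_changes = redact_object_changes(ver.object_changes, %w[name])
#     ver.save!
#   end
```

Unlike the loop in the question, this redacts every recorded value of name for that user ('Josh', 'Josh2', and 'Josh3' alike), which is probably what a GDPR erasure request calls for. It also covers the destroy version, whose object_changes is nil and which where_object_changes would therefore never return.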