ruby-on-rails postgresql activerecord upsert activerecord-import

How to use activerecord-import to upsert only when values have changed?

I'm using the activerecord-import gem in a Rails 6 app to bulk upsert data into a PostgreSQL database. Here's an example of how I'm using it:

User.import(
  [:id, :name, :age],
  [
    [1, "Alice", 30],
    [2, "Bob", 25]
  ],
  on_duplicate_key_update: {
    conflict_target: [:id],
    columns: [:name, :age]
  }
)

This works as expected for inserting and updating based on id. However, I noticed that even if the values for a given row haven't changed, PostgreSQL still performs an UPDATE. For performance reasons (especially with large datasets), I want to avoid unnecessary updates and only update if any of the fields have actually changed.

Is there a way to do this using activerecord-import? Ideally something like:

WHERE users.name IS DISTINCT FROM EXCLUDED.name OR users.age IS DISTINCT FROM EXCLUDED.age

Can activerecord-import support this kind of conditional update? If so, what would the syntax look like?


Solution

  • It appears (Docs) you can supply a :condition attribute inside the on_duplicate_key_update hash.

    :condition

    The :condition attribute optionally specifies a WHERE condition on :conflict_action. Only rows for which this expression returns true will be updated. Note that it’s evaluated last, after a conflict has been identified as a candidate to update.

    For example (using your proposed condition):

    User.import(
      [:id, :name, :age],
      [
        [1, "Alice", 30],
        [2, "Bob", 25]
      ],
      on_duplicate_key_update: {
        conflict_target: [:id],
        # Skip the UPDATE when neither column differs from the incoming row
        condition: "users.name IS DISTINCT FROM EXCLUDED.name OR users.age IS DISTINCT FROM EXCLUDED.age",
        columns: [:name, :age]
      }
    )
    

    The condition is passed straight through as a SQL string, so any valid WHERE condition should work (Source).
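
Because the condition is plain SQL, it can be generated from the same column list that is passed as columns, which keeps the two from drifting apart as columns are added. The following is a minimal sketch: changed_condition is a hypothetical helper, not part of activerecord-import, and it assumes simple, trusted column names that need no quoting.

```ruby
# Hypothetical helper (not provided by activerecord-import): builds an
# "update only if something changed" WHERE condition from a column list.
# Assumes table and column names are trusted identifiers needing no quoting.
def changed_condition(table_name, columns)
  columns
    .map { |col| "#{table_name}.#{col} IS DISTINCT FROM EXCLUDED.#{col}" }
    .join(" OR ")
end

update_columns = [:name, :age]
condition = changed_condition("users", update_columns)

# Print the generated SQL fragment for inspection
puts condition
```

The resulting string could then be passed as condition: condition alongside columns: update_columns inside on_duplicate_key_update. IS DISTINCT FROM is used rather than <> so that NULL values compare as expected.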