ruby-on-rails database postgresql activerecord sidekiq

Rails uniqueness validation didn't work with background jobs


I have an application with a model named Appointment. This model has a column named event_uid and a validation like the following:

validates :event_uid, uniqueness: true, allow_nil: true

The uniqueness validation exists only in the Rails application, not in the database (PostgreSQL).

I am using background jobs with Sidekiq on Heroku to sync some remote calendars. I am not sure what happened, but I ended up with multiple records with duplicate event_uid values, all created in the exact same second.

My guess is that something happened on the workers: either they were invoked at the same time, or the queue froze and ran the same job twice when it came back. I don't understand why Rails let the duplicates through (maybe the fact that workers run on different threads plays a role?). I added the following migration:

add_index :appointments, [:event_uid], unique: true

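I added this index in the hope that it won't happen again. Now the questions:

  • What do you think, will this be enough?
  • Is it dangerous to allow unique / presence validations to exist only on application level if you are using create / update with background jobs?
  • Any guess what could have caused the workers to run the same job more than once and in exactly the same second?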


Solution

  • The Rails uniqueness validation has been a source of confusion for a long time.

    When you persist a user instance, Rails will validate your model by running a SELECT query to see if any user records already exist with the provided email. Assuming the record proves to be valid, Rails will run the INSERT statement to persist the user.

    https://thoughtbot.com/blog/the-perils-of-uniqueness-validations

    This means that if several workers or threads run that SELECT at the same time, none of them finds an existing record, every validation passes, and each of them inserts the record.
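The interleaving described here can be replayed deterministically in plain Ruby. This is only a sketch: FakeTable and run_two_workers are hypothetical stand-ins for the appointments table and two Sidekiq workers, not ActiveRecord or Sidekiq code.

```ruby
# Hypothetical in-memory stand-in for the appointments table; not ActiveRecord.
class FakeTable
  class NotUnique < StandardError; end

  attr_reader :rows

  def initialize(unique_index: false)
    @rows = []
    @unique_index = unique_index
  end

  # What the uniqueness validation does: look for an existing value (the SELECT).
  def exists?(event_uid)
    @rows.include?(event_uid)
  end

  # The INSERT. Only a unique index re-checks the value at write time.
  def insert(event_uid)
    raise NotUnique, event_uid if @unique_index && @rows.include?(event_uid)
    @rows << event_uid
  end
end

# Replays the problematic ordering: both workers validate before either inserts.
def run_two_workers(table, event_uid)
  a_valid = !table.exists?(event_uid) # worker A: SELECT finds nothing
  b_valid = !table.exists?(event_uid) # worker B: SELECT finds nothing
  table.insert(event_uid) if a_valid  # worker A: INSERT
  table.insert(event_uid) if b_valid  # worker B: INSERT
  table.rows.count(event_uid)
end

# Validation only: both inserts succeed, so the duplicate gets through.
puts run_two_workers(FakeTable.new, "evt-1")

# With a unique index, the second INSERT is rejected at write time.
begin
  run_two_workers(FakeTable.new(unique_index: true), "evt-1")
rescue FakeTable::NotUnique
  puts "second INSERT rejected by the unique index"
end
```

The point of the sketch is that the check and the write are two separate steps, so only a constraint enforced at write time closes the gap.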

    Most of the time it is desirable to also have a unique index at the database level to avoid these race conditions. However, you then also need to handle any ActiveRecord::RecordNotUnique exception.

    What do you think, will this be enough?

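    Yes, adding an index is a good idea, but now you also need to handle ActiveRecord::RecordNotUnique. Note that the migration will fail while duplicate event_uid values still exist, so those have to be cleaned up first.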
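One possible way to handle ActiveRecord::RecordNotUnique is to treat it as "another worker already created this row" and load that row instead. The following is a minimal sketch, not a definitive implementation: create_or_fetch_by_event_uid is a hypothetical helper name, and the small stand-in module at the top exists only so the snippet runs outside Rails (ActiveRecord defines the real exception class).

```ruby
# Stand-in so the sketch runs outside Rails; ActiveRecord defines the real class.
unless defined?(ActiveRecord::RecordNotUnique)
  module ActiveRecord
    class RecordNotUnique < StandardError; end
  end
end

# Hypothetical helper. `model` is anything that responds to create! and
# find_by! the way an ActiveRecord model such as Appointment does.
def create_or_fetch_by_event_uid(model, attributes)
  model.create!(attributes)
rescue ActiveRecord::RecordNotUnique
  # Another worker won the race and inserted this event_uid first,
  # so return the row it created instead of failing the job.
  model.find_by!(event_uid: attributes[:event_uid])
end

# Intended use inside a Rails app (assumption, not run here):
#   appointment = create_or_fetch_by_event_uid(Appointment, { event_uid: uid })
```

Two caveats, as far as I know: on PostgreSQL the failed INSERT aborts any surrounding transaction, so the rescue has to sit outside the transaction block, and if the uniqueness validation stays on the model, a non-racing duplicate surfaces as ActiveRecord::RecordInvalid from create!, which this sketch does not rescue. Rails 6 and later also ship create_or_find_by, which follows the same insert-first idea.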

    Is it dangerous to allow unique / presence validations to exist only on application level if you are using create / update with background jobs?

    This depends on the application, but most of the time you want the constraint enforced at the database level too (a unique index for uniqueness, NOT NULL for presence).

    Any guess what could have caused the workers to run the same job more than once and in exactly the same second?

    Most background job libraries only guarantee that a job runs at least once, not exactly once. Your jobs should therefore always be idempotent (safe to run several times). A good read is this guide about ActiveJob design, especially the part about idempotency.
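To illustrate what idempotent means for a sync job, here is a small plain-Ruby sketch. Everything in it is hypothetical: AppointmentStore stands in for a table with a unique index on event_uid, and SyncAppointmentJob stands in for a Sidekiq job (a real one would include Sidekiq::Worker and write through the model).

```ruby
# Hypothetical store standing in for a table with a unique index on event_uid.
class AppointmentStore
  def initialize
    @rows = {}
  end

  # Insert-or-update keyed on event_uid, so repeating the call cannot add a row.
  def upsert(event_uid, attributes)
    @rows[event_uid] = (@rows[event_uid] || {}).merge(attributes)
  end

  def count
    @rows.size
  end
end

# Hypothetical job class; the store is injected only to keep the sketch runnable.
class SyncAppointmentJob
  def initialize(store)
    @store = store
  end

  # Running this twice with the same arguments leaves the same state behind.
  def perform(event_uid, attributes)
    @store.upsert(event_uid, attributes)
  end
end

store = AppointmentStore.new
job = SyncAppointmentJob.new(store)

# Simulate the queue delivering the same job twice.
2.times { job.perform("evt-1", { "starts_at" => "2024-01-01T10:00:00Z" }) }
puts store.count
```

In a Rails 6+ app on PostgreSQL, Appointment.upsert(attributes, unique_by: :event_uid) is one way to get a similar effect once the unique index exists, with the caveat that upsert skips validations and callbacks.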