I have built my Rails model schema to match the iPhone contacts, which includes multi-value email associations, etc. I have a controller action that imports an entire contacts array (likely 1,000+ objects, each potentially containing multiple email objects). I need this to run fairly efficiently, so I was looking at activerecord-import for batch importing. However, I need to validate the uniqueness of the email within the scope of each contact so that I don't keep adding duplicates every time the batch is imported. Should I build my own version of update_attributes by hand, or is there an existing solution that you might recommend for validating/updating lots of records like this?
Contact Model
class Contact < ActiveRecord::Base
  has_many :addresses
  has_many :emails
  has_many :websites

  accepts_nested_attributes_for :addresses, :emails, :websites

  attr_accessible :prefix, :first_name, :middle_name, :last_name, :suffix,
                  :nickname, :organization, :job_title, :department, :birthday,
                  :addresses_attributes, :emails_attributes, :websites_attributes
end
Email Model
class Email < ActiveRecord::Base
  belongs_to :contact

  # validates_uniqueness_of :account, :scope => :contact_id # prevents duplicate, but also skips sibling values
  # validates :contact_id, :presence => true, :on => :create # causes 422 error
  validates :account, :presence => true, :format => /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i, :on => :create

  attr_accessible :contact_id, :email_id, :account, :label
end
There is no built-in support in activerecord-import for doing this.
However, if you know both the contact_id and the email address at the time of import, you can put a unique index on the combination of contact_id and email address. In MySQL, you can then use ON DUPLICATE KEY UPDATE (activerecord-import supports this) so that a duplicate row updates the existing record instead of being inserted. With MySQL you can also use INSERT IGNORE (activerecord-import supports this too), which ignores errors for records that violate an index, so duplicates are simply skipped.
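To make the two MySQL options above concrete, here is a minimal sketch in plain Ruby that only builds the statement text; it does not execute anything and is not how activerecord-import is implemented. The table name `emails`, the columns, and the index name are assumptions taken from the models in the question, and the helper names are hypothetical.

```ruby
# Illustration only: builds MySQL statement text, nothing is executed.
# Table, column, and index names are assumptions based on the question's models.

# DDL for the composite unique index (the index name is hypothetical).
UNIQUE_INDEX_SQL =
  "CREATE UNIQUE INDEX index_emails_on_contact_id_and_account " \
  "ON emails (contact_id, account)"

# Builds "(?, ?, ...), (?, ?, ...)" with one group per row.
def placeholders(columns, rows)
  row = "(" + (["?"] * columns.size).join(", ") + ")"
  ([row] * rows.size).join(", ")
end

# Returns [sql, binds] for a multi-row INSERT that updates `update_columns`
# on rows colliding with the unique index instead of raising an error.
def upsert_statement(table, columns, rows, update_columns)
  updates = update_columns.map { |c| "#{c} = VALUES(#{c})" }.join(", ")
  sql = "INSERT INTO #{table} (#{columns.join(', ')}) " \
        "VALUES #{placeholders(columns, rows)} " \
        "ON DUPLICATE KEY UPDATE #{updates}"
  [sql, rows.flatten]
end

# Returns [sql, binds] for a multi-row INSERT that silently skips
# rows colliding with the unique index.
def insert_ignore_statement(table, columns, rows)
  sql = "INSERT IGNORE INTO #{table} (#{columns.join(', ')}) " \
        "VALUES #{placeholders(columns, rows)}"
  [sql, rows.flatten]
end
```

With activerecord-import you would not build this SQL yourself; passing the `:on_duplicate_key_update` option to `import` (for example with `[:label]`) should produce a statement of the same shape on MySQL. Either way, the unique index is what makes the database detect the duplicate.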
If you are using another RDBMS like PostgreSQL or SQLite, then you will want to look at their docs to see how you can ignore rows that violate key constraints (like INSERT IGNORE). I do not believe either supports anything similar to ON DUPLICATE KEY UPDATE out of the box.
If you don't know both the contact_id and the email address at the time you want to import then you'll have to do a little upfront work in your code to have enough information about whether you'd be duplicating or creating.
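One way to do that upfront work, sketched in plain Ruby: load the (contact_id, account) pairs that already exist for the contacts in the batch, then split the incoming rows into ones to create and ones that would be duplicates. The helper name `partition_emails` and the hash keys are hypothetical, and comparing addresses case-insensitively is an assumption you may not want.

```ruby
require "set"

# Hypothetical helper: splits incoming email rows into rows to create and
# rows that duplicate an existing (contact_id, account) pair.
# Accounts are compared case-insensitively (an assumption).
def partition_emails(incoming, existing_pairs)
  seen = Set.new(existing_pairs.map { |contact_id, account| [contact_id, account.downcase] })
  to_create = []
  duplicates = []

  incoming.each do |row|
    key = [row[:contact_id], row[:account].downcase]
    # Set#add? returns nil when the key is already present, so this also
    # catches duplicates that appear twice within the same batch.
    if seen.add?(key)
      to_create << row
    else
      duplicates << row
    end
  end

  [to_create, duplicates]
end
```

Only the `to_create` rows would then be handed to the batch import; the `duplicates` rows can be skipped or used to update the existing records.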
Hope this helps.