scala performance data-structures collections scala-collections

Dedup List of case class


I have a list of case class instances:

case class MyClass(id1: String, id2: String, nb: Int)

val myList = List(
  MyClass("id1", "id2", 3),
  MyClass("id2", "id4", 4),
  MyClass("id2", "id1", 3), // <- Delete this one (dedup with MyClass("id1", "id2", 3))
  MyClass("id4", "id2", 4), // <- Delete this one (dedup with MyClass("id2", "id4", 4))
  MyClass("id5", "id6", 12)
)

myList.foldLeft(List[MyClass]()) {
  (acc, elem) =>
    // Check the accumulator, not myList: checking myList would drop both members of a swapped pair
    if (acc.contains(MyClass(elem.id2, elem.id1, elem.nb))) {
      acc
    } else {
      acc :+ elem
    }
}

I want to remove an entry as a duplicate when its id1 and id2 are swapped relative to another entry with the same nb value (as described in the code comments).

I tried foldLeft, but it is very slow: my list has 400k+ entries, and each element triggers a linear contains scan.

Does anyone have a faster solution for this use case?

Thanks!


Solution

  • If only the ids are considered for equality, one option is to use distinctBy (available since Scala 2.13) on an "ordered" tuple:

    val result = myList.distinctBy(c => if (c.id1 > c.id2) (c.id1, c.id2) else (c.id2, c.id1))
    

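    If all three members should be considered, I would "reorder" the ids in MyClass and use distinct (note that the result then contains the reordered instances, with id1 <= id2, rather than the original ones):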

    val result = myList.map {
        case MyClass(id1, id2, v) if id1 > id2 => MyClass(id2, id1, v)
        case c => c
      }.distinct
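
The two options above can also be combined. The following is a minimal sketch, not a definitive implementation: it assumes Scala 2.13+ (for distinctBy) and builds an order-independent key from the two sorted ids plus nb, so all three members are compared while the first occurrence is kept unchanged. The names DedupSketch, pairKey and dedup are hypothetical helpers introduced for illustration, and the sample data follows the comments in the question.

```scala
final case class MyClass(id1: String, id2: String, nb: Int)

object DedupSketch {
  // Order-independent key: the two ids in sorted order, plus nb.
  // (pairKey is a hypothetical helper, not part of the original answer.)
  def pairKey(c: MyClass): (String, String, Int) =
    if (c.id1 <= c.id2) (c.id1, c.id2, c.nb) else (c.id2, c.id1, c.nb)

  // Keeps the first element seen for each key and leaves it unmodified.
  // Requires Scala 2.13+ for distinctBy.
  def dedup(xs: List[MyClass]): List[MyClass] = xs.distinctBy(pairKey)

  def main(args: Array[String]): Unit = {
    val myList = List(
      MyClass("id1", "id2", 3),
      MyClass("id2", "id4", 4),
      MyClass("id2", "id1", 3), // swapped duplicate of the first entry
      MyClass("id4", "id2", 4), // swapped duplicate of the second entry
      MyClass("id5", "id6", 12)
    )
    dedup(myList).foreach(println)
  }
}
```

Because distinctBy tracks the keys it has already seen in a hash set, this takes roughly linear expected time, whereas a foldLeft that calls contains on a list for every element is quadratic, which is what hurts at 400k+ entries.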