go deduplication

Can we write a generic array/slice deduplication in Go?


Is there a way to write a generic array/slice deduplication in Go? For []int we can have something like this (from http://rosettacode.org/wiki/Remove_duplicate_elements#Go):

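func uniq(list []int) []int {
    // Use the map keys as a set of the distinct values.
    unique_set := make(map[int]bool, len(list))
    for _, x := range list {
        unique_set[x] = true
    }
    // Copy the keys out; map iteration order is unspecified,
    // so the result does not preserve the input order.
    result := make([]int, len(unique_set))
    i := 0
    for x := range unique_set {
        result[i] = x
        i++
    }
    return result
}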

But is there a way to extend it to support any slice, with a signature like:

func deduplicate(a []interface{}) []interface{}

I know that you can write a function with that signature, but then you can't use it directly on an []int. You need to create a []interface{}, copy everything from the []int into it, pass it to the function, get a []interface{} back, and then go through that result and copy everything into a new []int.
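
To make that overhead concrete, the round trip might look like the sketch below. Only the signature of deduplicate comes from the question; its body and the helper name uniqInts are assumptions made for illustration.

```go
package main

import "fmt"

// deduplicate has the signature from the question. The body is an assumed
// implementation; it only works when every element is comparable (using a
// slice or a map as a map key panics at run time).
func deduplicate(a []interface{}) []interface{} {
	seen := make(map[interface{}]bool, len(a))
	result := make([]interface{}, 0, len(a))
	for _, x := range a {
		if !seen[x] {
			seen[x] = true
			result = append(result, x)
		}
	}
	return result
}

// uniqInts (hypothetical helper) shows the two copies needed for []int.
func uniqInts(list []int) []int {
	// Copy 1: []int cannot be passed as []interface{}, so box each element.
	boxed := make([]interface{}, len(list))
	for i, x := range list {
		boxed[i] = x
	}

	unique := deduplicate(boxed)

	// Copy 2: unbox each element with a type assertion.
	result := make([]int, len(unique))
	for i, x := range unique {
		result[i] = x.(int)
	}
	return result
}

func main() {
	fmt.Println(uniqInts([]int{3, 1, 3, 2, 1})) // prints [3 1 2]
}
```

Every element type would need its own pair of copy loops like the ones in uniqInts.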

My question is, is there a better way to do this?


Solution

  • While VonC's answer probably comes closest to what you really want, the only real way to do it in native Go without gen is to define an interface:

    // IDList is a list whose elements can each be mapped to an int ID.
    type IDList interface {
        // ID returns the ID of the element at index i.
        ID(i int) int

        // GetByID returns the element with the given ID.
        GetByID(id int) interface{}

        // Len returns the number of elements in the list.
        Len() int

        // Insert adds the element to the list.
        Insert(interface{})
    }

    // Deduplicate inserts one element of list per distinct ID into dst.
    func Deduplicate(dst, list IDList) {
        intList := make([]int, list.Len())
        for i := range intList {
            intList[i] = list.ID(i)
        }

        uniques := uniq(intList)
        for _, el := range uniques {
            dst.Insert(list.GetByID(el))
        }
    }

    Here uniq is the function from your question.
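
    As a usage sketch, a caller might satisfy IDList roughly as follows. The user and userList types are hypothetical and exist only for this example; uniq, IDList and Deduplicate are repeated (with comments trimmed) so that the program compiles on its own.

```go
package main

import "fmt"

// uniq returns the distinct values of list in unspecified order,
// like the function in the question.
func uniq(list []int) []int {
	set := make(map[int]bool, len(list))
	for _, x := range list {
		set[x] = true
	}
	result := make([]int, 0, len(set))
	for x := range set {
		result = append(result, x)
	}
	return result
}

// IDList and Deduplicate follow the definitions in the answer.
type IDList interface {
	ID(i int) int
	GetByID(id int) interface{}
	Len() int
	Insert(interface{})
}

func Deduplicate(dst, list IDList) {
	intList := make([]int, list.Len())
	for i := range intList {
		intList[i] = list.ID(i)
	}
	for _, el := range uniq(intList) {
		dst.Insert(list.GetByID(el))
	}
}

// user and userList are hypothetical types used only for this sketch.
type user struct {
	id   int
	name string
}

type userList struct{ users []user }

func (l *userList) ID(i int) int { return l.users[i].id }
func (l *userList) Len() int     { return len(l.users) }

// GetByID returns the first user with the given id, or nil if none exists.
func (l *userList) GetByID(id int) interface{} {
	for _, u := range l.users {
		if u.id == id {
			return u
		}
	}
	return nil
}

// Insert appends v if it is a user and silently ignores anything else.
func (l *userList) Insert(v interface{}) {
	if u, ok := v.(user); ok {
		l.users = append(l.users, u)
	}
}

func main() {
	src := &userList{users: []user{{1, "ann"}, {2, "bob"}, {1, "ann again"}}}
	dst := &userList{}
	Deduplicate(dst, src)
	fmt.Println(dst.Len()) // prints 2; the order of dst.users is unspecified
}
```

    Because uniq iterates over a map, the order of the elements in dst is not guaranteed to match the order in src.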

    This is just one possible example, and there are probably much better ones. In general, though, mapping each element to a unique ID that can be compared with ==, and then either constructing a new list or culling the existing one based on the deduplicated IDs, is probably the most intuitive way.
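
    The same ID-mapping idea can also be sketched with type parameters, assuming a toolchain that supports them (Go 1.18 or later), which the interface-based approach does not require. The name dedupBy and the user type are hypothetical; this is one possible shape, not an established API.

```go
package main

import "fmt"

// dedupBy (hypothetical name) keeps the first element seen for each ID.
// id maps an element to any comparable ("==able") key.
func dedupBy[T any, K comparable](list []T, id func(T) K) []T {
	seen := make(map[K]struct{}, len(list))
	result := make([]T, 0, len(list))
	for _, el := range list {
		k := id(el)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		result = append(result, el)
	}
	return result
}

// user is a hypothetical element type for the second call below.
type user struct {
	id   int
	name string
}

func main() {
	ints := dedupBy([]int{3, 1, 3, 2, 1}, func(x int) int { return x })
	fmt.Println(ints) // prints [3 1 2]

	users := []user{{1, "ann"}, {2, "bob"}, {1, "ann again"}}
	unique := dedupBy(users, func(u user) int { return u.id })
	fmt.Println(len(unique)) // prints 2
}
```

    Unlike uniq, this sketch walks the input in order and keeps the first occurrence of each ID, so the result preserves the input order.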

    An alternative is to take a []IDer, where the IDer interface has the single method ID() int. However, that means user code has to create the []IDer and copy all the elements into it, which is a bit ugly. It's cleaner for the user to wrap the list as an IDList rather than copy it, but it's a similar amount of work either way.
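
    A minimal sketch of the []IDer variant, including the copy step it pushes onto the caller, could look like this. Only the IDer interface comes from the text; the function name dedupIDers and the user type are assumptions for illustration.

```go
package main

import "fmt"

// IDer is the interface named in the text: one method returning the ID.
type IDer interface {
	ID() int
}

// dedupIDers (hypothetical name) returns the first element seen per ID.
func dedupIDers(list []IDer) []IDer {
	seen := make(map[int]bool, len(list))
	result := make([]IDer, 0, len(list))
	for _, el := range list {
		if id := el.ID(); !seen[id] {
			seen[id] = true
			result = append(result, el)
		}
	}
	return result
}

// user is a hypothetical element type that implements IDer.
type user struct {
	id   int
	name string
}

func (u user) ID() int { return u.id }

func main() {
	users := []user{{1, "ann"}, {2, "bob"}, {1, "ann again"}}

	// The copy step: a []user is not assignable to []IDer,
	// so each element has to be copied across individually.
	iders := make([]IDer, len(users))
	for i, u := range users {
		iders[i] = u
	}

	for _, el := range dedupIDers(iders) {
		fmt.Println(el.(user).name)
	}
	// prints:
	// ann
	// bob
}
```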