javaarraysoptimization

How to efficiently remove duplicates from an array without using Set


I was asked to write my own implementation to remove duplicated values in an array. Here is what I have created. But after tests with 1,000,000 elements it took very long time to finish. Is there something that I can do to improve my algorithm or any bugs to remove ?

I need to write my own implementation - not to use Set, HashSet etc. Or any other tools such as iterators. Simply an array to remove duplicates.

public static int[] removeDuplicates(int[] arr) {

    int end = arr.length;

    for (int i = 0; i < end; i++) {
        for (int j = i + 1; j < end; j++) {
            if (arr[i] == arr[j]) {                  
                int shiftLeft = j;
                for (int k = j+1; k < end; k++, shiftLeft++) {
                    arr[shiftLeft] = arr[k];
                }
                end--;
                j--;
            }
        }
    }

    int[] whitelist = new int[end];
    for(int i = 0; i < end; i++){
        whitelist[i] = arr[i];
    }
    return whitelist;
}

Solution

  • Since this question is still getting a lot of attention, I decided to answer it by copying this answer from Code Review.SE:

    You're following the same philosophy as the bubble sort, which is very, very, very slow. Have you tried this?:

    • Sort your unordered array with quicksort. Quicksort is much faster than bubble sort (I know, you are not sorting, but the algorithm you follow is almost the same as bubble sort to traverse the array).

    • Then start removing duplicates (repeated values will be next to each other). In a for loop you could have two indices: source and destination. (On each loop you copy source to destination unless they are the same, and increment both by 1). Every time you find a duplicate you increment source (and don't perform the copy). @morgano