I have implemented the select/median of medians algorithm using the following as a reference http://www.ics.uci.edu/~eppstein/161/960130.html (this has previously been linked here Median of Medians in Java).
My code seems to work for small arrays (~100) and even works for arrays of size 100001 http://pastebin.com/mwRc4Hig (answer 5008), but then fails on an input array of size 10001 http://pastebin.com/YwVBmgDk (answer 4960, my code outputs 4958).
Note that the correct answers for the texts above are equivalent to sorting the array and returning the element at array[array.length / 2], regardless of whether the array size is even or odd.
I'm not sure how to debug this issue. The functionality seems arbitrary and I'm just lost. Here below is my code:
public class MedianOfMedians {
public static void main(String[] args) {
MedianOfMedians mds = new MedianOfMedians();
mds.run();
}
private void run() {
Scanner in = new Scanner(System.in);
int n = in.nextInt();
int[] numArray = new int[n];
for (int i = 0; i < n; i++) {
numArray[i] = in.nextInt();
}
int median = select(numArray, numArray.length / 2);
System.out.print(median);
}
private int select(int[] numArray, int k) {
if (numArray.length <= 10) {
int[] sorted = insertionSort(numArray);
return sorted[k];
}
int divCount = (numArray.length % 5 == 0) ? numArray.length / 5 - 1 : numArray.length / 5;
int[] medOfMed = new int[divCount + 1];
int counter = 0;
int[] subArray;
while (counter <= divCount) {
subArray = splitByFive(counter, divCount, numArray);
medOfMed[counter] = select(subArray, subArray.length / 2);
counter++;
}
int M = select(medOfMed, numArray.length / 10);
List<Integer> lt = new ArrayList<>();
List<Integer> eq = new ArrayList<>();
List<Integer> gt = new ArrayList<>();
for (int i : numArray) {
if (i < M) {
lt.add(i);
} else if (i == M) {
eq.add(i);
} else {
gt.add(i);
}
}
if (k < lt.size()) {
return select(createArray(lt), k);
} else if (k > lt.size() + eq.size()) {
return select(createArray(gt), k - lt.size() - eq.size());
} else {
return M;
}
}
private int[] splitByFive(int splitIter, int divisions, int[] toSplit) {
int numToCopy;
if (splitIter == divisions) {
numToCopy = toSplit.length - (5 * splitIter);
} else {
numToCopy = 5;
}
int[] subArray = new int[numToCopy];
System.arraycopy(toSplit, splitIter * 5, subArray, 0, numToCopy);
return subArray;
}
private int[] createArray(List<Integer> list) {
int[] result = new int[list.size()];
for (int i = 0; i < list.size(); i++) {
result[i] = list.get(i);
}
return result;
}
private int[] insertionSort(int[] numArray) {
for (int i = 1; i < numArray.length; i++) {
int j = i;
while (j - 1 >= 0 && numArray[j] < numArray[j - 1]) {
int temp = numArray[j];
numArray[j] = numArray[j - 1];
numArray[j - 1] = temp;
j--;
}
}
return numArray;
}
}
I don't have time to debug your code, but maybe I can offer a debugging technique for you to try yourself that's useful for recursive algorithms like this.
If there is an input that the algorithm fails on (and there is, as you found) then there is a smallest such input -- and the smaller this input, the easier it is to figure out what's going wrong. Because the algorithm is recursive, you have a nice way to isolate the first place that things go wrong: you can test that the result you are about to return from select()
is correct (using a slow, trusted method like copying the data to a temporary buffer, sorting it and then grabbing the half-way element) just before returning the value. Doing this will be much easier if you rearrange the function to use just a single return
statement, e.g.:
private int select(int[] numArray, int k) {
int knownCorrectAnswer = selectSlowlyButDefinitelyCorrectly(numArray, k);
int willReturn;
if (numArray.length <= 10) {
int[] sorted = insertionSort(numArray);
willReturn = sorted[k]; // Just remember what we will return
} else { // Need to add else branch here now
...
if (k < lt.size()) {
willReturn = select(createArray(lt), k);
} else if (k > lt.size() + eq.size()) {
willReturn = select(createArray(gt), k - lt.size() - eq.size());
} else {
willReturn = M;
}
} // End of inserted else branch
if (willReturn == knownCorrectAnswer) {
return willReturn;
} else {
yell("First problem occurs with numArray=<...> and k=<...>!");
}
}
yell()
should print out the entire problem instance and halt the program (e.g. by throwing an exception). The nice thing about this setup is that you know that when yell()
gets called, every call to select()
that has already completed was correct -- since if it wasn't, yell()
would have already been called and the program would have halted before now. So the output produced by yell()
is guaranteed to be the first (not necessarily the smallest, but often that also) subproblem in which things went wrong.