javasortingbinaryradix-sortcounting-sort

Radix sort with counting sort incorrectly sorts letters converted to binary


I'm having trouble understanding why my code for radix sort with counting sort doesn't correctly sort the input when I convert it to binary. I'm basically using the same code for letters represented as decimal and they work just fine but here it isn't even close.

Below is the code that takes part in binary radix sort:

    static String[] countSort(String[] input, int position)
    {
        int[] count = new int[2];
        int n = input.length;

        char temp;
        for (String value : input) {
            temp = value.charAt(value.length()-1 - position);

            count[temp-'0']++;
        }

        for (int i = 1; i < 2; i++) {
            count[i] = count[i] + count[i - 1];
        }

        String[] output = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            temp = input[i].charAt(input[i].length()-1 - position);

            output[count[temp-'0']-1] = input[i];
            count[temp-'0']--;
        }

        return output;
    }

    public static String[] radixSortBinary(String str, int stringLength) {

        //convert letters to binary
        char[] charArr = str.toCharArray();
        String[] array = new String[charArr.length];
        for (int i=0; i<charArr.length; i++)
            array[i] = Integer.toBinaryString(charArr[i]);

        System.out.println("Binary input:" + Arrays.toString(array));

        //iterate over each character position (starting from the least significant)
        for (int i = stringLength-1; i >= 0; --i) {
            array = countSort(array, i);
        }


        System.out.println("Binary output:" + Arrays.toString(array));

        //convert back to letters
        StringBuilder sb = new StringBuilder();
        for (int i=0; i<array.length; i++) {
            Arrays.stream(array[i].split("(?<=\\G.{7})")).forEach(s -> sb.append((char) Integer.parseInt(s, 2)));
            array[i] = sb.toString();
            sb.setLength(0);
        }

        return array;
    }

    public static void main(String[] args) {

    Scanner scan = new Scanner(System.in);
    String input2 = scan.next();

    String[] result = radixSortBinary(input2, 7);
    System.out.println("Output:" + Arrays.toString(result));
}

console:

input: 
ababababababa
Binary input:[1100001, 1100010, 1100001, 1100010, 1100001, 1100010, 1100001, 1100010, 1100001, 1100010, 1100001, 1100010, 1100001]
Binary output:[1100001, 1100010, 1100001, 1100010, 1100001, 1100010, 1100001, 1100010, 1100001, 1100010, 1100001, 1100010, 1100001]
Output:[a, b, a, b, a, b, a, b, a, b, a, b, a]

another case:

input: 
abcdefgdftglkgfdj
Binary input:[1100001, 1100010, 1100011, 1100100, 1100101, 1100110, 1100111, 1100100, 1100110, 1110100, 1100111, 1101100, 1101011, 1100111, 1100110, 1100100, 1101010]
Binary output:[1100100, 1100100, 1100100, 1110100, 1101100, 1100010, 1101010, 1100110, 1100110, 1100110, 1100001, 1100101, 1100011, 1101011, 1100111, 1100111, 1100111]
Output:[d, d, d, t, l, b, j, f, f, f, a, e, c, k, g, g, g]

Any help would be greatly appreciated!


Solution

  • The bug is here:

    //iterate over each character position (starting from the least significant)
    for (int i = stringLength-1; i >= 0; --i) {
        array = countSort(array, i);
    }
    

    That code does not do what the comment says it does. Actually this iterates over each character position starting from the most significant character. There is such as thing as MSB-radix sort, but that's different.

    Reversing that loop made it work correctly for me.

    By the way you should probably ensure that all binary strings have the same length, padded with leading zeroes if necessary. Working directly on the bits of the integer representations of characters would not require that as an explicit step.