javastringperformanceapache-stringutils

Understanding StringUtils.join performance decisions


I was looking at the implementation of Apache Commons' StringUtils.join method and stumbled upon a line I assume is thought for performance but I don't understand why they did it the way it is, with those specific values.

Here's the implementation:

public static String join(Object[] array, String separator, int startIndex, int endIndex) {
    if (array == null) {
        return null;
    }
    if (separator == null) {
        separator = EMPTY;
    }

    // endIndex - startIndex > 0:   Len = NofStrings *(len(firstString) + len(separator))
    //           (Assuming that all Strings are roughly equally long)
    int noOfItems = (endIndex - startIndex);
    if (noOfItems <= 0) {
        return EMPTY;
    }

    StringBuilder buf = new StringBuilder(noOfItems * 16); // THE QUESTION'S ABOUT THIS LINE

    for (int i = startIndex; i < endIndex; i++) {
        if (i > startIndex) {
            buf.append(separator);
        }
        if (array[i] != null) {
            buf.append(array[i]);
        }
    }
    return buf.toString();
}

My questions regard the StringBuilder buf = new StringBuilder(noOfItems * 16); line:


Solution

  • 16 is a slight over-estimate (presumably based on experience/statistics) of the expected average size of the strings-with-separator.

    Pre-allocating enough space to hold the entire result avoids replacing the backing array during execution with a larger (double the size) array and copying over the elements (which is an O(n) operation).

    Over estimating, even by quite a bit, to allocate a larger array is worth the cost if it avoids the replacement operation in most situations.