
String truncation in Java based on some conditions


I need to truncate a string to a maximum character count, subject to some conditions. The conditions are,

I wrote the code below for this requirement, but it fails when the string has two consecutive ending characters such as , and ; . My code follows:

    public String getTruncateText(String text, int count) {
        int textLength = text.length();
        String truncatedText = text.substring(0, Math.min(count, textLength)).trim();
        int index = StringUtils.lastIndexOfAny(truncatedText,
                new String[] {" ",".",",",";",":","-","。","、",":",",",";"});
        return truncatedText.substring(0, index > 0 ? index : truncatedText.length()) + "...";
    }

    @Test
    public void Test() {
        String text = "I want to truncate text, Test"; 
        assertThat(getTruncateText(text, 15)).isEqualTo("I want to..."); //Success
        assertThat(getTruncateText(text, 25)).isEqualTo("I want to truncate text..."); //Success
        assertThat(getTruncateText(text, 1)).isEqualTo("I..."); //Success
        assertThat(getTruncateText(text, 2)).isEqualTo("I..."); //Success
        assertThat(getTruncateText(text, 300)).isEqualTo("I want to truncate text..."); //Failed
    }

Since I am new to the Java world, apologies for the bad code... :)

Thanks in advance. Cheers!!!


Solution

  • What you need is something like StringUtils.lastIndexOfAnyBut, which would return the index of the last character that is NOT in a given set of characters.

    I have fixed your code, and there are several things to point out.

        // Assumes org.apache.commons.lang3.StringUtils (Apache Commons Lang 3) is imported
        public static String getTruncateText(String text, int count) {
            String truncatedText = text.substring(0, Math.min(count, text.length())).trim();
            String[] endCharacters = new String[] {" ",".",",",";",":","-","。","、",":",",",";"};

            // Cut at the last ending character, if there is one
            int index = StringUtils.lastIndexOfAny(truncatedText, endCharacters);
            truncatedText = truncatedText.substring(0, index >= 0 ? index : truncatedText.length());

            // Find the index of the first non-ending character in the reversed string
            int indexReversed = StringUtils.indexOfAnyBut(StringUtils.reverse(truncatedText), String.join("", endCharacters));
            // Map it back to the original string: (length - 1) - indexReversed
            index = indexReversed >= 0 ? truncatedText.length() - indexReversed - 1 : -1;

            // The character at index must be kept, so the exclusive end index is index + 1
            return truncatedText.substring(0, index >= 0 ? index + 1 : 0) + "...";
        }
    

    Originally, you used index > 0 here. However, when index is 0 (the only ending character is the very first character), the result is supposed to be "...". With index > 0 the code takes the else branch instead, uses the length of the text as the end index, and returns the whole truncated string.

    int index = StringUtils.lastIndexOfAny(truncatedText, endCharacters);
    truncatedText = truncatedText.substring(0, index >= 0 ? index : truncatedText.length());
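
To make the boundary case concrete, here is a minimal plain-JDK sketch. The lastIndexOfAny method below is a hypothetical stand-in for StringUtils.lastIndexOfAny (so the snippet runs without Commons Lang), and cutBefore is an illustrative name that does not appear in the code above. The input ",abc" is chosen so that the last ending character sits at index 0.

```java
public class IndexBoundaryDemo {
    // Hypothetical stand-in for StringUtils.lastIndexOfAny: the greatest index at
    // which any of the search strings occurs, or -1 if none of them occurs.
    public static int lastIndexOfAny(String text, String... searchStrings) {
        int best = -1;
        for (String s : searchStrings) {
            best = Math.max(best, text.lastIndexOf(s));
        }
        return best;
    }

    // Cuts text just before its last ending character. inclusiveZero selects the
    // corrected check (index >= 0) when true and the original check (index > 0) when false.
    public static String cutBefore(String text, boolean inclusiveZero) {
        int index = lastIndexOfAny(text, ",", ";", " ");
        boolean found = inclusiveZero ? index >= 0 : index > 0;
        return text.substring(0, found ? index : text.length());
    }

    public static void main(String[] args) {
        System.out.println("[" + cutBefore(",abc", false) + "]"); // prints "[,abc]"
        System.out.println("[" + cutBefore(",abc", true) + "]");  // prints "[]"
    }
}
```

With index > 0 the match at position 0 is treated as "not found" and the whole string comes back. With index >= 0 the string is cut at position 0, which is what lets the final result collapse to "...".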
    

    Having done some research, I found that people DID try to implement lastIndexOfAnyBut for the library, but it has never been added to any released version. For more information, you can check out this thread.

    So I use indexOfAnyBut on the reversed truncatedText instead, to find the first occurrence of a non-ending character before the ending character (e.g. the final "t" of "text" in "I want to truncate text, Test"). Note that String.join is used here because indexOfAnyBut doesn't accept a String[] as its second argument.

    // Find the index of the first non-ending character in the reversed string
    int indexReversed = StringUtils.indexOfAnyBut(StringUtils.reverse(truncatedText), String.join("", endCharacters));
    // Map it back to the original string: (length - 1) - indexReversed
    index = indexReversed >= 0 ? truncatedText.length() - indexReversed - 1 : -1;
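
If you would rather not reverse the string, the same idea can be sketched with a hand-written backward scan. This is only an illustration under stated assumptions: lastIndexOfAnyBut and lastIndexOfAny below are hypothetical helpers written with the plain JDK, not Apache Commons Lang methods, and END_CHARS is an assumed single-character version of the ending-character list from the question.

```java
public class TruncateSketch {
    // Assumed ending characters: space . , ; : - plus the ideographic full stop
    // (\u3002) and ideographic comma (\u3001), written as Unicode escapes.
    private static final String END_CHARS = " .,;:-\u3002\u3001";

    // Hypothetical helper: index of the last character of text that is NOT in
    // chars, or -1 if every character of text is in chars (or text is empty).
    public static int lastIndexOfAnyBut(String text, String chars) {
        for (int i = text.length() - 1; i >= 0; i--) {
            if (chars.indexOf(text.charAt(i)) < 0) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical helper: index of the last character of text that IS in chars, or -1.
    public static int lastIndexOfAny(String text, String chars) {
        for (int i = text.length() - 1; i >= 0; i--) {
            if (chars.indexOf(text.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static String getTruncateText(String text, int count) {
        String truncated = text.substring(0, Math.min(count, text.length())).trim();
        // Step 1: cut at the last ending character, if there is one.
        int cut = lastIndexOfAny(truncated, END_CHARS);
        if (cut >= 0) {
            truncated = truncated.substring(0, cut);
        }
        // Step 2: drop any run of ending characters left at the end.
        // When nothing is left, last is -1 and the substring is empty.
        int last = lastIndexOfAnyBut(truncated, END_CHARS);
        return truncated.substring(0, last + 1) + "...";
    }

    public static void main(String[] args) {
        String text = "I want to truncate text, Test";
        System.out.println(getTruncateText(text, 15));  // prints "I want to..."
        System.out.println(getTruncateText(text, 300)); // prints "I want to truncate text..."
    }
}
```

The backward scan avoids building a reversed copy and the index arithmetic that goes with it, at the cost of maintaining two small helpers yourself.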