
Alternative to streams for sorting frequently occurring words


I have a method that takes a List of Strings as an argument and reads it. It then sorts the words by frequency; words with the same frequency are printed alphabetically. (Note that there are also Russian words, and they always go beneath the English words.)

Here is an example of the expected output:

лицами-18
Apex-15
azet-15
xder-15
анатолю-15
андреевич-15
батальона-15
hello-13
zello-13
полноте-13

And here is my code:

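import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Words {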

public String countWords(List<String> lines) {

    StringBuilder input = new StringBuilder();
    StringBuilder answer = new StringBuilder();

    for (String line : lines){
        if(line.length() > 3){
            if(line.substring(line.length() - 1).matches("[.?!,]+")){
                input.append(line.substring(0,line.length()-1)).append(" ");
            }else{
                input.append(line).append(" ");
            }
        }
    }

    String[] strings = input.toString().split("\\s");

    List<String> list = new ArrayList<>(Arrays.asList(strings));

    Map<String, Integer> unsortMap = new HashMap<>();
    while (list.size() != 0){
        String word = list.get(0);
        int freq = Collections.frequency(list, word);
        if (word.length() >= 4 && freq >= 10){
            unsortMap.put(word.toLowerCase(), freq);
        }

        list.removeAll(Collections.singleton(word));
    }
    //The Stream logic is here
    List<String> sortedEntries = unsortMap.entrySet().stream()
            .sorted(Comparator.comparingLong(Map.Entry<String, Integer>::getValue)
                    .reversed()
                    .thenComparing(Map.Entry::getKey)
            )
            .map(it -> it.getKey() + " - " + it.getValue())
            .collect(Collectors.toList());
    
    //Logic ends here

    for (int i = 0; i < sortedEntries.size(); i++) {
        if(i<sortedEntries.size()-1) {
            answer.append(sortedEntries.get(i)).append("\n");
        }
        else{
            answer.append(sortedEntries.get(i));
        }
    }

    return answer.toString();

 }
}

My issue: the code currently works and gives correct results, but as you can see, I am using streams to sort the strings. I am interested in whether there is another way to write this without streams. More precisely: is there another way to sort Strings by frequency and then alphabetically (when they have the same frequency) without using streams?


Solution

  • Anything you can do in streams you can do in conventional Java. But using streams usually makes for much shorter, simpler, and easier-to-read code!

    By the way, the word-counting part of your code could be replaced with simply this, assuming `words` is a list of individual words:

    Map < String, AtomicInteger > map = new HashMap <>();
    for ( String word : words ) {
        map.putIfAbsent( word , new AtomicInteger( 0 ) );
        map.get( word ).incrementAndGet();
    }
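
    The counting loop above does not apply the Question's filters (words of at least 4 characters that occur at least 10 times). As a minimal sketch of one way to fold those in without streams, the following uses `Map.merge` with plain `Integer` values. The class name `WordCounter`, the method `count`, and its parameters are hypothetical, and lower-casing before counting is an assumption (the Question's code counts case-sensitively and lower-cases only the key).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
final class WordCounter {

    // Counts lower-cased words, then keeps only words that are at least
    // minLength characters long and occur at least minCount times.
    static Map<String, Integer> count(List<String> words, int minLength, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        // Drop words that are too short or too rare.
        counts.entrySet().removeIf(
                e -> e.getKey().length() < minLength || e.getValue() < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Apex", "apex", "azet", "to", "to", "to");
        System.out.println(count(words, 4, 2)); // prints {apex=2}
    }
}
```

    For the Question's thresholds the call would be `WordCounter.count(words, 4, 10)`.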
    

    The second half of your code is reporting on a map by sorting first on value, then on key.

    That challenge is discussed in the Questions "Sorting a HashMap based on Value then Key?" and "Sort a Map<Key, Value> by values". There are some clever solutions among those Answers, such as this one by Sean.
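
    For comparison, here is a minimal sketch of sorting the map entries directly, with no stream and no extra class: copy the entry set into a list and sort it with a hand-written comparator. This is not taken from the linked Answers; the class name `EntrySorter` and the method `sortByCountThenWord` are hypothetical, and the map is assumed to hold plain `Integer` counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
final class EntrySorter {

    // Returns the entries ordered by count (highest first), then by word (ascending).
    static List<Map.Entry<String, Integer>> sortByCountThenWord(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            // Arguments swapped so that larger counts come first.
            int byCount = Integer.compare(b.getValue(), a.getValue());
            // Ties on count fall back to the natural String order of the word.
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hello", 13);
        counts.put("xder", 15);
        counts.put("azet", 15);
        counts.put("apex", 15);
        for (Map.Entry<String, Integer> e : sortByCountThenWord(counts)) {
            System.out.println(e.getKey() + "-" + e.getValue());
        }
        // prints apex-15, azet-15, xder-15, hello-13 (one per line)
    }
}
```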

    But I would rather keep things simple. I would translate the map of our word and word-count to objects of our own custom class, each object holding the word and word-count as fields.

    Java 16+ brings the records feature, making such a custom class definition much easier. A record is a briefer way to write a class whose main purpose is to communicate data transparently and immutably. The compiler implicitly creates the constructor, getters, equals & hashCode, and toString.

    record WordAndCount (String word , int count ) {}
    

    Before Java 16, use a conventional class in place of that record. Here is the roughly 30-line source-code equivalent of that record one-liner.

    final class WordAndCount {
        private final String word;
        private final int count;
    
        WordAndCount ( String word , int count ) {
            this.word = word;
            this.count = count;
        }
    
        public String word () { return word; }
    
        public int count () { return count; }
    
        @Override
        public boolean equals ( Object obj ) {
            if ( obj == this ) return true;
            if ( obj == null || obj.getClass() != this.getClass() ) return false;
            var that = ( WordAndCount ) obj;
            return Objects.equals( this.word , that.word ) && this.count == that.count;
        }
    
        @Override
        public int hashCode () {
            return Objects.hash( word , count );
        }
    
        @Override
        public String toString () {
            return "WordAndCount[" + "word=" + word + ", " + "count=" + count + ']';
        }
    }
    

    We make a list of objects of that record type and populate it.

    List<WordAndCount> wordAndCounts = new ArrayList <>(map.size()) ;
    for ( String word : map.keySet() ) {
        wordAndCounts.add( new WordAndCount( word, map.get( word ).get() ) );
    }
    

    Now sort. The Comparator interface has some handy factory methods where we can pass a method reference.

    wordAndCounts.sort(
            Comparator
                    .comparingInt( WordAndCount ::count )
                    .reversed()
                    .thenComparing( WordAndCount ::word )
    );
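
    A note on why `thenComparing( WordAndCount ::word )` already puts Russian words beneath English ones: the natural order of `String` compares UTF-16 code units, so uppercase Latin letters sort before lowercase Latin letters, and Cyrillic letters sort after both. The sketch below only illustrates that ordering; the class name `NaturalOrderDemo` and the method `sortedCopy` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
final class NaturalOrderDemo {

    // Returns a copy of the list sorted by String's natural order (String.compareTo).
    static List<String> sortedCopy(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("анатолю", "xder", "azet", "Apex", "батальона");
        System.out.println(sortedCopy(words)); // prints [Apex, azet, xder, анатолю, батальона]
    }
}
```

    One consequence of this ordering is that a capitalized word such as "Zello" would sort before "azet". If that matters for your data, lower-casing the words first (as the Question's code does for the map keys), `String.CASE_INSENSITIVE_ORDER`, or a `java.text.Collator` are options to consider.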
    

    Let’s pull all that code together.

    package work.basil.text;
    
    import java.util.*;
    import java.util.concurrent.atomic.AtomicInteger;
    
    public class EngRus {
        public static void main ( String[] args ) {
            // Populate input data.
            List < String > words = EngRus.generateText(); // Recreate the original data seen in the Question.
            System.out.println( "words = " + words );
    
            // Count words in the input list.
            Map < String, AtomicInteger > map = new HashMap <>();
            for ( String word : words ) {
                map.putIfAbsent( word , new AtomicInteger( 0 ) );
                map.get( word ).incrementAndGet();
            }
            System.out.println( "map = " + map );
    
            // Report on word count, sorting first by word-count numerically and then by word alphabetically.
            record WordAndCount( String word , int count ) { }
            List < WordAndCount > wordAndCounts = new ArrayList <>( map.size() );
            for ( String word : map.keySet() ) {
                wordAndCounts.add( new WordAndCount( word , map.get( word ).get() ) );
            }
            wordAndCounts.sort( Comparator.comparingInt( WordAndCount :: count ).reversed().thenComparing( WordAndCount :: word ) );
            System.out.println( "wordAndCounts = " + wordAndCounts );
        }
    
        public static List < String > generateText () {
            String input = """
                    лицами-18
                    Apex-15
                    azet-15
                    xder-15
                    анатолю-15
                    андреевич-15
                    батальона-15
                    hello-13
                    zello-13
                    полноте-13
                    """;
    
            List < String > words = new ArrayList <>();
            input.lines().forEach( line -> {
                String[] parts = line.split( "-" );
                for ( int i = 0 ; i < Integer.parseInt( parts[ 1 ] ) ; i++ ) {
                    words.add( parts[ 0 ] );
                }
            } );
            Collections.shuffle( words );
            return words;
        }
    }
    

    When run:

    words = [андреевич, hello, xder, батальона, лицами, полноте, анатолю, лицами, полноте, полноте, анатолю, анатолю, zello, hello, лицами, xder, батальона, Apex, xder, андреевич, анатолю, hello, xder, Apex, xder, андреевич, лицами, zello, полноте, лицами, Apex, батальона, zello, полноте, xder, hello, azet, батальона, zello, hello, полноте, Apex, полноте, полноте, azet, андреевич, полноте, Apex, анатолю, hello, azet, лицами, анатолю, zello, анатолю, Apex, zello, андреевич, лицами, xder, hello, полноте, zello, Apex, батальона, лицами, hello, azet, Apex, анатолю, анатолю, zello, полноте, анатолю, Apex, батальона, андреевич, лицами, андреевич, azet, azet, лицами, лицами, zello, azet, анатолю, xder, батальона, полноте, лицами, hello, лицами, xder, xder, лицами, zello, андреевич, батальона, лицами, андреевич, azet, полноте, hello, андреевич, лицами, hello, Apex, батальона, hello, azet, лицами, zello, батальона, анатолю, Apex, azet, xder, андреевич, андреевич, батальона, анатолю, батальона, Apex, xder, azet, azet, xder, azet, анатолю, Apex, батальона, Apex, Apex, лицами, батальона, xder, батальона, hello, андреевич, андреевич, azet, zello, андреевич, xder, azet, анатолю, zello]

    map = {андреевич=15, xder=15, zello=13, батальона=15, azet=15, лицами=18, анатолю=15, hello=13, Apex=15, полноте=13}

    wordAndCounts = [WordAndCount[word=лицами, count=18], WordAndCount[word=Apex, count=15], WordAndCount[word=azet, count=15], WordAndCount[word=xder, count=15], WordAndCount[word=анатолю, count=15], WordAndCount[word=андреевич, count=15], WordAndCount[word=батальона, count=15], WordAndCount[word=hello, count=13], WordAndCount[word=zello, count=13], WordAndCount[word=полноте, count=13]]
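
    To get from that sorted list to the one-word-per-line report shown in the Question, a `StringJoiner` avoids the "is this the last line?" check. This is a minimal sketch: the class name `ReportFormatter` and the method `format` are hypothetical, and the nested `WordAndCount` is a plain-class stand-in for the record so that the sketch also compiles before Java 16.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of the original answer.
final class ReportFormatter {

    // Plain-class stand-in for the WordAndCount record used in the answer.
    static final class WordAndCount {
        private final String word;
        private final int count;

        WordAndCount(String word, int count) {
            this.word = word;
            this.count = count;
        }

        String word() { return word; }

        int count() { return count; }
    }

    // Joins "word-count" lines with newlines, with no trailing newline.
    static String format(List<WordAndCount> sorted) {
        StringJoiner joiner = new StringJoiner("\n");
        for (WordAndCount wc : sorted) {
            joiner.add(wc.word() + "-" + wc.count());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<WordAndCount> list = new ArrayList<>();
        list.add(new WordAndCount("hello", 13));
        list.add(new WordAndCount("azet", 15));
        list.add(new WordAndCount("apex", 15));
        // Same ordering as in the answer: count descending, then word ascending.
        list.sort(Comparator.comparingInt(WordAndCount::count)
                .reversed()
                .thenComparing(WordAndCount::word));
        System.out.println(format(list)); // prints apex-15, azet-15, hello-13 (one per line)
    }
}
```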