java java-8 java-stream

Get word frequencies from an array of string sentences using Java 8


I have the following array as input:

String[] input = new String[] {
       "This is a sample string",
       " string ",                   // additional spaces here cause issues while splitting
       "Another sample string",
       "This is not    a sample string"
};

I need to count the frequencies of individual words. The required output is:

{a=2, not=1, string=4, This=2, is=2, sample=3, Another=1}

So far, I have some code that mostly works:

// 1. Convert String[] into a single " " delimited String 
String joined = String.join(" ", input);

// 2. Split on " " and then calculate count using Collectors.groupingBy
Map<String, Long> output =
        Arrays
            .stream(joined.split(" "))
            .filter(s -> !s.equals(""))    // To Deal with Empty Strings
            .collect(
                Collectors.groupingBy(
                    Function.identity(),
                    Collectors.counting()
                )
            );

System.out.println(output);

This looks very crude to me. Please suggest a better way to do this using the Streams API.


Solution

  • Your code is mostly correct; a couple of changes will make it more robust. Use String.split("\\s+") to split on any run of whitespace characters instead of joined.split(" "), which splits on single spaces only. Also, the current code is case-sensitive, so Sample and sample are counted as two different words. If you want case-insensitive counts, convert every word to lowercase (or uppercase) first. Note that this changes the keys in the result: This becomes this, for example.

     Map<String, Long> output = Arrays.stream(joined.split("\\s+"))
             .map(String::toLowerCase)   // Optional: only if counts should be case-insensitive
             .filter(s -> !s.isEmpty())  // split() yields a leading "" if the input starts with whitespace
             .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
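    If you would rather skip the String.join step, one possible variation is to stream the array directly and flat-map each sentence into its words with Pattern.splitAsStream. The following is only a sketch under that assumption: the class name WordFrequencies and the method name countWords are made up for illustration, and the counts stay case-sensitive to match the output requested in the question.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original question or answer.
class WordFrequencies {

    // Compiled once; matches any run of whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Counts case-sensitive word occurrences across all sentences.
    static Map<String, Long> countWords(String[] sentences) {
        return Arrays.stream(sentences)
                .map(String::trim)                  // drop leading/trailing whitespace per sentence
                .flatMap(WHITESPACE::splitAsStream) // one stream element per word
                .filter(word -> !word.isEmpty())    // guard against blank sentences
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] input = {
            "This is a sample string",
            " string ",
            "Another sample string",
            "This is not    a sample string"
        };
        // Iteration order of the resulting map is not guaranteed.
        System.out.println(countWords(input));
    }
}
```

    For the input from the question this should give the same counts as the required output (string=4, sample=3, and so on), though the order of the printed entries may differ. Add .map(String::toLowerCase) before the collect step if you want case-insensitive counts.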