I have the following array as input:
String[] input = new String[] {
    "This is a sample string",
    " string ", // additional spaces here cause issues while splitting
    "Another sample string",
    "This is not a sample string"
};
I need to count the frequencies of individual words. The required output is:
{a=2, not=1, string=4, This=2, is=2, sample=3, Another=1}
So far, I have code that more or less works:
// 1. Convert String[] into a single " " delimited String
String joined = String.join(" ", input);
// 2. Split on " " and then calculate count using Collectors.groupingBy
Map<String, Long> output =
    Arrays
        .stream(joined.split(" "))
        .filter(s -> !s.equals("")) // To deal with empty strings
        .collect(
            Collectors.groupingBy(
                Function.identity(),
                Collectors.counting()
            )
        );
System.out.println(output);
This looks very crude to me. Please suggest a better way to do this using the Streams API.
Your code is mostly correct; a couple of changes will make it more robust. Use String.split("\\s+") to split on any run of whitespace characters instead of splitting on a single space with joined.split(" "). Keep the empty-string filter, though: split still returns an empty first element when the joined string starts with whitespace.
Also, the current code is case-sensitive, so words that differ only in case, for example Sample and sample, are counted as two different words. If you want case-insensitive counts, convert everything to either upper or lower case first:
Map<String, Long> output = Arrays.stream(joined.split("\\s+"))
    .map(String::toLowerCase) // For case-insensitivity conversion if needed
    .filter(s -> !s.isEmpty()) // Filter out empty strings
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
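If you would rather skip the String.join step entirely, one possible variation is to split each array element separately with Pattern.splitAsStream and flatten the results. The sketch below is only an illustration, not a replacement for the approach above: the class name WordFrequency and the helper countWords are made up for this example, it keeps the original case so the counts match the output in the question, and it collects into a TreeMap purely to make the printed order predictable.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordFrequency {
    // Precompiled pattern matching one or more whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper (not from the question): case-sensitive word counts.
    static Map<String, Long> countWords(String[] input) {
        return Arrays.stream(input)
            // Split every element on its own, so no join is needed.
            .flatMap(WHITESPACE::splitAsStream)
            // An element with leading whitespace yields an empty first token.
            .filter(s -> !s.isEmpty())
            // TreeMap is an assumption here, used only for a stable key order.
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] input = new String[] {
            "This is a sample string",
            " string ",
            "Another sample string",
            "This is not a sample string"
        };
        // Prints {Another=1, This=2, a=2, is=2, not=1, sample=3, string=4}
        System.out.println(countWords(input));
    }
}
```

To make this variant case-insensitive as well, add .map(String::toLowerCase) before the collect step.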