How do I remove multiple occurrences of words in a String? The hard part is that I don't know in advance which words are repeated. See the example below.
This is how how I tried to split a paragraph into a sentence sentence But, there is a problem My paragraph includes dates dates dates dates like Jan 13, 2014 , words includes like U S and numbers
Here, some words occur multiple times. Words like sentence, dates, includes and how occur more than once. Note that the repeats may not be next to each other, as with includes. I want to remove these so the text looks like the one below.
This is how I tried to split a paragraph into a sentence But, there is a problem My paragraph includes dates like Jan 13, 2014 , words like U S and numbers
Note that removing multiple occurrences does not mean removing every occurrence of a repeated word. It simply keeps one copy and removes the rest.
Just like the above, there will be very big Strings, and I have no idea which words occur more than once in them. How can I make this happen?
Copy the text one word at a time and skip the duplicates along the way. Use a HashSet to keep track of the words you have already seen.
Something like this...
import java.util.HashSet;

public class RemoveDuplicateWords {
    public static void main(String[] args) {
        String text = "This is how how I tried to split a paragraph into a sentence sentence But, there is a problem My paragraph includes dates dates dates dates like Jan 13, 2014 , words includes like U S and numbers";
        StringBuilder result = new StringBuilder();
        HashSet<String> set = new HashSet<String>(); // words seen so far
        for (String s : text.split(" ")) {
            // Append a word only the first time it is seen.
            if (!set.contains(s)) {
                result.append(s);
                result.append(" ");
                set.add(s);
            }
        }
        System.out.println(result);
    }
}
You'll have to touch it up a little to handle the punctuation properly, but that should get you started.
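One possible way to do that touch-up is sketched below. It is only an illustration under a few assumptions that are not in the question: words are compared ignoring case and ignoring punctuation attached to either end (so "tried," and "tried." count as the same word), and tokens that are nothing but punctuation are always kept. The class name WordDeduplicator and the method removeRepeatedWords are made-up names for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class WordDeduplicator {

    // Hypothetical helper: keeps the first occurrence of each word, drops later ones.
    static String removeRepeatedWords(String text) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Comparison key (an assumption of this sketch): strip leading and
            // trailing non-alphanumeric characters and ignore case.
            String key = token
                    .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "")
                    .toLowerCase(Locale.ROOT);
            // Pure-punctuation tokens have an empty key and are always kept;
            // Set.add returns false when the key was already present.
            if (key.isEmpty() || seen.add(key)) {
                kept.add(token);
            }
        }
        // Joining avoids the trailing space left by appending " " after every word.
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String sample = "how how I tried, tried. a sentence sentence , done";
        System.out.println(removeRepeatedWords(sample));
        // prints: how I tried, a sentence , done
    }
}
```

Keep in mind that any approach that drops every repeat will also keep only the first "a", "is", "paragraph" and "like" in the sample text from the question, so the result will be shorter than the desired output shown there. If such words should survive, you would need an extra rule, for example removing only adjacent repeats or skipping a list of common words.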