I'm new to Java and currently learning strings.
How do I remove multiple words from a string?
I would be glad for any hint.
class WordDeleterTest {
    public static void main(String[] args) {
        WordDeleter wordDeleter = new WordDeleter();
        // expected: Hello
        System.out.println(wordDeleter.remove("Hello Java", new String[] { "Java" }));
        // expected: The Athens in
        System.out.println(wordDeleter.remove("The Athens is in Greece", new String[] { "is", "Greece" }));
    }
}
class WordDeleter {
    public String remove(String phrase, String[] words) {
        String[] array = phrase.split(" ");
        String word = "";
        String result = "";
        for (int i = 0; i < words.length; i++) {
            word += words[i];
        }
        for (String newWords : array) {
            if (!newWords.equals(word)) {
                result += newWords + " ";
            }
        }
        return result.trim();
    }
}
Output:
Hello
The Athens is in Greece
I've already tried to use replace here, but it didn't work.
A common mistake is to call replace and ignore its return value:
String sentence = "Hello Java World!";
sentence.replace("Java", "");
System.out.println(sentence);
=> Hello Java World!
Strings in Java are immutable: replace does not change the original string, it returns a new String object. So assign the result back instead:
String sentence = "Hello Java World!";
sentence = sentence.replace("Java", "");
System.out.println(sentence);
=> Hello  World!
(note the double space: the whitespace around the removed word is still there)
With that, your remove method could look like this:
public String remove(String phrase, String[] words) {
    String result = phrase;
    for (String word : words) {
        // remove the word, then collapse the double space it leaves behind
        result = result.replace(word, "").replace("  ", " ");
    }
    return result.trim();
}
=> Hello World!
(the double space is collapsed into a single one)
However, this solution removes every occurrence of the word within the phrase, whether it is a whole word or only part of another word. As the OP commented, removing "is" from "This is Sparta" results in "Th Sparta". To get around that, make sure the word to be replaced is surrounded by whitespace characters. This is a perfect situation to switch to regular expressions.
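public String remove(String phrase, String[] words) {
    // Pad with spaces so that words at the very start or end of the
    // phrase are also surrounded by whitespace and can be matched
    String result = " " + phrase + " ";
    for (String word : words) {
        // Pattern.quote makes the word match literally, even if it
        // contains regex metacharacters such as '.' or '+'
        String regexp = "\\s" + java.util.regex.Pattern.quote(word) + "\\s";
        result = result.replaceAll(regexp, " ");
    }
    return result.trim();
}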
Explanation:
The pattern sequence \s matches a whitespace character (space, tab, line feed, ...). The double backslash is necessary because in a Java string literal a single backslash starts an escape sequence, so "\\s" in the source code becomes \s in the actual regular expression. The regular expression therefore matches the word including the whitespace before and after it, and replaceAll replaces that whole match with a single space. This also means the second call that collapsed double blanks is no longer needed.
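To see what the pattern does on its own, here is a small self-contained sketch (the class name WhitespacePatternDemo and the helper pattern are made up for illustration). It builds the same \s + word + \s pattern, shows that the Java literal "\\s" really is the two characters \s, that \s also matches tabs, and one limitation: a word at the very end (or start) of a string has whitespace on only one side, so the pattern does not match it there.

```java
class WhitespacePatternDemo {

    // Hypothetical helper: builds the pattern used in the answer,
    // whitespace + word + whitespace
    static String pattern(String word) {
        return "\\s" + word + "\\s";
    }

    public static void main(String[] args) {
        String regex = pattern("is");
        // The Java literal "\\s" is the two characters \s, so this prints \sis\s
        System.out.println(regex);
        // \s matches tabs too; the "is" inside "This" is preceded by 'h', not whitespace
        System.out.println("This\tis\tSparta".replaceAll(regex, " ")); // This Sparta
        // "Java" has no whitespace after it, so nothing is removed
        System.out.println("Hello Java".replaceAll(pattern("Java"), " ")); // Hello Java
    }
}
```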
Here is a nice tutorial: https://docs.oracle.com/javase/tutorial/essential/regex/
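For comparison, the split-based approach from the question also works once each token is checked against every word to remove. The original loop concatenated all words into one string ("isGreece" for the second test), so no single token could ever be equal to it. The sketch below (the class name SplitWordDeleter is hypothetical) puts the words into a HashSet and rebuilds the phrase with a StringJoiner. It assumes words are separated by whitespace and does not strip punctuation, so "World!" would not match "World".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

class SplitWordDeleter {

    // Keeps only the tokens that are not in the set of words to remove
    public String remove(String phrase, String[] words) {
        Set<String> toRemove = new HashSet<>(Arrays.asList(words));
        StringJoiner result = new StringJoiner(" ");
        // "\\s+" splits on runs of whitespace, so repeated spaces give no empty tokens
        for (String token : phrase.trim().split("\\s+")) {
            if (!toRemove.contains(token)) {
                result.add(token);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        SplitWordDeleter deleter = new SplitWordDeleter();
        System.out.println(deleter.remove("Hello Java", new String[] { "Java" })); // Hello
        System.out.println(deleter.remove("The Athens is in Greece", new String[] { "is", "Greece" })); // The Athens in
        System.out.println(deleter.remove("This is Sparta", new String[] { "is" })); // This Sparta
    }
}
```

Because only whole tokens are compared, "This" is left alone when removing "is", without needing a regular expression.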