I would like to remove all apostrophes from an input String of English prose, but retain the original meaning and capitalisation, ie
What's the best/simplest way to achieve this in Java?
There are some hard and fast rules for replacing contractions, so a single method that applies them to your strings covers the common cases.
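public String removeContractions(String inputString) {
    // Irregular forms first: the generic "n't" rule would yield "wo not" and "ca not".
    inputString = inputString.replace("won't", "will not");
    inputString = inputString.replace("Won't", "Will not");
    inputString = inputString.replace("can't", "cannot");
    inputString = inputString.replace("Can't", "Cannot");
    // replace() matches literally; replaceAll() would treat its first argument as a regex.
    inputString = inputString.replace("n't", " not");
    inputString = inputString.replace("'re", " are");
    inputString = inputString.replace("'m", " am");
    inputString = inputString.replace("'ll", " will");
    inputString = inputString.replace("'ve", " have");
    return inputString;
}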
This will even preserve your possessives, because 's is left untouched.
Of course, there are some contractions which are dependent upon context, such as he'd. This could be "he would" or "he had", and as such is beyond simple replacement algorithms and more in the realm of machine learning. If you can live with occasional wrong guesses, blunt rules still work:
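public String removeControversialContractions(String inputString) {
    // A guess: 'd can also stand for "had", so " would" is sometimes wrong.
    inputString = inputString.replace("'d", " would");
    // Drops the apostrophe only: right for possessives, wrong for "it's" (it is / it has).
    inputString = inputString.replace("'s", "s");
    return inputString;
}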
Perhaps for the 's you could check to see if the word containing it begins with a capital letter (indicating a name) and conditionally replace it with either s or is. However, this wouldn't catch normal contractions at the beginning of sentences (It's would become Its), so...
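The capital-letter check described above could be sketched roughly as follows. The class name ApostropheSHeuristic and the method expand are made up for illustration, and the sketch deliberately keeps the weakness just mentioned: a sentence-initial "It's" is treated like a name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ApostropheSHeuristic {
    // A run of ASCII letters immediately followed by 's at a word boundary.
    private static final Pattern WORD_WITH_S = Pattern.compile("\\b([A-Za-z]+)'s\\b");

    static String expand(String input) {
        Matcher m = WORD_WITH_S.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group(1);
            // Capitalised word: assume a name, so treat 's as possessive and drop the apostrophe.
            // Otherwise: assume a contraction of "is".
            boolean capitalised = Character.isUpperCase(word.charAt(0));
            String replacement = capitalised ? word + "s" : word + " is";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, expand("John's dog thinks it's late.") gives "Johns dog thinks it is late.", while expand("It's raining.") gives the unwanted "Its raining.".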
If you want a simple and perfect approach, I'm not sure you'll get one. To do these more complicated things, you'll need either a large dictionary file which you constantly reference or machine learning techniques.
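As a rough idea of what the dictionary route might look like, here is a minimal sketch with a tiny in-memory map standing in for the large dictionary file. The class name ContractionDictionary, the method expand, and the five entries are all assumptions for illustration; a real list would be far longer. Words that are not in the map, possessives included, are left alone.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ContractionDictionary {
    // Lower-case contraction -> expansion. Illustrative entries only.
    private static final Map<String, String> EXPANSIONS = new HashMap<>();
    static {
        EXPANSIONS.put("won't", "will not");
        EXPANSIONS.put("can't", "cannot");
        EXPANSIONS.put("shan't", "shall not");
        EXPANSIONS.put("i'm", "I am");
        EXPANSIONS.put("let's", "let us");
    }

    // A word made of ASCII letters with a single apostrophe inside it.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]+'[A-Za-z]+");

    static String expand(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            String expansion = EXPANSIONS.get(token.toLowerCase(Locale.ROOT));
            if (expansion == null) {
                expansion = token; // unknown word: leave it untouched
            } else if (Character.isUpperCase(token.charAt(0))) {
                // Carry the original leading capital over to the expansion.
                expansion = Character.toUpperCase(expansion.charAt(0)) + expansion.substring(1);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(expansion));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With these entries, expand("Won't you stay? I can't, but let's ask John's sister.") gives "Will not you stay? I cannot, but let us ask John's sister.". Context-dependent forms such as he'd still need something smarter than a lookup.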