java string text-processing

Removing contractions


I would like to remove all apostrophes from an input String of English prose, but retain the original meaning and capitalisation, ie

What's the best/simplest way to achieve this in Java?


Solution

  • Some contractions follow hard-and-fast rules, so a method that applies a fixed set of replacements to your string handles them:

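    public String removeContractions(String inputString) {

        // Irregular forms first: the generic "n't" rule would yield "ca not" and "wo not".
        inputString = inputString.replaceAll("\\b([Cc])an't\\b", "$1annot");
        inputString = inputString.replaceAll("\\b([Ww])on't\\b", "$1ill not");

        // Regular suffixes; these patterns are lower-case only.
        inputString = inputString.replaceAll("n't", " not");
        inputString = inputString.replaceAll("'re", " are");
        inputString = inputString.replaceAll("'m", " am");
        inputString = inputString.replaceAll("'ll", " will");
        inputString = inputString.replaceAll("'ve", " have");

        return inputString;
    }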
    

    Because 's is left untouched, this even preserves your possessives.

    Of course, some contractions depend on context, such as he'd, which could be "he would" or "he had". Resolving these is beyond simple replacement algorithms and more in the realm of machine learning.

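    public String removeControversialContractions(String inputString) {

        // Guess: always expands 'd to " would", which is wrong wherever it means " had".
        inputString = inputString.replaceAll("'d", " would");
        // Guess: treats every 's as a possessive and just drops the apostrophe.
        inputString = inputString.replaceAll("'s", "s");

        return inputString;
    }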
    

    For 's, you could perhaps check whether the word containing it begins with a capital letter (suggesting a name) and replace it with s if so, or with " is" otherwise. However, this would misclassify ordinary contractions at the beginning of sentences, such as "It's".
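
    A rough sketch of that capital-letter check might look like the following. The class name ApostropheSHeuristic and the method expand are made up for illustration, and the rule itself is only the guess described above, so it gets sentence-initial contractions and lower-case possessives wrong.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper (not from the original answer): capital-letter heuristic for 's.
    final class ApostropheSHeuristic {

        // A run of letters immediately followed by 's, e.g. "John's" or "what's".
        private static final Pattern WORD_WITH_S = Pattern.compile("\\b(\\p{L}+)'s\\b");

        static String expand(String input) {
            Matcher m = WORD_WITH_S.matcher(input);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String word = m.group(1);
                boolean capitalised = Character.isUpperCase(word.charAt(0));
                // Assumption: a capitalised word is a name, so 's is possessive and only
                // the apostrophe is dropped; anything else is read as a contraction of "is".
                String replacement = capitalised ? word + "s" : word + " is";
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(out);
            return out.toString();
        }
    }
    ```

    With this sketch, "John's book" becomes "Johns book" and "she's here" becomes "she is here", but "It's late" becomes "Its late" and "the dog's bone" becomes "the dog is bone", which shows how quickly the heuristic breaks down.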

    If you want a simple and perfect approach, I'm not sure you'll get one. To do these more complicated things, you'll need either a large dictionary file which you constantly reference or machine learning techniques.
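
    As a minimal sketch of the dictionary idea, assuming a hand-written map in place of a real dictionary file: the class name ContractionDictionary, the method expand, and the six sample entries are all illustrative, and an ambiguous entry such as "it's" (which can also mean "it has") is simply resolved one way.

    ```java
    import java.util.HashMap;
    import java.util.Locale;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper (not from the original answer): whole-word dictionary lookup.
    final class ContractionDictionary {

        // Tiny illustrative sample; a real list would be much larger and loaded from a file.
        private static final Map<String, String> EXPANSIONS = new HashMap<>();
        static {
            EXPANSIONS.put("can't", "cannot");
            EXPANSIONS.put("won't", "will not");
            EXPANSIONS.put("don't", "do not");
            EXPANSIONS.put("i'm", "I am");
            EXPANSIONS.put("it's", "it is"); // assumption: never "it has"
            EXPANSIONS.put("they're", "they are");
        }

        // Letters, a straight apostrophe, then more letters.
        private static final Pattern CANDIDATE = Pattern.compile("\\p{L}+'\\p{L}+");

        static String expand(String input) {
            Matcher m = CANDIDATE.matcher(input);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String token = m.group();
                String expansion = EXPANSIONS.get(token.toLowerCase(Locale.ROOT));
                if (expansion == null) {
                    // Not in the dictionary (for example a possessive): leave it untouched.
                    expansion = token;
                } else if (Character.isUpperCase(token.charAt(0))) {
                    // Carry the capital of the original's first letter over to the expansion.
                    expansion = Character.toUpperCase(expansion.charAt(0)) + expansion.substring(1);
                }
                m.appendReplacement(out, Matcher.quoteReplacement(expansion));
            }
            m.appendTail(out);
            return out.toString();
        }
    }
    ```

    Here "Don't worry, it's fine." becomes "Do not worry, it is fine." while "John's hat" is left alone, so possessives would still need a separate pass if every apostrophe has to go. Only the first letter's case is carried over, so an all-caps "CAN'T" comes out as "Cannot", and typographic apostrophes are not matched.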