javascript regex sentencecase

Transform string of text using JavaScript


I am working on code to transform a string of text into sentence case while retaining acronyms. I explored similar posts on Stack Overflow, but I couldn't find one that suits my requirement.

I have already achieved the transformation of acronyms and of the first letter in the sentence. However, I ran into other issues: some letters in the sentence are still in uppercase, especially text in and after double quotes (" ") and camelCase text.

Below is the code I am currently working on. I would like help optimizing it and fixing these issues.

String.prototype.toSentenceCase = function() {
  var i, j, str, lowers, uppers;
  str = this.replace(/(^\w{1}|\.\s*\w{1})/gi, function(txt) {
    return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
  });

  
  // Certain words such as initialisms or acronyms should be left uppercase
  uppers = ['Id', 'Tv', 'Nasa', 'Acronyms'];
  for (i = 0, j = uppers.length; i < j; i++)
    str = str.replace(new RegExp('\\b' + uppers[i] + '\\b', 'g'),
      uppers[i].toUpperCase());

  // Remove special characters such as quotes and '?', and replace ':' with ' - '
    str = str.replace(/[""]/g,'');
    str = str.replace(/[?]/g,'');
    str = str.replace(/[:]/g,' - ');

return str;
}

Input: play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa.

Current Output: Play around - This is a String Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the ACRONYMS as it is like NASA.

Expected Output: Play around - this is a string of text, which needs to be converted to sentence case at the same time keeping the ACRONYMS as it is like NASA.


Solution

  • Here's a runnable version of the initial code (I have slightly modified the input string):

    String.prototype.toSentenceCase = function() {
      var i, j, str, lowers, uppers;
      str = this.replace(/(^\w{1}|\.\s*\w{1})/gi, function(txt) {
        return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
      });
    
      
      // Certain words such as initialisms or acronyms should be left uppercase
      uppers = ['Id', 'Tv', 'Nasa', 'Acronyms'];
      for (i = 0, j = uppers.length; i < j; i++)
        str = str.replace(new RegExp('\\b' + uppers[i] + '\\b', 'g'),
          uppers[i].toUpperCase());
    
      // Remove special characters such as quotes and '?', and replace ':' with ' - '
        str = str.replace(/[""]/g,'');
        str = str.replace(/[?]/g,'');
        str = str.replace(/[:]/g,' - ');
    
    return str;
    }
    
    const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`
    const result = input.toSentenceCase()
    console.log(result)


    You mentioned that some letters in the sentence are still in uppercase, especially text in and after double quotes and camelCase text.

    Some letters remain uppercased because your code never lowercases them. The only .toLowerCase() call is inside the first replacer, and that regex targets only the initial letters of sentences, not the other letters.

    It can be helpful to first lowercase all letters, and then uppercase some letters (acronyms and initial letters of sentences). So, let's call .toLowerCase() at the beginning:

    String.prototype.toSentenceCase = function() {
      var i, j, str, lowers, uppers;
    
      str = this.toLowerCase();
    
      // ...
    
      return str;
    }
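
    To see the difference, here is a minimal sketch that compares the original first step with a plain .toLowerCase() call. The sample string is a shortened version of the input, and the variable names are only illustrative:

```javascript
// Shortened sample based on the question's input.
const sample = 'play around: This is a "String" Of text cONVERTED';

// Equivalent to the original first step: only the matched
// sentence-initial characters are touched, everything else is left alone.
const withoutLowercasing = sample.replace(/(^\w{1}|\.\s*\w{1})/gi, (txt) =>
  txt.charAt(0).toUpperCase() + txt.slice(1).toLowerCase()
);

// Lowercasing the whole string first leaves no stray capitals behind.
const withLowercasing = sample.toLowerCase();

console.log(withoutLowercasing); // Play around: This is a "String" Of text cONVERTED
console.log(withLowercasing);    // play around: this is a "string" of text converted
```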
    

    Next, let's take a look at this regex:

    /(^\w{1}|\.\s*\w{1})/gi
    

    The parentheses are unnecessary, because the capturing group is not used in the replacer function. The {1} quantifiers are also unnecessary, because by default \w matches only one character. So we can simplify the regex like so:

    /^\w|\.\s*\w/gi
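
    A quick way to convince yourself that the simplification is safe is to compare what both versions match. This is only a sketch on a made-up sample string:

```javascript
// Made-up sample with three sentences (note the double space before "third").
const text = 'first sentence. second one.  third one';

const original = /(^\w{1}|\.\s*\w{1})/gi;
const simplified = /^\w|\.\s*\w/gi;

// With the g flag, match() returns the full matches and ignores capture groups.
const originalMatches = text.match(original);
const simplifiedMatches = text.match(simplified);

console.log(originalMatches);   // [ 'f', '. s', '.  t' ]
console.log(simplifiedMatches); // [ 'f', '. s', '.  t' ]
```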
    

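    This regex finds two matches in the input string: the "p" at the very start, and ". a" (the period after "Nasa", the following space, and the first letter of "another").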

    Both matches contain only one letter (\w), so in the replacer function, we can safely call txt.toUpperCase() instead of the current, more complex expression (txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase()). We can also use an arrow function:

    String.prototype.toSentenceCase = function() {
      var i, j, str, lowers, uppers;
    
      str = this.toLowerCase();
    
      str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
    
      // ...
    
      return str;
    }
    

    However, the initial letter of the third sentence is not uppercased, because the sentence starts with a quote. Since we are going to remove quotes and question marks anyway, let's do it at the beginning.

    Let's also simplify and combine the regexes:

    // Before
    str = str.replace(/[""]/g,'');
    str = str.replace(/[?]/g,'');
    str = str.replace(/[:]/g,' - ');
    
    // After
    str = str.replace(/["?]/g,'');
    str = str.replace(/:/g,' - ');
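
    As a sanity check, the two versions can be run side by side. This sketch uses a made-up sample string:

```javascript
// Made-up sample containing all three kinds of special characters.
const sample = 'what: is "this"? a test';

// Before: three separate character classes.
const before = sample
  .replace(/[""]/g, '')
  .replace(/[?]/g, '')
  .replace(/[:]/g, ' - ');

// After: quotes and question marks share one class, the colon needs no class.
const after = sample
  .replace(/["?]/g, '')
  .replace(/:/g, ' - ');

console.log(before === after); // true
```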
    

    So:

    String.prototype.toSentenceCase = function() {
      var i, j, str, lowers, uppers;
    
      str = this;
    
      str = str.toLowerCase();
    
      str = str.replace(/["?]/g,'');
      str = str.replace(/:/g,' - ');
    
      str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
    
      // ...
    
      return str;
    }
    

    Now the initial letter of the third sentence is correctly uppercased. That's because when we are uppercasing the initial letters, the third sentence doesn't start with a quote anymore (because we have removed the quote).

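    What's left is to uppercase acronyms. Now that the string is lowercased first, it contains "nasa" rather than "Nasa", so the case-sensitive patterns built from the uppers array no longer match. In your regex, you probably want to use the i flag as well for case-insensitive matches.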

    Instead of using a for loop, it's possible to use a single regex to look for all matches and uppercase them. This allows us to get rid of most of the variables as well. Like so:

    String.prototype.toSentenceCase = function() {
      var str;
    
      str = this;
    
      str = str.toLowerCase();
    
      str = str.replace(/["?]/g,'');
      str = str.replace(/:/g,' - ');
    
      str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
    
      str = str.replace(/\b(id|tv|nasa|acronyms)\b/gi, (txt) => txt.toUpperCase());
    
      return str;
    }
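
    If you would rather keep the acronyms in an array, as uppers did in the question, the alternation can be built from it. This is a sketch: buildAcronymRegex is a hypothetical helper, and it assumes every entry consists of word characters only, so nothing needs escaping:

```javascript
// Hypothetical helper: turns ['id', 'tv'] into /\b(id|tv)\b/gi.
// Assumes the entries contain no regex metacharacters.
function buildAcronymRegex(acronyms) {
  return new RegExp('\\b(' + acronyms.join('|') + ')\\b', 'gi');
}

const acronymRegex = buildAcronymRegex(['id', 'tv', 'nasa', 'acronyms']);

// The \b anchors keep "id" inside "video" from being uppercased.
const out = 'the nasa id was shown on tv, not video'
  .replace(acronymRegex, (match) => match.toUpperCase());

console.log(out); // the NASA ID was shown on TV, not video
```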
    

    And it looks like we are now getting the correct results!

    Three more things, though:

    1. Instead of creating and repeatedly reassigning the str variable, we can chain the method calls.
    2. It might make sense to rename the txt parameters to match, since they are regex matches.
    3. Modifying a built-in object's prototype is a bad idea, because the change affects every string in the program and can clash with other code. Creating a standalone function is a better idea.
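
    To illustrate the third point, here is one concrete side effect among several possible ones: a method assigned to String.prototype is enumerable, so it shows up in for...in loops over any string. The method name toSentenceCaseDemo is made up for this sketch:

```javascript
// Add a method the same way the question does (plain assignment).
String.prototype.toSentenceCaseDemo = function () {
  return String(this);
};

// for...in walks enumerable properties, including inherited ones.
const keys = [];
for (const key in 'ab') {
  keys.push(key);
}

// Clean up so the demo does not leak into other code.
delete String.prototype.toSentenceCaseDemo;

console.log(keys); // [ '0', '1', 'toSentenceCaseDemo' ]
```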

    Here's the final code:

    function convertToSentenceCase(str) {
      return str
        .toLowerCase()
        .replace(/["?]/g, '')
        .replace(/:/g, ' - ')
        .replace(/^\w|\.\s*\w/gi, (match) => match.toUpperCase())
        .replace(/\b(id|tv|nasa|acronyms)\b/gi, (match) => match.toUpperCase())
    }
    
    const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`
    const result = convertToSentenceCase(input)
    console.log(result)
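
    One detail to watch: replacing ':' with ' - ' keeps the space that followed the colon, so the output has two spaces after the dash. If that matters, a possible variant is to let the regex consume the surrounding whitespace as well. This is a sketch, and convertToSentenceCaseTight is a hypothetical name:

```javascript
// Variant of the final function; the only change is /\s*:\s*/ instead of /:/.
function convertToSentenceCaseTight(str) {
  return str
    .toLowerCase()
    .replace(/["?]/g, '')
    .replace(/\s*:\s*/g, ' - ') // also swallow whitespace around the colon
    .replace(/^\w|\.\s*\w/g, (match) => match.toUpperCase())
    .replace(/\b(id|tv|nasa|acronyms)\b/gi, (match) => match.toUpperCase());
}

// The original input from the question.
const question = 'play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa.';

console.log(convertToSentenceCaseTight(question));
// Play around - this is a string of text, which needs to be converted to sentence case at the same time keeping the ACRONYMS as it is like NASA.
```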