google-apps-script  google-docs  word-count

Word Count by Foreground Colors in Docs Apps Script


I'm trying to write a script with Google Apps Script that returns word counts per color.

I wrote code that returns the overall word count and code that returns the foreground color per paragraph. It's my understanding that the paragraph holds the foreground color property but the text doesn't. I'm having difficulty combining my two functions to return the word count for a given color. Below is what I have. Any suggestions or direction would help.

  function myFunction() {
    var doc = DocumentApp.openByUrl("https://docs.google.com/document/d/1wi0EEWZECyn2Q7B0XWY3D6WKJ2TNZ_XzN47LTT8yWng/edit");
    var docBody = doc.getBody();
    var text = docBody.getText();
    var paragraphs = docBody.getParagraphs();

    // Return the word count of the whole text.
    if (text.length === 0) {
      return 0;
    }

    // Treat line breaks and punctuation as word separators.
    text = text.replace(/\r\n|\r|\n/g, " ");
    var withoutPunctuation = text.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()"?“”]/g, " ");
    // filter(Boolean) drops the empty string that split() yields for punctuation-only text.
    var count = withoutPunctuation.trim().split(/\s+/).filter(Boolean).length;
    Logger.log(count);

    // Identify the foreground color of each paragraph.
    var i;
    var color;
    for (i = 0; i < paragraphs.length; i++) {
      color = paragraphs[i].getForegroundColor();
      Logger.log("paragraph " + i + ": " + color);
    }

    // Identify the color of each individual character in a given paragraph.
    var firstParagraphText = paragraphs[0].editAsText();
    for (i = 0; i < firstParagraphText.getText().length; i++) {
      color = firstParagraphText.getForegroundColor(i);
      Logger.log("character " + i + ": " + color);
    }
  }

Solution

  • The issue

    The main issue in your scenario is that DocumentApp only offers two methods for getting the foreground color: getForegroundColor() and getForegroundColor(offset). The first returns a hexadecimal color value only if all the text in the element (in your case, the paragraph) has the same foreground color; otherwise it returns null. The second returns the foreground color of the single character at the given offset. Thus, there is no method that directly returns the color of a word.

  • The workaround

    To count the number of words in each foreground color, you could first split the text into separate words and store them in an array together with their starting indices in the text, assuming each word has only a single color.

    Then, loop over the word array, get the foreground color of the first character of each word, and add 1 to the running count kept for that color.
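The workaround could be sketched roughly as follows. This is a minimal sketch under a few assumptions, not a definitive implementation: the helper names splitWords, countWordsByColor and logWordCountsByColor are hypothetical, the punctuation list is borrowed from the question's regex, words are processed paragraph by paragraph so that offsets line up with each paragraph's own text, and a null color (text with no explicit foreground color) is counted under a "default" key. It also uses getActiveDocument() for brevity; swap in openByUrl(...) as in the question if the script is not bound to the document.

```javascript
// Split a string into words and record where each word starts.
// A "word" is any run of characters that is neither whitespace nor
// one of the punctuation marks used in the question's regex.
function splitWords(text) {
  var words = [];
  var re = /[^\s.,\/#!$%\^&\*;:{}=\-_`~()"?“”]+/g;
  var match;
  while ((match = re.exec(text)) !== null) {
    words.push({ word: match[0], start: match.index });
  }
  return words;
}

// Count words per color. getColorAt(offset) must return the color of the
// character at that offset (or null). Assumes each word has one color,
// so only the first character of each word is checked.
function countWordsByColor(text, getColorAt, counts) {
  counts = counts || {};
  splitWords(text).forEach(function (w) {
    var color = getColorAt(w.start) || "default"; // assumption: null means default color
    counts[color] = (counts[color] || 0) + 1;
  });
  return counts;
}

// Apps Script entry point (only runs inside Google Apps Script).
function logWordCountsByColor() {
  var body = DocumentApp.getActiveDocument().getBody();
  var counts = {};
  body.getParagraphs().forEach(function (paragraph) {
    var richText = paragraph.editAsText();
    countWordsByColor(richText.getText(), function (offset) {
      return richText.getForegroundColor(offset);
    }, counts);
  });
  Logger.log(JSON.stringify(counts));
}
```

Keeping the counting logic separate from the DocumentApp calls means it can be tried with any color lookup function, for example countWordsByColor("red red, blue", function (i) { return i < 8 ? "#ff0000" : "#0000ff"; }).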