javascriptperformancerandomdynamically-generatedmaintainability

What better way to design the built-in stats function of a sentence generator?


Context : a random sentence generator

  1. the function generateSentence() generates random sentences returned as strings (works fine)

  2. the function calculateStats() outputs the number of unique strings the above function can theoretically generate (works fine also in this mockup, so be sure to read the disclaimer, I don't want to waste your time)

  3. the function generateStructure() and the words lists in Dictionnary.lists are constantly growing as time passes

Quick mockup of the main generator function :

function generateSentence() {
  var words = [];
  var structure = generateStructure();

  structure.forEach(function(element) {
    words.push(Dictionnary.getElement(element));
  });

  var fullText = words.join(" ");
  fullText = fullText.substring(0, 1).toUpperCase() + fullText.substring(1);
  fullText += ".";
  return fullText;
}

var Dictionnary = {
  getElement: function(listCode) {
    return randomPick(Dictionnary.lists[listCode]);
  },
  lists: {
    _location: ["here,", "at my cousin's,", "in Antarctica,"],
    _subject: ["some guy", "the teacher", "Godzilla"],
    _vTransitive: ["is eating", "is scolding", "is seeing"],
    _vIntransitive: ["is working", "is sitting", "is yawning"],
    _adverb: ["slowly", "very carefully", "with a passion"],
    _object: ["this chair", "an egg", "the statue of Liberty"],
  }
}

// returns an array of strings symbolizing types of sentence elements
// example : ["_location", "_subject", "_vIntransitive"]
function generateStructure() {
  var str = [];

  if (dice(6) > 5) {// the structure can begin with a location or not
    str.push("_location");
  }

  str.push("_subject");// the subject is mandatory

  // verb can be of either types
  var verbType = randomPick(["_vTransitive", "_vIntransitive"]);
  str.push(verbType);

  if (dice(6) > 5) {// adverb is optional
    str.push("_adverb");
  }

  // the structure needs an object if the verb is transitive
  if (verbType == "_vTransitive") {
    str.push("_object");
  }

  return str;
}

// off-topic warning! don't mind the implementation here,
// just know it's a random pick in the array
function randomPick(sourceArray) {
  return sourceArray[dice(sourceArray.length) - 1];
}

// Same as above, not the point, just know it's a die roll (random integer from 1 to max)
function dice(max) {
  if (max < 1) { return 0; }
  return Math.round((Math.random() * max) + .5);
}

At some point I wanted to know how many different unique strings it could output and I wrote something like (again, very simplified) :

function calculateStats() {// the "broken leg" function I'm trying to improve/replace
  var total = 0;
  // lines below : +1 to account for 'no location' or 'no adverb'
  var nbOfLocations = Dictionnary.lists._location.length + 1;
  var nbOfAdverbs = Dictionnary.lists._adverb.length + 1;

  var nbOfTransitiveSentences = 
    nbOfLocations *
    Dictionnary.lists._vTransitive.length *
    nbOfAdverbs *
    Dictionnary.lists._object.length;
  var nbOfIntransitiveSentences =
    nbOfLocations *
    Dictionnary.lists._vIntransitive.length *
    nbOfAdverbs;

  total = nbOfTransitiveSentences + nbOfIntransitiveSentences;
  return total;
}

(Side note : don't worry about namespace pollution, type checks on input parameters, or these sort of things, this is assumed to be in a bubble for the sake of example clarity.)

Important disclaimer : This is not about fixing the code I posted. This is a mock-up and it works as it is. The real question is "As the complexity of possible structures grow in the future, as well as the size and diversity of the lists, what would be a better strategy to calculate stats for these types of random constructions, rather than my clumsy calculateStats() function, which is hard to maintain, likely to handle astronomically big numbers*, and prone to errors?"

* in the real tool, there are 351 120 unique structures at this moment, and for sentences... the total number has exceeded (10 power 80) for some time already.


Solution

  • Since your structure for sentences change quite a lot (it does change in this small example I can't imagine how much does it change in the actual code), I would do something similar to this:

    First I will need to save in a way all the possible sentence structures that exist for a given Dictionary... Maybe I would create a Language object that has a Dictionary as a property, and I could add the possible sentence structures (this part probably could be optimized and find a more procedural way of generating all the possible sentence structures, like a rule engine). What do you mean by sentence structure? Well, following your example I will call sentence structure to the next:

    [ 'location', 'transitive-verb', 'adverb', 'object' ] < - Transitive sentence
    [ 'location', 'instransitive-verb', 'adverb' ] <- Intransitive sentence
    

    You could probably find a way of generate this structures... or hardcode them.

    But... why do I think that could improve the way you calculate the stats? Because you minimize the hardcoding of each number of sentences by using a map / reduce operation and make it more extensible.

    So... how?

    Imagine that we have our structures accessible in the global scope or through an object, or in the dictionary itself:

    // Somewhere in the code
    const structures = [
      [ 'location', 'transitive-verb', 'adverb', 'object' ],
      [ 'location', 'instransitive-verb', 'adverb' ] 
    ];
    ...
    // In this example I just passed it as an argument
    function calculateStats(structures) {
      const numberOfCombinations = structures.reduce((total, structure) => {
          // We should calculate the number of combinations a structure has
          const numberOfSentences = structure.reduce((acc, wordType) => {
              // For each word type, we access the list and get the lenght (I am not doing safety checks for any wordType)
              return acc + Dictionary.lists[wordType].length
          }, 0);//Initial accumulator
    
          return total + numberOfSentences;
      }, 0); // Initial accumulator
      return numberOfCombinations;
    }
    

    So we would use the power of iterating through different structures instead of hard coding each possible combination, so you basically only need to add structures and your calculateStats function shouldn't grow.

    If you need to do more complex calculations you would need to change the functions used in the reducers.

    I have a very small knowledge about grammar or syntactic analysis, so probably you, with more knowledge, could find ways of make it simpler or do "smarter calculations".

    I took the freedom to write it in ES6-ish style, if reduce is a strange animal for you, you can read more here or use the lodash / ramda / whatever ^^