
JavaScript Custom Word or String Frequency Counter


I've been looking at many similar questions, but none of the solutions (as far as I could find) apply to my specific use case.

Explanation / Situation

I have been logging my key input for the last couple of weeks. Just for fun, I am experimenting with custom keyboard layouts, so as a starting point I wanted a chart or heatmap that reflects my key input frequency.

This is a sample of the keylogger data:

[left-cmd]sdfdsfs[esc][left-ctrl]e[left-ctrl]xytial -f /var/log/ke[tab][return][up][left][left][left][left][left][del][del]ai[return][left-cmd]  [left-cmd][left-cmd]6[del]3[left][left][right][del][del]10[left]a[left]a[esc][left-option] [left-option]gst[left-ctrl]e[left-ctrl]hgst[return]g[left-option] replacepasswordhere[left-option][left-cmd]r[left-cmd][right-shift]c[right-shift]ure[del]

Progression

I mapped out all the characters that I want to see the frequency of (basically all characters/keys). I tried some things that ended up not being really useful. Also, the answers to the word frequency counter questions are all some variation of this:

function wordFreq(string) {
    var words = string.replace(/[.]/g, '').split(/\s/);
    var freqMap = {};
    words.forEach(function(w) {
        if (!freqMap[w]) {
            freqMap[w] = 0;
        }
        freqMap[w] += 1;
    });

    return freqMap;
}

The thing is that I don't have a clear separation between my keys, so as far as I know a simple .split() won't get me far.
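To illustrate the problem with a shortened input (the `sample` string below is made up for this example and is not real log data): splitting into single characters tears the bracketed key names apart.

```javascript
// Shortened, made-up sample for illustration only.
const sample = '[esc]ab[tab]';

// Splitting on the empty string yields one entry per character,
// so '[esc]' becomes '[', 'e', 's', 'c', ']' instead of one key.
const chars = sample.split('');
console.log(chars);

// What I actually want instead is: ['[esc]', 'a', 'b', '[tab]']
```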

Current approach

I have been trying to find some way of splitting the input string into a useful array. I've spent so many hours on this that I'm almost starting to hate the word reduce 😂.

I don't expect a typed-out answer; I just want to ask whether there's a way to split this the way I want, or maybe a completely different approach that I could look into. (I'll do my own research; I just need some advice or opinions from other developers to get me started.) I am basically blinded by tunnel vision trying to split the string the way I need it.

This is what I am trying to get the array to look like, based on the first part of the sample shown above:

const keys = ['[left-cmd]', 's', 'd', 'f', 'd', 's', 'f', 's', '[esc]', '[left-ctrl]', 'e', '[left-ctrl]', 'x', 'y', 't', 'i', 'a', 'l', ' ', '-', 'f', ' ', '/', 'v', 'a', 'r', '/', 'l', 'o', 'g', '/', 'k', 'e', '[tab]', '[return]', '[up]']

Solution

  • I read the string one character at a time with two possible states: a normal character, or a character inside brackets. A normal character is pushed as a token right away. Inside brackets, characters are accumulated until the closing bracket is found, and the whole bracketed name is then pushed as a single token. This repeats until the end of the string.

    var str = "[left-cmd]sdfdsfs[esc][left-ctrl]e[left-ctrl]xytial -f /var/log/ke[tab][return][up][left][left][left][left][left][del][del]ai[return][left-cmd]  [left-cmd][left-cmd]6[del]3[left][left][right][del][del]10[left]a[left]a[esc][left-option] [left-option]gst[left-ctrl]e[left-ctrl]hgst[return]g[left-option] replacepasswordhere[left-option][left-cmd]r[left-cmd][right-shift]c[right-shift]ure[del]"
    
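    function tokenize(str) {
      var state = "normal";   // either "normal" or "brackets"
      var tokens = [];
      var bucket = null;      // characters collected since the last '['
      for (var i = 0; i < str.length; i++) {
        var c = str[i];
        if (state === "normal") {
          if (c === '[') {
            // Start of a bracketed key name: begin collecting.
            state = "brackets";
            bucket = [];
          } else {
            // Plain character: it is a token on its own.
            tokens.push(c);
          }
        } else if (c === ']') {
          // End of a bracketed key name: emit it as one token.
          state = "normal";
          tokens.push('[' + bucket.join("") + ']');
        } else {
          bucket.push(c);
        }
      }
      return tokens;
    }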
    
    console.log(tokenize(str))
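
Once the input is tokenized, counting is the easy part. Below is a minimal sketch of that last step, not a definitive implementation. To keep the block self-contained it uses a regular expression as an alternative tokenizer rather than the state machine above; `tokenizeRegex` and `countKeys` are hypothetical names chosen for this example, and the sketch assumes that key names never contain `]`.

```javascript
// Alternative tokenizer: match either a whole "[...]" group or any single character.
// Assumption: key names inside brackets never contain ']'.
// An unterminated '[' simply falls through and is emitted as a single character.
function tokenizeRegex(str) {
  return str.match(/\[[^\]]*\]|[\s\S]/g) || [];
}

// Count how often each token occurs, in the same style as wordFreq from the question.
function countKeys(tokens) {
  return tokens.reduce(function (freq, key) {
    freq[key] = (freq[key] || 0) + 1;
    return freq;
  }, {});
}

// Beginning of the sample from the question.
var sample = "[left-cmd]sdfdsfs[esc][left-ctrl]e";
var freq = countKeys(tokenizeRegex(sample));
console.log(freq);
```

The resulting object maps each key, bracketed or not, to its count, which should be a reasonable input for a chart or heatmap.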