javascripttypescriptalgorithmviterbi

How can I optimize my viterbi algorithm for chord detection?


I am building a Digital Audio Workstation in the likes of FL Studio and have a chord detection algorithm. It works as it is but is extremely inefficient which I need to optimize for.

First we start with the input:

[
    ['C', 'E'],
    ['E', 'G'],
    ['G']
]

Note that these are in order, and that for demonstrations sake, the first two E's and G's are the same note because at each note change we add an item to the list:

G          --------
E --------------
C ---------

The alogrithm I chose is the Viterbi algorithm in which I have base chord templates which I then combine with each of the 12 notes in this case thats 72 templates:

// Base chords; example after combination: Cmaj, Cmin .... BMaj, Bmin
const chordList: ChordInformation[] = [
    { label: 'maj', keys: [0,4,7], bias: -0.2 },
    { label: 'min', keys: [0,3,7], bias: -0.2 },
    { label: 'dim', keys: [0,3,6], bias:  0.0 },
    { label: 'maj7', keys: [0,4,7,11], bias: -0.1 },
    { label: '7',    keys: [0,4,7,10], bias:  0.0 },
    { label: 'm7',   keys: [0,3,7,10], bias:  0.0 },
];

Then we go through the viterbi algorithm, testing each chord before (72) with each chord after (72), calculate the best one and choose that as the chord interpretation:

// N === templates.length
for (let newSpace = 0; newSpace < N; newSpace++) {
    let bestScore = Infinity, prevSpaceOfBest = -1;
     for (let prevSpace = 0; prevSpace < N; prevSpace++) {
          const candidate = this.chordPath[prevX][prevSpace].bestScore
               + (prevSpace === newSpace ? 0 : this.params.switchCost)
               + this.chordScores[x][newSpace];
          if (candidate < bestScore) { 
               bestScore = candidate; 
               prevSpaceOfBest = prevSpace; 
           }
      }
      this.chordPath[x][newSpace] = {bestScore: bestScore, prevSpace: prevSpaceOfBest};  
}

As you can see even with only 6 chords, its already > 5000 (72x72) operations for each calculation, and although I maintain the arrays as a global object, when a note changes at one position, all positions after must have their values updated. This means in a project with 1000 buckets input, If bucket number 15 changes, then buckets 15-1000 must be updated which is too costly.

Heres what Ive tried:

  1. Calculating just chord templates and combine with notes afterwards. This initially works, we initially compute each emit score with all 12 offsets making it O(n) and store the best offset for later, then the chordPath scales with the much smaller chordList length in this case 6, so 6x6 for each interpretation. This break when two chords have the same calculated score. There might be a solution here with backtracking but I couldnt find one. The 72x72 algorithm works where this fails because it always calcuates each path all the way to the end whereas the 6x6 has to only allow one at each portion. This tie happens often for instance a bucket with ['C'] will score the same for Fmaj, Cmaj, F#dim and so on.
  2. Pruning unlikely chord transitions. Im not going to do this because musically its not sound

It seems to me that theres no way onward with this algorithm. I have considered looking for a Transformer approach but I dont know where I could find one trained and open source.


Solution

  • Ok I've figured it out. The approach works as follows: 1. prune buckets to only calculate top K elements (this doesn't actually filter out complex harmonies because something like C-E-G should never be interpreted as Dbmaj7 etc. therefore we don't need it). For 6 chords and K=3 this now brings the complexity to 18x18.

    As I would like to add more chords and this would scale the time complexity exponentially, I improved upon it further. To further improve the time complexity, we use core chords and add extensions on top. Below you can see the core chord tones of major positively impact it the most, with extensions only slightly positively impacting it.

    { label: 'maj', keys: [0,4,7], bias: -0.2, extensions: {
        11: .5, // 7
        2: .25 // 9
    } },
    

    Because these extensions will only contribute to the viterbi scores and dont actually get tracked we have to go back through the the note buckets outside of the viterbi after the interpretations and build the exact chord name. For instance if its a Cmajor7 chord, the viterbi will identify it a Cmaj and then I will go back through the note buckets manually and add the 7 when I see the B in the bucket.

    This brings the total time complexity to 12x12 (maj, min, dim, aug). This implementation needs some polish but I think its the best solution for my use case.