javaarraysmatrixdynamic-programmingsequence-alignment

global sequence alignment dynamic programming finding the minimum in a matrix


I have 2 sequences, AACAGTTACC and TAAGGTCA, and I'm trying to find a global sequence alignment. I managed to create a 2D array and create the matrix, and I even filled it with semi-dynamic approach.

Here is my code to fill the matrix:

void process() {
    for (int i = 1; i <= sequenceA.length; i++) {
        for (int j = 1; j <= sequenceB.length; j++) {
            int scoreDiag = opt[i-1][j-1] + equal(i, j);
            int scoreLeft = opt[i][j-1] - 1;
            int scoreUp = opt[i-1][j] - 1;
            opt[i][j] = Math.max(Math.max(scoreDiag, scoreLeft), scoreUp);
        }
    }
}

private int equal(int i, int j) {
    if (sequenceA[i - 1] == sequenceB[j - 1]) {
        return 1;
    } else {
        return -1;
    }
}

My main problem is that this code generates this output:

 0   -1   -2   -3   -4   -5   -6   -7   -8     
-1   -1    0   -1   -2   -3   -4   -5   -6      
-2   -2    0    1    0   -1   -2   -3   -4     
 -3   -3   -1    0    0   -1   -2   -1   -2      
 -4   -4   -2    0   -1   -1   -2   -2    0      
 -5   -5   -3   -1    1    0   -1   -2   -1       
-6   -4   -4   -2    0    0    1    0   -1      
-7   -5   -5   -3   -1   -1    1    0   -1       
-8   -6   -4   -4   -2   -2    0    0    1      
 -9   -7   -5   -5   -3   -3   -1    1    0      
-10   -8   -6   -6   -4   -4   -2    0    0 

But I want it to look like this (all i care are numbers in the pictures):

enter image description here

I got to apply penaltys: per mismatch 1 and per gap 2, if its match 0.


Solution

  • There are several things that you need to modify:

    1. Note that in the image you give us the alignment goes from the bottom-right corner to the top-left corner. So in that image they are not really aligning AACAGTTACC and TAAGGTCA, but CCATTGACAA and ACTGGAAT.
    2. You say that you want a global alignment, but you actually compute a local alignment. The main difference is in penalties at the beginning of the sequences. In a global alignment you have to compute insertions and deletions at the first row and columns.
    3. Third, you are not applying correctly the penalties you mention. Instead, you always penalize with -1 and reward with +1.
    4. In the example image they are not taking the maximum value at each position, but the minimum (this is because your penalties are positive and the rewards is 0, not the other way around, so you want to minimize the values).

    The full solution is:

    // Note that these sequences are reversed!
    String sequenceA ="CCATTGACAA";
    String sequenceB = "ACTGGAAT";
    
    // The penalties to apply
    int gap = 2, substitution = 1, match = 0;
    
    int[][] opt = new int[sequenceA.length() + 1][sequenceB.length() + 1];
    
    // First of all, compute insertions and deletions at 1st row/column
    for (int i = 1; i <= sequenceA.length(); i++)
        opt[i][0] = opt[i - 1][0] + gap;
    for (int j = 1; j <= sequenceB.length(); j++)
        opt[0][j] = opt[0][j - 1] + gap;
    
    for (int i = 1; i <= sequenceA.length(); i++) {
        for (int j = 1; j <= sequenceB.length(); j++) {
            int scoreDiag = opt[i - 1][j - 1] +
                    (sequenceA.charAt(i-1) == sequenceB.charAt(j-1) ?
                        match : // same symbol
                        substitution); // different symbol
            int scoreLeft = opt[i][j - 1] + gap; // insertion
            int scoreUp = opt[i - 1][j] + gap; // deletion
            // we take the minimum
            opt[i][j] = Math.min(Math.min(scoreDiag, scoreLeft), scoreUp);
        }
    }
    
    for (int i = 0; i <= sequenceA.length(); i++) {
        for (int j = 0; j <= sequenceB.length(); j++)
            System.out.print(opt[i][j] + "\t");
        System.out.println();
    }
    

    The result is just as in the example you gave us (but reversed, remember!):

    0   2   4   6   8   10  12  14  16  
    2   1   2   4   6   8   10  12  14  
    4   3   1   3   5   7   9   11  13  
    6   4   3   2   4   6   7   9   11  
    8   6   5   3   3   5   7   8   9   
    10  8   7   5   4   4   6   8   8   
    12  10  9   7   5   4   5   7   9   
    14  12  11  9   7   6   4   5   7   
    16  14  12  11  9   8   6   5   6   
    18  16  14  13  11  10  8   6   6   
    20  18  16  15  13  12  10  8   7
    

    So the final alignment score is found at opt[sequenceA.length()][sequenceB.length()] (7). If you really need to show the reversed matrix as in the image, do this:

    for (int i = sequenceA.length(); i >=0; i--) {
        for (int j = sequenceB.length(); j >= 0 ; j--)
            System.out.print(opt[i][j] + "\t");
        System.out.println();
    }