pythonalgorithmkdb+hill-climbing

kdb/q Modified Climbing hill algo


I have a question about the methods for doing the climbing hill algorithm with a specific problem I have.

I have 2 metrics: Sum_x and Sum_y, and I have about a hundred rows in this format:

CategoryA CategoryB x y
A 10 1 4
A 12 3 1
B 10 7 4
C 17 3 3

where the combination of CategoryA and CategoryB are unique. I need to find which rows are included in Sum_x and Sum_y.

Do you know how I can do this? My current approach is to code each row in binary, starting with all rows as 0, and change each value from 0 to 1 to sum the x and y column, plotting the sums of x and y and finding which matches.

Is this the right approach? I'm trying to write this in Q for now, but I think even Python will work for this.


Solution

  • here's a solution using dynamic programming:

    find_subsets_summing_to_value: { [p_list; p_sum]
        len:count p_list;
        // dp[x][y] = 1b iff any of the first x elements of p_list sum to y
        dp: #[1+len; enlist (1b , p_sum#0b)] {: .[y; z; :; y[z[0]-1][z[1]] | y[z[0]-1][z[1] - x[z[0]-1]]]; }[p_list]/ cross[1+ til count p_list; 1 + til p_sum];
        backtrack: { [p_dp; p_list; p_subsets; p_current_subset; p_idx; p_target]
            if[ (p_idx=0)&p_target=0; : p_subsets , enlist p_current_subset; ];
            if[ (p_idx=0)| p_target<0; : p_subsets; ];
            if[ p_dp[p_idx][p_target];
                p_subsets: .z.s[p_dp; p_list; p_subsets; p_current_subset , p_list[p_idx-1]; p_idx - 1; p_target - p_list[p_idx - 1]];
                ];
            : .z.s[p_dp; p_list; p_subsets; p_current_subset; p_idx - 1; p_target]
            };
        : backtrack[dp; p_list; (); (); len; p_sum];
        };
    
    find_matches: { [p_tbl; p_sumX; p_sumY]
        : {distinct x where {y=sum x `y}[;y] each x}[;p_sumY] raze { {`CategoryA`CategoryB xasc y $[1=count x; enlist x; x]}[;y] each (cross/)group[y `x] x }[;p_tbl] each distinct find_subsets_summing_to_value[p_tbl `x; p_sumX];
        };
    
    demo_find_matches: { [p_tbl; p_sumX; p_sumY]
        show "***************************************";
        show p_tbl;
        -1 "SUM_X: " , string[p_sumX];
        -1 "SUM_Y: " , string[p_sumY];
        show "***************************************";
        show each find_matches[p_tbl; p_sumX; p_sumY];
        -1 "\n\n";
        };
    
    q)demo_find_matches[([] CategoryA:`A`A`B`C; CategoryB:10 12 10 17; x:1 3 7 3; y:4 1 4 3); 8; 8]
    "***************************************"
    CategoryA CategoryB x y
    -----------------------
    A         10        1 4
    A         12        3 1
    B         10        7 4
    C         17        3 3
    SUM_X: 8
    SUM_Y: 8
    "***************************************"
    CategoryA CategoryB x y
    -----------------------
    A         10        1 4
    B         10        7 4
    
    q)demo_find_matches[([] CategoryA:`A`A`B`B`C`C; CategoryB:1 2 1 2 1 2; x:1 2 2 3 4 5; y:7 5 4 2 9 5); 4; 9]
    "***************************************"
    CategoryA CategoryB x y
    -----------------------
    A         1         1 7
    A         2         2 5
    B         1         2 4
    B         2         3 2
    C         1         4 9
    C         2         5 5
    SUM_X: 4
    SUM_Y: 9
    "***************************************"
    CategoryA CategoryB x y
    -----------------------
    C         1         4 9
    CategoryA CategoryB x y
    -----------------------
    A         1         1 7
    B         2         3 2
    CategoryA CategoryB x y
    -----------------------
    A         2         2 5
    B         1         2 4