I have a problem which I have been able to solve using Stata, but now my data has grown to a size that I cannot process it from memory any more. I hope to do this in MySQL instead. I am trying to calculate the Manhattan distance of items between n groups. I have manipulated the data so far that I hope it is ready to work for the computation:
SELECT * FROM exampleshares;
+----------+-------------+-------------+
| item | group | share |
+----------+-------------+-------------+
| A | group1 | .3 |
| B | group1 | .7 |
| A | group2 | .2 |
| B | group2 | .6 |
| C | group2 | .2 |
| A | group3 | .3 |
| C | group3 | .6 |
+----------+-------------+-------------+
The Manhattan Distance of this example would be :
+----------+-------------+-------------+
| groupX | groupY | M distance |
+----------+-------------+-------------+
| group1 | group1 | 0 |
| group1 | group2 | .4 |
| group1 | group3 | 1.3 |
| group2 | group1 | .4 |
| group2 | group2 | 0 |
| group2 | group3 | 1.1 |
| group3 | group1 | 1.3 |
| group3 | group2 | 1.1 |
| group3 | group3 | 0 |
+----------+-------------+-------------+
For example, the distance between group1 and group2 is computed as |.3-.2|+|.7-.6|+|0-.2|=0.4, ie. the sum of the absolute difference in shares. How do I do this in MySQL?
During my search if found a couple of solutions to calculating the difference to a previous row by group, but nothing to what I am looking for specifically.
I believe you will have to use a stored routine or some other script to accomplish this. Here is a stored routine that will do it:
delimiter //
drop procedure if exists manhattanDistance//
create procedure manhattanDistance (in startGroup char(32), in endGroup char(32), out manhattanDistance decimal(2,1))
not deterministic
reads sql data
begin
drop table if exists tmp_items;
create temporary table tmp_items as select distinct item from exampleshares;
select sum(abs(ifnull(es1.share, 0) - ifnull(es2.share, 0))) into manhattanDistance
from tmp_items ti
left join exampleshares es1 on es1.item = ti.item and es1.group = startGroup
left join exampleshares es2 on es2.item = ti.item and es2.group = endGroup;
end//
delimiter ;
call manhattanDistance('group1', 'group2', @distanceBetweenGroup1And2);
select @distanceBetweenGroup1And2;