I have a matrix that is in the form of "obj obj distance" a toy matrix would look like this:
r = [[1,1,1,2,2,2,3,3,3] [1,2,3,1,2,3,1,2,3] [0,.5,.75,.75,0,.25,.5,.25,0]]
I currently have a solution along the lines of:
using NamedArrays
function DistanceMatrixFromTriColumn(r::AbstractArray)
Names = unique(vcat(r[:,1],r[:,2]))
Dist = NamedArray(zeros(length(Names),length(Names)),(Names,Names))
for row in Names
for col in Names
Dist[row,col] = r[(r[:,1] .== row) .& (r[:,2] .== col),:3][1]
end
end
return Dist
end
DistanceMatrixFromTriColumn(r)
which is really just a nested for loop that goes through the first column, and then goes through the second column. It outputs something like this:
3×3 Named Matrix{Float64}
A ╲ B │ 1.0 2.0 3.0
──────┼─────────────────
1.0 │ 0.0 0.5 0.75
2.0 │ 0.75 0.0 0.25
But I think there is a faster way, there should be a way to use Indexes to do something similar, no? I think I remember seeing someone do a solution like that in R once-upon-a-time.
For the case you have posted where the objects are reliably numbered as such, you can make this quite a lot faster with something along the lines of
function distancematrixfromtricolumn(r::AbstractArray)
names = unique(view(r,:,1))
dist = NamedArray(zeros(length(names),length(names)),(names,names))
for i = 1:size(r,1)
row = Int(r[i,1])
col = Int(r[i,2])
dist[row,col] = r[i,3]
end
return dist
end
You may also notice a few stylistic changes: I have switched a few names to lowercase to follow the Julia convention that functions and variables are lowercase whereas Modules and Types are CamelCase.
Comparing the two versions:
julia> distancematrixfromtricolumn(r)
3×3 Named Matrix{Float64}
A ╲ B │ 1.0 2.0 3.0
──────┼─────────────────
1.0 │ 0.0 0.5 0.75
2.0 │ 0.75 0.0 0.25
3.0 │ 0.5 0.25 0.0
julia> DistanceMatrixFromTriColumn(r) == distancematrixfromtricolumn(r)
true
julia> using BenchmarkTools
julia> @benchmark DistanceMatrixFromTriColumn($r)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
Range (min … max): 3.897 μs … 535.305 μs ┊ GC (min … max): 0.00% … 98.80%
Time (median): 4.863 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.828 μs ± 16.267 μs ┊ GC (mean ± σ): 9.92% ± 3.54%
▂▆▇█▇▇▅▃▁
▃▅█████████▇▇▇▆▅▅▆▅▅▄▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
3.9 μs Histogram: frequency by time 10.6 μs <
Memory estimate: 7.39 KiB, allocs estimate: 73.
julia> @benchmark distancematrixfromtricolumn($r)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.352 μs … 528.474 μs ┊ GC (min … max): 0.00% … 99.32%
Time (median): 1.543 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.821 μs ± 8.510 μs ┊ GC (mean ± σ): 8.04% ± 1.72%
▇█▄▁▁▁
███████▆▆▆▅▄▃▃▃▃▂▂▂▂▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁ ▂
1.35 μs Histogram: frequency by time 3.33 μs <
Memory estimate: 2.08 KiB, allocs estimate: 25.
While there are in principle a few other things one could consider to make the for
loop even faster, at this point much of our execution time and all of our allocations are coming from the call to unique
and the construction of the NamedArray
.