arraystcl

Reducing size of array while in loop using TCL


I have an array where the key is an element ID and the value a set of xyz coordinates. I have to find pairs of elements that are closer than a certain limit to each other. To do this I have two nested for-each loops where i calculate the distance between the first and all other elements and so on. Each time i find a pair matching the distance condition, save it to a separate list-of-pairs.This of course takes a lot of time. In order to speed the process up I remove the elements which had a match from the array using unset. This way the array to go thorough gets smaller and smaller. Now I know that it is not recommended to modify an array while iterating through it. What would be a better way to complete this task? Here is my code:

puts "Finding element pairs where distance from center to center is < ply_t..."
foreach { EID_1 xyz_1 } [array get d_EIDxyz] {

    set x1 [lindex $xyz_1 0]
    set y1 [lindex $xyz_1 1]
    set z1 [lindex $xyz_1 2]


    foreach {EID_2 xyz_2} [array get d_EIDxyz] {
        
        set x2 [lindex $xyz_2 0]
        set y2 [lindex $xyz_2 1]
        set z2 [lindex $xyz_2 2]
    
    
        set dis [expr {sqrt(($x2 - $x1) ** 2 + ($y2 - $y1) ** 2 + ($z2 - $z1) ** 2)}]
    
        if { $dis < $ply_t && $dis > 0 } {
            lappend pairs [list $EID_1 $EID_2]
            # remove the pair IDs found from the array to speed up the search
            catch { unset d_EIDxyz($EID_1) }
            catch { unset d_EIDxyz($EID_2) }
        }       
    
    }
}

puts "pairs: $pairs"

Solution

  • Each of your loops here is iterating over the key/value pairs returned by array get. This array get operation is run once when you start each loop, the key/value list it returns will not be affected by deleting elements from the array after this.

    So removing the matched elements from the array will not speed things up. Also if it did affect the iteration I think you could then miss correct results - if you had three points A,B,C which were all within the specified distance, matching A and B would remove them from consideration and you would then fail to find pairs A,C and B,C.

    It might help to first check the distance on each dimension separately - if any of the x, y or z distances is greater than the threshold you can skip doing the (probably slower) exact calculation.

    Also the three lindex operations can be replaced by a single lassign. So the loops would become:

    foreach { EID_1 xyz_1 } [array get d_EIDxyz] {
    
        lassign $xyz_1 x1 y1 z1
    
        foreach {EID_2 xyz_2} [array get d_EIDxyz] {
            
            lassign $xyz_2 x2 y2 z2
    
            if {abs($x2 - $x1) > $ply_t} continue
            if {abs($y2 - $y1) > $ply_t} continue
            if {abs($z2 - $z1) > $ply_t} continue
            if {$x1==$x2 && $y1==$y2 && $z1==$z2} continue
        
            set dis [expr {sqrt(($x2 - $x1) ** 2 + ($y2 - $y1) ** 2 + ($z2 - $z1) ** 2)}]
        
            if { $dis < $ply_t } {
                lappend pairs [list $EID_1 $EID_2]
            }       
        }
    }
    

    Update later: Here's a further thought. Taking the sqrt is likely to be the slowest operation, but you can avoid that completely since:

    sqrt( x**2 + y**2 + z**2 ) < ply_t <=> ( x**2 + y**2 + z**2 ) < ply_t**2

    So you could just do:

    set ply_t_sq [expr {$ply_t ** 2}]
    
    foreach { EID_1 xyz_1 } [array get d_EIDxyz] {
    
        lassign $xyz_1 x1 y1 z1
    
        foreach {EID_2 xyz_2} [array get d_EIDxyz] {
            
            lassign $xyz_2 x2 y2 z2
    
            set dis_sq [expr {($x2 - $x1) ** 2 + ($y2 - $y1) ** 2 + ($z2 - $z1) ** 2}]
        
            if { $dis_sq < $ply_t_sq && $dis_sq > 0 } {
                lappend pairs [list $EID_1 $EID_2]
            }       
        }
    }