arrays bash shell difference

How to tell if two arrays contain the same elements


The following are two arrays of strings:

arr1=("aa" "bb" "cc" "dd" "ee")
echo ${#arr1[@]} # output => 5
arr2=("cc" "dd" "ee" "ff")
echo ${#arr2[@]} # output => 4

The difference of the two arrays is arr_diff=("aa" "bb" "ff"). I can get the difference using the following, and other methods from Stack Overflow:

arr_diff=$(echo ${arr1[@]} ${arr2[@]} | tr ' ' '\n' | sort | uniq -u)
# or
arr_diff=$(echo ${arr1[@]} ${arr2[@]} | xargs -n1 | sort | uniq -u)
echo ${arr_diff[@]} # output => aa bb ff

The point is not to print the difference of the arrays, but to get the size of the difference array, so that I can check whether the two arrays have the same elements. However, if I query the size of the difference array, I get the wrong answer.

echo ${#arr_diff[@]} # output => 1

I always get 1 as the output, irrespective of the size of the difference array (even when the size is zero, i.e. both arr1 and arr2 have the same elements).


Solution

  • To get the elements that appear in only one of the two arrays, you can use this awk script:

    arr1=("aa" "bb" "cc" "dd" "ee")
    arr2=("cc" "dd" "ee" "ff")
    
    awk 'FNR == NR {
       arr[$1]
       next
    }
    {
       if ($1 in arr)
          delete arr[$1]
       else
          print $1
    }
    END {
       for (i in arr)
          print i
    }' <(printf '%s\n' "${arr1[@]}") <(printf '%s\n' "${arr2[@]}")
    
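    Output (ff is always printed first; the order of aa and bb may vary, because awk's for (i in arr) iteration order is unspecified):

    ff
    aa
    bb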
    

    To store the difference in an array, use:

    read -ra diffarr < <(awk -v ORS=' ' 'FNR == NR {arr[$1]; next} {if ($1 in arr) delete arr[$1]; else print $1} END{for (i in arr) print i}' <(printf '%s\n' "${arr1[@]}") <(printf '%s\n' "${arr2[@]}"))
    
    # check diffarr content
    declare -p diffarr
    # output: declare -a diffarr=([0]="ff" [1]="aa" [2]="bb")

    # print number of elements in diffarr
    echo "${#diffarr[@]}"
    # output: 3
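
The sort | uniq -u pipeline from the question can also be kept, provided its output is captured as an array rather than as a string. With arr_diff=$(...), bash stores a single string, and ${#name[@]} on a set scalar variable is always 1, which explains the count reported in the question. The sketch below is one possible fix, not the method used in the answer above. It assumes bash 4 or later (for mapfile), that both arrays are non-empty, and that no element contains a newline. The name scalar_diff is introduced only for illustration. Each array is de-duplicated with sort -u first, so that an element repeated within one array is not mistaken for a shared element.

```shell
#!/usr/bin/env bash
# Sketch: symmetric difference of two arrays, captured as a real array.
# Assumes bash 4+, non-empty arrays, and no newlines inside elements.

arr1=("aa" "bb" "cc" "dd" "ee")
arr2=("cc" "dd" "ee" "ff")

# Scalar capture, as in the question: $(...) yields one string, so the
# element count is 1 no matter how many lines the string contains.
scalar_diff=$(printf '%s\n' "${arr1[@]}" "${arr2[@]}" | sort | uniq -u)
echo "${#scalar_diff[@]}"   # prints 1

# Array capture: mapfile -t stores one input line per array element.
# sort -u removes duplicates inside each array before the two are merged;
# uniq -u then keeps only the lines that occur exactly once overall.
mapfile -t arr_diff < <(
    {
        printf '%s\n' "${arr1[@]}" | sort -u
        printf '%s\n' "${arr2[@]}" | sort -u
    } | sort | uniq -u
)
echo "${#arr_diff[@]}"      # prints 3

# An empty difference means both arrays hold the same set of elements.
if (( ${#arr_diff[@]} == 0 )); then
    echo "same elements"
else
    echo "different elements: ${arr_diff[*]}"
fi
```

Because this version works line by line, it also handles elements that contain spaces, whereas the awk script above compares only the first whitespace-separated field ($1) of each element.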