ios swift5

Is it correct that copy on write is applied to (all) the "original" array if a change is made to that array while there happens to be a copy extant?


Say you have

struct Teste: Codable {
    let a: Int
    lazy var x: Int = {
        print("I just changed this Teste item")
        return a + 1000
    }()
}

and you have

var hugeData: [Teste] // 750 million of these

So it's something like your data from the cloud, held in a singleton or such.

Say you then happen to do

   let see: [Teste] = hugeData.filter{ .. }
   // 4 of these. we'll say one of them happens to be hugeData[109333072]

maybe to display in a view or such. Next! While see exists, you for some reason happen to run

let width = hugeData[109333072].x

At that moment, it presumably has to copy-on-write all of hugeData, just because the stupid little see exists at that time.

It only just occurred to me that there is a huge danger if you deal with massive amounts of data: if something changes in the big original array while some little copy-on-write array just happens to exist, it's a huge performance woe.

(And/or, as mentioned in the bullet point, maybe Swift deals with this by actually doing the copy-due-to-write on the little one??)

In short, is it correct that copy on write is applied to (all) the "original" array if a change is made to that array while there happens to be a copy extant?

Here's a short self-explanatory demo ...

struct Teste: Codable {
    let a: Int
    lazy var x: Int = {
        print("I just calculated it", a + 1000)
        return a + 1000
    }()
}
print("yo")
var tt = [Teste(a:7), Teste(a:8), Teste(a:9)] // must be var: reading the lazy x mutates the element
let tt2 = tt.filter{ $0.a > 8 }
print(tt)
print(tt2)
print("yes, still nil in tt2")
print("go..", tt[2].x)
print("go again..", tt[2].x)
print(tt)
print(tt2)
print("yes, copy on write was? applied to the original huge array")
print("ie YES, still nil in tt2")
let tt3 = tt.filter{ $0.a > 8 }
print(tt)
print(tt3)
print("yes, the set value seems to carry through to tt3")

result

yo
[.Teste(a: 7, $__lazy_storage_$_x: nil), .Teste(a: 8, $__lazy_storage_$_x: nil), .Teste(a: 9, $__lazy_storage_$_x: nil)]
[.Teste(a: 9, $__lazy_storage_$_x: nil)]
yes, still nil in tt2
I just calculated it 1009
go.. 1009
go again.. 1009
[.Teste(a: 7, $__lazy_storage_$_x: nil), .Teste(a: 8, $__lazy_storage_$_x: nil), .Teste(a: 9, $__lazy_storage_$_x: Optional(1009))]
[.Teste(a: 9, $__lazy_storage_$_x: nil)]
yes, copy on write was? applied to the original huge array
ie YES, still nil in tt2
[.Teste(a: 7, $__lazy_storage_$_x: nil), .Teste(a: 8, $__lazy_storage_$_x: nil), .Teste(a: 9, $__lazy_storage_$_x: Optional(1009))]
[.Teste(a: 9, $__lazy_storage_$_x: Optional(1009))]
yes, the set value seems to carry through to tt3

So in such a case would it copy the 750 million item array?


Solution

  • Array CoW is the mechanism by which, when you modify an array that shares its storage with another array, the contents of that storage are copied first. After that, the two arrays no longer share storage, because each has its own copy.

    var x = [1,2,3]
    var y = x // now x and y share storage
    x.append(4) // now the shared storage is copied
    

    The situation you have described does not involve arrays that share storage. filter returns a new array containing the filtered elements. This new array cannot possibly share the same storage as the old array, because they have different elements. In this case, see has its own storage containing 4 elements.

    So CoW semantics are really irrelevant here. There are no arrays that share storage. You just have two arrays, that each have their own separate storages.
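
One way to check this is to compare the buffer addresses of the two arrays. This is only a sketch under assumptions: storageAddress is a hypothetical helper written for this example (not a standard-library API), Item is a simplified stand-in for Teste, and 1,000 elements stand in for the 750 million.

```swift
// Hypothetical helper (not a standard-library API): the address of the
// array's element buffer, used only to compare two arrays' storage.
func storageAddress<T>(_ array: [T]) -> UnsafeRawPointer? {
    return array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

struct Item { let a: Int }   // simplified stand-in for Teste

let big = (0..<1_000).map { Item(a: $0) }     // stand-in for hugeData
let small = big.filter { $0.a % 250 == 0 }    // keeps a = 0, 250, 500, 750

// filter builds its result in a fresh buffer, so the two addresses differ.
let sharesStorage = storageAddress(big) == storageAddress(small)
print("small.count:", small.count, "shares storage:", sharesStorage)
```

Since both arrays are alive at the same time and were allocated separately, the comparison should come out false: there is nothing for copy-on-write to un-share.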

    So the line

    let width = hugeData[109333072].x
    

    will only mutate hugeData in place (the computed lazy value is cached in that one element) and will not cause any copies.
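
A rough way to see the difference between the filter case and a genuine copy is to watch whether hugeData's buffer address changes. This is a sketch under assumptions: storageAddress is a hypothetical helper made up for this example, the array is shrunk to 1,000 elements, and snapshot is an illustrative name for a real copy that the question's code never makes.

```swift
// Hypothetical helper (not a standard-library API): the address of the
// array's element buffer, used only to compare storage identity.
func storageAddress<T>(_ array: [T]) -> UnsafeRawPointer? {
    return array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

struct Teste {
    let a: Int
    lazy var x: Int = { return a + 1000 }()
}

var hugeData = (0..<1_000).map { Teste(a: $0) }   // small stand-in
let see = hugeData.filter { $0.a == 333 }         // separate storage

// Case 1: only the filtered array exists. The lazy read mutates in place.
let before = storageAddress(hugeData)
let width = hugeData[333].x
let after = storageAddress(hugeData)
let movedWithFilterOnly = before != after

// Case 2: a genuine copy exists, so the two arrays share one buffer.
let snapshot = hugeData
let sharedBefore = storageAddress(hugeData) == storageAddress(snapshot)
_ = hugeData[334].x   // mutating access while the buffer is shared
let sharedAfter = storageAddress(hugeData) == storageAddress(snapshot)

print(see.count, width, movedWithFilterOnly, sharedBefore, sharedAfter)
```

In the first case the address should stay the same, because hugeData is the only owner of its buffer. In the second case the mutating access has to un-share the buffer first, which is the full copy the question was worried about; it takes an actual array copy such as snapshot to get there, not a filter result.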


    A clarification on what "share storage" means:

    The Array struct contains a reference to a class instance on the heap. This is where the array's elements are stored. When you copy the Array struct value, as in var y = x, only this reference is copied. Writing _storage as an illustrative name for that reference (it is not a public property), x._storage === y._storage. This is what I mean by "shares the same storage".

    When you modify y in some way, y._storage will be assigned a different class instance (one holding a copy of the elements), and it is no longer the case that x._storage === y._storage.
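
The sharing and un-sharing described above can be observed indirectly by comparing buffer addresses. This is a minimal sketch: storageAddress is a hypothetical helper written for this example and stands in for the conceptual _storage reference, which is not something you can access directly.

```swift
// Hypothetical helper (not a standard-library API): the address of the
// array's element buffer, standing in for the conceptual `_storage`.
func storageAddress<T>(_ array: [T]) -> UnsafeRawPointer? {
    return array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

let x = [1, 2, 3]
var y = x                       // copies only the reference to the storage

let sharedAfterCopy = storageAddress(x) == storageAddress(y)

y.append(4)                     // y gets its own storage before the write

let sharedAfterWrite = storageAddress(x) == storageAddress(y)

print(sharedAfterCopy, sharedAfterWrite, x, y)
```

The first comparison should be true and the second false, while x still holds its original three elements.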