I wrote the following code in Xcode to try and prove copy on write in Swift:
func print(address o: UnsafeRawPointer) {
    print(o)
}
func longStringMemoryTest() {
    var longStr1 = "abcdefghijklmnopqr"
    var longStr2 = longStr1
    print(address: longStr1)
    print(address: longStr2)
    print("[append 'stu' to 'longStr2']")
    longStr2 += "stu"
    print(address: longStr1)
    print(address: longStr2)
    var test = "abcdefghijklmnopqr"
    print(address: test)
    print("[Fin]")
}
However, the console always prints the same address for longStr1 and longStr2, even though longStr1 has a value of "abcdefghijklmnopqr" and longStr2 has a value of "abcdefghijklmnopqrstu". I can't figure out what I'm missing in this code. Can you explain how to prove copy on write for strings in Swift and why the address is always the same for longStr1 and longStr2?
Short Answer
You have nothing to prove: String is documented to use copy-on-write, and it does so under the hood. For instance, the header of the String.swift source file contains the following:
/// Performance Optimizations
/// =========================
///
/// Although strings in Swift have value semantics, strings use a copy-on-write
/// strategy to store their data in a buffer. This buffer can then be shared
/// by different copies of a string. A string's data is only copied lazily,
/// upon mutation, when more than one string instance is using the same
/// buffer. Therefore, the first in any sequence of mutating operations may
/// cost O(*n*) time and space.
///
/// When a string's contiguous storage fills up, a new buffer must be allocated
/// and data must be moved to the new storage. String buffers use an
/// exponential growth strategy that makes appending to a string a constant
/// time operation when averaged over many append operations.
Long Answer
First of all, your code has a mistake:
var a = "a"
var b = "b"
print(address: a) // 0x00006000031f8d70
print(address: b) // 0x00006000031f8d70
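Two different variables with different values have the same address! When a String is passed where an UnsafeRawPointer is expected, Swift implicitly passes a pointer to a null-terminated UTF-8 representation of the contents, valid only for the duration of the call (often a temporary buffer), so the printed value says nothing about where the variable itself lives. To get the addresses of the variables, pass them as in-out arguments (like references):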
print(address: &a) // 0x000000016f0c50a8
print(address: &b) // 0x000000016f0c5098
Now it works right, but:
a = b
print(address: &a) // 0x000000016f0c50a8
print(address: &b) // 0x000000016f0c5098
The variables still have different addresses after the assignment because each one is a separate String struct with its own slot in stack memory. At the same time, they can share the underlying data, because String works like a wrapper around a memory buffer.
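The difference between the address of a variable and the address of its data can be sketched as follows. This is a minimal illustration, not part of the original test: the names addressOfA and addressOfB are made up for the example, and it relies on the standard library function withUnsafePointer(to:_:) to read the address of each variable's own storage.

```swift
var a = "first"
var b = "second"

a = b // value copy: both variables now hold equal strings

// Address of each variable's own storage (the String struct itself),
// not the address of the character data it may refer to.
let addressOfA = withUnsafePointer(to: &a) { UInt(bitPattern: $0) }
let addressOfB = withUnsafePointer(to: &b) { UInt(bitPattern: $0) }

print(String(addressOfA, radix: 16), String(addressOfB, radix: 16))
print(a == b)                     // the values are equal...
print(addressOfA != addressOfB)   // ...but the variables are distinct
print(MemoryLayout<String>.size)  // size of the struct itself (16 bytes on 64-bit platforms)
```

The struct stays small and fixed in size no matter how long the text is, which is why comparing variable addresses cannot reveal whether two strings share a buffer.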
We can access the String's internal buffer through the key path a._guts._object.nativeUTF8Start in lldb (this is an internal implementation detail and may change between Swift versions), so let's try that in the Xcode console while stepping through the code line by line:
var a = "a"
// (lldb) p a._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R0 = 0x100000000000020 {}
// Comment: a points to the buffer 0x100000000000020
a += "aa"
// (lldb) p a._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R1 = 0x300000000000020 {}
// Comment: a points to new buffer 0x300000000000020 after mutation
var b = a
// (lldb) p a._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R2 = 0x300000000000020 {}
// (lldb) p b._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R3 = 0x300000000000020 {}
// Comment: a and b point to the shared buffer 0x300000000000020 after assignment
b += "bb"
// (lldb) p a._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R4 = 0x300000000000020 {}
// (lldb) p b._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R5 = 0x500000000000020 {}
// Comment: a continues pointing to the buffer 0x300000000000020,
// but b points to new buffer 0x500000000000020 after mutation
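This test shows how copy-on-write works with String in action. One caveat: on 64-bit platforms, strings of up to 15 UTF-8 code units are stored inline in the String value itself, and the strings used here are that short, so the values printed above are probably not real heap addresses. Repeating the experiment with strings longer than 15 bytes, such as the 18-character string from the question, should give a more faithful picture.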
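Outside the debugger, a similar check can be sketched with public API only. The helper storageAddress(of:) below is hypothetical (it is not part of the standard library); it uses String.UTF8View.withContiguousStorageIfAvailable to read the base address of the UTF-8 storage. The strings are built at run time and are longer than 15 bytes on the assumption that this forces them into a heap buffer rather than the inline small-string form. The address is only compared, never dereferenced, because the pointer is not valid outside the closure.

```swift
/// Hypothetical helper: returns the address of the contiguous UTF-8 storage
/// that `s` currently uses, or nil if no contiguous storage is available.
func storageAddress(of s: String) -> UInt? {
    return s.utf8.withContiguousStorageIfAvailable { buffer in
        UInt(bitPattern: buffer.baseAddress)
    }
}

// Built at run time and longer than 15 bytes, so the data should live on the heap.
let original = String(repeating: "a", count: 32)
var copied = original

let originalBefore = storageAddress(of: original)
let copiedBefore = storageAddress(of: copied)

copied.append("!") // first mutation of a shared buffer triggers the copy

let originalAfter = storageAddress(of: original)
let copiedAfter = storageAddress(of: copied)

print(originalBefore == copiedBefore) // do both strings share storage after assignment?
print(originalAfter == originalBefore) // is the original's storage untouched by the append?
print(copiedAfter != originalAfter)    // does the mutated copy use different storage?
```

On a typical 64-bit toolchain all three lines are expected to print true: the assignment shares the buffer, and the append gives the mutated copy its own storage while the original keeps the old one.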