How can we prove that copy on write is applied to strings of more than 15 characters in Swift?

I wrote the following code in Xcode to try and prove copy on write in Swift:

func print(address o: UnsafeRawPointer) {
    print(o)
}


func longStringMemoryTest() {
    var longStr1 = "abcdefghijklmnopqr"
    var longStr2 = longStr1
    
    print(address: longStr1)
    print(address: longStr2)
    
    print("[append 'stu' to 'longStr2']")
    longStr2 += "stu"
    
    print(address: longStr1)
    print(address: longStr2)
    var test = "abcdefghijklmnopqr"
    print(address: test)
    print("[Fin]")
}

However, the console always prints the same address for longStr1 and longStr2, even though longStr1 has a value of "abcdefghijklmnopqr" and longStr2 has a value of "abcdefghijklmnopqrstu". I can't figure out what I'm missing in this code. Can you explain how to prove copy on write for strings in Swift and why the address is always the same for longStr1 and longStr2?

Solution

Short Answer

There is nothing you need to prove: copy-on-write for String is documented behavior and happens under the hood. For instance, the header of the String.swift source file says:

/// Performance Optimizations
/// =========================
///
/// Although strings in Swift have value semantics, strings use a copy-on-write
/// strategy to store their data in a buffer. This buffer can then be shared
/// by different copies of a string. A string's data is only copied lazily,
/// upon mutation, when more than one string instance is using the same
/// buffer. Therefore, the first in any sequence of mutating operations may
/// cost O(*n*) time and space.
///
/// When a string's contiguous storage fills up, a new buffer must be allocated
/// and data must be moved to the new storage. String buffers use an
/// exponential growth strategy that makes appending to a string a constant
/// time operation when averaged over many append operations.

Long Answer

First of all, your code has a mistake. Passing a String directly to print(address:) does not give you the address of the variable:

var a = "a"
var b = "b"
print(address: a) // 0x00006000031f8d70
print(address: b) // 0x00006000031f8d70

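Two different variables with different values have the same address! When a String is passed where an UnsafeRawPointer is expected, Swift implicitly hands the function a pointer to a null-terminated UTF-8 representation of the contents (often a temporary that is only valid during the call), not the address of the variable. To fix this, pass the variables as in-out arguments (like references) to get the addresses of the variables themselves: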

print(address: &a) // 0x000000016f0c50a8
print(address: &b) // 0x000000016f0c5098

Now the addresses differ, as expected. But they also stay different after an assignment:

a = b
print(address: &a) // 0x000000016f0c50a8
print(address: &b) // 0x000000016f0c5098

The variables have different addresses after the assignment because each one still occupies its own slot of stack memory for its String struct. At the same time, the two structs can share the same underlying data, because String is a wrapper around a memory buffer.
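The wrapper idea can be illustrated by checking that the String struct itself has a fixed size no matter how long its contents are. This is a minimal sketch; the variable names are made up for illustration, and on 64-bit platforms both sizes are expected to be 16 bytes.

```swift
// A String value is a small fixed-size struct. The character data of a long
// string lives in a separate buffer that the struct refers to.
let shortString = "a"
let longString = String(repeating: "a", count: 1_000)

let shortSize = MemoryLayout.size(ofValue: shortString)
let longSize = MemoryLayout.size(ofValue: longString)

print("struct size for the 1-character string:", shortSize)
print("struct size for the 1000-character string:", longSize)
print("same size:", shortSize == longSize)
```

Because the struct is this small, taking the address of a variable with & can only tell you where the struct lives, not where the character data is stored.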

We can inspect String's internal buffer through the key path a._guts._object.nativeUTF8Start in lldb. These are private implementation details that may change between Swift versions. Let's try that in the Xcode console while stepping through the code line by line:

var a = "a"
// (lldb) p a._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R0 = 0x100000000000020 {} 
// Comment: a points to the buffer 0x100000000000020

a += "aa"
// (lldb) p a._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R1 = 0x300000000000020 {}
// Comment: a points to a new buffer 0x300000000000020 after mutation

var b = a
// (lldb) p a._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R2 = 0x300000000000020 {}
// (lldb) p b._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R3 = 0x300000000000020 {}
// Comment: a and b point to the shared buffer 0x300000000000020 after assignment

b += "bb"
// (lldb) p a._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R4 = 0x300000000000020 {}
// (lldb) p b._guts._object.nativeUTF8Start
// (UnsafePointer<UInt8>) $R5 = 0x500000000000020 {}
// Comment: a still points to the buffer 0x300000000000020,
// but b points to a new buffer 0x500000000000020 after mutation

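This test is meant to show copy-on-write in action: the copy shares its storage with the original until the first mutation. One caveat: on 64-bit platforms, strings of up to 15 UTF-8 code units are stored inline in the String struct (the small-string optimization), so for strings as short as the ones above nativeUTF8Start is probably not a real heap address. Note that the high byte of each printed value simply tracks the string length (1, 3, 5). To observe actual buffer addresses, repeat the experiment with strings longer than 15 characters, as in the question.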
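The same sharing can also be checked from code, without lldb or private key paths. The following is a minimal sketch, not an official technique: storageAddress(of:) is a hypothetical helper, and it assumes that utf8.withContiguousStorageIfAvailable exposes the native buffer of a string that is too long for inline storage. The address is only compared as an integer and never dereferenced.

```swift
// Hypothetical helper: the address of the string's contiguous UTF-8 storage,
// returned as an integer for comparison only (never dereferenced).
func storageAddress(of s: String) -> UInt? {
    return s.utf8.withContiguousStorageIfAvailable { buffer in
        UInt(bitPattern: buffer.baseAddress)
    }
}

// Built at run time and longer than 15 UTF-8 code units,
// so it cannot use the inline small-string form.
let a = String(repeating: "a", count: 32)
var b = a   // copies the struct only; the buffer should be shared

let aBefore = storageAddress(of: a)
let bBefore = storageAddress(of: b)

b += "bb"   // first mutation of a shared buffer: b gets its own storage

let aAfter = storageAddress(of: a)
let bAfter = storageAddress(of: b)

print("shared before mutation:", aBefore == bBefore)
print("a unchanged by b's mutation:", aAfter == aBefore)
print("b moved to a new buffer:", bAfter != aAfter)
```

If copy-on-write behaves as documented, a and b report the same storage address until b is mutated, after which only b's address changes.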