
Expected memory footprint of numbers vs alphanumerics in Tcl


It is said that in Tcl, "everything is a string". But then again, that is not 100% accurate (hence the shimmering effect). My question is this: suppose I have a list containing the following values:

list "12" 14 2a "1a"

My expectation is that, if "everything is a string", the memory allocated for each of the 4 elements should be identical (by the spec/expected behaviour), as we are looking at 4 strings of length 2.
Is that a correct assumption?


Solution

  • Those values will probably start out as strings (it's not entirely certain that this will happen, due to the possibility of literal sharing). Let's take a closer look (if you're doing this yourself, note that the tcl::unsupported::representation command is only available from Tcl 8.6 onwards and never modifies the object being looked at; I put a ;format x on that first line to ensure that we don't get any surprise conversions).

    % set value [list "12" 14 2a "1a"];format x
    x
    % tcl::unsupported::representation [lindex $value 0]
    value is a pure string with a refcount of 2, object pointer at 0x100874070, string representation "12"
    % tcl::unsupported::representation [lindex $value 1]
    value is a pure string with a refcount of 2, object pointer at 0x1008741c0, string representation "14"
    % tcl::unsupported::representation [lindex $value 2]
    value is a pure string with a refcount of 2, object pointer at 0x100874c40, string representation "2a"
    % tcl::unsupported::representation [lindex $value 3]
    value is a pure string with a refcount of 2, object pointer at 0x1008743d0, string representation "1a"
    % tcl::unsupported::representation $value
    value is a list with a refcount of 4, object pointer at 0x100874910, internal representation 0x1008eb050:0x0, string representation "12 14 2a 1a"
    

    So yes, those values are pretty much the same. None of them has an internal representation, and each one's string representation uses a buffer that was allocated to be 3 bytes long (one byte for each character, and one for a terminating NUL byte). However, the list itself has a string representation (as well as its internal list rep) despite not being asked for one; that's because you're using a list of literals and Tcl's compiler optimises that to a single literal, as that's usually the Right Thing To Do.
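
    The shimmering mentioned in the question can be observed with the same command. What follows is only a minimal sketch: show is a hypothetical helper written for this illustration (it is not part of Tcl), and the exact wording printed by tcl::unsupported::representation is not guaranteed to stay the same between releases.

    ```tcl
    # Sketch only: tcl::unsupported::representation is an unsupported
    # debugging aid, so treat its output as informational.

    # Hypothetical helper (not part of Tcl): print a label followed by the
    # current representation of a value.
    proc show {label v} {
        puts "$label: [tcl::unsupported::representation $v]"
    }

    set n 14
    set s 2a

    show "n before arithmetic" $n
    expr {$n + 1}                  ;# arithmetic makes Tcl parse "14" as a number
    show "n after arithmetic" $n   ;# the value now also carries an integer internal rep

    # "2a" cannot be parsed as an integer, so this check does not give it one.
    puts "s is an integer? [string is integer -strict $s]"
    show "s after the check" $s
    ```

    After the arithmetic, n keeps its string representation "14" and gains an integer internal representation alongside it. That integer is stored inside the value structure that already exists, so the two numeric-looking elements only start to differ from "2a" and "1a" once something actually uses them as numbers.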