java, .net, string-literals, string-table

Where do Java and .NET string literals reside?


A recent question about string literals in .NET caught my eye. I know that string literals are interned, so that literals with the same value refer to the same object. I also know that a string can be interned at runtime:

string now = string.Intern(DateTime.Now.ToString());
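
To make the interning behavior above concrete, here is a minimal sketch. The class and method names (InternDemo and its helpers) are made up for illustration. It assumes the code is JIT-compiled from a single assembly; precompiled images may relax literal interning, so treat the third check as typical rather than guaranteed.

```csharp
using System;

// Hypothetical demo class; names are illustrative only.
public static class InternDemo
{
    // Two identical literals in the same assembly load the same string object.
    public static bool LiteralsShareInstance()
    {
        string a = "hello world";
        string b = "hello world";
        return object.ReferenceEquals(a, b);
    }

    // A string built at run time is a separate object, even if its value matches a literal.
    public static bool RuntimeStringIsDistinct()
    {
        string literal = "hello world";
        string built = new string("hello world".ToCharArray());
        return !object.ReferenceEquals(literal, built);
    }

    // Interning the run-time string hands back the pooled instance,
    // which is normally the one the literal refers to (assumption: JIT-compiled code).
    public static bool InternReturnsPooledInstance()
    {
        string literal = "hello world";
        string built = new string("hello world".ToCharArray());
        return object.ReferenceEquals(literal, string.Intern(built));
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareInstance());
        Console.WriteLine(RuntimeStringIsDistinct());
        Console.WriteLine(InternReturnsPooledInstance());
    }
}
```

Reference equality is used instead of == because == on strings compares values, which would hide the distinction between "same value" and "same object".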

Obviously, a string that is interned at runtime resides on the heap, but I had assumed that a literal is placed in the program's data segment (and said so in my answer to that question). However, I don't remember seeing this documented anywhere. I assumed it because that is how I would implement it, and because the ldstr IL instruction is used to load literals with no apparent allocation, which seemed to back me up.

To cut a long story short: where do string literals reside? On the heap, in the data segment, or somewhere I haven't thought of?


Edit: If string literals do reside on the heap, when are they allocated?


Solution

  • Strings in .NET are reference types, so they are always on the heap (even when they are interned). You can verify this using a debugger such as WinDbg.

    If you have the class below

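       using System;

       class SomeType {
          public void Foo() {
             string s = "hello world";   // the literal we will look for on the heap
             Console.WriteLine(s);
             Console.WriteLine("press enter");
             Console.ReadLine();         // keeps the process alive so a debugger can attach
          }
       }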
    

    and you call Foo() on an instance, you can use WinDbg (with the SOS extension loaded) to inspect the heap.

    In a small program the reference will most likely be held in a register, so the easiest way to find the reference to the specific string is to run !dso (dump stack objects). This gives us the address of the string in question:

    0:000> !dso
    OS Thread Id: 0x1660 (0)
    ESP/REG  Object   Name
    002bf0a4 025d4bf8 Microsoft.Win32.SafeHandles.SafeFileHandle
    002bf0b4 025d4bf8 Microsoft.Win32.SafeHandles.SafeFileHandle
    002bf0e8 025d4e5c System.Byte[]
    002bf0ec 025d4c0c System.IO.__ConsoleStream
    002bf110 025d4c3c System.IO.StreamReader
    002bf114 025d4c3c System.IO.StreamReader
    002bf12c 025d5180 System.IO.TextReader+SyncTextReader
    002bf130 025d4c3c System.IO.StreamReader
    002bf140 025d5180 System.IO.TextReader+SyncTextReader
    002bf14c 025d5180 System.IO.TextReader+SyncTextReader
    002bf15c 025d2d04 System.String    hello world             // THIS IS THE ONE
    002bf224 025d2ccc System.Object[]    (System.String[])
    002bf3d0 025d2ccc System.Object[]    (System.String[])
    002bf3f8 025d2ccc System.Object[]    (System.String[])
    

    Now use !gcgen to find out which generation the instance is in:

    0:000> !gcgen 025d2d04 
    Gen 0
    

    It's in generation zero - i.e. it has just been allocated. Who's rooting it?

    0:000> !gcroot 025d2d04 
    Note: Roots found on stacks may be false positives. Run "!help gcroot" for
    more info.
    Scan Thread 0 OSTHread 1660
    ESP:2bf15c:Root:025d2d04(System.String)
    Scan Thread 2 OSTHread 16b4
    DOMAIN(000E4840):HANDLE(Pinned):6513f4:Root:035d2020(System.Object[])->
    025d2d04(System.String)
    

    The ESP root is the stack slot for our Foo() method, but notice that the string is also referenced from a pinned object[]. That's the intern table. Let's take a look.

    0:000> !dumparray 035d2020
    Name: System.Object[]
    MethodTable: 006984c4
    EEClass: 00698444
    Size: 528(0x210) bytes
    Array: Rank 1, Number of elements 128, Type CLASS
    Element Methodtable: 00696d3c
    [0] 025d1360
    [1] 025d137c
    [2] 025d139c
    [3] 025d13b0
    [4] 025d13d0
    [5] 025d1400
    [6] 025d1424
    ...
    [36] 025d2d04  // THIS IS OUR STRING
    ...
    [126] null
    [127] null
    

    I reduced the output somewhat, but you get the idea.

    In conclusion: strings are on the heap, even when they are interned. The intern table holds a reference to the instance on the heap, so interned strings are not collected during GC because the intern table roots them.
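
The "table holds a reference to a heap instance" idea can also be observed from code, without a debugger. The following is a rough sketch: InternTableDemo and IsPooled are hypothetical names, and the GUID is just a convenient way to build a string at run time that is very unlikely to be in the pool already.

```csharp
using System;

// Hypothetical demo class; names are illustrative only.
public static class InternTableDemo
{
    // string.IsInterned returns null when no string with this value is in the intern pool.
    public static bool IsPooled(string s)
    {
        return string.IsInterned(s) != null;
    }

    public static void Main()
    {
        // Built at run time on the heap; assumed not to be in the pool yet.
        string s = Guid.NewGuid().ToString();
        Console.WriteLine(IsPooled(s));      // pool lookup before interning

        // Interning adds an entry to the pool and returns the pooled reference.
        string pooled = string.Intern(s);
        Console.WriteLine(IsPooled(s));      // pool lookup after interning

        // Whether the pooled reference is the very same heap object as s
        // is printed rather than assumed here.
        Console.WriteLine(object.ReferenceEquals(s, pooled));
    }
}
```

Once a string has been interned this way, the pool keeps it reachable, which matches the !gcroot output above where the object[] roots the string.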