Tags: .net, generics, profiling, clr, clr-profiling-api

ICorProfilerCallback::ClassUnloadStarted not called for a generic class, even though the class was unloaded


I'm currently debugging my company's CLR profiler (over ASP.NET 4.7.3282.0, .NET Framework 4.7.2), and I'm seeing a scenario in which the CLR unloads a generic class but never calls the ClassUnloadStarted callback for it.

In a nutshell, our profiler keeps track of loaded classes by ClassID, following the ClassLoadStarted, ClassLoadFinished and ClassUnloadStarted callbacks. At some point, the class gets unloaded (along with its relevant module), but the ClassUnloadStarted callback is not called for the relevant ClassID. We're therefore left with a stale ClassID and believe the class is still loaded. Later on, when we try to query that ClassID, the CLR unsurprisingly crashes (since it now points to junk memory).
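
For illustration, the bookkeeping described above can be sketched roughly as follows. This is a simplified stand-in, not our actual profiler code: `ClassTracker` and its method names are hypothetical, and `ClassID` is redefined locally as a pointer-sized integer so the snippet compiles without `corprof.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Local stand-in for the profiling API's ClassID (an opaque, pointer-sized value).
using ClassID = std::uintptr_t;

// Hypothetical tracker: remembers which ClassIDs the profiler believes are live.
class ClassTracker {
public:
    // Intended to be called from ClassLoadFinished after a successful load.
    void OnClassLoadFinished(ClassID id) { live_.insert(id); }

    // Intended to be called from ClassUnloadStarted.
    void OnClassUnloadStarted(ClassID id) { live_.erase(id); }

    // True if the tracker still considers the ClassID safe to query.
    bool IsTracked(ClassID id) const { return live_.count(id) != 0; }

private:
    std::unordered_set<ClassID> live_;
};
```

With this scheme, an entry is removed only when the unload callback arrives. If the CLR frees a class without calling ClassUnloadStarted, `IsTracked` keeps returning true for a ClassID that no longer points to valid memory.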

My question, considering the detailed scenario below:

I couldn't find any documentation or reasoning about this specific behaviour of ClassUnloadStarted not being called, and I couldn't find any hints in the CoreCLR code either. Thanks in advance for any help!

The Detailed Scenario:

This is the class in question (IComparable(T) with T=ClassFromModuleFoo):

System/IComparable`1<ClassFromModuleFoo>

While the application runs, the issue manifests after some modules have been unloaded.
Here's the exact load/unload callbacks flow, based on debug prints added:

  1. The class System/IComparable`1<ClassFromModuleFoo>, of mscorlib, is loaded.
  2. Immediately afterwards, the class ClassFromModuleFoo, of the module Foo, is loaded into assembly #1.
  3. Module Foo finishes loading into assembly #1.
  4. Then, module Foo is loaded again into a different assembly, #2.
  5. IComparable`1<ClassFromModuleFoo> and ClassFromModuleFoo are loaded again, this time in assembly #2. Now there are two instances of each class: one in Foo loaded in assembly #1, and one in Foo loaded in assembly #2.
  6. Module Foo begins to unload from assembly #1.
  7. The ClassUnloadStarted callback is called for ClassFromModuleFoo in assembly #1.
  8. Module Foo finishes unloading from assembly #1.
  9. ClassUnloadStarted is never called for System/IComparable`1<ClassFromModuleFoo> of assembly #1 at any later point (even though its module was unloaded and its ClassID now points to trashed memory).

Some additional information:

Edit:

Thanks to my very smart colleague, I was able to reproduce the issue with a small example project that simulates this scenario by loading and unloading AppDomains. Here it is:
https://github.com/shaharv/dotnet/tree/master/testers/module-load-unload

The crash occurs for this class in the test, which is unloaded, and for which the CLR didn't call the unload callback:

Loop/MyGenList`1<System/String>

Here's the relevant code, which is loaded and unloaded a few times:

using System;
using System.Collections.Generic;

namespace Loop
{
    public class MyGenList<T>
    {
        public List<T> _tList;

        public MyGenList(List<T> tList)
        {
            _tList = tList;
        }
    }

    class MyGenericTest
    {
        public void TestFunc()
        {
            MyGenList<String> genList = new MyGenList<String>(new List<string> { "A", "B", "C" });

            try
            {
                throw new Exception();
            }
            catch (Exception)
            {

            }
        }
    }
}

At some point, the profiler crashes trying to query the ClassID of that class - thinking it's still valid, since the unload callback was not called for it.

On a side note, I tried porting this example to .NET Core to investigate further, but couldn't figure out how, since .NET Core doesn't support secondary AppDomains (and I'm not sure it supports on-demand assembly unloading in general).


Solution

  • Once unloading became possible in .NET Core (it wasn't supported before 3.0), we managed to replicate the issue there (thanks valiano!). The coreclr team has confirmed it to be a bug (https://github.com/dotnet/coreclr/issues/26126).

    From davmason's explanation:

    There are three separate types involved and each callback is only giving you two (but a different set of two).

    1. Plugin.MyGenList`1: the unbound generic type
    2. Plugin.MyGenList`1<System.__Canon>: the generic type bound to the canonical type (used for normal references)
    3. Plugin.MyGenList`1<System.String>: the generic type bound to System.String

    For ClassLoadStarted we have logic that specifically excludes unbound generic types (i.e. Plugin.MyGenList`1) from being shown to the profiler in ClassLoader::Notify.

    This means ClassLoadStarted only gives you callbacks for the canonical and string instances. This seems the right thing to do here, since as a profiler you would only care about bound generic types and there's nothing of interest for unbound ones.

    The issue is that we do a different set of filtering for ClassUnloadStarted. That callback occurs inside EEClass::Destruct, and Destruct is only called on non-generic types, unbound generic types, and canonical generic types. Non-canonical generic types (i.e. Plugin.MyGenList`1<System.String>) are skipped.
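
To make the mismatch concrete, here is a small model of the two filters as described in the explanation above. It is only a sketch of that description, not CoreCLR code: `TypeKind`, `TypeEntry`, `ReportedOnLoad`, `ReportedOnUnload` and `NeverUnloaded` are hypothetical names, and the real checks in ClassLoader::Notify and EEClass::Destruct are more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of the type flavours named in the explanation.
enum class TypeKind {
    NonGeneric,
    UnboundGeneric,             // e.g. Plugin.MyGenList`1
    CanonicalInstantiation,     // e.g. Plugin.MyGenList`1<System.__Canon>
    NonCanonicalInstantiation   // e.g. Plugin.MyGenList`1<System.String>
};

struct TypeEntry {
    std::string name;
    TypeKind kind;
};

// Load side, as described: everything except unbound generic types is reported.
bool ReportedOnLoad(TypeKind kind) {
    return kind != TypeKind::UnboundGeneric;
}

// Unload side, as described: non-canonical instantiations are skipped.
bool ReportedOnUnload(TypeKind kind) {
    return kind != TypeKind::NonCanonicalInstantiation;
}

// Names of types the profiler sees on load but never sees on unload.
std::vector<std::string> NeverUnloaded(const std::vector<TypeEntry>& types) {
    std::vector<std::string> result;
    for (const TypeEntry& t : types) {
        if (ReportedOnLoad(t.kind) && !ReportedOnUnload(t.kind)) {
            result.push_back(t.name);
        }
    }
    return result;
}
```

Feeding the three MyGenList types through `NeverUnloaded` leaves only the System.String instantiation, which matches the class whose ClassID went stale in the repro.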