javascalamemory-leaksprofilingyourkit

What does it mean Objects Retained by Inner Class Back References


newbi using profiler, I am using yourkit. I see in inspections possible memory leak

Objects Retained by Inner Class Back References
Find objects retained via synthetic back reference of its inner classes.
Problem: Such objects are potential memory leaks.

what does it means ? can someone give a good example of such object and why this might considered as leak ? Thanks


Solution

  • Unfortunately you didn't assign any language tag to this question so I'll assume your language is Java. The important thing to understand what happens is to recall that Java support nested aka inner classes which can be static and non-static. This issue might come from a non-static inner class only. Also it is important that all anonymous inner classes in Java are non-static even if they don't technically need that.

    Consider some big application that has a global Scheduler service that can run delayed or repetitive ScheduledJobs. Something like this:

    public interface ScheduledJob {
        boolean isRepetitive();
    
        long getDelay();
    
        void runJob();
    }
    
    class Scheduler {
        private final List<ScheduledJob> jobs = new ArrayList<>();
    
        public void registerJob(ScheduledJob job) {
            jobs.add(job);
        }
    
        public void runScheduler() {
           // some logic to run all registered jobs
        }
    }
    

    Now consider you have some plug-in system and an integration module that should run some job once per configured time interval and configuration is stored in the DB.

    public interface Module {
        void register(Scheduler scheduler);
    }
    
    public class IntegrationModule implements Module {
    
        private java.sql.Connection db;
    
        private long readDelayConfiguration() {
            // read data from DB
        }
    
        public void register(Scheduler scheduler) {
            final long delay = readDelayConfiguration();
    
            scheduler.registerJob(new ScheduledJob() {
                @Override
                public boolean isRepetitive() {
                    return true;
                }
    
                @Override
                public long getDelay() {
                    return delay;
                }
    
                @Override
                public void runJob() {
                    // do some integration stuff
                }
            });
        }
    }
    

    What this code is actually compiled into is something like this:

    class ScheduledJob_IntegrationModule_Nested implements ScheduledJob {
        private final IntegrationModule outerThis;
        private final long delay;
    
        public ScheduledJob_IntegrationModule_Nested(IntegrationModule outerThis, long delay) {
            this.outerThis = outerThis;
            this.delay = delay;
        }
    
        @Override
        public boolean isRepetitive() {
            return true;
        }
    
        @Override
        public long getDelay() {
            return delay;
        }
    
        @Override
        public void runJob() {
            // do some integration stuff
        }
    }
    
    public class IntegrationModule implements Module {
    
        // some other stuff
        ...
    
        public void register(Scheduler scheduler) {
            final long delay = readDelayConfiguration();
            scheduler.registerJob(new ScheduledJob_IntegrationModule_Nested(this, delay));
        }
    }
    

    So now an instance of an anonymous subclass ScheduledJob captures this of the IntegrationModule. It means that even if there are no direct references of the IntegrationModule, the fact that a global Scheduler object retains a reference to an instance of ScheduledJob_IntegrationModule_Nested means that IntegrationModule with all its fields is also retained effectively forever. This is a pure memory leak.

    Note that the case would be the same if ScheduledJob_IntegrationModule_Nested was a non-anonymous but non-static nested class of IntegrationModule. Only static nested classes do not capture implicitly an instance of their "owner" class.

    A bit more complicated example would be if you imagine that this is a web application that handles HTTP requests and handlers are stateful. So there is some "dispatcher" that analyzes incoming HTTP requests and then creates an instance of a proper handler and delegates the job to it. This is actually a typical approach in many web-frameworks.

    public abstract class StatefulRequestProcessor {
        protected final Scheduler scheduler;
        protected final HttpRequest request;
    
        public StatefulRequestProcessor(Scheduler scheduler, HttpRequest request) {
            this.scheduler = scheduler;
            this.request = request;
        }
    
        public abstract void process();
    }
    

    And now assume that for some kind of incoming requests there is some delayed cleanup

    public class MyStatefulRequestProcessor extends StatefulRequestProcessor {
        public MyStatefulRequestProcessor(Scheduler scheduler, HttpRequest request) {
            super(scheduler, request);
        }
    
        @Override
        public void process() {
    
            // do some processing and finally get some stored ID
            ...
            final long id = ...
    
            // register a clean up of that ID
            scheduler.registerJob(new ScheduledJob() {
                @Override
                public boolean isRepetitive() {
                    return false;
                }
    
                @Override
                public long getDelay() {
                    return 24 * 60 * 60 * 1000L; // one day later
                }
    
                @Override
                public void runJob() {
                    // do some clean up
                    cleanUp(id);
                }
            });
        }
    }
    

    Now this is technically not a memory leak because after about 24 hours the scheduler will release an instance of an anonymous ScheduledJob and thus MyStatefulRequestProcessor will also become available for garbage collection. However it means that for that 24 hours you must have store in your memory whole MyStatefulRequestProcessor including things like HttpRequest, HttpResponse, etc. even though technically they are not needed after the main processing is finished.

    For C# the case is similar except typically you would have a delegate that captures its parent instead of a nested class.


    Update: what to do?

    This is a less of a hard facts area and more of an opinion-based.

    What is a memory leak?

    The first question here is what is a "memory leak"? I think there are two different although connected aspects:

    1. Memory leak is a behavior of a program that manifests itself in a steady and potentially unlimited growth of the memory consumption. This is a bad thing because this degrades performance and might ultimately lead to an out-of-memory crash.

    2. Memory leak is a behavior of a program when some memory areas (objects in OOP world) are retained much longer than the developer intended them to.

    Quite often bad behavior described in the definition #1 is a result of mistake defined in #2.

    What to do and are inner classes evil?

    I think that the reason YourKit warns you about such things is because this behavior also might be intended is often non-obvious for a programmer because the back reference is generated implicitly and you can easily forget about that. Moreover the Java compiler is not smart enough to make the correct decision on its own and requires a programmer to make the decision by explicitly specifying (or avoiding) the static keyword. And since there is nowhere to put static for anonymous inner classes, they all capture their parent even if they don't really need to.

    To answer the question "what to do about it?" you first should understand why the compiler generates that "back reference". Going back to IntegrationModule example, there might be two different behaviors:

    1. We want to read the delay once from the configuration and use it forever (until application restart)
    2. We want the delay to be adjustable on the fly by editing configuration (i.e. without restarting).

    In the first case you can re-write the code as

    public class IntegrationModule implements Module {
    
        // some other stuff
        ...
    
        public void register(Scheduler scheduler) {
            final long delay = readDelayConfiguration();
            scheduler.registerJob(new ScheduledJob_IntegrationModule_Nested(this, delay));
        }
    
    
        static class IntegrationScheduledJob implements ScheduledJob {
            private final long delay;
    
            public IntegrationScheduledJob(long delay) {
                this.delay = delay;
            }
    
            @Override
            public boolean isRepetitive() {
                return true;
            }
    
            @Override
            public long getDelay() {
                return delay;
            }
    
            @Override
            public void runJob() {
                // do some integration stuff
            }
        }
    }
    

    So you make your anonymous class named and static and explicitly pass all the dependencies inside.

    In the second case we effectively want to call readDelayConfiguration from the getDelay. This is why the anonymous object needs this of the outer class and so the compiler provides it for us. You still may convert your anonymous class to a named static one and explicitly pass all the dependencies required to read the configuration but that new class will have to hold all that dependencies anyway so there is not much of a benefit.

    Lifecycles

    Another important point is lifecycles of different objects. Non-static inner classes are absolutely OK if their lifecycles lie entirely withing the lifecycle of their parent object or at most is just a little bit more into the future. So the issue is really about disparity of the lifecycles. If the parent object has infinite lifecycle (i.e. retained globally itself), everything is OK. The problem might arise only when such an inner object is "leaked" by a short-time object into the "outer world" with an extended or possible infinite lifecycle such as in the Scheduler example. I would say that in such cases, when this is the intended behavior, you should use named static inner classes and pass the outer this explicitly and probably even write some comments to make it clearer both to tools like YouKit and to fellow developers that this is indeed was thought over and not an accident.