
How are Java objects laid out in memory on Android?


I'm fairly familiar with the layout of objects on the heap in HotSpot, but not so much for Android.

For example, in a 32-bit HotSpot JVM, an object on the heap is implemented as an 8-byte header, followed by the object's fields (one byte for boolean, four bytes for a reference, and everything else as expected), laid out in some specific order (with some special rules for fields from superclasses), and padded out to a multiple of 8 bytes.

I've done some research, but I can't find any Android-specific information.

(I'm interested in optimizing some extremely widely used data structures to minimize memory consumption on Android.)


Solution

  • dalvik/vm/oo/Object.h is your friend here. The comment for struct Object says:

    /*
     * There are three types of objects:
     *  Class objects - an instance of java.lang.Class
     *  Array objects - an object created with a "new array" instruction
     *  Data objects - an object that is neither of the above
     *
     * We also define String objects.  At present they're equivalent to
     * DataObject, but that may change.  (Either way, they make some of the
     * code more obvious.)
     *
     * All objects have an Object header followed by type-specific data.
     */
    

    java.lang.Class objects are special; their layout is defined by the ClassObject struct in Object.h. Array objects are simple:

    struct ArrayObject : Object {
        /* number of elements; immutable after init */
        u4              length;
    
        /*
         * Array contents; actual size is (length * sizeof(type)).  This is
         * declared as u8 so that the compiler inserts any necessary padding
         * (e.g. for EABI); the actual allocation may be smaller than 8 bytes.
         */
        u8              contents[1];
    };
    

    For arrays, the element widths are defined in vm/oo/Array.cpp. Booleans have width 1, object references have width sizeof(Object*) (usually 4), and all other primitive types have their expected (packed) width.
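
    To make the array case concrete, here is a rough Java sketch of the arithmetic. It is not taken from the Dalvik source: the class DalvikArraySize and its methods are made up for illustration. It assumes a 32-bit build where struct Object is 8 bytes (a class pointer plus a lock word), so that length sits at offset 8 and the u8-declared contents start at offset 16. It also assumes the total is rounded up to a multiple of 8, and it ignores any bookkeeping the heap allocator adds.

    ```java
    final class DalvikArraySize {
        // Assumption: 8-byte Object header, 4-byte length, then 4 bytes of
        // padding so the u8-declared contents start on an 8-byte boundary.
        static final int CONTENTS_OFFSET = 16;

        /** Element width in bytes, following the rules described for Array.cpp. */
        static int elementWidth(Class<?> componentType) {
            if (componentType == void.class) {
                throw new IllegalArgumentException("void is not a component type");
            }
            if (!componentType.isPrimitive()) {
                return 4; // sizeof(Object*), assuming a 32-bit build
            }
            if (componentType == boolean.class || componentType == byte.class) {
                return 1;
            }
            if (componentType == char.class || componentType == short.class) {
                return 2;
            }
            if (componentType == long.class || componentType == double.class) {
                return 8;
            }
            return 4; // int and float
        }

        /** Estimated bytes for an array, rounded up to a multiple of 8 (assumed). */
        static long estimate(Class<?> componentType, int length) {
            long raw = CONTENTS_OFFSET + (long) length * elementWidth(componentType);
            return (raw + 7) & ~7L;
        }

        public static void main(String[] args) {
            System.out.println("boolean[10]: " + estimate(boolean.class, 10));
            System.out.println("int[10]:     " + estimate(int.class, 10));
            System.out.println("Object[3]:   " + estimate(Object.class, 3));
        }
    }
    ```

    Under these assumptions a boolean[10] comes out at 32 bytes, an int[10] at 56 bytes, and an Object[3] at 32 bytes.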

    Data objects are really simple:

    /*
     * Data objects have an Object header followed by their instance data.
     */
    struct DataObject : Object {
        /* variable #of u4 slots; u8 uses 2 slots */
        u4              instanceData[1];
    };
    

    The layout of a DataObject (that is, any instance that is neither a java.lang.Class nor an array) is governed by computeFieldOffsets in vm/oo/Class.cpp. According to the comment there:

    /*
     * Assign instance fields to u4 slots.
     *
     * The top portion of the instance field area is occupied by the superclass
     * fields, the bottom by the fields for this class.
     *
     * "long" and "double" fields occupy two adjacent slots.  On some
     * architectures, 64-bit quantities must be 64-bit aligned, so we need to
     * arrange fields (or introduce padding) to ensure this.  We assume the
     * fields of the topmost superclass (i.e. Object) are 64-bit aligned, so
     * we can just ensure that the offset is "even".  To avoid wasting space,
     * we want to move non-reference 32-bit fields into gaps rather than
     * creating pad words.
     *
     * In the worst case we will waste 4 bytes, but because objects are
     * allocated on >= 64-bit boundaries, those bytes may well be wasted anyway
     * (assuming this is the most-derived class).
     *
     * Pad words are not represented in the field table, so the field table
     * itself does not change size.
     *
     * The number of field slots determines the size of the object, so we
     * set that here too.
     *
     * This function feels a little more complicated than I'd like, but it
     * has the property of moving the smallest possible set of fields, which
     * should reduce the time required to load a class.
     *
     * NOTE: reference fields *must* come first, or precacheReferenceOffsets()
     * will break.
     */
    

    So, superclass fields come first (as usual), followed by this class's reference-type fields. If 64-bit fields remain and the next offset is not 64-bit aligned (for example, because there's an odd number of 32-bit reference fields), a single 32-bit field is moved into the gap if one is available; otherwise a 4-byte pad word goes there. The 64-bit fields come next, and the remaining 32-bit fields follow. Note that every field takes up either 32 or 64 bits: shorter primitives are padded out to 32 bits. In particular, at this time, the VM does not store byte/char/short/boolean fields using less than 4 bytes, though it certainly could support this in theory.
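
    For a quick estimate of what this means for a given class, the ordering rules can be replayed with reflection. The following is only a sketch: DalvikInstanceSize and the nested sample classes are hypothetical names, the 8-byte header (class pointer plus lock word) assumes a 32-bit build, and the final rounding to a multiple of 8 is inferred from the comment that objects are allocated on 64-bit boundaries. It computes sizes only, not the exact offset of each field.

    ```java
    import java.lang.reflect.Field;
    import java.lang.reflect.Modifier;

    final class DalvikInstanceSize {
        static final int OBJECT_HEADER = 8; // assumed: class pointer + lock word

        // Hypothetical sample classes, used only to exercise the estimator.
        static class Base { Object ref; int count; }
        static class Derived extends Base { long stamp; boolean flag; }
        static class RefThenLong { Object a; long b; }
        static class RefLongInt { Object a; long b; int c; }
        static class FourBooleans { boolean a, b, c, d; }
        static class PackedFlags { int bits; }

        /** Offset just past the last field of clazz, before the final rounding. */
        static int endOffset(Class<?> clazz) {
            if (clazz == null || clazz == Object.class) {
                return OBJECT_HEADER;
            }
            int offset = endOffset(clazz.getSuperclass()); // superclass fields first
            int refs = 0, wide = 0, narrow = 0;
            for (Field f : clazz.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue; // static fields are not part of the instance
                }
                Class<?> t = f.getType();
                if (!t.isPrimitive()) {
                    refs++;
                } else if (t == long.class || t == double.class) {
                    wide++;
                } else {
                    narrow++; // boolean/byte/char/short/int/float: one u4 slot each
                }
            }
            offset += 4 * refs;                 // references come first
            if (wide > 0 && offset % 8 != 0) {
                if (narrow > 0) {
                    narrow--;                   // a 32-bit field fills the gap
                }
                offset += 4;                    // filler field or pad word
            }
            offset += 8 * wide;                 // then the 64-bit fields
            offset += 4 * narrow;               // then the remaining 32-bit fields
            return offset;
        }

        /** Estimated instance size in bytes, rounded up to a multiple of 8 (assumed). */
        static int estimate(Class<?> clazz) {
            return (endOffset(clazz) + 7) & ~7;
        }

        public static void main(String[] args) {
            System.out.println("Derived:      " + estimate(Derived.class));
            System.out.println("FourBooleans: " + estimate(FourBooleans.class));
            System.out.println("PackedFlags:  " + estimate(PackedFlags.class));
        }
    }
    ```

    If the assumptions hold, the practical upshot for memory tuning is that four boolean fields cost 16 bytes of instance data (24 bytes for the whole object), while the same flags packed into one int cost 4 (16 bytes for the whole object). Adding an int next to a reference and a long is free when it fills what would otherwise be a pad word: RefThenLong and RefLongInt both come out at 24 bytes.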

    Note that all of this is based on reading the Dalvik source code as of commit 43241340 (Feb 6, 2013). Since this aspect of the VM doesn't appear to be publicly documented, you should not rely on this as a stable description of the VM's object layout: it may change over time.