javapdfbox

PDFBox SmallMap doesn't respect Map.Entry hashcode contract


I'm working on an optimization pass to merge resources in a pdf document with PDFBox.

After many hours of research into why my code generates too many hash collisions, I'm surprised to discover a potential bug in the latest version of PDFBox.

Let me explain: in a COSDictionary, the entrySet is supported by a SmallMap. If you want to calculate a hashcode of the Map.Entry generated by a COSDictionary entrySet object, you discover that the hashcode will be calculated like this.

My code:

if (object instanceof COSDictionary) {
                int result = 3;
                for (Map.Entry<COSName, COSBase> entry : ((COSDictionary) object).entrySet())
                    result += entry.hashCode();
                if (object instanceof COSStream) { .... ..... ....}

In class, SmallMap and inner class SmallMapEntry (PDFBox) :

public int hashCode() {
            return this.getKey().hashCode();
        }

But the general contract in Map.Entry explicitely say that (JavaDoc) :

         /**
         * Returns the hash code value for this map entry.  The hash code
         * of a map entry {@code e} is defined to be: <pre>
         *     (e.getKey()==null   ? 0 : e.getKey().hashCode()) ^
         *     (e.getValue()==null ? 0 : e.getValue().hashCode())
         * </pre>
         * This ensures that {@code e1.equals(e2)} implies that
         * {@code e1.hashCode()==e2.hashCode()} for any two Entries
         * {@code e1} and {@code e2}, as required by the general
         * contract of {@code Object.hashCode}.
         *
         * @return the hash code value for this map entry
         * @see Object#hashCode()
         * @see Object#equals(Object)
         * @see #equals(Object)
         */

int hashCode();

I think it's a bug in PDFBox but i'm not sur, what do you think about this ?


Solution

  • Indeed, the SmallMapEntry method hashCode is not implemented as defined in the JavaDocs of Map.Entry. Similarly SmallMapEntry.equals is not implemented as defined for Map.Entry.equals, SmallMap.hashCode not as defined for Map.hashCode, and SmallMap.equals not as defined for Map.equals.

    But there is a reason for this, and it's not merely laziness or performance: The Map JavaDocs themselves explain a shortcoming of their hashCode and equals definitions:

    /**
     * ...
     * <p>Some map operations which perform recursive traversal of the map may fail
     * with an exception for self-referential instances where the map directly or
     * indirectly contains itself. This includes the {@code clone()},
     * {@code equals()}, {@code hashCode()} and {@code toString()} methods.
     * ...
    

    Dictionaries in a PDF often indirectly contain themselves (e.g. via Kids and Parent entries), so if implemented as defined in the JavaDocs, hashCode often would fail by stack overflow. Thus, an implementation that prevents recursive traversal (e.g. by only comparing the keys of entries) is actually advantageous here.