
Endianness of Active Directory objectGUID Attributes in Java / Apache Directory Studio


I am connecting to an Active Directory server from Java. I add the property:

 env.put("java.naming.ldap.attributes.binary", "objectGUID");

and then I read the objectGUID like this:

Attribute a = result.getAttributes().get("objectGUID");
byte[] b = (byte[]) a.get(); // 16 raw bytes, because objectGUID was declared binary

and format it like this:

// Hex is org.apache.commons.codec.binary.Hex (Apache Commons Codec)
String id = Hex.
    encodeHexString(b).
    replaceAll(
       "(.{8})(.{4})(.{4})(.{4})(.{12})",
       "$1-$2-$3-$4-$5"
    );
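For comparison, the same raw-order string can be produced with only the JDK (java.nio.ByteBuffer and java.util.UUID) instead of Commons Codec. This is a sketch; the class and method names are made up for illustration, and the sample bytes are just the example value from this question in the order they are stored.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class RawGuidFormat {

    // Example objectGUID bytes from this question, in stored (LDAP) order.
    static final byte[] SAMPLE = {
        (byte) 0x3a, (byte) 0x1e, (byte) 0x59, (byte) 0x8e,
        (byte) 0xab, (byte) 0x35, (byte) 0xcc, (byte) 0x45,
        (byte) 0x8d, (byte) 0xca, (byte) 0xc5, (byte) 0xe4,
        (byte) 0x51, (byte) 0xad, (byte) 0xc9, (byte) 0x75
    };

    // Formats the 16 bytes in the order received, like the Hex + regex code above.
    static String formatRawOrder(byte[] b) {
        if (b.length != 16) {
            throw new IllegalArgumentException("objectGUID must be 16 bytes");
        }
        ByteBuffer bb = ByteBuffer.wrap(b); // big-endian by default: no reordering
        long msb = bb.getLong();            // bytes 0..7
        long lsb = bb.getLong();            // bytes 8..15
        return new UUID(msb, lsb).toString(); // lowercase, dashed 8-4-4-4-12
    }

    public static void main(String[] args) {
        System.out.println(formatRawOrder(SAMPLE)); // 3a1e598e-ab35-cc45-8dca-c5e451adc975
    }
}
```

Because ByteBuffer reads big-endian unless told otherwise, this reproduces the raw order the question's code prints, not the form Directory Studio shows.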

The result is a nicely formatted UUID. To find an entry by its UUID, I first remove the dashes:

id = id.replaceAll("[^a-zA-Z0-9]+", "");

and then insert backslashes:

id = id.replaceAll("([a-zA-Z0-9]{2,2})", "\\\\$1");
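The two regex passes can also be collapsed into one loop over the raw bytes. The sketch below (hypothetical class and method names) escapes each byte as a backslash followed by two hex digits, the form RFC 4515 allows for any octet in a filter value. As an alternative, JNDI's DirContext.search overload that takes a filterArgs array encodes a byte[] argument this way itself, for a filter such as "(objectGUID={0})".

```java
public class GuidFilter {

    // Escapes every byte as a backslash plus two lowercase hex digits,
    // which RFC 4515 permits for any octet in an assertion value.
    static String escapeBytes(byte[] b) {
        StringBuilder sb = new StringBuilder(b.length * 3);
        for (byte x : b) {
            sb.append('\\').append(String.format("%02x", x & 0xff));
        }
        return sb.toString();
    }

    // Hypothetical helper: wraps the escaped value in an equality filter.
    static String objectGuidFilter(byte[] b) {
        return "(objectGUID=" + escapeBytes(b) + ")";
    }

    public static void main(String[] args) {
        // First four bytes of the question's example only, for brevity.
        byte[] b = { (byte) 0x3a, (byte) 0x1e, (byte) 0x59, (byte) 0x8e };
        System.out.println(objectGuidFilter(b)); // (objectGUID=\3a\1e\59\8e)
    }
}
```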

This all works nicely. The problem is that Apache Directory Studio shows (for example) my user's UUID as:

 8e591e3a-35ab-45cc-8dca-c5e451adc975

whereas my code shows the UUID of the same entry as:

 3a1e598e-ab35-cc45-8dca-c5e451adc975

As you can see, the byte order is reversed within each of the first three groups (the left eight bytes), while the right eight bytes are identical. Why is that? This seems very weird...



Solution

  • If you look here:

    https://en.wikipedia.org/wiki/Universally_unique_identifier

    Name                                      Length (bytes)  Length (hex digits)  Contents
    time_low                                  4               8                    integer giving the low 32 bits of the time
    time_mid                                  2               4                    integer giving the middle 16 bits of the time
    time_hi_and_version                       2               4                    4-bit "version" in the most significant bits, followed by the high 12 bits of the time
    clock_seq_hi_and_reserved, clock_seq_low  2               4                    1-3 bit "variant" in the most significant bits, followed by the 13-15 bit clock sequence
    node                                      6               12                   the 48-bit node id

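    You can see that the first three fields are integers (one 32-bit and two 16-bit) holding the "time" data, so their byte order matters, whereas the remaining fields are plain byte sequences and have no endianness. Active Directory stores objectGUID in the Windows GUID layout, where those first three integers (Data1, Data2 and Data3) are written little-endian. Apache Directory Studio decodes them as integers, while hex-encoding the raw bytes prints them in stored order, which is why only the left eight bytes appear swapped.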
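    Putting that into code: a minimal sketch, assuming the 16-byte value comes straight from the binary objectGUID attribute. It reads Data1/Data2/Data3 as little-endian integers to get the string Directory Studio shows, and does the reverse to turn such a string back into raw bytes for a search filter. MsGuid and its methods are hypothetical names; the sample bytes are the example from the question.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MsGuid {

    // Example objectGUID bytes from the question, in stored (LDAP) order.
    static final byte[] SAMPLE = {
        (byte) 0x3a, (byte) 0x1e, (byte) 0x59, (byte) 0x8e,
        (byte) 0xab, (byte) 0x35, (byte) 0xcc, (byte) 0x45,
        (byte) 0x8d, (byte) 0xca, (byte) 0xc5, (byte) 0xe4,
        (byte) 0x51, (byte) 0xad, (byte) 0xc9, (byte) 0x75
    };

    // Raw bytes -> GUID string: first three fields little-endian, last 8 bytes as stored.
    static String toGuidString(byte[] b) {
        if (b.length != 16) throw new IllegalArgumentException("expected 16 bytes");
        ByteBuffer le = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int data1 = le.getInt();     // bytes 0..3
        short data2 = le.getShort(); // bytes 4..5
        short data3 = le.getShort(); // bytes 6..7
        StringBuilder sb = new StringBuilder(36);
        // %x on a negative int prints its unsigned 32-bit value
        sb.append(String.format("%08x-%04x-%04x-", data1, data2 & 0xffff, data3 & 0xffff));
        for (int i = 8; i < 16; i++) {
            if (i == 10) sb.append('-'); // 4th group is 2 bytes, 5th group is 6 bytes
            sb.append(String.format("%02x", b[i] & 0xff));
        }
        return sb.toString();
    }

    // GUID string -> raw bytes in stored order (inverse of toGuidString).
    static byte[] fromGuidString(String guid) {
        String hex = guid.replace("-", "");
        if (hex.length() != 32) throw new IllegalArgumentException("expected 32 hex digits");
        ByteBuffer bb = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt((int) Long.parseLong(hex.substring(0, 8), 16));
        bb.putShort((short) Integer.parseInt(hex.substring(8, 12), 16));
        bb.putShort((short) Integer.parseInt(hex.substring(12, 16), 16));
        for (int i = 16; i < 32; i += 2) {
            bb.put((byte) Integer.parseInt(hex.substring(i, i + 2), 16)); // single bytes: no swap
        }
        return bb.array();
    }

    public static void main(String[] args) {
        String s = toGuidString(SAMPLE);
        System.out.println(s); // 8e591e3a-35ab-45cc-8dca-c5e451adc975
        System.out.println(java.util.Arrays.equals(fromGuidString(s), SAMPLE)); // true
    }
}
```

    The bytes returned by fromGuidString can then be escaped for a filter exactly as in the question.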