pythonjavabytecodeendiannessbytebuffer

Java ByteBuffer similar function in Python


I need to reimplement following function which converts 8 digit List to 32 digit List via ByteBuffer to python. So it can be used a python pipeline. As a data scientist, I have been scratching my head how to do it right. I assume I can try to find a way to call java code in python as work around. But I think just implement in python is more elegant if it is achievable. Any help is appreciated!

static public List<Float> decodeEmbedding(List<Integer> embedding8d) {
    List<Float> embedding32d = new ArrayList<>();
    for (int index = 0; index < embedding8d.size(); ++index) {
        Integer x = embedding8d.get(index);
        byte[] bytes = ByteBuffer.allocate(4).putInt(x).array();
        for (int byteIndex = 0; byteIndex < 4; ++byteIndex) {
            embedding32d.add((float) bytes[byteIndex]);
        }
    }
    return embedding32d;
}

//------------ main ----------
List<Integer> input = Arrays.asList(
    -839660988,
    826572561,
    1885995405,
    819888698,
    -2010625315,
    -1614561409,
    -1220275962,
    -2135440498
);

System.out.println(decodeEmbedding(input));
//[-51.0, -13.0, -54.0, 68.0, 49.0, 68.0, 127.0, 17.0, 112.0, 106.0, 1.0, -115.0, 48.0, -34.0, -126.0, 58.0, -120.0, 40.0, 74.0, -35.0, -97.0, -61.0, -65.0, 127.0, -73.0, 68.0, 17.0, 6.0, -128.0, -73.0, -61.0, -114.0]


Solution

  • Here is one approach:

    1. We pack the integer into a bytes array of the format we want
    import struct
    bstring = struct.pack('>i', intvalue)
    
    1. Iterating over the bytes object gives us integers range [0, 256). But from the looks of it, you want signed range [-128, 128). So we adjust for this
    for byte in bstring:
        if byte >= 128:
            byte -= 256
        embedding32d.append(float(byte))
    

    To put the whole thing together:

    
    import struct
    
    def decodeEmbedding(embedding8d):
        embedding32d = []
        for intvalue in embedding8d:
            bstring = struct.pack('>i', intvalue)
            signed = (byte - 256 if byte >= 128 else byte for byte in bstring)
            embedding32d.extend(float(byte) for byte in signed)
        return embedding32d
    

    There is also a one-liner version.

    def decodeEmbedding(embedding8d):
        return [float(byte - 256 if byte > 127 else byte) for byte in
                struct.pack('>' + 'i' * len(embedding8d), *embedding8d)]