I need to reimplement following function which converts 8 digit List to 32 digit List via ByteBuffer
to python. So it can be used a python pipeline.
As a data scientist, I have been scratching my head how to do it right. I assume I can try to find a way to call java code in python as work around. But I think just implement in python is more elegant if it is achievable. Any help is appreciated!
static public List<Float> decodeEmbedding(List<Integer> embedding8d) {
List<Float> embedding32d = new ArrayList<>();
for (int index = 0; index < embedding8d.size(); ++index) {
Integer x = embedding8d.get(index);
byte[] bytes = ByteBuffer.allocate(4).putInt(x).array();
for (int byteIndex = 0; byteIndex < 4; ++byteIndex) {
embedding32d.add((float) bytes[byteIndex]);
}
}
return embedding32d;
}
//------------ main ----------
List<Integer> input = Arrays.asList(
-839660988,
826572561,
1885995405,
819888698,
-2010625315,
-1614561409,
-1220275962,
-2135440498
);
System.out.println(decodeEmbedding(input));
//[-51.0, -13.0, -54.0, 68.0, 49.0, 68.0, 127.0, 17.0, 112.0, 106.0, 1.0, -115.0, 48.0, -34.0, -126.0, 58.0, -120.0, 40.0, 74.0, -35.0, -97.0, -61.0, -65.0, 127.0, -73.0, 68.0, 17.0, 6.0, -128.0, -73.0, -61.0, -114.0]
Here is one approach:
import struct
bstring = struct.pack('>i', intvalue)
for byte in bstring:
if byte >= 128:
byte -= 256
embedding32d.append(float(byte))
To put the whole thing together:
import struct
def decodeEmbedding(embedding8d):
embedding32d = []
for intvalue in embedding8d:
bstring = struct.pack('>i', intvalue)
signed = (byte - 256 if byte >= 128 else byte for byte in bstring)
embedding32d.extend(float(byte) for byte in signed)
return embedding32d
There is also a one-liner version.
def decodeEmbedding(embedding8d):
return [float(byte - 256 if byte > 127 else byte) for byte in
struct.pack('>' + 'i' * len(embedding8d), *embedding8d)]