numpy.array.tostring
doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.reshape
.
Is there a way to serialize a numpy array to JSON format while preserving this information?
Note: The arrays may contain ints, floats or bools. It's reasonable to expect a transposed array.
Note 2: this is being done with the intent of passing the numpy array through a Storm topology using streamparse, in case such information ends up being relevant.
pickle.dumps
or numpy.save
encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird structured dtypes. Endianness issues are probably the most important; you don't want array([1])
to suddenly become array([16777216])
because you loaded your array on a big-endian machine. pickle
is probably the more convenient option, though save
has its own benefits, given in the npy
format rationale.
I'm giving options for serializing to JSON or a bytestring, because the original questioner needed JSON-serializable output, but most people coming here probably don't.
The pickle
way:
import pickle
a = # some NumPy array
# Bytestring option
serialized = pickle.dumps(a)
deserialized_a = pickle.loads(serialized)
# JSON option
# latin-1 maps byte n to unicode code point n
serialized_as_json = json.dumps(pickle.dumps(a).decode('latin-1'))
deserialized_from_json = pickle.loads(json.loads(serialized_as_json).encode('latin-1'))
numpy.save
uses a binary format, and it needs to write to a file, but you can get around that with io.BytesIO
:
a = # any NumPy array
memfile = io.BytesIO()
numpy.save(memfile, a)
serialized = memfile.getvalue()
serialized_as_json = json.dumps(serialized.decode('latin-1'))
# latin-1 maps byte n to unicode code point n
And to deserialize:
memfile = io.BytesIO()
# If you're deserializing from a bytestring:
memfile.write(serialized)
# Or if you're deserializing from JSON:
# memfile.write(json.loads(serialized_as_json).encode('latin-1'))
memfile.seek(0)
a = numpy.load(memfile)