I am testing out Protocol Buffers: I read a CSV file, serialize each row, write the output to a binary file, and then read the binary file back using ParseFromString. Serializing and writing the binary file work, but reading it either raises an index out of bounds exception or outputs only the last record of the binary file, skipping everything before it.
My message is simple. It has two fields, time and meterusage:
syntax = "proto3";
message excelData {
    string time = 1;
    string meterusage = 2;
}
The serialization and writing to a binary file code is below:
import metric_pb2
import sys
from csv import reader
excel_data=metric_pb2.excelData()
with open('out.bin', 'wb') as f:
    with open('data.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
        header = next(csv_reader)
        if header != None:
            for row in csv_reader:
                excel_data.time=row[0]
                excel_data.meterusage=row[1]
                f.write(excel_data.SerializeToString())
f.close()
read_obj.close()
The troublesome part is below:
Approach 1: This only returns the last line of the binary file. It skips everything before it.
excel_data=metric_pb2.excelData()
with open('out.bin', 'rb') as f:
    content=f.read()
    excel_data.ParseFromString(content)
    print(excel_data.time)
    print(excel_data.meterusage)
Approach 2: If I read the serialized binary file row by row, the way I read the CSV file above, I get an index out of bounds error. My guess is that this happens because the binary file contains raw bytes rather than string data.
What is the correct way to read this binary file using message.ParseFromString()? Neither reading it in a loop nor reading it as a whole works. A snapshot of my created binary file is below:
Were you successful?
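Approach 1's behaviour can be reproduced without Protobuf installed. The sketch below is only an illustration: it hand-encodes the two string fields the way the wire format lays them out (one tag byte, one length byte, then the UTF-8 text) and parses two records written back to back. The helper names `encode_record` and `parse_string_fields` are made up for this example, and the one-byte tag and length assumptions only hold for small field numbers and short strings like the ones here.

```python
def encode_record(time: str, meterusage: str) -> bytes:
    """Hand-encode fields 1 and 2 as length-delimited strings (wire type 2)."""
    out = bytearray()
    for field_number, value in ((1, time), (2, meterusage)):
        payload = value.encode("utf-8")
        out.append((field_number << 3) | 2)  # tag byte: field number + wire type
        out.append(len(payload))             # length byte (assumes < 128 bytes)
        out += payload
    return bytes(out)


def parse_string_fields(data: bytes) -> dict:
    """Walk the bytes field by field; a repeated field number overwrites the old value."""
    fields = {}
    pos = 0
    while pos < len(data):
        field_number = data[pos] >> 3
        length = data[pos + 1]
        start = pos + 2
        fields[field_number] = data[start:start + length].decode("utf-8")
        pos = start + length
    return fields


# Two records concatenated with nothing between them, as in the question
blob = (encode_record("2021-01-01 00:00:00", "54.65")
        + encode_record("2021-01-01 00:00:00", "55.18"))
parsed = parse_string_fields(blob)
print(parsed)  # only one time and one meterusage survive: the last ones
```

Nothing in the byte stream marks where the first record stops, so the parser sees one message in which fields 1 and 2 each occur twice, and the later occurrence replaces the earlier one.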
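Serialized Protobuf messages are not self-delimiting, so when they are written back to back the parser treats the whole file as a single message and, for each field, keeps only the last value it sees. That is why Approach 1 prints only the last row. Here's a hacky solution for you that (per Protobuf techniques for streaming multiple messages) writes the (variable!) message length as a single byte before each record. A one-byte length only works for messages of up to 255 bytes.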
import metric_pb2
import sys
from csv import reader

excel_data = metric_pb2.excelData()

with open('out.bin', 'wb') as f:
    with open('data.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
        # Skip the header row; next() returns None if the file is empty
        header = next(csv_reader, None)
        if header is not None:
            for row in csv_reader:
                excel_data.time = row[0]
                excel_data.meterusage = row[1]
                data = excel_data.SerializeToString()
                # Write the message's integer length as a single byte
                # (raises OverflowError if the message exceeds 255 bytes)
                f.write(len(data).to_bytes(1, sys.byteorder))
                # Write the message itself as bytes
                f.write(data)
# No explicit close() calls are needed: the with blocks close both files
Produces:
00000000: 1c 0a13 3230 3231 2d30 312d 3031 2030 303a 3030 3a30 3012 0535 342e 3635 ...2021-01-01 00:00:00..54.65
00000010: 1c 0a13 3230 3231 2d30 312d 3031 2030 303a 3030 3a30 3012 0535 352e 3138 ...2021-01-01 00:00:00..55.18
00000030: 1b 0a13 3230 3231 2d30 312d 3031 2030 303a 3030 3a30 3012 0435 352e 38 ...2021-01-01 00:00:00..55.8
NOTE: 1c == 28 (because 54.65 and 55.18 are five characters each) and 1b == 27 (because 55.8 is four characters).
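The 28 and 27 in the note can be checked by hand. The following is a rough sketch, not the Protobuf library's encoder: it assumes each string field costs one tag byte plus one length byte plus its UTF-8 text, which holds for these small field numbers and short values. The function name `encode_record` is hypothetical.

```python
def encode_record(time: str, meterusage: str) -> bytes:
    """Hand-encode string fields 1 (time) and 2 (meterusage)."""
    out = bytearray()
    for field_number, value in ((1, time), (2, meterusage)):
        payload = value.encode("utf-8")
        out.append((field_number << 3) | 2)  # 0x0a for field 1, 0x12 for field 2
        out.append(len(payload))             # 0x13 == 19 for the timestamp
        out += payload
    return bytes(out)


first = encode_record("2021-01-01 00:00:00", "54.65")
third = encode_record("2021-01-01 00:00:00", "55.8")

# 2 + 19 bytes for time, then 2 + 5 (or 2 + 4) bytes for meterusage
print(len(first), hex(len(first)))  # 28 0x1c
print(len(third), hex(len(third)))  # 27 0x1b
print(first.hex())
```

The hex string printed for the first record matches the bytes that follow the leading 1c in the dump above.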
# Read the length-prefixed records back from out.bin
import metric_pb2
import sys

excel_data = metric_pb2.excelData()

with open('out.bin', 'rb') as f:
    while True:
        # Read the one-byte length prefix; an empty read means end of file
        prefix = f.read(1)
        if not prefix:
            break
        # Convert the prefix to an integer
        size = int.from_bytes(prefix, sys.byteorder)
        # Read that number of bytes as the message bytes
        data = f.read(size)
        excel_data.ParseFromString(data)
        print("[{time}] {meterusage}".format(
            time=excel_data.time,
            meterusage=excel_data.meterusage))
Produces:
[2021-01-01 00:00:00] 54.65
[2021-01-01 00:00:00] 55.18
[2021-01-01 00:00:00] 55.8
[2021-01-01 00:00:00] 56.0
[2021-01-01 00:00:00] 63.52
[2021-01-01 00:00:00] 78.1
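A single length byte caps each message at 255 bytes. One way around that, sketched below under the assumption that you control both the writer and the reader, is to store the length as a base-128 varint, the same variable-length integer encoding Protobuf uses internally. The helpers `write_delimited` and `read_delimited` are hypothetical names for this example and work on any bytes payload; with Protobuf, the payload would be the result of SerializeToString() and each payload read back would be passed to ParseFromString().

```python
import io


def write_delimited(stream, payload: bytes) -> None:
    """Write a base-128 varint length prefix followed by the payload."""
    size = len(payload)
    while True:
        low7 = size & 0x7F
        size >>= 7
        if size:
            stream.write(bytes([low7 | 0x80]))  # high bit set: more length bytes follow
        else:
            stream.write(bytes([low7]))
            break
    stream.write(payload)


def read_delimited(stream):
    """Yield each payload from a stream written by write_delimited."""
    while True:
        size = 0
        shift = 0
        while True:
            byte = stream.read(1)
            if not byte:
                if shift:
                    raise EOFError("stream ended inside a length prefix")
                return  # clean end of stream
            size |= (byte[0] & 0x7F) << shift
            if not byte[0] & 0x80:
                break
            shift += 7
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a record")
        yield payload


# Round trip through an in-memory buffer, including an empty and a 300-byte record
records = [b"", b"short", b"x" * 300]
buffer = io.BytesIO()
for record in records:
    write_delimited(buffer, record)
buffer.seek(0)
decoded = list(read_delimited(buffer))
print([len(item) for item in decoded])
```

Because end of file is detected on the prefix rather than on the payload, an empty message (all fields at their defaults) round-trips instead of ending the loop early.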