First post, long-time lurker. Sorry, I couldn't get the formatting the way I wanted it to be.
I'm trying to convert part of a binary file to a date/time (in Python). But whatever I try I'm unable to find the proper conversion.
My guess is that the leftmost byte (0x30) is not part of the data, and the remaining 8 bytes contain the relevant data.
Below are the binary parts, both in decimal and in Hex, and the date/time they represent. Any help is highly appreciated.
48 101 26 235 227 242 150 197 65
30 65 1a eb e3 f2 96 c5 41
-- should read as 16 December 2023 at 15:03
48 198 54 133 112 138 151 197 65
30 c6 36 85 70 8a 97 c5 41
-- should read as 17 December 2023 at 12:37
48 74 38 27 107 41 116 196 65
30 4a 26 1b 6b 29 74 c4 41
-- should read as 1 October 2022 at 12:49
I've tried to unpack the data as either a double or a long long int and then obtain a date from it. I've searched the site and tried ChatGPT, to no avail.
Extra sample data
30 23 84 b1 a8 b5 97 c5 41 : 17 December 2023 at 18:45
30 3f 91 e7 96 b5 97 c5 41 : 17 December 2023 at 18:45 (slightly later)
30 a6 d6 2f d1 b5 97 c5 41 : 17 December 2023 at 18:46
30 e8 16 9c b9 b5 97 c5 41 : 17 December 2023 at 18:47
The reason I asked in comments for some more examples, especially ones close to each other in time, was to see which parts of the binary values were changing. I considered several types of encoding (some even based on textual representations of the timestamps). I looked at temporenc. I looked at floating point representations of seconds since the Epoch.
But one thing struck me: it was quite interesting to see that among these three examples:
{
'30 65 1a eb e3 f2 96 c5 41': '16 December 2023 at 15:03',
'30 c6 36 85 70 8a 97 c5 41': '17 December 2023 at 12:37',
'30 23 84 b1 a8 b5 97 c5 41': '17 December 2023 at 18:45',
}
the c5 byte (2nd from right) is constant, while the 3rd byte from the right is 97 for Dec. 17 and 96 for Dec. 16.
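A minimal sketch of that byte-by-byte comparison, using only the standard library. The names `samples`, `rows`, `distinct` and `constant_positions` are mine, for illustration only:

```python
samples = {
    "30 65 1a eb e3 f2 96 c5 41": "16 December 2023 at 15:03",
    "30 c6 36 85 70 8a 97 c5 41": "17 December 2023 at 12:37",
    "30 23 84 b1 a8 b5 97 c5 41": "17 December 2023 at 18:45",
}

# bytes.fromhex skips the spaces between the hex pairs.
rows = [bytes.fromhex(k) for k in samples]

# For each byte position (0 = leftmost), collect the distinct values seen.
distinct = [sorted({row[i] for row in rows}) for i in range(len(rows[0]))]

# Positions whose value never changes across the samples.
constant_positions = [i for i, vals in enumerate(distinct) if len(vals) == 1]

for i, vals in enumerate(distinct):
    print(i, " ".join(f"{v:02x}" for v in vals))
print("constant:", constant_positions)
```

The slowly changing bytes sit on the right and the fast-changing ones on the left, which is what suggested reading the bytes in reverse order.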
Further, I started looking at the whole integer value of the bytes in reverse order (excluding the first and last ones that are constant and may be delimiters).
I then noticed that the difference between the int values of two consecutive timestamps was roughly a constant multiple of the time difference in seconds. That multiple is close to 8_388_608, which is 2 ** 23.
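Here is a rough check of that ratio on two of the samples. It is only a sketch: `middle_int` is a helper name I made up, and since the given times are rounded to the minute the ratio can only be approximate:

```python
from datetime import datetime


def middle_int(k):
    """Drop the first and last byte, read the rest in reverse (little-endian) order."""
    return int.from_bytes(bytes.fromhex(k)[1:-1], "little")


a = middle_int("30 65 1a eb e3 f2 96 c5 41")  # 16 December 2023 at 15:03
b = middle_int("30 c6 36 85 70 8a 97 c5 41")  # 17 December 2023 at 12:37

# Both dates are in winter, so naive datetimes are enough for the difference.
seconds = (datetime(2023, 12, 17, 12, 37) - datetime(2023, 12, 16, 15, 3)).total_seconds()

ratio = (b - a) / seconds
print(ratio, ratio / 2**23)  # the second number should be close to 1
```

Reading the bytes as little-endian is the same as joining the hex pairs in reverse order and parsing the result as one base-16 integer.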
Fast-forward to a few more steps, and we get:
def f(k):
    return (int(''.join(k.split()[1:-1][::-1]), 16) >> 23) - 4927272860
That function gives a fairly good approximation of the timestamps provided, in seconds since the Epoch. One more thing: there was a conspicuous 3600-second error for the October date, so I figured your dates observe daylight saving time. Since you are in Europe, I used Zurich's timezone.
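To illustrate the daylight saving observation without any third-party library, here is a sketch that rewrites f with int.from_bytes and compares the October sample against fixed UTC+1 and UTC+2 offsets. The fixed offsets `CET` and `CEST` and the name `OFFSET` are my own shorthand; this is not a full timezone treatment:

```python
from datetime import datetime, timedelta, timezone

OFFSET = 4927272860  # same empirical constant as in f above


def f(k):
    # Middle 7 bytes, little-endian, shifted right by 23 bits, minus the offset.
    raw = bytes.fromhex(k)[1:-1]
    return (int.from_bytes(raw, "little") >> 23) - OFFSET


CET = timezone(timedelta(hours=1))   # winter time, assumed
CEST = timezone(timedelta(hours=2))  # summer time, assumed

k = "30 4a 26 1b 6b 29 74 c4 41"  # 1 October 2022 at 12:49
estimate = f(k)

# Error in seconds if the given local time is read as UTC+1 versus UTC+2.
err_cet = estimate - datetime(2022, 10, 1, 12, 49, tzinfo=CET).timestamp()
err_cest = estimate - datetime(2022, 10, 1, 12, 49, tzinfo=CEST).timestamp()
print(err_cet, err_cest)
```

With UTC+1 the estimate is about an hour off, while with UTC+2 the error drops to well under a minute, which is consistent with the October date falling in summer time.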
Put all together:
import pandas as pd
tz = 'Europe/Zurich'
examples = {
    '30 65 1a eb e3 f2 96 c5 41': '16 December 2023 at 15:03',
    '30 c6 36 85 70 8a 97 c5 41': '17 December 2023 at 12:37',
    '30 4a 26 1b 6b 29 74 c4 41': '1 October 2022 at 12:49',
    '30 23 84 b1 a8 b5 97 c5 41': '17 December 2023 at 18:45',
    '30 3f 91 e7 96 b5 97 c5 41': '17 December 2023 at 18:45:30',
    '30 a6 d6 2f d1 b5 97 c5 41': '17 December 2023 at 18:46',
    '30 e8 16 9c b9 b5 97 c5 41': '17 December 2023 at 18:47',
}
# Parse the given times as timezone-aware timestamps and sort by time.
examples = dict(sorted([
    (k, pd.Timestamp(v, tz=tz)) for k, v in examples.items()
], key=lambda item: item[1]))
Then:
def f(k):
    return (int(''.join(k.split()[1:-1][::-1]), 16) >> 23) - 4927272860

def to_time(k, tz):
    return pd.Timestamp(f(k) * 1e9, tz=tz)
fmt = '%F %T %Z'
test = [
(
f'{v:{fmt}}', # given time
f'{to_time(k, tz=tz):{fmt}}', # estimate from bytes
(to_time(k, tz=tz) - v).total_seconds(), # difference in seconds
)
for k, v in examples.items()
]
>>> test
[('2022-10-01 12:49:00 CEST', '2022-10-01 12:49:30 CEST', 30.0),
('2023-12-16 15:03:00 CET', '2023-12-16 15:03:23 CET', 23.0),
('2023-12-17 12:37:00 CET', '2023-12-17 12:36:37 CET', -23.0),
('2023-12-17 18:45:00 CET', '2023-12-17 18:45:25 CET', 25.0),
('2023-12-17 18:45:30 CET', '2023-12-17 18:44:49 CET', -41.0),
('2023-12-17 18:46:00 CET', '2023-12-17 18:46:46 CET', 46.0),
('2023-12-17 18:47:00 CET', '2023-12-17 18:45:59 CET', -61.0)]
Perhaps with more examples and more info, you may adjust the constants used above. I tried to express the offset above in terms of an origin as a date, but it wasn't satisfying. One approach I tried was with:
origin = pd.Timestamp('2018-01-05 18:48:33')
offset = int(origin.value / 1e9)
def f(k):
    return (int(''.join(k.split()[::-1])[3:-2], 16) >> 23) + offset
but I didn't find it much better from an "Occam's razor" perspective.