Consider the following JSON object containing key-value pairs:
{
"session1": 128,
"session2": 1048596,
"session3": 3145728,
"session4": 3145828,
"session5": 11534338,
"session6": 11544336,
"session7": 2097252
}
The key is a session identifier, and the value is the size of the data stored in that session.
I want to print counts of values by range, each range including its lower bound and excluding its upper bound: 0-1MB, 1-2MB, 2-3MB, ..., 12-13MB.
1MB = 1048576
2MB = 2097152
3MB = 3145728
4MB = 4194304
5MB = 5242880
6MB = 6291456
7MB = 7340032
8MB = 8388608
9MB = 9437184
10MB = 10485760
11MB = 11534336
12MB = 12582912
13MB = 13631488
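The boundaries above are just multiples of 2^20 bytes; a throwaway Python snippet (not part of the jq solution) to sanity-check the table:

```python
MB = 2 ** 20  # 1 MB = 1048576 bytes

# Print each boundary used by the 0-13MB ranges
for n in range(1, 14):
    print(f"{n}MB = {n * MB}")
```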
The expected output is
{
"0-1MB": 1,
"1-2MB": 1,
"2-3MB": 1,
"3-4MB": 2,
"10-11MB": 2
}
The above is just representative; suggestions are welcome.
The following should work:
to_entries
| map(.value / 1048576 | floor | [tostring, "-", (.+1 | tostring), "MB"] | add)
| group_by(.)
| map({"key": .[0], "value": length})
| from_entries
For your input, it produces the following output:
{
"0-1MB": 1,
"1-2MB": 1,
"11-12MB": 2,
"2-3MB": 1,
"3-4MB": 2
}
(11534338 and 11544336 are counted in the "11-12MB" bucket rather than the "10-11MB" one, because 11*2^20 = 11534336, and those numbers are larger than that.)
If you wanted the keys in numeric order, you could also convert them to your preferred string labels after the group_by:
to_entries
| map(.value / 1048576 | floor)
| group_by(.)
| map({"key": [(.[0] | tostring), "-", (.[0]+1 | tostring), "MB"] | add, "value": length})
| from_entries
Which produces:
{
"0-1MB": 1,
"1-2MB": 1,
"2-3MB": 1,
"3-4MB": 2,
"11-12MB": 2
}
Both solutions have the same basic steps:

1. Split the object into {"key": x, "value": y} entries (to_entries).
2. Convert each value into its MB bucket number (.value / 1048576 | floor).
3. Group equal bucket numbers together (group_by). This produces an array like [[0], [1], [2], [3, 3], [11, 11]] for your input.
4. Map each group to an entry whose key is the range label and whose value is the group's size (length).
5. Reassemble the entries into an object (from_entries).
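For comparison, the same pipeline can be sketched in Python (a rough equivalent of the jq steps, not a drop-in replacement):

```python
import json
from collections import Counter

MB = 1048576  # 2**20 bytes

sessions = json.loads("""
{
  "session1": 128,
  "session2": 1048596,
  "session3": 3145728,
  "session4": 3145828,
  "session5": 11534338,
  "session6": 11544336,
  "session7": 2097252
}
""")

# Bucket each size by its MB floor (mirrors .value / 1048576 | floor)
buckets = Counter(size // MB for size in sessions.values())

# Label each bucket and count its members
# (mirrors the map/from_entries steps, with buckets sorted numerically)
result = {f"{n}-{n + 1}MB": count for n, count in sorted(buckets.items())}
print(json.dumps(result, indent=2))
```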