My goal is to take a simple .dat file and plot from it both the actual data and the theoretical points of a perfect Zipf distribution, that is, a distribution in which each item's value is proportional to 1/rank.
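Written out, the ideal curve scales 1/r by the value at rank 1, which I call y_1 here (222 in the Instagram example). This reproduces the 222, 111, 74, 55.5 sequence mentioned further down:

```latex
y(r) = \frac{y_1}{r}, \qquad r = 1, 2, \dots, N
\qquad\Longrightarrow\qquad
y(2) = \tfrac{222}{2} = 111,\quad y(3) = \tfrac{222}{3} = 74,\quad y(4) = \tfrac{222}{4} = 55.5
```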
For instance, my data for the most followed Instagram accounts is:
# List of most followed users on instagram
# By rank and millions of followers
# From Wikipedia
# https://en.wikipedia.org/wiki/List_of_most_followed_users_on_Instagram
# rank, millions of followers
1 222
2 120
3 105
4 101
5 101
6 100
7 99
8 93
9 86
10 85
11 80
12 79
13 76
14 73
15 71
16 69
17 67
18 65
19 63
20 63
From another thread I learned that I can append a third column with the ideal Zipf value for each rank (in this case 222, 111, 74, 55.5, etc.) and then add it as a second plot with , '' using 1:3
However, this requires doing the calculation by hand and appending the results to the original file, which is the step I'm trying to avoid. Is this possible? And how could I extend it to other distributions or calculations on the data?
Use stats to calculate the maximum value of the second column:
stats 'file.dat' u 2 nooutput
max = STATS_max
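The maximum matches the rank-1 value only because the file is sorted by rank. If you would rather anchor the curve at the first data row regardless of ordering, a sketch could restrict stats to that single point with every ::0::0. It assumes gnuplot 5 or later for the inline datablock; $Data and top are hypothetical names chosen for illustration, and $Data holds just the first rows of file.dat so the snippet runs on its own.

```gnuplot
# Hypothetical inline copy of the first rows of file.dat so this runs standalone;
# replace $Data with 'file.dat' to use the real file.
$Data << EOD
# rank, millions of followers
1 222
2 120
3 105
4 101
5 101
EOD

# Largest value in column 2 (the scale factor used in the answer)
stats $Data u 2 nooutput
max = STATS_max

# Column 2 of the first data point only (point 0), i.e. the rank-1 value
stats $Data u 2 every ::0::0 nooutput
top = STATS_max

print sprintf("max = %g, rank-1 value = %g", max, top)
```

For this sorted file both numbers are the same, so either works as the scale in (max/$1).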
Then compute the ideal Zipf values on the fly with the expression (max/$1) in the using specification, so nothing has to be added to the file:
plot 'file.dat' u 1:2 pt 7 t 'data',\
'' u 1:(max/$1) w l t 'ideal Zipf'
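The same idea extends to other distributions: put the formula in a gnuplot function and call it inside the using expression. Below is a minimal sketch under stated assumptions. It requires gnuplot 5.2 or later, and zipf, f, a and s are hypothetical names. It generalises the law to max/r**k, fits a scale a and exponent s to the data with fit, and plots on log-log axes, where any power law appears as a straight line. The inline $Data copy of the first ten rows and the dumb (text) terminal are there only to keep it self-contained. Swap in 'file.dat' and your usual terminal.

```gnuplot
# Hypothetical inline sample of file.dat (first ten rows) so this runs standalone
$Data << EOD
1 222
2 120
3 105
4 101
5 101
6 100
7 99
8 93
9 86
10 85
EOD

stats $Data u 2 nooutput
max = STATS_max

# Generalised Zipf curve: exponent k = 1 gives the classic max/rank law
zipf(r, k) = max / r**k

# Power law with free scale a and exponent s, started at the ideal Zipf values
f(x) = a / x**s
a = max
s = 1.0
set fit quiet
fit f(x) $Data u 1:2 via a, s

# On log-log axes a/x**s is a straight line with slope -s
set logscale xy
set terminal dumb   # text output so the sketch runs without a display
plot $Data u 1:2 pt 7 t 'data', \
     $Data u 1:(zipf($1, 1.0)) w l t 'ideal Zipf (k = 1)', \
     f(x) w l t sprintf('fit: s = %.2f', s)
```

Any other model works the same way: define it as a function of the rank and use it in place of zipf in the plot command.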