I have built a framework to do some algorithm evaluation. It includes methods that calculate metrics (RMSE@K, NDCG@K, MAE@K, etc.) from the data I pass into them.
ndcg = []
rmse = []
mae = []
for i in xrange(11):
    results = generate_metrics(data_file, i)
    ndcg.append(np.mean(results['ndcg']))
    rmse.append(np.mean(results['rmse']))
    mae.append(np.mean(results['mae']))
plt.plot(ndcg)
plt.plot(rmse)
plt.plot(mae)
plt.plot()
plt.show()
I want to use ggplot within Python to plot these in one graph: the x axis is the @k value (0-10) and the y axis is the corresponding value from each list.
How can I convert the above lists to a data frame like this:
at_k ndcg rmse mae
1 1 0.4880583 0.3438043 0.3400933
2 2 0.4880583 0.3438043 0.3400933
3 3 0.4880583 0.3438043 0.3400933
4 4 0.4880583 0.3438043 0.3400933
5 5 0.4880583 0.3438043 0.3400933
6 6 0.4880583 0.3438043 0.3400933
7 7 0.4880583 0.3438043 0.3400933
8 8 0.4880583 0.3438043 0.3400933
9 9 0.4880583 0.3438043 0.3400933
10 10 0.4880583 0.3438043 0.3400933
and plot it using ggplot?
Please note that this answer uses yhat's ggpy, a Python port of ggplot. Other Python grammar of graphics implementations exist, such as plotnine, and this answer does not work with them.
After generating some random data in the same form as your dataset using
import numpy as np
ndcg, rmse, mae = [], [], []
for i in xrange(11):
    rand = np.random.sample(3)
    ndcg.append(rand[0])
    rmse.append(rand[1])
    mae.append(rand[2])
I can create a Pandas DataFrame from it:
import pandas as pd
at_k = range(1, 12)
df = pd.DataFrame({"at_k": at_k, "ndcg": ndcg, "rmse": rmse, "mae": mae})
print df
This outputs
at_k mae ndcg rmse
0 1 0.153102 0.546553 0.794357
1 2 0.882718 0.342260 0.762997
2 3 0.153298 0.695626 0.581455
3 4 0.073772 0.491996 0.384631
4 5 0.014066 0.369490 0.606842
5 6 0.892553 0.818312 0.396829
6 7 0.143114 0.739370 0.812050
7 8 0.847054 0.323221 0.932366
8 9 0.122838 0.613340 0.393237
9 10 0.645705 0.486312 0.138259
10 11 0.339063 0.223995 0.115242
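Note that the columns in the output above come out in alphabetical order rather than the order from the question. If the column order matters to you, one option is to pass `columns=` explicitly. This is a minimal sketch with made-up values (not real metric results), written so that it also runs on Python 3:

```python
import pandas as pd

# Made-up values, only for illustration (not real metric results).
ndcg = [0.5, 0.6, 0.7]
rmse = [0.9, 0.8, 0.7]
mae = [0.4, 0.3, 0.2]
at_k = list(range(1, len(ndcg) + 1))

df = pd.DataFrame(
    {"at_k": at_k, "ndcg": ndcg, "rmse": rmse, "mae": mae},
    columns=["at_k", "ndcg", "rmse", "mae"],  # explicit column order
)
print(df)
```

The column order has no effect on the plot later on; it only changes how the frame is printed.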
Yay! But we can't use this for plotting with yhat's ggplot yet. Following this example, we need to reshape the data from wide to long format, so that each row holds one at_k value, one metric name, and one value:
df2 = pd.melt(df[['at_k', 'mae', 'ndcg', 'rmse']], id_vars=['at_k'])
print df2
Now we've got something like this (truncated):
at_k variable value
0 1 mae 0.153102
1 2 mae 0.882718
2 3 mae 0.153298
3 4 mae 0.073772
...
30 9 rmse 0.393237
31 10 rmse 0.138259
32 11 rmse 0.115242
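To make the reshaping easier to follow, here is a small sketch of `pd.melt` on a two-row frame. The values and the names `wide`, `long_df`, `metric`, and `score` are made up for illustration; `var_name` and `value_name` are optional and only rename the default `variable` and `value` columns.

```python
import pandas as pd

# Tiny wide-format frame with made-up values: one column per metric.
wide = pd.DataFrame({
    "at_k": [1, 2],
    "ndcg": [0.5, 0.6],
    "rmse": [0.9, 0.8],
})

# Long format: one row per (at_k, metric) pair.
long_df = pd.melt(wide, id_vars=["at_k"], var_name="metric", value_name="score")
print(long_df)
```

Each metric column is stacked under the previous one, which is why the melted frame has (number of rows) x (number of metrics) rows. If you rename the columns this way, remember to use the new names in `aes()`.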
Now it's time to plot:
# yhat's ggpy is imported under the package name "ggplot"
from ggplot import *
ggplot(aes(x='at_k', y='value', colour='variable'), data=df2) +\
    geom_point()