python matplotlib python-ggplot

Convert from matplotlib to ggplot2 within python


I have built a framework to do some algorithm evaluation. It includes methods that calculate metrics such as RMSE@K, NDCG@K, and MAE@K from the data I pass into them.

import numpy as np
import matplotlib.pyplot as plt

ndcg = []
rmse = []
mae = []
for i in range(11):
    results = generate_metrics(data_file, i)
    ndcg.append(np.mean(results['ndcg']))
    rmse.append(np.mean(results['rmse']))
    mae.append(np.mean(results['mae']))
plt.plot(ndcg)
plt.plot(rmse)
plt.plot(mae)
plt.show()

I want to use ggplot within Python to plot all of this in one graph: the x axis holds the @k values (0-10) and the y axis holds the corresponding value from each list.

How can I convert the lists above to a data frame like this:

   at_k      ndcg      rmse       mae
1     1 0.4880583 0.3438043 0.3400933
2     2 0.4880583 0.3438043 0.3400933
3     3 0.4880583 0.3438043 0.3400933
4     4 0.4880583 0.3438043 0.3400933
5     5 0.4880583 0.3438043 0.3400933
6     6 0.4880583 0.3438043 0.3400933
7     7 0.4880583 0.3438043 0.3400933
8     8 0.4880583 0.3438043 0.3400933
9     9 0.4880583 0.3438043 0.3400933
10   10 0.4880583 0.3438043 0.3400933

and plot it using ggplot?


Solution

  • Please note that this answer uses yhat's ggpy, a Python port of ggplot. Other Python grammar-of-graphics implementations exist, such as plotnine, and this answer does not work with them.

    After generating some random data in the same form as your dataset using

    import numpy as np
    ndcg, rmse, mae = [], [], []
    for i in range(11):
        rand = np.random.sample(3)
        ndcg.append(rand[0])
        rmse.append(rand[1])
        mae.append(rand[2])
    

    I can create a Pandas DataFrame from it:

    import pandas as pd
    at_k = list(range(1, 12))
    df = pd.DataFrame({"at_k": at_k, "ndcg": ndcg, "rmse": rmse, "mae": mae})
    print(df)
    

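    This outputs something like the following (the values are random; older pandas versions sort the columns alphabetically, as shown here):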

        at_k       mae      ndcg      rmse
    0      1  0.153102  0.546553  0.794357
    1      2  0.882718  0.342260  0.762997
    2      3  0.153298  0.695626  0.581455
    3      4  0.073772  0.491996  0.384631
    4      5  0.014066  0.369490  0.606842
    5      6  0.892553  0.818312  0.396829
    6      7  0.143114  0.739370  0.812050
    7      8  0.847054  0.323221  0.932366
    8      9  0.122838  0.613340  0.393237
    9     10  0.645705  0.486312  0.138259
    10    11  0.339063  0.223995  0.115242
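
    As a variation on the step above, the per-k results could also be collected as rows and turned into a DataFrame in one go. This is only a sketch: `fake_generate_metrics` is a hypothetical stand-in for the `generate_metrics(data_file, i)` call from the question, and the seed and sample size are arbitrary. It uses `at_k` values 0-10 to match the loop in the question.

```python
import numpy as np
import pandas as pd


def fake_generate_metrics(k, rng):
    """Hypothetical stand-in for generate_metrics(data_file, k).

    Returns a dict of per-item scores for each metric at cutoff k.
    """
    return {name: rng.random_sample(5) for name in ("ndcg", "rmse", "mae")}


rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
rows = []
for k in range(11):  # same 0..10 cutoffs as the loop in the question
    results = fake_generate_metrics(k, rng)
    row = {"at_k": k}
    for name, values in results.items():
        row[name] = np.mean(values)  # one averaged value per metric
    rows.append(row)

# Passing columns= pins the column order instead of relying on dict ordering.
df = pd.DataFrame(rows, columns=["at_k", "ndcg", "rmse", "mae"])
print(df.head())
```

    Building the frame from a list of row dicts keeps each `at_k` value next to its metrics, so the lists cannot drift out of step with each other.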
    

    Yay! But we can't use this for plotting with yhat's ggplot yet. Following this example, we need to reshape the data from wide to long format with pd.melt, so that each row holds one (at_k, variable, value) triple:

    df2 = pd.melt(df[['at_k', 'mae', 'ndcg', 'rmse']], id_vars=['at_k'])
    print(df2)
    

    Now we've got something like this (truncated):

        at_k variable     value
    0      1      mae  0.153102
    1      2      mae  0.882718
    2      3      mae  0.153298
    3      4      mae  0.073772
    ...
    30     9     rmse  0.393237
    31    10     rmse  0.138259
    32    11     rmse  0.115242
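
    To make the reshaping step easier to follow, here is a tiny self-contained sketch of `pd.melt` on a two-row frame. The numbers are made up for illustration, and the `var_name`/`value_name` arguments are optional extras that the answer itself does not use; if you rename the columns this way, the `aes(...)` mapping would have to refer to `'metric'` and `'score'` instead of `'variable'` and `'value'`.

```python
import pandas as pd

# Wide format: one column per metric (illustrative values only).
wide = pd.DataFrame({
    "at_k": [1, 2],
    "ndcg": [0.5, 0.6],
    "rmse": [0.3, 0.2],
})

# Long format: one row per (at_k, metric) pair.
# id_vars stays as-is; each value_vars column is stacked into rows.
tidy = pd.melt(
    wide,
    id_vars=["at_k"],
    value_vars=["ndcg", "rmse"],
    var_name="metric",    # default name would be "variable"
    value_name="score",   # default name would be "value"
)
print(tidy)
```

    The melted frame has one row for every combination of `at_k` and metric, which is the shape a colour-by-column mapping expects.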
    

    Now it's time to plot:

    from ggplot import ggplot, aes, geom_point

    ggplot(aes(x='at_k', y='value', colour='variable'), data=df2) +\
        geom_point()
    
