Tags: python, ggplot2, python-ggplot, plotnine

plotnine doesn't add legend


I'm using plotnine to draw two lines in the same plot. One line uses the 'b' values from the dataframe you'll see below, and the other uses the values from 'c'.

All I need is a simple legend showing 'b' and 'c' with their corresponding colors.

from pandas import DataFrame
from plotnine import aes, geom_line, geom_point, ggplot, ggtitle, labs

def plot_log_detected():
    df = DataFrame({'x': [1, 2, 3, 4, 5],
                    'b': >>>SOME VALUES DOESNT MATTER<<<,
                    'c': >>>SOME VALUES DOESNT MATTER<<<
                   })
    return ggplot(aes(x='x', y='b'), data=df) + geom_point(size=1) + \
           geom_line(aes(y='b'), color='black') + \
           geom_line(aes(y='c'), color='blue') + \
           ggtitle("TITLE") + \
           labs(y="Y AXIS", x="X AXIS")

Solution

  • This won't show a legend in ggplot2 for R either: a color legend is drawn only when color= is mapped inside aes() for a geom. The fix is the same in Python (plotnine) and in R. You need to organize your data so that it follows tidy data principles. Here, the df$b and df$c columns each carry two pieces of information: (1) the value of "y" and (2) the type of "y". Reorganize the data so that the columns become x, type_of_y, and value_of_y.

    I'll explain by filling in a dataset like the one you presented, then show how to change it to a tidy format, and finally how to write the plotting code (properly) to get the plot I believe you want.

    The Basics

    Here's a dataset and a plot like yours (again, it's in R, so I hope you can translate it into Python):

    df <- data.frame(
        x=c(1:5), b=c(10, 12, 14, 9, 8), c=c(9, 11, 11, 12, 14))
    
    ggplot(df, aes(x=x)) +
        geom_line(aes(y=b), color='red') +
        geom_line(aes(y=c), color='blue')
    


    No legend, but the colors are there and the plot shows what you would expect. The catch is that ggplot draws a legend only when you specify color inside the aes() call. To see this clearly, let's do the same plot, but move the color=... inside aes():

    ggplot(df, aes(x=x)) +
        geom_line(aes(y=b, color='red')) +
        geom_line(aes(y=c, color='blue'))
    


    Ok that's... wait. What? It has a legend now (because we put color inside aes()), but the colors look swapped and... you'll notice they are not red and blue, but the default "reddish" and "teal" colors of ggplot2. What actually happened is that inside aes(), 'red' and 'blue' are not colors at all: each geom_line call plotted the correct data, but we only "titled" the first series "red" and the second "blue". ggplot then picked the colors from its default palette and assigned them to those labels in alphabetical order, so the series labeled "blue" got the first (reddish) color.

    Getting Your Legend Without Tidy Data

    If you don't want to mess with your data, there is a way to do this and get an output you might be satisfied with. Inside aes(), just set color= to the name you want to give that series.

    ggplot(df, aes(x=x)) +
        geom_line(aes(y=b, color='b')) +
        geom_line(aes(y=c, color='c'))
    


    What about just adding another color='blue' to get a "blue" color outside the aes() as well as inside? Well... that doesn't work. If you do this, for example, the result is identical to the original plot shown (with no legend, but correct color values), since the aes() is effectively overwritten in each geom_line call:

    # this doesn't work to keep legend and desired color, the second
    # color outside aes() overwrites the one inside aes()
    ggplot(df, aes(x=x)) +
        geom_line(aes(y=b, color='b'), color='red') +
        geom_line(aes(y=c, color='c'), color='blue')
    

    The Tidy Data Way (The "correct" way)

    While the above method works, it goes against the general principles of Tidy Data, which is about organizing your data so that it's easy to analyze... in ANY WAY you want to. Trust me: it's the best practice for working with any dataset when you want versatility of analysis, and it is almost always worth the effort to organize your data that way.

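    ggplot wants you to specify aes() parameters as columns in your dataset. That means each column should serve a single purpose: one column for x, one for the type of y (type_of_y), and one for the value of y (value_of_y).

    Using gather() from tidyr (superseded by pivot_longer() in newer tidyr versions, but still available), we can reorganize the data in this way pretty simply: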

    library(tidyr)  # provides gather() and re-exports %>%
    df <- df %>% gather('type_of_y', 'value_of_y', -x)
    

    Giving you:

       x type_of_y value_of_y
    1  1         b         10
    2  2         b         12
    3  3         b         14
    4  4         b          9
    5  5         b          8
    6  1         c          9
    7  2         c         11
    8  3         c         11
    9  4         c         12
    10 5         c         14
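
In Python the same reshaping is usually done with pandas. The following is a minimal sketch, assuming the data lives in a pandas DataFrame with the same x, b and c columns as the R example; DataFrame.melt plays the role of gather() here.

```python
import pandas as pd

# Same sample values as the R example above.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "b": [10, 12, 14, 9, 8],
    "c": [9, 11, 11, 12, 14],
})

# Keep x as the identifier column and stack b and c into
# (type_of_y, value_of_y) pairs, one row per observation.
tidy = df.melt(id_vars="x", var_name="type_of_y", value_name="value_of_y")
print(tidy)
```

The result has the same ten rows as the table above, with the b rows first and the c rows after them.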
    

    Then you plot accordingly, using only one geom_line call and applying the color aesthetic to type_of_y. Something like this:

    ggplot(df, aes(x=x, y=value_of_y)) +
        geom_line(aes(color=type_of_y))
    


    In this way, you only have to specify one geom_line call. Might not seem too different here, but what if you had multiple columns in your original dataset? Take the case, for example, of having "x", then y values for "a", "b", "c"... "z"! You would have to specify all those lines in separate calls to geom_line! In the case above, no matter how many different y value columns you had... you only have the same two lines of code and only one call to geom_line. Make sense? For more information, I would suggest the link from above. Also, this article is a great read.

    You can then assign specific colors by adding scale_color_manual and specifying the colors that way (there are a few other ways too), but if you need help there, I would ask in a separate question. Also... not sure how the code differs for Python. Similarly, you can change the title of the legend via labs(color="your new legend title")... among other theme changes.

    I know it is not quite the same code in Python, but that should be enough for you to figure out how to do it similarly there.