python ggplot2 plotnine

Using manual colors for ggplot in Python


I'm trying to assign a custom color palette to the plot. Values range from 0 to 4, and I'd like to assign a given color to each of the ranges 0-1, 1-2, 2-3 and 3-4 (e.g. "#C5FABE", "#F5D562", "#E89A3F", "#CF3E3E").

The data table has columns for date, variable_name and value; key_symptoms_names_list holds the top 5 entries from variable_name.

    plot = (ggplot(data, aes(x='date', y='variable_name')) +
            geom_point(aes(color='value'), size=3, shape='s') +
            labs(x="", y="") +
            scale_x_date(breaks=date_breaks('1 month'), labels=date_format('%b')) +
            scale_y_discrete(limits=key_symptoms_names_list[::-1]) +
            # scale_color_manual(values=["#C5FABE", "#F5D562", "#E89A3F", "#CF3E3E"]) +
            theme_classic() +
            theme(
                legend_position="none",
                figure_size=(13.9, 3.3),
                axis_title=element_text(size=7, weight=400),
                axis_text=element_text(size=7, weight=400),
                axis_line_x=element_line(size=0.0, color="none"),
                axis_line_y=element_line(size=0.0, color="none"),
                axis_text_x=element_text(size=7, hjust=0.0, weight=400),
                axis_ticks_major_x=element_line(size=0.5, color="#959595"),
                axis_ticks_length_major=3,
                panel_grid_major_x=element_line(
                    size=0.5, color="#c7c7c7", linetype="dashed"),
                axis_ticks_major_y=element_blank(),
            )
    )


I'd like to map these values onto the color scale provided above. However, using scale_color_manual results in the error: Continuous value supplied to discrete scale


Solution

  • The issue is that a manual scale can only be applied to a discrete (categorical) variable, whereas, according to the error message, your value column is continuous.

    To fix that, discretize value yourself, e.g. with pandas.cut.

    Using some fake random example data:

    import pandas as pd
    import numpy as np
    from plotnine import ggplot, geom_point, aes, labs, scale_x_date, scale_color_manual, theme_classic
    from mizani.breaks import date_breaks
    from mizani.formatters import date_format
    
    np.random.seed(123)
    
    dates = pd.date_range(start='2020-01-01', end='2021-06-30', freq = 'W')
    variables = pd.Series(["A", "B", "C"]) 
    
    data = pd.DataFrame({
        "date": dates.append(dates).append(dates), 
        "variable_name": variables.repeat(len(dates)),
        "value": np.random.uniform(0, 4, len(dates) * 3)
    })
    
    # Discretize value into the left-closed bins [0, 1), [1, 2), [2, 3), [3, 4)
    data["value_cut"] = pd.cut(data.value, [0, 1, 2, 3, 4], right=False)
    
    (ggplot(data, aes(x='date', y='variable_name')) 
    + geom_point(aes(color='value_cut'), size=3, shape='s')
    + labs(x="", y="")
    + scale_x_date(breaks=date_breaks('1 month'), labels=date_format('%b'))
    + scale_color_manual(values=["#C5FABE", "#F5D562", "#E89A3F", "#CF3E3E"])
    + theme_classic() 
    )
    

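
One detail of the binning step may be worth checking on real data: with right=False the bins are left-closed, so a value of exactly 4 falls outside the last bin and becomes NaN. The following is a minimal sketch of that behavior, using pandas only. The sample values and the label strings ("0-1", "1-2", ...) are illustrative assumptions, not part of the original data; the bin edges and colors are taken from the question.

```python
import pandas as pd

# Bin edges and colors from the question; the label strings are illustrative.
edges = [0, 1, 2, 3, 4]
labels = ["0-1", "1-2", "2-3", "3-4"]
colors = ["#C5FABE", "#F5D562", "#E89A3F", "#CF3E3E"]

# Made-up sample values, including both boundary values 0 and 4.
values = pd.Series([0.0, 0.5, 1.0, 2.7, 4.0])

# right=False: bins are [0, 1), [1, 2), [2, 3), [3, 4).
# The value 4.0 is not covered by any bin and becomes NaN.
left_closed = pd.cut(values, edges, right=False, labels=labels)

# Default right=True plus include_lowest=True: bins are
# [0, 1], (1, 2], (2, 3], (3, 4], so both 0.0 and 4.0 are binned.
right_closed = pd.cut(values, edges, include_lowest=True, labels=labels)

# Explicit label-to-color mapping, so each color is tied to a named bin
# rather than to the order in which the categories happen to appear.
color_by_label = dict(zip(labels, colors))

print(left_closed.tolist())
print(right_closed.tolist())
```

If the data can contain the upper bound 4, the second variant is probably the safer choice. As far as I know, scale_color_manual also accepts a dict for values, so color_by_label could be passed in place of the plain list, but verify that against the plotnine version you have installed.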