python, plotly, plotly.graph-objects

How do I get the markers in my scatterplot to be connected by lines only along one axis?


I have been trying to create a scatter plot using the plotly package, but I keep running into a weird problem with how the plot is formatted. The plot I am trying to make has a categorical X axis and a continuous Y axis. What I want is a marker at each point on the plot, with the markers connected to one another by a line. This sounds like it should be a relatively simple formatting task, but I have been unable to get it to work.

Here is a snapshot of an example dataframe that I am passing to the code: [image: example dataframe]

Here is the current code that I am using, where df is my input dataframe:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# sample_col, group_col, class_col, intensity_col, subplot_titles and samples are defined elsewhere
grouping = df.set_index(sample_col)[group_col].to_dict()
fig = make_subplots(2, 1, subplot_titles=subplot_titles)
n = 1  # subplot rows are 1-indexed
colors = ["red", "blue"]
colors_dict = dict(zip(df[class_col].unique(), colors))
symbols = ["diamond", "arrow"]
symbols_dict = dict(zip(df[class_col].unique(), symbols))
# Group by the column name (not a one-item list) so that m is a plain value, not a tuple
for index, (m, gdf) in enumerate(df.groupby(class_col)):
    # natsort_column is a helper defined elsewhere
    gdf = natsort_column(gdf, sample_col).reset_index(drop=True)
    gdf[group_col] = gdf[sample_col].map(grouping)
    fig.append_trace(go.Scatter(x=[gdf[group_col], gdf[sample_col], gdf['Sequence']],
                                y=gdf[intensity_col],
                                name=m,
                                mode='markers',
                                marker=dict(symbol=symbols_dict[m], size=12, color=colors_dict[m]),
                                legendgroup='group{}'.format(index),
                                showlegend=True), n, 1)
    n += 1

fig.update_layout(template='plotly_white', height=1000, width=800)
fig.update_xaxes(categoryorder='array', categoryarray=sorted(samples))

When I have the mode of the scatter plot set to 'markers', the plot looks like this: [image: plot example markers]

However, I really want the markers to be connected with lines. But when I set mode='lines+markers', I get a plot that looks like this:

[image: plot example lines+markers]

The markers are all connected along both the x and y axes. This is frustrating because I only want the markers to be connected along the x axis, so that each marker is joined only to the markers that share its sequence. This does mean that not all markers would be connected between different samples, but that is what I want anyway. Connecting the markers along the intensity values is not useful for what I'm trying to visualize. My suspicion is that the root of this problem is the multi-category x axis, but I'm not sure how to fix it.

I don't know why this is not working, and it would be really helpful if anyone could point me in the right direction.


Solution

  • OK, so the answer turned out to be much simpler than I anticipated. My code needed to be set up in almost exactly the same way, with one key change: I needed one more for loop over each sequence. I had misunderstood how data is added to the plot: each go.Scatter trace draws a single line through all of its points in the order they are given, so every sequence needs its own trace.

    So the code now looks like:

    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    grouping = df.set_index(sample_col)[group_col].to_dict()
    fig = make_subplots(2, 1, subplot_titles=subplot_titles)
    n = 1  # subplot rows are 1-indexed
    colors = ["red", "blue"]
    colors_dict = dict(zip(df[class_col].unique(), colors))
    symbols = ["diamond", "arrow"]
    symbols_dict = dict(zip(df[class_col].unique(), symbols))
    # Group by the column name (not a one-item list) so that m is a plain value, not a tuple
    for index, (m, gdf) in enumerate(df.groupby(class_col)):
        gdf = natsort_column(gdf, sample_col).reset_index(drop=True)
        gdf[group_col] = gdf[sample_col].map(grouping)
        # One trace per sequence, so lines only join points of the same sequence
        for sequence in gdf['Sequence'].unique():
            gdf_seq = gdf[gdf['Sequence'] == sequence]
            fig.append_trace(go.Scatter(x=[gdf_seq[group_col], gdf_seq[sample_col]],
                                        y=gdf_seq[intensity_col],
                                        name=m,
                                        mode='lines+markers',
                                        marker=dict(symbol=symbols_dict[m], size=12, color=colors_dict[m]),
                                        legendgroup='group{}'.format(index),
                                        showlegend=True), n, 1)
        n += 1

    fig.update_layout(template='plotly_white', height=1000, width=800)
    fig.update_xaxes(categoryorder='array', categoryarray=sorted(samples))
    

    The plot is now properly configured, with one line and one set of markers per sequence. I still have a multicategory x axis, but with only two categories (group and sample), as that seems to work best with plotly's current setup.