
Parallel Coordinate plot in plotly with continuous and categorical data


Let's say I have a dataframe df with both continuous and categorical data. I'd like to make a parallel coordinates plot in plotly that contains both types of coordinates. Is it possible to combine these into one plot such that the line for each data point passes through all axes?

In the documentation I did find go.Parcoords and go.Parcats that treat these separately, but I didn't find a way to combine them. This is my minimal example:

import pandas as pd
import plotly.graph_objs as go
df = pd.DataFrame()
# continuous data
df['x1'] = [1,2,3,4]
df['x2'] = [9,8,7,6]
# categorical data
df['x3'] = ['a', 'b', 'b', 'c']
df['x4'] = ['A', 'B', 'C', 'C']
col_list = [dict(range=(df[col].min(), df[col].max()),
                 label=col,
                 values=df[col])
            for col in df.keys()
            #if col not in ['x3', 'x4']  # only works if we exclude these (uncomment to run)
            ]
fig = go.Figure(data=go.Parcoords(dimensions=col_list))
fig.show()

Solution

  • Here is a solution based on customizing the tick labels (tickvals and ticktext). go.Parcoords dimensions expect numeric values, so first we replace each categorical value with an integer, and then we define custom ticks at those integers, labelled with the corresponding categorical value as a string:

    import pandas as pd
    import plotly.graph_objs as go
    df = pd.DataFrame()
    df['x1'] = [1,2,3,4]
    df['x2'] = [9,8,7,6]
    df['x3'] = ['a', 'b', 'b', 'c']
    df['x4'] = ['A', 'B', 'C', 'C']
    categorical_columns = ['x3', 'x4']
    col_list = []
    
    for col in df.keys():
        if col in categorical_columns:  # categorical columns
            values = df[col].unique()
            # map each category to an integer code, in order of first appearance
            value2dummy = dict(zip(values, range(len(values))))
            # note: this overwrites the column in df with its integer codes
            df[col] = [value2dummy[v] for v in df[col]]
            col_dict = dict(
                label=col,
                tickvals=list(value2dummy.values()),
                ticktext=[str(v) for v in value2dummy],  # str() in case the categories are not already strings
                values=df[col],
            )
        else:  # continuous columns
            col_dict = dict(
                range=(df[col].min(), df[col].max()),
                label=col,
                values=df[col],
            )
        col_list.append(col_dict)
    fig = go.Figure(data=go.Parcoords(dimensions=col_list))
    fig.show()
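
One possible variation, as a rough sketch rather than a definitive implementation: the same encoding can be wrapped in a small helper that leaves the original data untouched. The name build_dimensions and its arguments are made up for illustration and are not part of plotly. It assumes the columns come as a mapping from column name to a list of values, for example df.to_dict('list').

```python
def build_dimensions(columns, categorical):
    """Build dimension dicts for go.Parcoords (hypothetical helper, not a plotly API).

    columns: mapping of column name -> list of values, e.g. df.to_dict('list')
    categorical: names of the columns to encode as integer codes
    """
    dimensions = []
    for name in columns:
        values = list(columns[name])
        if name in categorical:
            # dict.fromkeys drops duplicates and keeps first-appearance order
            labels = list(dict.fromkeys(values))
            code_of = {label: code for code, label in enumerate(labels)}
            dimensions.append(dict(
                label=name,
                values=[code_of[v] for v in values],        # numeric codes for the axis
                tickvals=list(range(len(labels))),          # one tick per category
                ticktext=[str(label) for label in labels],  # category names as tick labels
            ))
        else:
            dimensions.append(dict(
                label=name,
                values=values,
                range=[min(values), max(values)],
            ))
    return dimensions


# Same example data as in the question, written as a plain dict
data = {
    "x1": [1, 2, 3, 4],
    "x2": [9, 8, 7, 6],
    "x3": ["a", "b", "b", "c"],
    "x4": ["A", "B", "C", "C"],
}
dims = build_dimensions(data, categorical=["x3", "x4"])
```

The resulting list can then be passed to go.Figure(data=go.Parcoords(dimensions=dims)) exactly like col_list above. Because the codes are written into new lists, df keeps its original categorical values.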