Let's say I have some dataframe df with continuous and categorical data. Now I'd like to make a parallel coordinates plot in plotly that contains both types of coordinates. Is it possible to combine these into one plot such that each datapoint line goes through all axes?
In the documentation I did find go.Parcoords and go.Parcats, which treat these separately, but I didn't find a way to combine them.
This is my minimal example:
import pandas as pd
import plotly.graph_objs as go
df = pd.DataFrame()
# continuous data
df['x1'] = [1,2,3,4]
df['x2'] = [9,8,7,6]
# categorical data
df['x3'] = ['a', 'b', 'b', 'c']
df['x4'] = ['A', 'B', 'C', 'C']
col_list = [dict(range=(df[col].min(), df[col].max()),
                 label=col,
                 values=df[col])
            for col in df.keys()
            #if col not in ['x3', 'x4'] # only works if we exclude these (uncomment to run)
            ]
fig = go.Figure(data=go.Parcoords(dimensions=col_list))
fig.show()
Here is a solution based on customizing the tick labels (ticktext). First we replace each categorical value with an integer code, and then we define custom ticks at those codes (tickvals), labelled with the corresponding categorical values as strings:
import pandas as pd
import plotly.graph_objs as go
df = pd.DataFrame()
df['x1'] = [1,2,3,4]
df['x2'] = [9,8,7,6]
df['x3'] = ['a', 'b', 'b', 'c']
df['x4'] = ['A', 'B', 'C', 'C']
keys = df.keys()
categorical_columns = ['x3', 'x4']
col_list = []
for col in df.keys():
    if col in categorical_columns:  # categorical columns
        values = df[col].unique()
        value2dummy = dict(zip(values, range(len(values))))  # works if values are strings, otherwise we probably need to convert them
        df[col] = [value2dummy[v] for v in df[col]]
        col_dict = dict(
            label=col,
            tickvals=list(value2dummy.values()),
            ticktext=list(value2dummy.keys()),
            values=df[col],
        )
    else:  # continuous columns
        col_dict = dict(
            range=(df[col].min(), df[col].max()),
            label=col,
            values=df[col],
        )
    col_list.append(col_dict)
fig = go.Figure(data=go.Parcoords(dimensions=col_list))
fig.show()
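If you need this for more than one dataframe, the same idea can be wrapped in a small helper that does not overwrite the categorical columns in df. This is only a sketch: encode_categories and build_dimensions are names made up for illustration, not part of plotly, and the helper assumes that the continuous columns are numeric and that converting a category to str gives a sensible tick label.

```python
def encode_categories(values):
    """Return (codes, labels): integer codes in order of first appearance."""
    code_of = {}
    for v in values:
        code_of.setdefault(v, len(code_of))  # new category gets the next integer
    return [code_of[v] for v in values], list(code_of)


def build_dimensions(columns, categorical_columns):
    """Build a list of Parcoords dimension dicts.

    `columns` is anything with .items() yielding (name, values),
    e.g. a pandas DataFrame or a plain dict of lists (hypothetical helper).
    """
    dimensions = []
    for name, values in columns.items():
        values = list(values)
        if name in categorical_columns:
            codes, labels = encode_categories(values)
            dimensions.append(dict(
                label=name,
                values=codes,
                tickvals=list(range(len(labels))),
                ticktext=[str(label) for label in labels],
            ))
        else:
            dimensions.append(dict(
                label=name,
                values=values,
                range=(min(values), max(values)),
            ))
    return dimensions


# Same data as in the question, given here as a plain dict of lists
data = {
    'x1': [1, 2, 3, 4],
    'x2': [9, 8, 7, 6],
    'x3': ['a', 'b', 'b', 'c'],
    'x4': ['A', 'B', 'C', 'C'],
}
dims = build_dimensions(data, categorical_columns=['x3', 'x4'])
```

The resulting list can be passed on exactly as before, i.e. go.Figure(data=go.Parcoords(dimensions=dims)). Passing df instead of data should work the same way, since a DataFrame also provides .items().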