Tags: python, dataframe, python-polars, altair

Change color of single line in altair line chart based on other indicator column


Imagine having the following polars dataframe "df" that contains the temperature of a machine that is either "active" or "inactive":

import altair as alt
import polars as pl
from datetime import datetime

df_content = {'timestamp': [datetime(2025, 10, 31, 23, 0), datetime(2025, 10, 31, 23, 1), datetime(2025, 10, 31, 23, 2), datetime(2025, 10, 31, 23, 3), datetime(2025, 10, 31, 23, 4), datetime(2025, 10, 31, 23, 5), datetime(2025, 10, 31, 23, 6), datetime(2025, 10, 31, 23, 7), datetime(2025, 10, 31, 23, 8), datetime(2025, 10, 31, 23, 9), datetime(2025, 10, 31, 23, 10), datetime(2025, 10, 31, 23, 11), datetime(2025, 10, 31, 23, 12), datetime(2025, 10, 31, 23, 13), datetime(2025, 10, 31, 23, 14), datetime(2025, 10, 31, 23, 15), datetime(2025, 10, 31, 23, 16), datetime(2025, 10, 31, 23, 17), datetime(2025, 10, 31, 23, 18), datetime(2025, 10, 31, 23, 19)],
 'temperature': [61.3, 61.39, 60.86, 61.95, 60.72, 60.72, 61.15, 59.97, 60.39, 60.46, 61.88, 61.04, 60.67, 60.31, 60.64, 60.64, 60.99, 61.49, 61.13, 60.2],
 'indicator': ['inactive', 'inactive', 'inactive', 'active', 'active', 'active', 'active', 'active', 'inactive', 'inactive', 'inactive', 'inactive', 'inactive', 'active', 'active', 'active', 'active', 'active', 'inactive', 'inactive']}

df = pl.DataFrame(df_content)
df

timestamp temperature indicator
2025-10-31 23:00:00 61.3 inactive
2025-10-31 23:01:00 61.39 inactive
2025-10-31 23:02:00 60.86 inactive
2025-10-31 23:03:00 61.95 active
2025-10-31 23:04:00 60.72 active
2025-10-31 23:05:00 60.72 active
2025-10-31 23:06:00 61.15 active
2025-10-31 23:07:00 59.97 active
2025-10-31 23:08:00 60.39 inactive
2025-10-31 23:09:00 60.46 inactive
2025-10-31 23:10:00 61.88 inactive
2025-10-31 23:11:00 61.04 inactive
2025-10-31 23:12:00 60.67 inactive
2025-10-31 23:13:00 60.31 active
2025-10-31 23:14:00 60.64 active
2025-10-31 23:15:00 60.64 active
2025-10-31 23:16:00 60.99 active
2025-10-31 23:17:00 61.49 active
2025-10-31 23:18:00 61.13 inactive
2025-10-31 23:19:00 60.2 inactive

The data can be plotted with altair using the following syntax:

df.plot.line(
    x="timestamp",
    y=alt.Y("temperature"),
).properties(width=1000)

We just get a single line for the temperature.


But I'd like this single line to have two separate colors for different time spans. For example, from "2025-10-31 23:00:00" to "2025-10-31 23:02:00" the machine was "inactive", so the line should be blue there (and in every other time span where the machine was inactive). From "2025-10-31 23:03:00" to "2025-10-31 23:07:00" the machine was "active", so the line should be red there (and in every other time span where the machine was active).

Adding the "color" argument to the plot does not help. The following code leads to two separate lines instead of one line:

df.plot.line(
    x="timestamp",
    y=alt.Y("temperature"),
    color="indicator"
).properties(width=1000)



Solution

  • You can shift the data to create x2 and y2 channels to plot each segment individually.

    df = (
        pl.DataFrame(df_content)
        .with_columns(
            pl.col("timestamp").shift(-1).alias("timestamp_next"),
            pl.col("temperature").shift(-1).alias("temperature_next"),
        )
        .drop_nulls(["timestamp_next", "temperature_next"])
    )
    
    df.plot.line(
        x=alt.X("timestamp", title='timestamp'),
        x2="timestamp_next",
        y=alt.Y("temperature", title='temperature'),
        y2="temperature_next",
        color="indicator",
    ).properties(width=1000)
    

    Altair plot

    This colors each segment according to the value of indicator at the start of the interval.