Suppose I have the following Polars DataFrame `df`, which contains the temperature of a machine that is either "active" or "inactive":
```python
import polars as pl
from datetime import datetime
df_content = {
    # one reading per minute, from 23:00 to 23:19
    'timestamp': [datetime(2025, 10, 31, 23, minute) for minute in range(20)],
    'temperature': [61.3, 61.39, 60.86, 61.95, 60.72, 60.72, 61.15, 59.97, 60.39, 60.46,
                    61.88, 61.04, 60.67, 60.31, 60.64, 60.64, 60.99, 61.49, 61.13, 60.2],
    'indicator': ['inactive', 'inactive', 'inactive', 'active', 'active',
                  'active', 'active', 'active', 'inactive', 'inactive',
                  'inactive', 'inactive', 'inactive', 'active', 'active',
                  'active', 'active', 'active', 'inactive', 'inactive'],
}
df = pl.DataFrame(df_content)
df
```
| timestamp | temperature | indicator |
|---|---|---|
| 2025-10-31 23:00:00 | 61.3 | inactive |
| 2025-10-31 23:01:00 | 61.39 | inactive |
| 2025-10-31 23:02:00 | 60.86 | inactive |
| 2025-10-31 23:03:00 | 61.95 | active |
| 2025-10-31 23:04:00 | 60.72 | active |
| 2025-10-31 23:05:00 | 60.72 | active |
| 2025-10-31 23:06:00 | 61.15 | active |
| 2025-10-31 23:07:00 | 59.97 | active |
| 2025-10-31 23:08:00 | 60.39 | inactive |
| 2025-10-31 23:09:00 | 60.46 | inactive |
| 2025-10-31 23:10:00 | 61.88 | inactive |
| 2025-10-31 23:11:00 | 61.04 | inactive |
| 2025-10-31 23:12:00 | 60.67 | inactive |
| 2025-10-31 23:13:00 | 60.31 | active |
| 2025-10-31 23:14:00 | 60.64 | active |
| 2025-10-31 23:15:00 | 60.64 | active |
| 2025-10-31 23:16:00 | 60.99 | active |
| 2025-10-31 23:17:00 | 61.49 | active |
| 2025-10-31 23:18:00 | 61.13 | inactive |
| 2025-10-31 23:19:00 | 60.2 | inactive |
The data can be plotted with Altair using the following syntax:
```python
import altair as alt

df.plot.line(
    x="timestamp",
    y=alt.Y("temperature"),
).properties(width=1000)
```
This produces a single line for the temperature.
However, I'd like that single line to use two colors depending on the time span. For example, from "2025-10-31 23:00:00" to "2025-10-31 23:02:00", when the machine was "inactive", the line should be blue (as should every other span where the machine was inactive), while from "2025-10-31 23:03:00" to "2025-10-31 23:07:00", when the machine was "active", it should be red (as should every other span where the machine was active).
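To make "time span" concrete: each span is a maximal run of consecutive rows that share the same indicator. The following is a minimal pure-Python sketch (standard library only, so it runs without Polars) that recovers these runs from the example data; the `spans` helper is a hypothetical name used only for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Same data as df_content: one reading per minute starting at 23:00.
timestamps = [datetime(2025, 10, 31, 23, 0) + timedelta(minutes=i) for i in range(20)]
indicator = (["inactive"] * 3 + ["active"] * 5 + ["inactive"] * 5
             + ["active"] * 5 + ["inactive"] * 2)

def spans(times, labels):
    """Return (label, start, end) for each run of consecutive identical labels."""
    result = []
    # groupby only merges *adjacent* equal keys, which is exactly one run.
    for label, group in groupby(zip(times, labels), key=lambda pair: pair[1]):
        group = list(group)
        result.append((label, group[0][0], group[-1][0]))
    return result

for label, start, end in spans(timestamps, indicator):
    print(f"{label:8} {start:%H:%M} - {end:%H:%M}")
```

In Polars itself, the same run index can be computed with `pl.col("indicator").rle_id()`.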
Adding a `color` encoding does not help; the following code produces two separate lines instead of one:
```python
df.plot.line(
    x="timestamp",
    y=alt.Y("temperature"),
    color="indicator",
).properties(width=1000)
```
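The reason is that a color encoding on a line mark groups the rows by color and draws one polyline per group, connecting all points of that group in x order. The pure-Python sketch below mimics that grouping (the `lines_by_color` helper is hypothetical, not an Altair function) and shows why the "active" line jumps straight from 23:07 to 23:13 across the inactive period.

```python
from datetime import datetime, timedelta

timestamps = [datetime(2025, 10, 31, 23, 0) + timedelta(minutes=i) for i in range(20)]
indicator = (["inactive"] * 3 + ["active"] * 5 + ["inactive"] * 5
             + ["active"] * 5 + ["inactive"] * 2)

def lines_by_color(times, labels):
    """Group points by label, roughly how a color encoding splits a line mark."""
    lines = {}
    for t, label in zip(times, labels):
        lines.setdefault(label, []).append(t)
    return lines

lines = lines_by_color(timestamps, indicator)
active = lines["active"]
# Consecutive points more than one minute apart are still joined by a
# straight segment, which is what bridges the inactive period.
gaps = [(a, b) for a, b in zip(active, active[1:]) if b - a > timedelta(minutes=1)]
print(len(lines), [f"{a:%H:%M}->{b:%H:%M}" for a, b in gaps])
```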
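You can shift the data by one row so that every row also holds the coordinates of the next point, and use those columns as the `x2` and `y2` channels to draw each segment individually. Note that Vega-Lite's line mark ignores `x2`/`y2` (they are only used with binned data there), so use a rule mark, which draws a straight segment from (`x`, `y`) to (`x2`, `y2`). A color scale maps "inactive" to blue and "active" to red:
```python
import altair as alt
import polars as pl

# Give every row the coordinates of the following row, so that each row
# describes one segment from (timestamp, temperature) to the next point.
df = (
    pl.DataFrame(df_content)
    .with_columns(
        pl.col("timestamp").shift(-1).alias("timestamp_next"),
        pl.col("temperature").shift(-1).alias("temperature_next"),
    )
    # The last row has no successor, so it does not start a segment.
    .drop_nulls(["timestamp_next", "temperature_next"])
)

# A rule mark draws one straight segment per row, from (x, y) to (x2, y2).
alt.Chart(df).mark_rule().encode(
    x=alt.X("timestamp", title="timestamp"),
    x2="timestamp_next",
    y=alt.Y("temperature", title="temperature"),
    y2="temperature_next",
    color=alt.Color(
        "indicator",
        scale=alt.Scale(domain=["inactive", "active"], range=["blue", "red"]),
    ),
).properties(width=1000)
```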
This colors each segment according to the value of indicator at the start of the interval.
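The `shift(-1)` step turns n points into n - 1 segments, each running from one row to the next and carrying the start row's indicator. As a way to check which segment gets which color, here is a pure-Python sketch of the same pairing (the `segments` helper is a hypothetical name, not part of Polars or Altair):

```python
from datetime import datetime, timedelta

timestamps = [datetime(2025, 10, 31, 23, 0) + timedelta(minutes=i) for i in range(20)]
temperature = [61.3, 61.39, 60.86, 61.95, 60.72, 60.72, 61.15, 59.97, 60.39, 60.46,
               61.88, 61.04, 60.67, 60.31, 60.64, 60.64, 60.99, 61.49, 61.13, 60.2]
indicator = (["inactive"] * 3 + ["active"] * 5 + ["inactive"] * 5
             + ["active"] * 5 + ["inactive"] * 2)

def segments(times, temps, labels):
    """Pair each row with the next one, like shift(-1) followed by drop_nulls."""
    # zip stops at the shortest input (times[1:]), so the last row starts no segment.
    return [
        (t0, t1, y0, y1, label)  # label is taken from the *start* point
        for t0, t1, y0, y1, label in zip(times, times[1:], temps, temps[1:], labels)
    ]

segs = segments(timestamps, temperature, indicator)
for t0, t1, y0, y1, label in segs[:4]:
    print(f"{t0:%H:%M}->{t1:%H:%M}  {y0} -> {y1}  {label}")
```

With this convention, the transition segment 23:02->23:03 is blue and 23:07->23:08 is red. If you would rather have the color switch at the end point of a transition, one option is to color by the next row's indicator instead (for example, a shifted copy of the `indicator` column).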