I'm new to polars and encountering a confusing error.
I'm trying to take several array columns and zip them into struct columns. When I try to do this with with_columns I encounter the error:
ValueError: can only call `.item()` if the dataframe is of shape (1, 1), or if explicit row/col values are provided; frame has shape (4, 2)
Here is code to reproduce this problem:
import polars as pl

df = pl.DataFrame(
    {
        "a": [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
        "b": [[1, 2, 3, 5], [1, 2, 3, 5], [1, 2, 3, 5], [1, 2, 3, 5]],
        "c": [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
        "d": ['a', 'b', 'c', 'd']
    }
)
df.with_columns([
    (df.explode('a', 'b')
       .select(
           "a",
           "b",
           "d",
           pl.struct('a', 'b').alias("test_1"))
       .group_by("d")
       .agg("test_1")),
    (df.explode('b', 'c')
       .select(
           "c",
           "b",
           "d",
           pl.struct('b', 'c').alias("test_2"))
       .group_by("d")
       .agg("test_2")),
    ]
)
With a single struct column (and no list in the method call) this works just as expected and yields the output:
a            b            c            d    test_1
list[i64]    list[i64]    list[i64]    str  list[struct[2]]
[1, 2, … 4]  [1, 2, … 5]  [1, 2, … 4]  "d"  [{1,1}, {2,2}, … {4,5}]
[1, 2, … 4]  [1, 2, … 5]  [1, 2, … 4]  "b"  [{1,1}, {2,2}, … {4,5}]
[1, 2, … 4]  [1, 2, … 5]  [1, 2, … 4]  "c"  [{1,1}, {2,2}, … {4,5}]
[1, 2, … 4]  [1, 2, … 5]  [1, 2, … 4]  "a"  [{1,1}, {2,2}, … {4,5}]
However, even putting this single operation into a list in the method call creates this error:
df.with_columns([
    (df.explode('a', 'b')
       .select(
           "a",
           "b",
           "d",
           pl.struct('a', 'b').alias("test_1"))
       .group_by("d")
       .agg("test_1")),]
)
I'm sure this is some sort of simple error, but I can't find any information on its cause or how to fix it.
Compute test_1 and test_2 as separate DataFrames.
Use join to combine test_1 and test_2 with the original DataFrame.
Avoid passing complete DataFrames to with_columns(); it is designed to take expressions (or Series), not whole DataFrames.
import polars as pl

df = pl.DataFrame(
    {
        "a": [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
        "b": [[1, 2, 3, 5], [1, 2, 3, 5], [1, 2, 3, 5], [1, 2, 3, 5]],
        "c": [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
        "d": ['a', 'b', 'c', 'd']
    }
)

test_1 = (
    df.explode("a", "b")
    .select(
        "a",
        "b",
        "d",
        pl.struct("a", "b").alias("test_1")
    )
    .group_by("d")
    .agg(pl.col("test_1"))
)

test_2 = (
    df.explode("b", "c")
    .select(
        "c",
        "b",
        "d",
        pl.struct("b", "c").alias("test_2")
    )
    .group_by("d")
    .agg(pl.col("test_2"))
)

result = df.join(test_1, on="d").join(test_2, on="d")
print(result)
The result is shown in the graph.