How can I split a string column into a list of lists?
Minimal example:
import polars as pl
df = pl.DataFrame({'test': ["A,B,C,1\nD,E,F,2\nG,H,I,3\nJ,K,L,4"]})
I tried the following, but I got stuck after the first split:
df = df.with_columns(pl.col('test').str.split('\n'))
My desired result is a list of lists inside the DataFrame, so that the nested list can readily be used by other columns:
result = pl.DataFrame({'test': [[["A","B","C",1], ["D","E","F",2], ["G","H","I",3], ["J","K","L",4]]]}, strict=False)
result = result.with_columns(
    get_data = pl.col('test').list[2].list[3].cast(pl.Int64) # Answer = 3
)
result.glimpse()
Rows: 1
Columns: 2
$ test <list[list[str]]> [['A', 'B', 'C', '1'], ['D', 'E', 'F', '2'], ['G', 'H', 'I', '3'], ['J', 'K', 'L', '4']]
$ get_data <i64> 3
df.with_columns(
    pl.col('test')
    .str.split('\n')
    .list.eval(
        pl.element()
        .str.split(",")
    )
)
Your example mixes strings and numbers in the same list, which Polars doesn't support, so the numbers have to stay as strings in the output.
You say you want to use these lists readily from other columns, so you might want to convert to a struct column and unnest it, which gives you new flat columns.