python, python-polars

Polars `read_csv()` to read from string and not from file


Is it possible to read from a string with pl.read_csv()? Something like this, if it worked:

content = """c1, c2
            A,1
            B,3
            C,2"""
pl.read_csv(content)

I know, of course, about this:

pl.DataFrame({"c1":["A", "B", "C"],"c2" :[1,3,2]})

But this is error-prone with long tables, because you have to count positions to find the value you want to modify.

I also know about dictionaries, but my real-life example has more than two columns.

Context: I used to fread() inline content with R data.table, and it was very useful, especially for converting a column with the help of a join instead of complicated ifelse() statements.

Thanks!


Solution

  • pl.read_csv() accepts a file-like (IO) object as its source parameter.

    source: str | Path | IO[str] | IO[bytes] | bytes

    So you can use io.StringIO:

    from io import StringIO
    
    import polars as pl
    
    content = """
    c1,c2
    A,1
    B,3
    C,2
    """
    
    data = StringIO(content)
    pl.read_csv(data)
    
    shape: (3, 2)
    ┌─────┬─────┐
    │ c1  ┆ c2  │
    │ --- ┆ --- │
    │ str ┆ i64 │
    ╞═════╪═════╡
    │ A   ┆ 1   │
    │ B   ┆ 3   │
    │ C   ┆ 2   │
    └─────┴─────┘
    

    As the signature above shows, you can also pass bytes as the source parameter. Use the str.encode() method for that:

    content = """
    c1,c2
    A,1
    B,3
    C,2
    """
    
    pl.read_csv(content.encode())
    
    shape: (3, 2)
    ┌─────┬─────┐
    │ c1  ┆ c2  │
    │ --- ┆ --- │
    │ str ┆ i64 │
    ╞═════╪═════╡
    │ A   ┆ 1   │
    │ B   ┆ 3   │
    │ C   ┆ 2   │
    └─────┴─────┘