Consider that I have a JSON data file named test_file.json
with the following content:
{"a": 1, "b": "hi", "c": 3}
{"a": 5, "b": null, "c": 7}
Here is how I read the file with the DataFrame API of DataFusion:
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let file_path = "datalayers/landing/test_file.json";
    let mut ctx = SessionContext::new();
    // Read newline-delimited JSON (one object per line) into a DataFrame
    let df = ctx.read_json(file_path, NdJsonReadOptions::default()).await?;
    // Print the rows to stdout
    df.show().await?;
    Ok(())
}
I would like to do the following operations:
1. Replace the null values in column b with an empty string "", either using fill na or a CASE WHEN statement.
2. Add a new column computed as col("a") + col("b").
I have tried going through the API documentation but could not find any function like with_column,
which Spark has for adding a new column, or any way to impute the null values.
To add two columns together, I can use the column expression col("a").add(col("c")).alias("d"),
but I was curious whether something like with_column is available
for adding a new column.
DataFusion's DataFrame does not currently have a with_column
method, but I think it would be good to add it. I filed an issue for this: https://github.com/apache/arrow-datafusion/issues/2844
Until that is added, you could call DataFrame::select (https://docs.rs/datafusion/9.0.0/datafusion/dataframe/struct.DataFrame.html#method.select) to select the existing columns as well as the new expression:
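// `select` returns a new DataFrame wrapped in a Result, so keep the result;
// `+` builds the same expression as `.add` without needing std::ops::Add in scope
let df = df.select(vec![col("a"), col("b"), col("c"), (col("a") + col("c")).alias("d")])?;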