I loaded two datasets from CSV files and then merged them using leftjoin:
using CSV
using DataFrames
using CodecZstd
df1 = CSV.read(joinpath(root, "data", "raw", "df1.csv"), DataFrame)
df2 = CSV.read(joinpath(root, "data", "raw", "df2.csv"), DataFrame)
merged = leftjoin(df1, df2, on=:id)
Now I want to write the merged data frame to disk as a Zstandard-compressed (.zst) file.
I managed to do this by first writing a .csv file, reading it back, and then writing it again as .zst.
Is there a way to convert a DataFrame directly into an array of bytes that I can save to disk?
There are several options. The one built into Julia is to serialize the data frame. You can do this with the Serialization
standard library. It offers two functions: serialize,
which writes an object to a stream, and deserialize,
which reads the object back from a stream. You can then use CodecZstd.jl to compress the serialized bytes and save them to disk.
Note that when you use serialization it is your responsibility to ensure that the Julia and package versions are consistent between the Julia session where you write data and where you read your data.
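The serialize-then-compress approach described above could look roughly like the following. This is a minimal sketch, not the only way to do it: the small example data frame, the temporary directory, and the file name merged.jls.zst are placeholders I made up for illustration, and it assumes the CodecZstd.jl and DataFrames.jl packages are installed.

```julia
using DataFrames
using Serialization
using CodecZstd

# Hypothetical stand-in for the merged data frame from the question
df = DataFrame(id = 1:3, value = ["a", "b", "c"])

# Serialize the data frame into an in-memory buffer to get raw bytes
io = IOBuffer()
serialize(io, df)
raw_bytes = take!(io)

# Compress the bytes with Zstandard and write them to disk
compressed = transcode(ZstdCompressor, raw_bytes)
path = joinpath(mktempdir(), "merged.jls.zst")  # placeholder location
write(path, compressed)

# Read the file back, decompress, and deserialize
restored_bytes = transcode(ZstdDecompressor, read(path))
restored = deserialize(IOBuffer(restored_bytes))
```

Here raw_bytes is the "array of bytes" asked about in the question, so no intermediate .csv file is needed. For large data frames it may be preferable to wrap the output file in a ZstdCompressorStream and serialize into that stream instead of holding everything in memory.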