
How to read an uploaded CSV file using Julia, Pluto.jl & PlutoUI.jl's FilePicker element


I am attempting to use the FilePicker element of the PlutoUI library

md""" Upload a Comma Separated Values (.csv) file to use: $(@bind user_csv FilePicker()) """

to allow the user to upload a CSV file for processing. Unfortunately, the data type isn't being detected and the file contents are represented as a one-dimensional Int64 array:

Dict("name"=>"mtg_binder.csv", "data"=>Int64[ 81 117 97 110 116 105 116 121 44 78 97 109 101 44 83 105 109 112 108 101 95 78 97 109 101 44 83 101 116 44 67 97 114 100 95 78 117 109 98 101 53 52 51 46 49 57 34 44 13 10], "type"=>"")

So, how do I handle/convert the Int64 array into something that I can push into a DataFrame?

Some things I've tried:

If I execute write(csv_path, user_csv["data"]), the CSV file saves successfully, but I cannot read it back with CSV.File(open(read, csv_path)) |> DataFrame without getting empty rows between each row with data (not a big deal) and ArgumentError: Symbol name may not contain \0 errors. I can use normalizenames=true to get around the second problem, but the data then becomes scrambled eggs and unusable.

I've also tried using StringEncodings to encode the data as UTF-8 and UTF-16, but no luck; it's still scrambled eggs.

Help?


Solution

  • Would this work in your use case?

    UInt8.(user_csv["data"]) |> IOBuffer |> CSV.File |> DataFrame
    

    This works by converting each Int64 to a byte (UInt8). The resulting byte vector can then be wrapped in an IOBuffer, which the CSV parser reads as if it were a file.

    The data you posted seems to have been truncated, so I couldn't test on it. But on made-up data (including UTF-8 characters), this seems to work on my system. Here is an example outside Pluto:

    julia> d = [207,128,44,32,98,10,49,44,32,50,10]
    11-element Array{Int64,1}:
     207
     128
      44
     ...
    
    julia> using CSV, DataFrames
    
    julia> UInt8.(d) |> IOBuffer |> CSV.File |> DataFrame
    1×2 DataFrame
    │ Row │ π     │  b    │
    │     │ Int64 │ Int64 │
    ├─────┼───────┼───────┤
    │ 1   │ 1     │ 2     │
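
As a rough sketch of why the write(csv_path, ...) attempt in the question probably misbehaved: write on an Int64 vector emits the raw 8-byte representation of each element, so every real byte ends up surrounded by zero bytes, which would be consistent with the \0 error. The sample bytes and the helper name upload_to_dataframe below are made up for illustration; neither is part of PlutoUI or CSV.jl.

```julia
using CSV, DataFrames

# Made-up stand-in for the FilePicker value: the bytes of "a,b\r\n1,2\r\n" held as Int64.
user_csv = Dict("name" => "example.csv",
                "data" => Int64[97, 44, 98, 13, 10, 49, 44, 50, 13, 10],
                "type" => "")

# write() on an Int64 vector emits 8 raw bytes per element,
# so most of the output consists of zero bytes.
raw = IOBuffer()
write(raw, user_csv["data"])
raw_bytes = take!(raw)
println(length(raw_bytes))         # total bytes written (8 per element)
println(count(iszero, raw_bytes))  # how many of those are zero bytes

# Hypothetical helper: narrow each value to a single byte, then parse from memory.
# UInt8.(...) throws an InexactError if any value falls outside 0:255.
upload_to_dataframe(upload) = UInt8.(upload["data"]) |> IOBuffer |> CSV.File |> DataFrame

df = upload_to_dataframe(user_csv)
println(df)
```

In a Pluto notebook, a cell such as df = upload_to_dataframe(user_csv) should re-run whenever a new file is picked, since it depends on the bound user_csv variable. No temporary file is needed with this approach.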