
How do I convert the type of a DataFrame column in Julia?


I have some data that is badly formatted. Specifically, I have numeric columns in which some elements contain spurious text (e.g. "8 meters" instead of "8"). I want to use readtable to read in the data, make the necessary fixes, and then convert the column to Float64 so that it behaves correctly (comparisons, etc.).

There seems to have been a macro called @transform that would do the conversion, but it has been removed. How do I do this now?

My best solution at the moment is to clean up the data, write it out as a CSV file, and then re-read it using readtable with eltypes specified. But that is horrible.

What else can I do?


Solution

  • There is no need to go through a CSV file. You can update the DataFrame column directly.

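    using DataFrames
    # Let's make up some data
    df = DataFrame(A=rand(5), B=["8", "9 meters", "4.5", "3m", "12.0"])
    
    # Then write a function to clean the data.
    # (Written for current Julia/DataFrames.jl: `missing` replaces the old `NA`,
    # and `parse(Float64, s)` replaces the old `float64(s)`.)
    function fixdata(arr)
        result = Vector{Union{Missing,Float64}}(missing, length(arr))
        reg = r"[0-9]+\.?[0-9]*"   # first run of digits, optionally with a decimal part
        for i in eachindex(arr)
            m = match(reg, arr[i])
            if m === nothing
                result[i] = missing   # no number found in this element
            else
                result[i] = parse(Float64, m.match)
            end
        end
        result
    end
    
    # Then apply the function to the column to clean the data
    # and replace the column with the cleaned data.
    df[!, :B] = fixdata(df[:, :B])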
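
Since the goal is for the cleaned column to behave correctly in comparisons, here is a minimal sketch of what that looks like on a plain vector in current Julia versions, where `missing` plays the role of the old `NA`. The helper name `parse_leading_number` and the extra `"n/a"` entry are made up for illustration; this is one possible way to do it, not the only one.

```julia
# Hypothetical helper: return the first number found in a string, or `missing`.
function parse_leading_number(s::AbstractString)
    m = match(r"[0-9]+\.?[0-9]*", s)
    return m === nothing ? missing : parse(Float64, m.match)
end

raw = ["8", "9 meters", "4.5", "3m", "12.0", "n/a"]

# Broadcasting applies the helper to every element;
# the result has element type Union{Missing, Float64}.
cleaned = parse_leading_number.(raw)

# Comparing against `missing` yields `missing`, so coalesce it to `false`
# before using the result as a filter mask.
mask = coalesce.(cleaned .> 5, false)
big = cleaned[mask]

# skipmissing leaves the unparseable entries out of reductions.
total = sum(skipmissing(cleaned))
```

The same broadcasted call should work on a DataFrame column in place of `raw`. Entries with no digits at all come back as `missing` rather than raising an error, so you can inspect them afterwards with `ismissing.(cleaned)`.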