julia, pass-by-reference, dataframes.jl

DataFrame Column Subset by Reference


I have a DataFrame x with two UInt32 columns. The following call searches a subset of the second column for a value taken from the first column:

findfirst(==(x[y, 1]), x[1:(y - 1), 2])

y is a scalar, e.g. 10. If I understand https://dataframes.juliadata.org/stable/lib/indexing correctly, x[1:(y - 1), 2] copies that region of the DataFrame. How can I have findfirst search that part of x in place, i.e. by reference?


Solution

  • You can use @view:

    findfirst(==(x[y, 1]), @view x[1:(y - 1), 2])
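
    As a small self-contained sketch (the column names a and b and the sample values are made up for illustration, they are not from the question), the copying call and the @view call should return the same position:

    ```julia
    using DataFrames

    # Hypothetical sample data: two UInt32 columns, as described in the question
    x = DataFrame(a = UInt32[4, 8, 15, 16, 23], b = UInt32[16, 23, 42, 8, 4])
    y = 4

    needle = x[y, 1]  # value in row y of the first column

    # Copies rows 1:(y - 1) of column 2 into a new Vector before searching
    hit_copy = findfirst(==(needle), x[1:(y - 1), 2])

    # Searches the same rows through a view, without copying them
    hit_view = findfirst(==(needle), @view x[1:(y - 1), 2])

    println(hit_copy == hit_view)
    ```

    findfirst returns nothing when there is no match, and the index it returns is relative to the view. Because the range here starts at row 1, that index is also the row number in x.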
    

    Explanation

    Suppose you have df = DataFrame(rand(1:99,5,2), :auto). Then plain indexing copies the selected rows (as in your question), while @view yields a reference to the underlying data:

    julia> df[1:3, 2]
    3-element Vector{Int64}:
     62
     88
     11
    
    julia> @view df[1:3, 2]
    3-element view(::Vector{Int64}, 1:3) with eltype Int64:
     62
     88
     11
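
    One way to convince yourself of the difference is to write through each result and check the DataFrame afterwards. This is only a sketch with made-up data (the column names a and b are assumptions for the example):

    ```julia
    using DataFrames

    # Hypothetical sample data
    df = DataFrame(a = UInt32[5, 7, 9, 7], b = UInt32[7, 1, 7, 3])

    col_copy = df[1:3, 2]        # indexing with a row range allocates a new Vector
    col_view = @view df[1:3, 2]  # SubArray backed by the column stored in df

    col_copy[1] = 100  # changes only the copy
    col_view[2] = 200  # writes through to df

    println(df[1, 2] == 7)    # row 1 is unaffected by the write to the copy
    println(df[2, 2] == 200)  # row 2 reflects the write through the view
    ```

    The parent of the view is the column vector held by df, which is why the write is visible there.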
    

    Also note that if you want a reference to the entire column, you can use df[!, 2], which returns the stored column without copying, whereas df[:, 2] copies it. Compare the number of allocated bytes for the two calls below:

    julia> @allocated df[!, 2]
    0
    
    julia> @allocated df[:, 2]
    96
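
    Since exact byte counts can vary with the Julia version and the column length, an identity check is another way to see the same thing. A minimal sketch, again with made-up data and column names:

    ```julia
    using DataFrames

    # Hypothetical sample data
    df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

    same_object = df[!, 2] === df[!, 2]  # ! returns the stored column itself
    is_copy     = df[:, 2] !== df[!, 2]  # : returns a freshly allocated copy
    same_values = df[:, 2] == df[!, 2]   # the copy holds equal values

    println((same_object, is_copy, same_values))
    ```

    The property syntax df.b behaves like df[!, :b], so it also returns the stored column rather than a copy.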