
Julia: data slicing works outside a for loop but not inside it


I am trying to use a for loop to extract the data for one stock symbol at a time from a DataFrame that holds all stocks. The filtering code works on its own, but the same code fails inside the for loop.

Below is the code.

Working:

df = fh_5[fh_5.symbol .== "GOOG", ["date","close"]]

Not working:

for s in unique!(fh_5.symbol)
    df = fh_5[fh_5.symbol .== s, ["date","close"]]
    date_range = leftjoin(date_range, df, on =:"dates" => :"date")
end

Error:

ERROR: BoundsError: attempt to access 6852038×8 DataFrame at index [Bool[1, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], ["date", "close"]]
Stacktrace:
 [1] getindex(df::DataFrame, row_inds::BitVector, col_inds::Vector{String})
   @ DataFrames ~\.julia\packages\DataFrames\3mEXm\src\dataframe\dataframe.jl:448
 [2] top-level scope
   @ .\REPL[349]:2

Also, after I run the for loop, the code that worked outside the loop stops working too, and I have to re-import the CSV file before it works again (it only works if I run it before the loop). Am I changing the base dataset fh_5 while the for loop runs?

To make this reproducible, here is the data for the example and the code I used:

using DataFrames
using DataFramesMeta
using CSV
using Dates
using Query


fh_5 = CSV.read("D:\\Julia_Dataframe\\JuliaCon2020-DataFrames-Tutorial\\fh_5yrs.csv", DataFrame)

min_date = minimum(fh_5[:, "date"])
max_date = maximum(fh_5[:, "date"])
date_seq = string.(collect(Dates.Date(min_date) : Dates.Day(1) : Dates.Date(max_date)))
date_range = df = DataFrame(dates = date_seq)
date_range.dates = Date.(date_range.dates, "yyyy-mm-dd")

for s in unique(fh_5.symbol)
    df = fh_5[fh_5.symbol .== s, ["date","close"]]
    date_range = leftjoin(date_range, df, on =:"dates" => :"date")
    rename!(date_range, Dict(:close => s))
end

Solution

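  • Don't use unique! here, because it mutates the fh_5.symbol column in place. fh_5.symbol is the vector stored inside the data frame, not a copy, so unique! removes the duplicate values from it and shrinks that one column to the number of distinct symbols, while the other columns keep their original length. The mask fh_5.symbol .== s built inside the loop then has one element per distinct symbol instead of one per row, so indexing fh_5 with it throws the BoundsError shown in the question (already on the first iteration, because unique! runs before the loop starts). The damaged column stays damaged after the loop, which is why the code that used to work fails until the CSV file is read again. Use unique instead, which returns a new vector and leaves fh_5 untouched. Something like this: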

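    for s in unique(fh_5.symbol)          # unique returns a new vector; fh_5 is not modified
        df = fh_5[fh_5.symbol .== s, ["date", "close"]]
        date_range = leftjoin(date_range, df, on = :dates => :date)
        rename!(date_range, "close" => s)  # otherwise the next join sees two "close" columns
    end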
    

    In Julia, by convention, functions whose names end in ! mutate (some of) their arguments; the variant without ! (here unique) returns a new result instead.
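The failure and the fix can be reproduced on a small table. This is a minimal sketch assuming DataFrames.jl 1.x; the data frame fh, its made-up prices, and the helper widen_closes are hypothetical and not part of the original question or answer. The first part calls unique! on a copy of the toy table to trigger the same BoundsError; the second wraps the corrected loop in a function.

```julia
using DataFrames, Dates

# Hypothetical toy stand-in for fh_5: made-up prices for two symbols on two days.
fh = DataFrame(date   = [Date(2021, 1, 4), Date(2021, 1, 5), Date(2021, 1, 4), Date(2021, 1, 5)],
               symbol = ["GOOG", "GOOG", "AAPL", "AAPL"],
               close  = [100.0, 101.0, 50.0, 51.0])

# fh.symbol is the column stored in the data frame (not a copy), so unique!
# shrinks it in place. Work on a copy so that fh itself stays intact.
bad = copy(fh)
unique!(bad.symbol)
@assert length(bad.symbol) == 2 && length(bad.date) == 4   # columns now disagree

# The mask has 2 elements but the data frame still has 4 rows,
# so indexing throws BoundsError, as in the question.
err = try
    bad[bad.symbol .== "GOOG", ["date", "close"]]
    nothing
catch e
    e
end
@assert err isa BoundsError

# Hypothetical helper wrapping the corrected, non-mutating loop.
function widen_closes(prices::DataFrame, dates::AbstractVector{Date})
    out = DataFrame(dates = dates)
    for s in unique(prices.symbol)                 # new vector; prices is unchanged
        df = prices[prices.symbol .== s, ["date", "close"]]
        out = leftjoin(out, df, on = :dates => :date)
        rename!(out, "close" => s)                 # keep column names unique for the next join
    end
    return sort!(out, :dates)                      # leftjoin does not promise to keep row order
end

wide = widen_closes(fh, collect(Date(2021, 1, 3):Day(1):Date(2021, 1, 5)))
@assert length(fh.symbol) == nrow(fh)              # fh was not mutated
```

Wrapping the loop in a function is a design choice rather than a requirement: at the REPL the top-level loop from the answer works as written, but in a script file the assignment inside a top-level for loop creates a new local variable (with an ambiguity warning) instead of updating the global date_range, so reading date_range on the right-hand side fails with UndefVarError. Inside a function, out is an ordinary local and the issue does not arise.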