I am new to R and am trying to rewrite some R code in SparkR. One of the operations on a data.table named costTbl (which has 5 other columns) is:
costTbl[,cost:=na.locf(cost,na.rm=FALSE),by=product_id]
costTbl[,cost:=na.locf(cost,na.rm=FALSE, fromLast=TRUE),by=product_id]
I am unable to find an equivalent operation in SparkR. I thought gapply could be used by grouping the DataFrame on product_id and performing this operation on each group, but I have not been able to make the code work.
Is gapply the right approach? Is there some other way to achieve this?
I was finally able to perform LOCF (last observation carried forward) with SparkR UDFs, reusing the existing native R code.
gapply works for this use case: group the DataFrame on the product_id column and apply the R function to each group.
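For illustration, here is a minimal sketch of how the two na.locf calls from the question might be wrapped for gapply. The names fill_cost_group, fill_cost_by_product and costSdf are hypothetical and not from the original code. It assumes the zoo package is installed on every worker, that the function returns the same columns it receives (so the input schema can be reused as the output schema), and that the rows of each group arrive in the intended order. Spark does not guarantee row order within a group, so if the data has an ordering column, it is probably safer to sort by it inside the function first.

```r
# Per-group fill in plain R. gapply calls this once per product_id;
# `pdf` is an ordinary R data.frame holding all rows of that group.
fill_cost_group <- function(key, pdf) {
  # Forward fill, keeping leading NAs in place (na.rm = FALSE).
  pdf$cost <- zoo::na.locf(pdf$cost, na.rm = FALSE)
  # Backward fill to cover any NAs left at the start of the group.
  pdf$cost <- zoo::na.locf(pdf$cost, na.rm = FALSE, fromLast = TRUE)
  pdf
}

# Hypothetical wrapper: group a SparkDataFrame on product_id and apply the fill.
# The output schema is assumed to be identical to the input schema.
fill_cost_by_product <- function(sdf) {
  SparkR::gapply(sdf, "product_id", fill_cost_group, SparkR::schema(sdf))
}

# Usage sketch (requires a running Spark session; costSdf is a hypothetical name):
#   library(SparkR)
#   sparkR.session()
#   costSdf <- createDataFrame(as.data.frame(costTbl))
#   filled  <- fill_cost_by_product(costSdf)
#   head(filled)
```

Because fill_cost_group is ordinary R, it can be checked on a small local data.frame before running it on the cluster.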
I have shared my findings here: https://shbhmrzd.medium.com/stl-and-holt-from-r-to-sparkr-1815bacfe1cc