Tags: r, dataframe, for-loop, mapply

How to get a dataframe output using mapply


I have a for loop that I'm trying to convert to mapply, because I have read that it is faster than for (the loop takes about 2 minutes).

The loop subsets the data by each unique value of the "OrdenFab" column and, within each subset, keeps only the first row for each distinct value of the "Valor" column.

It then appends each filtered subset to a new data frame, so the final result has no repeated "Valor" values within any "OrdenFab".

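library(dplyr)

# One pass per unique OrdenFab: keep the first row for each distinct Valor
i <- unique(datapesomolde$OrdenFab)
datapesomoldefiltered <- data.frame()
for (j in i) {
  datapesomoldetemp <- datapesomolde %>%
    filter(OrdenFab == j) %>%
    filter(!duplicated(Valor))
  # Append this group's rows to the accumulated result
  datapesomoldefiltered <- rbind(datapesomoldefiltered, datapesomoldetemp)
}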

The original data frame looks like this (first 20 of 20626 rows):

    > datapesomolde
    PartNumber  OrdenFab     Valor      Fecha_Registro LimInf LimSup Nominal
1     11012501 226549204  14.50000 2022-04-25 07:18:00  12.65  14.71   13.68
2     11012501 226549204  14.50000 2022-04-25 07:18:00  12.65  14.71   13.68
3     11013610 226548648  47.30000 2022-04-25 05:52:00  42.38  49.26   45.82
4     11013047 226548234  15.20000 2022-04-23 02:47:00  14.43  16.77   15.60
5     11013047 226548234  15.20000 2022-04-23 02:47:00  14.43  16.77   15.60
6     11013047 226548234  15.20000 2022-04-23 02:48:00  14.43  16.77   15.60
7     11013047 226548234  15.20000 2022-04-23 02:48:00  14.43  16.77   15.60
8     11013052 226548332  16.30000 2022-04-23 01:49:00  15.63  18.17   16.90
9     11013052 226548332  16.30000 2022-04-23 01:49:00  15.63  18.17   16.90
10    11013052 226548332  16.30000 2022-04-23 01:49:00  15.63  18.17   16.90
11    11013052 226548332  16.30000 2022-04-23 01:49:00  15.63  18.17   16.90
12    11012501 226548204  14.70000 2022-04-23 01:44:00  12.65  14.71   13.68
13    11012501 226548204  14.70000 2022-04-23 01:44:00  12.65  14.71   13.68
14    11012501 226548200  14.55000 2022-04-23 01:43:00  12.65  14.71   13.68
15    11012501 226548200  14.55000 2022-04-23 01:43:00  12.65  14.71   13.68
16    11012501 226548201  14.65000 2022-04-23 01:42:00  12.65  14.71   13.68
17    11012501 226548201  14.65000 2022-04-23 01:42:00  12.65  14.71   13.68
18    11013943 226548154 134.00000 2022-04-23 00:07:00 131.76 153.13  142.44
19    11013943 226547066 144.00000 2022-04-22 23:31:00 131.76 153.13  142.44
20    11013050 226547200  15.10000 2022-04-22 23:27:00  14.34  16.66   15.50

The filtered result looks like this (first 10 rows):

    > datapesomoldefiltered
    PartNumber  OrdenFab     Valor      Fecha_Registro LimInf LimSup Nominal
1     11012501 226549204  14.50000 2022-04-25 07:18:00  12.65  14.71   13.68
2     11013610 226548648  47.30000 2022-04-25 05:52:00  42.38  49.26   45.82
3     11013047 226548234  15.20000 2022-04-23 02:47:00  14.43  16.77   15.60
4     11013052 226548332  16.30000 2022-04-23 01:49:00  15.63  18.17   16.90
5     11012501 226548204  14.70000 2022-04-23 01:44:00  12.65  14.71   13.68
6     11012501 226548200  14.55000 2022-04-23 01:43:00  12.65  14.71   13.68
7     11012501 226548201  14.65000 2022-04-23 01:42:00  12.65  14.71   13.68
8     11013943 226548154 134.00000 2022-04-23 00:07:00 131.76 153.13  142.44
9     11013943 226547066 144.00000 2022-04-22 23:31:00 131.76 153.13  142.44
10    11013050 226547200  15.10000 2022-04-22 23:27:00  14.34  16.66   15.50

I'm struggling to convert it to mapply: I get a matrix, not a data frame.

I have tried this:

i <- unique(datapesomolde$OrdenFab)
datapesomoldefiltered <- data.frame()
limpiarof <- function(i) {
  subset <- filter(datapesomolde, OrdenFab == i)
  datapesomoldetemp <- filter(subset, !duplicated(subset$Valor))
  return(datapesomoldefiltered <- rbind(datapesomoldefiltered, datapesomoldetemp))
}
datapesomoldefiltered <- mapply(limpiarof, i)

With this attempt I get a 2.2 GB matrix that just holds the values of all the columns for each unique value of the "OrdenFab" column.

[Image: result of mapply]

Can you help me please? Thanks in advance.


Solution

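  • Rather than converting the loop to mapply, I would suggest a more declarative approach, e.g. with the tidyverse: group by OrdenFab and keep the first row for each distinct Valor.

    This should be much faster and clearer than the loop, which filters the full data frame once per OrdenFab and grows the result with rbind() on every iteration: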

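    library(tidyverse)  # only dplyr is actually needed here

    # Within each OrdenFab, keep the first row for each distinct Valor;
    # .keep_all = TRUE retains the remaining columns of that row.
    # The native pipe |> requires R 4.1 or later.
    datapesomoldefiltered <-
      datapesomolde |>
      group_by(OrdenFab) |>
      distinct(Valor, .keep_all = TRUE) |>
      ungroup()

    datapesomoldefiltered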
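
As for why the mapply attempt returned a matrix: mapply() simplifies its result by default (SIMPLIFY = TRUE), and a data frame is a list of columns, so the per-group data frames get collapsed into a matrix of list cells. The base R sketch below reproduces this on a small made-up data frame and shows two ways to get a data frame back. The names toy, limpiar_of and ofs are hypothetical, introduced only for this illustration, and the values are invented.

```r
# Toy data standing in for datapesomolde (values are made up for illustration)
toy <- data.frame(
  OrdenFab = c(1, 1, 1, 2, 2, 3),
  Valor    = c(14.5, 14.5, 14.7, 14.5, 14.5, 47.3)
)

# Hypothetical helper: first row for each distinct Valor within one OrdenFab
limpiar_of <- function(of) {
  sub <- toy[toy$OrdenFab == of, ]
  sub[!duplicated(sub$Valor), ]
}

ofs <- unique(toy$OrdenFab)

# mapply() simplifies by default, so the per-group data frames collapse into
# a matrix of list cells (one row per column, one column per group)
as_matrix <- mapply(limpiar_of, ofs)
is.matrix(as_matrix)

# lapply() always returns a list; rbind the pieces into one data frame
por_lapply <- do.call(rbind, lapply(ofs, limpiar_of))

# Vectorised alternative without any loop: drop every row whose
# (OrdenFab, Valor) pair has already appeared earlier in the data
por_duplicated <- toy[!duplicated(toy[c("OrdenFab", "Valor")]), ]
```

Passing SIMPLIFY = FALSE to mapply() would also return a list that can be combined with do.call(rbind, ...), but the duplicated() one-liner avoids the per-group subsetting entirely, which is likely where most of the loop's time goes.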