rautomationprompt

Automating a Script for Easier / Future Use


New user to R here. I have just completed a quality control script for some 2023 data I have for my job. I have attached the script below. What I want to do is make this a more automated process so someone in the future can easily use this script. The only thing that should be changing is the year selected. Ideally (not sure if this is possible / relatively easy) what I would want is to open the script, have a prompt asking for the year they are wanting to do QC on (i.e. enter for example 2021, 2019, etc.) and then run the script, plugging in the year into the appropriate section of the code (which I have put in asterix). I am new to R so not sure how easy this is to go about, but any help would be much much appreciated (along with some explanation if you don't mind).

# QC Check Function Building Blocks ---------------------------------------

#Bring in QC Results, QC Samples, and Results Tables and Filter Out Unneeded Columns
fn.importData(MDBPATH="C:/Users/h2edhmrs/Desktop/DASLER_TEST_COPY.mdb",
              TABLES=c("Analytes"))

"QC-Samples" <- `QC Samples` %>% select(LOC_ID, QC_SAMPLE, SAMPLE_DEPTH, QC_TYPE, ASSOC_SAMP)
"QC-Results" <- `QC Results` %>%  select(Loc_ID, QC_Sample, Units, Value, Text_Value, QC_Type, Storet_Num)
'Results_' <- `Results` %>% select(Loc_ID, Sample_Num, Units, Value, Storet_Num, Text_Value)

**#Only get 2023 records from QC Samples table
'QC-Samples' <- `QC-Samples` %>% filter(substr(QC_SAMPLE,1,4)=="2023")**

#Rename QC-Results QC_Sample column to match name in QC-Samples table
colnames(`QC-Results`)[2] <- 'QC_SAMPLE'

#Merge Results and Samples table to get full 2023 QC records
QC_Results <- merge(`QC-Results`, `QC-Samples`[ ,c("QC_SAMPLE", "ASSOC_SAMP")], by = "QC_SAMPLE")

#Now must get associated samples into table
#To do this, I will rename "Sample_Num" column in Results_ table to ASSOC_SAMP and then merge the two
colnames(Results_)[2] <- "ASSOC_SAMP"
QCandResults <- merge(QC_Results, Results_[,c("ASSOC_SAMP","Storet_Num", "Units", "Value", "Text_Value")], by = c("ASSOC_SAMP", "Storet_Num"))

#rename columns of QCandResults for clarity
colnames(QCandResults)[c(1,5,6,7,9,10,11)] <- c("Sample_Num", "Units_QC", "Value_QC", "Text_Value_QC", "Units_Results", "Value_Results", "Text_Value_Results")

#matching Storet_num to display the parameter name in the QCandResults table
colnames(Analytes)[1] <- "Storet_Num"
QCandResults <- (merge(Analytes[,c("Storet_Num", "anl_short")], QCandResults, by = "Storet_Num"))
QCandResults <- QCandResults[-1]

#Making only dups and splits in the table
QCandResults <- subset(QCandResults, QC_Type ==  c("DUP", "SPL"))


# Relative Percent Difference Function ------------------------------------

#Developing Relative Percent Difference Function
RPD = \(x1, x2) {
  x1[is.na(x1)] = 0L; x2[is.na(x2)] = 0L
  abs((x1 - x2) / ((x1 + x2) * 0.5)) * 100
}

QCandResults <- transform(QCandResults, RPD = RPD(Value_Results, Value_QC))

#Creating pass column and then creating stat for how many QC failed
QCandResults <- transform(QCandResults, Pass = if_else(RPD > 20, "N", ""))

(sum(QCandResults$Pass == "N", na.rm=T) / nrow(QCandResults))

#export to xl
write.xlsx(QCandResults, "QC2023.xlsx")

Solution

  • Lets take this example from the gapminder dataset. I am just creating a string to extract the year from since that seems to be what you are doing in your code. What you can do is just all your existing code in a function. The only thing that you would need to modify is what you are passing to filter. I added {{}} to sub where your variable name normally goes. The {{}} tell dplyr to evaluate whatever column you provide as an argument to year_col instead of the value of year_col. paste is going literally just copy and pasting the argument provided to year select.

    library(gapminder)
    library(dplyr)
    
    make_string = gapminder |>
      mutate(year_string = paste(country, "in", year))
    
    
    filter_year = \(data, year_col, year_to_select){
       
      filter_data = data |>
        filter(sub(".*([0-9]{4}).*", '\\1' , {{year_col}}) == paste(year_to_select)
    
      filter_data
    }
    
    examp = filter_year(make_string, year_col = year_string, year_to_select = '1952')
      
    check = make_string |>
      filter(year == 1952)
    
    identical(examp, check)
    #> [1] TRUE
    

    So in your example code it would look something like this

    
    cleaning_fun = \(year_select, year_col){
      "QC-Samples" <- `QC Samples` %>% 
    select(LOC_ID, QC_SAMPLE, SAMPLE_DEPTH, QC_TYPE, ASSOC_SAMP)
    "QC-Results" <- `QC Results` %>%  select(Loc_ID, QC_Sample, Units, Value, Text_Value, QC_Type, Storet_Num)
    'Results_' <- `Results` %>% 
    select(Loc_ID, Sample_Num, Units, Value, Storet_Num, Text_Value)
    
    
    'QC-Samples' <- `QC-Samples` %>% 
       filter(substr({{year_col}},1,4)== paste(year_select))
    
    ## the rest of your code goes below here 
    
    QCandResults
    }
    
    

    Created on 2024-07-31 with reprex v2.1.1