In RStudio (or elsewhere), is it possible to manually change entries in a data frame and then get the code corresponding to those changes, for documentation?
I know the function edit, which nicely opens a window for editing. But it does not generate the corresponding code.
The reason is: I have a rather large data set that needs a bunch of manual edits. It would be great if I could document them directly in the code rather than change the data in Excel or wherever. But writing them all down is quite a lot of work, and I thought there must be some functionality that generates this code automatically.
I agree quite strongly with Roland's comment, but if this is something you have to do, then documenting those changes is vital. You could do this by comparing the edited data to the original data and then producing the code that would get you from the original to the new data. Here's an example with the mtcars data. First, make sure you've got an observation number in the data; I've called it obs below. You'll need to modify the code if you call it something else. In the data, I created some character strings and factors to show how it would work with different kinds of variables.
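library(dplyr)
data(mtcars)
# Add an observation id and a model column, and turn am into a labelled factor
mtcars <- mtcars %>%
  mutate(obs = row_number(),
         am = factor(am, levels = c(0, 1), labels = c("Auto", "Man")),
         model = rownames(mtcars), .before = "mpg")
rownames(mtcars) <- NULL
# Keep an untouched copy to compare against after editing
orig <- mtcars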
I then edited the data interactively and saved the result as new. The comments in the code chunk record the changes I made.
new <- edit(mtcars)
# changed model for obs 1 to "Mazda RX4 Cpe"
# changed mpg for obs 20 to 20
# changed cyl for obs 17 to 4
# changed hp for obs 3 to 150
# changed am for obs 4 to "Man"
# changed am for obs 2 to "Auto"
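Because edit() needs an interactive session, the same six changes can also be applied in a script, which makes the example reproducible. This is only a sketch: it rebuilds the prepared data in base R so the block runs on its own, and new_scripted is a hypothetical name standing in for the result of the edit() call.

```r
# Rebuild the prepared data in base R so this block is self-contained
orig <- data.frame(
  obs   = seq_len(nrow(mtcars)),
  model = rownames(mtcars),
  mtcars,
  row.names = NULL
)
orig$am <- factor(orig$am, levels = c(0, 1), labels = c("Auto", "Man"))

# Scripted stand-in for the interactive edit() session (hypothetical name)
new_scripted <- orig
new_scripted$model[1] <- "Mazda RX4 Cpe"
new_scripted$mpg[20]  <- 20
new_scripted$cyl[17]  <- 4
new_scripted$hp[3]    <- 150
new_scripted$am[4]    <- "Man"
new_scripted$am[2]    <- "Auto"
```

Assigning a string to a factor element only works when that string is already one of the factor's levels, which is the case for "Auto" and "Man" here.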
Finally, you could loop through the variable names, identify where each variable differs between the original and the edited data, and build the code that assigns the new value to the appropriate observation in the original dataset. Once you've got all the recodes, you could write them out to a separate file with something like cat(recodes, file="recodes.r", sep="\n").
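recodes <- NULL
nms <- names(mtcars)
for (n in nms) {
  # Rows where this variable differs between the original and the edited data
  # (which() drops comparisons that involve NA)
  inds <- which(orig[[n]] != new[[n]])
  if (length(inds) > 0) {
    if (inherits(orig[[n]], "factor") || inherits(orig[[n]], "character")) {
      # Factor levels and strings are wrapped in quotes
      recodes <- c(recodes, sapply(inds, \(i) paste0("orig$", n, "[", i, "] <- '", new[[n]][i], "'")))
    } else {
      # Other values (e.g. numbers) are written as-is
      recodes <- c(recodes, sapply(inds, \(i) paste0("orig$", n, "[", i, "] <- ", new[[n]][i])))
    }
  }
}
recodes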
#> [1] "orig$model[1] <- 'Mazda RX4 Cpe'" "orig$mpg[20] <- 20"
#> [3] "orig$cyl[17] <- 4" "orig$hp[3] <- 150"
#> [5] "orig$am[2] <- 'Auto'" "orig$am[4] <- 'Man'"
Created on 2024-12-18 with reprex v2.1.0
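The loop above can be wrapped in a function and the generated file replayed to check that it really reproduces the edited data. The following is a minimal sketch, not a definitive implementation: make_recodes is a hypothetical helper name, the toy data are made up, and it assumes the columns are numeric, character, logical, or factor. Unlike the loop above, it uses deparse() for quoting, so strings containing apostrophes are handled, and it treats a change to or from NA as an edit.

```r
# Hypothetical helper: emit one assignment statement per changed cell
make_recodes <- function(orig, new, df_name = "orig") {
  stopifnot(identical(names(orig), names(new)), nrow(orig) == nrow(new))
  out <- character(0)
  for (n in names(orig)) {
    # Compare factors as character so differing level sets do not cause an error
    old_val <- if (is.factor(orig[[n]])) as.character(orig[[n]]) else orig[[n]]
    new_val <- if (is.factor(new[[n]])) as.character(new[[n]]) else new[[n]]
    # NA -> value and value -> NA count as changes; NA -> NA does not
    changed <- which((is.na(old_val) != is.na(new_val)) |
                       (!is.na(old_val) & !is.na(new_val) & old_val != new_val))
    for (i in changed) {
      # deparse() quotes and escapes strings and writes numbers as R literals
      out <- c(out, sprintf("%s$%s[%d] <- %s", df_name, n, i, deparse(new_val[i])))
    }
  }
  out
}

# Toy data (made-up values) with an NA and a string containing an apostrophe
orig <- data.frame(
  obs  = 1:3,
  name = c("a", "b", "c"),
  x    = c(1.5, NA, 3),
  grp  = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"))
)
new <- orig
new$name[2] <- "b's"
new$x[2]    <- 2
new$grp[3]  <- "hi"

recodes <- make_recodes(orig, new)

# Write the generated code to a file, then replay it on a copy of the original
recode_file <- tempfile(fileext = ".R")
cat(recodes, file = recode_file, sep = "\n")

replay_env <- new.env()
replay_env$orig <- orig
source(recode_file, local = replay_env)

# The replayed data should now match the manually edited data
stopifnot(isTRUE(all.equal(replay_env$orig, new)))
```

Replaying into a separate environment keeps the original data frame untouched, so the check can be rerun whenever the recode file changes. A factor value that is not among the original levels would still need the level added before the generated assignment can work.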