for a project I needed to merge an Excel and an SPSS file with R.
Not sure if this was my best idea. I got the merge done, however in the process
I had to use the attribute(col)$label
as the name in order to work.
My final merged data.frame has thus way to long column names including special characters (e.g. :
).
Here are the first few examples how the colnames of the current merged df look like
colnames(combined_retro)
[1] "Zeitpunkt zu dem das Interview begonnen hat (Europe/Berlin)"
[2] "Studiencode: [01]"
[3] "Format"
[4] "Geschlecht"
[5] "Alter (direkt): Ich bin ... Jahre"
[6] "Staatsangehörigkeit"
So I created another data.frame Naming_Back
in which I have two columns: Name Label
Naming_Back
Name Label
1 CASE Interview-Nummer (fortlaufend)
2 SERIAL Seriennummer (sofern verwendet)
3 REF Referenz (sofern im Link angegeben)
4 QUESTNNR Fragebogen, der im Interview verwendet wurde
5 MODE Interview-Modus
So now I would like to iterate over the Columns of my merged data.frame combined_retro
and check if the current name of the column (e.g. "Zeitpunkt zu dem das Interview begonnen hat (Europe/Berlin)" is available in the Label column of the second (Naming_Back
) data.frame.
If it is I would like to exchange the current column name with the one provided by the Name
column.
My Current approach is the following loop:
for(i in 1:ncol(retro)) { # for-loop over columns
new_name_buffer <- Naming_Back %>%
filter(Label == colnames(retro[ , i]))
if(!(is_empty(new_name_buffer$Name))){
colnames(retro[ , i]) <- new_name_buffer$Name
print(colnames(retro[ , i]))
print(new_name_buffer$Name)
}
}
Examples for the print commands from the loop
[1] "Geschlecht"
[1] "SD02"
[1] "Staatsangehörigkeit"
[1] "SD04"
[1] "Staatsangehörigkeit: Anders"
[1] "SD04_04"
So obviously the problem is this line colnames(retro[ , i]) <- new_name_buffer$Name
as it does not change the column name. Has anyone a quick idea how to fix it?
EDIT: Found a solution, by creating a character vector and stepwise filling it with either the abbreviated name when available or the old name if not
new_col_names <- c()
for(i in 1:ncol(retro)) { # for-loop over columns
new_name_buffer <- Naming_Back %>%
filter(Label == colnames(retro[ , i]))
if(!(is_empty(new_name_buffer$Name))){
colnames(retro[ , i]) <- new_name_buffer$Name
new_col_names <- c(new_col_names, new_name_buffer$Name)
}
else{
new_col_names <- c(new_col_names, colnames(retro[ , i]))
}
}
colnames(retro) <- new_col_names
EDIT 2: Just found an alternative solution to overwrite a column name while iterating over the columns with a for loop, you can just do names(dataframe)[index]
and then just assign a new value with <- "newColName"
for(i in 1:ncol(retro)) { # for-loop over columns
new_name_buffer <- Naming_Back %>%
filter(Label == colnames(retro[ , i]))
if(!(is_empty(new_name_buffer$Name))){
names(retro)[i] <- new_name_buffer$Name
print(colnames(retro[ , i]))
print(new_name_buffer$Name)
}
}
The problem (as correctly identified in the comments to the OP by @IRTFM) with the original code was, that the assignment of the new column name: colnames(retro[ , i]) <- new_name_buffer$Name
was not working properly, as colnames does not work on an atomic vector.
I found work-around to overwrite a column, name while iterating over the columns of a data.frame with a for loop. One can just do call names(dataframe)[index]
and then just assign a new column name with <- "newColName"
in my example the important line would thus look like this:
Assigning a new column name
names(retro)[i] <- new_name_buffer$Name
The Complete solution with the for loop
for(i in 1:ncol(retro)) { # for-loop over columns
# Check if a row with the label is available in the Naming_Back dataframe
new_name_buffer <- Naming_Back %>%
filter(Label == colnames(retro[ , I]))
# When a Name matching the label is found, replace the old name
if(!(is_empty(new_name_buffer$Name))){
names(retro)[i] <- new_name_buffer$Name
}
}