I'm trying to automatize the following function for all columns of two data frames.
I have the data.frames position_sc and position_hit. Both have 18 columns (each column represents a subject ("PD.."). The rows represent the sentence numbers (sentence comprehension test with four conditions: sc, sf, lc, lf, total number of sentences 200) In position_sc the numbers in the rows represent the sentence numbers which had the condition sc (e.g. 3, 7, 19 --> these sentences were presented under condition sc) In position_hit the rows represent for which sentence of the 200 the subjects had a hit (e.g. 3, 22, 78 --> the subject gave the correct answer for these sentences). Now I want to get the sentences the subjects had correct (hit) in condition sc. So the sentence number are the same when they had a sc-sentence correct.
I want to have the number of equal values in position_hit$x and position_sc$x. I was able to do that for each column (=subject) seperately, but I can't figure out, how to automatize it over all columns (compare position_sc$PD02 with position_hit$PD02, position_sc$PD03 with position_hit$PD03, position_sc$PD04 with position_hit$PD04, position_sc$PD05 with position_hit$PD03, aso.
That works well:
test_single = length(intersect(position_sc$PD03, position_hit$PD03))
OUTPUT
[1] 40
--> (40 numbers are equal --> 40 sentences in condition sc were a hit)
I tried something like that:
#Create a list containing both data.frames.
list_position_sc_hit = list(position_hit, position_sc)
#Try to automatize intersect() first:
test_all = sapply(list_positions_sc_hit, function(x) intersect(position_sc$x, position_hit$x))
But OUTPUT:
$hit
NULL
$sc
NULL
Here is an example how my script looks like: REPRODUCIBLE EXAMPLE
#CREATE 2 DATAFRAMES
#Dataframe 1
df.sc = data.frame (PD02 = c(1,3,7,9,12),
PD03 = c(8,9,10,11,13),
PD04 = c(1,2,3,24,36),
PD05 = c(1,9,20,23,26))
df.sc
#Dataframe 2
#Create list of 4
list.hit = list(PD02 = c(1,3,7,8,11,13,14,15,16,17),
PD03 = c(2,4,6,10,11,13,18,19,21),
PD04 = c(1,2,3,4,6),
PD05 = c(18,19,20,23,25,26))
#Convert list with different number of rows into data.frame
df.hit = as.data.frame(t(stri_list2matrix(list.hit)))
#Rows and columns are switched
df.hit = data.frame(t(df.hit))
#Name columns like in df.sc
names(df.hit)[1:4] = names(df.sc)[1:4]
#Convert all values in data.frame from chr to int
df.hit = sapply(df.hit, as.integer)
# After converting from chr to int the data.frame became an array. In order to reshape it I had to switch row and column twice (not sure why)
df.hit = data.frame(t(df.hit))
df.hit = data.frame(t(df.hit))
df.hit
#Do intersection and length counting for one subject (column)
matches.hit_sc_PD02 = length(intersect(df.sc$PD02, df.hit$PD02))
matches.hit_sc_PD02
OUTPUT: [3] --> three integers are the same in df.sc$PD02 and df.hit$PD02
#Combine both dataframes into list and rename dataframes
df.hit_sc = list(df.hit, df.sc)
names(df.hit_sc) = c("hit", "sc")
df.hit_sc
#Just try intersect() function for all variables (columns) with sapply(). I left length() out just to minimize complexity.
matches.hit_sc_all = sapply(df.hit_sc, function(x) intersect(df.sc$x, df.hit$x))
matches.hit_sc_all
OUTPUT: $hit NULL
$sc NULL
I would be happy if anyone could help me.
If I'm understanding the question correctly, you want to test how many values are shared between two columns of the same name from two different data frames. Here is a simple solution:
# assert that variable names are identical and in the same order
stopifnot(names(df.sc) == names(df.hit))
map2(df.sc, df.hit, ~ length(intersect(.x, .y)))
This will return a list with one element per variable name:
$PD02
[1] 3
$PD03
[1] 3
$PD04
[1] 3
$PD05
[1] 3
If you want to return a tibble (data.frame), which may be easier to work with, you can use map2_dfc()
instead:
map2_dfc(df.sc, df.hit, ~ length(intersect(.x, .y)))
which returns:
# A tibble: 1 x 4
PD02 PD03 PD04 PD05
<int> <int> <int> <int>
1 3 3 3 3
If this solves your problem I'd appreciate an upvote and accepting the solution. Thanks and hope this helps!
Edit: the map*
functions are in the purrr package from the tidyverse.