I have a data frame similar to final_df
below, which I obtain like this:
datadf <- data.frame(time=1:100, A=rnorm(100), B=rnorm(100), C=rnorm(100))
datadf <- as.data.frame(tidyr::pivot_longer(datadf, -time, names_to = "letter", values_to = "value"))
calldf <- data.frame(call=sample(LETTERS[1:3], 10, replace=T),
time=seq(from=5, by=9, length=10),
callnum=1:10, callnum_rev=10:1)
final_df <- dplyr::left_join(datadf, calldf, by="time")
And it looks like this:
> final_df
time letter value call callnum callnum_rev
1 1 A 0.008684912 <NA> NA NA
2 1 B -0.229217147 <NA> NA NA
3 1 C 1.675954809 <NA> NA NA
4 2 A 0.011244567 <NA> NA NA
5 2 B 0.593363139 <NA> NA NA
6 2 C -1.124759422 <NA> NA NA
7 3 A 0.271118167 <NA> NA NA
8 3 B 1.768275352 <NA> NA NA
9 3 C 0.287273528 <NA> NA NA
10 4 A 1.083097563 <NA> NA NA
11 4 B 1.990573158 <NA> NA NA
12 4 C -0.953178506 <NA> NA NA
13 5 A -0.076200582 A 1 10
14 5 B 1.159319799 A 1 10
15 5 C -0.484462797 A 1 10
16 6 A 0.506393522 <NA> NA NA
17 6 B -0.847847907 <NA> NA NA
18 6 C -0.429095586 <NA> NA NA
19 7 A -0.591182884 <NA> NA NA
20 7 B 0.996488879 <NA> NA NA
...
The only thing I want to do is to be able to trim the n
top rows and/or the n
bottom rows according to the variables callnum
(counts from the top) and/or callnum_rev
(counts from the bottom).
Some examples:
Say I pass a value for top=3. I would like my resulting data frame to start on the first row where callnum
is ==3 (row 67) and all the way to the end.
Say I pass a value for bottom=4. I would like my resulting data frame to start at the top, and go all the way until the last row where callnum_rev
is ==4 (row 177).
Say I pass both a value for top=3 and a value for bottom=4. I would like my resulting data frame to go from row 67 to row 177.
What actually would be the absolute best, is to leave an amount of NA rows above the top row, and an amount of NA rows below the bottom row, so that amount is half the NA rows between that value and the previous/next.
In this case that is simple because I am always leaving 8 NA rows between two call values (seq
by 9)... So in the example of a value for top=3 and a value for bottom=4, I would like my resulting data frame to go from row 55 (67-12, 12=8/23, as in 3 letters) to row 189 (177+12, 12=8/23, as in 3 letters).
This can get convoluted, because in my real data, the lines with callnum
values can have a variable number of NA rows in between.
Any help?
head_tail <- function(data, top=NULL, bottom=NULL) {
if(!is.null(top))
s1 <- min(which(data[,'callnum']==top))
else
s1 <- 1
if(!is.null(bottom))
s2 <- max(which(data[,'callnum_rev']==bottom))
else
s2 <- nrow(data)
return(data[seq(s1, s2),])
}
library(data.table)
final_dt <- setDT(final_df)
head_tail(final_dt, top=3)
time letter value call callnum callnum_rev
<num> <char> <num> <char> <int> <int>
1: 23 A -1.15224027 A 3 8
2: 23 B -0.22485125 A 3 8
3: 23 C -0.36342374 A 3 8
4: 24 A 0.65140424 <NA> NA NA
5: 24 B 0.07724828 <NA> NA NA
---
230: 99 B -1.13818328 <NA> NA NA
231: 99 C -0.00105447 <NA> NA NA
232: 100 A 0.41389275 <NA> NA NA
233: 100 B -2.48077559 <NA> NA NA
234: 100 C 1.32573054 <NA> NA NA
head_tail(final_dt, bottom=4)
time letter value call callnum callnum_rev
<num> <char> <num> <char> <int> <int>
1: 1 A -0.8785872 <NA> NA NA
2: 1 B 0.6806222 <NA> NA NA
3: 1 C 1.5115759 <NA> NA NA
4: 2 A -0.9589247 <NA> NA NA
5: 2 B -0.7238646 <NA> NA NA
---
173: 58 B -0.3192178 <NA> NA NA
174: 58 C 1.4266444 <NA> NA NA
175: 59 A -2.1541144 A 7 4
176: 59 B -0.4405924 A 7 4
177: 59 C 0.2283863 A 7 4
head_tail(final_dt, top=3, bottom=4)
time letter value call callnum callnum_rev
<num> <char> <num> <char> <int> <int>
1: 23 A -1.15224027 A 3 8
2: 23 B -0.22485125 A 3 8
3: 23 C -0.36342374 A 3 8
4: 24 A 0.65140424 <NA> NA NA
5: 24 B 0.07724828 <NA> NA NA
---
107: 58 B -0.31921775 <NA> NA NA
108: 58 C 1.42664445 <NA> NA NA
109: 59 A -2.15411443 A 7 4
110: 59 B -0.44059240 A 7 4
111: 59 C 0.22838630 A 7 4