I received a txt data file that looks like this:
# test A response c
1 x1 1
2 x2 0
.. ..
324 x324 5
# test P response 8
1 x1 2
2 x2 1
.. ..
501 x501 4
# test 7 response t
1 x1 2
2 x2 1
.. ..
936 x936 4
It has two long columns separated by tab many subheaders (e.g. "test A response c"). Notice under each subheader, the number of rows is variable. The total number of rows is about 10,000. I want to tidy it up and drop all the subheaders, to as following:
x y Test Response
x1 1 test A response c
x2 0 test A response c
.. ..
x324 5 test A response c
x1 2 test P response 8
x2 1 test P response 8
.. ..
x501 4 test P response 8
x1 2 test 7 response t
x2 1 test 7 response t
.. ..
x936 4 test 7 response t
What's the best way to do this?
One way to solve your problem
library(tidyr)
library(dplyr)
df |>
mutate(Test = replace(x, !grepl("test", x), NA),
Response = replace(y, !grepl("response", y), NA)) |>
fill(Test, Response) |>
filter(!grepl("test", x))
x y Test Response
1 x1 1 test A response c
2 x2 0 test A response c
3 x324 5 test A response c
4 x1 2 test P response 8
5 x2 1 test P response 8
6 x501 4 test P response 8
7 x1 2 test 7 response t
8 x2 1 test 7 response t
9 x936 4 test 7 response t
# read file
df = read.delim("file.txt", header=FALSE, , col.names=c("x", "y"))
# toy example:
df = read.delim(text="test A response c
x1 1
x2 0
x324 5
test P response 8
x1 2
x2 1
x501 4
test 7 response t
x1 2
x2 1
x936 4", header=FALSE, , col.names=c("x", "y"))
df
x y
1 test A response c
2 x1 1
3 x2 0
4 x324 5
5 test P response 8
6 x1 2
7 x2 1
8 x501 4
9 test 7 response t
10 x1 2
11 x2 1
12 x936 4