Desired Output
Need the following output
df2 <-
data.frame(
v1 = c(1100001, 1100002, 1100003, 1100004, 1100005)
, v2 = c("A R", "W R", "A K", "M", "A C")
, v3 = c("P", "G P", "G P", "P", "P")
, v4 = c(110, 161, 129, 132, "Absent")
, v5 = c(55, 80.5, 64.5, 66, "Absent")
)
df2
v1 v2 v3 v4 v5
1 1100001 A R P 110 55
2 1100002 W R G P 161 80.5
3 1100003 A K G P 129 64.5
4 1100004 M P 132 66
5 1100005 A C P Absent Absent
Here is my original data.frame
.
df1 <-
structure(list(value = c(
"1100001 A R P 110 55",
"1100002 W R G P 161 80.5",
"1100003 A K G P 129 64.5",
"1100004 M P 132 66",
"1100005 A C P Absent Absent"
)), row.names = c(NA, -5L), class = c("data.frame")
)
df1
value
1 1100001 A R P 110 55
2 1100002 W R G P 161 80.5
3 1100003 A K G P 129 64.5
4 1100004 M P 132 66
5 1100005 A C P Absent Absent
Used cSplit
function from splitstackshape
, but could not manage to get the required output. Any points.
library(splitstackshape)
cSplit(indt = df1, splitCols = "value", sep = " ")
value_1 value_2 value_3 value_4 value_5 value_6 value_7
1: 1100001 A R P 110 55 NA
2: 1100002 W R G P 161 80.5
3: 1100003 A K G P 129 64.5
4: 1100004 M P 132 66 NA NA
5: 1100005 A P Absent Absent NA NA
1.Assuming the rule for separation are: a) more than one space b) numbers follwing a letter and space are to be split
2.We make sure all soon to be varibles are separated by more than one space (using gsub
to replace one space between a letter and a number with two spaces)
3.Then we use tidyr::separate
to separate the string into variables using two or more spaces as deliminiator
library(dplyr)
library(tidyr)
df1 %>%
mutate(value = gsub("([A-z])( )([0-9])", "\\1 \\3", value)) %>%
separate(value, c(paste0("v", 1:5)),"[ ]{2,}")
Returns:
v1 v2 v3 v4 v5 1 1100001 A R P 110 55 2 1100002 W R G P 161 80.5 3 1100003 A K G P 129 64.5 4 1100004 M P 132 66 5 1100005 A P Absent Absent
Regarding the new constraint that was not included in the original example (only one space between a number and a letter):
Suggested solution:
We just repeat the command for adding an extra space with "reversed" regex. So that any single space between a number and a letter will get an extra space and will in then be separated by the separate
call
df1 %>%
mutate(value = gsub("([A-z])( )([0-9])", "\\1 \\3", value)) %>%
mutate(value = gsub("([0-9])( )([A-z])", "\\1 \\3", value)) %>%
separate(value, c(paste0("v", 1:5)),"[ ]{2,}")