r string tidyverse

How to collapse all strings from a dataframe into a single column?


I have a dataset in which each cell holds either a string of varying length or NA.

I need to produce a data frame with a single column that contains all of the strings. Essentially, I want to stack all valid string answers on top of each other and filter out the NAs in the process.

df.input <- data.frame(col1 = c("green potatoes","read avocados","white pepper","wise master"),
                      col2 = c("white seagull","black tank","creative pigeon","crazy socks"),
                      col3 = c("constant turmoil","ready fan",NA,"interesting collapse"),
                      col4 = c("awesome lettuce","jiggedy cabbage",NA,NA),
                      col5 = c("green potatoes","read avocados",NA,NA),
                      col6 = c("green potatoes",NA,NA,NA),
                      col7 = c(NA,NA,NA,NA)
                      )

The output data frame should look like this:

df.output <- data.frame(colOnlyOne = c("green potatoes","read avocados","white pepper","wise master",
                                       "white seagull","black tank","creative pigeon","crazy socks",
                                       "constant turmoil","ready fan","interesting collapse",
                                       "awesome lettuce","jiggedy cabbage",
                                       "and so on for all non-NA string values...")
                        )

How do I achieve that? Preferably using the tidyverse family of functions.

Thanks to Edward and ThomasIsCoding, I understood the problem and was able to solve it pragmatically in two ways of my own, listed here alongside Thomas' answer:

library(dplyr)  # provides %>%, select() and filter()
library(tidyr)  # provides pivot_longer()

stack(df.input) %>% na.omit() %>% select(values) %>% filter(values != "")
# and
pivot_longer(df.input, everything()) %>% na.omit() %>% select(value) %>% filter(value != "")
# and Thomas' answer is just great out of the box:
data.frame(colOnlyOne = na.omit(unlist(df.input, use.names = FALSE))) %>% filter(colOnlyOne != "")

P.S.: I also wish to say that I disagree with the decision to close my question. Even though the programmatic method employed in the answer is the same as in the referenced question, as a non-programmer I would find it hard to even see that the two questions are the same. One of the strengths of StackExchange is that it helps many non-programmers understand something useful, so closing this feels a bit "elitist".


Solution

  • > data.frame(colOnlyOne = na.omit(unlist(df.input, use.names = FALSE)))
                 colOnlyOne
    1        green potatoes
    2         read avocados
    3          white pepper
    4           wise master
    5         white seagull
    6            black tank
    7       creative pigeon
    8           crazy socks
    9      constant turmoil
    10            ready fan
    11 interesting collapse
    12      awesome lettuce
    13      jiggedy cabbage
    14       green potatoes
    15        read avocados
    16       green potatoes
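
Since the question asks for a tidyverse-style approach, here is a minimal sketch of the same idea wrapped in a small function. The helper name `stack_strings` is hypothetical and does not come from the thread. The `as.character()` step is an added assumption: it guards against factor columns, which `data.frame()` creates by default in R versions before 4.0. Like the asker's edit, it also drops empty strings.

```r
library(dplyr)

df.input <- data.frame(
  col1 = c("green potatoes", "read avocados", "white pepper", "wise master"),
  col2 = c("white seagull", "black tank", "creative pigeon", "crazy socks"),
  col3 = c("constant turmoil", "ready fan", NA, "interesting collapse"),
  col4 = c("awesome lettuce", "jiggedy cabbage", NA, NA),
  col5 = c("green potatoes", "read avocados", NA, NA),
  col6 = c("green potatoes", NA, NA, NA),
  col7 = c(NA, NA, NA, NA)
)

# Hypothetical helper (not from the thread): stack all columns of `df`
# into one character column, column by column, dropping NA and "".
stack_strings <- function(df) {
  # Convert each column to character first so that factor columns
  # contribute their labels rather than being merged as factors.
  values <- unlist(lapply(df, as.character), use.names = FALSE)
  tibble(colOnlyOne = values) %>%
    filter(!is.na(colOnlyOne), colOnlyOne != "")
}

df.output <- stack_strings(df.input)
```

The result keeps the column-by-column order shown in the solution above (all of col1, then col2, and so on), and duplicated strings such as "green potatoes" are kept. Add `distinct()` at the end of the pipe if only unique values are wanted.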