r tidyverse data-wrangling

How to lengthen data in one column separated by semicolons, and repeat elements from the other column?


I have received a dataset as a .csv file. The header and first three rows of the table look like this:

Species,Methods
Chlamydomonas pisiformis; Stichococcus bacillaris; Stichococcus subtilis; Pleurococcus vulgaris; Scenedesmus sp.; Kirchneriella sp.; Chlorococcum humicola; Protosiphon botryoides; Stigeoclonium tenue; Oedogonium sp.; Vaucheria sp.; Conjugales; Botrydium granulatum; Botrydiopsis sp.; ,Micropicking; Subculturing
Eutreptiella gymnastica; Cryptomonas sp; Rhodomonas sp ,Phototaxis
Symbiodinium sp.,Amoebic digestion

Here, the names of multiple species are stored in a single cell separated by semicolons. I want to modify this so that the final output.csv looks something like:

Species,Methods
Chlamydomonas pisiformis,Micropicking; Subculturing
 Stichococcus bacillaris,Micropicking; Subculturing
 Stichococcus subtilis,Micropicking; Subculturing
 Pleurococcus vulgaris,Micropicking; Subculturing
 Scenedesmus sp.,Micropicking; Subculturing
 Kirchneriella sp.,Micropicking; Subculturing
 Chlorococcum humicola,Micropicking; Subculturing
 Protosiphon botryoides,Micropicking; Subculturing
 Stigeoclonium tenue,Micropicking; Subculturing
 Oedogonium sp.,Micropicking; Subculturing
 Vaucheria sp.,Micropicking; Subculturing
 Conjugales,Micropicking; Subculturing
 Botrydium granulatum,Micropicking; Subculturing
 Botrydiopsis sp.,Micropicking; Subculturing
Eutreptiella gymnastica,Phototaxis
 Cryptomonas sp,Phototaxis
 Rhodomonas sp ,Phototaxis
Symbiodinium sp.,Amoebic digestion

The elements of the Species column that are separated by a semicolon (;) need to be in individual rows, while the corresponding Methods are simply repeated for each species in the adjacent column.

Is it possible to do this in R, preferably in a tidyverse compatible manner?


Solution

    library(tidyr)  # separate_longer_delim() requires tidyr >= 1.3.0
    
    df |>
      separate_longer_delim(Species, delim = "; ")
    #                     Species                    Methods
    # 1  Chlamydomonas pisiformis Micropicking; Subculturing
    # 2   Stichococcus bacillaris Micropicking; Subculturing
    # 3     Stichococcus subtilis Micropicking; Subculturing
    # 4     Pleurococcus vulgaris Micropicking; Subculturing
    # 5           Scenedesmus sp. Micropicking; Subculturing
    # 6         Kirchneriella sp. Micropicking; Subculturing
    # 7     Chlorococcum humicola Micropicking; Subculturing
    # 8    Protosiphon botryoides Micropicking; Subculturing
    # 9       Stigeoclonium tenue Micropicking; Subculturing
    # 10           Oedogonium sp. Micropicking; Subculturing
    # 11            Vaucheria sp. Micropicking; Subculturing
    # 12               Conjugales Micropicking; Subculturing
    # 13     Botrydium granulatum Micropicking; Subculturing
    # 14         Botrydiopsis sp. Micropicking; Subculturing
    # 15                          Micropicking; Subculturing
    # 16  Eutreptiella gymnastica                 Phototaxis
    # 17           Cryptomonas sp                 Phototaxis
    # 18           Rhodomonas sp                  Phototaxis
    # 19         Symbiodinium sp.          Amoebic digestion
    

    Data

    df <- read.csv(text = "Species,Methods
    Chlamydomonas pisiformis; Stichococcus bacillaris; Stichococcus subtilis; Pleurococcus vulgaris; Scenedesmus sp.; Kirchneriella sp.; Chlorococcum humicola; Protosiphon botryoides; Stigeoclonium tenue; Oedogonium sp.; Vaucheria sp.; Conjugales; Botrydium granulatum; Botrydiopsis sp.; ,Micropicking; Subculturing
    Eutreptiella gymnastica; Cryptomonas sp; Rhodomonas sp ,Phototaxis
    Symbiodinium sp.,Amoebic digestion", header = TRUE)
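
The output above still contains an empty Species entry (row 15, produced by the trailing "; " in the first record) and a trailing space in "Rhodomonas sp ". If those are unwanted, one possible cleanup is to split on the bare semicolon, trim whitespace, and drop empty strings. This is a minimal sketch, not part of the original answer: the shortened `df` below is a hypothetical stand-in with the same quirks as the question's data, and it assumes dplyr is installed alongside tidyr.

```r
library(tidyr)
library(dplyr)

# Hypothetical shortened input with the same quirks as the question's data:
# a trailing "; " in the first record and a trailing space in the second.
df <- data.frame(
  Species = c("Chlamydomonas pisiformis; Scenedesmus sp.; ",
              "Eutreptiella gymnastica; Cryptomonas sp; Rhodomonas sp ",
              "Symbiodinium sp."),
  Methods = c("Micropicking; Subculturing", "Phototaxis", "Amoebic digestion")
)

out <- df |>
  separate_longer_delim(Species, delim = ";") |>  # split on the semicolon only
  mutate(Species = trimws(Species)) |>            # remove leading/trailing spaces
  filter(Species != "")                           # drop the empty trailing piece

out
```

The result can then be written out with `write.csv(out, "output.csv", row.names = FALSE)`. Splitting on ";" rather than "; " also handles records where the space after the semicolon is missing.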