rtextmining

Extracting part of string based on two conditions


I have a character column in my data set from which I want to extract part of a string based on two conditions:

a) if the string starts with "Therapist:", split the string into two columns: one column with the word "Therapist" and the other column with the remaining text.

b) if the string starts with "Patient:", split the string into two columns: one column with the word "Patient" and the other column with the remaining text.

The problem that I have been having is that I don't know how to create if statements in R. I'm a newbie but very willing to learn. Even after googling (stackoverflow, etc.) and trying different functions, I'm still at a loss.

Example of the data I have:

> data$speech[1:5]

[1] "Therapist: Okay, we’re back…"

[2] "Patient: Hmm-hmm."

[3] "Therapist: … after a couple of hours…"

[4] "Patient: Hmm-hmm."

[5] "Therapist: Hmm… Catch me up on what you’ve found yourself thinking and feeling after the session."

I really appreciate it.

Thank you!


Solution

  • This command creates a two-column data frame:

    as.data.frame(do.call(rbind, strsplit(data$speech, ": ")))
    

    The result:

             V1                                                                                     V2
    1 Therapist                                                                      Okay, we’re back…
    2   Patient                                                                               Hmm-hmm.
    3 Therapist                                                             … after a couple of hours…
    4   Patient                                                                               Hmm-hmm.
    5 Therapist Hmm… Catch me up on what you’ve found yourself thinking and feeling after the session.
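
One caveat with the command above: strsplit() splits at every ": ", so a line whose spoken text contains another ": " yields more than two pieces, and rbind() then no longer lines the rows up cleanly. Below is a minimal sketch of an alternative that splits only at the leading speaker label and uses the vectorized ifelse() in place of an explicit if statement. The sample rows, the `pattern` and `matched` variables, and the new column names `speaker` and `text` are assumptions for illustration, not part of the original data.

```r
# Made-up sample data in the same shape as data$speech.
# Row 3 contains a second ": " and row 4 has no speaker label.
data <- data.frame(
  speech = c(
    "Therapist: Okay, we're back.",
    "Patient: Hmm-hmm.",
    "Therapist: One thing: how was your week?",
    "(pause)"
  ),
  stringsAsFactors = FALSE
)

# Group 1 captures the speaker label, group 2 everything after the first colon.
pattern <- "^(Therapist|Patient):[[:space:]]*(.*)$"

# TRUE for rows that start with one of the two labels.
matched <- grepl(pattern, data$speech)

# ifelse() applies the condition to every row at once.
# Rows without a label get NA as speaker and keep their full text.
data$speaker <- ifelse(matched, sub(pattern, "\\1", data$speech), NA_character_)
data$text    <- ifelse(matched, sub(pattern, "\\2", data$speech), data$speech)

print(data[, c("speaker", "text")])
```

Because the pattern is anchored at the start of the string, only the leading "Therapist:" or "Patient:" is treated as a separator, and any later colon stays in the text column.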