In each of the dataframe, I have a row entry that looks like:
[{'degree': 0.1, 'name': 'Colins Dental Clinic', 'rating': 9, 'level': 'fourth'}]
It is a direct import from read.csv
of a file.
I would like to create two columns from each entry that extracts two numeric values, from above, 0.1, and 9, which are degree and rating information of a dental clinic.
I tried substring and using sep=","
with quotes, but I am just not getting things correctly.
How do I do it correctly?
It looks like JSON data not correctly formated (because of the single quotes).
You didn't provide a sample of your dataframe but you can try something like this (2 options) :
### Packages
library(dplyr)
library(stringr)
library(jsonlite)
### Data
df=data.frame(col1="dummy",
col2="dummy",
col3="[{'degree': 0.1, 'name': 'Colins Dental Clinic', 'rating': 9, 'level': 'fourth'}]")
### Option 1 : Transform your column data into a valid JSON then extract data from it
df1=df %>% mutate(col3=str_replace_all(col3,"'","\\\""),
col3=str_replace_all(col3,"(?<=: )None(?=[,}])","\\\"None\\\""),
degree=fromJSON(col3)$degree,
rating=fromJSON(col3)$rating)
### Option 2 : Use regex to get directly what you need (less secure)
df2=df %>% mutate(degree=str_extract(col3,"(?<=degree': )\\d\\.\\d"),
rating=str_extract(col3,"(?<=rating': )\\d"))
Output :