[SOLVED] extracting the instances of numeric value in dataframe

extracting the instances of numeric value in dataframe

In each of the dataframe, I have a row entry that looks like:

[{'degree': 0.1, 'name': 'Colins Dental Clinic', 'rating': 9, 'level': 'fourth'}]

It is a direct import from read.csv of a file.

I would like to create two columns from each entry that extracts two numeric values, from above, 0.1, and 9, which are degree and rating information of a dental clinic.

I tried substring and using sep="," with quotes, but I am just not getting things correctly.

How do I do it correctly?

Solution

It looks like JSON data not correctly formated (because of the single quotes).
You didn't provide a sample of your dataframe but you can try something like this (2 options) :

### Packages
library(dplyr)
library(stringr)
library(jsonlite)

### Data
df=data.frame(col1="dummy",
              col2="dummy",
              col3="[{'degree': 0.1, 'name': 'Colins Dental Clinic', 'rating': 9, 'level': 'fourth'}]")

### Option 1 : Transform your column data into a valid JSON then extract data from it
df1=df %>% mutate(col3=str_replace_all(col3,"'","\\\""),
                  col3=str_replace_all(col3,"(?<=: )None(?=[,}])","\\\"None\\\""),
                  degree=fromJSON(col3)$degree,
                  rating=fromJSON(col3)$rating)

### Option 2 : Use regex to get directly what you need (less secure)
df2=df %>% mutate(degree=str_extract(col3,"(?<=degree': )\\d\\.\\d"),
                 rating=str_extract(col3,"(?<=rating': )\\d"))

Output :