Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

Using the tidytext package, I want to transform my tibble into a one-token-per-document-per-row format. I converted the text column of my tibble from factor to character, but I still get the same error.

text_df <- tibble(line = 1:3069, text = text)

My tibble looks like this, with a column as character:

# A tibble: 3,069 x 2
line text$text  
<int> <chr>

However, when I try to apply unnest_tokens:

text_df %>%
  unnest_tokens(word, text$text)

I always get the same error:

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

What is the issue in my code?

Solution

Your text column is probably itself a data frame containing a single text column, which is why the tibble header shows text$text rather than text:

library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(tidytext)

text <- data.frame(text = c("hello world", "this is me"), stringsAsFactors = FALSE)
text_df <- tibble(line = 1:2, text = text)

text_df
#> # A tibble: 2 x 2
#>    line text$text  
#>   <int> <chr>      
#> 1     1 hello world
#> 2     2 this is me

text_df %>% 
  unnest_tokens(word, text$text)

#> Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

Replace the nested data frame with the character vector it contains, then proceed:

text_df <- mutate(text_df, text = text$text)
# or if your text is stored as factor
# text_df <- mutate(text_df, text = as.character(text$text))

text_df
#> # A tibble: 2 x 2
#>    line text       
#>   <int> <chr>      
#> 1     1 hello world
#> 2     2 this is me

text_df %>% 
  unnest_tokens(word, text)
#> # A tibble: 5 x 2
#>    line word 
#>   <int> <chr>
#> 1     1 hello
#> 2     1 world
#> 3     2 this 
#> 4     2 is   
#> 5     2 me
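
If you control how the tibble is built, another option is to avoid creating the nested column in the first place by passing the character vector rather than the whole data frame. This is only a minimal sketch, assuming that text is a data frame with a single column named text, as in the example above; the name tokens is illustrative and not part of the original code.

```r
library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(tidytext)

# Assumption: `text` is a data frame with one character column called `text`
text <- data.frame(text = c("hello world", "this is me"), stringsAsFactors = FALSE)

# Pass the character vector, not the data frame, so the column is a plain <chr>
text_df <- tibble(line = seq_len(nrow(text)), text = text$text)

# The column can now be referred to by its bare name
tokens <- text_df %>%
  unnest_tokens(word, text)
```

Using seq_len(nrow(text)) instead of a hard-coded range keeps the line numbers in step with the number of documents.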

It's a good idea to use str(), or sometimes summary(), names() or unclass(), to diagnose this sort of issue:

text <- data.frame(text = c("hello world", "this is me"), stringsAsFactors = FALSE)
text_df <- tibble(line = 1:2, text = text)
str(text_df)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    2 obs. of  2 variables:
#>  $ line: int  1 2
#>  $ text:'data.frame':    2 obs. of  1 variable:
#>   ..$ text: chr  "hello world" "this is me"
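
For wider tibbles, where reading the str() output by eye gets tedious, the same check can be done programmatically. The following is a rough sketch using base R's vapply() and is.data.frame(); the variable name nested is just an illustrative choice.

```r
library(tibble)

text <- data.frame(text = c("hello world", "this is me"), stringsAsFactors = FALSE)
text_df <- tibble(line = 1:2, text = text)

# TRUE for every column that is itself a data frame
nested <- vapply(text_df, is.data.frame, logical(1))

# Names of the offending columns
names(nested)[nested]
#> [1] "text"
```

Any column reported here needs to be replaced by the vector it contains before it is passed to unnest_tokens().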