[SOLVED] read_delim for multiple files with different number of columns

I am trying to read in multiple text files with read_delim. However, these text files differ in how many columns they have. I am only interested in some of the columns, which are common to all of the text files.

However, when I try to specify the columns with col_select, it still throws an error saying that the files have different numbers of columns. Here is a minimal example:

> df = read_delim(c('file1.txt', 'file2.txt'), col_select = 1)
Error: Files must all have 3 columns:
* File 2 has 2 columns

However, this works and only reads in the first column:

> df = read_delim('file1.txt', col_select = 1)
New names:
• `test2` -> `test2...2`
• `test2` -> `test2...3`
Rows: 1 Columns: 1
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (1): test1

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Content of file1.txt:

test1 test2 test3
1 2 3

Content of file2.txt:

test1 test2
1 2

Does anyone have any ideas how to read in text files which differ in the number of columns that they have?

Solution

read_delim appears to check that all files have the same number of columns, and to raise the error, before column selection is applied. You will therefore likely need to read each file separately and then bind the results:

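library(readr)
library(purrr)  # list_rbind() requires purrr >= 1.0.0

# Name the vector by file path so the names can become an id column
set_names(c('file1.txt', 'file2.txt')) %>%
  # Read each file on its own, keeping only the first column
  map(read_delim, col_select = 1, show_col_types = FALSE) %>%
  # Stack the results and store each file's name in `file_id`
  list_rbind(names_to = "file_id")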

# A tibble: 2 × 2
  file_id   test1
  <chr>     <dbl>
1 file1.txt     1
2 file2.txt     1
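
If the shared columns should be picked by name rather than by position, the same read-each-then-bind idea can be extended. The following is only a sketch under some assumptions that are not part of the original answer: the files are tab-delimited with plain, unquoted header rows, and the two example files are recreated in a temporary directory so the snippet runs on its own. The names `files`, `headers`, `common` and `result` are purely illustrative.

```r
library(readr)
library(purrr)

# Assumption: recreate the two example files (tab-delimited) in a temp directory.
files <- file.path(tempdir(), c("file1.txt", "file2.txt"))
write_lines(c("test1\ttest2\ttest3", "1\t2\t3"), files[1])
write_lines(c("test1\ttest2", "1\t2"), files[2])

# Read only the first line of each file and split it on tabs to get the header.
# Assumption: headers are simple, unquoted, tab-separated names.
headers <- map(files, function(f) strsplit(read_lines(f, n_max = 1), "\t")[[1]])

# Keep the column names that appear in every file.
common <- reduce(headers, intersect)

# Read each file separately, selecting only the shared columns by name.
tables <- map(
  set_names(files, basename(files)),
  function(f) {
    read_delim(f, delim = "\t",
               col_select = tidyselect::all_of(common),
               show_col_types = FALSE)
  }
)

# Stack the per-file tibbles and record the source file in `file_id`.
result <- list_rbind(tables, names_to = "file_id")
print(result)
```

Because each file is still read on its own, the column-count check across files should never be triggered, and `common` can be replaced with a hand-written character vector if only some of the shared columns are needed.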