
Why does the list.files() function return files that cannot be found?


I am trying to use the base R function list.files() to return a list of file names (.txt files) so that I can import them in an automated way instead of writing a read.table() line per file.

Here is the problem that I'm encountering:

Using the list.files() function, I get the following output:

> list.files(pattern = "\\.txt$")
[1] "~$ DD - maskedfiletitle.txt"                   
[2] "~$ KP - maskedfiletitle.txt"
[3] "TF DD - maskedfiletitle.txt"                   
[4] "TL XF - maskedfiletitle.txt"                             
[5] "UR FG - maskedfiletitle.txt"                         
[6] "VB PD - maskedfiletitle.txt"                
[7] "VS KP - maskedfiletitle.txt"

The desired output is the following:

[1] "TF DD - maskedfiletitle.txt"                   
[2] "TL XF - maskedfiletitle.txt"                             
[3] "UR FG - maskedfiletitle.txt"                         
[4] "VB PD - maskedfiletitle.txt"                
[5] "VS KP - maskedfiletitle.txt"

It always seems to return two extra entries that look like the first and last file in the folder with the first two characters replaced by "~$". Obviously, if the next step is to read these files, it gives an error message saying that the "~$" file does not exist.

For now I have worked around this by simply removing the first two elements. However, I have no answer as to why this behaviour occurs.

I have tried removing all non .txt files from the folder and rewriting the function to use different arguments:

> list.files(all.files = FALSE, no.. = TRUE)
[1] "~$ DD - maskedfiletitle.txt"                   
[2] "~$ KP - maskedfiletitle.txt"
[3] "TF DD - maskedfiletitle.txt"                   
[4] "TL XF - maskedfiletitle.txt"                             
[5] "UR FG - maskedfiletitle.txt"                         
[6] "VB PD - maskedfiletitle.txt"                
[7] "VS KP - maskedfiletitle.txt"

This, however, also gives me the first and last file with the first two characters changed to "~$".

Now, it's not a critical error or anything, but I'm interested in learning where this behaviour comes from. I have read through the help page of the function and searched the web a bit, but I cannot find anything that explains it, and I am quite stumped.

Do let me know if I need to provide more information!


Solution

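  • A file whose name starts with ~$ is usually a temporary file (at least on Windows). Typically it is a hidden lock file that a Microsoft Office application such as Word creates next to a document while that document is open, named after the document with the first two characters replaced by ~$. If any of your .txt files are currently open, try closing them first. Note that all.files = FALSE does not filter these out, because it only hides names that start with a dot.

    Otherwise, I thought you might be able to combine the two conditions (the name ends with .txt and does not start with ~$) into one regex for the pattern argument, using "^(?!~\\$).*\\.txt$". However, list.files() has no perl argument, so its pattern argument doesn't support negative lookahead directly and you need to do it in two steps: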

    my_files <- list.files(pattern = "\\.txt$")
    txt_files <- my_files[!grepl("^~\\$", my_files)]
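If you still want the negative lookahead, one option is to apply it afterwards with grep(), which does accept perl = TRUE. The following is only a minimal sketch under stated assumptions: the folder demo_dir and the file names in demo_names are made up for illustration, and the demo files are assumed to be whitespace-separated tables with a header row. It also shows the filtered names being read in one go instead of one read.table() line per file.

```r
# Build a throwaway folder with made-up file names (illustrative only).
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("~$ DD - demo.txt", "TF DD - demo.txt", "VS KP - demo.txt")
for (nm in demo_names) {
  writeLines(c("x y", "1 2"), file.path(demo_dir, nm))
}

# Step 1: list.files() selects everything that ends in .txt.
all_txt <- list.files(demo_dir, pattern = "\\.txt$")

# Step 2: grep() with perl = TRUE keeps names that do NOT start with "~$".
txt_files <- grep("^(?!~\\$)", all_txt, perl = TRUE, value = TRUE)

# Read every remaining file into a list of data frames, named by file.
tables <- lapply(file.path(demo_dir, txt_files), read.table, header = TRUE)
names(tables) <- txt_files
```

Filtering by name rather than dropping the first two elements keeps the result correct even when no lock files are present or when they sort to a different position.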