When extracting file paths, a few of the results (not all) contain the special characters ~$ at the start of the file name. I want to compare these file paths with another list, and the special characters prevent a proper match.
The current code:
import os

for path, sub_dirs, files in os.walk(root):
    for name in files:
        # For each file we find, we need to ensure it is a .docx file before adding
        # it to our list
        if os.path.splitext(os.path.join(path, name))[1] == ".docx":
            document_list.append(os.path.join(path, name))
The majority of results are satisfactory, for example:
X:/Serial Numbers/6200\Test Company\6275 Documents\6275rA_Order_TEST_120221.docx
However, occasional results contain special characters that do not exist in the file name:
X:/Serial Numbers/6200\Test Company\6275 Documents\~$75rA_Order_MERZ_120221.docx
I would prefer a solution that does not rely on a string replace method.
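As has been pointed out in another answer, files beginning with "~$" are probably temporary files that Microsoft Office creates while a document is open. They are separate files on disk, not mangled names of your documents, so the fix is to skip them rather than to repair the name.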
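If you would rather keep the existing os.walk loop, a minimal sketch of that idea is to skip any name that starts with "~$" before checking the extension. The function name collect_docx is hypothetical and only wraps the original loop so that it can be run on its own.

```python
import os


def collect_docx(root):
    """Return paths of .docx files under root, skipping names that start with "~$"."""
    document_list = []
    for path, sub_dirs, files in os.walk(root):
        for name in files:
            # Skip the "~$..." temporary files entirely; no string replacement needed
            if name.startswith("~$"):
                continue
            if os.path.splitext(name)[1] == ".docx":
                document_list.append(os.path.join(path, name))
    return document_list
```

Filtering on the bare file name, rather than on the joined path, keeps the check independent of the directory the file lives in.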
The pathlib module (often preferred over os nowadays) offers a more object-oriented approach to interacting with your filesystem.
In this case I would suggest using a generator in preference to the current structure.
Something like this:
from pathlib import Path
from collections.abc import Iterable

def genpaths(directory: Path) -> Iterable[Path]:
    ignore = ("~$", )  # add any additional filename prefixes to be ignored here
    for fullpath in directory.rglob("*.docx"):
        if not fullpath.name.startswith(ignore):
            yield fullpath

root = Path("the_root_directory")
for c in genpaths(root):
    print(c)  # the files of interest
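Since the stated goal is to compare the paths with another list, one possible follow-up is sketched here. It assumes the other list holds path strings or Path objects; the helper name missing_from and both parameter names are hypothetical.

```python
from pathlib import Path


def missing_from(found, expected):
    """Return the entries of expected that do not appear among the found paths."""
    # Convert everything to Path so strings and Path objects compare consistently
    found_set = {Path(p) for p in found}
    return [p for p in expected if Path(p) not in found_set]
```

It could be called as missing_from(genpaths(root), other_list), where other_list stands for your second list. On Windows, Path objects compare equal regardless of whether a path was written with forward slashes or backslashes, which may help with the mixed separators in the sample output.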