Given:
Files in a working directory:
WKDIR = "/scratch/project_2004072/Nationalbiblioteket/dataframes"
$ ls -l
nikeX_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_27450_vocabs.json
nikeX_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_tfidf_matrix_RF_large.gz
nikeX_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_tfidf_vectorizer_large.gz
nikeX_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_user_tokens_df_27452_BoWs.gz
nikeY_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_26042_vocabs.json
nikeY_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_tfidf_matrix_RF_large.gz
nikeY_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_tfidf_vectorizer_large.gz
nikeY_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_user_tokens_df_26050_BoWs.gz
Goal:
I'd like to build a path using a regular expression (an fr-string)
that matches only the files ending in user_tokens_df_XXXX_BoWs.gz,
and then load those files later in my code via a helper function.
Right now I have a Python script that combines an f-string with a regex, but it does not work:
import re
import os
fprefix = f"nikeY_docworks_lib_helsinki_fi_access_log_07_02_2021_lemmaMethod_stanza_"
fpath = os.path.join(WKDIR, f'{fprefix}_user_token_sparse_df'fr'_user_tokens_df_(\d+)_BoWs.gz')
#fpath = os.path.join(WKDIR, f'{fprefix}_user_token_sparse_df'fr'(_user_tokens_df_(\d+)_BoWs.gz)') # did not work either!
print(fpath) # >>>> it's wrong! <<<<
try:
    # load via helper function
    df = load_pickle(fpath)
except Exception:
    # do something else
    pass
Is there a better approach? Am I wrong in thinking that re.search() would not help here,
since fpath is passed to another function inside a try/except block in my code?
Cheers,
Based on the description of the filename pattern given in the question, how about:
from glob import glob
PATTERN = "/scratch/project_2004072/Nationalbiblioteket/dataframes/*user_tokens_df_*_BoWs.gz"
for file in glob(PATTERN):
    ...
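If the numeric part matters, the glob can be combined with a regex check on each filename. Note that os.path.join only concatenates strings, so the (\d+) in the question's fpath is kept as literal text and is never matched against the directory contents. The following is a minimal sketch under stated assumptions, not a definitive implementation: find_bow_files and BOW_RE are hypothetical names, and the demo runs on a temporary directory with empty stand-in files (shortened, made-up names) rather than the real data.

```python
import glob
import os
import re
import tempfile

# Hypothetical pattern: the basename must end in user_tokens_df_<digits>_BoWs.gz
BOW_RE = re.compile(r".*user_tokens_df_(\d+)_BoWs\.gz")


def find_bow_files(wkdir, prefix=""):
    """Return sorted (path, number) pairs for matching files in wkdir."""
    # glob does the coarse filtering; escaping keeps wkdir and prefix literal
    pattern = os.path.join(
        glob.escape(wkdir),
        glob.escape(prefix) + "*user_tokens_df_*_BoWs.gz",
    )
    matches = []
    for path in glob.glob(pattern):
        m = BOW_RE.fullmatch(os.path.basename(path))
        if m:  # glob's * also matches non-digits, so the regex enforces \d+
            matches.append((path, int(m.group(1))))
    return sorted(matches)


if __name__ == "__main__":
    # Stand-in files only; these shortened names are assumptions for the demo
    with tempfile.TemporaryDirectory() as wkdir:
        for name in (
            "nikeX_stanza_user_tokens_df_27452_BoWs.gz",
            "nikeY_stanza_user_tokens_df_26050_BoWs.gz",
            "nikeY_stanza_tfidf_matrix_RF_large.gz",
        ):
            open(os.path.join(wkdir, name), "w").close()
        for path, number in find_bow_files(wkdir, prefix="nikeY"):
            print(os.path.basename(path), number)
```

Each returned path could then be passed to the question's load_pickle helper inside the existing try/except block.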