r python-3.x pandas list read-data

How to read CSV files from different folders, each with its own configuration path, in Python?


I have worked in R until now, and I am trying to replicate my work in Python. In R, I was able to read different datasets from different folders into a list of lists. I then applied a function to this list that standardised the data using configuration files whose paths were stored in the same list as the data. For now, I just want to read the data and configuration files into a list of lists in Python, but I do not know how. Can someone please help with this?

This is what I have done in R and this is how I read the data.

import_list <- list(list(data_path = "../data/A.csv",
                         config_path = "/data/config/A/"),
                    list(data_path = "../data/B.csv",
                         config_path =  "/data/config/B/"),
                    list(data_path = "../data/C.csv",
                         config_path = "../data/C/"))

I hope to read my data into this same format, but this time in Python. Is there a simple way to read multiple CSV files from different folders into this structure? This is what it should look like:

 import_list List of 3
    :List of 2
      data_path  : chr "../data/A.csv"
      config_path: chr "config/A"
    :List of 2
      data_path  : chr "../data/B.csv"
      config_path: chr "config/B"

Solution

  • This might be enough:

    In [1]: from os import listdir
    
    In [2]: from os.path import isfile, join
    
    In [3]: from re import sub
    
    In [4]: mypath = "test"
    
    In [5]: onlyfiles = [ f for f in listdir(mypath) if isfile(join(mypath, f)) ]
    
    In [6]: configs = [ join(mypath, "config", sub(r"\.csv$", "", f)) for f in onlyfiles ]
    
    In [7]: list(zip(onlyfiles, configs))
    Out[7]: [('B.csv', 'test/config/B'), ('A.csv', 'test/config/A')]
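
  • If the folders do not follow a common naming pattern (as with dataset C in the question), a list of dicts is probably the closest Python equivalent of the R list of lists. The following is only a minimal sketch: the paths are copied from the question, while the helper `load_datasets`, its `reader` parameter and the `"data"` key are hypothetical names chosen for illustration. It assumes pandas is installed when the default reader is used.

```python
# Direct translation of the R list of lists: one dict per dataset.
# The paths are copied from the question.
import_list = [
    {"data_path": "../data/A.csv", "config_path": "/data/config/A/"},
    {"data_path": "../data/B.csv", "config_path": "/data/config/B/"},
    {"data_path": "../data/C.csv", "config_path": "../data/C/"},
]


def load_datasets(entries, reader=None):
    """Return a copy of each entry with the parsed CSV stored under 'data'.

    `reader` is any callable that takes a path (hypothetical parameter);
    it defaults to pandas.read_csv.
    """
    if reader is None:
        # Imported lazily so the function can also be used with another reader.
        import pandas as pd

        reader = pd.read_csv
    # {**entry, ...} builds a new dict, so the original entries are not modified.
    return [{**entry, "data": reader(entry["data_path"])} for entry in entries]
```

    Calling `datasets = load_datasets(import_list)` would then give a list in which `datasets[0]["data"]` is the DataFrame read from `../data/A.csv` and `datasets[0]["config_path"]` is the matching configuration folder, provided those files exist relative to the working directory.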