python pandas dataframe loops dynamic-variables

Nested Dynamic Variables for Dataframes in Loop


I have multiple pandas dataframes with the same columns but different values, and I need to run an analysis on the values of specific columns.

I have 7 dataframes to work with, but let's suppose I had only two.

import pandas as pd

df1 = pd.DataFrame({'a': [0, 0.5, 0.2],
                    'b': [1, 1, 0.3], 'c': ['A', 'A', 'B']})

df2 = pd.DataFrame({'a': [4, 1, 6],
                    'b': [6.2, 0.3, 0.3], 'c': ['B', 'A', 'A']})

I opted to use global variables in a for loop.

I created two lists, dflist and sumlist. Data needs to be taken from each df in dflist, processed, and the result finally stored under the matching name in sumlist.

To avoid getting lost, I want my dynamic variables to take their names from the values in sumlist.

Here's where I get stuck. Each variable I want to create should be based on a column of a single dataframe (df1 or df2). However, the output for each dynamic variable contains the values from every dataframe.

dflist= [df1, df2]
sumlist= ['name1', 'name2']

for i in dflist:
    for name in sumlist:
        globals()['var{name}'] = i['c'].to_list()

On this dummy example, for some reason, I get the following error:

varname1
NameError: name 'varname1' is not defined

With the original dataframes, my list varname1 gives the following result:

['A','A','B','B','B','A']

Instead I should have had:

varname1 = ['A','A','B']
varname2 = ['B','B','A']

What puzzles me is that the very same code "works" (albeit wrongly) in one case, while it raises an error in the other.

I need to overcome this issue, or I will be forced to write every single variable manually.


Solution

  • My suggestion is to use a dictionary instead of writing into globals(), which is unsafe because it can silently overwrite existing names. So instead of:

    for i in dflist:
        for name in sumlist:
            globals()['var{name}'] = i['c'].to_list()
    

    You should do:

    d = {}
    for i, name in zip(dflist, sumlist):
        d[f'var{name}'] = i['c'].tolist()
    

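    Notice two changes. First, the key is now an f-string: the original 'var{name}' lacked the f prefix, so the variable was literally named var{name}, which is why varname1 raised a NameError. Second, I am using the zip function to iterate over the two lists in parallel, so each dataframe is paired with exactly one name instead of every name being overwritten on each pass of the outer loop.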
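
To see why the original loop misbehaves, here is a minimal sketch comparing the three variants. It assumes plain Python lists (the hypothetical `columns`) stand in for `i['c'].to_list()`, so it runs without pandas. The names `literal`, `nested`, and `paired` are illustrative only and do not come from the question.

```python
# Hypothetical stand-ins for df1['c'].to_list() and df2['c'].to_list()
columns = [['A', 'A', 'B'], ['B', 'A', 'A']]
sumlist = ['name1', 'name2']

# 1) No f prefix: 'var{name}' is a literal string, so only one key is ever created.
literal = {}
for values in columns:
    for name in sumlist:
        literal['var{name}'] = values
print(list(literal))  # ['var{name}']

# 2) f prefix but nested loops: every name is overwritten on each outer pass,
#    so all names end up holding the values of the last list.
nested = {}
for values in columns:
    for name in sumlist:
        nested[f'var{name}'] = values
print(nested)  # {'varname1': ['B', 'A', 'A'], 'varname2': ['B', 'A', 'A']}

# 3) zip: each list is paired with exactly one name.
paired = {f'var{name}': values for values, name in zip(columns, sumlist)}
print(paired)  # {'varname1': ['A', 'A', 'B'], 'varname2': ['B', 'A', 'A']}
```

With the dictionary, a result is looked up by key, for example `paired['varname1']`, rather than through a dynamically created global variable.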