I have multiple pandas dataframes with the same columns but different values, and I need to run an analysis on the values of specific columns.
I have 7 dataframes to work with, but let's suppose I had only two.
import pandas as pd

df1 = pd.DataFrame({'a': [0, 0.5, 0.2],
                    'b': [1, 1, 0.3], 'c': ['A', 'A', 'B']})
df2 = pd.DataFrame({'a': [4, 1, 6],
                    'b': [6.2, 0.3, 0.3], 'c': ['B', 'A', 'A']})
I opted to use global variables in a for loop.
I created:
dflist: the list of original dataframes [df1, df2, ...]
sumlist: the names of the future summary dataframes ['name1','name2']
Data need to be extracted from each dataframe in dflist, processed, and finally stored under the corresponding name in sumlist.
To avoid getting lost, I want my dynamic variables to take their names from the values in sumlist.
Here's where I get stuck. The variables I want to create are based on columns of the dataframes df1, df2. However, the output for each dynamic variable ends up containing the values from all the dataframes.
dflist = [df1, df2]
sumlist = ['name1', 'name2']
for i in dflist:
    for name in sumlist:
        globals()['var{name}'] = i['c'].to_list()
On this dummy example, for some reason, I get the following error:
varname1
NameError: name 'varname1' is not defined
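In the loop shown, the key passed to globals() is the plain string 'var{name}' rather than an f-string, so {name} is never substituted: every iteration writes to a single global literally named var{name}, and varname1 is never created. A minimal illustration in plain Python (the list value is arbitrary):

```python
name = 'name1'

plain = 'var{name}'   # no f prefix: the braces are kept literally
fstr = f'var{name}'   # f-string: {name} is replaced by the value of name

print(plain)  # var{name}
print(fstr)   # varname1

# The question's loop therefore writes to a global literally named 'var{name}'
globals()['var{name}'] = ['A', 'A', 'B']
print('var{name}' in globals())  # True
print('varname1' in globals())   # False, hence the NameError
```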
With my original dataframes, however, varname1 gives the following result:
['A','A','B','B','B','A']
Instead I should have had:
varname1 = ['A','A','B']
varname2 = ['B','B','A']
What puzzles me is that the very same code "works" (albeit wrongly) in one case, while it raises an error in the other.
I need to overcome this issue, or I will be forced to write every single variable by hand.
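Even if 'var{name}' is turned into an f-string, the nested loop pairs every dataframe with every name, so each variable is overwritten on every pass of the outer loop and ends up holding column 'c' of the last dataframe. The sketch below reproduces this on the dummy data, using a dictionary (named nested here for illustration) instead of globals():

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 0.5, 0.2], 'b': [1, 1, 0.3], 'c': ['A', 'A', 'B']})
df2 = pd.DataFrame({'a': [4, 1, 6], 'b': [6.2, 0.3, 0.3], 'c': ['B', 'A', 'A']})
dflist = [df1, df2]
sumlist = ['name1', 'name2']

# Nested loops: every (dataframe, name) combination is visited,
# so each key is rewritten once per dataframe and the last one wins.
nested = {}
for i in dflist:
    for name in sumlist:
        nested[f'var{name}'] = i['c'].to_list()

print(nested)
# {'varname1': ['B', 'A', 'A'], 'varname2': ['B', 'A', 'A']}
```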
Well, my suggestion would be to use a dictionary instead of writing to globals(), which is unsafe. So instead of:
for i in dflist:
    for name in sumlist:
        globals()['var{name}'] = i['c'].to_list()
You should do:
d = {}
for i, name in zip(dflist, sumlist):
    d[f'var{name}'] = i['c'].tolist()
Notice that I am using the zip function to iterate over the two lists in parallel, so each dataframe is paired with exactly one name.
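The same dictionary can be built in one expression with a dict comprehension over zip; the results are then read by key instead of as standalone variables. A sketch using the dummy data from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 0.5, 0.2], 'b': [1, 1, 0.3], 'c': ['A', 'A', 'B']})
df2 = pd.DataFrame({'a': [4, 1, 6], 'b': [6.2, 0.3, 0.3], 'c': ['B', 'A', 'A']})
dflist = [df1, df2]
sumlist = ['name1', 'name2']

# Pair each dataframe with its name by position and keep column 'c' as a list
d = {f'var{name}': df['c'].tolist() for df, name in zip(dflist, sumlist)}

print(d['varname1'])  # ['A', 'A', 'B']
print(d['varname2'])  # ['B', 'A', 'A']

# Loop over all results without creating one variable per dataframe
for key, values in d.items():
    print(key, values)
```

Note that zip stops at the shorter of the two lists, so with 7 dataframes sumlist needs 7 names; on Python 3.10+, zip(dflist, sumlist, strict=True) raises a ValueError if the lengths differ.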