I received some code that I'm trying to shorten and make more flexible.
The code obtains climate values from a .csv file with many entries (over 1 million).
To keep the code from growing too large, I've made the variables user-selectable: when the user selects these variables via the terminal, a new variable is created for each one.
This worked well, but the problem comes when trying to store the average of each climate variable for each point (i.e., 85 stations).
If I do it like this:
```python
for a in range(len(ciudad)):
    nombre = ciudad[a]  # We obtain the name of each city (~85)
    datoa = localidades.loc[localidades['Nombre'] == nombre]  # We focus on city "a" each time
    if 'datos_TMax' in locals():  # The list exists only if the user requested this variable
        tmax = datoa.loc[:, ['TMax']]  # "TMax" for each day in the selected period, for this city
        tmax_m = tmax.mean()  # Average for the period in city "a"
        datos_TMax.append(tmax_m)  # "datos_TMax" is a list created dynamically; one value is appended per city
```
The example above works perfectly. At the end, I obtain a file with each city's name and its average maximum temperature for the period the user chose.
However, this approach has problems: I have to repeat an `if` statement for each possible variable, and then, when I convert to a `pd.DataFrame`, I have to handle tons of `if` possibilities and combinations so that no error is raised whatever the case.
Therefore, I decided to use a dict of lists (e.g. `{'TMax': [1, 2, 3, ...]}`) that stores all the values for each selected variable.
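A minimal sketch of why a dict of lists removes the combination `if`s, assuming `medias` is a hypothetical lookup of already-computed per-city averages (the numbers are invented) and `construir_tabla` is a hypothetical helper, not part of the original code: one list is created per selected variable, so `pd.DataFrame` is called the same way whatever the user picked.

```python
import pandas as pd


def construir_tabla(ciudades, seleccion, medias):
    """Hypothetical helper: one row per city, one column per selected variable.

    `medias` is a made-up lookup {city: {variable: average}} standing in
    for the real per-city computation.
    """
    dicvar = {'Nombre': list(ciudades)}
    dicvar.update({b: [] for b in seleccion})  # one empty list per selected variable
    for nombre in ciudades:
        for b in seleccion:
            dicvar[b].append(medias[nombre][b])
    return pd.DataFrame(dicvar)  # same call regardless of the selection


medias = {'Coruña': {'TMax': 19.78, 'TMin': 11.2, 'Sol': 6.1},
          'Vigo': {'TMax': 20.1, 'TMin': 12.0, 'Sol': 6.5}}

print(construir_tabla(['Coruña', 'Vigo'], ['TMax'], medias))
print(construir_tabla(['Coruña', 'Vigo'], ['TMin', 'Sol'], medias))
```

Because the columns come straight from the selection list, adding a new variable only means adding it to that list.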
The code for the loop looks like this:
```python
dicvar = {b: [] for b in bucles}  # One empty list per selected variable, so dicvar[b].append() works
for a in range(len(ciudad)):
    nombre = ciudad[a]  # We obtain the name of each city (~85)
    datoa = localidades.loc[localidades['Nombre'] == nombre]  # We focus on city "a" each time
    for b in bucles:  # Iterate over each variable the user selected
        # Some variables require different treatment, so only these are averaged here
        if b in ['TMax', 'TMin', 'TMed', 'Racha', 'Dir', 'Velmedia', 'Sol', 'Presmax', 'Presmin']:
            valor = datoa.loc[:, [b]]  # Variable "b" for each day in city "a" (generic name, reused every iteration)
            valor_m = valor.mean()  # Mean of variable "b" for this city
            print(valor_m)  # Just to check
            dicvar[b].append(valor_m)  # In theory, append this city's mean to key "b" (e.g. "TMax")
```
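A small, self-contained reproduction of this loop makes it easy to inspect what actually gets stored. The data below is invented; only the column names (`Nombre`, `TMax`) and variable names mirror the question.

```python
import pandas as pd

# Made-up daily records for two cities, using the question's column names
localidades = pd.DataFrame({
    'Nombre': ['Coruña', 'Coruña', 'Vigo', 'Vigo'],
    'TMax': [19.0, 20.5, 21.0, 22.0],
})
ciudad = ['Coruña', 'Vigo']
bucles = ['TMax']  # hypothetical user selection

dicvar = {b: [] for b in bucles}
for nombre in ciudad:
    datoa = localidades.loc[localidades['Nombre'] == nombre]
    for b in bucles:
        valor_m = datoa.loc[:, [b]].mean()  # list selection -> one-column DataFrame -> .mean() gives a Series
        dicvar[b].append(valor_m)

# Each stored element is a length-1 Series indexed by 'TMax', not a float
print(type(dicvar['TMax'][0]))  # <class 'pandas.core.series.Series'>
print(dicvar['TMax'][0])
```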
The code runs, and indeed 85 values are stored. However, when I output the results, this is what I get:
| x | Cityname | Tmax |
|---|---|---|
| 0 | Coruña | **TMax 19.78 dtype: float64** |

Where it should be:

| x | Cityname | Tmax |
|---|---|---|
| 0 | Coruña | 19.78 |
Any ideas on how to solve the problem? I've been trying to fix it for hours, but I don't see how.
Thanks!
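As you can see in the documentation here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html
the `.mean()` method of a DataFrame returns a Series (or a DataFrame), not a plain number. Since `datoa.loc[:, [b]]` selects the column with a list, `valor` is a one-column DataFrame, so `valor_m` is a one-element Series indexed by the column name; that whole Series is what gets stored and printed as `TMax 19.78 dtype: float64`. To retrieve the value itself you can write: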
```python
dicvar[b].append(valor_m.values[0])
```
The `.values` property returns a NumPy array of the Series' values; the `[0]` index retrieves the first (and here the only) value.
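A minimal end-to-end sketch of the fix, using invented data and a hypothetical two-variable selection: once the scalar is extracted, `pd.DataFrame(dicvar)` yields plain numeric columns. `valor_m.iloc[0]` is an equivalent way to take the first element, and `datoa[b].mean()` (single brackets) returns a scalar directly. The final `groupby` lines swap in a different technique from the question's loop; they are only a possible simplification and assume, as in the question, that city names live in the `Nombre` column.

```python
import pandas as pd

# Made-up daily records for two cities, reusing the question's column names
localidades = pd.DataFrame({
    'Nombre': ['Coruña', 'Coruña', 'Vigo', 'Vigo'],
    'TMax': [19.0, 20.5, 21.0, 22.0],
    'TMin': [10.0, 11.0, 12.0, 13.0],
})
ciudad = ['Coruña', 'Vigo']
bucles = ['TMax', 'TMin']  # hypothetical user selection

dicvar = {'Nombre': list(ciudad)}
dicvar.update({b: [] for b in bucles})
for nombre in ciudad:
    datoa = localidades.loc[localidades['Nombre'] == nombre]
    for b in bucles:
        valor_m = datoa.loc[:, [b]].mean()   # still a one-element Series
        dicvar[b].append(valor_m.values[0])  # take its only value as a float
        # Equivalent: valor_m.iloc[0], or datoa[b].mean() to get a scalar directly

resultado = pd.DataFrame(dicvar)
print(resultado)  # numeric TMax/TMin columns, no "dtype: float64" inside the cells

# Different technique, for comparison: one groupby computes the same per-city means
por_ciudad = localidades.groupby('Nombre')[bucles].mean().reset_index()
print(por_ciudad)
```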