python pandas

Problem with iterations and dataframes to store variables in dict


I inherited some code that I'm trying to shorten and make more flexible.

The code extracts climate values from a .csv with many entries (over 1 million).

To keep the code short, I've set it up so that the user selects the variables from the terminal, and a new variable is created for each one selected.

This worked well, but the problem comes when trying to store the climatic mean of each variable for each point (i.e., 85 stations).

If I do it like this:

for a in range(len(ciudad)):
    nombre = ciudad[a] # We obtain the name of each city (~85)
    datoa = localidades.loc[localidades['Nombre'] == nombre] # We focus on the "a" city each time

    if 'datos_TMax' in locals(): # If the user requested this variable, the list exists
        tmax = datoa.loc[:,['TMax']] # We obtain the "TMax" for each day in the selected period & iteration city
        tmax_m = tmax.mean() # Average for the period in the "a" city
        datos_TMax.append(tmax_m) # "datos_TMax" is a list created dynamically, and for each city the value is appended

The example above works perfectly. At the end, I obtain a file with each city name and its average max temperature for the period the user chose.

However, this approach has problems: I have to repeat an "if" statement for each possible variable, and then, when I convert the results to a pd.DataFrame, I need tons of "if" branches and combinations so that no error is raised whatever the user selected.

Therefore, I decided to use a dict of lists (e.g. dictionary = {'TMax': [1, 2, 3, ...]}) that would store all the values for each selected variable.

The code for the loop looks like this:

dicvar = {}
for a in range(len(ciudad)):
    nombre = ciudad[a] # We obtain the name of each city (~85)
    datoa = localidades.loc[localidades['Nombre']==nombre] # We focus on the "a" city each time

    for b in bucles: # For each selected user variable, iterate
        if b in ['TMax', 'TMin', 'TMed', 'Racha', 'Dir', 'Velmedia', 'Sol', 'Presmax', 'Presmin']: # This is because some variables require different treatment
            valor = datoa.loc[:,[b]] # Obtain the "b" value for each day in city "a" (valor is generic, as each iteration would have a different name)
            valor_m = valor.mean() # Take the mean of the "b" variable for this iteration
            print(valor_m) # Just to check

            dicvar[b].append(valor_m) # In theory, append to key "b" (e.g. "TMax") the value "valor_m" from each iteration

Well, the code runs and indeed 85 values are stored. However, when outputting, this is the result:

x | Cityname | Tmax
0 | Coruña   | **TMax    19.78 dtype: float64**

When it should be:

x | Cityname | Tmax
0 | Coruña   | 19.78

Any ideas on how to solve the problem? I've been trying to fix it for hours, but I don't see how.

Thanks!


Solution

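  • As the documentation shows:

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html

    the .mean() method of a DataFrame returns a Series (one mean per column), not a single number. Here valor is a one-column DataFrame, so valor_m is a one-element Series, and that whole Series is what gets appended. To retrieve the value itself you can write: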

    dicvar[b].append(valor_m.values[0])
    

    The .values property returns a NumPy array of the Series values; the [0] index retrieves its first (and here only) element.
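
To make the difference concrete, here is a minimal sketch. The sample data is made up for illustration (the real localidades comes from the CSV), and building dicvar with setdefault is an assumption, since the question does not show how the keys are created. Selecting the column with a single label (datoa[b]) instead of a list (datoa.loc[:, [b]]) gives a Series, whose .mean() is already a scalar, so .values[0] is not needed in that form. The groupby at the end is an alternative to the explicit loop, not what the original code does.

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV contents
localidades = pd.DataFrame({
    "Nombre": ["Coruña", "Coruña", "Lugo", "Lugo"],
    "TMax": [19.0, 21.0, 15.0, 17.0],
    "TMin": [10.0, 12.0, 6.0, 8.0],
})
bucles = ["TMax", "TMin"]  # variables selected by the user (assumed)

datoa = localidades.loc[localidades["Nombre"] == "Coruña"]

# A list selector keeps a DataFrame, so .mean() gives a one-element Series
as_series = datoa.loc[:, ["TMax"]].mean()
# A single label gives a Series, so .mean() gives a plain scalar
as_scalar = datoa["TMax"].mean()

# Same loop as in the question, storing scalars in a dict of lists
dicvar = {}
for nombre in localidades["Nombre"].unique():
    datoa = localidades.loc[localidades["Nombre"] == nombre]
    for b in bucles:
        dicvar.setdefault(b, []).append(datoa[b].mean())

# Alternative without the loop: one row per city, one column per variable
resumen = localidades.groupby("Nombre")[bucles].mean().reset_index()
print(resumen)
```

If the per-city table is all that is needed, the groupby version also avoids filtering the million-row frame once per city.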