[START OF BACKGROUND INFO]
I am conducting research on the optimization of fixed solar panels. I have real irradiance data that has been gathered over 15-minute intervals over the course of several years.
I am running a program in python that is supposed to test different cases of tilt & azimuth of the solar panel in order to maximize irradiance from the Sun. If you're curious or don't know PV terminology:
Below is a simple case where I test the % Energy Gain from only a single day's readings at different azimuths only. In this simple case, I test azimuths at 45-degree intervals: 90 (East), 135 (Southeast), 180 (South), 225 (Southwest), and 270 (West) degrees. This code isn't perfect, but it works.
Code:
# For the AFTERNOON THUNDERSTORM Dataset from above:
# Now, I will model the gain in energy due to transposition from GHI --> POA
#NOTE: Changing only AZIMUTH ANGLE at Fixed Tilt
df_hyp_gain_az = pd.DataFrame()
def calculate_poa_hyp(rawdata,solar_position,surface_tilt,surface_azimuth):
poa = pvlib.irradiance.get_total_irradiance(
surface_tilt=surface_tilt,
surface_azimuth=surface_azimuth,
dni=dirint_dni_hyp, # calculated from before
ghi=df_hyp['Solar Radiation(W/m^2)'], # this is the raw data
dhi=calculated_dhi_hyp, # calculated from before
dni_extra=dni_et_hyp, # calculated from before
solar_zenith=solpos_hyp['apparent_zenith'], # calculated from before
solar_azimuth=solpos_hyp['azimuth'], # calculated from before
surface_type='grass',
model='haydavies')
return poa['poa_global'] # returns the total in-plane irradiance
for azimuth in range(90,271,45): # scans from east(90) to west(270)
# NOTE: Hardcoding Tilt=FLAT for all cases
poa_irradiance_hyp_az = calculate_poa_hyp(
rawdata=df_hyp,
solar_position=solpos_hyp,
surface_tilt=alma.latitude,
surface_azimuth=azimuth)
column_name_hyp_az = f"AZ-{azimuth}"
df_hyp_gain_az[column_name_hyp_az] = poa_irradiance_hyp_az
# calculate the % difference from GHI
ghi_hyp = df_hyp['Solar Radiation(W/m^2)']
df_hyp_gain_az = 100 * (df_hyp_gain_az.divide(ghi_hyp, axis=0)-1)
plt.figure()
df_hyp_gain_az.plot().get_figure().set_facecolor('white')
plt.xlabel('Hour of Day')
plt.ylabel('Hourly Transposition Gain [%]')
plt.title('Aftn. Thunderstorm - Energy Gain From Changing Surface Azimuth',size='x-large',weight='demibold');
plt.xlim('1997-07-07 06:00:00-04:00','1997-07-07 21:00:00-04:00');
Output: Energy Gain From Changing Surface Azimuth
I also tested both tilt and azimuth simultaneously, with the tilt at 10-degree intervals and again with the azimuth at 45-degree intervals. In order to find which orientation of tilt & azimuth would return the highest amount of irradiance, I used from scipy import integrate
successfully. The two nested-for loops and integration for each case works great, no problems here:
Code:
# Integrating over both change in AZIMUTH and TILT
df_hyp_gain_both = pd.DataFrame() # this tests % Energy Gain
for tilt in range(0,91,10):
for azimuth in range(0,316,45):
poa_irradiance_hyp_both = calculate_poa_hyp(
rawdata=df_hyp,
solar_position=solpos_hyp,
surface_tilt=tilt,
surface_azimuth=azimuth)
column_name_hyp_both = f"AZ={azimuth}|FT={tilt}"
df_hyp_gain_both[column_name_hyp_both] = poa_irradiance_hyp_both
df_hyp_gain_both = 100 * (df_hyp_gain_both.divide(ghi_hyp, axis=0)-1)
df_hyp_gain_both_sec = df_hyp_gain_both
df_hyp_gain_both_sec.index = df_hyp_gain_both.index.astype(np.int64)//10**9
df_hyp_gain_both_sec = df_hyp_gain_both_sec.fillna(0)
df_hyp_integral = df_hyp_gain_both_sec.iloc[:,1:].apply(lambda x: integrate.trapz(x,dx=900))
df_hyp_poa = pd.DataFrame() # this tests the raw irradiance readings
for tilt in range(0,91,10):
for azimuth in range(0,316,45):
poa_hyp = calculate_poa_hyp(df_hyp,solpos_hyp,tilt,azimuth)
column_name_poa_hyp = f"AZ={azimuth}|FT={tilt}"
df_hyp_poa[column_name_poa_hyp] = poa_hyp
df_hyp_poa_sec = df_hyp_poa
df_hyp_poa_sec.index = df_hyp_poa.index.astype(np.int64)//10**9
df_hyp_poa_sec = df_hyp_poa_sec.fillna(0)
df_hyp_poa_integral = df_hyp_poa_sec.iloc[:,1:].apply(lambda y: integrate.trapz(y,dx=900))
print('Integrating the flat POA Irradiance [W/m^2]:')
display(df_hyp_poa_integral.sort_values(ascending=False))
print('--------------------------------------------\nIntegrating the Energy Gain [%]:')
display(df_hyp_integral.sort_values(ascending=False))
print('--------------------------------------------\nAzimuth facing East (90 degrees) and Fixed Tilt between 30-50 degrees will maximize the energy produced from a solar panel.')
Output:
Integrating the flat POA Irradiance [W/m^2]:
AZ=90|FT=40 2.075620e+07
AZ=90|FT=50 2.055457e+07
AZ=90|FT=30 2.048603e+07
AZ=90|FT=60 1.988685e+07
AZ=90|FT=20 1.975235e+07
...
AZ=270|FT=80 5.838635e+06
AZ=315|FT=80 5.648596e+06
AZ=225|FT=90 5.409111e+06
AZ=270|FT=90 5.291405e+06
AZ=315|FT=90 5.225395e+06
Length: 79, dtype: float64
--------------------------------------------
Integrating the Energy Gain [%]:
AZ=90|FT=50 1.218214e+06
AZ=90|FT=40 1.206950e+06
AZ=90|FT=60 1.107202e+06
AZ=90|FT=30 1.074007e+06
AZ=45|FT=40 9.597149e+05
...
AZ=315|FT=80 -2.598201e+06
AZ=180|FT=90 -2.720371e+06
AZ=225|FT=90 -2.777403e+06
AZ=270|FT=90 -2.790663e+06
AZ=315|FT=90 -2.802186e+06
Length: 79, dtype: float64
--------------------------------------------
Azimuth facing East (90 degrees) and Fixed Tilt between 30-50 degrees will maximize the energy produced from a solar panel.
The above codes took less than 10 seconds to execute properly, as they were for a single day's worth of data.
Now, onto my problem: Because I only analyzed tilt-angles with 10-degree intervals and azimuth-angles with 45-degree intervals, I wanted more accurate results. So, I lowered the both the tilt & azimuth intervals to analyze for every 1-degree.
Code (only difference from previous code is changing the range()
parameters):
df_hyp_gain_both = pd.DataFrame() # this tests % Energy Gain
for tilt in range(0,91,1):
for azimuth in range(0,360,1):
poa_irradiance_hyp_both = calculate_poa_hyp(
rawdata=df_hyp,
solar_position=solpos_hyp,
surface_tilt=tilt,
surface_azimuth=azimuth)
column_name_hyp_both = f"AZ={azimuth}|FT={tilt}"
df_hyp_gain_both[column_name_hyp_both] = poa_irradiance_hyp_both
df_hyp_gain_both = 100 * (df_hyp_gain_both.divide(ghi_hyp, axis=0)-1)
df_hyp_gain_both_sec = df_hyp_gain_both
df_hyp_gain_both_sec.index = df_hyp_gain_both.index.astype(np.int64)//10**9
df_hyp_gain_both_sec = df_hyp_gain_both_sec.fillna(0)
df_hyp_integral = df_hyp_gain_both_sec.iloc[:,1:].apply(lambda x: integrate.trapz(x,dx=900))
df_hyp_poa = pd.DataFrame() # this tests the raw irradiance readings
for tilt in range(0,91,1):
for azimuth in range(0,360,1):
poa_hyp = calculate_poa_hyp(df_hyp,solpos_hyp,tilt,azimuth)
column_name_poa_hyp = f"AZ={azimuth}|FT={tilt}"
df_hyp_poa[column_name_poa_hyp] = poa_hyp
df_hyp_poa_sec = df_hyp_poa
df_hyp_poa_sec.index = df_hyp_poa.index.astype(np.int64)//10**9
df_hyp_poa_sec = df_hyp_poa_sec.fillna(0)
df_hyp_poa_integral = df_hyp_poa_sec.iloc[:,1:].apply(lambda y: integrate.trapz(y,dx=900))
print('Integrating the flat POA Irradiance [W/m^2]:')
display(df_hyp_poa_integral.sort_values(ascending=False))
print('--------------------------------------------\nIntegrating the Energy Gain [%]:')
display(df_hyp_integral.sort_values(ascending=False))
print('--------------------------------------------\nAzimuth facing East (90 degrees) and Fixed Tilt between 30-50 degrees will maximize the energy produced from a solar panel.')
Output:
C:\Users\jmand\AppData\Local\Temp\ipykernel_10576\4287560089.py:13: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
df_hyp_gain_both[column_name_hyp_both] = poa_irradiance_hyp_both
And this output repeats over-and-over for a very long time with no results. I've tried searching for a solution on this website (found PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance and many similar ones). I know I need to use pd.concat()
in some form or fashion for my df_hyp_gain_both
DataFrame. But, I'm having trouble even setting it up. I need to somehow use column_name_hyp_both
AND poa_irradiance_hyp_both
.
With correct syntax, how would I incorporate pd.concat()
into my for-loop
in order to avoid this PerformanceWarning: DataFrame is highly fragmented
warning?
After removing all the solar stuff, the problem essentially boils down to this:
import pandas as pd
df = pd.DataFrame()
dummy_data = pd.Series(0, index=pd.date_range('2019-01-01', freq='h', periods=8760))
for i in range(200): # 200 just as an example
df[f'col_{i}'] = dummy_data.copy()
An alternative that is one or two orders of magnitude faster, on my computer anyway, is to accumulate the columns into a dictionary and only convert to a DataFrame after the loop:
results = {}
for i in range(200):
results[f'col_{i}'] = dummy_data.copy()
df = pd.DataFrame(results)
A similar approach is useful when building a DataFrame by rows (instead of columns like above) -- rather than appending rows onto a DataFrame, which forces unnecessary reallocations and memory copies, it is better to accumulate row info in a list and convert to a DataFrame all at once. For example, consider these three ways of building a DataFrame from chunks of rows:
empty_df = pd.DataFrame({i: [0]*10 for i in range(20)})
def pandas_concat(N):
df = empty_df
for i in range(1, N):
df = pd.concat([df, empty_df])
return df
def pandas_append(N):
df = empty_df
for i in range(1, N):
df = df.append(empty_df)
return df
def list_append(N):
lis = []
for i in range(N):
lis.append(empty_df)
df = pd.concat(lis)
return df
Here's how the timings compare as a function of N
(dots indicate timings, lines are estimated asymptotic behavior). So there's a clear speed up by accumulating things in normal python data structures and only building the final DataFrame once at the end.