I am facing problems in being able to plot a single graph that presents the clear sky model for a specific location, along with the global irradiation measured for the same respective day.
I need help, because I've been studying for a relatively short time and I've been stuck on it for two weeks now. I've already researched many sources, tried to make several changes to the code, but I still haven't succeeded.
Basically I collected data provided by a collection station, I checked through pandas which day would be the max ghi day for the location. So, my goal would be to model what a clear sky day would be like, using the Ineichein and Solis models, and compare them with the measured the ghi data for the day.
Basically, either I get stuck in a chain of errors, or the generated graph does not match what would be expected, or simply the measured data is not plotted.
I will leave the entire sequence of my code, as well as the csv file with the data. I would be very grateful if someone could help me.
**csv file:**https://drive.google.com/file/d/1Dfrx1Mj7hcb4p5eRnyZYRTLBRhmESxfn/view?usp=sharing
my code:
import pandas as pd
import pvlib
from pvlib import clearsky, atmosphere, solarposition
from pvlib.location import Location
from pvlib.iotools import read_tmy3
import matplotlib.pyplot as plt
location = Location(latitude=-15.60083, longitude=-47.71306, tz="America/Sao_Paulo", altitude=1023, name="Brasilia station")
data = pd.read_csv('brjan19.csv', index_col=0, sep=';', usecols = ['id', 'year', 'day', 'glo'], low_memory=False)
data = data.rename(columns={'glo': 'ghi'})
data['ghi'] = pd.to_numeric(data['ghi'], errors='coerce')`
# Create the date column by specifying the year (2019), month (January) and day
data['date'] = pd.to_datetime(data['year'].astype(str) + '-01-' + data['day'].astype(str), format='%Y-%m-%d')
# Select data only for the 13th and 14th
data_subset = data[(data['date'].dt.day == 13) | (data['date'].dt.day == 14)]
# Set 'date' as the index
data_subset.set_index('date', inplace=True)
# Resample to 2 minutes and calculate the average
ghi = data_subset['ghi'].resample('2T').mean()
# Get Clear Sky values
cs = location.get_clearsky(ghi.index)
cs = cs.loc[ghi.index]
#print(data_subset)
# Plot
plt.figure(figsize=(10, 4))
cs['ghi'].plot(label='Clear Sky GHI')
ghi.plot(label='Measured GHI')
plt.title('Ineichen, Brasília station GHI (Days 13 and 14)')
plt.ylabel('Irradiance $(W/m^2)$')
plt.xlabel('Time')
plt.legend(loc='lower right')
plt.show()
When I got home to analyze more calmly and do tests, I noticed that there was another problem. The highest ghi values recorded for this day exceed the value of 1000 W/m^2, however these values are not being plotted on the graph.
I tried just removing the decimal points to see if that would solve the problem. However, making this change only the measured ghi data was plotted and the clear sky model resulted in 0.0
I tried creating a new separate variable in the dataframe to try to plot the ghi values separately, but I was also unsuccessful. I'll leave a printout of what the expected result would be like for the ghi.
[3
To understand what is wrong, try going step by step through the code and examining the value of each variable. By doing this, you can see that the ghi
variable is messed up:
In [5]: ghi
Out[5]:
date
2019-01-13 00:00:00 208.038157
2019-01-13 00:02:00 NaN
2019-01-13 00:04:00 NaN
2019-01-13 00:06:00 NaN
2019-01-13 00:08:00 NaN
...
2019-01-13 23:52:00 NaN
2019-01-13 23:54:00 NaN
2019-01-13 23:56:00 NaN
2019-01-13 23:58:00 NaN
2019-01-14 00:00:00 229.454473
Freq: 2T, Name: ghi, Length: 721, dtype: float64
Why is this? Well, it's because the dates you're using don't have hour/minute values, only daily. You have to read and make use of the minuto
column in the data file too. Here is a revision of your code that I think does what you want:
import pandas as pd
from pvlib.location import Location
import matplotlib.pyplot as plt
location = Location(latitude=-15.60083, longitude=-47.71306, tz="America/Sao_Paulo", altitude=1023, name="Brasilia station")
data = pd.read_csv(r"C:\Users\ksande\Downloads\brjan19.csv", sep=';')
data = data.rename(columns={'glo': 'ghi'})
# GHI values are floats, but the same symbol is used for the decimal place and
# the thousands marker, e.g. "1.262.000" == 1262.000
has_multiple_markers = data['ghi'].str.count("\.") > 1
data.loc[has_multiple_markers, 'ghi'] = data.loc[has_multiple_markers, 'ghi'].str.replace(".", "", n=1)
data['ghi'] = pd.to_numeric(data['ghi'])
data['date'] = pd.to_datetime(data['year'].astype(str) + '-01-' + data['day'].astype(str), format='%Y-%m-%d')
data['date'] = data['date'] + pd.to_timedelta(data['minuto'], unit='min')
data_subset = data[(data['date'].dt.day == 13) | (data['date'].dt.day == 14)]
data_subset.set_index('date', inplace=True)
ghi = data_subset['ghi']
cs = location.get_clearsky(ghi.index)
# Plot
plt.figure(figsize=(10, 4))
cs['ghi'].plot(label='Clear Sky GHI')
ghi.plot(label='Measured GHI')
plt.title('Ineichen, Brasília station GHI (Days 13 and 14)')
plt.ylabel('Irradiance $(W/m^2)$')
plt.xlabel('Time')
plt.legend(loc='lower right')
plt.show()