I have two dataframes (A and B). Dataframe A contains weather from 2010 to 2013. But the row which is suppoesd to contain data for the first day is missing for each of the years (e.g. '2013-01-01'). But I have a second dataframe B where I have saved the missing data. How do I create a new row in dataframe A and include the missing dates and data from dataframe B. I need to do this for all the years shown in dataframe B.
Dataframe_A:
TIMESTAMP SR TEMP2MMAX TEMP2MMIN PRECIP
0 2013-01-02 10.90 -6.33 -15.78 0.00
1 2013-01-03 10.93 0.20 -16.25 2.54
2 2013-01-04 11.19 -1.25 -15.91 0.51
3 2013-01-05 11.38 0.67 -19.19 0.00
4 2013-01-06 11.36 2.90 -13.49 0.00
... ... ... ... ... ...
359 2013-12-27 11.26 14.48 -6.43 0.00
360 2013-12-28 11.41 15.77 -6.73 0.00
361 2013-12-29 11.51 15.33 -6.30 0.00
362 2013-12-30 7.40 -3.06 -12.48 0.00
363 2013-12-31 11.33 11.21 -11.87 0.00
364 rows × 5 columns
Dataframe_B:
DATE SRAD TMAX TMIN RAIN
0 2010-01-01 6.7 5.8 -9.6 0.0
1 2011-01-01 7.9 3.3 -14.4 0.0
2 2012-01-01 9.9 5.0 -5.3 0.0
3 2013-01-01 6.6 -6.3 -15.8 0.0
My desired output is Dataframe_A as shown below:
TIMESTAMP SR TEMP2MMAX TEMP2MMIN PRECIP
**0 2010-01-01 6.7 5.8 -9.6 0.0**
1 2010-01-02 9.45 3.45 -11.23 0.0
2 2010-01-03 1.52 -7.49 -10.96 0.0
3 2010-01-04 6.33 2.03 -7.72 0.0
4 2010-01-05 4.66 2.16 -5.07 0.0
.. ... ... ... ... ...
**0 2013-01-01 6.6 -6.3 -15.8 0.00**
1 2013-01-02 10.90 -6.33 -15.78 0.00
2 2013-01-03 10.93 0.20 -16.25 2.54
3 2013-01-04 11.19 -1.25 -15.91 0.51
4 2013-01-05 11.38 0.67 -19.19 0.00
I tried the code below, but it doesn't work
for year in range(2010, 2014):
first_row = Dataframe_B[Dataframe_B['DATE']['TIMESTAMP'].dt.year == year].iloc[0]
for year in range(2010, 2014):
pd.concat([Dataframe_A[Dataframe_A['TIMESTAMP'].dt.year == year], first_row.to_frame()],
, ignore_index=True)
Output:
0 TIMESTAMP SR TEMP2MMAX TEMP2MMIN PRECIP
0 NaN 2013-01-02 10.90 -6.33 -15.78 0.00
1 NaN 2013-01-03 10.93 0.20 -16.25 2.54
2 NaN 2013-01-04 11.19 -1.25 -15.91 0.51
3 NaN 2013-01-05 11.38 0.67 -19.19 0.00
4 NaN 2013-01-06 11.36 2.90 -13.49 0.00
... ... ... ... ... ... ...
364 2013-01-01 00:00:00 NaT NaN NaN NaN NaN
365 6.6 NaT NaN NaN NaN NaN
366 -6.3 NaT NaN NaN NaN NaN
367 -15.8 NaT NaN NaN NaN NaN
368 0.0 NaT NaN NaN NaN NaN
I also tried a second code like this below:
for year in range(2010, 2014):
second_df = Dataframe_A[Dataframe_A['DATE'].dt.year == year].copy()
first_row_B = Dataframe_B[Dataframe_B['DATE'].dt.year == year].iloc[0]
for year in range(2010, 2014):
second_df = second_df[second_df['TIMESTAMP'] != pd.Timestamp(f'{year}-01-01')]
second_df = pd.concat([first_row_B, second_df], axis=0, ignore_index=True).sort_values(by='TIMESTAMP').reset_index(drop=True)
The second code (when year ==2013) produced the result below:
DATE SRAD TMAX TMIN RAIN TIMESTAMP SR TEMP2MMAX TEMP2MMIN PRECIP
0 NaT NaN NaN NaN NaN 2013-01-02 10.90 -6.33 -15.78 0.00
1 NaT NaN NaN NaN NaN 2013-01-03 10.93 0.20 -16.25 2.54
2 NaT NaN NaN NaN NaN 2013-01-04 11.19 -1.25 -15.91 0.51
3 NaT NaN NaN NaN NaN 2013-01-05 11.38 0.67 -19.19 0.00
4 NaT NaN NaN NaN NaN 2013-01-06 11.36 2.90 -13.49 0.00
... ... ... ... ... ... ... ... ... ... ...
363 NaT NaN NaN NaN NaN 2013-12-31 11.33 11.21 -11.87 0.00
364 2010-01-01 6.7 5.8 -9.6 0.0 NaT NaN NaN NaN NaN
365 2011-01-01 7.9 3.3 -14.4 0.0 NaT NaN NaN NaN NaN
366 2012-01-01 9.9 5.0 -5.3 0.0 NaT NaN NaN NaN NaN
367 2013-01-01 6.6 -6.3 -15.8 0.0 NaT NaN NaN NaN NaN
368 rows × 10 columns
What you tried went wrong because Dataframe_A
and Dataframe_B
have columns with different names. Try to rename the columns before concatenating.
Dataframe_B = Dataframe_B.rename(columns={'DATE': 'TIMESTAMP', 'SRAD': 'SR', 'TMAX': 'TEMP2MMAX', 'TMIN': 'TEMP2MMIN', 'RAIN': 'PRECIP'})
Dataframe_C = pd.concat([Dataframe_A, Dataframe_B])