I have two dataframes. One called SERVICES and one called TIMES.
I am joining them together like so:
servicesMerged = pd.merge(services, times, left_on='Ref_Id', right_on='Ref_ID')
This works, except that some services have no matching Ref_Id in the TIMES data.
This is service data for a booking system, so for example we might have this:
**TIMES**
Ref_Id | TIMES
1 | 30
2 | 15
3 | 10
**SERVICES**
Ref_ID | Name
1 | Mowing
2 | Raking
3 | Blowing
4 | Trimming
The merge works for the matching rows, but the service Trimming does not make it into the new dataset, because it has no time in the times dataframe.
What we need is for a default to be added whenever the time is missing (as in this example), say 15 minutes.
Something you would traditionally do like so:
If not exists time:
Create a time and make it 15
I've tried how='inner', 'outer', 'left' and 'right', but none of them gives me this.
How can I, if a row is missing like above, force the data to be added to the merged data?
Thank you.
Creating the dataframes like this:
import pandas as pd
times = pd.DataFrame({'Ref_Id': [1, 2, 3],
                      'TIMES': [30, 15, 10]})
services = pd.DataFrame({'Ref_ID': [1, 2, 3, 4],
                         'Name': ['Mowing', 'Raking', 'Blowing', 'Trimming']})
Then you should just be able to add how='left' to your code (note I had to swap your left_on and right_on, as the capital D in Ref_ID is in the left services table):
servicesMerged = pd.merge(services, times, left_on='Ref_ID', right_on='Ref_Id', how='left')
   Ref_ID      Name  Ref_Id  TIMES
0       1    Mowing     1.0   30.0
1       2    Raking     2.0   15.0
2       3   Blowing     3.0   10.0
3       4  Trimming     NaN    NaN
Alternatively, you can write it like this:
servicesMerged = services.merge(times, left_on='Ref_ID', right_on='Ref_Id', how='left')
To fill in the blank times, you can use .fillna():
servicesMerged['TIMES'] = servicesMerged['TIMES'].fillna(15)
   Ref_ID      Name  Ref_Id  TIMES
0       1    Mowing     1.0   30.0
1       2    Raking     2.0   15.0
2       3   Blowing     3.0   10.0
3       4  Trimming     NaN   15.0
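Putting the steps together, here is a minimal end-to-end sketch. The default of 15 comes from the question. The final astype(int) is my own optional addition, assuming the times are whole minutes: the left merge introduces NaN, which turns TIMES into floats, and the cast turns them back into integers once the gaps are filled.

```python
import pandas as pd

# Sample data from the question
times = pd.DataFrame({'Ref_Id': [1, 2, 3],
                      'TIMES': [30, 15, 10]})
services = pd.DataFrame({'Ref_ID': [1, 2, 3, 4],
                         'Name': ['Mowing', 'Raking', 'Blowing', 'Trimming']})

# Left merge keeps every service, even those without a matching time
servicesMerged = services.merge(times, left_on='Ref_ID', right_on='Ref_Id', how='left')

# Fill the missing times with the default of 15 minutes
servicesMerged['TIMES'] = servicesMerged['TIMES'].fillna(15)

# Optional (assumption: times are whole minutes): cast back from float to int
servicesMerged['TIMES'] = servicesMerged['TIMES'].astype(int)

print(servicesMerged[['Ref_ID', 'Name', 'TIMES']])
```

The cast only works after fillna(), since a plain integer column cannot hold NaN.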
NB: If the Ref_Id column name matched in both tables (either both Ref_Id or both Ref_ID), you could just use on='Ref_Id' instead of both left_on and right_on, and then you wouldn't get the second Ref_Id column in the output.
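As a rough sketch of that note, one way to get matching names is to rename the column in times before merging. Renaming to Ref_ID (rather than Ref_Id) and the variable names times_renamed and merged_on are my own choices for illustration; either spelling works as long as both tables agree.

```python
import pandas as pd

# Sample data from the question
times = pd.DataFrame({'Ref_Id': [1, 2, 3],
                      'TIMES': [30, 15, 10]})
services = pd.DataFrame({'Ref_ID': [1, 2, 3, 4],
                         'Name': ['Mowing', 'Raking', 'Blowing', 'Trimming']})

# Make the key column name identical in both tables
times_renamed = times.rename(columns={'Ref_Id': 'Ref_ID'})

# With a shared key name, a single on= is enough and the key appears only once
merged_on = services.merge(times_renamed, on='Ref_ID', how='left')

# Fill the missing time with the default of 15 minutes
merged_on['TIMES'] = merged_on['TIMES'].fillna(15)

print(merged_on)
```

The result has just the three columns Ref_ID, Name and TIMES, with no duplicate key column to drop afterwards.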