I have a Kaggle notebook (link here). The notebook iterates over a list of numbers (candidate_max_leaf_nodes) and computes an MAE for each number by passing it as a parameter to the get_mae function (ignore the other parameters).
The goal is to store the value from the list that produces the smallest MAE.
I have a condition, curr_mae < min_mae, that is supposed to be the only place where min_mae and min_val (the value from the list that produces the smallest MAE) change. However, even when that condition fails, the values still change.
The code that produces the issue:
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
# Write loop to find the ideal tree size from candidate_max_leaf_nodes
min_mae = float('inf')
min_val = 0
for s in candidate_max_leaf_nodes:
    print(f"s = {s}")
    min_val = s
    curr_mae = get_mae(s, train_X, val_X, train_y, val_y)
    print(f" Preloop || curr mae = {curr_mae}")
    print(f" Preloop || min mae = {min_mae}")
    print(f" Preloop || min val = {min_val}")
    print(f" Preloop || cond = {curr_mae < min_mae}")
    if(curr_mae < min_mae):
        min_mae = curr_mae
        min_val = s
    else:
        continue
print(f" Post || min_mae = { min_mae}")
print(f" Post || min_val = { min_val}")
# Store the best value of max_leaf_nodes (it will be either 5, 25, 50, 100, 250 or 500)
best_tree_size = min_val
# Check your answer
step_1.check()
Here are the logs or prints from the execution of the code:
s = 5
Preloop || curr mae = 35044.51299744237
Preloop || min mae = inf
Preloop || min val = 5
Preloop || cond = True
s = 25
Preloop || curr mae = 29016.41319191076
Preloop || min mae = 35044.51299744237
Preloop || min val = 25
Preloop || cond = True
s = 50
Preloop || curr mae = 27405.930473214907
Preloop || min mae = 29016.41319191076
Preloop || min val = 50
Preloop || cond = True
s = 100
Preloop || curr mae = 27282.50803885739
Preloop || min mae = 27405.930473214907
Preloop || min val = 100
Preloop || cond = True
s = 250
Preloop || curr mae = 27893.822225701646
Preloop || min mae = 27282.50803885739
Preloop || min val = 250
Preloop || cond = False
s = 500
Preloop || curr mae = 29454.18598068598
Preloop || min mae = 27282.50803885739
Preloop || min val = 500
Preloop || cond = False
Post || min_mae = 27282.50803885739
Post || min_val = 500
As you can see from the logs, the values of min_val and min_mae appear to change when s is 250, even though the condition is false, so they shouldn't change.
Note: I looked at the prints closely, and it seems that the values of min_mae and min_val are updating even before the if condition.
If anyone has issues running the notebook, run all the cells before it first.
Can someone please help me understand what I'm doing wrong here?
You are assigning min_val = s near the start of your loop, unconditionally, so min_val changes on every iteration even when the condition is false.
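A minimal sketch of the loop with that unconditional assignment removed might look like the following. Note that get_mae_stub and LOGGED_MAE are hypothetical stand-ins, not part of the notebook: the stub simply returns the MAE values copied from the logs in the question, so the example runs without the training data.

```python
# Hypothetical stand-in for the notebook's get_mae: it returns the MAE
# values copied from the question's logs instead of training a model.
LOGGED_MAE = {
    5: 35044.51299744237,
    25: 29016.41319191076,
    50: 27405.930473214907,
    100: 27282.50803885739,
    250: 27893.822225701646,
    500: 29454.18598068598,
}


def get_mae_stub(max_leaf_nodes):
    return LOGGED_MAE[max_leaf_nodes]


candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]

min_mae = float('inf')
min_val = None  # nothing chosen yet

for s in candidate_max_leaf_nodes:
    curr_mae = get_mae_stub(s)
    # min_val is assigned only here, inside the condition,
    # so it keeps the best candidate seen so far.
    if curr_mae < min_mae:
        min_mae = curr_mae
        min_val = s

best_tree_size = min_val
print(best_tree_size, min_mae)
```

With the logged values, best_tree_size ends up as 100 rather than 500. In the notebook itself, get_mae_stub(s) would be replaced by the original get_mae(s, train_X, val_X, train_y, val_y) call.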
As for min_mae, it appears to be behaving as expected. When s = 100, curr_mae < min_mae is True, so min_mae is set to 27282.50803885739 in that iteration. The updated value is only displayed in the next iteration of the loop, so it may look like min_mae changed in the s = 250 iteration, when in reality it was changed at the end of the s = 100 iteration.
It might be worth printing the results at the end of each iteration rather than only at the very end, once all iterations have finished. That will make it easier to see what is happening to the data.
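One way to do that, sketched under the same assumption as before (get_mae_stub and LOGGED_MAE are hypothetical stand-ins that replay the MAE values from the question's logs), is to print once at the end of each iteration, after the if block has run. The history list is only there to make the per-iteration state easy to inspect afterwards.

```python
# Hypothetical stand-in for the notebook's get_mae, replaying the logged values.
LOGGED_MAE = {
    5: 35044.51299744237,
    25: 29016.41319191076,
    50: 27405.930473214907,
    100: 27282.50803885739,
    250: 27893.822225701646,
    500: 29454.18598068598,
}


def get_mae_stub(max_leaf_nodes):
    return LOGGED_MAE[max_leaf_nodes]


candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]

min_mae = float('inf')
min_val = None
history = []  # (s, updated, min_val) recorded at the end of each iteration

for s in candidate_max_leaf_nodes:
    curr_mae = get_mae_stub(s)
    updated = curr_mae < min_mae
    if updated:
        min_mae = curr_mae
        min_val = s
    # Printed after the if block, so the line shows the state
    # produced by this iteration, not the previous one.
    print(f"s = {s} || curr_mae = {curr_mae} || updated = {updated} "
          f"|| min_mae = {min_mae} || min_val = {min_val}")
    history.append((s, updated, min_val))
```

Printed this way, the s = 100 line shows the update happening, and the s = 250 and s = 500 lines show updated = False with min_val still at 100.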