python machine-learning jupyter-notebook scope kaggle

Python 3 changes variable even after if condition fails


I have a Kaggle notebook (link here). The notebook iterates over a list of numbers (candidate_max_leaf_nodes) and gets a mae for each number by passing it as a parameter (ignore the other parameters) to the get_mae function.

The goal is to store the value from the list that generates the smallest value of mae.

I have a condition, curr_mae < min_mae, that is supposed to be the only place where min_mae and min_val (the value from the list that produces the smallest mae) change. However, even when that condition fails, the values still appear to change.

The code that produces the issue.

candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
# Write loop to find the ideal tree size from candidate_max_leaf_nodes
min_mae = float('inf')
min_val = 0
for s in candidate_max_leaf_nodes:
    print(f"s  = {s}")
    min_val = s
    curr_mae = get_mae(s,train_X, val_X, train_y, val_y)
    print(f" Preloop || curr mae = {curr_mae}")
    print(f" Preloop || min mae = {min_mae}")
    print(f" Preloop || min val  = {min_val}")
    print(f" Preloop || cond = {curr_mae < min_mae}")
    if(curr_mae < min_mae):
            min_mae = curr_mae
            min_val = s
    else:
        continue

print(f" Post || min_mae = { min_mae}")   
print(f" Post || min_val = { min_val}")   

# Store the best value of max_leaf_nodes (it will be either 5, 25, 50, 100, 250 or 500)
best_tree_size = min_val

# Check your answer
step_1.check()

Here are the logs or prints from the execution of the code:

s  = 5
 Preloop || curr mae = 35044.51299744237
 Preloop || min mae = inf
 Preloop || min val  = 5
 Preloop || cond = True
s  = 25
 Preloop || curr mae = 29016.41319191076
 Preloop || min mae = 35044.51299744237
 Preloop || min val  = 25
 Preloop || cond = True
s  = 50
 Preloop || curr mae = 27405.930473214907
 Preloop || min mae = 29016.41319191076
 Preloop || min val  = 50
 Preloop || cond = True
s  = 100
 Preloop || curr mae = 27282.50803885739
 Preloop || min mae = 27405.930473214907
 Preloop || min val  = 100
 Preloop || cond = True
s  = 250
 Preloop || curr mae = 27893.822225701646
 Preloop || min mae = 27282.50803885739
 Preloop || min val  = 250
 Preloop || cond = False
s  = 500
 Preloop || curr mae = 29454.18598068598
 Preloop || min mae = 27282.50803885739
 Preloop || min val  = 500
 Preloop || cond = False
 Post || min_mae = 27282.50803885739
 Post || min_val = 500

As you can see from the logs, the values of min_val and min_mae appear to change when s is 250, even though the condition is false, so they shouldn't change.

Note: I looked at the prints closely, and it seems that the values of min_mae and min_val are updated even before the if condition.

If anyone has issues running the notebook, run all the cells before this one first.

Can someone please help me understand what I'm doing wrong here?


Solution

  • You are assigning min_val = s near the start of your loop, before the if check, so min_val changes on every iteration even when the condition is false.

    As for min_mae, it appears to be behaving as expected.

    When s = 100, curr_mae < min_mae is True, so min_mae is set to 27282.50803885739 in that iteration. Because your prints come before the if block, the updated value is only displayed in the next iteration. Hence, it may look like min_mae changed in the s = 250 iteration, when in reality it was changed at the end of the s = 100 iteration.

    It might be worth printing the results at the end of each loop iteration, rather than only at the very end once all iterations have finished. That makes it easier to see what is happening to the data.
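
To make the points above concrete, here is a minimal sketch of the loop with the early min_val = s assignment removed and the print moved to the end of each iteration. The get_mae below is only a stand-in (an assumption for illustration): it takes a single argument and returns the MAE values shown in the question's log instead of training a model, so the snippet runs without the notebook's data. In the notebook you would keep the original call get_mae(s, train_X, val_X, train_y, val_y).

```python
# Stand-in for the notebook's get_mae (assumption for illustration only):
# it looks up the MAE values printed in the question's log rather than
# fitting a model, so this snippet runs on its own.
LOGGED_MAE = {
    5: 35044.51299744237,
    25: 29016.41319191076,
    50: 27405.930473214907,
    100: 27282.50803885739,
    250: 27893.822225701646,
    500: 29454.18598068598,
}


def get_mae(max_leaf_nodes):
    return LOGGED_MAE[max_leaf_nodes]


candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]

min_mae = float('inf')
min_val = None  # no unconditional "min_val = s" inside the loop any more

for s in candidate_max_leaf_nodes:
    curr_mae = get_mae(s)
    # min_mae and min_val are updated only when the condition holds
    if curr_mae < min_mae:
        min_mae = curr_mae
        min_val = s
    # Printing after the if block shows the state at the end of this iteration
    print(f"s = {s} || curr_mae = {curr_mae} || min_mae = {min_mae} || min_val = {min_val}")

best_tree_size = min_val
print(best_tree_size)  # 100

# The same selection can also be written with the built-in min and a key function
best_via_min = min(candidate_max_leaf_nodes, key=get_mae)
```

With the log's values, both approaches pick 100, and the per-iteration print shows min_mae and min_val staying fixed once s reaches 250 and 500. Note that the min(..., key=...) form calls get_mae once per candidate, so with the real function it retrains the model for each value just as the loop does.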