algorithm debugging facial-identification

Bug omits data interval - possible causes?


I have encountered a strange bug and wanted to ask if someone has any idea what might be the cause.

The bug:

When I correlate the facial width-to-height ratio (FWHR) of NHL players with their penalty minutes per game played (PIM/GP), a section of the FWHR distribution is blank (between 1.98-2 and 2-2.022; see Figure 1). The FWHR is an int/int ratio where each int has two digits. It is extremely unlikely that this gap reflects a true signal, so it is most likely caused by a bug in the code I am using.

Figure 1: The section between FWHR 1.98-2 and 2-2.022 is blank, for no apparent reason.

Context: I know my PIM/GP data is correct (retrieved from the NHL's website), but the FWHR was calculated using an algorithm. The problem most likely lies within this facial measuring algorithm. I have not been able to locate the bug and therefore turn to you for advice.

Question: The code for the facial measuring algorithm is far too long to present here, but does anyone have ideas about what might have caused the gap, or what I could check for?


Solution

  • The Nature of Ratio Distributions

    Idea: It should be impossible for a ratio of two 2-digit integers to fill all 2-decimal values between two integers. Could such impossible values be especially pronounced around 2.0? For example, maybe 1.99 cannot be represented?
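
A short worked bound may make this idea concrete. It is only a sketch under the assumption stated in the question, namely that the width w and the height h are both integers between 10 and 99; the symbols w, h and k are introduced here for illustration.

```latex
% Assumption: 10 <= w, h <= 99 are integers and w/h != 2.
\[
\left|\frac{w}{h} - 2\right| = \frac{|w - 2h|}{h} = \frac{k}{h},
\qquad k = |w - 2h| \ge 1 .
\]
% Below 2: w = 2h - k <= 99 forces h <= (99 + k)/2.
\[
\frac{k}{h} \ge \frac{2k}{99 + k} \ge \frac{2}{100} = 0.02
\quad\Longrightarrow\quad
\frac{w}{h} \le \frac{99}{50} = 1.98 .
\]
% Above 2: w = 2h + k <= 99 forces h <= (99 - k)/2.
\[
\frac{k}{h} \ge \frac{2k}{99 - k} \ge \frac{2}{98} = \frac{1}{49}
\quad\Longrightarrow\quad
\frac{w}{h} \ge \frac{99}{49} \approx 2.0204 .
\]
```

Under these assumptions, no ratio can fall strictly between 1.98 and 2, or strictly between 2 and roughly 2.0204, which is close to the blank interval described in the question.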

    Method: Loop through 2-digit ints and append the ratio to a list. Then check if the list lacks values around 2.0 (e.g., 1.99).

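    import numpy as np
    from matplotlib import pyplot as plt

    def int_ratio_generator():
        """Return every ratio i/j of two 2-digit integers (10..99)."""
        ratio_list = []
        for i in range(10, 100):
            for j in range(10, 100):
                ratio_list.append(i / j)
        return ratio_list

    ratio_list = int_ratio_generator()
    # Exact float membership (1.99 in ratio_list) is fragile, so test
    # whether any ratio rounds to 1.99 at two decimals instead.
    key = any(round(r, 2) == 1.99 for r in ratio_list)
    print('\nis 1.99 a possible ratio from 2-digit ints?', key)

    # Scatter the ratios against random y-values (jitter only) so that
    # gaps along the x-axis become visible.
    fig, ax = plt.subplots()
    X = ratio_list
    Y = np.random.rand(len(ratio_list))
    ax.scatter(X, Y, color='C0')
    ax.set_xlim(1.8, 2.2)
    plt.show()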
    

    See output image here
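
As a complement to the floating-point scan above, the following is a minimal sketch of the same enumeration done with exact rational arithmetic, so that no rounding can hide or create a value near 2.0. The function name `nearest_ratios` and its parameters are hypothetical, and the range 10..99 is an assumption taken from the statement that both integers have two digits.

```python
from fractions import Fraction


def nearest_ratios(target=2, lo=10, hi=99):
    """Return the closest ratios i/j strictly below and above `target`.

    Assumes both i and j are integers in the inclusive range lo..hi.
    """
    target = Fraction(target)
    # A set of exact fractions; equivalent ratios such as 20/10 and 40/20 collapse.
    ratios = {
        Fraction(i, j)
        for i in range(lo, hi + 1)
        for j in range(lo, hi + 1)
    }
    below = max(r for r in ratios if r < target)
    above = min(r for r in ratios if r > target)
    return below, above


below, above = nearest_ratios()
print(f"largest ratio below 2:  {below} = {float(below):.4f}")   # 99/50 = 1.9800
print(f"smallest ratio above 2: {above} = {float(above):.4f}")   # 99/49 = 2.0204
```

If the measured widths and heights span a narrower range than 10..99, passing that range as `lo` and `hi` shows how wide the unreachable band around 2.0 would be for that data.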

    Conclusion: