python python-3.x pandas xlrd xlwt

A datetime.datetime value from an XLSX file blocks the whole process


I am building an Excel translator using threads, a queue and a semaphore. It was working perfectly, and I was using it to translate very large XLSX files in a very short time.

Today I tried to translate another document, but my program stopped working. After hours of debugging, I found that whenever a value of type datetime.datetime appears in the Excel document, everything blocks.

Here is my code and the output to give you a better understanding:

    import math
    import numpy

    # Body of the feeder function; sem.acquire() happens just before this point (not shown)
    print("feed raw acquired")
    for row_index in range(sheet.shape[0]):
        for col_index in range(sheet.shape[1]):
            print("START --------------------")
            value = sheet.iat[row_index, col_index]
            print(type(value))
            print(value)
            if type(value) == str:
                # strings are queued for translation
                print("str")
                raw_data.put({"row": row_index, "col": col_index, "value": value})
            # math.isnan() needs a value convertible to float
            elif math.isnan(value):
                print("nan")
                translated_data.put({"row": row_index, "col": col_index, "value": ""})
            elif type(value) == float:
                print("float")
                translated_data.put({"row": row_index, "col": col_index, "value": value})
            elif type(value) == numpy.int64 or type(value) == int:
                print("int")
                translated_data.put({"row": row_index, "col": col_index, "value": value})
            else:
                print("else")
                translated_data.put({"row": row_index, "col": col_index, "value": value})
            print("END --------------------")
    sem.release()
    print("feed raw released")

The output is:

    START --------------------
    <class 'float'>
    nan
    nan
    END --------------------
    START --------------------
    <class 'numpy.float64'>
    nan
    nan
    END --------------------
    START --------------------
    <class 'str'>
    기저귀
    str
    END --------------------
    START --------------------
    <class 'str'>
    M900009355
    str
    END --------------------
    START --------------------
    <class 'str'>
    네이쳐러브메레 소프트핏 밴드 특대형 4팩
    str
    END --------------------
    START --------------------
    <class 'str'>
    제조일자
    str
    END --------------------
    START --------------------
    <class 'str'>
    2021-01-20
    str
    END --------------------
    START --------------------
    <class 'float'>
    2.0
    float
    END --------------------
    START --------------------
    <class 'float'>
    9110.0
    float
    END --------------------
    START --------------------
    <class 'int'>
    18220
    int
    END --------------------
    START --------------------
    <class 'float'>
    34900.0
    float
    END --------------------
    START --------------------
    <class 'str'>
    리퍼상품
    str
    END --------------------
    START --------------------
    <class 'float'>
    nan
    nan
    END --------------------
    START --------------------
    <class 'float'>
    nan
    nan
    END --------------------
    START --------------------
    <class 'numpy.float64'>
    nan
    nan
    END --------------------
    START --------------------
    <class 'str'>
    기저귀
    False 0
    str
    END --------------------
    START --------------------
    <class 'str'>
    M900009357
    str
    END --------------------
    START --------------------
    <class 'str'>
    네이쳐러브메레 소프트핏 팬티 특대형 4팩
    str
    END --------------------
    START --------------------
    <class 'str'>
    유통기한
    str
    END --------------------
    START --------------------
    <class 'datetime.datetime'>
    2024-01-12 00:00:00

When the function meets a datetime.datetime, everything blocks: it does not even continue far enough to print the END -------------------- line.
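The hang can be narrowed down by reproducing the failing check outside the translator. A minimal sketch, assuming only the standard library and pandas: it calls `math.isnan` on a `datetime.datetime`, compares it with `pd.isna`, and uses a toy `feed` function (hypothetical, not taken from the translator) to show what happens to a semaphore when an exception is raised between `acquire()` and `release()` inside a thread.

```python
import datetime
import math
import threading

import pandas as pd

value = datetime.datetime(2024, 1, 12)

# math.isnan needs a value convertible to float, so a datetime raises TypeError
try:
    math.isnan(value)
    isnan_raised = False
except TypeError as exc:
    isnan_raised = True
    print("math.isnan raised:", exc)

# pd.isna accepts arbitrary scalars and reports that a datetime is not missing
print("pd.isna:", pd.isna(value))

sem = threading.Semaphore(1)


def feed(values):
    """Hypothetical stand-in for the feeder: acquire, check each value, release."""
    sem.acquire()
    for v in values:
        math.isnan(v)  # raises TypeError on the datetime and ends the thread
    sem.release()  # never reached once the loop has raised


worker = threading.Thread(target=feed, args=([2.0, float("nan"), value],))
worker.start()
worker.join()  # the thread has died; the default threading.excepthook prints its traceback to stderr

# The semaphore is still held, so another thread calling sem.acquire() would wait forever
released = sem.acquire(timeout=1)
print("semaphore available again:", released)
```

Here the timed `acquire` gives up after one second; an untimed `acquire()` in another worker would simply never return, which looks like the whole program freezing.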

This is very strange behaviour that I would like to understand; I don't think I have seen anything like it before. If you could help me, that would be awesome :D

If you want to see the full code of this translator, here it is: https://github.com/mathias-vandaele/xlsx-translator

Thank you all


Solution

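  • math.isnan(value) raises a TypeError when value is a datetime.datetime, because it only accepts values convertible to float. The exception is raised before sem.release() is reached, so the feeder stops without ever releasing the semaphore, which presumably leaves the other threads waiting on it forever and makes the failure look silent.

    Using pd.isna(value) from pandas instead resolved the issue: it returns False for a datetime rather than raising, and still returns True for NaN.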
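Putting the fix together, here is a sketch of the feeder loop using `pd.isna`, with `sem.release()` moved into a `finally` block so that an unexpected cell type can no longer leave the semaphore acquired. The function name `feed_raw` and its parameter list are assumptions for illustration (the snippet in the question runs inside a function that is not shown), the debug prints are dropped, and the separate float/int/else branches are merged because they all pass the value through unchanged.

```python
import datetime
import queue
import threading

import pandas as pd


def feed_raw(sheet: pd.DataFrame, raw_data: queue.Queue,
             translated_data: queue.Queue, sem: threading.Semaphore) -> None:
    """Hypothetical feeder: strings are queued for translation, other cells pass through."""
    sem.acquire()
    try:
        for row_index in range(sheet.shape[0]):
            for col_index in range(sheet.shape[1]):
                value = sheet.iat[row_index, col_index]
                item = {"row": row_index, "col": col_index, "value": value}
                if isinstance(value, str):
                    raw_data.put(item)
                elif pd.isna(value):
                    # pd.isna is True for NaN, None and NaT, and False for datetimes
                    item["value"] = ""
                    translated_data.put(item)
                else:
                    # floats, ints, datetimes and anything else are copied unchanged
                    translated_data.put(item)
    finally:
        # Release even if a cell raises, so threads waiting on sem are not blocked forever
        sem.release()


# Small demo: an object-dtype row mixing a string, a NaN and a datetime, like the sheet in the question
sheet = pd.DataFrame([["기저귀", float("nan"), datetime.datetime(2024, 1, 12)]], dtype=object)
raw_q: queue.Queue = queue.Queue()
done_q: queue.Queue = queue.Queue()
sem = threading.Semaphore(1)
feed_raw(sheet, raw_q, done_q, sem)
print(raw_q.qsize(), done_q.qsize())
```

`pd.isna` alone fixes the datetime case; the `try`/`finally` is a separate safeguard, since any other exception in the loop would otherwise leave the semaphore held in exactly the same way.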