python dataframe numpy unit-testing assert

assertEqual passes when comparing a numpy.ndarray with a str. Is that expected, or have I done something wrong?

My unit test returns ok, but when running my code in production I found that my value is 'wrapped' in square brackets. Further investigation showed that the brackets come from df.loc[].values. I was expecting a single str value.

Using the sample by cs95 with a slight modification, I can reproduce the problem to illustrate the idea.

This is my first time using Python's unittest, so sorry for the lengthy post.

# test class and test code
import pandas as pd
import numpy as np
import unittest

class myclass:
    def __init__(self):
        mux = pd.MultiIndex.from_arrays([
            list('aaaabbbbbccddddd'),
            list('tuvwtuvwtuvwtuvw')
        ], names=['one','two'])
        temp_a = np.arange(len(mux))
        str_a = [ str(a) for a in temp_a ]
        self.df = pd.DataFrame({'col': str_a}, mux)

    def get_aw(self):
        a = self.df.loc[('a','w')].values
        return a

class TestAssert(unittest.TestCase):
    def test_assert(self):
        myclass_obj = myclass()
        result = myclass_obj.get_aw()
        expect = '3'
        print(type(result))
        print(type(expect))
        print(f'result:{result}')
        print(f'expect:{expect}')
        self.assertEqual(result,expect)


if __name__ == '__main__':
    unittest.main(verbosity=2)

Results: (I expected it to FAIL)

test_assert (__main__.TestAssert) ... test_code.py:18: PerformanceWarning: indexing past lexsort depth may impact performance.
  a = self.df.loc[('a','w')].values
<class 'numpy.ndarray'>
<class 'str'>
result:[['3']]   <-- square brackets
expect:3
ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

I changed the code to use .values[0][0] so that I get the str, and I added assertTrue(isinstance(...)) as an extra check.

Changed code:

import pandas as pd
import numpy as np
import unittest

class myclass:
    def __init__(self):
        mux = pd.MultiIndex.from_arrays([
            list('aaaabbbbbccddddd'),
            list('tuvwtuvwtuvwtuvw')
        ], names=['one','two'])
        #self.df = pd.DataFrame({'col': np.arange(len(mux))}, mux)
        temp_a = np.arange(len(mux))
        str_a = [ str(a) for a in temp_a ]
        self.df = pd.DataFrame({'col': str_a}, mux)

    def get_aw(self):
        a = self.df.loc[('a','w')].values[0][0]   # updated
        return a

class TestAssert(unittest.TestCase):
    def test_assert(self):
        myclass_obj = myclass()
        result = myclass_obj.get_aw()
        expect = '3'
        print(type(result))
        print(type(expect))
        print(f'result:{result}')
        print(f'expect:{expect}')
        self.assertTrue(isinstance(result, str))  # added
        self.assertEqual(result,expect)


if __name__ == '__main__':
    unittest.main(verbosity=2)

Results:

test_assert (__main__.TestAssert) ... test_code.py:18: PerformanceWarning: indexing past lexsort depth may impact performance.
  a = self.df.loc[('a','w')].values[0][0]
<class 'str'>
<class 'str'>
result:3
expect:3
ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

Solution

Yes, I believe that is expected behavior. From the documentation, unittest.TestCase.assertEqual(a, b) checks that a == b. If you run

>>> if np.array([['3']]) == '3':
...     print("Here")
Here

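it does print "Here". This is a consequence of how broadcasting works in NumPy. When an array is compared against a string, as in your example, NumPy performs the comparison element by element and returns a boolean array of the same shape. A boolean array with exactly one element can be converted to a single bool, so the if statement (and assertEqual) treats it as True. In particular,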

>>> np.array([['3']]) == '3'
array([[ True]])
>>> np.array([['3', '2', '3']]) == '3'
array([[ True, False,  True]])

With the three-element array, however,

>>> if np.array([['3', '2', '3']]) == '3':
...     print("Here")

will raise an exception:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This happens because a (1, 3) NumPy array has no single truth value: it cannot be converted to one boolean True or False.
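
If you want the test to fail in a situation like yours, one option is to check the type before the value. Below is a minimal sketch, assuming a hypothetical helper named assert_same_type_and_value (it is not part of unittest or NumPy). Inside a TestCase, calling self.assertIsInstance(result, str) before assertEqual does the same job. The sketch also uses ndarray.item(), which unwraps a one-element array into a plain Python object, as an alternative to .values[0][0]:

```python
import numpy as np


def assert_same_type_and_value(result, expect):
    """Hypothetical helper: fail unless result has exactly expect's type
    and compares equal to it. Intended for scalar values such as str."""
    if type(result) is not type(expect):
        raise AssertionError(
            f"type mismatch: {type(result).__name__} != {type(expect).__name__}"
        )
    if result != expect:
        raise AssertionError(f"{result!r} != {expect!r}")


# Stand-in for what .values returned in the question: a (1, 1) array.
wrapped = np.array([["3"]], dtype=object)

# .item() only works on arrays with exactly one element.
unwrapped = wrapped.item()
assert_same_type_and_value(unwrapped, "3")  # passes: both are str

try:
    assert_same_type_and_value(wrapped, "3")
except AssertionError as exc:
    print(exc)  # type mismatch: ndarray != str
```

With this check the wrapped array is rejected, even though wrapped == '3' broadcasts to a one-element True array.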