My unittest passes, but when running the code in production I found that my value is 'wrapped' in square brackets. Further investigation showed that the cause is df.loc[].values, which gives me a NumPy array, whereas I expect a single str value.
Using the sample by cs95 with some slight modifications, I was able to reproduce the problem to illustrate the idea.
This is my first time using Python's unittest, so sorry for the lengthy post.
# test class and test code
import pandas as pd
import numpy as np
import unittest

class myclass:
    def __init__(self):
        mux = pd.MultiIndex.from_arrays([
            list('aaaabbbbbccddddd'),
            list('tuvwtuvwtuvwtuvw')
        ], names=['one','two'])
        temp_a = np.arange(len(mux))
        str_a = [ str(a) for a in temp_a ]
        self.df = pd.DataFrame({'col': str_a}, mux)

    def get_aw(self):
        a = self.df.loc[('a','w')].values
        return a

class TestAssert(unittest.TestCase):
    def test_assert(self):
        myclass_obj = myclass()
        result = myclass_obj.get_aw()
        expect = '3'
        print(type(result))
        print(type(expect))
        print(f'result:{result}')
        print(f'expect:{expect}')
        self.assertEqual(result,expect)

if __name__ == '__main__':
    unittest.main(verbosity=2)
Results (I expected the test to FAIL):
test_assert (__main__.TestAssert) ... test_code.py:18: PerformanceWarning: indexing past lexsort depth may impact performance.
a = self.df.loc[('a','w')].values
<class 'numpy.ndarray'>
<class 'str'>
result:[['3']] <-- square brackets
expect:3
ok
----------------------------------------------------------------------
Ran 1 test in 0.002s
OK
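The `[['3']]` above is 2-D, which means the `.loc` lookup returned a one-row DataFrame rather than a scalar, and `DataFrame.values` is always a 2-D array. A minimal sketch of the shape issue and a few ways to pull the str out, assuming a plain single-column DataFrame (the row label `'x'` and the frame itself are hypothetical, not the MultiIndex from the post):

```python
import numpy as np
import pandas as pd

# Hypothetical one-row, one-column frame; 'x' and 'col' are illustrative labels.
df = pd.DataFrame({'col': ['3']}, index=['x'])

# Selecting rows with a list of labels keeps the result a DataFrame,
# and DataFrame.values is always 2-D.
rows = df.loc[['x']]
arr = rows.values
print(isinstance(arr, np.ndarray), arr.shape)  # True (1, 1)

# Ways to get the single str back out:
by_item = arr.item()           # raises ValueError unless arr has exactly one element
by_iloc = rows.iloc[0, 0]      # positional row/column lookup
by_label = df.loc['x', 'col']  # row label plus column label returns a scalar

print(by_item, by_iloc, by_label)  # 3 3 3
```

One design note: `.item()` fails loudly when the lookup matches more than one row, whereas `.values[0][0]` silently takes the first. That matters for the sample index in the post, where a key such as `('b', 't')` appears twice.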
I changed the code to use .values[0][0] so that I get the str, and I added assertTrue(isinstance(result, str)).
Changed code:
import pandas as pd
import numpy as np
import unittest

class myclass:
    def __init__(self):
        mux = pd.MultiIndex.from_arrays([
            list('aaaabbbbbccddddd'),
            list('tuvwtuvwtuvwtuvw')
        ], names=['one','two'])
        #self.df = pd.DataFrame({'col': np.arange(len(mux))}, mux)
        temp_a = np.arange(len(mux))
        str_a = [ str(a) for a in temp_a ]
        self.df = pd.DataFrame({'col': str_a}, mux)

    def get_aw(self):
        a = self.df.loc[('a','w')].values[0][0] # updated
        return a

class TestAssert(unittest.TestCase):
    def test_assert(self):
        myclass_obj = myclass()
        result = myclass_obj.get_aw()
        expect = '3'
        print(type(result))
        print(type(expect))
        print(f'result:{result}')
        print(f'expect:{expect}')
        self.assertTrue(isinstance(result, str)) # added
        self.assertEqual(result,expect)

if __name__ == '__main__':
    unittest.main(verbosity=2)
Results:
test_assert (__main__.TestAssert) ... test_code.py:18: PerformanceWarning: indexing past lexsort depth may impact performance.
a = self.df.loc[('a','w')].values[0][0]
<class 'str'>
<class 'str'>
result:3
expect:3
ok
----------------------------------------------------------------------
Ran 1 test in 0.002s
OK
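`unittest.TestCase.assertIsInstance` is a standard method that performs the same check as `assertTrue(isinstance(...))` but names the offending object and type when it fails. The sketch below uses a hypothetical `wrapped` array in place of what `get_aw()` returned before the fix, and shows that `assertEqual` alone lets the array through while the type check catches it:

```python
import unittest

import numpy as np

# A bare TestCase instance gives access to the assert* methods outside a test run
# (allowed since Python 3.2 when no methodName is passed).
tc = unittest.TestCase()

# Hypothetical stand-in for what get_aw() returned before the fix.
wrapped = np.array([['3']])

# 1. assertEqual alone does not notice the wrapping: '==' broadcasts to
#    array([[ True]]), and a one-element array is truthy.
tc.assertEqual(wrapped, '3')
equal_passed = True

# 2. assertIsInstance checks the type and fails with a descriptive message.
try:
    tc.assertIsInstance(wrapped, str)
    type_check_passed = True
except AssertionError as exc:
    type_check_passed = False
    message = str(exc)

print(equal_passed, type_check_passed)  # True False
print(message)
```

Placing the type assertion before `assertEqual`, as in the changed code, ensures the wrapped value is reported as a failure instead of slipping through the equality check.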
Yes, I believe that is expected behavior. According to the documentation, unittest.TestCase.assertEqual(a, b) checks that a == b. If you run
>>> if np.array([['3']]) == '3':
...     print("Here")
...
Here
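It does print "Here". This is because of how broadcasting works in NumPy: when you compare an array against a string, as in your example, NumPy performs an element-by-element comparison and returns a boolean array of the same shape. A one-element array such as array([[ True]]) has an unambiguous truth value, so the if statement (and assertEqual) treats it as True. In particular,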
>>> np.array([['3']]) == '3'
array([[ True]])
>>> np.array([['3', '2', '3']]) == '3'
array([[ True, False,  True]])
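The step that connects these results to the if statement is how NumPy converts a boolean array to `bool`: a size-1 array converts through its single element, while larger arrays refuse and must be reduced explicitly. A minimal sketch (the variable names are made up for illustration):

```python
import numpy as np

single = np.array([['3']]) == '3'             # shape (1, 1)
several = np.array([['3', '2', '3']]) == '3'  # shape (1, 3)

print(single.shape, several.shape)  # (1, 1) (1, 3)

# A one-element array converts to bool via its only element...
print(bool(single))                    # True
print(bool(np.array([['4']]) == '3'))  # False

# ...but a multi-element array refuses, so it has to be reduced explicitly.
try:
    bool(several)
except ValueError as exc:
    print('ValueError:', exc)
print(several.all(), several.any())  # False True
```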
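Notice, however, that the second case
>>> if np.array([['3', '2', '3']]) == '3':
...     print("Here")
...
will raise an exception:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
because NumPy refuses to convert a (1, 3) boolean array into a single True or False. So in your original test, assertEqual passed only because the wrapped array happened to contain exactly one element, and that element was equal to '3'.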
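A rough model of why the original test passed: when the two arguments have different types, as here (ndarray versus str), `assertEqual` falls back to a plain `==` comparison and fails only when the result is falsy. The function below is a hypothetical simplification written for illustration, not the actual `unittest` source:

```python
import numpy as np


def simplified_assert_equal(first, second):
    """Hypothetical simplification of unittest's fallback equality check:
    fail only when `not first == second` is true."""
    if not first == second:
        raise AssertionError(f'{first!r} != {second!r}')


def outcome(first, second):
    """Classify a call the way a test runner would report it."""
    try:
        simplified_assert_equal(first, second)
    except AssertionError:
        return 'failure'
    except ValueError:
        return 'error'
    return 'pass'


print(outcome(np.array([['3']]), '3'))            # pass
print(outcome(np.array([['4']]), '3'))            # failure
print(outcome(np.array([['3', '2', '3']]), '3'))  # error
```

Under this model, a mistyped array argument can pass, fail, or raise an error depending only on its shape and contents, which is why checking the type or extracting the scalar before comparing is the safer pattern.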