I am new to unit tests in general and to Python's unittest in particular.
When trying to validate a pandas dataframe df, I typically check whether:
- df is empty (using one of the methods detailed here);
- df contains the expected columns.
I would like to standardize the way I am running these tests.
The pandas documentation lists the available assert functions (assert_frame_equal, assert_series_equal, assert_index_equal and assert_extension_array_equal), but as far as I understand I cannot use those to run the aforementioned tests.
I came up with the following class:
import pandas as pd
import unittest

class DataFrameTestCase(unittest.TestCase):

    def test_if_dataframe_is_empty(self, df):
        self.assertTrue(len(df) > 0)

    def test_if_dataframe_contains_required_columns(self, df, columns):
        self.assertTrue(set(df.columns.to_list()) == set(columns))
The following snippet...
data = [[412256, 142193, 4], [644402, 5208768, 25]]
columns = ['easting', 'northing', 'elevation']
df = pd.DataFrame(data=data, columns=columns)
dataframetestcase = DataFrameTestCase()
dataframetestcase.test_if_dataframe_is_empty(df)
dataframetestcase.test_if_dataframe_contains_required_columns(df, columns)
...does not return any error.
On the other hand, passing an empty dataframe df or a different columns list raises an AssertionError: False is not true error.
Is this the way to proceed, or is there a built-in set of pandas or unittest assert functions that handles this in a better way?
I'll try to show you a standard use of unittest (at least in my opinion) for your goal.
The following is your code with some changes. The name of the script is pandas_test_routine.py:
import pandas as pd
import unittest

data = [[412256, 142193, 4], [644402, 5208768, 25]]
columns = ['easting', 'northing', 'elevation']
# undesired data: it is empty
data_empty = []

class DataFrameTestCase(unittest.TestCase):

    # setUp() is executed before each test
    def setUp(self):
        self.data = data
        self.columns = columns
        self.sut = pd.DataFrame(data=self.data, columns=self.columns)

    def test_if_dataframe_IS_NOT_empty(self):
        self.assertFalse(self.sut.empty)

    def test_if_dataframe_CONTAINS_required_columns(self):
        self.assertTrue(set(self.sut.columns.to_list()) == set(self.columns))

    def test_if_dataframe_IS_empty(self):
        self.data = data_empty
        self.sut = pd.DataFrame(data=self.data, columns=self.columns)
        # A custom error message can be passed as the second argument;
        # assertFalse() replaces its deprecated alias failIf()
        self.assertFalse(self.sut.empty, "data frame is empty")

if __name__ == '__main__':
    unittest.main()
To execute the tests you can do (in a terminal):
/path/to/interpreter/python /path/to/script/pandas_test_routine.py
While the first and second tests pass, the third test fails with the following error:
AssertionError: True is not false : data frame is empty
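Note that failIf() is deprecated: it is an alias of assertFalse() and was removed in Python 3.12. I think a custom message like this one suits your needs, and assertFalse() accepts the same message argument.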
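As a minimal sketch of the same checks without the deprecated alias, the class below uses assertFalse() with a message and assertSetEqual() for the column comparison. The helper failure_message_for_empty_frame() is a hypothetical name I introduce only to show the message that an empty frame produces; it is not part of unittest or pandas.

```python
import unittest

import pandas as pd

DATA = [[412256, 142193, 4], [644402, 5208768, 25]]
COLUMNS = ["easting", "northing", "elevation"]


class DataFrameTestCase(unittest.TestCase):

    def setUp(self):
        self.sut = pd.DataFrame(data=DATA, columns=COLUMNS)

    def test_dataframe_is_not_empty(self):
        # The second argument is appended to the default failure message.
        self.assertFalse(self.sut.empty, "data frame is empty")

    def test_dataframe_has_required_columns(self):
        # On failure, assertSetEqual() lists the items that differ,
        # which is more informative than "False is not true".
        self.assertSetEqual(set(self.sut.columns), set(COLUMNS))


def failure_message_for_empty_frame():
    """Hypothetical helper: return the message raised for an empty frame."""
    case = unittest.TestCase()
    empty = pd.DataFrame(data=[], columns=COLUMNS)
    try:
        case.assertFalse(empty.empty, "data frame is empty")
    except AssertionError as exc:
        return str(exc)
    return None
```

Calling failure_message_for_empty_frame() returns the same text as the error shown above, "True is not false : data frame is empty".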
setUp()
The setUp() method of the class TestCase is useful: it is executed before every test.
In your case setUp() creates the object sut with the correct data.
Note: sut stands for System Under Test (in your case it is an instance of the class DataFrame).
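To make the "before every test" behaviour visible, here is a small sketch in which one test modifies its sut and the next one still sees the full set of columns. The class name SetUpDemoTestCase, the setup_calls counter and the run_demo() helper are illustrative assumptions of mine, not part of your code.

```python
import unittest

import pandas as pd

DATA = [[412256, 142193, 4], [644402, 5208768, 25]]
COLUMNS = ["easting", "northing", "elevation"]


class SetUpDemoTestCase(unittest.TestCase):
    setup_calls = 0  # class-level counter, for illustration only

    def setUp(self):
        type(self).setup_calls += 1
        self.sut = pd.DataFrame(data=DATA, columns=COLUMNS)

    def test_a_drop_column(self):
        # Changes only the sut built for this test.
        self.sut = self.sut.drop(columns=["elevation"])
        self.assertNotIn("elevation", self.sut.columns)

    def test_b_sut_is_rebuilt(self):
        # setUp() ran again, so all columns are present.
        self.assertEqual(list(self.sut.columns), COLUMNS)


def run_demo():
    """Run both tests and return (result, number of setUp() calls)."""
    SetUpDemoTestCase.setup_calls = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SetUpDemoTestCase)
    result = unittest.TestResult()
    suite.run(result)
    return result, SetUpDemoTestCase.setup_calls
```

With two tests in the class, run_demo() reports two calls to setUp(), one per test.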
unittest.main()
The snippet of code:
if __name__ == '__main__':
    unittest.main()
executes all methods of the class DataFrameTestCase whose names start with test.
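The name-based selection can be checked directly with the loader that unittest uses. In this sketch the class NamingDemo and its methods are hypothetical; only getTestCaseNames() comes from unittest.

```python
import unittest


class NamingDemo(unittest.TestCase):

    def test_is_collected(self):
        self.assertTrue(True)

    def helper_is_ignored(self):
        # Never called by the runner: the name does not start with "test".
        raise RuntimeError("should not run")


def collected_names():
    """Return the method names the default loader treats as tests."""
    return list(unittest.TestLoader().getTestCaseNames(NamingDemo))


print(collected_names())  # ['test_is_collected']
```

This is also why helper methods can live in the same class without being run as tests, as long as their names do not start with test.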