validation unit-testing test-data data-scrubbing

Unit Testing data?


Our software manages many data feeds from various sources: real-time replicated databases, files delivered automatically over FTP, scheduled database stored procedures that cache snapshots of data from linked servers, and numerous other methods of acquiring data.

We need to verify and validate this data:

In many ways this is like unit testing: there are many types of check to make, a new check is simply added to the list, and each class of check is re-run in response to a particular event. There are already nice GUIs for running tests, some of which may even be able to schedule them.

Is this a good approach? Are there better, similarly generalised, patterns for data validation?

We're a .NET shop. Would Windows Workflow Foundation (WF) be a better, more flexible solution?


Solution

  • Unit testing is not analogous to what you need to do. It's more along the lines of integration testing or acceptance testing. But that's beside the point.

    Your system has a heavy requirement for validation of data coming into the system. Data comes into the system by various means, and I would assume it needs to be verified in different ways.

    Workflow is good for designing and controlling business processes (logic) that are likely to change or require human intervention. It is agnostic when it comes to the subject of validation. However, hosting your validation process as a workflow may be a good idea, as workflows are designed to be flexible, long-lived and capable of human intervention. Hosting your validation process within a workflow state machine framework would allow you to define validation strategies for different types of data import at runtime.

    You need to design a validation framework that relies heavily on composition over inheritance for its logic. Break down every way that data can be imported into the system and validated into atomic steps. Group those steps by responsibility and create interfaces with the bare minimum of properties and methods required for an implementing object to perform each. Create base classes that are composed of these different interfaces. From this framework you can mix and match implementations that suit the particular import or validation step.

    One last thing. Workflows are serialized to XAML for long-term storage. Your classes should be XAML-serializable as well, to make the transition from activity to repository and back again as smooth and simple as possible.
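
To make the composition idea above concrete, here is a minimal sketch. It is written in Python only for brevity; in a .NET codebase the `Check` protocol would be a small interface and the pipeline a class that holds a list of implementations. Every name here (`Check`, `NotEmpty`, `RequiredField`, `ValidationPipeline`, `PIPELINES`, `validate_feed`, and the feed types `ftp_file` and `replicated_db`) is hypothetical and chosen purely for illustration. None of it comes from WF or any other library.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class Check(Protocol):
    """Smallest interface an atomic validation step needs (hypothetical)."""

    name: str

    def run(self, rows: List[dict]) -> List[str]:
        """Return a list of error messages; an empty list means the check passed."""
        ...


@dataclass
class NotEmpty:
    """Atomic step: the feed must contain at least one row."""

    name: str = "not-empty"

    def run(self, rows: List[dict]) -> List[str]:
        return [] if rows else ["feed contained no rows"]


@dataclass
class RequiredField:
    """Atomic step: every row must carry a non-blank value for one field."""

    field_name: str
    name: str = "required-field"

    def run(self, rows: List[dict]) -> List[str]:
        return [
            f"row {i}: missing {self.field_name!r}"
            for i, row in enumerate(rows)
            if row.get(self.field_name) in (None, "")
        ]


@dataclass
class ValidationPipeline:
    """Composed of checks rather than inheriting from them."""

    checks: List[Check]

    def validate(self, rows: List[dict]) -> List[str]:
        errors: List[str] = []
        for check in self.checks:
            errors.extend(f"[{check.name}] {msg}" for msg in check.run(rows))
        return errors


# Assumed feed types: each import route gets its own mix of checks,
# looked up at runtime rather than hard-coded into a class hierarchy.
PIPELINES: Dict[str, ValidationPipeline] = {
    "ftp_file": ValidationPipeline([NotEmpty(), RequiredField("id")]),
    "replicated_db": ValidationPipeline(
        [RequiredField("id"), RequiredField("updated_at")]
    ),
}


def validate_feed(feed_type: str, rows: List[dict]) -> List[str]:
    """Run the pipeline registered for this feed type."""
    return PIPELINES[feed_type].validate(rows)
```

Adding a new kind of check means writing one small class and listing it in the relevant pipeline. A workflow activity could then call something like `validate_feed` for the import it is handling and route any returned errors to a human for review.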