csv text data-quality

What software is available for data quality checking?


I'm looking to identify software options that allow custom rules for manipulating bulk data files (.csv). Examples include proper capitalization (while letting state abbreviations stay uppercase and preserving unusual surnames), counting occurrences of specific words in a field, and some other custom rules. Any guidance would be appreciated.


Solution

  • You could use Talend Open Studio for this task. It is an open-source ETL tool for data manipulation and integration. For example, you can import the CSV into a database, perform transformations, and export the result back to CSV. Many other workflows are possible.

    You can find it here: http://www.talend.com/products-data-integration/talend-open-studio.php

    It also sounds like you might want to create a profile of the data. For this you can use Talend Open Profiler, which recently added support for flat files such as your .csv. It is simple to use, and you should be up and running in about 30 minutes.

    You can find the download here: http://www.talend.com/products-data-quality/talend-open-profiler.php

    You can find some tutorials here: http://www.talendforge.org/tutorials/menu.php

    On the tutorials page, choose the Data Quality tab and scroll down to 'Talend Open Profiler'.

    Running the profiler is my first step in assessing data quality on a new dataset.
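
To make the import, transform, export flow concrete, here is a minimal Python sketch of the two rules mentioned in the question (capitalization with exceptions, and counting a word in a field). It uses only the standard library and is not how Talend implements these steps. The names `KEEP_UPPER`, `SURNAME_OVERRIDES`, `proper_case`, `count_word`, and `clean_csv`, as well as the sample abbreviations and surnames, are all assumptions made up for illustration.

```python
import csv
import io
import re

# Hypothetical rule data: tokens that stay uppercase, and surnames with fixed casing.
KEEP_UPPER = {"NY", "CA", "TX"}
SURNAME_OVERRIDES = {"mcdonald": "McDonald", "o'brien": "O'Brien"}


def proper_case(text):
    """Capitalize each word, honoring the uppercase and surname exceptions."""
    words = []
    for word in text.split():
        if word.upper() in KEEP_UPPER:
            words.append(word.upper())
        elif word.lower() in SURNAME_OVERRIDES:
            words.append(SURNAME_OVERRIDES[word.lower()])
        else:
            words.append(word.capitalize())
    return " ".join(words)


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def clean_csv(src, dst, case_fields, count_field, word):
    """Read CSV rows from `src`, apply both rules, and write them to `dst`."""
    reader = csv.DictReader(src)
    count_column = word + "_count"  # extra output column holding the word count
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [count_column])
    writer.writeheader()
    for row in reader:
        for field in case_fields:
            row[field] = proper_case(row[field])
        row[count_column] = count_word(row[count_field], word)
        writer.writerow(row)


if __name__ == "__main__":
    # In-memory sample data; for real files, open them with newline="".
    sample = io.StringIO("name,city,notes\njane o'brien,albany ny,urgent urgent\n")
    result = io.StringIO()
    clean_csv(sample, result, ["name", "city"], "notes", "urgent")
    print(result.getvalue())
```

A script like this is enough for a handful of rules on one file. An ETL tool becomes more attractive once the rules multiply or the data has to pass through a database.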