
TFX CSVExampleGen component: How to read data with "|" as separator?


I am trying out TensorFlow's TFX. I have a CSV file that uses the pipe symbol (|) as the field separator instead of the default comma. How can I specify this when using the CsvExampleGen component?

tfx.components.CsvExampleGen(input_base=data_path)

I tried using

example_gen_pb2.Input

but I did not find any read option there for specifying the separator.


Solution

  • Using custom separators is unfortunately not supported by the default CsvExampleGen component.

    You might want to open a GitHub issue and implement the relatively small change yourself (if this is still a problem for you).

    The delimiter is used here and hardcoded as a comma. I suppose you could expand the Input proto with a delimiter property and use it in the executor (with comma remaining the default).
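
    Until something like that exists, one possible workaround (not the change to the executor suggested above, just a preprocessing step) is to rewrite the pipe-separated files as comma-separated ones and point CsvExampleGen at the converted directory. Below is a minimal sketch using only Python's standard csv module. The function name convert_delimiter and the directory names are hypothetical, and it assumes every file in the input directory ends in .csv and that the downstream reader accepts standard CSV quoting.

```python
import csv
from pathlib import Path


def convert_delimiter(src_dir, dst_dir, src_delimiter="|"):
    """Rewrite every *.csv file in src_dir as a comma-delimited file in dst_dir.

    Hypothetical helper for illustration; it is not part of TFX.
    Returns the list of files that were written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src_file in sorted(src_dir.glob("*.csv")):
        dst_file = dst_dir / src_file.name
        # newline="" lets the csv module handle line endings itself.
        with src_file.open(newline="") as fin, dst_file.open("w", newline="") as fout:
            reader = csv.reader(fin, delimiter=src_delimiter)
            # The default writer uses "," and quotes any field containing a comma.
            writer = csv.writer(fout)
            writer.writerows(reader)
        written.append(dst_file)
    return written


# Intended usage (directory names are placeholders):
#   convert_delimiter("data_pipe", "data_comma")
#   tfx.components.CsvExampleGen(input_base="data_comma")
```

    This keeps the pipeline itself unchanged, at the cost of storing a second copy of the data. Fields that contain a comma come out quoted, so it is worth checking a few converted rows before running the pipeline.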