streamsets

Not able to read data from Google Cloud Platform in StreamSets Data Collector


I am trying to create a pipeline in StreamSets Data Collector that reads data from a Google Cloud Storage bucket and writes it back to the same bucket under a different file name.

The data file in the bucket is in JSON format.

I used the Google Cloud Storage origin in StreamSets Data Collector and configured the following properties:

Could someone correct my configuration or suggest an alternative approach?


Solution

  • This is documented in the "Common Prefix, Prefix Pattern, and Wildcards" section of the Google Cloud Storage origin documentation.

    Neither the common prefix nor the prefix pattern should contain the bucket name (that is configured separately) or the protocol. In your case, it looks like you can use something like:
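
To illustrate the general idea of that split, here is a minimal Python sketch that takes a full `gs://` URL and separates it into a bucket name, a common prefix, and a prefix pattern. The URL, the bucket name `my-bucket`, and the helper `split_gcs_url` are all hypothetical and not taken from the question. The sketch also assumes the wildcard appears only in the final path segment, which may not match your layout.

```python
from typing import Tuple


def split_gcs_url(url: str) -> Tuple[str, str, str]:
    """Split a gs:// URL into (bucket, common_prefix, prefix_pattern).

    Hypothetical helper for illustration only; it is not part of
    StreamSets or of any Google Cloud library.
    """
    scheme = "gs://"
    if not url.startswith(scheme):
        raise ValueError("expected a URL starting with gs://")

    # Drop the protocol, then cut the bucket name off the object path.
    bucket, _, object_path = url[len(scheme):].partition("/")

    # Everything up to and including the last "/" is treated as the
    # common prefix; the final segment is treated as the prefix pattern.
    head, sep, pattern = object_path.rpartition("/")
    return bucket, head + sep, pattern


# Made-up example URL: bucket "my-bucket", objects under "data/input/".
bucket, common_prefix, prefix_pattern = split_gcs_url(
    "gs://my-bucket/data/input/*.json"
)
print(bucket)          # my-bucket
print(common_prefix)   # data/input/
print(prefix_pattern)  # *.json
```

With values like these, only `my-bucket` would go in the bucket property, while the other two would go in the common prefix and prefix pattern properties, with no `gs://` and no bucket name in either.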