I am trying Cloud Data Fusion for the first time. As a test, I'd like to consume this endpoint:
https://waidlife.com/backend/export/index/export.csv?feedID=1&hash=4ebfa063359a73c356913df45b3fbe7f (This is a Shopware export)
The header row has the following structure:
id,title,description,link,image_link,price,availability,condition,google_product_category
When configuring the HTTP Source (a plugin available in the Data Fusion Hub), I set up the output schema records for these columns (please note that I set google_product_category to be nullable).
I also configured it to use CSV as the format and to skip the header row.
Now, if you look at the data returned by the API endpoint URL (mentioned above), you will see that the column google_product_category is empty. I would have thought this wouldn't be a problem, because the Output Schema in Data Fusion could simply ignore the value there. However, the pipeline fails with the following error:
2021-02-25 19:38:37,192 - ERROR [Executor task launch worker for task 0:o.a.s.u.Utils@91] - Aborting task
java.lang.RuntimeException: Cannot convert line '"10042","NeoShell Reliance Jacket","Das Filson NeoShell Reliance Jacket besteht aus Polartec NeoShell der aktuell atmungsaktivsten und wasserdichtesten Membrane die es gibt. Im Gegensatz zu gewöhnlichem Shell-Material, ist NeoShell besonders soft und geräuscharm und eignet sich somit auch perfekt für die Jagd. Die Nähte der wasserdichten Reißverschlüsse sind vollständig versiegelt. Die Reißverschlüsse unter den Achseln verhindern, dass sich bei hoher Aktivität Wärme anstaut und sorgen für die richtige Belüftung. Die...","https://www.waidlife.com/regenjacken/neoshell-reliance-jacket","https://www.waidlife.com/media/image/c8/ab/aa/NeoShellRelianceJacketLifestyle2.jpg","366.75 EUR","in stock","new",""' to a record. Reason: 'java.util.NoSuchElementException: null'
at io.cdap.plugin.http.source.batch.HttpBatchSource.transform(HttpBatchSource.java:109) ~[1614281902851-0/:na]
I tried every possible combination of configurations, but I could not figure out why the pipeline won't run successfully.
For reproduction, here is the JSON export of the whole pipeline: https://pastebin.com/0qkvTSvh
This is happening because there are additional , characters within the quoted strings. As of now, we do not support CSV in which quoted fields contain the delimiter. If this is just a test input, I suggest you try string values that do not contain a , character. Null values are supported and should work as expected.
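To make the failure mode concrete, here is a minimal Python sketch of what happens when a line is split on every comma without honoring quotes. It is only an illustration of the general problem, not the plugin's actual code; the sample row is a shortened, hypothetical version of the one in the error message, and the helper names are made up for this example.

```python
import csv
import io

# Shortened, hypothetical row with the same 9 columns as the export.
# The description field (third column) contains a comma inside the quotes.
SAMPLE_LINE = (
    '"10042","NeoShell Reliance Jacket","Besonders soft, geräuscharm",'
    '"https://example.com/p","https://example.com/i.jpg",'
    '"366.75 EUR","in stock","new",""'
)


def naive_split(line):
    """Split on every comma, ignoring quotes (the problematic behavior)."""
    return line.split(",")


def quoted_split(line):
    """Split with a quote-aware CSV parser from the standard library."""
    return next(csv.reader(io.StringIO(line)))


if __name__ == "__main__":
    # The naive split yields one field too many for a 9-column schema.
    print(len(naive_split(SAMPLE_LINE)))
    # The quote-aware parser keeps the description together and returns
    # the empty last column as an empty string.
    print(len(quoted_split(SAMPLE_LINE)))
```

With a quote-unaware split, the number of fields no longer matches the number of columns in the schema, so the row cannot be mapped to a record even though the empty last column is fine on its own.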
I have created a bug for this.
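Until that is fixed, one way to prepare test input without embedded commas is to rewrite the export before serving it to the pipeline. The following is a rough Python sketch under that assumption; the function name strip_embedded_commas and the replacement character are hypothetical, and it assumes the fields contain no embedded quotes or line breaks.

```python
import csv
import io


def strip_embedded_commas(csv_text, replacement=" "):
    """Return csv_text with commas inside field values replaced.

    Assumption: fields contain no embedded quotes or newlines, so the
    rewritten rows need no quoting and split cleanly on ",".
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        # Replace the delimiter inside each value before writing the row.
        writer.writerow([field.replace(",", replacement) for field in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = '"10042","Besonders soft, geräuscharm","new",""\n'
    print(strip_embedded_commas(sample, replacement=";"))
```

The rewritten file would then have to be hosted somewhere the HTTP Source can reach; this changes the data, so it is only suitable for testing.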