apache-camel, integration

Apache Camel 2.17.3 - Exception unmarshalling CSV stream with bindy


I have written a simple route to read a CSV file and save it to a new file in JSON format.

When I try to split and stream the body, the unmarshal step fails with "IllegalArgumentException: No records have been defined in the CSV".

However, it works fine without the split and streaming!

The unmarshal step uses a BindyCsvDataFormat, and CustomCsvRecord defines the fields.

CSV Sample:
HEADER_1;HEADER_2;HEADER_3;HEADER_4;HEADER_5
data11;data12;data13;data14;data15
data21;data22;data23;data24;data25

Can you help me understand whether this is the correct behaviour and, if so, how I can control reading large files?

Please refer to the code below:

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.bindy.csv.BindyCsvDataFormat;

public class MyRouteBuilder extends RouteBuilder {

    @Override
    public void configure() {

        BindyCsvDataFormat bindy = new BindyCsvDataFormat(com.demo.camel.CustomCsvRecord.class);

        from("file://data?move=../completed/&include=.*.csv&charset=UTF-8")
            .log("Reading file..")
            // .split(body().tokenize("\n")).streaming()
            // .throttle(2)
            // .timePeriodMillis(3000)
            .unmarshal(bindy)
            .marshal().json(true)
            .log("writing to file")
            .to("file://target/messages?fileExist=Append");
    }
}

import java.io.Serializable;

import org.apache.camel.dataformat.bindy.annotation.CsvRecord;
import org.apache.camel.dataformat.bindy.annotation.DataField;

@CsvRecord(separator = ";", skipFirstLine = true)
public class CustomCsvRecord implements Serializable {

    private static final long serialVersionUID = -1537445879742479656L;

    @DataField(pos = 1)
    private String header_1;

    @DataField(pos = 2)
    private String header_2;

    @DataField(pos = 3)
    private String header_3;

    @DataField(pos = 4)
    private String header_4;

    @DataField(pos = 5)
    private String header_5;

    public String getHeader_1() {
        return header_1;
    }

    public void setHeader_1(String header_1) {
        this.header_1 = header_1;
    }

    public String getHeader_2() {
        return header_2;
    }

    public void setHeader_2(String header_2) {
        this.header_2 = header_2;
    }

    public String getHeader_3() {
        return header_3;
    }

    public void setHeader_3(String header_3) {
        this.header_3 = header_3;
    }

    public String getHeader_4() {
        return header_4;
    }

    public void setHeader_4(String header_4) {
        this.header_4 = header_4;
    }

    public String getHeader_5() {
        return header_5;
    }

    public void setHeader_5(String header_5) {
        this.header_5 = header_5;
    }   
}

Solution

  • Could it be that you have set skipFirstLine = true? Since you split on the line break, each split exchange contains a single line, so skipping the first line leaves the CSV parser nothing to parse. Try this instead: .split().tokenize("\n", 1000).streaming(). This means we split on the token "\n" and group N lines together per split; here N is 1000, so at most 1000 lines are grouped into one split exchange (see the first sketch below this answer).

    So if you send 10 000 rows, they will be split into 10 chunks.

    Now, the issue is that with skipFirstLine set, the parser skips the first line of whatever it is given. Since you were previously splitting into individual lines, every one-line chunk that reached the CSV parser had its only line skipped, so there was nothing left to parse and it complained that no records were defined.

    The remaining question is what happens when you split, say, 10 000 rows into chunks of 1000: will the first line of every chunk be removed? I would suspect so. The best approach is probably to add a processor before the split: convert the body to a byte[], find the first "\n", and drop everything up to and including that index. Then you can do a normal split and remove skipFirstLine (see the second sketch below this answer).

    Also, your output is a list, but that is due to your mapping.
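
    For reference, a minimal sketch of the route with the suggested grouped tokenizer, reusing the endpoints and data format from the question (the class name and the group size of 1000 are just example values):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.bindy.csv.BindyCsvDataFormat;

public class MyStreamingRouteBuilder extends RouteBuilder {

    @Override
    public void configure() {

        BindyCsvDataFormat bindy = new BindyCsvDataFormat(com.demo.camel.CustomCsvRecord.class);

        from("file://data?move=../completed/&include=.*.csv&charset=UTF-8")
            .log("Reading file..")
            // stream the file and group up to 1000 lines into each split exchange
            .split().tokenize("\n", 1000).streaming()
            .unmarshal(bindy)
            .marshal().json(true)
            .to("file://target/messages?fileExist=Append");
    }
}

    Note that skipFirstLine is still set here, so while the header row is dropped from the first chunk, every later chunk would also lose its first data line, which is exactly the concern raised above.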
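
    And a sketch of the processor idea: strip the header row once, before the split, so skipFirstLine can be removed from CustomCsvRecord. A String body is used instead of the byte[] mentioned above purely for brevity, and the class name is made up for the example:

import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.bindy.csv.BindyCsvDataFormat;

public class HeaderStrippingRouteBuilder extends RouteBuilder {

    @Override
    public void configure() {

        BindyCsvDataFormat bindy = new BindyCsvDataFormat(com.demo.camel.CustomCsvRecord.class);

        from("file://data?move=../completed/&include=.*.csv&charset=UTF-8")
            // drop the header row once, before splitting, so every chunk
            // contains only data lines and skipFirstLine is no longer needed
            .process(new Processor() {
                @Override
                public void process(Exchange exchange) throws Exception {
                    String body = exchange.getIn().getBody(String.class);
                    int firstBreak = body.indexOf('\n');
                    if (firstBreak >= 0) {
                        exchange.getIn().setBody(body.substring(firstBreak + 1));
                    }
                }
            })
            .split().tokenize("\n", 1000).streaming()
            .unmarshal(bindy)
            .marshal().json(true)
            .to("file://target/messages?fileExist=Append");
    }
}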