
Is there a way to generate the output file name from the input data in Benthos?


For example, given this input data:

{"date":"03-11-22", "message":"This is message"},
{"date":"03-30-22", "message":"This is message"},
{"date":"04-03-22", "message":"This is message"},
{"date":"04-15-22", "message":"This is message"},
{"date":"08-18-22", "message":"This is message"},
{"date":"08-28-22", "message":"This is message"}

The output file name should be derived from each record's month, and each record should be written to that month's file.

Expected output: the input above should create 3 files:

032022_data.log
042022_data.log
082022_data.log 

Solution

  • The path field of the file output supports interpolation. Please try the following config:

    input:
      generate:
        mapping: |
          root = [
                   {"date":"03-11-22", "message":"This is message"},
                   {"date":"03-30-22", "message":"This is message"},
                   {"date":"04-03-22", "message":"This is message"},
                   {"date":"04-15-22", "message":"This is message"},
                   {"date":"08-18-22", "message":"This is message"},
                   {"date":"08-28-22", "message":"This is message"}
                 ]
        count: 1
        interval: 0s
      processors:
        - unarchive:
            format: json_array
    
    output:
      file:
        path: ${! json("date").replace_all("-", "") }_data.log
    

    It produces one file per distinct date:

    031122_data.log
    033022_data.log
    040322_data.log
    041522_data.log
    081822_data.log
    082822_data.log
    

    Update: Based on the comments, I believe this pipeline should do what you need:

    input:
      generate:
        mapping: |
          root = [
                   {"date":"03-11-22", "message":"This is message"},
                   {"date":"03-30-22", "message":"This is message"},
                   {"date":"04-03-22", "message":"This is message"},
                   {"date":"04-15-22", "message":"This is message"},
                   {"date":"08-18-22", "message":"This is message"},
                   {"date":"08-28-22", "message":"This is message"}
                 ]
        count: 1
        interval: 0s
      processors:
        - unarchive:
            format: json_array
        - mapping: |
            # Drop the "-DD-" part of the date: "03-11-22" becomes "0322"
            meta month = this.date.re_replace_all("-.*-","")
        - group_by_value:
            value: ${! meta("month") }
        - select_parts:
            parts:
              - 1
    
    output:
      file:
        path: ${! meta("month") }_data.log
    

    It creates these files:

    0322_data.log
    0422_data.log
    0822_data.log
    

    I'm not really sure where that 20 in between the month and the year is supposed to come from, so I left it out.
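
    If the 20 is meant to be the century of a four-digit year, one possible tweak is to have the regex replacement insert it. This is only a sketch of the mapping step, and it assumes every date falls in the 2000s; the rest of the updated pipeline stays unchanged:

        - mapping: |
            # Assumption: all two-digit years belong to the 2000s.
            # The "-DD-" match is replaced by "20": "03-11-22" becomes "032022"
            meta month = this.date.re_replace_all("-.*-", "20")

    With that change, the same path interpolation would name the files 032022_data.log, 042022_data.log and 082022_data.log.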