elasticsearchlogstashelastic-stacklogstash-configurationelk

csv data parsing issue with logstash


I'm trying to upload csv file into logstash.

My csv file has more than 200+ rows, in that one row for the reference given below.

$ cat report.csv

school_code,student_number,join-date,stu-id,school-type,other-details
12345,778890,30-12-2022,frank.van,Private,"[S-Name] Frank Van Puffelen.
[Area/Pin] San Francisco, CA.
[Locality/Status/Population] Northern California.
[City Zipcode#] 012345678
[Relevant Details] Center for flagship.
[Department] Staff services.
[Peninsula] Pacific Ocean.
[Services and resources] ADA information.
[Created By] Alvaro Alfaro.

Activities:Transportation,Parking:All_Access
***** Apply Online: https://stuedu.org/sfg/7894561123*****
"

And my logstash conf file as follows.

input {
  file {
    path => "/usr/share/logstash/pipeline/app.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline { 
    pattern => '^"'
    negate => "true"
    what => "next"
     }
  }
}
filter {
  csv {
      separator => ","
      columns => ["school_code", "student_number", "join-date", "stu-id", "school-type", "other-details"]
      }
    mutate {
      gsub => ["message", '"', '', "message", "\r\n", '']
      strip => ["message", "message"]
     }
}
output {
   elasticsearch {
     action => "index"
     hosts => "http://elasticsearch:9200"
     index => "school-data"
     document_id => "%{school_code}"
      }
stdout {}
}

In logstash log it shows CSV column header value instead of loading the actual value.

logstash         | {
logstash         |               "message" => "school_code,student_number,join-date,stu-id,school-type,other-details12345,778890,30-12-2022,frank.van,Private,[S-Name] Frank Van Puffelen.[Area/Pin] San Francisco, CA.[Locality/Status/Population] Northern California.[City Zipcode#] 012345678[Relevant Details] Center for flagship.[Department] Staff services.[Peninsula] Pacific Ocean.[Services and resources] ADA information.[Created By] Alvaro Alfaro.Activities:Transportation,Parking:All_Access***** Apply Online: https://stuedu.org/sfg/7894561123*****",
logstash         |            "@timestamp" => 2023-06-30T06:15:32.602789093Z,
logstash         |                  "host" => "06605347d1f9",
logstash         |                  "tags" => [
logstash         |         [0] "multiline"
logstash         |     ],
logstash         |         "school_code" => "school_code",
logstash         |          "join-date" => "join-date",
logstash         |     "student_number" => "student_number",
logstash         |              "@version" => "1",
logstash         |                  "path" => "/usr/share/logstash/pipeline/app.csv",
logstash         |                  "stu-id" => "stu-id",
logstash         |        "school-type" => "school-type",
logstash         |           "other-details" => "other-details"
logstash         | }

csv filter has parsed the values from the header row and ignored the remaining columns. It is not expected result.

skip_header => true in filter couldn't help.

Can you advise how to upload my csv file data into logstash? My other-details column contains of double quotes, spaces & special characters.


Solution

  • Tldr;

    It seems like on the first iteration you match multiline match the line with the header and the lines with the actual information for the first row.

    Solution;

    Set up

    Here is my log.csv file

    school_code,student_number,join-date,stu-id,school-type,other-details
    111111,778890,30-12-2022,frank.van,Private,"[S-Name] Frank Van Puffelen.
    [Area/Pin] San Francisco, CA.
    [Locality/Status/Population] Northern California.
    [City Zipcode#] 012345678
    [Relevant Details] Center for flagship.
    [Department] Staff services.
    [Peninsula] Pacific Ocean.
    [Services and resources] ADA information.
    [Created By] Alvaro Alfaro.
    
    Activities:Transportation,Parking:All_Access
    ***** Apply Online: https://stuedu.org/sfg/7894561123*****
    "
    222222,778890,30-12-2022,frank.van,Private,"[S-Name] Frank Van Puffelen.
    [Area/Pin] San Francisco, CA.
    [Locality/Status/Population] Northern California.
    [City Zipcode#] 012345678
    [Relevant Details] Center for flagship.
    [Department] Staff services.
    [Peninsula] Pacific Ocean.
    [Services and resources] ADA information.
    [Created By] Alvaro Alfaro.
    
    Activities:Transportation,Parking:All_Access
    ***** Apply Online: https://stuedu.org/sfg/7894561123*****
    "
    333333,778890,30-12-2022,frank.van,Private,"[S-Name] Frank Van Puffelen.
    [Area/Pin] San Francisco, CA.
    [Locality/Status/Population] Northern California.
    [City Zipcode#] 012345678
    [Relevant Details] Center for flagship.
    [Department] Staff services.
    [Peninsula] Pacific Ocean.
    [Services and resources] ADA information.
    [Created By] Alvaro Alfaro.
    
    Activities:Transportation,Parking:All_Access
    ***** Apply Online: https://stuedu.org/sfg/7894561123*****
    "
    
    

    Solution

    Here is the full pipeline configuration

    input {
       file {
        path => "/tmp/log.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
          pattern => "^([0-9]+,)"
          negate => true
          what => "previous"
          auto_flush_interval => 1
        }
      }
    }
    filter {
      csv {
          separator => ","
          columns => ["school_code", "student_number", "join-date", "stu-id", "school-type", "other-details"]
          skip_header => true
      }
    }
    
    output {
        stdout { codec => rubydebug }
    }
    

    Results

    Which yield the following.

    {
                "stu-id" => "frank.van",
               "message" => "111111,778890,30-12-2022,frank.van,Private,\"[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n\"",
                   "log" => {
            "file" => {
                "path" => "/tmp/log.csv"
            }
        },
              "@version" => "1",
                  "host" => {
            "name" => "22966421108c"
        },
        "student_number" => "778890",
             "join-date" => "30-12-2022",
         "other-details" => "[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n",
           "school_code" => "111111",
           "school-type" => "Private",
                  "tags" => [
            [0] "multiline"
        ],
            "@timestamp" => 2023-07-03T15:26:06.499256031Z,
                 "event" => {
            "original" => "111111,778890,30-12-2022,frank.van,Private,\"[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n\""
        }
    }
    {
                "stu-id" => "frank.van",
               "message" => "222222,778890,30-12-2022,frank.van,Private,\"[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n\"",
                   "log" => {
            "file" => {
                "path" => "/tmp/log.csv"
            }
        },
              "@version" => "1",
                  "host" => {
            "name" => "22966421108c"
        },
        "student_number" => "778890",
             "join-date" => "30-12-2022",
         "other-details" => "[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n",
           "school_code" => "222222",
           "school-type" => "Private",
                  "tags" => [
            [0] "multiline"
        ],
            "@timestamp" => 2023-07-03T15:26:06.501637778Z,
                 "event" => {
            "original" => "222222,778890,30-12-2022,frank.van,Private,\"[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n\""
        }
    }
    {
                "stu-id" => "frank.van",
               "message" => "333333,778890,30-12-2022,frank.van,Private,\"[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n\"",
                   "log" => {
            "file" => {
                "path" => "/tmp/log.csv"
            }
        },
              "@version" => "1",
                  "host" => {
            "name" => "22966421108c"
        },
        "student_number" => "778890",
             "join-date" => "30-12-2022",
         "other-details" => "[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n",
           "school_code" => "333333",
           "school-type" => "Private",
                  "tags" => [
            [0] "multiline"
        ],
            "@timestamp" => 2023-07-03T15:26:07.972278609Z,
                 "event" => {
            "original" => "333333,778890,30-12-2022,frank.van,Private,\"[S-Name] Frank Van Puffelen.\n[Area/Pin] San Francisco, CA.\n[Locality/Status/Population] Northern California.\n[City Zipcode#] 012345678\n[Relevant Details] Center for flagship.\n[Department] Staff services.\n[Peninsula] Pacific Ocean.\n[Services and resources] ADA information.\n[Created By] Alvaro Alfaro.\nActivities:Transportation,Parking:All_Access\n***** Apply Online: https://stuedu.org/sfg/7894561123*****\n\""
        }
    }