elasticsearch, logstash, kibana, filebeat, logstash-forwarder

Facing an issue while sending data from Filebeat to multiple Logstash files


To be precise, I am handling a log file which has almost millions of records. Since it is a billing summary log, customer information is recorded in no particular order.
I am using customized grok patterns and the Logstash XML filter plugin to extract enough data to track activity. To track individual customer activities, I am using "Customer_ID" as a unique key. So even though I am using multiple Logstash files and multiple grok patterns, all of a customer's information can be bound/aggregated using the "Customer_ID" (unique key).

Here is a sample from my log file:
7-04-2017 08:49:41 INFO abcinfo (ABC_RemoteONUS_Processor.java52) - Customer_Entry :::<?xml version="1.0" encoding="UTF-8"?><ns2:ReqListAccount xmlns:ns2="http://vcb.org/abc/schema/"/"><Head msgId="1ABCDEFegAQtQOSuJTEs3u" orgId="ABC" ts="2017-04-27T08:49:51+05:30" ver="1.0"/><Cust id="ABCDVFR233cd662a74a229002159220ce762c" note="Account CUST Listing" refId="DCVD849512576821682" refUrl="http://www.ABC.org.in/" ts="2017-04-27T08:49:51+05:30"

My grok pattern:

grok {
  patterns_dir => "D:\elk\logstash-5.2.1\vendor\bundle\jruby\1.9\gems\logstash-patterns-core-4.0.2\patterns"
  match => [ "message", "%{DATESTAMP:datestamp} %{LOGLEVEL:Logseverity}\s+%{WORD:ModuleInfo} \(%{NOTSPACE:JavaClass}\)%{ABC:Customer_Init}%{GREEDYDATA:Cust}" ]
  add_field => { "Details" => "Request" }
  remove_tag => ["_grokparsefailure"]
}

My customized pattern, which is stored inside the patterns_dir:

ABC ( - Customer_Entry :::)

My XML filter plugin:

xml {
  source => "Cust"
  store_xml => false
  xpath => [
    "//Head/@ts", "Cust_Req_time",
    "//Cust/@id", "Customer_ID",
    "//Cust/@note", "Cust_note"
  ]
}

So whatever details come after " - Customer_Entry :::", I am able to extract using the XML filter plugin (stored similarly to a multiline codec). I have written 5 different Logstash files to extract different customer activities, with 5 different grok patterns covering:

1. Customer_Entry
2. Customer_Purchase
3. Customer_Last_Purchase
4. Customer_Transaction
5. Customer_Authorization

All of the above grok patterns capture a different set of information, which is grouped by Customer_ID as I said earlier (a sketch of the corresponding marker patterns is shown below).
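
For illustration only, the five markers stored in the patterns_dir might look something like this; only the ABC / Customer_Entry marker appears above, so the other four names and markers are assumptions:

# hypothetical entries in the custom patterns file
ABC ( - Customer_Entry :::)
ABC_PURCHASE ( - Customer_Purchase :::)
ABC_LAST_PURCHASE ( - Customer_Last_Purchase :::)
ABC_TRANSACTION ( - Customer_Transaction :::)
ABC_AUTH ( - Customer_Authorization :::)

Each of the five Logstash files then references its own marker in the grok match and feeds its own GREEDYDATA capture into the xml filter.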

I am able to extract the information and visualize it clearly in Kibana, without any flaw, by using my customized patterns with the different log files.

Since I have hundreds of log files to feed into Logstash each and every day, I opted for Filebeat, but Filebeat runs with only one port, "5044". I tried to run it with 5 different ports for 5 different Logstash files, but that was not working: only one of the 5 Logstash config files was getting loaded, and the rest of the config files stayed idle.
Here is a sample of my Filebeat output configuration:

output.logstash:
  hosts: ["localhost:5044"]

output.logstash:
  hosts: ["localhost:5045"]

output.logstash:
  hosts: ["localhost:5046"]

I couldn't add all the grok patterns in one Logstash config file, because the XML filter plugin takes its source from the GREEDYDATA capture; in that case I would have 5 different source => values for the 5 different grok patterns. I even tried that too, but it was not working. (A rough sketch of what such a combined config would involve is shown below.)
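
For reference, a combined config would need conditionals to route each activity to an xml filter reading its own source field. This is only a sketch under the assumptions above (the ABC_PURCHASE pattern and the Purchase field are hypothetical, and only two of the five branches are shown), not the exact config that failed:

filter {
  grok {
    patterns_dir => "D:\elk\logstash-5.2.1\vendor\bundle\jruby\1.9\gems\logstash-patterns-core-4.0.2\patterns"
    # grok stops at the first pattern that matches (break_on_match defaults to true)
    match => { "message" => [
      "%{DATESTAMP:datestamp} %{LOGLEVEL:Logseverity}\s+%{WORD:ModuleInfo} \(%{NOTSPACE:JavaClass}\)%{ABC:Customer_Init}%{GREEDYDATA:Cust}",
      "%{DATESTAMP:datestamp} %{LOGLEVEL:Logseverity}\s+%{WORD:ModuleInfo} \(%{NOTSPACE:JavaClass}\)%{ABC_PURCHASE:Customer_Init}%{GREEDYDATA:Purchase}"
    ] }
  }

  # route each activity to the xml filter that reads its own source field
  if [Cust] {
    xml {
      source    => "Cust"
      store_xml => false
      xpath     => [ "//Cust/@id", "Customer_ID" ]
    }
  } else if [Purchase] {
    xml {
      source    => "Purchase"
      store_xml => false
      xpath     => [ "//Cust/@id", "Customer_ID" ]
    }
  }
}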

I am looking for a better approach.


Solution

  • Sounds like you're looking for scale, with parallel ingestion. As it happens, Filebeat supports something called load balancing, which sounds like what you're looking for.

    output.logstash:
      hosts: [ "localhost:5044", "localhost:5045", "localhost:5046" ]
      loadbalance: true
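
    With loadbalance: true, Filebeat spreads events across all of the listed hosts rather than treating the extra hosts as failover only. Each of those ports also needs its own Logstash pipeline (or instance) listening with a beats input; this is just a sketch mirroring the ports above:

    input {
      beats {
        port => 5044   # the other pipelines would listen on 5045 and 5046
      }
    }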
    

    That's for the outputs. Though, I believe you wanted multithreading on the input. Filebeat is supposed to track all files specified in the prospector config, but you've found limits. Globbing or specifying a directory will single-thread the files in that glob/directory. If your file names support it, creative globbing may get you better parallelism by defining multiple globs in the same directory.

    Assuming your logs are coming in by type:

    - input_type: log
      paths:
        - /mnt/billing/*entry.log
        - /mnt/billing/*purchase.log
        - /mnt/billing/*transaction.log
    

    This would enable multiple prospector threads, reading these files in parallel.

    If your logs were coming in with random names, you could use a similar setup:

    - input_type: log
      paths:
        - /mnt/billing/a*
        - /mnt/billing/b*
        - /mnt/billing/c*
        [...]
        - /mnt/billing/z*
    

    If you are processing lots of files with unique names that never repeat, adding the clean_inactive config option to your prospectors will keep your Filebeat running fast.

    - input_type: log
      ignore_older: 18h
      clean_inactive: 24h
      paths:
        - /mnt/billing/a*
        - /mnt/billing/b*
        - /mnt/billing/c*
        [...]
        - /mnt/billing/z*
    

    This will remove all state for files older than 24 hours, and won't bother processing any file more than 18 hours old.