javaregexregex-lookaroundsregex-negationrubular

Regex to handle all Multiline exception in rubular fluentd


I designed the regex to match the all multiline exception or warning message field for fluentd parser in rubular format as below

(SLF4J:\s.*|[a-zA-z_]*\..*\.*\s.*\s.*|Caused\sby:\s|\s+at\s.*|\s+\.\.\. (\d)+ more)

It matches unnecessary fields.

I want to match all start of exception or warning multiline. In short: The most recent multiline will be read from the beginning of the file unitl it gets a next line as JSON.JSON always starts with {" togather. when we see lines begings with {" we will stop reading multiline

one regex for both the cases or 2 regex for both the cases is fine

Demo link

regex is available at: https://rubular.com/r/O26Wm6mc7z51re

regex is available at: https://rubular.com/r/v6Q7iwZqmNDAAx

Test Strings is :

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more
{"log_timestamp": "2021-02-18T11:33:23.114+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled)", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "PeerState set to LOOKING"}
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.intam.svc.cluster.local"}
java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local
        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)
        at org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:764)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:699)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
        at java.lang.Thread.run(Thread.java:748)
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.sxc.svc.cluster.local"}

Expected Match : For demo1: https://rubular.com/r/O26Wm6mc7z51re

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more

For demo2 :https://rubular.com/r/v6Q7iwZqmNDAAx

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type 

Solution

  • You might get both parts using a single pattern with a capture group and a backreference

    ^(SLF4J:|java\.lang\.InterruptedException:).*(?:\R(?!\1|{).*)*
    

    The pattern matches:

    Regex demo

    See the rubular match for the first part and the second part.

    Note that in Java to double the backslashes

    String regex = "^(SLF4J:|java\\.lang\\.InterruptedException:).*(?:\\R(?!\\1|\\{).*)*";
    

    To not cross SLF4J or different types of Exceptions denoted as dot separated strings at the start of the string:

    ^(?:SLF4J:|\w+(?:\.\w+)+).*(?:\R(?!(?:SLF4J:|\w+(?:\.\w+)+)|{).*)*
    

    Regex demo