I want to write a syslog parser that transforms my syslog messages, which carry JSON-like data in a key=value format, into a .txt file that I can import into FortiSIEM. FortiSIEM is very picky about which syslog formats it accepts, and I can't get it to parse the original syslog, hence the idea of simplifying the log before it reaches the SIEM.
I have done some testing with pyparsing, but I don't really know how to use it. My output file is being created, but it comes out blank.
I don't think I can share the real syslog, so here is a very rough example of what it looks like:
<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}
The script that I have come up with:
from pyparsing import Word, Suppress, alphanums, CharsNotIn, ZeroOrMore, Dict
# Define header
priority = Suppress("<") + Word(alphanums) + Suppress(">")
version = Word(alphanums) + Suppress(" ")
timestamp = CharsNotIn(" ") + Suppress(" ")
hostname = CharsNotIn(" ") + Suppress(" ")
appname = CharsNotIn(" ") + Suppress(" ")
procid = CharsNotIn(" ") + Suppress(" ")
msgid = CharsNotIn("\n")
header = priority + version + timestamp + hostname + appname + procid + msgid
# Define key-value pairs
key = Word(alphanums + "_")
value = CharsNotIn("\n")
pair = key + Suppress("=") + value
kv_pairs = Dict(pair + ZeroOrMore(Suppress(",") + pair))
# Define message format
message = header + Suppress(" ") + kv_pairs
# Open input and output files
with open("syslog.txt") as input_file, open("syslog_output.txt", "w") as output_file:
    for line in input_file:
        try:
            # Convert to key-value format
            parsed_message = message.parseString(line.strip())
            kv_message = " ".join([f"{key}={value}" for key, value in parsed_message.items()])
            # Write the message to the output file
            output_file.write(parsed_message + "\n")
        except Exception as e:
            print(f"Failed to parse line: {line} with error: {e}")
            continue
I get two exceptions when I run the script. I also printed the header and message expressions. Outputs:
Failed to parse line: "Whole Syslog Text"
with error: Expected ' ', found '2022' (at char 7), (line:1, col:8)
Failed to parse line:
with error: Expected '<' (at char 0), (line:1, col:1)
Header: {Suppress:('<') W:(0-9A-Za-z) Suppress:('>') W:(0-9A-Za-z) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:(
)}
Message: {Suppress:('<') W:(0-9A-Za-z) Suppress:('>') W:(0-9A-Za-z) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:(
) Suppress:(' ') Dict:({W:(0-9A-Z_a-z) Suppress:('=') !W:(
) [{Suppress:(',') W:(0-9A-Z_a-z) Suppress:('=') !W:(
)}]...})}
I want my output_file to look like this:
<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation -
key=value
key=value
key=value
...
I need to keep the header so that I can identify the log type in FortiSIEM.
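As I mentioned in the comment, pyparsing skips whitespace by default, so all the + Suppress(" ") terms should be removed. By the time Suppress(" ") is tried, the space has already been skipped, which is why you get Expected ' ', found '2022'. CharsNotIn is an exception to the whitespace-skipping rule; I find that Word(printables) works better.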
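To make the whitespace point concrete, here is a minimal sketch comparing the two styles on a shortened header. It assumes pyparsing 3.x (for parse_string); the names strict and relaxed are only for illustration and are not part of your script.

```python
from pyparsing import ParseException, Suppress, Word, alphanums, printables

line = "<140>1 2022-05-02T08:31:22.478Z platform"

priority = Suppress("<") + Word(alphanums) + Suppress(">")

# Original style: explicitly require a space after the version.
# pyparsing skips the space before trying Suppress(" "), so it never matches.
strict = priority + Word(alphanums) + Suppress(" ") + Word(printables)

# Revised style: no explicit space terms; whitespace between tokens is skipped.
relaxed = priority + Word(alphanums) + Word(printables) + Word(printables)

try:
    strict.parse_string(line)
    strict_ok = True
except ParseException as err:
    strict_ok = False
    print(f"strict failed: {err}")

relaxed_tokens = relaxed.parse_string(line).as_list()
print(relaxed_tokens)
```

The strict version should fail with the same kind of "Expected ' '" error as in the question, while the relaxed version returns the priority, version, timestamp and hostname as separate tokens.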
I replaced your timestamp, hostname, etc. terms with Word(printables), like this:
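from pyparsing import Suppress, Word, alphanums, printables, rest_of_line
# priority and version as in the question, minus the Suppress(" ") term
priority = Suppress("<") + Word(alphanums) + Suppress(">")
version = Word(alphanums)
timestamp = Word(printables)
hostname = Word(printables)
appname = Word(printables)
procid = Word(printables)
msgid = rest_of_line
header = priority + version + timestamp + hostname + appname + '-' + procid + '-' + msgid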
I used this code to test the parser:
header.run_tests("""\
<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}
""")
and got this:
<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}
['140', '1', '2022-05-02T08:31:22.478Z', 'platform', 'dataexport', '-', 'syslog_variation', '-', ' {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}']
You'll have to refine your definition of the key-value pairs. Use pyparsing's QuotedString('"') for the key, since it is some value in quotes. For value, you'll need to be more careful to read only up to the next comma or }, not all the way to the \n at the end of the line.
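As a rough sketch of how that refinement could look, the snippet below parses the header, then scans the rest of the line for "key"=value pairs and emits one pair per line. Several things here are assumptions, not facts about your real logs: pyparsing 3.x, unquoted values that contain no spaces, the invented keys in sample, and the helper name simplify. It also sidesteps the nesting by using search_string to pick up pairs at any depth instead of modelling the info:{...} groups.

```python
from pyparsing import (
    QuotedString, Suppress, Word, alphanums, printables, rest_of_line,
)

# Header grammar without explicit Suppress(" ") terms.
priority = Suppress("<") + Word(alphanums) + Suppress(">")
version = Word(alphanums)
field = Word(printables)
header = (priority + version + field + field + field
          + "-" + field + "-" + rest_of_line("body"))

# A key is a double-quoted string. A value is either quoted or runs up to
# the next comma or closing brace (assumption: no spaces in unquoted values).
key = QuotedString('"')
value = QuotedString('"') | Word(printables, exclude_chars=",}")
pair = key + Suppress("=") + value


def simplify(line):
    """Return the header text followed by one key=value line per pair."""
    body = header.parse_string(line)["body"]
    # rest_of_line runs to the end of the line, so body is a suffix of line.
    header_text = line[: len(line) - len(body)].rstrip()
    # search_string scans the whole body, so pairs inside nested groups
    # are found without describing the nesting in the grammar.
    pairs = [f"{k}={v}" for k, v in pair.search_string(body)]
    return "\n".join([header_text] + pairs)


# Made-up sample data in the shape of the example from the question.
sample = ('<140>1 2022-05-02T08:31:22.478Z platform dataexport - '
          'syslog_variation - {"id"=42, info:{"user"=alice, "action"=login}, '
          'info2:{"src"=10.0.0.1},"status"=ok}')

print(simplify(sample))
```

In your file loop you would call simplify(line.strip()) and write the returned string plus "\n" to output_file. If you need to keep track of which group (info, info2) a pair came from, the grammar would have to describe the nesting explicitly instead of scanning.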