Goal: I want to compare two Suricata rule files and, in file2, comment out the lines (alerts, identified by their "SIDs") that are commented out in file1, unless they are already commented out there. I understand there is a better way to do this with the Suricata threshold file, but unfortunately that isn't an option for me, for reasons beyond what I can explain here. This is meant to make updating the rules easier: a rule's content may be updated, but its SID stays the same across both files.
I'm not sure where to start.
Sample file1 text:
alert $home_net any > $External_net any (msg: example; content: something; sid: 12345; rev:1)
#alert $home_net any > $External_net any (msg: example; content: something; sid: 67895; rev:1)
alert $home_net any > $External_net any (msg: example; content: something; sid: 18975; rev:1)
Sample file2 text:
alert $home_net any > $External_net any (msg: example; content: something; sid: 12345; rev:1)
<insert #>alert $home_net any > $External_net any (msg: example; content: something; sid: 67895; rev:1)
alert $home_net any > $External_net any (msg: example; content: something; sid: 18975; rev:1)
Edit: The provided solution works with the initial sample data above; however, it doesn't work with the actual signatures, so I'm providing real signatures below. Also, the rules may or may not have blank lines (whitespace) between them.
Sample file1 text:
#alert tcp $EXTERNAL_NET any -> $HOME_NET 2200 (msg:"ET EXPLOIT CA BrightStor ARCserve Mobile Backup LGSERVER.EXE Heap Corruption"; flow:established,to_server; content:"|4e 3d 2c 1b|"; depth:4; isdataat:2891,relative; reference:cve,2007-0449; reference:url,doc.emergingthreats.net/bin/view/Main/2003369; classtype:attempted-admin; sid:2003369; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
alert udp $EXTERNAL_NET any -> $HOME_NET 111 (msg:"ET EXPLOIT Computer Associates Brightstor ARCServer Backup RPC Server (Catirpc.dll) DoS"; content:"|00 00 00 00|"; offset:4; depth:4; content:"|00 00 00 03|"; distance:8; within:4; content:"|00 00 00 08|"; distance:0; within:4; content:"|00 00 00 00|"; distance:0; within:4; content:"|00 00 00 00|"; distance:4; within:4; content:"|00 00 00 00 00 00 00 00|"; distance:8; within:32; reference:url,www.milw0rm.com/exploits/3248; reference:url,doc.emergingthreats.net/bin/view/Main/2003370; classtype:attempted-dos; sid:2003370; rev:3; metadata:created_at 2010_07_30, updated_at 2020_08_20;)
#alert tcp $EXTERNAL_NET any -> $HOME_NET 1900 (msg:"ET EXPLOIT Computer Associates Mobile Backup Service LGSERVER.EXE Stack Overflow"; flow:established,to_server; content:"0000033000"; depth:10; isdataat:1000,relative; reference:url,www.milw0rm.com/exploits/3244; reference:url,doc.emergingthreats.net/bin/view/Main/2003378; classtype:attempted-admin; sid:2003378; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
Sample file2 text:
#alert tcp $EXTERNAL_NET any -> $HOME_NET 2200 (msg:"ET EXPLOIT CA BrightStor ARCserve Mobile Backup LGSERVER.EXE Heap Corruption"; flow:established,to_server; content:"|4e 3d 2c 1b|"; depth:4; isdataat:2891,relative; reference:cve,2007-0449; reference:url,doc.emergingthreats.net/bin/view/Main/2003369; classtype:attempted-admin; sid:2003369; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
alert udp $EXTERNAL_NET any -> $HOME_NET 111 (msg:"ET EXPLOIT Computer Associates Brightstor ARCServer Backup RPC Server (Catirpc.dll) DoS"; content:"|00 00 00 00|"; offset:4; depth:4; content:"|00 00 00 03|"; distance:8; within:4; content:"|00 00 00 08|"; distance:0; within:4; content:"|00 00 00 00|"; distance:0; within:4; content:"|00 00 00 00|"; distance:4; within:4; content:"|00 00 00 00 00 00 00 00|"; distance:8; within:32; reference:url,www.milw0rm.com/exploits/3248; reference:url,doc.emergingthreats.net/bin/view/Main/2003370; classtype:attempted-dos; sid:2003370; rev:3; metadata:created_at 2010_07_30, updated_at 2020_08_20;)
<insert #>alert tcp $EXTERNAL_NET any -> $HOME_NET 1900 (msg:"ET EXPLOIT Computer Associates Mobile Backup Service LGSERVER.EXE Stack Overflow"; flow:established,to_server; content:"0000033000"; depth:10; isdataat:1000,relative; reference:url,www.milw0rm.com/exploits/3244; reference:url,doc.emergingthreats.net/bin/view/Main/2003378; classtype:attempted-admin; sid:2003378; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
First, examine the first file and find out what sids are commented out:
sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1
The above command prints the sid of each line that begins with a #, one sid per line. Now let's join those lines into a single list of sids separated by |:
sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1 | paste -sd '|'
Fine, now we have sid1|sid2|...|sidN. As written, this can be used directly as an extended regex (an alternation of all the sids) to identify the lines in file2 that need to be commented out. Let's put this regex in a variable:
sid_regex=$(sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1 | paste -sd '|')
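To check the first two steps against the real signatures from the question, the sketch below pipes three trimmed copies of those rules into the same sed | paste pipeline instead of reading file1. The "..." in each msg and the dropped options are an assumption for brevity; only the leading # and the sid:N option matter to the pipeline.

```shell
#!/bin/sh
# Trimmed stand-ins for the three rules of file1 (options shortened for brevity).
sid_regex=$(printf '%s\n' \
  '#alert tcp $EXTERNAL_NET any -> $HOME_NET 2200 (msg:"ET EXPLOIT ..."; classtype:attempted-admin; sid:2003369; rev:3;)' \
  'alert udp $EXTERNAL_NET any -> $HOME_NET 111 (msg:"ET EXPLOIT ..."; classtype:attempted-dos; sid:2003370; rev:3;)' \
  '#alert tcp $EXTERNAL_NET any -> $HOME_NET 1900 (msg:"ET EXPLOIT ..."; classtype:attempted-admin; sid:2003378; rev:3;)' \
  | sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' \
  | paste -sd '|' -)   # the trailing '-' makes paste read stdin on BSD/macOS as well as GNU

echo "$sid_regex"   # prints 2003369|2003378
```

The rule with sid 2003370 is active in file1, so it is left out of the regex.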
Now we can modify file2 so that every line that 1) has a sid matching the regex and 2) doesn't already begin with # is commented out:
sed -E "/sid:($sid_regex);/ s/^[^#]/#&/" file2 > file2.new
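The s/^[^#]/#&/ part is what keeps already-commented lines from getting a second #: it only matches when the first character exists and is not a #. The sketch below runs the same command on a few hypothetical lines (an active rule to disable, an already-commented rule, a blank line, and a rule whose sid is not in the list) with a hard-coded regex.

```shell
#!/bin/sh
# Hypothetical regex, as produced by the previous step.
sid_regex='2003369|2003378'
result=$(printf '%s\n' \
  'alert tcp any any -> any any (sid:2003378; rev:3;)' \
  '#alert tcp any any -> any any (sid:2003369; rev:3;)' \
  '' \
  'alert udp any any -> any any (sid:2003370; rev:3;)' \
  | sed -E "/sid:($sid_regex);/ s/^[^#]/#&/")
# Line 1 gains a '#', line 2 keeps its single '#',
# the blank line and the sid:2003370 rule are printed unchanged.
printf '%s\n' "$result"
```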
Voilà! To sum it up:
$ sid_regex=$(sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1 | paste -sd '|')
$ sed -E "/sid:($sid_regex);/ s/^[^#]/#&/" file2 > file2.new
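If you run this regularly, the two commands can be wrapped in a small function. The sketch below uses a hypothetical name, comment_sids, and adds one guard: when file1 has no commented-out rules the regex would be empty, giving an empty "()" group, which POSIX leaves undefined, so in that case file2 is passed through unchanged. The sample files are hypothetical and show that an updated rule (different rev) is still matched by its sid.

```shell
#!/bin/sh
# comment_sids FILE1 FILE2 (hypothetical helper): prints FILE2 with every
# sid that is commented out in FILE1 also commented out.
comment_sids() {
  sid_regex=$(sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' "$1" | paste -sd '|' -)
  if [ -z "$sid_regex" ]; then
    # Nothing commented out in FILE1: avoid an empty "()" group.
    cat "$2"
  else
    sed -E "/sid:($sid_regex);/ s/^[^#]/#&/" "$2"
  fi
}

# Usage with two small hypothetical files in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' '#alert tcp any any -> any any (sid:111; rev:1;)' \
              'alert tcp any any -> any any (sid:222; rev:1;)' > "$dir/file1"
printf '%s\n' 'alert tcp any any -> any any (sid:111; rev:2;)' '' \
              'alert tcp any any -> any any (sid:222; rev:2;)' > "$dir/file2"
comment_sids "$dir/file1" "$dir/file2" > "$dir/file2.new"
cat "$dir/file2.new"
```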
[update] You have so many commented-out lines that the resulting regex makes the sed command line too long ("Argument list too long"). Let's try another approach: instead of building a one-line sed program with a gigantic regex, we will build a multi-line sed program, with one line per sid, and hand it to sed through a pipe instead of as a command-line argument.
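To see how quickly the regex grows, the sketch below builds it from a hypothetical file1 with 20000 commented-out rules (generated with seq, which is common but not POSIX) and reports its size. The 131072-byte threshold is an assumption about Linux: execve(2) documents a per-argument cap, MAX_ARG_STRLEN, of 32 pages, which is 128 KiB with 4 KiB pages. The check only measures the regex; the full sed argument is a few bytes longer.

```shell
#!/bin/sh
# Hypothetical file1: 20000 commented-out rules with 7-digit sids.
dir=$(mktemp -d)
seq 2000001 2020000 | sed 's/.*/#alert tcp any any -> any any (sid:&; rev:1;)/' > "$dir/file1"
sid_regex=$(sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' "$dir/file1" | paste -sd '|' -)
# tr strips the padding that BSD wc puts before the number.
size=$(printf '%s' "$sid_regex" | wc -c | tr -d ' ')
echo "regex size: $size bytes"
limit=131072   # assumption: Linux MAX_ARG_STRLEN with 4 KiB pages
if [ "$size" -gt "$limit" ]; then
  echo "larger than a single argument may be on Linux ($limit bytes)"
fi
```

Here the regex is 20000 seven-digit sids plus 19999 separators, 159999 bytes in total, which is already past that threshold.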
This first sed command generates the second sed program:
sed -En '/^#/ s|.*(sid:[0-9]+;).*|/\1/ s/^[^#]/#\&/|p' file1
The result should be something like:
/sid:111;/ s/^[^#]/#&/
/sid:222;/ s/^[^#]/#&/
...
/sid:123456;/ s/^[^#]/#&/
Now we feed that program to a second sed to process file2:
sed -En '/^#/ s|.*(sid:[0-9]+;).*|/\1/ s/^[^#]/#\&/|p' file1 | sed -f - file2 > file2.new
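A different technique, not the sed approach above, is to let awk read file1 first, store the sid of every commented-out rule in an associative array, and then comment out matching active rules of file2 in the same pass. No regex or program is generated, so the number of sids is bounded only by memory, and each line needs one lookup instead of one regex test per sid. The sketch below is under those assumptions, with a hypothetical function name, comment_sids_awk. It also tolerates a space after "sid:", as in the question's first sample. FILENAME == ARGV[1] is used instead of the common NR == FNR idiom, which misbehaves when file1 is empty.

```shell
#!/bin/sh
# comment_sids_awk FILE1 FILE2 (hypothetical helper, awk alternative).
comment_sids_awk() {
  awk '
    # Return the sid number found in a rule line, or an empty string.
    function sid_of(line,    s) {
      if (match(line, /sid:[ ]*[0-9]+;/)) {
        s = substr(line, RSTART, RLENGTH)
        gsub(/[^0-9]/, "", s)
        return s
      }
      return ""
    }
    # First file: remember the sids of commented-out rules.
    FILENAME == ARGV[1] { if ($0 ~ /^#/ && (s = sid_of($0)) != "") off[s] = 1; next }
    # Second file: comment out active rules whose sid was remembered.
    $0 !~ /^#/ && (s = sid_of($0)) != "" && (s in off) { $0 = "#" $0 }
    { print }
  ' "$1" "$2"
}

# Usage with hypothetical files (note "sid: 111" with a space in file1).
dir=$(mktemp -d)
printf '%s\n' '#alert tcp any any -> any any (sid: 111; rev:1;)' \
              'alert tcp any any -> any any (sid:222; rev:1;)' > "$dir/file1"
printf '%s\n' 'alert tcp any any -> any any (sid:111; rev:2;)' '' \
              '#alert tcp any any -> any any (sid:333; rev:1;)' \
              'alert tcp any any -> any any (sid:222; rev:2;)' > "$dir/file2"
comment_sids_awk "$dir/file1" "$dir/file2" > "$dir/file2.new"
cat "$dir/file2.new"
```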