Tags: linux, bash, sorting, awk

bash: how to sort log lines according to a specific field


The goal is to extract the numeric part of cost:xxxms and sort the entire log lines by that number, in ascending order.

Example log:

/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:377ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:507ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:337ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:407ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data

Example of expected output (sorted by cost, lowest first):

/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:337ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:377ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:407ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:507ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data

We have tried the following commands, but without success:

grep 'cost:' log_file.txt | awk -F'cost:' '{print $0, $2}' | awk -F'ms' '{print $0, $1}' | sort -t' ' -k2,2nr
grep 'cost:' log_file.txt | awk -F'cost:' '{gsub("ms", "", $2); print $0, $2}' | sort -t' ' -k2,2nr
grep 'cost:' log_file.txt | awk -F'cost:' '{gsub("ms", "", $2); print $0, $2}' | sort -t' ' -k2,2nr | cut -d' ' -f1-
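A short sketch of why those attempts fail, using one line of the sample log (the variable name line is only for illustration). With -t' ', the key -k2,2 is the second space-separated field of the whole line, which is the timestamp rather than the cost appended to the end. Separately, changing a field in awk (which gsub on $2 does) makes awk rebuild $0 joined with OFS, so the printed line itself is altered. The r in -k2,2nr would also sort in descending order, while the expected output is ascending.

```shell
#!/usr/bin/env bash
# One line from the sample log above (illustrative variable name)
line='/var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:377ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data'

# 1) sort -t' ' -k2,2 keys on the 2nd space-separated field of the whole
#    line, which is the timestamp, not the cost appended at the end.
#    With -n, "11:56:57,334" is read as 11, so every sample line gets the same key.
printf '%s\n' "$line" | cut -d' ' -f2

# 2) Modifying $2 (here via gsub) makes awk rebuild $0 joined with OFS
#    (a single space), so "cost:" disappears and "300ms" becomes "300".
printf '%s\n' "$line" | awk -F'cost:' '{gsub("ms", "", $2); print $0}'
```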

Solution

  • Use any awk, plus sort and cut, to implement a Decorate/Sort/Undecorate approach (prefix each line with its cost, sort on that prefix, then strip the prefix off again). This works even if the number of :s before cost: varies in your input:

    $ awk -v OFS='\t' 'match($0,/ cost:[0-9]+ms /) {print substr($0,RSTART+6)+0, $0}' file |
        sort -k1,1n | cut -f2-
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:95ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:337ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:377ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:407ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:507ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    

    The above was run on the following sample input, which is the OP's posted sample with an additional cost:95ms line appended to the end. That line is needed to confirm that the costs are sorted numerically rather than alphabetically (alphabetically, 95 would sort after 507 because 9 comes after 5):

    $ cat file
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:377ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:507ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:337ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:407ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
    /var/log/hadoop/hdfs/hadoop-hdfs-datanode-datanode01.star.com.log.7:2025-04-24 11:56:57,334 WARN  datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:95ms (threshold=300ms), volume=/data/sdg/hadoop/hdfs/data
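If every line is guaranteed to have the same number of :s before the cost, as in the sample above (one after log.7, two in the timestamp, one in BlockReceiver.java:receivePacket, and the one in cost:), a simpler sketch is to let sort split on : itself and compare the 6th field numerically. This relies entirely on that fixed-colon assumption: it gives wrong results as soon as the count changes, for example when grep runs on a single file and therefore omits the filename: prefix, which is exactly the case the Decorate/Sort/Undecorate approach handles. The function name sort_by_cost_fixed_colons is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sort on the 6th ':'-separated field.
# ASSUMPTION: every line has exactly 5 colons before the cost number,
# as in the sample (log.7:, 11:56, 56:57, java:receivePacket, cost:).
# Field 6 then starts with e.g. "377ms (threshold=300ms), ...", and
# sort -n only uses the leading number (377), ignoring the rest.
sort_by_cost_fixed_colons() {
    # "$@" = input files; with no arguments, sort reads standard input
    sort -t':' -k6,6n "$@"
}
```

Usage would be sort_by_cost_fixed_colons file, or piping lines into it on standard input. To list the slowest writes first, change the key to -k6,6nr (putting r on the key itself, since a key with its own n option does not inherit a global -r).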