
How can I extract data from each line of a file and then process it?


I have a log file named convert.20231010.log. Its contents are as follows:

2024-05-17 00:14:02.447 Success ABCXYZ15 on hard disk
2024-05-17 00:14:02.447 Fail at /home/category1/sub1/ABCXYZ01 00054 is not found
2024-05-17 00:14:02.447 Success ABCXYZ16 00030 on hard disk
2024-05-17 00:14:02.447 Fail at /home/category1/sub2/ABCXYZ02 is not found
2024-05-17 00:14:02.447 Success ABCXYZ17 000110 on hard disk

I want to check the file and do some work based on it:

a) Find the lines in the log that contain "Fail" and "is not found", and get the path and file name from each.

grep -E 'Fail.*is not found' convert.20231010.log

This finds 2 lines that match the condition:

2024-05-17 00:14:02.447 Fail at /home/category1/sub1/ABCXYZ01 00054 is not found
2024-05-17 00:14:02.447 Fail at /home/category1/sub2/ABCXYZ02 is not found
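For the "get path and file name" part of step a, one option is to let sed print only the text between "Fail at " and " is not found". This is a minimal sketch that assumes every matching line uses exactly that wording. The function name extract_failed_paths is hypothetical:

```bash
#!/usr/bin/env bash
# Step a) print the path between "Fail at " and " is not found" on each
# matching line. Paths can contain spaces ("ABCXYZ01 00054"), so capture
# everything between the two fixed phrases instead of splitting on spaces.
extract_failed_paths() {   # usage: extract_failed_paths LOGFILE (hypothetical helper)
    sed -n 's/.*Fail at \(.*\) is not found.*/\1/p' "$1"
}

# Demo on a scratch copy of the sample log from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
2024-05-17 00:14:02.447 Success ABCXYZ15 on hard disk
2024-05-17 00:14:02.447 Fail at /home/category1/sub1/ABCXYZ01 00054 is not found
2024-05-17 00:14:02.447 Success ABCXYZ16 00030 on hard disk
2024-05-17 00:14:02.447 Fail at /home/category1/sub2/ABCXYZ02 is not found
2024-05-17 00:14:02.447 Success ABCXYZ17 000110 on hard disk
EOF

extract_failed_paths "$log"
# /home/category1/sub1/ABCXYZ01 00054
# /home/category1/sub2/ABCXYZ02
rm -f -- "$log"
```

To use the output in a loop, read it with `while IFS= read -r path; do ...; done` so the embedded space in "ABCXYZ01 00054" stays intact.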

b) Pick up the file from the path and check its content.

The file is picked up from category2 (not category1), using the path and file name found in step a:

/home/category2/sub1/ABCXYZ01 00054
/home/category2/sub2/ABCXYZ02
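The category1 to category2 mapping in step b can be done with bash parameter expansion. The same approach also gives the category3 directory needed in step c. This sketch assumes "category1" appears only once in each path. to_source and to_dest_dir are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Step b) map a category1 path from the log to the category2 file to check,
# and to the category3 directory it should be moved into (same sub-dir).
# Assumes "category1" occurs only once in each path.
to_source() {      # /home/category1/subN/NAME -> /home/category2/subN/NAME
    printf '%s\n' "${1/category1/category2}"
}
to_dest_dir() {    # /home/category1/subN/NAME -> /home/category3/subN/
    local p="${1/category1/category3}"
    printf '%s/\n' "${p%/*}"    # drop the file name, keep the directory
}

to_source '/home/category1/sub1/ABCXYZ01 00054'   # /home/category2/sub1/ABCXYZ01 00054
to_dest_dir '/home/category1/sub2/ABCXYZ02'       # /home/category3/sub2/
```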

c) Move the file if its content is good.

Based on step b, read the contents of the files "ABCXYZ01 00054" and "ABCXYZ02". If the content is good, move the file to category3. The field to check inside the file is "CARD": if it has a value, the file is good. If "CARD" is blank or empty, don't move the file.

ID=ABCXYZ01 00054
NAME=JOIN
DEPT=ACC
CARD=1234
LOC=NY

ID=ABCXYZ02
NAME=CINDY
DEPT=LOG
CARD=6789
LOC=LA

mv -f '/home/category2/sub1/ABCXYZ01 00054' '/home/category3/sub1/'
mv -f '/home/category2/sub2/ABCXYZ02' '/home/category3/sub2/'
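The "does CARD have a value?" check in step c can be a small grep wrapper. This sketch assumes "blank" means the value after CARD= is empty or only whitespace. card_is_set is a hypothetical name, and the second demo file (ABCXYZ99) is made up:

```bash
#!/usr/bin/env bash
# Step c) a file counts as "good" when it has a CARD= line whose value is
# not blank. Returns 0 (good) or 1 (missing/blank), so it works with `if`.
card_is_set() {    # usage: card_is_set FILE (hypothetical helper)
    grep -q -e '^CARD=.*[^[:space:]]' "$1"
}

# Demo: one file like ABCXYZ02 from the question, one made-up file with a blank CARD.
good=$(mktemp); blank=$(mktemp)
printf 'ID=ABCXYZ02\nNAME=CINDY\nDEPT=LOG\nCARD=6789\nLOC=LA\n' > "$good"
printf 'ID=ABCXYZ99\nNAME=TEST\nCARD=   \nLOC=LA\n' > "$blank"

for f in "$good" "$blank"; do
    if card_is_set "$f"; then echo "move"; else echo "keep"; fi
done
# move
# keep
rm -f -- "$good" "$blank"
```

The pattern requires at least one non-whitespace character after CARD=, so it also rejects a CARD= line that holds only spaces or a stray carriage return.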

d) Remove the lines from the log file.

This removes the 2 lines found in step a from the log file:

sed -i '/Fail.*is not found/d' convert.20231010.log
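Because the goal is to write a new log, the same sed expression can write to a separate file instead of editing with -i. This leaves the original untouched. A sketch; remove_failed_lines is a hypothetical name:

```bash
#!/usr/bin/env bash
# Step d) write a copy of the log without the "Fail ... is not found" lines,
# leaving the original file untouched (same sed expression, minus -i).
remove_failed_lines() {   # usage: remove_failed_lines INPUT OUTPUT (hypothetical helper)
    sed '/Fail.*is not found/d' "$1" > "$2"
}

log=$(mktemp); newlog=$(mktemp)
printf '%s\n' \
  '2024-05-17 00:14:02.447 Success ABCXYZ15 on hard disk' \
  '2024-05-17 00:14:02.447 Fail at /home/category1/sub2/ABCXYZ02 is not found' \
  '2024-05-17 00:14:02.447 Success ABCXYZ17 000110 on hard disk' > "$log"

remove_failed_lines "$log" "$newlog"
cat "$newlog"
# 2024-05-17 00:14:02.447 Success ABCXYZ15 on hard disk
# 2024-05-17 00:14:02.447 Success ABCXYZ17 000110 on hard disk
rm -f -- "$log" "$newlog"
```

If editing in place is what you want, `sed -i.bak` keeps the original as convert.20231010.log.bak.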

How can I do steps a to d with a bash script on Linux, and write a new log? I'm not familiar with shell scripting, so I'd appreciate your help.

#!/bin/bash
MYSELF=$0
MYNAME=${0##*/}                 # script file name without its directory
MYNAME_NOEXT=${MYNAME%.*}       # script file name without its extension
MYPATH=$(dirname -- "$0")       # directory the script is in
cd "$MYPATH" || exit 1

logfile="/home/category4/logs/$MYNAME_NOEXT.$(date +"%Y%m%d").log"

(
# Crude re-entry guard: exit if another copy of this script is already running.
if [ "$(ps -ef | grep -i "$MYNAME" | wc -l)" -gt 4 ]
then
    echo "$(date +%Y-%m-%d) $(date +%H:%M:%S) - Script '$MYNAME' is running. Force exit as re-entry not allowed."
    exit
fi

filename="convert.$(date +"%Y%m%d").log"
while IFS= read -r line; do
    linetext="$line"
    echo "single line in log - $linetext"
done < "$filename"              # read the log file, not stdin

echo "$(date) - Script '$MYNAME' is ended normally."
) >> "$logfile" 2>&1
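Counting `ps -ef | grep` output as a re-entry check is fragile, because the grep process itself, any subshells, and anything else whose command line mentions the script name all show up in the count. One alternative is an exclusive lock on a file. This sketch assumes util-linux flock(1) is installed, and the lock-file path is hypothetical:

```bash
#!/usr/bin/env bash
# Alternative re-entry guard, assuming util-linux flock(1) is available.
# Only one process can hold the lock, so a second copy exits immediately.
lockfile="/tmp/${0##*/}.lock"    # hypothetical lock-file location

exec 9>"$lockfile"               # open (and create) the lock file on fd 9
if ! flock -n 9; then            # -n: fail at once instead of waiting
    echo "$(date '+%Y-%m-%d %H:%M:%S') - Script '${0##*/}' is running. Force exit as re-entry not allowed."
    exit 1
fi

echo "lock acquired, doing the real work..."
# ... steps a) to d) go here; the lock is released when the script exits
```

The kernel drops the lock automatically when the process exits, so a crashed run does not leave a stale lock behind the way a "does the pid file exist" check can.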

Solution

  • Untested, since I don't have (and don't want to create) the directory structure your script and input file would need for testing:

    Using GNU awk for its non-POSIX extensions (a 3rd argument to match(), plus nextfile), since the OP has now told us their log file is huge:

    #!/usr/bin/env bash
    
    while IFS= read -r file; do
        dest=${file/category2/category3}    # same sub-directory under category3
        [[ -f "$file" ]] &&
        >&2 echo mv -f -- "$file" "${dest%/*}/"
    done < <(
        awk '
            NR == FNR {
                if ( match($0,/Fail at (.*) is not found/,a) ) {
                    sub(/category1/,"category2",a[1])   # the file to check is in category2
                    ARGV[ARGC++] = a[1]
                }
                else {
                    print > "newlog.txt"
                }
                next
            }
            /^CARD=[0-9]/ {
                print FILENAME
                nextfile
            }
        ' convert.20231010.log
    )
    

    If you don't have GNU awk, use this version instead, which works with any awk. It will just run a bit slower if your awk doesn't support nextfile. Few other awks support a 3rd argument to match(), but many support nextfile. In an awk that doesn't, the nextfile statement is just a reference to an unset variable and does nothing, so it's harmless:

    #!/usr/bin/env bash
    
    while IFS= read -r file; do
        dest=${file/category2/category3}    # same sub-directory under category3
        [[ -f "$file" ]] &&
        >&2 echo mv -f -- "$file" "${dest%/*}/"
    done < <(
        awk '
            BEGIN {
                beg = "Fail at "
                end = " is not found"
                begLgth = length(beg)
                allLgth = length(beg end)
            }
            NR == FNR {
                if ( match($0,beg ".*" end) ) {
                    path = substr($0,RSTART+begLgth,RLENGTH-allLgth)
                    sub(/category1/,"category2",path)   # the file to check is in category2
                    ARGV[ARGC++] = path
                }
                else {
                    print > "newlog.txt"
                }
                next
            }
            /^CARD=[0-9]/ {
                print FILENAME
                nextfile
            }
        ' convert.20231010.log
    )
    

    The [[ -f "$file" ]] && test in the shell loop guards against trying to move a file that has already been moved. That can happen if the same file name appears multiple times in the input, or if /^CARD=[0-9]/ matches multiple times in one file and your awk doesn't support nextfile.

    Original answer:

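    $ cat tst.sh
    #!/usr/bin/env bash
    
    regexp='Fail at (.*) is not found'
    while IFS= read -r line; do
        file=""
        # the failed path is under category1; the file to check is under category2
        [[ $line =~ $regexp ]] && file="${BASH_REMATCH[1]/category1/category2}"
        dest=${file/category2/category3}
        # move only files whose CARD has a value; otherwise keep the log line
        if [[ -f "$file" ]] && grep -q '^CARD=[0-9]' "$file"; then
            >&2 echo mv -f -- "$file" "${dest%/*}/"
        else
            printf '%s\n' "$line"
        fi
    done < convert.20231010.log > newlog.txt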
    

    Remove the >&2 echo when you're done testing and are sure the script does what you want.
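To try the scripts above without touching /home, you could first build a scratch copy of the layout. This sketch creates the sample files from the question under a temporary directory and writes a convert.20231010.log whose paths point there. cd into the printed directory and run a copy of one of the scripts. With the >&2 echo still in place, nothing is actually moved, so you can check that the printed mv commands name the files you expect:

```bash
#!/usr/bin/env bash
# Build a throwaway copy of the question's layout under a temp directory,
# so the answer's scripts can be dry-run without touching /home.
base=$(mktemp -d)

mkdir -p "$base"/category1/sub{1,2} "$base"/category2/sub{1,2} "$base"/category3/sub{1,2}

# category2 files with a CARD value, copied from the question's samples
printf 'ID=ABCXYZ01 00054\nNAME=JOIN\nDEPT=ACC\nCARD=1234\nLOC=NY\n' \
    > "$base/category2/sub1/ABCXYZ01 00054"
printf 'ID=ABCXYZ02\nNAME=CINDY\nDEPT=LOG\nCARD=6789\nLOC=LA\n' \
    > "$base/category2/sub2/ABCXYZ02"

# The sample log, with /home replaced by the scratch directory
cat > "$base/convert.20231010.log" <<EOF
2024-05-17 00:14:02.447 Success ABCXYZ15 on hard disk
2024-05-17 00:14:02.447 Fail at $base/category1/sub1/ABCXYZ01 00054 is not found
2024-05-17 00:14:02.447 Success ABCXYZ16 00030 on hard disk
2024-05-17 00:14:02.447 Fail at $base/category1/sub2/ABCXYZ02 is not found
2024-05-17 00:14:02.447 Success ABCXYZ17 000110 on hard disk
EOF

echo "scratch layout in: $base"
find "$base" -type f | sort
```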