I have 1 text file, which is test1.txt.
text1.txt contain as following:
Input:
##[A1] [B1] [T1] [V1] [T2] [V2] [T3] [V3] [T4] [V4]## --> headers
1 1000 0 100 10 200 20 300 30 400
40 500 50 600 60 700 70 800
1010 0 101 10 201 20 301 30 401
40 501 50 601
2 1000 0 110 15 210 25 310 35 410
45 510 55 610 65 710
1010 0 150 10 250 20 350 30 450
40 550
Condition:
A1 and B1 -> for each A1 + (B1 + [Tn + Vn])
A1 should be in 1 column.
B1 should be in 1 column.
T1,T2,T3 and T4 should be in 1 column.
V1,V2,V3 and V4 should be in 1 column.
How do I sort it become like below?
Desire Output:
## A1 B1 Tn Vn ## --> headers
1 1000 0 100
10 200
20 300
30 400
40 500
50 600
60 700
70 800
1010 0 101
10 201
20 301
30 401
40 501
50 601
2 1000 0 110
15 210
25 310
35 410
45 510
55 610
65 710
1010 0 150
10 250
20 350
30 450
40 550
Here is my current code:
First Attempt:
Input
cat test1.txt | awk ' { a=$1 b=$2 } { for(i=1; i<=5; i=i+1) { t=substr($0,11+i*10,5) v=substr($0,16+i*10,5) if( t ~ /^\ +[0-9]+$/ || t ~ /^[0-9]+$/ || t ~ /^\ +[0-9]+\ +$/ ){ printf "%7s %7d %8d %8d \n",a,b,t,v } }}' | less
Output:
1 1000 400 0
40 500 800 0
1010 0 401 0
2 1000 410 0
1010 0 450 0
I'm trying using simple awk command, but still can't get the result.
Can anyone help me on this?
Thanks,
Am
This is a rather tricky problem that can be handled a number of ways. Whether bash
, perl
or awk
, you will need to handle to number of fields in some semi-generic way instead of just hardcoding values for your example.
Using bash, so long as you can rely on an even-number of fields in all lines (except for the lines with the sole initial value (e.g. 1010
), you can accommodate the number of fields is a reasonably generic way. For the lines with 1, 2
, etc.. you know your initial output will contain 4-fields
. For lines with 1010
, etc.. you know the output will contain an initial 3-fields
. For the remaining values you are simply outputting pairs.
The tricky part is handling the alignment. Here is where printf
which allows you to set the field-width with a parameter using the form "%*s"
where the conversion specifier expects the next parameter to be an integer
value specifying the field-width followed by a parameter for the string conversion itself. It takes a little gymnastics, but you could do something like the following in bash itself:
(note: edit to match your output header format)
#!/bin/bash
declare -i nfields wd=6 ## total no. fields, printf field-width modifier
while read -r line; do ## read each line (preserve for header line)
arr=($line) ## separate into array
first=${arr[0]} ## check for '#' in first line for header
if [ "${first:0:1}" = '#' ]; then
nfields=$((${#arr[@]} - 2)) ## no. fields in header
printf "## A1 B1 Tn Vn ## --> headers\n" ## new header
continue
fi
fields=${#arr[@]} ## fields in line
case "$fields" in
$nfields ) ## fields -eq nfiles?
cnt=4 ## handle 1st 4 values in line
printf " "
for ((i=0; i < cnt; i++)); do
if [ "$i" -eq '2' ]; then
printf "%*s" "5" "${arr[i]}"
else
printf "%*s" "$wd" "${arr[i]}"
fi
done
echo
for ((i = cnt; i < $fields; i += 2)); do ## handle rest
printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
done
;;
$((nfields - 1)) ) ## one less than nfields
cnt=3 ## handle 1st 3 values
printf " %*s%*s" "$wd" " "
for ((i=0; i < cnt; i++)); do
if [ "$i" -eq '1' ]; then
printf "%*s" "5" "${arr[i]}"
else
printf "%*s" "$wd" "${arr[i]}"
fi
done
echo
for ((i = cnt; i < $fields; i += 2)); do ## handle rest
if [ "$i" -eq '0' ]; then
printf "%*s%*s%*s\n" "$((wd+1))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
else
printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
fi
done
;;
* ) ## all other lines format as pairs
for ((i = 0; i < $fields; i += 2)); do
printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
done
;;
esac
done
Rather than reading from a file, just use redirection to redirect the input file to your script (if you want to just provide a filename, then redirect the file to feed the output while read...
loop)
Example Use/Output
$ bash text1format.sh <dat/text1.txt
## A1 B1 Tn Vn ## --> headers
1 1000 0 100
10 200
20 300
30 400
40 500
50 600
60 700
70 800
1010 0 101
10 201
20 301
30 401
40 501
50 601
2 1000 0 110
15 210
25 310
35 410
45 510
55 610
65 710
1010 0 150
10 250
20 350
30 450
40 550
As between awk
and bash
, awk
will generally be faster, but here with formatted output, it may be closer than usual. Look things over and let me know if you have questions.