linux perl awk sed

How to sort or rearrange numbers from multiple columns into multiple rows [fixed at 4 columns]?


I have one text file, test1.txt.

test1.txt contains the following:
Input:

##[A1] [B1] [T1]  [V1] [T2]  [V2] [T3]  [V3] [T4]  [V4]## --> headers
    1  1000    0   100   10   200   20   300   30   400
              40   500   50   600   60   700   70   800
       1010    0   101   10   201   20   301   30   401
              40   501   50   601  
    2  1000    0   110   15   210   25   310   35   410
              45   510   55   610   65   710
       1010    0   150   10   250   20   350   30   450
              40   550  

Conditions:
For each A1, and each B1 under it, output one row per [Tn, Vn] pair: A1 + (B1 + [Tn + Vn])
A1 should be in one column.
B1 should be in one column.
T1, T2, T3 and T4 should be in one column.
V1, V2, V3 and V4 should be in one column.

How do I rearrange it to look like the following?
Desired output:

##   A1    B1   Tn    Vn ## --> headers

      1  1000    0   100
                10   200
                20   300
                30   400
                40   500
                50   600
                60   700
                70   800
         1010    0   101
                10   201
                20   301
                30   401
                40   501
                50   601
      2  1000    0   110
                15   210
                25   310
                35   410
                45   510
                55   610
                65   710
         1010    0   150
                10   250
                20   350
                30   450
                40   550

Here is my current code:
First Attempt:
Input

cat test1.txt | awk ' { a=$1 b=$2 } { for(i=1; i<=5; i=i+1) { t=substr($0,11+i*10,5) v=substr($0,16+i*10,5) if( t ~ /^\ +[0-9]+$/ || t ~ /^[0-9]+$/ || t ~ /^\ +[0-9]+\ +$/ ){ printf "%7s %7d %8d %8d \n",a,b,t,v } }}' | less

Output:

      1    1000      400        0 
     40     500      800        0 
   1010       0      401        0 
      2    1000      410        0 
   1010       0      450        0

I'm trying to do this with a simple awk command, but I still can't get the desired result.
Can anyone help me with this?

Thanks,
Am


Solution

  • This is a rather tricky problem that can be handled in a number of ways. Whether you use bash, perl or awk, you will need to handle the number of fields in some semi-generic way instead of just hardcoding values for your example.

    Using bash, so long as you can rely on an even number of fields in all lines (except for the lines that begin with a sole B1 value, e.g. 1010), you can accommodate the number of fields in a reasonably generic way. For the lines that begin with 1, 2, etc., you know the initial output row will contain 4 fields. For lines that begin with 1010, etc., you know the initial output row will contain 3 fields. For the remaining values you are simply outputting pairs.
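
    The field-count reasoning above can be sketched as a small classifier. This is only an illustration, not part of the original script: the function name classify_line and the labels it prints are hypothetical, and the header is assumed to describe 10 data columns as in the sample.

```shell
#!/bin/bash
# classify_line (hypothetical helper): report which kind of line this is,
# judged only by how many whitespace-separated fields it has.
#   $1 = number of data columns named in the header (10 in the sample)
#   $2 = one line of input
classify_line() {
    local nfields=$1
    local -a arr
    read -r -a arr <<< "$2"                 # split on whitespace, no globbing
    case "${#arr[@]}" in
        "$nfields")         echo "A1+B1+pairs" ;;   # full line
        "$((nfields - 1))") echo "B1+pairs" ;;      # A1 column left blank
        *)                  echo "pairs" ;;         # continuation line
    esac
}

classify_line 10 "    1  1000    0   100   10   200   20   300   30   400"
classify_line 10 "       1010    0   101   10   201   20   301   30   401"
classify_line 10 "              40   500   50   600   60   700   70   800"
```

    Note the limitation: a line that starts a new A1 or B1 group but carries fewer than four pairs would be counted as a plain continuation line, so this only works when the first line of each group is full, as it is in the sample.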

    The tricky part is handling the alignment. This is where printf helps: it lets you set the field width with a parameter using the form "%*s", where the conversion specifier expects the next argument to be an integer field width, followed by an argument for the string conversion itself. It takes a little gymnastics, but you could do something like the following in bash itself:

    (note: edit to match your output header format)

    #!/bin/bash
    
    declare -i nfields wd=6     ## total no. fields, printf field-width modifier
    
    while read -r -a arr; do    ## read each line, split into array (no glob expansion)
        first=${arr[0]}         ## check for '#' in first line for header
        if [ "${first:0:1}" = '#' ]; then
            nfields=$((${#arr[@]} - 2))     ## no. fields in header
            printf "##   A1    B1   Tn    Vn ## --> headers\n"  ## new header
            continue
        fi
        fields=${#arr[@]}                   ## fields in line
        case "$fields" in
            $nfields )                      ## fields -eq nfields?
                cnt=4                       ## handle 1st 4 values in line
                printf " "
                for ((i=0; i < cnt; i++)); do
                    if [ "$i" -eq '2' ]; then
                        printf "%*s" "5" "${arr[i]}"
                    else
                        printf "%*s" "$wd" "${arr[i]}"
                    fi
                done
                echo
                for ((i = cnt; i < $fields; i += 2)); do    ## handle rest
                    printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                done
                ;;
            $((nfields - 1)) )              ## one less than nfields
                cnt=3                       ## handle 1st 3 values
                printf " %*s" "$wd" " "     ## blank padding in place of A1
                for ((i=0; i < cnt; i++)); do
                    if [ "$i" -eq '1' ]; then
                        printf "%*s" "5" "${arr[i]}"
                    else
                        printf "%*s" "$wd" "${arr[i]}"
                    fi
                done
                echo
                for ((i = cnt; i < $fields; i += 2)); do    ## handle rest
                    printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                done
                ;;
            * )     ## all other lines format as pairs
                for ((i = 0; i < $fields; i += 2)); do
                    printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                done
                ;;
        esac
    done
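
    To see what the "%*s" width argument does on its own, here is a minimal sketch separate from the script above. The helper name pad is hypothetical and exists only for this illustration.

```shell
#!/bin/bash
# pad (hypothetical helper): right-align a value in a field of a given width.
#   $1 = field width, consumed by the '*' in "%*s"
#   $2 = the string to print
pad() {
    printf '%*s' "$1" "$2"
}

printf '[%s]\n' "$(pad 6 1000)"     # prints [  1000]
printf '[%s]\n' "$(pad 5 0)"        # prints [    0]

# Padding a single space to a width yields that many blanks, which is how
# the script above skips over the empty A1/B1 columns.
printf '[%s]\n' "$(pad 12 ' ')"
```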
    

    The script reads from standard input, so rather than opening a file itself, just redirect the input file to the script. (If you would rather pass a filename as an argument, redirect that file into the while read ... loop instead, e.g. done < "$1".)
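
    One way the filename variant might look is sketched below. The function process_input is a hypothetical stand-in for the formatting loop (it only counts fields), and falling back to /dev/stdin when no argument is given is an assumption of this sketch, not something the original script does.

```shell
#!/bin/bash
# process_input (hypothetical stand-in for the formatting loop):
# read from the file named in $1, or from standard input if $1 is absent.
process_input() {
    local input=${1:-/dev/stdin}
    local -a arr
    while read -r -a arr; do
        printf '%d fields\n' "${#arr[@]}"   # placeholder for the real formatting
    done < "$input"                         # redirect the file into the loop
}

# Usage (file name as in the question):
#   process_input test1.txt
#   process_input < test1.txt
```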

    Example Use/Output

    $ bash text1format.sh <dat/text1.txt
    ##   A1    B1   Tn    Vn ## --> headers
          1  1000    0   100
                    10   200
                    20   300
                    30   400
                    40   500
                    50   600
                    60   700
                    70   800
             1010    0   101
                    10   201
                    20   301
                    30   401
                    40   501
                    50   601
          2  1000    0   110
                    15   210
                    25   310
                    35   410
                    45   510
                    55   610
                    65   710
             1010    0   150
                    10   250
                    20   350
                    30   450
                    40   550
    

    As between awk and bash, awk will generally be faster, but here with formatted output, it may be closer than usual. Look things over and let me know if you have questions.
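
    For comparison, an awk version might look roughly like the sketch below, wrapped in a shell function. It is not a drop-in equivalent of the bash script: instead of counting fields, it decides what a line starts with from the column where its first field ends. The function name reshape and the column limits (A1 ends by column 5, B1 by column 11, space padding rather than tabs) are assumptions read off the sample data.

```shell
#!/bin/bash
# reshape (hypothetical name): print one "A1 B1 Tn Vn" row per (T, V) pair.
# Assumptions read off the sample data, not stated by the original poster:
#   - columns are padded with spaces, not tabs
#   - an A1 value ends at or before column 5 (a_end)
#   - a B1 value ends at or before column 11 (b_end)
reshape() {
    awk -v a_end=5 -v b_end=11 '
        /^#/    { print "##   A1    B1   Tn    Vn ## --> headers"; next }
        NF == 0 { next }                          # skip blank lines
        {
            match($0, /[^ ]/)                     # position of first non-blank
            e = RSTART + length($1) - 1           # column where field 1 ends
            a = ""; b = ""; i = 1
            if (e <= a_end)      { a = $1; b = $2; i = 3 }   # A1 B1 pairs...
            else if (e <= b_end) { b = $1; i = 2 }           # B1 pairs...
            for (; i < NF; i += 2) {              # remaining fields are pairs
                printf "%7s%6s%5s%6s\n", a, b, $i, $(i + 1)
                a = ""; b = ""                    # show A1/B1 only once
            }
        }
    ' "$@"
}

# Usage (file name as in the question):
#   reshape test1.txt
```

    Because it looks at column positions rather than field counts, this sketch does not need the first line of each group to be full, but it will misclassify lines if the input is tab-separated or uses different column widths.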