bash awk redmine textile

How to extract tabular data from a redmine/textile page with AWK in order to declare variables in a shell script?


I need to extract data from a page in redmine formatted in textile in order to set variables in a bash script. I want to use AWK to do so. Here is the content of the page:

$ cat mypage.redmine
h1. My Awesome page

h2. A section

hello
there

table(metadata).
|TITLE       |An awesome title! |
|VERSIONNUM  |1                 |
|VERSIONDATE |2017-06-16        |
|AUTHOR      |Me!               |

table(otherthing).
|RECORD1     |A value.      |
|RECORD2     |Another value |

h2. Another section

We say things.

The information of interest is in the table of class "metadata".

I would like the output to be:

TITLE="An awesome title!"
VERSIONNUM="1"
VERSIONDATE="2017-06-16"
AUTHOR="Me!"

... so that I can directly call declare in my shell script on this output to set variables TITLE, VERSIONNUM, etc.

Here is what I have so far:

$ awk 'BEGIN { FS = "|" } { if(NF == 4) print $2 "=" "\"" $3 "\"" }' < mypage.redmine

Which renders:

TITLE       ="An awesome title! "
VERSIONNUM  ="1                 "
VERSIONDATE ="2017-06-16        "
AUTHOR      ="Me!               "
RECORD1     ="A value.      "
RECORD2     ="Another value "

This is not what I am looking for: I need the one-liner to process only the table(metadata) block and to strip the trailing spaces.

How can I do so?

Edit: I forgot the quotes in the rendering of my attempt.


Solution

  • There are two things to address here: selecting the range of lines and picking the proper data from within those lines.

    Extracting the lines between two patterns is covered in "How to select lines between two patterns?". The easy solution is the one given there under "Print lines between PAT1 and PAT2 - not including PAT1 and PAT2":

    awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
    

    In your case, from "table(metadata)" up to a blank line:

    $ awk '/table\(metadata\)/ {flag=1; next} /^$/ {flag=0} flag' file
    |TITLE       |An awesome title! |
    |VERSIONNUM  |1                 |
    |VERSIONDATE |2017-06-16        |
    |AUTHOR      |Me!               |
    

    Then you need to strip the extra whitespace. For this I followed the same approach as yours: set | as the field separator (FS) and print the relevant fields:

    awk -F"|" '{sub(/[[:space:]]*$/,"",$2);sub(/[[:space:]]*$/,"",$3); printf "%s=\"%s\"\n", $2, $3}' file
    

    That is, take the 2nd and 3rd fields, remove their trailing whitespace with sub(/[[:space:]]*$/, "", field), and finally print a line in the desired format.

    Note the use of [[:space:]], which matches any whitespace character, tabs and spaces included. It is the POSIX character class equivalent of \s, which is a GNU awk extension.

    All together:

    $ awk -F"|" '/table\(metadata\)/ {flag=1; next} /^$/ {flag=0} flag {sub(/[[:space:]]*$/,"",$2);sub(/[[:space:]]*$/,"",$3); printf "%s=\"%s\"\n", $2, $3}' file
    TITLE="An awesome title!"
    VERSIONNUM="1"
    VERSIONDATE="2017-06-16"
    AUTHOR="Me!"
    

    Or put it in a script sc.awk:

    BEGIN{FS="|"}
    /table\(metadata\)/ {flag=1; next}
    /^$/ {flag=0}
    flag {
       sub(/[[:space:]]*$/,"",$2);
       sub(/[[:space:]]*$/,"",$3);
       printf "%s=\"%s\"\n", $2, $3
    }
    

    And execute it with:

    awk -f sc.awk file
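
Since the goal is to set shell variables from this output, here is a minimal sketch of one way to load it in bash. It rests on a few assumptions that are not in the question: the sample page is written to a temporary file, the awk helper trim() and the check that each key is a plain identifier are my own additions, and the lines are emitted as NAME=value without the surrounding quotes. The quotes are dropped because declare, when handed a whole line as a single word, takes everything after the = literally, so the quotes would end up inside the value.

```shell
#!/usr/bin/env bash
# Sketch: load the metadata table into shell variables without eval.

# Assumption: a temporary copy of the sample page from the question.
page=$(mktemp)
cat > "$page" <<'EOF'
h1. My Awesome page

table(metadata).
|TITLE       |An awesome title! |
|VERSIONNUM  |1                 |
|VERSIONDATE |2017-06-16        |
|AUTHOR      |Me!               |

table(otherthing).
|RECORD1     |A value.      |
|RECORD2     |Another value |
EOF

# Each line reaches declare as one word, so the value is never re-parsed
# by the shell. The loop reads from a process substitution, which keeps it
# in the current shell instead of a subshell.
while IFS= read -r line; do
    declare "$line"
done < <(awk -F'|' '
    # Hypothetical helper: strip leading and trailing whitespace.
    function trim(s) { gsub(/^[[:space:]]+|[[:space:]]+$/, "", s); return s }
    /table\(metadata\)/ {flag=1; next}
    /^$/ {flag=0}
    flag {
        key = trim($2)
        val = trim($3)
        # Only emit keys that are valid shell variable names.
        if (key ~ /^[A-Za-z_][A-Za-z0-9_]*$/) printf "%s=%s\n", key, val
    }' "$page")
rm -f "$page"

printf '%s\n' "$TITLE" "$VERSIONNUM" "$VERSIONDATE" "$AUTHOR"
```

Because of the process substitution, the variables remain set after the loop. Inside a function, declare would make them local to that function; declare -g (bash 4.2 or later) is the usual way around that.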