I need to extract data from a Redmine page formatted in Textile in order to set variables in a Bash script. I want to use awk to do so. Here is the content of the page:
$ cat mypage.redmine
h1. My Awesome page
h2. A section
hello
there
table(metadata).
|TITLE |An awesome title! |
|VERSIONNUM |1 |
|VERSIONDATE |2017-06-16 |
|AUTHOR |Me! |
table(otherthing).
|RECORD1 |A value. |
|RECORD2 |Another value |
h2. Another section
We say things.
The information of interest is in the table of class "metadata".
I would like the output to be:
TITLE="An awesome title!"
VERSIONNUM="1"
VERSIONDATE="2017-06-16"
AUTHOR="Me!"
... so that I can directly call declare in my shell script on this output to set the variables TITLE, VERSIONNUM, etc.
Here is what I have so far:
$ awk 'BEGIN { FS = "|" } { if(NF == 4) print $2 "=" "\"" $3 "\"" }' < mypage.redmine
Which renders:
TITLE ="An awesome title! "
VERSIONNUM ="1 "
VERSIONDATE ="2017-06-16 "
AUTHOR ="Me! "
RECORD1 ="A value. "
RECORD2 ="Another value "
This is not what I am looking for: I need the one-liner to work only on table(metadata) and to get rid of the trailing spaces.
How can I do so?
Edit: I forgot the quotes in the rendering of my attempt.
There are two things to address here: selecting the range of lines and picking the proper data from within those lines.
Extracting lines between two patterns is covered in "How to select lines between two patterns?". An easy solution is the one listed there under "Print lines between PAT1 and PAT2 - not including PAT1 and PAT2":
awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
In your case, from "table(metadata)" up to a blank line:
$ awk '/table\(metadata\)/ {flag=1; next} /^$/ {flag=0} flag' file
|TITLE |An awesome title! |
|VERSIONNUM |1 |
|VERSIONDATE |2017-06-16 |
|AUTHOR |Me! |
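The range above closes on a blank line, which assumes the metadata table is followed by one. As a sketch for pages where another block follows the table immediately, the range could instead close on the first line that does not start with |. The helper name metadata_rows and the shortened sample input are illustrative assumptions, not part of the original page:

```shell
# metadata_rows is a hypothetical helper; it reads the page on stdin.
metadata_rows() {
    awk '
        /^table\(metadata\)\./ { flag = 1; next }   # open the range, skip the marker line
        flag && !/^[|]/        { flag = 0 }         # first non-row line closes the range
        flag                                        # print rows while the range is open
    '
}

# Shortened sample with no blank line between the two tables.
metadata_rows <<'EOF'
h2. A section
table(metadata).
|TITLE |An awesome title! |
|VERSIONNUM |1 |
table(otherthing).
|RECORD1 |A value. |
EOF
```

With this input only the TITLE and VERSIONNUM rows are printed, because the table(otherthing). line closes the range before RECORD1 is reached.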
Then, you want to remove the extra characters. For this, I followed the same approach as you: set | as the FS and print based on that:
awk -F"|" '{sub(/[[:space:]]*$/,"",$2);sub(/[[:space:]]*$/,"",$3); printf "%s=\"%s\"\n", $2, $3}' file
That is, take the 2nd and 3rd fields, remove all trailing whitespace with sub(/[[:space:]]*$/, "", field), and finally print a line in the desired format.
Note the use of [[:space:]] to match any whitespace character, tabs as well as spaces. It is the POSIX character class equivalent of \s, which you could use with GNU awk.
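The sub() calls only strip trailing whitespace, which is enough for the sample page. If the cells were also padded on the left (an assumption, since the sample has no such padding), a sketch using gsub() with an alternation could trim both ends. The helper name trim_cells is hypothetical:

```shell
# trim_cells is a hypothetical helper; it reads table rows on stdin.
trim_cells() {
    awk -F'|' '{
        for (i = 2; i <= 3; i++)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", $i)   # strip both ends of the cell
        printf "%s=\"%s\"\n", $2, $3
    }'
}

# A row padded on both sides of each cell.
printf '%s\n' '| TITLE |  An awesome title!  |' | trim_cells
```

gsub() is needed rather than sub() because up to two matches per field have to be replaced, one at each end.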
All together:
$ awk -F"|" '/table\(metadata\)/ {flag=1; next} /^$/ {flag=0} flag {sub(/[[:space:]]*$/,"",$2);sub(/[[:space:]]*$/,"",$3); printf "%s=\"%s\"\n", $2, $3}' file
TITLE="An awesome title!"
VERSIONNUM="1"
VERSIONDATE="2017-06-16"
AUTHOR="Me!"
Or put it in a script sc.awk:
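BEGIN { FS = "|" }
# Start printing after the table(metadata) marker line
/table\(metadata\)/ { flag = 1; next }
# A blank line ends the table
/^$/ { flag = 0 }
flag {
    # Strip trailing whitespace from the key and the value
    sub(/[[:space:]]*$/, "", $2)
    sub(/[[:space:]]*$/, "", $3)
    printf "%s=\"%s\"\n", $2, $3
}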
And execute it with:
awk -f sc.awk file
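To actually set the variables in the calling script, one option is to eval the generated assignments. This is only a sketch and assumes the page content is trusted: eval runs whatever awk prints, so a value containing a double quote, a backtick or $(...) would be interpreted by the shell. The shortened inline page and the variable name assignments are illustrative assumptions:

```shell
# Shortened sample page, with a blank line after the metadata table.
page='table(metadata).
|TITLE |An awesome title! |
|VERSIONNUM |1 |

table(otherthing).
|RECORD1 |A value. |'

# Same program as above, inlined so the sketch is self-contained.
assignments=$(printf '%s\n' "$page" | awk -F'|' '
    /table\(metadata\)/ { flag = 1; next }
    /^$/                { flag = 0 }
    flag {
        sub(/[[:space:]]*$/, "", $2)
        sub(/[[:space:]]*$/, "", $3)
        printf "%s=\"%s\"\n", $2, $3
    }')

# Assumption: trusted input. eval executes the KEY="value" lines as shell code.
eval "$assignments"
printf '%s (version %s)\n' "$TITLE" "$VERSIONNUM"
```

Passing each line to declare directly would keep the double quotes as part of the value, since quotes coming from data are not removed by the shell; that is why the sketch goes through eval.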