This is basically an awk question, but it is about processing data for the Moodle GIFT format, hence the tags.
I want to show HTML code in a question (Moodle "test" activity), so I need to replace < and > with the corresponding entities; otherwise they are interpreted as "real" HTML and not printed. However, I want to be able to type the question with regular code and post-process the file before importing it as GIFT into Moodle.
I thought awk would be the perfect tool to do this.
Say I have this (invalid as such) Moodle/gift question:
::q1::[html]This is a question about HTML:
<pre>
<p>some text</p>
</pre>
and some tag:<code><img></code>
{T}
What I want is a script that translates this into a valid gift question:
::q1::[html]This is a question about HTML:
<pre>
&lt;p&gt;some text&lt;/p&gt;
</pre>
and some tag:<code>&lt;img&gt;</code>
{T}
Key point: replace < and > with &lt; and &gt; when they occur:
1. inside a <pre>-</pre> block (assuming those tags are alone on a line);
2. between <code> and </code>, with an arbitrary string in between.
For the first part, I'm fine. I have a shell script calling awk (gawk, actually):
awk -f process_src2gift.awk "$1.src" > "$1.gift"
with process_src2gift.awk:
BEGIN { print "// THIS IS A GENERATED FILE !" }
{
    if( $1=="<pre>" ) # opening a "code" block
    {
        code=1;
        print $0;
    }
    else
    {
        if( $1=="</pre>" ) # closing a "code" block
        {
            code=0;
            print $0;
        }
        else
        { # if "code block", replace < > by html entities
            if( code==1 )
            {
                gsub(">","\\&gt;");
                gsub("<","\\&lt;");
            }
            print $0;
        }
    }
}
END { print "// END" }
However, I'm stuck on the second requirement.
Questions:
Is it possible to add code to my awk script to process the HTML code inside the <code> tags? Any ideas? I thought about using sed, but I didn't see how to do that.
Maybe awk isn't the right tool for this? I'm open to suggestions for any other (standard Linux) tool.
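Answering my own question.
I found a solution using a two-step awk process:
1. run the script from the question, which handles the <pre> blocks;
2. run a second script that uses <code> or </code> as the field delimiter (via a regex) and applies the string replacement to the second field ($2).
The shell file becomes: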
echo "Step 1"
awk -f process_src2gift.awk "$1.src" > "$1.tmp"
echo "Step 2"
awk -f process_src2gift_2.awk "$1.tmp" > "$1.gift"
rm "$1.tmp"
And the second awk file (process_src2gift_2.awk) will be:
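BEGIN { FS="[<][/]?[c][o][d][e][>]"; }   # split lines on <code> or </code>
{
    if( NF >= 3 )   # the line holds a <code>...</code> pair; $2 is the text between the tags
    {
        # "\\&" yields a literal "&" (a bare "&" would stand for the matched text)
        gsub(">","\\&gt;",$2);
        gsub("<","\\&lt;",$2);
        print $1 "<code>" $2 "</code>" $3
    }
    else
        print $0
}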
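To see why the replacement targets $2, it may help to look at how awk splits a line with this field separator. The snippet below is only an illustration: `show_fields` is a made-up helper name, and the separator is written in the shorter form `</?code>`, which should match the same strings as the bracketed regex above.

```shell
# show_fields is a hypothetical helper, not part of the scripts above.
# It prints the number of fields and each field when <code> and </code>
# are used as field separators.
show_fields() {
    awk 'BEGIN { FS = "</?code>" }
    {
        printf "NF=%d\n", NF
        for (i = 1; i <= NF; i++)
            printf "$%d=[%s]\n", i, $i
    }'
}

printf '%s\n' 'and some tag:<code><img></code>' | show_fields
# NF=3
# $1=[and some tag:]
# $2=[<img>]
# $3=[]
```

The text before the opening tag ends up in $1, the code in $2, and whatever follows the closing tag in $3 (empty here).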
Of course, there are limitations:
- no attributes are allowed in the <code> tag;
- only one <code></code> pair is handled in the line.
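A single-pass variant that accepts attributes on <code> and several pairs per line could look like the sketch below. I have not tried it on real GIFT files. It assumes that the tags are not nested and that a pair never spans several lines. The names `gift_escape` and `esc` are made up for the example.

```shell
# gift_escape is a hypothetical name; it reads the source file on stdin.
gift_escape() {
    awk '
    # esc(): replace < and > by their HTML entities in string s.
    # "\\&" yields a literal "&"; a bare "&" would mean the matched text.
    function esc(s) {
        gsub(/>/, "\\&gt;", s)
        gsub(/</, "\\&lt;", s)
        return s
    }
    $1 == "<pre>"  { code = 1; print; next }   # opening a code block
    $1 == "</pre>" { code = 0; print; next }   # closing a code block
    code           { print esc($0); next }     # inside <pre>: escape the whole line
    {
        out = ""
        rest = $0
        # consume one <code ...>...</code> pair per iteration
        while (match(rest, /<code([ \t][^>]*)?>/)) {
            open_end = RSTART + RLENGTH        # index just past the opening tag
            out = out substr(rest, 1, open_end - 1)
            rest = substr(rest, open_end)
            close_at = index(rest, "</code>")
            if (close_at == 0)
                break                          # no closing tag: keep the tail as is
            out = out esc(substr(rest, 1, close_at - 1)) "</code>"
            rest = substr(rest, close_at + 7)  # 7 is the length of "</code>"
        }
        print out rest
    }'
}
```

It could be called as `gift_escape < "$1.src" > "$1.gift"`, which would make the temporary file unnecessary. This sketch does not print the `//` header and footer comments; they would have to be added in BEGIN and END blocks if wanted.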