I am trying to get all the indented lines of a markdown file in bash. I need their positions in the file so that I can later extract them, or insert them again at their original positions.
Below is an example of a markdown file for which I want to get all the indented lines:
# Example bloc code

This is a bloc code

    function display_results() {
        awk '{print $0; system("sleep .5");}' $1
        rm $1
    }

This code displays results.

below an other example of bloc code

    echo "------------------------------------------"
    echo " TEST RESULTS"
    echo "------------------------------------------"

Or just one line:

    System.out.println("foo");

blablablab
Because I need the position of each block, I parse the file line by line and use a regex to check whether each line is indented.
Note: It has been mentioned here that a regex is not the right tool for extracting code blocks, because code blocks can be nested. I don't have to handle that case; getting only normal code blocks like the ones in the example above is sufficient.
My code is:
# One of the regexes I have tested
regex='^[[:blank:]]+' # Does not match any line
while read line; do
    # Try to find indented lines by using the regex
    if [[ $line =~ $regex ]]; then
        echo "INDENTED: $line"
    else
        echo "TEXT: $line"
    fi
done < $testFile
where $testFile is the markdown file that I parse.
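A quick check, sketched below with a hypothetical helper named is_indented, suggests that the regex on its own is not the culprit: it does match a string that really begins with spaces or tabs. That points at what the variable line holds when the loop tests it, rather than at the pattern.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the argument starts with blanks (spaces or tabs).
is_indented() {
    local regex='^[[:blank:]]+'
    [[ $1 =~ $regex ]]
}

is_indented '    rm $1' && echo "indented"    # prints "indented"
is_indented 'rm $1' || echo "not indented"    # prints "not indented"
```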
So far, the best regexes I have written (based on this answer and this one) match some of the lines but not all of them. Here are a few of them:
regexblank="[^a-zA-Z#]+[[:blank:]]"
regexspace="[^a-zA-Z#]+[[:space:]]"
blank="[^a-zA-Z#]+[[:blank:]]"
With these regexes, the result is:
TEXT: # Example bloc code
TEXT:
TEXT: This is a bloc code
TEXT:
INDENTED: function display_results() {
INDENTED: awk '{print main.sh; system("sleep .5");}'
TEXT: rm
TEXT: }
TEXT:
TEXT: This code displays results.
TEXT:
TEXT: below an other example of bloc code
TEXT:
TEXT: echo "------------------------------------------"
INDENTED: echo " TEST RESULTS"
TEXT: echo "------------------------------------------"
TEXT:
TEXT: Or just one line:
TEXT:
TEXT: System.out.println("foo");
TEXT:
TEXT: blablablab
As you can see, I have to specify in all three regexes that the line must not begin with a letter or a #; otherwise, some lines, such as the title, are detected as indented.
Using awk as follows gives me all the indented lines:
awk '/^(\t|\s)+/' $mdFile
But awk works only on a file, and I need the position of each block.
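For what it's worth, awk can report positions as well: its built-in NR variable holds the number of the current input line. A minimal sketch follows; the function name print_indented_with_numbers is hypothetical, and it swaps `(\t|\s)` for the POSIX class `[[:blank:]]` (space or tab) so it does not depend on GNU awk's `\s` extension.

```bash
#!/usr/bin/env bash
# Print every line that starts with a space or tab, prefixed with its line number.
# NR is awk's built-in record counter, i.e. the current line number.
print_indented_with_numbers() {
    awk '/^[[:blank:]]/ { print NR ": " $0 }' "$1"
}

# Usage, with mdFile as in the question:
# print_indented_with_numbers "$mdFile"
```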
How can I parse a file and get all the lines that are indented? As I explained, I am trying with regexes, but any solution that gives me the indented lines and their respective positions in the file would be great.
You can find the code and all the regexes that I wrote here.
Look at what the variable line contains for each input line:
$ cat infile
line
 indented
line
$ while read line; do echo "<$line>"; done < infile
<line>
<indented>
<line>
This is because of this behaviour of read (emphasis mine):
One line is read from the standard input [...], split into words as described above in Word Splitting, and the first word is assigned to the first name, [...]
To prevent that, set IFS to the empty string (and add -r for good measure to avoid backslash interpretation):
$ while IFS= read -r line; do echo "<$line>"; done < infile
<line>
< indented>
<line>
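Applied to the question's goal of knowing each indented line's position, the same fix can be combined with a line counter. The sketch below is one possible approach, not the only one; list_indented is a hypothetical name, and the `|| [[ -n $line ]]` guard is an addition so that a last line without a trailing newline is still examined.

```bash
#!/usr/bin/env bash
# Sketch: print indented lines together with their 1-based line numbers.
# IFS= keeps leading blanks; -r keeps backslashes literal.
list_indented() {
    local file=$1 line n=0
    local regex='^[[:blank:]]+'
    # "|| [[ -n $line ]]" also handles a final line that lacks a newline
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        if [[ $line =~ $regex ]]; then
            printf 'INDENTED %d: %s\n' "$n" "$line"
        fi
    done < "$file"
}

# Usage, with testFile as in the question:
# list_indented "$testFile"
```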