I have a table of contents that uses indentation (leading hyphens) to track the hierarchy, like:
```text
- title1
-- title1-1
-- title1-2
--- title1-2-1
--- title1-2-2
- title2
-- title2-1
-- title2-2
- title3
- title4
```
I want to convert it into a numbered format like:
```text
1 title1
1.1 title1-1
1.2 title1-2
1.2.1 title1-2-1
1.2.2 title1-2-2
2 title2
2.1 title2-1
2.2 title2-2
3 title3
4 title4
```
This is just an example: the "title..." strings can be any heading text, and the indentation can go deeper (more hyphens) than shown here.
This comes from my real work: I collect headings (some of them written by hand) from a Word document and renumber them from beginning to end, aiming to correct any wrong ordering and indentation.
I have tried this myself, and while most headings came out in the desired format, some did not. How should this be done?
You can pass a replacement function (a callback) to `re.sub` to implement the logic. In that callback, use a stack, maintained across all replacements, to track the chapter numbers of the enclosing ("upper") levels.
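To illustrate the mechanism the answer relies on: when `re.sub` is given a function instead of a replacement string, it calls that function once per match, passes it the match object, and substitutes whatever string the function returns. With `re.M`, `^` matches at the start of every line rather than only at the start of the string. The toy input and the `hyphen_count` helper below are hypothetical; they just replace each run of leading hyphens with its length.

```python
import re

text = "- intro\n--- details\n-- more"

def hyphen_count(match):
    # match.group(0) is the whole matched run of leading hyphens
    return str(len(match.group(0)))

# With re.M, ^ matches at the beginning of each line
per_line = re.sub(r"^-+", hyphen_count, text, flags=re.M)
# Without re.M, ^ only matches at the very start of the string
first_only = re.sub(r"^-+", hyphen_count, text)

print(per_line)
# 1 intro
# 3 details
# 2 more
print(first_only)
# 1 intro
# --- details
# -- more
```

The answer's callback works the same way, except that instead of returning the hyphen count it updates a shared stack and returns the dotted chapter number.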
Code:
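```python
import re

def add_numbers(s):
    # stack[i] is the current chapter number at nesting level i (0 = top level);
    # it persists across all calls to replacer
    stack = [0]

    def replacer(match):
        # "-" is level 0, "--" is level 1, and so on
        indent = len(match.group(0)) - 1
        # Returning to a shallower level: discard the deeper counters
        del stack[indent + 1:]
        # Going deeper: open new counters (a loop, so skipping a level
        # yields a 0 component instead of an IndexError)
        while indent >= len(stack):
            stack.append(0)
        stack[indent] += 1
        return ".".join(map(str, stack))

    # With re.M, ^ matches at the start of every line
    return re.sub(r"^-+", replacer, s, flags=re.M)
```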
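The function above assumes each heading is at most one level deeper than the one before it. Since the goal here is also to correct wrong indentation, one possible alternative (a sketch, not part of the original answer) is to compare raw hyphen counts instead of using them directly as level numbers: any increase in hyphens opens exactly one new level, and a dedent to a width that was never used continues numbering from the level just closed. The name `add_numbers_tolerant` and this correction policy are assumptions.

```python
import re

def add_numbers_tolerant(s):
    """Hypothetical variant: number hyphen-indented headings, treating any
    increase in hyphen count as exactly one level deeper."""
    # Each entry is [hyphen_width, counter] for one open nesting level
    stack = []

    def replacer(match):
        width = len(match.group(0))
        closed = None
        # Close every level indented deeper than this heading
        while stack and stack[-1][0] > width:
            closed = stack.pop()
        if stack and stack[-1][0] == width:
            # Same width as an open level: next sibling at that level
            stack[-1][1] += 1
        elif closed is not None:
            # Dedent to a width never opened: continue numbering
            # from the shallowest level just closed
            stack.append([width, closed[1] + 1])
        else:
            # Deeper than anything open (or first heading): new level at 1
            stack.append([width, 1])
        return ".".join(str(count) for _, count in stack)

    return re.sub(r"^-+", replacer, s, flags=re.M)


print(add_numbers_tolerant("- a\n--- b\n--- c\n- d"))
# 1 a
# 1.1 b
# 1.2 c
# 2 d
```

Here `--- b` is numbered `1.1` even though it skips a hyphen, and `--- c` is treated as its sibling. For well-formed input such as the example in the question, it produces the same numbering as `add_numbers`.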
Here is how you would call it on your example:
```python
message_string = """- title1
-- title1-1
-- title1-2
--- title1-2-1
--- title1-2-2
- title2
-- title2-1
-- title2-2
- title3
- title4"""
res = add_numbers(message_string)
print(res)
```
This prints:
```text
1 title1
1.1 title1-1
1.2 title1-2
1.2.1 title1-2-1
1.2.2 title1-2-2
2 title2
2.1 title2-1
2.2 title2-2
3 title3
4 title4
```