python string

Algorithm to indent and de-indent lines


I have a long document with text like this:

paragraph = '''
• Mobilising independently with a 4-wheel walker
• Gait: Good quality with good cadence, good step length, and adequate foot clearance
• Lower limb examination:
- Tone: Normal bilaterally
- No clonus in either leg
- Passive dorsiflexion: Possible to -10 to -5 degrees bilaterally
- Power: 5/5 in all major muscle groups of lower limbs bilaterally
- Sensation: Intact to gross touch bilaterally
- ROM:
o Right hip flexion: 0-120 degrees
o Left hip flexion: 10-120 degrees (fixed flexion deformity present)
• Feet: Not broad-based

2. Post-stroke mobility and spasticity
• Improvement noted in right leg stiffness
• Continuing with current exercise regimen:
- Neuro Group twice weekly
- Gym group (including bike riding)
- Home exercises
• Plan:
- Continue current exercise programme
- Maintain current baclofen dose

'''

I want to indent the lines so that sub-items sit under their parent bullet points. Basically, I want to convert the above to this:

paragraph = '''
• Mobilising independently with a 4-wheel walker
• Gait: Good quality with good cadence, good step length, and adequate foot clearance
• Lower limb examination:
  - Tone: Normal bilaterally
  - No clonus in either leg
  - Passive dorsiflexion: Possible to -10 to -5 degrees bilaterally
  - Power: 5/5 in all major muscle groups of lower limbs bilaterally
  - Sensation: Intact to gross touch bilaterally
  - ROM:
    o Right hip flexion: 0-120 degrees
    o Left hip flexion: 10-120 degrees (fixed flexion deformity present)
• Feet: Not broad-based

2. Post-stroke mobility and spasticity
• Improvement noted in right leg stiffness
• Continuing with current exercise regimen:
  - Neuro Group twice weekly
  - Gym group (including bike riding)
  - Home exercises
• Plan:
  - Continue current exercise programme
  - Maintain current baclofen dose

'''

I wrote the following code, but it's not properly formatting the strings:

add_indent = ""; corpus = []; bullet_point = ""
for line in paragraph.split("\n"):
    if line.strip().endswith(":") and len(line.split(" ")[0])==1: add_indent += "  "; bullet_point = line.split(" ")[0]
    elif not line.strip().endswith(":") and bullet_point == line.split(" ")[0]: add_indent = add_indent[:-2]
    elif not line: add_indent = ""
    corpus.append(add_indent+line)

for line in corpus: print(line)

Where am I going wrong?


Solution

  • First, some friendly advice: don't format your Python like some other language, and don't introduce semicolons unless you must. Here is your code, reformatted:

    add_indent = ""
    corpus = []
    bullet_point = ""
    for line in paragraph.split("\n"):
        if line.strip().endswith(":") and len(line.split(" ")[0]) == 1:
            add_indent += "  "
            bullet_point = line.split(" ")[0]
        elif not line.strip().endswith(":") and bullet_point == line.split(" ")[0]:
            add_indent = add_indent[:-2]
        elif not line:
            add_indent = ""
        corpus.append(add_indent + line)
    
    for line in corpus: 
        print(line)
    

    Your code increases the current indent, add_indent, when it detects a : at the end of a line, but then applies the new indent to the very line it is analysing, so the indentation starts one line too early.
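    To see that effect in isolation, here is a reduced sketch of the same logic. It is not your full code: the two-line input and the names lines, indent and out are made up for illustration. The indent grows before the header line is appended, so the header itself gets shifted:

```python
# Hypothetical two-line input, reduced from the question's text.
lines = ["• Plan:", "- Continue current exercise programme"]

indent = ""
out = []
for line in lines:
    if line.endswith(":"):
        indent += "  "          # the indent grows first...
    out.append(indent + line)   # ...and is applied to the ":" line too

for line in out:
    print(repr(line))
```

    Both lines come out indented by two spaces, whereas the goal is for only the second one to move.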

    More generally, you look for a :, but wouldn't it make more sense to detect the change in bullet-point style?

    What about:

    result = []
    indent = 0
    indent_size = 2
    bullet_points = []  # bullet styles seen so far, outermost first
    for line in paragraph.split("\n"):
        # keep blank (or whitespace-only) lines; split()[0] would fail on them
        if not line.strip():
            result.append(line)
            continue
        first = line.split()[0]
        # this may be a bit weak, consider predefined valid bullet point characters
        if len(first) == 1:
            if first in bullet_points:
                # known style: go back to its level and drop any deeper ones
                bullet_points = bullet_points[:bullet_points.index(first) + 1]
            else:
                # new style: one level deeper
                bullet_points.append(first)
            indent = (len(bullet_points) - 1) * indent_size
        else:
            indent = 0
        result.append(' ' * indent + line.strip())
    
    for line in result:
        print(line)
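
    As the comment in the loop hints, checking len(first) == 1 would also treat any one-character word as a bullet. A possible variant with a predefined set of bullet characters might look like the sketch below. The function name reindent, its parameters, and the choice to reset the nesting on non-bullet lines (such as numbered headings) are assumptions made for illustration, not something taken from the question:

```python
def reindent(text, bullets=("•", "-", "o"), indent_size=2):
    """Indent bullet lines by nesting depth, inferred from bullet-style changes."""
    stack = []   # bullet styles currently open, outermost first
    result = []
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            result.append("")
            continue
        first = stripped.split()[0]
        if first in bullets:
            if first in stack:
                # known style: return to its level, dropping deeper ones
                del stack[stack.index(first) + 1:]
            else:
                # new style: one level deeper
                stack.append(first)
            depth = len(stack) - 1
        else:
            # assumption: a non-bullet line (e.g. "2. Heading") starts a fresh list
            stack.clear()
            depth = 0
        result.append(" " * (depth * indent_size) + stripped)
    return "\n".join(result)


# Small made-up sample in the same shape as the question's text.
sample = "• A:\n- b\n- c:\no d\n• E\n\n2. Heading\n• F:\n- g"
print(reindent(sample))
```

    Calling reindent(paragraph) on the text from the question should work the same way, as long as the bullets tuple lists every bullet character that appears in the document.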