I have a long document with text like this:
paragraph = '''
• Mobilising independently with a 4-wheel walker
• Gait: Good quality with good cadence, good step length, and adequate foot clearance
• Lower limb examination:
- Tone: Normal bilaterally
- No clonus in either leg
- Passive dorsiflexion: Possible to -10 to -5 degrees bilaterally
- Power: 5/5 in all major muscle groups of lower limbs bilaterally
- Sensation: Intact to gross touch bilaterally
- ROM:
o Right hip flexion: 0-120 degrees
o Left hip flexion: 10-120 degrees (fixed flexion deformity present)
• Feet: Not broad-based
2. Post-stroke mobility and spasticity
• Improvement noted in right leg stiffness
• Continuing with current exercise regimen:
- Neuro Group twice weekly
- Gym group (including bike riding)
- Home exercises
• Plan:
- Continue current exercise programme
- Maintain current baclofen dose
'''
I want to indent the lines so that each nested bullet level is visible. Basically, I want to convert the above to this:
paragraph = '''
• Mobilising independently with a 4-wheel walker
• Gait: Good quality with good cadence, good step length, and adequate foot clearance
• Lower limb examination:
  - Tone: Normal bilaterally
  - No clonus in either leg
  - Passive dorsiflexion: Possible to -10 to -5 degrees bilaterally
  - Power: 5/5 in all major muscle groups of lower limbs bilaterally
  - Sensation: Intact to gross touch bilaterally
  - ROM:
    o Right hip flexion: 0-120 degrees
    o Left hip flexion: 10-120 degrees (fixed flexion deformity present)
• Feet: Not broad-based
2. Post-stroke mobility and spasticity
• Improvement noted in right leg stiffness
• Continuing with current exercise regimen:
  - Neuro Group twice weekly
  - Gym group (including bike riding)
  - Home exercises
• Plan:
  - Continue current exercise programme
  - Maintain current baclofen dose
'''
I wrote the following code, but it's not properly formatting the strings:
add_indent = ""; corpus = []; bullet_point = ""
for line in paragraph.split("\n"):
    if line.strip().endswith(":") and len(line.split(" ")[0])==1: add_indent += " "; bullet_point = line.split(" ")[0]
    elif not line.strip().endswith(":") and bullet_point == line.split(" ")[0]: add_indent = add_indent[:-2]
    elif not line: add_indent = ""
    corpus.append(add_indent+line)
for line in corpus: print(line)
Where am I going wrong?
For one, some friendly advice: don't format your Python like some other language, and don't use semicolons unless you must. Here is your code laid out conventionally:
add_indent = ""
corpus = []
bullet_point = ""
for line in paragraph.split("\n"):
    if line.strip().endswith(":") and len(line.split(" ")[0]) == 1:
        add_indent += " "
        bullet_point = line.split(" ")[0]
    elif not line.strip().endswith(":") and bullet_point == line.split(" ")[0]:
        add_indent = add_indent[:-2]
    elif not line:
        add_indent = ""
    corpus.append(add_indent + line)
for line in corpus:
    print(line)
Your code increases the current indent add_indent when it detects a colon (:) at the end of a line, but then applies the new indent to the very line it is currently analysing, which causes the indentation to start one line too early.
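A minimal sketch of that ordering issue, keeping the question's colon test: the line is appended with the indent computed so far, and only afterwards is the indent increased for the lines that follow. The function name indent_after_colon and the step parameter are made up for illustration, and the dedent logic is deliberately left out, so this is not a complete fix.

```python
def indent_after_colon(lines, step="  "):
    """Indent the lines that follow a one-character-bullet header ending in ':'.

    Hypothetical helper: it only demonstrates the append-then-update order
    and never reduces the indent again.
    """
    indent = ""
    out = []
    for line in lines:
        # Use the indent that earlier lines established.
        out.append(indent + line)
        # Only now update the indent, so it applies from the next line on.
        if line.strip().endswith(":") and len(line.split(" ")[0]) == 1:
            indent += step
    return out


sample = ["• Plan:", "- Continue current exercise programme"]
print("\n".join(indent_after_colon(sample)))
```

With this order the header line keeps its original position and only the line after it is shifted.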
More generally, you look for a colon (:), but wouldn't it make more sense to detect the change in bullet-point style?
What about:
result = []
indent = 0
indent_size = 2
bullet_points = []
for line in paragraph.split("\n"):
    if not line:
        result.append(line)
        continue
    first = line.split()[0]
    # this may be a bit weak, consider predefined valid bullet point characters
    if len(first) == 1:
        if first in bullet_points:
            bullet_points = bullet_points[:bullet_points.index(first) + 1]
        else:
            bullet_points.append(first)
        indent = (len(bullet_points) - 1) * indent_size
    else:
        indent = 0
    result.append(' ' * indent + line.strip())
for line in result:
    print(line)
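Following up on the comment in the code about predefined bullet characters, here is one possible variant wrapped in a function. The name indent_bullets, the BULLETS set and the choice to reset the nesting on any non-bullet line (such as the numbered heading) are assumptions for this sketch, not something the question specifies. It also skips whitespace-only lines, which the loop above would fail on.

```python
BULLETS = {"•", "-", "o"}  # assumed set of valid bullet markers; extend as needed


def indent_bullets(text, indent_size=2, bullets=BULLETS):
    """Indent each line according to the nesting implied by its bullet marker."""
    stack = []   # bullet markers seen so far, outermost level first
    result = []
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            # Blank or whitespace-only line: keep it empty.
            result.append("")
            continue
        first = stripped.split()[0]
        if first in bullets:
            if first in stack:
                # Known marker: return to its level, dropping deeper ones.
                del stack[stack.index(first) + 1:]
            else:
                # New marker: it opens one level deeper.
                stack.append(first)
            depth = len(stack) - 1
        else:
            # Non-bullet line (e.g. "2. Heading"): assumed to start a new section.
            stack.clear()
            depth = 0
        result.append(" " * (depth * indent_size) + stripped)
    return "\n".join(result)


print(indent_bullets("• Plan:\n- Continue current exercise programme"))
```

Restricting the test to a known set avoids treating a one-letter word at the start of a line as a bullet, at the cost of having to list every marker the document uses.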