Say I have strings like:
'1 - hello.mp3'
'22 - hellox.mp3'
'223 - hellox.mp3'
'hellox.mp3'
I'd like the output to be:
'001 - hello.mp3'
'022 - hellox.mp3'
'223 - hellox.mp3'
'hellox.mp3'
That is, if the string starts with a number, prepend zeros to make it three digits.
Is there a way to achieve this using a regex in Python?
Yes, regexes can do that. Use re.sub() with a callback function:
import re

def pad_number(match):
    number = int(match.group(1))
    return format(number, "03d")

fixed_text = re.sub(r"^(\d+)", pad_number, text)
The pattern I used, ^(\d+), matches 1 or more digits (\d is a digit; + matches at least once but consumes all following digits), but only at the start of the string (^ is the 'start of text' anchor here).
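A quick way to check what ^(\d+) does and doesn't match is to run it through re.search() on a few strings. This is only a sketch; the name LEADING_DIGITS and the extra sample 'track 7.mp3' are made up for illustration, the latter to show that digits later in the string are ignored:

```python
import re

# The anchored pattern from the answer: one or more digits, at the start only.
LEADING_DIGITS = re.compile(r"^(\d+)")

samples = ["1 - hello.mp3", "223 - hellox.mp3", "hellox.mp3", "track 7.mp3"]
for text in samples:
    match = LEADING_DIGITS.search(text)
    # search() returns None when the string does not start with a digit.
    digits = match.group(1) if match else None
    print(f"{text!r:20} -> {digits!r}")
```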
Then, for each match, re.sub() calls the pad_number() function and uses the string it returns to replace the matched text. Because the pattern uses a capturing group (everything between ( and ) is such a group), the function can access the matched digits by calling match.group(1).
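To make the callback mechanics visible, here is a small sketch with a hypothetical show_groups() callback that prints what re.sub() hands it and returns the digits unchanged:

```python
import re

def show_groups(match):
    # re.sub() passes a re.Match object to the callback for every match.
    print("group(0):", repr(match.group(0)))  # the whole matched text
    print("group(1):", repr(match.group(1)))  # just the first capturing group
    return match.group(1)  # return the digits unchanged

result = re.sub(r"^(\d+)", show_groups, "22 - hellox.mp3")
print(result)
```

Here group(0) and group(1) are both '22', because the group wraps the entire pattern; match.group() would work just as well in this case. The capturing group only starts to matter once the pattern matches more than the digits.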
The function turns the digits into an integer, then uses the format() function to turn that integer back into text, this time as a zero-padded number at least 3 characters wide. That's what the 03d format specification tells format() to produce: 0 for zero padding, 3 for the minimum width, and d for a decimal integer.
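The same 03d specification works in the other standard Python formatting tools, and str.zfill() gives the same result for non-negative integers; a short comparison:

```python
number = 7

# Format spec "03d": "0" pads with zeros, "3" is the minimum width,
# "d" formats the value as a decimal integer.
print(format(number, "03d"))    # 007
print(f"{number:03d}")          # the same spec inside an f-string
print("{:03d}".format(number))  # the same spec via str.format()
print(str(number).zfill(3))     # str.zfill() pads a string with zeros
```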
Note that the pattern can match more than 3 digits. Limiting it doesn't make much sense unless there is a strict upper number you want to limit to (at which point you also need to add a restriction that the next character is not a digit). The format(number, "03d") instruction produces a number at least 3 digits wide, but handles longer values too.
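If you do want a strict upper limit, one way to express "the next character is not a digit" is a negative lookahead, (?!\d). This is a sketch under that assumption; the name AT_MOST_THREE is hypothetical:

```python
import re

# Hypothetical stricter pattern: 1 to 3 leading digits, and the negative
# lookahead (?!\d) requires that the next character is not another digit.
AT_MOST_THREE = re.compile(r"^(\d{1,3})(?!\d)")

def pad_number(match):
    return format(int(match.group(1)), "03d")

for text in ["1 - hello.mp3", "223 - hellox.mp3", "4281 - 4 digits", "0001 - x.mp3"]:
    print(f"{text!r:20} -> {AT_MOST_THREE.sub(pad_number, text)!r}")
```

One side effect worth knowing about: because pad_number() goes through int(), the unrestricted ^(\d+) version turns '0001 - x.mp3' into '001 - x.mp3', while this stricter pattern finds no match and leaves it alone.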
Demo:
>>> import re
>>> samples = [
...     '1 - hello.mp3',
...     '22 - hellox.mp3',
...     '223 - hellox.mp3',
...     'hellox.mp3',
... ]
>>> def pad_number(match):
...     number = int(match.group(1))
...     return format(number, "03d")
...
>>> for sample in samples:
...     result = re.sub(r"^(\d+)", pad_number, sample)
...     print(f"{sample!r:20} -> {result!r:20}")
...
'1 - hello.mp3'      -> '001 - hello.mp3'
'22 - hellox.mp3'    -> '022 - hellox.mp3'
'223 - hellox.mp3'   -> '223 - hellox.mp3'
'hellox.mp3'         -> 'hellox.mp3'
Again, keep in mind that this method doesn't special-case strings with 4 or more digits at the start; you simply get the longer sequence of digits back unchanged:
>>> re.sub(r"^(\d+)", pad_number, "4281 - 4 digits")
'4281 - 4 digits'
>>> re.sub(r"^(\d+)", pad_number, "428117 - 6 digits")
'428117 - 6 digits'
This would happen even if we limited the \d pattern to only match up to 3 digits (e.g. with \d{1,3}).
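To see why the limit alone changes nothing, here is the \d{1,3} variant applied to the 4-digit example; this is just a check of the claim above, not a recommended pattern:

```python
import re

def pad_number(match):
    return format(int(match.group(1)), "03d")

# \d{1,3} stops after three digits, so "4281" matches only "428";
# padding "428" changes nothing and the trailing "1" is left in place.
result = re.sub(r"^(\d{1,3})", pad_number, "4281 - 4 digits")
print(result)  # 4281 - 4 digits
```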
If you want to make the padding width configurable, you can put everything in a nested function and build the format specification with string formatting. You don't really need
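import re

def pad_leading_number(text, width):
    def pad_number(match):
        number = int(match.group(1))
        # f"0{width}d" builds the spec, e.g. "03d" when width is 3
        return format(number, f"0{width}d")
    # A plain raw string is enough here; the pattern has no placeholders
    return re.sub(r"^(\d+)", pad_number, text)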
Demo:
>>> pad_leading_number("22 - hellox.mp3", 3)
'022 - hellox.mp3'
>>> pad_leading_number("22 - hellox.mp3", 7)
'0000022 - hellox.mp3'
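As an alternative sketch (not part of the original answer), the padding can also be done on the matched text itself with str.zfill(), skipping the int() round trip. The function name pad_leading_number_zfill is hypothetical. Because nothing is converted to an integer, leading zeros that are already present are kept:

```python
import re

def pad_leading_number_zfill(text, width):
    # Hypothetical variant: pad the matched digits as a string with zfill()
    # instead of converting them to int, so existing leading zeros are kept.
    return re.sub(r"^\d+", lambda match: match.group().zfill(width), text)

print(pad_leading_number_zfill("22 - hellox.mp3", 3))  # 022 - hellox.mp3
print(pad_leading_number_zfill("0001 - x.mp3", 3))     # 0001 - x.mp3
print(pad_leading_number_zfill("hellox.mp3", 3))       # hellox.mp3
```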