[SOLVED] Replace a series of the same character by its number of occurrences in the series

I get a string like this:

AABBBB$CCCDEEE$AABADEE

And I want a result like this:

2A4B$3CD3E$2ABAD2E

To do that, I wrote a for loop over the string. It works well:

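import re

string = "AABBBB$CCCDEEE$AABADEE"
out_string = string[:]
k = 1
c_old = ""
# The trailing None never equals a character, so the final run is flushed too.
for c in [*string, None]:
    if c_old == c:
        k += 1
    else:
        if k > 1:
            s = c_old * k
            chg = str(k) + c_old
            # re.escape() stops characters such as "$" being read as regex syntax.
            out_string = re.sub(re.escape(s), chg, out_string, count=1)
        k = 1
    c_old = c

print(out_string)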

But with very long strings, it can take a long time.

Is there a way to do what I want without iterating over the whole string, especially with the re module?

Solution

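Not sure why you think calling re.sub() inside the loop is appropriate for this: every call rescans and rebuilds out_string, which is where the time goes on long inputs. You just need a single, fairly trivial iteration over the source string.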

Something like this:

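s = "AABBBB$CCCDEEE$AABADEE"

r = ""        # output, built up run by run
c = 1         # length of the current run
p = s[:1]     # character of the current run ("" if s is empty)

for x in s[1:]:
    if x == p:
        c += 1
    else:
        # The run has ended: emit it, with a count only if it is longer than 1.
        r += p if c == 1 else f"{c}{p}"
        c = 1
        p = x
else:
    # The loop has no break, so this always runs and flushes the final run.
    r += p if c == 1 else f"{c}{p}"

print(r)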

Output:

2A4B$3CD3E$2ABAD2E
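
Since the question asks specifically about the re module, here is a minimal sketch of two alternatives that avoid the explicit character loop. It assumes a single re.sub() call with a backreference pattern is acceptable, and it shows an itertools.groupby version for comparison. The names compress_re and compress_groupby are hypothetical helpers made up for this example, not anything from the original post.

```python
import re
from itertools import groupby

# Matches any character followed by one or more copies of itself.
# re.DOTALL lets "." match newlines too, so runs of "\n" are also counted.
_RUN = re.compile(r"(.)\1+", re.DOTALL)


def compress_re(s):
    """Replace each run of length > 1 with '<count><char>' (hypothetical helper)."""
    return _RUN.sub(lambda m: f"{len(m.group(0))}{m.group(1)}", s)


def compress_groupby(s):
    """Same result, built from itertools.groupby runs (hypothetical helper)."""
    parts = []
    for char, run in groupby(s):
        n = sum(1 for _ in run)  # length of this run
        parts.append(char if n == 1 else f"{n}{char}")
    return "".join(parts)


print(compress_re("AABBBB$CCCDEEE$AABADEE"))       # 2A4B$3CD3E$2ABAD2E
print(compress_groupby("AABBBB$CCCDEEE$AABADEE"))  # 2A4B$3CD3E$2ABAD2E
```

Both versions still read every character once; they just move the loop into the regex engine or into groupby. Note that if the input itself can contain digits, the output is ambiguous to decode, which is the same limitation as the loop above.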