python regex string split

Python - Split string on first occurrence of non-allowable characters


I have some Python code in which I want to scan a string and split it at the first occurrence of a non-allowable character.

import re
import string

mystring = "my_id=abc-something_123&anything#;?lcdkahck;my_id%3Dkckdkkj_bcjc"
if "my_id=" in mystring:
    # Keep at most 100 characters after "my_id=".
    mystring = mystring[mystring.index("my_id=") + 6:][:100]
    # Cut at the first ;, & or #.
    mystring = re.split('[;&#]', mystring)[0]
    print(mystring)

This gives the correct string when the value is followed by ;, & or #, but my data can contain any unpredictable character other than ;&#.

Here is what I tried in order to get rid of those characters:

allowable_character = '-' + '_' + string.ascii_letters + string.digits
mystring = re.sub('[^%s]' % allowable_character, '', mystring)
print(mystring)

But this just removes every character that is not in 'allowable_character' from the whole string, instead of cutting the string at the first such character.

What I am trying to achieve is to split the string at the first character that is not in 'allowable_character' and return the part before it.

So the expected output is 'abc-something_123'.

Any help is appreciated here


Solution

  • You could just use re.findall here:

    import re

    mystring = "my_id=abc-something_123&anything#;?lcdkahck;my_id%3Dkckdkkj_bcjc"
    match = re.findall(r'^my_id=([\w-]*).*$', mystring)[0]
    print(match)
    

    This prints:

    abc-something_123
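
  • A few caveats with the pattern above: in Python 3, \w on a str pattern also matches non-ASCII letters and digits by default, the ^ anchor only works when the string starts with my_id=, and indexing [0] on an empty findall result raises IndexError. Below is a minimal sketch that tries to stay closer to the question, assuming the allowed set is exactly the asker's 'allowable_character' and keeping the 100-character cap. The names extract_id, extract_id_split, ALLOWED and max_len are made up for illustration.

```python
import re
import string

# Characters allowed in the value, as defined in the question.
ALLOWED = "-_" + string.ascii_letters + string.digits


# Hypothetical helper; max_len mirrors the [0:100] slice in the question.
def extract_id(text, key="my_id=", max_len=100):
    """Return the run of allowed characters right after `key`, or None."""
    # re.escape keeps '-' (and anything else special) literal inside [...].
    pattern = re.escape(key) + "([" + re.escape(ALLOWED) + "]*)"
    match = re.search(pattern, text)
    if match is None:
        return None  # key not present at all
    return match.group(1)[:max_len]


# Same idea with re.split: cut once, at the first disallowed character.
def extract_id_split(text, key="my_id=", max_len=100):
    if key not in text:
        return None
    tail = text[text.index(key) + len(key):][:max_len]
    return re.split("[^" + re.escape(ALLOWED) + "]", tail, maxsplit=1)[0]


mystring = "my_id=abc-something_123&anything#;?lcdkahck;my_id%3Dkckdkkj_bcjc"
print(extract_id(mystring))        # abc-something_123
print(extract_id_split(mystring))  # abc-something_123
```

    Both helpers return None when my_id= is missing and an empty string when the value starts with a disallowed character. If Unicode word characters are acceptable, [\w-]* is shorter; passing flags=re.ASCII restricts \w to [a-zA-Z0-9_].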