pythonstringsnakecasing

Convert string to snake-case based on meaningful word boundaries


I have a concatenated string which I need to convert into snake case. For e.g;

converttobigendian -> convert_to_big_endian

Can this be achieved using a regular expression? If not, how can this be achieved?


Solution

  • The most robust and straightforward solution is to use the wordsegment library. It's built for this exact task and comes with a large dictionary derived from English text corpora, allowing it to calculate the most probable segmentation.

    pip install wordsegment
    

    Use it in your code: The library does all the heavy lifting. You just need to import it and call the segment function.

    import wordsegment
    
    # This loads the internal dictionary and language model
    wordsegment.load()
    
    input_string = "converttobigendian"
    segmented_words = wordsegment.segment(input_string)
    
    snake_case_string = "_".join(segmented_words)
    
    print(snake_case_string)
    # Output: convert_to_bigendian