python string reformatting

Reformatting Strings from Scraped Data in order to Satisfy Keyword Argument


I am working on a baseball analysis project where I web-scrape the real-time lineups for a given team, on a given date.

I am currently facing an issue with the names that I receive in the scraped dataframe -- in seemingly random cases, a player's name comes in a different format and is unusable (I pass the player name into a statistics function, which only works if the name is formatted correctly).

Example:

     Freddie Freeman
     Ozzie Albies
     Ronald Acuna
     Austin RileyA. A.Riley 
     Dansby Swanson
     Adam Duvall
     Joc PedersonJ. J.Pederson

As you can see, most of the names are formatted normally. In a few cases, however, the player's full name is followed directly by their first initial and a period, and then by their first initial, a period, and their last name again. If I could turn Austin RileyA. A.Riley into Austin Riley, everything would work.

This is consistent across all the teams and data that I pull -- sometimes there are a few players whose names are formatted in exactly this way: FirstName + LastName + First initial. + First initial. + LastName

I am trying to figure out a way to reformat the names so that they are usable, and to do so in a way that generalizes to any possible name.


Solution

  • If the theme is really consistent, you could do something like this:

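    name_list = ['Freddie Freeman',
                 'Ozzie Albies',
                 'Ronald Acuna',
                 'Austin RileyA. A.Riley ',
                 'Dansby Swanson',
                 'Adam Duvall',
                 'Joc PedersonJ. J.Pederson']
    new_list = []
    for n in name_list:
        dot = n.find('.')  # index of the first period, or -1 if there is none
        if dot == -1:
            # Name is already clean; only trim stray whitespace
            new_list.append(n.strip())
        else:
            # Drop the period, the initial before it, and everything after
            new_list.append(n[:dot - 1])
    print(new_list)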
    

    There are several ways to achieve this (including regex, which I would not recommend). The example I have posted is the best in my opinion (see the find() documentation).
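
One caveat with cutting at the first period: a name that legitimately contains a period (initials such as "J.D.", used here purely as an illustrative input) would be cut short. If that matters for your data, a stricter check might look like the sketch below. It does use a regex, so treat it as an alternative rather than the approach above. It assumes every broken entry has exactly the shape described in the question: "First Last", then the first initial and a period, a space, and then the first initial, a period, and the last name again. The helper name `clean_name` and the pattern are my own, not part of any library.

```python
import re

# Assumed shape of a broken entry:
#   "First Last" + initial + ". " + initial + "." + "Last"
# The backreferences (?P=init) and (?P=last) require the suffix to really
# repeat the first initial and the last name.
DUPLICATED = re.compile(
    r"(?P<first>(?P<init>[A-Z])\S*) (?P<last>.+?)"
    r"(?P=init)\. (?P=init)\.(?P=last)"
)


def clean_name(raw):
    """Return 'First Last' for a duplicated entry, else the stripped input."""
    name = raw.strip()
    match = DUPLICATED.fullmatch(name)
    if match is None:
        # Not in the duplicated format; leave the name as it is
        return name
    return f"{match['first']} {match['last']}"


print(clean_name('Austin RileyA. A.Riley '))    # Austin Riley
print(clean_name('Joc PedersonJ. J.Pederson'))  # Joc Pederson
print(clean_name('Freddie Freeman'))            # Freddie Freeman
print(clean_name('J.D. Martinez'))              # J.D. Martinez
```

Since the names live in a dataframe, the same function could be applied to the relevant column with pandas' `Series.map(clean_name)`; the column name depends on your scraped data.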