python pandas replace

How to remove "http://", "https://", or "www." from string in pandas


New to Python/pandas.

Within a column called "URL", I am trying to strip "http://", "https://", and "www." from each URL and keep only what comes after them.

For example,

http://www.jhu.edu
http://www.brown.edu
http://https://www.amherst.edu
http://www.usc.edu

Should look like:

jhu.edu
brown.edu
amherst.edu
usc.edu

Solution

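  • # example data; the column is named 'colA' here rather than "URL"
    import pandas as pd
    data = {'colA': ['http://www.jhu.edu', 'http://www.brown.edu', 'http://https://www.amherst.edu', 'http://www.usc.edu']}
    df = pd.DataFrame(data)

    Use str.replace with a regex. The pattern removes every occurrence of "http://", "https://", or "www.", so stacked prefixes such as "http://https://www." are handled too:

    # https? matches "http" or "https"; \. matches a literal dot
    out = df['colA'].str.replace(r'https?://|www\.', '', regex=True)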
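
The pattern above is not anchored, so it would also remove a "www." that appears later in a URL (for example inside a path). If only leading prefixes should go, a variant along these lines might be safer. This is a sketch, not part of the original answer: the column name 'URL' follows the question, while the `prefix` variable and the 'clean' column are names made up for illustration.

```python
import pandas as pd

# Sample data from the question, stored in a column named 'URL'
df = pd.DataFrame({'URL': [
    'http://www.jhu.edu',
    'http://www.brown.edu',
    'http://https://www.amherst.edu',
    'http://www.usc.edu',
]})

# ^ anchors the match at the start of the string;
# (?:...)+ consumes one or more stacked prefixes in a row
prefix = r'^(?:https?://|www\.)+'

# 'clean' is a hypothetical output column name
df['clean'] = df['URL'].str.replace(prefix, '', regex=True)
print(df['clean'].tolist())
```

With the anchor in place, a value like 'http://example.com/www.page' should keep its path intact and only lose the leading "http://".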