
Trying to get a table from a website (ValueError: If using all scalar values, you must pass an index)


I'm trying to write a function that automatically takes a table from a website (Wikipedia), cleans it a bit and then displays it. Everything worked well with my first two tables, but the third one is giving me some trouble.

This is the code to define the function:

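import numpy as np
import pandas as pd

def createTable(url, match):
    # read_html returns a list of every table on the page whose text matches `match`
    data = pd.read_html(url, match=match)
    name = data[0]["Name"]
    origin = data[0]["Origin"]
    type_ = data[0]["Type"]
    number = data[0]["Number"]
    df = pd.DataFrame({"Name": name, "Origin": origin, "Type": type_, "Number": number})
    # mark unknown entries ("?") as missing values
    df.replace("?", np.nan, inplace=True)
    # strip "(...)" notes and "[...]" footnote markers from the counts
    df['Number'] = df['Number'].replace(to_replace={r"\(.*\)": "", r"\[.*\]": ""}, regex=True)
    return df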

And this is how I call it:

df_avIT= pd.DataFrame()
df_avIT= createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_Italian_Army",
            "125 To be upgraded and remain in service until 2035")
df_avUK= pd.DataFrame()
df_avUK= createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_British_Army",
            "Challenger 2")
df_avFR= pd.DataFrame()
df_avFR= createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_French_Army",
            "AMX Leclerc")

As I said, the first two calls work without any problem, but the third one raises ValueError: If using all scalar values, you must pass an index. I know the code isn't great and I'm trying to improve it, but this error is blocking me and I can't find a working solution, even though I searched several forums for problems similar to mine. (Sorry if my English is bad; if something is unclear, tell me and I'll try to explain better.)
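The error can be reproduced offline. The sketch below assumes (it is not confirmed for the French page) that the scraped table has a two-row header, which read_html can turn into MultiIndex columns; the table contents are made up. Selecting "Name" then returns a DataFrame rather than a Series, and building the DataFrame the same way createTable does should fail with the same message.

```python
import pandas as pd

# Made-up stand-in for a scraped table whose header spans two rows;
# read_html can represent such headers as MultiIndex columns.
raw = pd.DataFrame(
    [["Tank A", "Country X", "Main battle tank", "100"]],
    columns=pd.MultiIndex.from_tuples(
        [("Name", "Name"), ("Origin", "Origin"), ("Type", "Type"), ("Number", "Number")]
    ),
)

name = raw["Name"]  # top-level label of a MultiIndex -> DataFrame, not Series
print(type(name).__name__)  # DataFrame

error_message = None
try:
    # Same construction as in createTable, but every value is a 2-D DataFrame,
    # so pandas finds no 1-D value to take the row index from.
    pd.DataFrame({"Name": raw["Name"], "Origin": raw["Origin"],
                  "Type": raw["Type"], "Number": raw["Number"]})
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```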


Solution

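  • Your script does not consistently get a Series for name/origin/type_/number: for some tables, data[0]["Name"] and the others return DataFrames instead (this happens, for example, when the header spans two rows and read_html builds MultiIndex columns, or when a column name appears twice). pd.DataFrame({...}) works out the row index from 1-D values such as Series, lists or arrays; when all the values are DataFrames it finds nothing to build an index from and raises exactly this error. You can try to squeeze each selection into a Series: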

    name = data[0]["Name"].squeeze()
    origin = data[0]["Origin"].squeeze()
    type_ = data[0]["Type"].squeeze()
    number = data[0]["Number"].squeeze()
    

    Side note: df_avIT = pd.DataFrame() is unnecessary. You don't need to initialize an empty DataFrame, because the variable is overwritten right away by df_avIT = createTable(...) (the same goes for df_avUK and df_avFR).
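Putting the squeeze idea into the function, a minimal sketch might look like this. The helper names as_series, clean_table and create_table and the COLUMNS constant are hypothetical. Two small changes go beyond the answer and are labeled as such: squeeze is called with axis="columns" only on DataFrames, so a one-row table still yields a Series instead of a scalar (plain .squeeze() on a 1x1 DataFrame returns a scalar, which would bring the same error back), and the regexes are non-greedy so that "(a) 10 (b)" does not swallow the number in between.

```python
import numpy as np
import pandas as pd

COLUMNS = ["Name", "Origin", "Type", "Number"]  # hypothetical constant


def as_series(column):
    """Hypothetical helper: return a 1-D Series for one selected column."""
    if isinstance(column, pd.DataFrame):
        # A one-column DataFrame becomes a Series; wider ones stay 2-D.
        column = column.squeeze(axis="columns")
    if isinstance(column, pd.DataFrame):
        raise ValueError(f"expected one column, got {list(column.columns)}")
    return column


def clean_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the cleaning part of createTable, without the download."""
    df = pd.DataFrame({col: as_series(raw[col]) for col in COLUMNS})
    df = df.replace("?", np.nan)  # unknown entries become missing values
    # Non-greedy patterns (a change from the original) that also eat the
    # whitespace before "(...)" notes and "[...]" footnote markers.
    df["Number"] = df["Number"].replace(
        to_replace={r"\s*\(.*?\)": "", r"\s*\[.*?\]": ""}, regex=True
    )
    return df


def create_table(url: str, match: str) -> pd.DataFrame:
    # Needs network access plus an HTML parser (lxml, or bs4 + html5lib).
    return clean_table(pd.read_html(url, match=match)[0])
```

Usage would be the same as before, e.g. df_avFR = create_table("https://en.wikipedia.org/wiki/List_of_equipment_of_the_French_Army", "AMX Leclerc"). Splitting the download from the cleaning also makes clean_table easy to try on a hand-built DataFrame.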