I'm trying to write a function that automatically takes a table from a website (Wikipedia), cleans it a bit and then displays it. Everything worked well with my first two tables, but the third one is giving me some trouble.
This is the code to define the function:
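```python
import numpy as np
import pandas as pd

def createTable(url, match):
    # read_html returns a list of every table on the page whose text matches `match`
    data = pd.read_html(url, match=match)
    name = data[0]["Name"]
    origin = data[0]["Origin"]
    type_ = data[0]["Type"]
    number = data[0]["Number"]
    df = pd.DataFrame({"Name": name, "Origin": origin, "Type": type_, "Number": number})
    # Treat "?" cells as missing values
    df.replace("?", np.nan, inplace=True)
    # Strip "(...)" notes and "[...]" footnote markers from the counts
    df['Number'] = df['Number'].replace(to_replace={r"\(.*\)": "", r"\[.*\]": ""}, regex=True)
    return df
```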
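To see what the two cleaning steps in `createTable` do without fetching Wikipedia, here is a minimal offline sketch that applies the same `replace` calls to a small hand-made table. The helper name `clean_table` and all sample values are hypothetical, not taken from the real pages, and the final `pd.to_numeric` step is an optional extra that the original function does not perform.

```python
import numpy as np
import pandas as pd

def clean_table(df):
    """Hypothetical helper: the same cleanup createTable applies after read_html."""
    df = df.replace("?", np.nan)  # cells that are exactly "?" become NaN
    # Remove "(...)" notes and "[...]" footnote markers from the Number column
    df["Number"] = df["Number"].replace(to_replace={r"\(.*\)": "", r"\[.*\]": ""}, regex=True)
    return df

# Made-up sample rows, not real equipment data
raw = pd.DataFrame({
    "Name": ["Tank A", "Tank B", "Tank C"],
    "Origin": ["Italy", "?", "France"],
    "Type": ["Main battle tank", "Main battle tank", "?"],
    "Number": ["200 (approx.)", "150[3]", "?"],
})

cleaned = clean_table(raw)
print(cleaned)

# Optional extra step (not in the original function): turn the counts into numbers
counts = pd.to_numeric(cleaned["Number"].str.strip(), errors="coerce")
print(counts.tolist())
```

The regex leaves the space that preceded the parenthesis (`"200 "`), which is why the optional step calls `.str.strip()` before converting.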
And this is the function at work:
```python
df_avIT = pd.DataFrame()
df_avIT = createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_Italian_Army",
                      "125 To be upgraded and remain in service until 2035")
df_avUK = pd.DataFrame()
df_avUK = createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_British_Army",
                      "Challenger 2")
df_avFR = pd.DataFrame()
df_avFR = createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_French_Army",
                      "AMX Leclerc")
```
As I said, the first two give me no problem at all, but when I try the third one it raises `ValueError: If using all scalar values, you must pass an index`. I know the code isn't great and I'm trying to improve it, but this problem is blocking me and I can't find a working solution, even though I searched various forums for problems similar to mine. (I'm sorry if my English is bad; if something is unclear, tell me and I'll try to explain more.)
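A likely cause, offered as an assumption rather than something verified against the live French page, is that `read_html` parsed that table with a two-row header, producing MultiIndex columns. In that case `data[0]["Name"]` returns a one-column DataFrame instead of a Series, and `pd.DataFrame` does not accept a dict whose values are all DataFrames as columns, which raises exactly this error. A minimal offline reproduction with made-up values:

```python
import pandas as pd

# Made-up table with a two-level header, mimicking what read_html may produce
# for a table whose header spans two rows (assumed, not checked on the real page)
cols = pd.MultiIndex.from_tuples(
    [("Name", "Name"), ("Origin", "Origin"), ("Type", "Type"), ("Number", "Number")]
)
table = pd.DataFrame([["Tank A", "France", "Main battle tank", "100"]], columns=cols)

name = table["Name"]  # top-level label -> one-column DataFrame, not a Series
print(type(name).__name__)

try:
    pd.DataFrame({key: table[key] for key in ["Name", "Origin", "Type", "Number"]})
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```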
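Your script does not consistently yield Series for `name`/`origin`/`type_`/`number`; sometimes you get DataFrames instead (this happens when a label selects more than one column or the top level of a multi-level header). You can try to `squeeze` them: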
```python
name = data[0]["Name"].squeeze()
origin = data[0]["Origin"].squeeze()
type_ = data[0]["Type"].squeeze()
number = data[0]["Number"].squeeze()
```
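Here is an offline check of the `squeeze` fix on the same kind of made-up two-level header. One caveat: a bare `.squeeze()` collapses both axes, so if the matched table had a single row it would return a scalar instead of a Series, and the same `ValueError` would come back. Passing `axis=1` (a small tweak, not part of the answer above) only collapses the column axis:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("Name", "Name"), ("Origin", "Origin"), ("Type", "Type"), ("Number", "Number")]
)
# Made-up single-row table: the worst case for a bare .squeeze()
table = pd.DataFrame([["Tank A", "France", "Main battle tank", "100"]], columns=cols)

# A 1x1 DataFrame squeezed on both axes becomes a plain scalar
print(type(table["Name"].squeeze()).__name__)
# Squeezing only the column axis keeps a Series
print(type(table["Name"].squeeze(axis=1)).__name__)

df = pd.DataFrame(
    {key: table[key].squeeze(axis=1) for key in ["Name", "Origin", "Type", "Number"]}
)
print(df)
```

Another option, again assuming the header really is two-level, is to flatten it right after `read_html`, e.g. `table.columns = table.columns.get_level_values(0)`, so the original `data[0]["Name"]` lookups return Series again (as long as each top-level label covers only one column).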
Side note: `df_avIT = pd.DataFrame()` is unnecessary. You don't need to initialize empty DataFrames, since the variable is overwritten by `df_avIT = createTable(...)` anyway.