I have created a dash app and am trying to deploy the app to render. Within this app, it reads multiple excel files. On my personal laptop, it reads the files locally. But I cannot seem to get render to read the files from github correctly. I don't know if it is a path issue, or what the underlying issue is.
I have encountered a variety of errors and attempted the solutions as follows. I tried passing a relative path such as .\assets"file_name.xlsx". However, render returned an error saying no file or path existed.
I then tried to pass the url and file name:
pd.read_excel("https://github.com/mtdewrocks/matchup/tree/072ac999722ded50e8b2eeb649c75f091a8ecbcb/assets/2024_Pitching_Logs.xlsx", usecols=["Name", "Date", "Opp", "W", "L", "IP", "BF", "H", "R", "ER", "HR", "BB", "SO","Pit"])
.
However, while it does not throw a file not found error, I now get a value error saying the Excel file format cannot be determined, you must specify an engine manually.
Therefore I specified the engine as openpyxl and got a BadZipFile - file is not a zip file. I understand this error is typically associated with a corrupt file; however, it happens with multiple files and the files are all fine locally so I don't know that this is the issue.
Lastly, I found a question that suggested this approach:
url = "https://github.com/mtdewrocks/matchup/tree/072ac999722ded50e8b2eeb649c75f091a8ecbcb/assets/Pitcher_Season_Stats.xlsx"
data = requests.get(url).content
df = pd.read_excel(BytesIO(data))
Use the URL to the raw file rather than the URL to the page on Github.
In this case, the url you want to specify is: "https://github.com/mtdewrocks/matchup/raw/072ac999722ded50e8b2eeb649c75f091a8ecbcb/assets/Pitcher_Season_Stats.xlsx"
This is the minimal code needed to read the file to a dataframe:
import pandas as pd
df = pd.read_excel("https://github.com/mtdewrocks/matchup/raw/072ac999722ded50e8b2eeb649c75f091a8ecbcb/assets/Pitcher_Season_Stats.xlsx")