I am fetching the data of the S&P index for every day and I want to insert the close price and the date into my database, however, I am new to Python and I have no idea how to navigate this weird data structure. This is what print(spxHistoricalData) returns:
Price Close High Low Open Volume
Ticker ^GSPC ^GSPC ^GSPC ^GSPC ^GSPC
Date
2000-01-03 1455.219971 1478.000000 1438.359985 1469.250000 931800000
2000-01-04 1399.420044 1455.219971 1397.430054 1455.219971 1009000000
2000-01-05 1402.109985 1413.270020 1377.680054 1399.420044 1085500000
2000-01-06 1403.449951 1411.900024 1392.099976 1402.109985 1092300000
2000-01-07 1441.469971 1441.469971 1400.729980 1403.449951 1225200000
Now I tried doing this:
print(spxHistoricalData[["Date", "Close"]])
But I just get an error KeyError: "['Date'] not in index"
Currently my code looks like this:
import yfinance as yf
import mysql.connector
import pandas as pd
spxHistoricalData = yf.download("^GSPC", start="2000-01-01", end="2025-01-01")
print(spxHistoricalData)
print(spxHistoricalData[["Date", "Close"]])
connection = mysql.connector.connect(
host="127.0.0.1",
port=3306,
user="root",
password="password",
database="bitcoin_and_sp500_price_prediction"
)
cursor = connection.cursor()
values = [(row["Date"], row["Close"]) for index, row in spxHistoricalData.iterrows()]
print(values)
# Use executemany for batch insertion
try:
sql_query = "INSERT INTO spx_historical_data (date, close_price) VALUES (%s, %s)"
cursor.executemany(sql_query, values)
connection.commit()
print("All data inserted successfully using batch insertion.")
except mysql.connector.Error as err:
print(f"Error: {err}")
finally:
cursor.close()
connection.close()
The problem you have encountered is that the dataframe returned from yf.download is a multi-index dataframe, and the column you are trying to access ('Date') does not exist in current state of the dataframe. There are many ways to resolve this (look up pandas MultiIndex), but a straightforward method is to do this:
df_spx = spxHistoricalData.reset_index(['Date'])
This will produce a dataframe with 'Date' as a column:
Price Date Close ... Open Volume
Ticker ^GSPC ... ^GSPC ^GSPC
0 2000-01-03 1455.219971 ... 1469.250000 931800000
1 2000-01-04 1399.420044 ... 1455.219971 1009000000
2 2000-01-05 1402.109985 ... 1399.420044 1085500000
3 2000-01-06 1403.449951 ... 1402.109985 1092300000
4 2000-01-07 1441.469971 ... 1403.449951 1225200000