pythonyfinance

How am I supposed to get just the close price and date from this data returned by yfinance?


I am fetching the data of the S&P index for every day and I want to insert the close price and the date into my database, however, I am new to Python and I have no idea how to navigate this weird data structure. This is what print(spxHistoricalData) returns:

Price             Close         High          Low         Open      Volume
Ticker            ^GSPC        ^GSPC        ^GSPC        ^GSPC       ^GSPC
Date
2000-01-03  1455.219971  1478.000000  1438.359985  1469.250000   931800000
2000-01-04  1399.420044  1455.219971  1397.430054  1455.219971  1009000000
2000-01-05  1402.109985  1413.270020  1377.680054  1399.420044  1085500000
2000-01-06  1403.449951  1411.900024  1392.099976  1402.109985  1092300000
2000-01-07  1441.469971  1441.469971  1400.729980  1403.449951  1225200000

Now I tried doing this:

print(spxHistoricalData[["Date", "Close"]])

But I just get an error KeyError: "['Date'] not in index"

Currently my code looks like this:

import yfinance as yf
import mysql.connector
import pandas as pd

spxHistoricalData = yf.download("^GSPC", start="2000-01-01", end="2025-01-01")
print(spxHistoricalData)
print(spxHistoricalData[["Date", "Close"]])

connection = mysql.connector.connect(
    host="127.0.0.1",
    port=3306,
    user="root",
    password="password",
    database="bitcoin_and_sp500_price_prediction"
)

cursor = connection.cursor()

values = [(row["Date"], row["Close"]) for index, row in spxHistoricalData.iterrows()]

print(values)

# Use executemany for batch insertion
try:
    sql_query = "INSERT INTO spx_historical_data (date, close_price) VALUES (%s, %s)"
    cursor.executemany(sql_query, values)
    connection.commit()
    print("All data inserted successfully using batch insertion.")

except mysql.connector.Error as err:
    print(f"Error: {err}")

finally:
    cursor.close()
    connection.close()

Solution

  • The problem you have encountered is that the dataframe returned from yf.download is a multi-index dataframe, and the column you are trying to access ('Date') does not exist in current state of the dataframe. There are many ways to resolve this (look up pandas MultiIndex), but a straightforward method is to do this:

    df_spx = spxHistoricalData.reset_index(['Date'])
    

    This will produce a dataframe with 'Date' as a column:

    Price        Date        Close  ...         Open      Volume
    Ticker                   ^GSPC  ...        ^GSPC       ^GSPC
    0      2000-01-03  1455.219971  ...  1469.250000   931800000
    1      2000-01-04  1399.420044  ...  1455.219971  1009000000
    2      2000-01-05  1402.109985  ...  1399.420044  1085500000
    3      2000-01-06  1403.449951  ...  1402.109985  1092300000
    4      2000-01-07  1441.469971  ...  1403.449951  1225200000