pythonpandasstockquotes

How to use 'pd.melt()' function in a 'pd.MultiIndex' dataframe?


I'm pulling stock quote data from Yahoo Finance for multiple stocks in 'acoes', but the pd.melt() function is not returning the stock symbols, only the other data columns (result shown in the image).

import pandas as pd
import datetime
import yfinance as yf

acoes = ["PETR4.SA", "BOVA11.SA", "VALE3.SA"]

hoje = datetime.datetime.now()
data_final = hoje - datetime.timedelta(days=30)

dados_acoes = yf.download(acoes, start=data_final, end=hoje)

dados_acoes = dados_acoes.reset_index()

volume_financeiro = dados_acoes['Volume'] * dados_acoes['Close']
volume_financeiro_label = "Volume Financeiro"

volume_financeiro.columns = pd.MultiIndex.from_product([[volume_financeiro_label],volume_financeiro.columns.unique()])

final_df = pd.concat([dados_acoes,volume_financeiro],axis=1)

dados_acoes_melzinho = pd.melt(final_df, id_vars='Date', var_name='Ativo', value_name='VF')

display(dados_acoes_melzinho)

Dataframe Result

How can I display the stock codes in the 'Ativo' column?


Solution

  • Use index operations to reshape your dataframe and keep multi-index information:

    >>> (final_df.set_index('Date').rename_axis(columns=[None, 'Symbol'])
                 .stack(level='Symbol').reset_index())
    
             Date     Symbol   Adj Close       Close        High         Low  \
    0  2023-03-20   PETR4.SA   22.930000   22.930000   23.650000   22.889999   
    1  2023-03-20   VALE3.SA   83.410004   83.410004   83.989998   81.809998   
    2  2023-03-21   PETR4.SA   23.400000   23.400000   23.600000   23.080000   
    3  2023-03-21   VALE3.SA   82.709999   82.709999   83.440002   81.540001   
    4  2023-03-22   PETR4.SA   23.330000   23.330000   23.620001   23.040001   
    ..        ...        ...         ...         ...         ...         ...   
    48 2023-04-17   PETR4.SA   26.709999   26.709999   26.799999   26.209999   
    49 2023-04-17   VALE3.SA   77.870003   77.870003   80.440002   77.449997   
    50 2023-04-18  BOVA11.SA  102.699997  102.699997  103.050003  101.739998   
    51 2023-04-18   PETR4.SA   27.389999   27.389999   27.620001   26.520000   
    52 2023-04-18   VALE3.SA   78.540001   78.540001   78.930000   77.660004   
    
              Open      Volume  Volume Financeiro  
    0    23.510000  57575200.0       1.320199e+09  
    1    82.720001  19650900.0       1.639082e+09  
    2    23.200001  43391200.0       1.015354e+09  
    3    83.220001  15692200.0       1.297902e+09  
    4    23.389999  46435600.0       1.083343e+09  
    ..         ...         ...                ...  
    48   26.240000  49427100.0       1.320198e+09  
    49   79.480003  21479200.0       1.672585e+09  
    50  102.989998   7350557.0       7.549022e+08  
    51   26.709999  99279800.0       2.719274e+09  
    52   78.790001  15115300.0       1.187156e+09  
    
    [53 rows x 9 columns]
    

    Note: you can remove .reset_index() if you want to keep Date and Symbol columns as index.

    EDIT

    I tried to use resample() to get the monthly average on 'Volume Financeiro'

    Try:

    >>> final_df.resample('M', on='Date')['Volume Financeiro'].mean()
                   BOVA11.SA      PETR4.SA      VALE3.SA
    Date                                                
    2023-03-31  7.735287e+08  1.167753e+09  1.512644e+09
    2023-04-30  7.122048e+08  1.594721e+09  1.747704e+09