
MinMaxScaler with range from multiple columns in dataframe


I have an OHLC DataFrame (Open, High, Low, Close) of per-minute sensor data. I need to scale all four columns with one shared scale, based on the minimum and maximum over all of them. For example, the minimum could be in the 'Low' column and the maximum in the 'High' column. I want to fit the scaler on that range, from min(df['Low']) to max(df['High']).
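To make the intended range concrete, here is a minimal sketch in plain pandas (no scikit-learn). It computes one global minimum and maximum over all four columns and applies the min-max formula (x - lo) / (hi - lo) to every cell. The sample values are illustrative and are the same ones used in the answer's example DataFrame.

```python
import pandas as pd

# Illustrative OHLC values (assumed sample data, not real sensor readings)
df = pd.DataFrame({
    'Open': [1, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})

# One global range over all four columns, not one range per column
lo = df.to_numpy().min()  # 0.7, found in 'Low'
hi = df.to_numpy().max()  # 1.3, found in 'High'

# Apply the same min-max formula to every cell
scaled = (df - lo) / (hi - lo)
print(scaled.round(6))
```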

I am currently using MinMaxScaler from sklearn.preprocessing. However, it learns a separate range for each column it is fitted on. So if I fit it to df['Open'] and then transform another column, the values are no longer between 0 and 1 but can be < 0 or > 1.
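A short sketch reproducing the problem, assuming the same illustrative sample values. MinMaxScaler learns one min/max per input column (feature), so a range learned from 'Open' does not cover 'High', and fitting on all four columns at once gives four independent ranges rather than one shared range. The arrays are passed with `.to_numpy()` so the scaler does not compare column names between fit and transform.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative OHLC values (assumed sample data)
df = pd.DataFrame({
    'Open': [1, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})

# Fit on 'Open' only: the learned range is 0.9 .. 1.1
scaler = MinMaxScaler()
scaler.fit(df[['Open']].to_numpy())

# Transforming 'High' with that range escapes [0, 1] because 'High' reaches 1.3
high_scaled = scaler.transform(df[['High']].to_numpy()).ravel()
print(high_scaled)

# Fitting on the whole frame does not help: each column gets its own min/max,
# so both 'High' and 'Low' are stretched to reach exactly 1.0
per_column = MinMaxScaler().fit_transform(df)
print(per_column)
```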

How can I use the full range of all columns in the scaler?


Solution

  • If anybody ends up on this page: I found another way of doing this. Reshape all the values into a single column with NumPy, fit the scaler on that, then reshape the result back to the original shape and build a new DataFrame from it. That sorted my issue:

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    
    # Kudos to Nick, I used his df to illustrate my example.
    df = pd.DataFrame({
      'Open': [1, 1.1, 0.9, 0.9],
      'High': [1.2, 1.2, 1.1, 1.3],
      'Low': [1, 1.0, 0.8, 0.7],
      'Close': [1.1, 1.2, 0.8, 1.2]
    })
    
    scaler = MinMaxScaler()
    # Flatten every value into one column so the scaler learns a single global min/max
    df_np = scaler.fit_transform(df.to_numpy().reshape(-1, 1))
    # Restore the original (rows, columns) shape instead of hard-coding 4 rows
    df = pd.DataFrame(df_np.reshape(df.shape), index=df.index, columns=df.columns)
    
    #        Open      High       Low     Close
    # 0  0.500000  0.833333  0.500000  0.666667
    # 1  0.666667  0.833333  0.500000  0.833333
    # 2  0.333333  0.666667  0.166667  0.166667
    # 3  0.333333  1.000000  0.000000  0.833333
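Because the fitted scaler keeps the shared range, it can be reused for new rows and for mapping scaled values (e.g. model predictions) back to the original units with `inverse_transform`. A minimal sketch, assuming the same sample data; the helpers `scale_frame` and `unscale_frame` and the `new_rows` values are hypothetical, for illustration only. Both directions flatten and reshape in the same row-major order, which is NumPy's default.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative OHLC values (assumed sample data)
df = pd.DataFrame({
    'Open': [1, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})

# Fit once on every value of the frame, treated as a single feature
scaler = MinMaxScaler()
scaler.fit(df.to_numpy().reshape(-1, 1))


def scale_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: scale every cell with the shared range learned above."""
    flat = frame.to_numpy().reshape(-1, 1)
    scaled = scaler.transform(flat).reshape(frame.shape)
    return pd.DataFrame(scaled, index=frame.index, columns=frame.columns)


def unscale_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: map scaled cells back to the original units."""
    flat = frame.to_numpy().reshape(-1, 1)
    original = scaler.inverse_transform(flat).reshape(frame.shape)
    return pd.DataFrame(original, index=frame.index, columns=frame.columns)


scaled = scale_frame(df)
restored = unscale_frame(scaled)
print(np.allclose(restored.to_numpy(), df.to_numpy()))

# New bars are scaled with the same 0.7 .. 1.3 range, so they may fall outside [0, 1]
new_rows = pd.DataFrame({'Open': [1.25], 'High': [1.4], 'Low': [1.2], 'Close': [1.35]})
print(scale_frame(new_rows))
```

Reusing the fitted range, rather than refitting on each new batch, keeps old and new values on the same scale, at the cost that unseen extremes can land outside [0, 1].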