python, pandas, dataframe, openai-gym

gym anytrading python dataframe format


I am very new to gym anytrading. I have a pandas dataframe where one column holds lists of different lengths, and I am trying to figure out how to put that into the gym anytrading environment. Below is a link to a CSV of the sample data in the dataframe and a code snippet. I keep getting this error: TypeError: cannot unpack non-iterable NoneType object

import gym
import pandas as pd
import numpy as np
from gym_anytrading.envs import TradingEnv

    
class CustomTradingEnv(TradingEnv):
    def __init__(self, df):
        super().__init__(df, window_size=10)
        self.reward_range = (0, 1)

    def _process_data(self):
        # Process your DataFrame to ensure it's in the correct format
        # Here, you can perform any necessary preprocessing steps
        pass

    def reset(self):
        # Initialize the environment with the data from the DataFrame
        self._process_data()
        return super().reset()
    
    

    env = CustomTradingEnv(df)

    observation = env.reset()
    for _ in range(100):  # Run for 100 steps
    action = env.action_space.sample()  # Sample a random action
    observation, reward, done, info = env.step(action)
    if done:
        break

https://docs.google.com/spreadsheets/d/1-LFNzZKXUG44smSYOy2rgVVnqiygLfs00lAl2vFdsxM/edit?usp=sharing


Solution

  • First of all, you are not using the right gym package:

    import gym
    

    needs to be

    import gymnasium as gym
    

    since gym_anytrading also uses gymnasium (which is subtly different from the older, no-longer-maintained gym package: for example, reset() and step() have different return signatures).

    Then the indentation in your code is incorrect - I assume this is just a typo. The iteration should be:

    for _ in range(100):  # Run for 100 steps
        action = env.action_space.sample()  # Sample a random action
        observation, reward, done, info = env.step(action)
        if done:
            break
    

    The actual error is coming from the __init__ function and the _process_data function. The full traceback is:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "...", line 24, in test
        env = CustomTradingEnv(df)
      File "...", line 9, in __init__
        super().__init__(df, window_size=10)
      File "/.../anaconda3/envs/py310/lib/python3.10/site-packages/gym_anytrading/envs/trading_env.py", line 35, in __init__
        self.prices, self.signal_features = self._process_data()
    TypeError: cannot unpack non-iterable NoneType object
    

    To fix this you need to modify _process_data so that it returns a tuple of numpy arrays: prices (a 1-dimensional array of floats) and signal_features (a 2-dimensional array). This is also clearly explained in the gym_anytrading README. With your dataframe you could do something like:

        def _process_data(self):
            env = self.unwrapped
            start = 0
            end = len(env.df)
            # prices must be a 1-dim float array, signal_features a 2-dim array
            prices = env.df.loc[:, 'low'].to_numpy()[start:end]
            signal_features = env.df.loc[:, ['close', 'open', 'high', 'low']].to_numpy()[start:end]
            return prices, signal_features
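
    Side note on the column with lists of different lengths that you mentioned: signal_features has to be a rectangular 2-dim array, so a ragged column cannot go in directly. One option (just a sketch, assuming each row of that column holds a flat list of numbers; 'features' is a hypothetical name for it) is to pad every list to a common length and stack the result next to the price columns:

        import numpy as np
        import pandas as pd

        def pad_list_column(series: pd.Series, fill: float = 0.0) -> np.ndarray:
            # Pad a column of variable-length lists into a rectangular 2-dim float array
            max_len = int(series.str.len().max())
            return np.array(
                [list(row) + [fill] * (max_len - len(row)) for row in series],
                dtype=float,
            )

        # inside _process_data, hypothetically:
        # padded = pad_list_column(env.df['features'])[start:end]
        # signal_features = np.hstack([signal_features, padded])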
    

    When you do that your test loop will work fine, apart from the fact that you didn't provide any reward function. A custom env needs to implement a _calculate_reward function that returns a reward signal for any of the possible actions (in this case there are only two). So that should look similar to:

        def _calculate_reward(self, action):
            # I'm assuming 0 and 1 stand for sell and buy (or vice versa)
            match action:
                case 0: return 0.1   # or some other more suitable value
                case 1: return -0.1  # or something more suitable
                case _: raise Exception("bug")
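
    If you prefer not to hard-code the integers, gym_anytrading also exposes an Actions enum (Sell and Buy) that you can match against; the reward values below are placeholders of my own, not anything built into the library:

        from gym_anytrading.envs import Actions

        def _calculate_reward(self, action):
            # Placeholder reward scheme keyed on the library's Actions enum
            match action:
                case Actions.Buy.value: return 0.1
                case Actions.Sell.value: return -0.1
                case _: raise Exception("bug")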
    

    Also, the _update_profit(self, action) function needs to be implemented (it is supposed to update the internal env state, self.unwrapped._total_profit, but you can just let it pass as well).
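
    A minimal placeholder that satisfies that interface (anything more realistic would recompute the total profit from the prices at the current and last trade ticks whenever a position is closed) could be:

        def _update_profit(self, action):
            # No-op: leaves self._total_profit at its initial value.
            # A real implementation would update it, e.g. from
            # self.prices[self._current_tick] vs. self.prices[self._last_trade_tick].
            pass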

    Finally, the env.step call in your code is incorrect for the latest version of gymnasium, which returns a 5-tuple. It should be used as:

        observation, reward, terminated, truncated, info = env.step(action)
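
    Putting it all together, the test loop under gymnasium's API (reset returns a 2-tuple, step returns a 5-tuple) would look roughly like:

        observation, info = env.reset()
        for _ in range(100):
            action = env.action_space.sample()
            observation, reward, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                break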