Tags: python, numpy, openai-gym, ray

Is there a way to define a Gym action space where N values must sum to a constant?


Is there a way to define a Gym action space where N values (each on [-1, 1], inclusive) must sum to a specified constant? In my case, I want N = 13 and the constant c = 0.0, so valid agent 'actions' should look like a Numpy array with dimensions (1, 13) where all the elements of the array must sum to zero.

The context is that I am writing a Python class to create a custom Gym environment to train a reinforcement learning (RL) agent to learn the best way to allocate a limited resource. The agent's actions (exploratory guesses) should look like a NumPy array with dimensions (1, 13) (with each element on [-1, 1]), and all the elements of the array should sum to zero. During a simulation, the agent will have to explore the feasible action space to learn which allocation of resources (i.e., 'weights' in a separate 'target' array, with dimension (1, 13) and elements on [0, 1]) is optimal, resulting in the highest cumulative reward over a number of training episodes. The idea is that the agent, through its actions at each step (successive NumPy arrays), will adjust the allocation of resources until it is optimal (returns the highest reward). So, for example, if the current target weights (fractional allocation of resources) were initially:

np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.3, 0.5])

(which sums to 1), the agent's 'action array' (added to the target) might be:

np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.2, -0.3, -0.5])

(elements sum to zero), resulting in a better (or worse) allocation of resources:

np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

(still sums to 1). The agent's next feasible action (reallocation) might be:

np.array([-1.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.5, 0.0, 0.0, 0.0])

(sums to zero), which when added to the target, results in a (hopefully better) set of weights:

np.array([0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.5, 0.0, 0.0, 0.0])

(still sums to 1), etc.
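The bookkeeping above can be verified directly with NumPy (the arrays below are the ones from the first example):

```python
import numpy as np

# Initial target weights (fractional allocation; sums to 1.0)
target = np.array([0.0] * 10 + [0.2, 0.3, 0.5])
# Agent's action: shifts allocation while summing to zero
action = np.array([1.0] + [0.0] * 9 + [-0.2, -0.3, -0.5])

new_target = target + action

print(np.isclose(action.sum(), 0.0))      # True
print(np.isclose(new_target.sum(), 1.0))  # True
```

Because the action sums to zero, adding it to any valid target preserves the target's total of 1.0 (up to floating-point tolerance, hence `np.isclose` rather than `==`).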

I am new to defining Gym action spaces, but as I understand it, one way to define an action space of the type I think I need is as follows:

import gym
import numpy as np
from gym.spaces import Box

action_space = Box(low=-1., high=1., shape=(1, 13))

# or maybe:

action_space = Box(low=np.array([-1.]*13), high=np.array([1.]*13))

The problem with either one of these definitions (I believe) is that it would allow agent actions which are infeasible (in that the elements of the returned 'action array' will usually not sum to zero). I could be wrong, but I think the constraint should be imposed through the design of the environment (and the action space) rather than through the design of an agent. My intent is to design a custom environment and leverage Ray solution algorithms in a simulation to play the role of the agent.
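The infeasibility claim is easy to check empirically. The sketch below uses NumPy directly to mimic what `Box.sample()` does for this space (independent uniform draws per element), so Gym itself is not required to run it:

```python
import numpy as np

rng = np.random.default_rng(0)
# Mimic Box(low=-1., high=1., shape=(13,)).sample(): 13 independent
# uniform draws on [-1, 1), with no constraint relating the elements
samples = rng.uniform(-1., 1., size=(10_000, 13))
sums = samples.sum(axis=1)

# The sums spread widely around zero; essentially none land exactly on it
print(sums.min(), sums.max())
print((np.abs(sums) < 1e-8).sum())
```

Since the sum of 13 independent uniforms is a continuous random variable, the probability that a sample satisfies the sum-to-zero constraint exactly is zero, which is why the constraint has to be imposed elsewhere (e.g., in the environment).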

Perhaps there is a way to define a Box action space whose elements must sum to zero? Is there a better way to approach this?

For compatibility reasons, in my development environment, I am using Python 3.9.15, Gym 0.21.0, and Ray 1.11.0.

Solution

  • A solution that worked for me was to define a Gym action space using Box and have the agent learn a set of target weights directly, rather than learn which changes to an existing set of target weights would produce a better set. This reduced the amount of variation in the problem and improved numerical stability during training.

    Also, in a standard 'step' function (within the class used to create a custom Gym environment), which the agent uses to submit actions (guesses) during training, the environment effectively imposed a summation constraint on the agent's guesses by normalizing (scaling) each submitted array to sum to 1.0 (or any other chosen constant, c). In the same step function, rewards were constructed to incentivize the agent to submit better and better sets of weights, which were used to achieve a goal.

    For example:

    import gym
    from gym.spaces import Box
    import numpy as np
    
    # In the custom class defining a Gym environment:
    
    def __init__(self, config=None):
        # Action space: a set of target weights guessed by the agent
        # (in this case, 13 continuous values, each on [0, 1]).
        # The array sum lies on [0., 13.] until normalized to equal c
        ...
        self.action_space = Box(low=0., high=1., shape=(13,))
        ...
    
    def step(self, action):
        # Given the agent's action (an array of proposed weights), returns
        # the next obs, reward, done (True/False), and optionally, other info
        ...
        # Desired summation constant c = 1.0: scale the weights so they
        # sum to c (in practice, guard against an all-zero action first)
        c = 1.0
        normalized_weights = c * action / np.sum(action)
        ...
        # rewards, etc., set as desired
        ...
        return new_observation, reward, done, {}
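    The normalization step can be exercised in isolation. The action array below is an arbitrary illustrative sample (not from the original post), standing in for what the agent might submit:

    ```python
    import numpy as np

    c = 1.0  # desired summation constant
    # A plausible sample from Box(low=0., high=1., shape=(13,))
    action = np.array([0.9, 0.1, 0.4, 0.0, 0.7, 0.2, 0.3,
                       0.0, 0.5, 0.6, 0.1, 0.8, 0.2])

    # Scale so the weights sum to c while preserving their proportions
    normalized_weights = c * action / np.sum(action)

    print(np.isclose(normalized_weights.sum(), c))  # True
    ```

    Because every element of the Box sample is non-negative, the normalized weights stay on [0, 1] and keep their relative proportions; the only degenerate case is an all-zero action, which should be handled (e.g., skipped or penalized) before dividing.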