optimization · reinforcement-learning · reward

Can we get 'good' values of predefined constants in a cost function using reinforcement learning?


I am new to reinforcement learning and know the basic theory behind it. However, I could not map my problem to the existing frameworks. The problem is as follows:

  1. Given an environment with resources: X, Y, and Z

  2. Given a set of items I, each with (x, y, z, r), where x, y, and z are the resources required to serve the item, and r is the reward the agent receives if the item is served; (X, Y, Z) >> (x, y, z)

  3. To select the items from the set to serve, I am using a cost function f = ax + by + cz, where a, b, and c are predefined constants.

  4. The items are prioritized for selection based on the ratio r/f

  5. Objective: select items to serve so that the total reward (the sum of r over all selected items) is maximized, subject to the resource requirements (x, y, z) of each item and the available resources X, Y, and Z

  6. Problem: how do we tune the values of a, b, and c so that the total reward is maximized?
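To make the setup concrete, here is a minimal sketch of the selection procedure described in steps 3-5, with hypothetical item tuples (x, y, z, r) and capacities (X, Y, Z) invented for illustration:

```python
def select_items(items, X, Y, Z, a, b, c):
    """Greedily serve items in decreasing r/f order, where f = a*x + b*y + c*z."""
    chosen, total_reward = [], 0
    # Prioritize items by the ratio r / f (step 4).
    for x, y, z, r in sorted(items,
                             key=lambda t: t[3] / (a * t[0] + b * t[1] + c * t[2]),
                             reverse=True):
        # Serve the item only if enough of each resource remains.
        if x <= X and y <= Y and z <= Z:
            X, Y, Z = X - x, Y - y, Z - z
            chosen.append((x, y, z, r))
            total_reward += r
    return chosen, total_reward

# Hypothetical example data.
items = [(2, 1, 3, 10), (4, 4, 1, 8), (1, 2, 2, 9)]
chosen, total = select_items(items, 5, 5, 5, a=1.0, b=1.0, c=1.0)
print(total)  # → 19: items (1,2,2,9) and (2,1,3,10) fit; (4,4,1,8) does not
```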

Can you please advise on the following?

a) whether I can use reinforcement learning to tune the 'good' values of constants a, b, and c

b) If YES, how can I do that?

c) If NO, any suggestions for appropriate solution approaches?

Thank you.


Solution

  • What you're looking to do is a hyperparameter sweep, not an RL problem; at least, that is how I interpret your post.

    To do a sweep you have a few options: grid search, random search, or more advanced methods such as the Asynchronous Successive Halving Algorithm (ASHA). Grid search is generally worse at finding an optimum than random search, and ASHA is more resource-efficient than random search.

    To run an efficient sweep I suggest Ray Tune. There is a really great example of how to use Tune in the PyTorch documentation. It includes using ASHA as a simple, imported scheduler instead of implementing a distributed sweep yourself.

    https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html