I'm using the TensorFlow (TF-Agents) DQN agent with a Simulink environment. When I call the agent's collect policy
agent.collect_policy.action(time_step)
I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__Select_device_/job:localhost/replica:0/task:0/device:CPU:0}} 'then' and 'else' must have the same size. but received: [1] vs. [] [Op:Select] name:
Calling the standard policy works:
agent.policy.action(time_step)
I double-checked whether my TimeStep matches my TimeStepSpec, and it does.
(I guess agent.policy wouldn't work either if they didn't match.)
As far as I know, both policies are called in a very similar way in tf_policy.py,
so I have no idea what's causing the problem.
If anybody has an idea what causes the error, feel free to help :)
Here's a code snippet of my agent, etc. I hope this helps.
The specification:
discount = 0.95
reward = 0.0
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
time_step_spec = TimeStep(step_type = tensor_spec.BoundedTensorSpec(shape=(1,), dtype=tf.int32, minimum=0, maximum=2),
reward = tensor_spec.TensorSpec(shape=(1,), dtype=tf.float32),
discount = tensor_spec.TensorSpec(shape=(1,), dtype=tf.float32), #fix
observation = tensor_spec.TensorSpec(shape=(1,amountMachines), dtype=tf.int32)
)
num_possible_actions = 729
action_spec = tensor_spec.BoundedTensorSpec(
shape=(), dtype=tf.int32, minimum=0, maximum=num_possible_actions - 1)
agent = dqn_agent.DqnAgent(
time_step_spec,
action_spec,
q_network=model,
optimizer=optimizer,
epsilon_greedy= 1.0,
td_errors_loss_fn=common.element_wise_squared_loss,
train_step_counter=train_step_counter)
agent.initialize()
The call:
current_state = get_states()  # returns an np.array that looks like this: [4,4,4,4,4,6]
current_state_batch = tf.expand_dims(tf.convert_to_tensor(current_state, dtype=tf.int32), axis=0)
time_step = TimeStep(step_type=tf.convert_to_tensor([step_type], dtype=tf.int32),
reward=tf.convert_to_tensor([reward], dtype=tf.float32),
discount=tf.convert_to_tensor([discount], dtype=tf.float32),
observation= current_state_batch)
action_step = agent.collect_policy.action(time_step)
This is the full traceback:
Traceback (most recent call last):
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "c:\Users\STestUser\.vscode\extensions\ms-python.python-2023.20.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\STestUser\.vscode\extensions\ms-python.python-2023.20.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\STestUser\.vscode\extensions\ms-python.python-2023.20.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "c:\Users\STestUser\.vscode\extensions\ms-python.python-2023.20.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\STestUser\.vscode\extensions\ms-python.python-2023.20.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\Users\STestUser\.vscode\extensions\ms-python.python-2023.20.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "d:\Hochschule\Master\Masterarbeit\energy-efficiency-optimation\RL-Modell\OP10_QLearning.py", line 449, in <module>
    action_step = agent.collect_policy.action(time_step = time_step_t)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tf_agents\policies\tf_policy.py", line 333, in action
    step = action_fn(time_step=time_step, policy_state=policy_state, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tf_agents\utils\common.py", line 193, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tf_agents\policies\epsilon_greedy_policy.py", line 141, in _action
    action = tf.nest.map_structure(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tensorflow\python\util\nest.py", line 629, in map_structure
    return nest_util.map_structure(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tensorflow\python\util\nest_util.py", line 1168, in map_structure
    return _tf_core_map_structure(func, *structure, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tensorflow\python\util\nest_util.py", line 1208, in _tf_core_map_structure
    [func(*x) for x in entries],
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tensorflow\python\util\nest_util.py", line 1208, in <listcomp>
    [func(*x) for x in entries],
     ^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tf_agents\policies\epsilon_greedy_policy.py", line 142, in <lambda>
    lambda g, r: tf.compat.v1.where(cond, g, r),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tensorflow\python\util\traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\STestUser\AppData\Local\anaconda3\Lib\site-packages\tensorflow\python\framework\ops.py", line 5888, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None # pylint: disable=protected-access
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__Select_device_/job:localhost/replica:0/task:0/device:CPU:0}} 'then' and 'else' must have the same size. but received: [1] vs. [] [Op:Select] name:
UPDATE: I found the error on my own. The problem was the batch size.
I'm currently working with batch_size = 1, so I have to pass the values in time_step like this:
reward=tf.convert_to_tensor([reward], dtype=tf.float32)
BUT in the time_step_spec I need to define it like this:
reward = tensor_spec.TensorSpec(shape=(), dtype=tf.float32)
So the shape is () in the spec and (1,) in the time_step: the leading dimension of the tensor is the batch size, and the spec describes only the actual, unbatched data.
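To make the shape rule concrete, here is a small, library-free sketch. It is not TF-Agents code. The helper outer_dims is a hypothetical name introduced for illustration, and it only mimics my understanding of the convention: whatever leading dimensions of a tensor are not covered by the spec shape are treated as batch dimensions. Under that assumption, a spec of shape (1,) leaves no batch dimension for a tensor of shape (1,), which would be consistent with the "[1] vs. []" mismatch in the error message.

```python
def outer_dims(tensor_shape, spec_shape):
    """Hypothetical helper: return the leading (batch) dims of tensor_shape
    that are left over once the trailing dims are matched against spec_shape."""
    tensor_shape, spec_shape = tuple(tensor_shape), tuple(spec_shape)
    n = len(tensor_shape) - len(spec_shape)
    if n < 0 or tensor_shape[n:] != spec_shape:
        raise ValueError(
            f"shape {tensor_shape} does not end with spec shape {spec_shape}"
        )
    return tensor_shape[:n]


# Shape of tf.convert_to_tensor([reward]) as passed in the question.
reward_tensor_shape = (1,)

# Original spec, shape=(1,): the 1 is consumed by the spec, no batch dim is left.
print(outer_dims(reward_tensor_shape, (1,)))  # ()

# Fixed spec, shape=(): the leading 1 is now interpreted as the batch size.
print(outer_dims(reward_tensor_shape, ()))    # (1,)
```

By the same reasoning, an observation tensor of shape (1, 6) would only have a batch dimension if its spec has shape (6,) rather than (1, 6); whether the other spec entries in the question need the same change is an assumption on my part.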