Why do position and newposition give the same output and update together in the next loop?
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position = np.array([0, 19])
    status = -1
    # loop over steps taken by the player
    while status == -1:  # keep going while the status is -1, terminate when it is 1 (see status_list above)
        # Find out what move to make using the Q-values of the current position
        q_in = Q[position[0], position[1]]
        move, action = action_fcn(q_in, epsilon, wind)
        # update location, check grid, reward_list, and status_list
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        print('new loop')
        print(newposition)
        print(position)
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        status = status_list[grid_state]
        status = int(status)
        if status == 1:
            Q[position[0], position[1], action] = reward
            break  # Game over
        else:
            Q[position[0], position[1], action] = (1 - alpha) * Q[position[0], position[1], action] + alpha * (reward + gamma * Q[newposition[0], newposition[1], action])
        position = newposition
Printed output:
new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]
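That is because position = newposition does not copy anything. Your variables are NumPy arrays rather than lists, but the rule is the same: the = operator binds the name on the left to the object on the right, so after the first position = newposition both names refer to the same array in memory. From then on, the in-place writes newposition[0] = ... and newposition[1] = ... also change position, which is why the two prints always match (and why the Q update is applied to the new cell instead of the one the player moved from).
To get an independent array, copy it explicitly: position = newposition.copy(). For a plain Python list, list.copy() does the same job.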
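A minimal sketch of the aliasing and two ways around it. The start position [0, 19] comes from the question; the move (1, 0) is a hypothetical value standing in for whatever action_fcn returns, and it assumes move is a length-2 sequence, as the question's move[0] and move[1] indexing suggests. In the question's loop, Fix A corresponds to ending the step with position = newposition.copy(); Fix B corresponds to replacing the two element-wise assignments with newposition = position + np.array(move).

```python
import numpy as np

# 1) The bug: '=' only binds a second name to the same array object.
position = np.array([0, 19])
newposition = position               # no data is copied here
newposition[0] = position[0] + 1     # an in-place write through one name...
assert position[0] == 1              # ...is visible through the other
assert newposition is position

# 2) Fix A: copy explicitly, so the two names own separate buffers.
position = np.array([0, 19])
newposition = position.copy()
newposition[0] = position[0] + 1
assert newposition is not position
assert position[0] == 0 and newposition[0] == 1

# 3) Fix B: compute a new array each step instead of writing in place.
position = np.array([0, 19])
move = (1, 0)                              # hypothetical move, standing in for action_fcn's output
newposition = position + np.array(move)    # '+' allocates a fresh array
assert newposition is not position
position = newposition                     # rebinding is harmless now: the next step builds a new array again
```

Fix B is arguably the more idiomatic choice, because the new position never shares memory with the old one, so there is no copy to forget.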