I can't figure out why this code isn't working. When I make rewards into a list, I get an error telling me that the dimensions are incorrect. I'm not sure what to do.

I am implementing a deep Q-network for reinforcement learning. r is a 2D NumPy array holding 1 divided by the distance between stops, so that closer stops have a higher reward.

No matter what I do, I can't get rewards to work. I am new to TensorFlow, so this may just be a result of my inexperience with things like TensorFlow placeholders and feed dicts.

Thanks in advance for your help.

observations = tf.placeholder('float32', shape=[None, num_stops])

# game states: r[stop], r[next_stop], r[third_stop]

actions = tf.placeholder('int32',shape=[None]) 

rewards = tf.placeholder('float32',shape=[None])  # +1, -1 with discounts

Y = tf.layers.dense(observations, 200, activation=tf.nn.relu)
Ylogits = tf.layers.dense(Y, num_stops)

sample_op = tf.random.categorical(logits=Ylogits, num_samples=1)

cross_entropies = tf.losses.softmax_cross_entropy(onehot_labels=tf.one_hot(actions, num_stops), logits=Ylogits)

loss = tf.reduce_sum(rewards * cross_entropies)


optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=.99)
train_op = optimizer.minimize(loss)




visited_stops = []
steps = 0

with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    # Start at a random stop, initialize done to false
    current_stop = random.randint(0, len(r) - 1)
    done = False

    # reset everything    
    while not done: # play a game in x steps   

        observations_list = []
        actions_list = []
        rewards_list = []

        # List all stops and their scores
        observation = r[current_stop]

        # Add the stop to a list of non-visited stops if it isn't
        # already there
        if current_stop not in visited_stops:
            visited_stops.append(current_stop)

        # decide where to go
        action = sess.run(sample_op, feed_dict={observations: [observation]})

        # play it, output next state, reward if we got a point, and whether the game is over
        #game_state, reward, done, info = pong_sim.step(action)
        new_stop = int(action)


        reward = r[current_stop][action]

        if len(visited_stops) == num_stops:
            done = True

        if steps >= BATCH_SIZE:
            done = True

        steps += 1

        observations_list.append(observation)
        actions_list.append(action)
        rewards.append(reward)



        #rewards_list = np.reshape(rewards, [-1, 25])
        current_stop = new_stop

    #processed_rewards = discount_rewards(rewards, args.gamma)
    #processed_rewards = normalize_rewards(rewards, args.gamma)

    print(rewards)
    sess.run(train_op, feed_dict={observations: [observations_list],
                             actions: [actions_list],
                             rewards: [rewards_list]})

1 Answer

The line rewards.append(reward) causes the error. Your rewards variable is a tensor, because you defined it with rewards = tf.placeholder('float32',shape=[None]), and you cannot append values to a tensor like that. You probably meant to call rewards_list.append(reward).

Also, you are initializing the variables

observations_list = []
actions_list = []
rewards_list = []

inside the loop, so on each iteration the old values are overwritten by empty lists. You probably want those three lines before the while not done: line.
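
Both fixes together might look roughly like the sketch below. It is not the asker's exact program: collect_episode and choose_action are hypothetical names, choose_action stands in for the sess.run(sample_op, ...) call, and the stopping rule is simplified to "all stops visited or batch_size steps taken". The sketch uses plain Python lists so it runs without TensorFlow.

```python
import random


def collect_episode(r, batch_size, choose_action, start_stop=0):
    """Play one episode and return per-step observations, actions and rewards.

    choose_action is a hypothetical stand-in for
    sess.run(sample_op, feed_dict={observations: [observation]}).
    """
    num_stops = len(r)
    # Create the buffers once, before the loop, so each step adds to them
    # instead of replacing them with empty lists.
    observations_list, actions_list, rewards_list = [], [], []
    visited_stops = []
    current_stop = start_stop
    steps = 0
    done = False
    while not done:
        observation = r[current_stop]
        if current_stop not in visited_stops:
            visited_stops.append(current_stop)
        # Convert the sampled action to a plain int before storing it.
        action = int(choose_action(observation))
        reward = r[current_stop][action]
        observations_list.append(observation)
        actions_list.append(action)
        rewards_list.append(reward)  # the Python list, not the placeholder
        steps += 1
        done = len(visited_stops) == num_stops or steps >= batch_size
        current_stop = action
    return observations_list, actions_list, rewards_list


# Toy 3-stop reward matrix (made-up values) and a random policy.
r = [[0.0, 0.5, 0.25],
     [0.5, 0.0, 1.0],
     [0.25, 1.0, 0.0]]
rng = random.Random(0)
obs, acts, rews = collect_episode(
    r, batch_size=5, choose_action=lambda o: rng.randrange(len(o)))
```

With lists built this way, obs has one row of num_stops values per step, while acts and rews have one entry per step. These match the placeholder shapes [None, num_stops] and [None], so the lists can probably be fed as they are; wrapping them in another pair of brackets, as in observations: [observations_list], would add an extra leading dimension.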
