Given a state vector, we can decode a sequence greedily by generating each output token in turn, conditioning each prediction on the previous output. I recently read a paper that described using beam search during decoding with a beam size of 1 (k = 1). If we retain only the best output at each step, isn't this just the same as greedy decoding, offering none of the benefits typically provided by beam search?
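For concreteness, here is a minimal sketch of the greedy procedure I have in mind. The `step(state, token)` function is a hypothetical stand-in for one forward pass of the decoder, returning a probability distribution over the vocabulary and the updated decoder state:

```python
import numpy as np

def greedy_decode(state, step, bos_id, eos_id, max_len=50):
    """Greedy decoding: at each step keep only the single most
    probable token and feed it back in as the next input."""
    tokens = [bos_id]
    for _ in range(max_len):
        # step() stands in for one decoder forward pass: it returns a
        # probability distribution over the vocabulary and the new state.
        probs, state = step(state, tokens[-1])
        next_id = int(np.argmax(probs))  # argmax = the greedy choice
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens
```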
Finally found an answer: a beam size of 1 is the same as greedy search.
From "Abstractive Sentence Summarization with Attentive Recurrent Neural Networks":
"k refers to the size of the beam for generation; k = 1 implies greedy generation."