Have a look at Section 5, "Orientation assignment", of the original SIFT paper:
An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint [...] Peaks in the orientation histogram correspond to dominant directions of local gradients. The highest peak in the histogram is detected, and then any other local peak that is within 80% of the highest peak is used to also create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale but different orientations.
The VLFeat implementation documents the same behaviour (see sift.c):
This histogram is then smoothed and the maximum is selected. In addition to the biggest mode, up to other three modes whose amplitude is within the 80% of the biggest mode are retained and returned as additional orientations.
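To make the 80% rule concrete, here is a minimal sketch in plain Python. It is not the code from the paper or from VLFeat: the function name `dominant_orientations`, its parameters, and the toy histogram are all made up for illustration, and the smoothing and sub-bin peak interpolation that real implementations perform are left out. It assumes a circular histogram (the paper uses 36 bins covering 360 degrees) and returns the indices of every local peak that reaches at least 80% of the highest peak.

```python
def dominant_orientations(hist, peak_ratio=0.8, max_extra=None):
    """Return bin indices of local peaks within peak_ratio of the highest peak.

    hist       -- orientation histogram, treated as circular
    peak_ratio -- fraction of the highest peak a secondary peak must reach
    max_extra  -- optional cap on the number of secondary peaks
                  (hypothetical parameter; 3 mimics the VLFeat description)
    """
    n = len(hist)
    top = max(hist)
    if top <= 0:
        return []  # empty histogram: no orientation can be assigned

    peaks = []
    for i, h in enumerate(hist):
        prev_bin = hist[(i - 1) % n]  # wrap around: the histogram is circular
        next_bin = hist[(i + 1) % n]
        is_local_peak = h > prev_bin and h >= next_bin
        if is_local_peak and h >= peak_ratio * top:
            peaks.append(i)

    # Strongest peak first, so the cap below keeps the largest modes.
    peaks.sort(key=lambda i: -hist[i])
    if max_extra is not None:
        peaks = peaks[:1 + max_extra]
    return peaks


# Toy histogram with 36 bins of 10 degrees each (values are invented).
hist = [0.0] * 36
hist[3] = 10.0   # highest peak
hist[20] = 8.5   # within 80% of the highest peak: kept
hist[30] = 5.0   # below 80% of the highest peak: dropped

print(dominant_orientations(hist))               # [3, 20]
print(dominant_orientations(hist, max_extra=3))  # [3, 20]
```

Each returned bin would yield its own keypoint at the same location and scale, which is why a single detected point can show up several times with different orientations.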