I'm reading Jurafsky's NLP book, and it mentions that when training MaxEnt models we need to regularize the weights to prevent overfitting. But I can't understand why this happens. Can anyone explain how overfitting occurs when training MaxEnt if we don't use regularization?
I have not read that particular book, but as a practitioner of machine learning, I can tell you that overfitting is possible with any type of model. MaxEnt would not be an exception.
The question you probably should be asking is, "What is overfitting, and what causes it?"
Check out: Distinguishing overfitting vs good prediction
Overfitting tends to occur when you are trying to estimate too many coefficients, or more generally have a model that is too flexible, given the amount of training data you're working with. The result is that your model "learns" the noise in the data, reducing its predictive accuracy out of sample.
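Here's a minimal sketch of that situation for the binary MaxEnt case (i.e. logistic regression): few samples, many mostly-noise features, and effectively no regularization. The dataset sizes and the use of scikit-learn are my own illustrative choices, not anything from the book.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 60 samples, 200 features, only 5 of which carry real signal
X, y = make_classification(n_samples=60, n_features=200, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# C is the inverse regularization strength; a huge C means essentially no penalty
clf = LogisticRegression(C=1e10, max_iter=5000).fit(X_tr, y_tr)

print("train accuracy:", clf.score(X_tr, y_tr))  # typically ~1.0
print("test  accuracy:", clf.score(X_te, y_te))  # typically much lower
```

The model has enough free coefficients to fit the noise in the training set perfectly, but that fit doesn't carry over to the held-out data.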
There are two ways of dealing with overfitting. Either (1) get more training data, or (2) reduce the complexity of your model. Regularization falls into category (2), and works by penalizing "complex" solutions, thereby reducing variance. What "complex" means differs depending on the model type.
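As a rough sketch of what option (2) looks like for MaxEnt specifically, here is the negative log-likelihood of a softmax classifier with an L2 (ridge) penalty added; the function name, argument layout, and choice of an L2 penalty are just for illustration, and lam = 0 recovers the unregularized objective.

```python
import numpy as np

def maxent_l2_objective(W, X, y, lam):
    """Regularized MaxEnt objective.

    W:   (n_features, n_classes) weight matrix
    X:   (n_samples, n_features) feature matrix
    y:   integer class labels of length n_samples
    lam: regularization strength (lam = 0 means no regularization)
    """
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(y)), y].mean()
    return nll + lam * np.sum(W ** 2)  # penalty grows with the size of the weights
```

The penalty term makes very large weights expensive, so the optimizer prefers "simpler" solutions that can't chase every quirk of the training data.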
There's a ton of material on overfitting and regularization online and in intro ML textbooks. If you want an accessible explanation, I'd recommend Learning from Data by Abu-Mostafa.