I am using the ktrain package to classify text. My experiment is as follows:
lr_find and lr_plot are methods of a ktrain learner. They can be used to highlight the best learning rate, which is shown as the red dot in the plot.
I do not understand how to interpret this plot:
As the message from the lr_find method says, you can visually inspect the plot and choose a learning rate in a range where the loss is still falling before it diverges. Within that range, a higher learning rate will converge faster. This is the "LR range test" idea from Leslie Smith's paper, which became popular through the fastai library and was later adopted by other libraries such as ktrain and Amazon's Gluon library.

The red dot is only a numerical approximation of where the loss is falling most sharply. It can be useful in automated scenarios, but it is not necessarily the best learning rate. In this plot, the red dot marks the steepest part of the curve, which is one strategy for selecting a learning rate automatically from the plot (without visual inspection). Other automated strategies include taking the learning rate at the minimum loss and dividing it by 10, and finding the learning rate associated with the longest valley.
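To make the three automated strategies concrete, here is a minimal plain-Python sketch. It is not ktrain's implementation: the function names are hypothetical, the "model" is a toy one-dimensional quadratic trained with plain SGD rather than a real network, and the rule of stopping once the loss exceeds 4x the best loss seen is an assumption chosen for illustration. Real implementations usually also smooth the loss and ignore the first and last few points before applying these rules.

```python
import math

# Hypothetical helpers for illustration only; not ktrain's API or implementation.


def lr_range_test(start_lr=1e-5, end_lr=10.0, num_steps=100,
                  curvature=4.0, w0=5.0, stop_factor=4.0):
    """Toy LR range test on f(w) = 0.5 * curvature * w**2.

    One SGD step is taken per learning rate while the rate grows
    exponentially from start_lr to end_lr. The weight carries over between
    steps, as in a real range test. The test stops once the loss exceeds
    stop_factor times the best loss seen so far (assumed stopping rule).
    """
    ratio = (end_lr / start_lr) ** (1.0 / (num_steps - 1))
    w, best = w0, float("inf")
    lrs, losses = [], []
    for i in range(num_steps):
        lr = start_lr * ratio ** i
        w -= lr * curvature * w          # SGD update; gradient of f is curvature * w
        loss = 0.5 * curvature * w * w
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        if loss > stop_factor * best:    # loss has diverged, so end the test
            break
    return lrs, losses


def steepest_drop(lrs, losses):
    """LR at the most negative slope of loss vs. log10(lr)."""
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log10(lrs[i + 1]) - math.log10(lrs[i]))
        for i in range(len(losses) - 1)
    ]
    return lrs[min(range(len(slopes)), key=slopes.__getitem__)]


def min_loss_div_10(lrs, losses):
    """LR at the minimum loss, divided by 10."""
    return lrs[min(range(len(losses)), key=losses.__getitem__)] / 10


def longest_valley(lrs, losses):
    """Middle of the longest run of consecutively decreasing losses
    (a simplified, hypothetical take on the 'longest valley' idea)."""
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(losses)):
        if losses[i] >= losses[i - 1]:   # run is broken; a new one starts here
            start = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return lrs[best_start + best_len // 2]


lrs, losses = lr_range_test()
for name, pick in [("steepest", steepest_drop),
                   ("min/10", min_loss_div_10),
                   ("valley", longest_valley)]:
    print(f"{name:>8}: {pick(lrs, losses):.2e}")
```

Running it shows the three rules landing on different learning rates, all to the left of the minimum-loss point, which is why an automatically chosen value is best treated as a starting point and checked against the plot.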