nlp, stanford-nlp, crf, named-entity-recognition

Training Stanford NER CRF: controlling the number of iterations and the regularisation (L1, L2) parameters


I looked through the Stanford NER documentation and FAQ, but I can't find anything about specifying the maximum number of training iterations or the values of the L1 and L2 regularisation parameters.

I saw an answer suggesting to set, for instance:

maxIterations=10

in the properties file, but that did not have any effect.

Is it possible to set these parameters?


Solution

  • I had to dig through the code, but I found it. Stanford NER supports several numerical optimization algorithms. You can see which ones are implemented and available for training the CRF by looking at the getMinimizer() method in CRFClassifier.java.

    I configured my properties file to use the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) optimizer by setting:

    useOWLQN = true

    The L1 prior (the regularisation strength) can be set with:

    priorLambda = 10

    A useful trick is to adjust the convergence tolerance TOL, which is checked at each iteration: training stops once |newest_val - previous_val| / |newest_val| < TOL. TOL is controlled by:

    tolerance = 0.01

    Another useful parameter explicitly sets the maximum number of iterations the learning algorithm should run:

    maxQNItr = 100
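
To see how the tolerance and the iteration cap interact, here is a minimal Python sketch of the stopping rule quoted above. It is only a toy illustration, not Stanford NER's actual code: the function name stopping_point and the sample objective values are made up for the example, and the real optimizer may apply additional checks.

```python
def stopping_point(values, tolerance=0.01, max_iterations=100):
    """Return (iterations_run, reason) for a sequence of objective values.

    `values` holds one objective value per training iteration.
    Hypothetical helper: it mirrors the rule
    |newest_val - previous_val| / |newest_val| < TOL
    plus a hard cap on the number of iterations.
    """
    previous = None
    iterations = 0
    for newest in values:
        iterations += 1
        # Relative change check (skipped on the first iteration and when
        # the newest value is zero, to avoid dividing by zero).
        if previous is not None and newest != 0:
            if abs(newest - previous) / abs(newest) < tolerance:
                return iterations, "tolerance"
        # Hard cap, analogous to a maximum-iterations setting.
        if iterations >= max_iterations:
            return iterations, "max_iterations"
        previous = newest
    return iterations, "exhausted"


# Made-up objective values that flatten out as training progresses.
objective = [100.0, 60.0, 45.0, 44.8, 44.79]

print(stopping_point(objective))                    # (4, 'tolerance')
print(stopping_point(objective, max_iterations=3))  # (3, 'max_iterations')
```

With a loose tolerance the run ends as soon as the objective flattens out; with a tight tolerance the iteration cap is what ends it. That is why it helps to set both values rather than only one.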