
Can I use numerical features in a CRF model?


Is it possible (and a good idea) to add numerical features to CRF models, e.g. the position in the sequence?

I'm using CRFsuite. It seems all features are converted to strings, e.g. 'pos=0', 'pos=1', which then lose their numeric meaning (such as the Euclidean distance between values).

Or should I use them to train another model, e.g. an SVM, and then ensemble it with the CRF models?


Solution

  • I figured out that CRFsuite does handle numerical features, at least according to this documentation:

    • {"string_key": float_weight, ...} dict where keys are observed features and values are their weights;
    • {"string_key": bool, ...} dict; True is converted to 1.0 weight, False to 0.0;
    • {"string_key": "string_value", ...} dict; that's the same as {"string_key=string_value": 1.0, ...};
    • ["string_key1", "string_key2", ...] list; that's the same as {"string_key1": 1.0, "string_key2": 1.0, ...};
    • {"string_prefix": {...}} dicts: the nested dict is processed and "string_prefix" is prepended to each key;
    • {"string_prefix": [...]} dicts: the nested list is processed and "string_prefix" is prepended to each key;
    • {"string_prefix": set([...])} dicts: the nested set is processed and "string_prefix" is prepended to each key.

    As long as:

    1. I keep the input properly formatted;
    2. I pass actual floats rather than string representations of floats;
    3. I normalize the values.
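
A minimal sketch of what this can look like in Python, assuming the feature-dict format quoted above. The function names and feature keys (`token_features`, `sentence_features`, `rel_pos`, and so on) are made up for illustration, and the sequence position is scaled to [0, 1] as one possible normalization:

```python
def token_features(tokens, i):
    """Build one feature dict for the token at index i (hypothetical helper)."""
    n = len(tokens)
    return {
        # String value: per the quoted format, equivalent to {"word=<token>": 1.0}
        "word": tokens[i].lower(),
        # Bool value: True is converted to weight 1.0, False to 0.0
        "is_title": tokens[i].istitle(),
        # Float value: kept as a real-valued weight. Pass a float, not "0.5".
        # Normalized position in the sequence, in [0, 1].
        "rel_pos": i / (n - 1) if n > 1 else 0.0,
    }


def sentence_features(tokens):
    """Feature dicts for every token in one sequence (hypothetical helper)."""
    return [token_features(tokens, i) for i in range(len(tokens))]


xseq = sentence_features(["Alice", "visited", "Paris"])
```

With python-crfsuite, a list like `xseq` would typically be passed together with its label sequence to `pycrfsuite.Trainer.append`. Writing `"rel_pos": str(0.5)` instead would fall under the string-value case and produce a separate binary feature for each distinct value, which is the behaviour described in the question.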