[SOLVED] What do the parameters of DBpedia Spotlight mean?

I am interested in using DBpedia Spotlight. However, it requires values for two parameters, confidence and support. What do these two parameters actually mean?

I want to identify the significant, prominent n-grams in a text. In that case, what values are usually recommended for the confidence and support parameters (as a rule of thumb)?

Solution

When you ask DBpedia Spotlight to annotate text (that is, to find entities/topics), it searches for n-grams that can be linked to URIs in DBpedia (n-grams that match Wikipedia titles). The DBpedia entries those n-grams are linked to are called DBpedia resources.

Support: this is the Resource Prominence parameter. It helps you ignore unimportant or uninformative resources. When you set it to a value X, resources with fewer than X Wikipedia in-links are ignored and not returned to you.

Confidence: this is the Disambiguation Confidence parameter. It is a threshold that takes a value between 0 and 1. When you set it to a high value, you get more trustworthy annotations, but you risk losing some correct ones.
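
To make the effect of the two thresholds concrete, here is a minimal sketch that treats them as simple filters over candidate annotations. It is only an illustration, not Spotlight's actual implementation: the Candidate class, its field names, and the sample numbers are hypothetical, and the real disambiguation score is computed in a more involved way.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate annotation (not a Spotlight data type)."""
    surface_form: str  # the n-gram found in the text
    uri: str           # the DBpedia resource it was linked to
    inlinks: int       # number of Wikipedia in-links of the resource
    score: float       # disambiguation score in [0, 1]


def filter_candidates(candidates, confidence, support):
    """Keep candidates that pass both thresholds (simplified model)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return [
        c for c in candidates
        if c.inlinks >= support      # support: drop resources with too few in-links
        and c.score >= confidence    # confidence: drop uncertain disambiguations
    ]


# Made-up sample data, purely for illustration.
candidates = [
    Candidate("Berlin", "http://dbpedia.org/resource/Berlin", 50000, 0.99),
    Candidate("city", "http://dbpedia.org/resource/City", 9000, 0.40),
    Candidate("Spree", "http://dbpedia.org/resource/Spree_(river)", 15, 0.95),
]

kept = filter_candidates(candidates, confidence=0.5, support=20)
print([c.surface_form for c in kept])
```

With these made-up numbers, raising support removes the rarely linked resource, and raising confidence removes the weakly disambiguated one.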

Choosing values of those (or any other) parameters depends on your use case.

Examples:

If you have a test set or gold standard for the type of n-grams you are interested in, you can tune the values until the results agree well enough with your gold standard.
If you only care about retrieving the top-N n-grams to infer the topic of a text, you can choose high values to get a few (mostly) correct n-grams and sort them by confidence.
If you want to get as many n-grams as possible and your task is not affected or biased by mistakes, you can set low values.
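
As a rough sketch of the top-N case, the code below builds a request for the annotate endpoint and picks the highest-scoring resources from a JSON response. It rests on several assumptions that you should check against your own installation: the public endpoint URL, the "Resources" key, and the "@similarityScore" field are taken from the JSON output of the public web service and may differ for you; the function names and the default values are illustrative only, not recommendations.

```python
import urllib.parse
import urllib.request

# Assumption: the public demo endpoint; replace with your own server if needed.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_request(text, confidence=0.5, support=20):
    """Build (but do not send) a GET request for the annotate endpoint."""
    query = urllib.parse.urlencode(
        {"text": text, "confidence": confidence, "support": support}
    )
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}",
        headers={"Accept": "application/json"},  # ask for JSON output
    )


def top_n_resources(response_json, n):
    """Return the n resources with the highest similarity score."""
    # Assumption: "Resources" is missing when nothing was annotated.
    resources = response_json.get("Resources", [])
    return sorted(
        resources,
        key=lambda r: float(r["@similarityScore"]),  # scores arrive as strings
        reverse=True,
    )[:n]
```

To use it, send the request with urllib.request.urlopen(build_annotate_request(text, confidence, support)), parse the body with json.load, and pass the result to top_n_resources. Starting with high thresholds and lowering them until you get enough n-grams is one way to follow the second example above.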