How can I increase the default value of the max_components variable?
By default, max_components is set to 30000. I need to increase this limit because every time I run deduplication (using the same datasets) I get different results.
I think that the total number of clusters in my data is larger than 30000.
Answer from GitHub
Issue in the dedupe GitHub repository: Increase max_components = 30000
If you are getting different results using the same saved settings file, then what you are reporting is a bug. If you are getting different results from different training data (or even the same training data), that's expected, as at various points dedupe uses a random sample to learn good rules.
In either case, I doubt that max_components is related. But, if you want to change it, fork the code and change it.
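To illustrate the point about random sampling, here is a minimal generic sketch. It is not dedupe's code: `sample_records` and its `seed` parameter are hypothetical names used only for this illustration. It shows why anything learned from an unseeded random sample can change from run to run, while a sample drawn with a fixed seed is repeatable.

```python
import random


def sample_records(records, k, seed=None):
    """Draw k records without replacement.

    Hypothetical helper for illustration only, not part of dedupe.
    With seed=None the draw can differ on every call; with a fixed
    seed the same records are returned each time.
    """
    rng = random.Random(seed)  # private generator, global state untouched
    return rng.sample(records, k)


records = list(range(1000))  # stand-in for record IDs

# Same seed, same sample: anything learned from it is repeatable.
assert sample_records(records, 5, seed=42) == sample_records(records, 5, seed=42)

# No seed: each call may draw a different sample, so rules learned
# from the samples (and the resulting clusters) can differ.
print(sample_records(records, 5))
print(sample_records(records, 5))
```

The two printed samples will usually differ, which is the same kind of run-to-run variation the answer describes for training.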