python sympy symfit

Preventing symfit models from sharing parameter objects


Multiple symfit model instances share parameter objects with the same name. I'd like to understand where this behaviour comes from, what its intent is, and whether it can be deactivated.

To illustrate what I mean, a minimal example:

import symfit as sf
# Create Parameters and Variables
a = sf.Parameter('a',value=0)
b = sf.Parameter('b',value=1,fixed=True)
x, y = sf.variables('x, y')

# Instantiate two models
model1=sf.Model({y:a*x+b})
model2=sf.Model({y:a*x+b})

# They are indeed not the same
id(model1) == id(model2)
>>False

# There are two parameters
print(model1.params)
>>[a,b]
print(model1.params[1].name, model1.params[1].value)
>>b 1
print(model2.params[1].name, model2.params[1].value)
>>b 1
#They are initially identical

# We want to manually modify the fixed one in only one model
model1.params[1].value = 3
# Both have changed
print(model1.params[1].name, model1.params[1].value)
>>b 3
print(model2.params[1].name, model2.params[1].value)
>>b 3
id(model1.params[1]) == id(model2.params[1])
>>True
# The parameter is the same object

I want to fit multiple data streams with different models, but with different fixed parameter values depending on the data stream. Renaming the parameters in each instance of the model would work, but is ugly given that the parameter represents the same quantity. Processing the streams sequentially and modifying the parameters in between is possible, but I worry about unintended interactions between steps.

PS: Can someone with sufficient reputation please create the symfit tag


Solution

  • Excellent question. In principle this is because Parameter objects are a subclass of sympy.Symbol, and from its docstring:

    Symbols are identified by name and assumptions:
    
    >>> from sympy import Symbol
    >>> Symbol("x") == Symbol("x")
    True
    >>> Symbol("x", real=True) == Symbol("x", real=False)
    False
    

    This is fundamental to the inner workings of sympy, and therefore something we also use in symfit. But the value and fixed arguments are not viewed as assumptions, so they are not used to distinguish parameters.
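
    The name-and-assumptions rule from the docstring can be checked with plain sympy. This is only a small sketch; the variable names b1, b2 and b_real are made up for illustration and do not come from symfit:

```python
from sympy import Symbol

# Two symbols created separately with the same name and no assumptions
b1 = Symbol("b")
b2 = Symbol("b")

print(b1 == b2)              # True: same name, same assumptions
print(hash(b1) == hash(b2))  # True: they also hash identically
print(len({b1, b2}))         # 1: a set treats them as a single element

# Changing an assumption yields a symbol that sympy treats as different
b_real = Symbol("b", real=True)
print(b_real == b1)          # False: same name, different assumptions
```

    Attributes such as value and fixed are not part of this comparison, which is why they do not separate two parameters that share a name.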

    Now, to your question on how this would affect fitting. As you say, working sequentially is a good solution, and one that will not have any side effects:

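    import symfit as sf

    # a, b, x and y are the Parameter and Variable objects from the question;
    # datastream is assumed to yield (b_value, xdata, ydata) tuples.
    model = sf.Model({y: a*x + b})
    b.fixed = True
    fit_results = []
    
    for b_value, xdata, ydata in datastream:
        b.value = b_value  # set the fixed value for this data stream
        fit = sf.Fit(model, x=xdata, y=ydata)
        fit_results.append(fit.execute())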
    

    So there is no need to define a new Parameter in every iteration: b.value is set at the start of each iteration and stays the same until that fit has finished, so the fits cannot interfere with one another. The only way I can imagine this going wrong is if you use threading, which would probably create race conditions on the shared parameter. But threading is not desirable for CPU-bound tasks anyway; multiprocessing is the way to go. In that case separate processes are spawned, each its own microcosm, so there should be no problem there either.

    I hope this answers your question; if not, let me know.
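
    If you still worry about a value leaking from one step into the next, one option is to restore the old value after each fit. Below is a minimal sketch of that idea. temporary_value is a hypothetical helper, not part of symfit, and a SimpleNamespace stands in for the Parameter so the snippet runs without symfit installed; it only assumes the object has a writable value attribute:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_value(param, value):
    """Set param.value inside a with-block and restore the old value afterwards."""
    old_value = param.value
    param.value = value
    try:
        yield param
    finally:
        # Runs even if the body raises, so the value never leaks out
        param.value = old_value


# Stand-in for a symfit Parameter (assumption: only .value matters here)
b = SimpleNamespace(name="b", value=1, fixed=True)

seen = []
for b_value in (3, 5):
    with temporary_value(b, b_value):
        seen.append(b.value)  # the fit for this data stream would run here

print(seen)     # [3, 5]
print(b.value)  # 1
```

    This does not make the loop thread-safe; it only guarantees that the parameter is back at its original value once a step has finished.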

    p.s. I'm slowly answering my way up to 1500 to make that tag, but if someone beats me to it I'd be all the happier for it of course ;)