python optimization scipy non-convex mystic

Non-convex optimization with linear constraints

I'm trying to solve an optimization question similar to the toy example described below. As noted in the comment, my current implementation with scipy is too slow and doesn't seem to converge. How do I get a decent solution? You can use scipy, mystic, or whatever library as you see fit.

Note that I don't need a global minimum, and the search can stop as soon as loss(X) <= 1. The real-world objective is mostly written in SQL and thus absurdly slow, so I also want the optimization to terminate when loss has been evaluated ~200 times. (This is not a hard requirement, you can also terminate after the optimization has been running for say 5 minutes.)

While this question resembles Minimizing non-convex function with linear constraint and bound in mystic, it's definitely not a duplicate. These two questions aren't even treating the same objective.

import numpy as np
from scipy.optimize import differential_evolution, LinearConstraint


# Inhabitants in a rural city are voting for the name of a newborn. The city houses 40 dwarves
# and 10 humans in total, and there are 100 candidate names to vote for.
dwarf_population, human_population = 40, 10
population = dwarf_population + human_population
candidate_count = 100

# Each inhabitant has different number of votes in their hand.
scores_per_citizen = np.random.randint(15, 20, population)

# Let's say the original result looks like this. (Yes, one can cast a fraction of their votes)
alpha = np.abs(np.random.normal(size=candidate_count))
scores = np.diag(scores_per_citizen).dot(np.random.dirichlet(alpha, population))
assert np.allclose(scores.sum(1), scores_per_citizen)


# Here is how the votes are counted.
def count(scores_: np.ndarray) -> np.ndarray:
    # Dwarves have a weird tradition: the top 10 popular names chosen by dwarves will get all their votes.
    # (I guess this is what makes our objective non-convex.)
    scores_by_dwarves = scores_[0:40, :]
    score_per_candidate_by_dwarves_raw = scores_by_dwarves.sum(1)
    top_10_popular_name_indices = np.argsort(-score_per_candidate_by_dwarves_raw)[:10]
    score_per_candidate_by_dwarves = np.zeros(candidate_count)
    score_per_candidate_by_dwarves[top_10_popular_name_indices] = score_per_candidate_by_dwarves_raw[
        top_10_popular_name_indices]
    score_per_candidate_by_dwarves = scores_by_dwarves.sum() \
                                     * score_per_candidate_by_dwarves \
                                     / score_per_candidate_by_dwarves.sum()
    assert np.allclose(score_per_candidate_by_dwarves.sum(), score_per_candidate_by_dwarves_raw.sum())

    # Humans, on the other hand, just adds everyone's votes together.
    scores_by_humans = scores_[40:, :]
    scores_per_candidate_by_humans = scores_by_humans.sum(0)

    # The final result is the sum of the scores by dwarves and humans.
    return score_per_candidate_by_dwarves + scores_per_candidate_by_humans


# So this is the legitimate result.
scores_per_candidate = count(scores)

# Now, you want to cheat a bit and make your proposal (assuming it's the first one) the most popular one.
target_scores_per_candidate = scores_per_candidate.copy()
argmax = scores_per_candidate.argmax()
target_scores_per_candidate[argmax] = scores_per_candidate[0]
target_scores_per_candidate[0] = scores_per_candidate[argmax]
assert np.allclose(scores_per_candidate.sum(), target_scores_per_candidate.sum())

# However, you cannot just manipulate the result, otherwise the auditors will find out!
# Instead, the raw scores must be manipulated such that your submission ranks top among others.
# You decide to solve for a multiplier to the original scores.
init_coef = np.ones_like(scores).reshape(-1)


# In other words, your goal is to find the minimum of the following objective.
def loss(coef_: np.ndarray) -> float:
    scores_ = scores * coef_.reshape(scores.shape)
    scores_per_candidate_ = count(scores_)
    return ((scores_per_candidate_ - scores_per_candidate) ** 2).sum()


# This is a helper constant matrix. Ignore it for now.
A = np.concatenate([np.tile(np.concatenate([np.repeat(1, candidate_count),
                                            np.repeat(0, population * candidate_count)]),
                            population - 1),
                    np.repeat(1, candidate_count)])
A = A.reshape((population, population * candidate_count))


# Meanwhile, some constraints must hold.
def constraints(coef_: np.ndarray):
    # The total votes of each citizen must not change.
    coef_reshaped = coef_.reshape((population, candidate_count))
    assert np.allclose((coef_reshaped * scores).sum(1), scores_per_citizen)

    # Another way to express the same thing with matrices.
    assert np.allclose(A.dot(np.diag(scores.reshape(-1))).dot(coef_), scores_per_citizen)

    # Additionally, all scores must be positive.
    assert np.all(coef_reshaped * scores >= 0)


# Let's express the constraint with a LinearConstraint.
score_match_quota = LinearConstraint(A.dot(np.diag(scores.reshape(-1))), scores_per_citizen, scores_per_citizen)

# Run optimization (Spoiler: this is extremely slow, and doesn't seem to converge)
rv = differential_evolution(loss,
                            bounds=[(0, 1000)] * init_coef.size,  # the 1000 here is superficial and can be replaced by other large numbers
                            init=np.vstack((init_coef, init_coef, init_coef, init_coef, init_coef)),
                            constraints=score_match_quota)

# Sanity check
constraints(rv.x)

Solution

The answer is almost the same as the question you referenced... however, there are some constraints that I'd have to think a bit further to demonstrate that. Let me rewrite your code -- just using some shorter names.

>>> import mystic as my
>>> import numpy as np
>>> 
>>> # Inhabitants in a rural city are voting for the name of a newborn.
... # The city houses 40 dwarves and 10 humans in total, and there are
... # 100 candidate names to vote for.
... dwarves, humans = 40, 10
>>> citizens = dwarves + humans
>>> names = 100
>>> # Each inhabitant has different number of votes in their hand.
... votes_per_citizen = np.random.randint(15, 20, citizens)
>>> 
>>> # Let's say the original result looks like this.
... # (Yes, one can cast a fraction of their votes)
... alpha = np.abs(np.random.normal(size=names))
>>> votes = np.diag(votes_per_citizen).dot(np.random.dirichlet(alpha, citizens))
>>> # NOTE: assert np.allclose(votes.sum(1), votes_per_citizen)
... 
>>> # Here is how the votes are counted.
... def count(votes): #NOTE: votes.shape is (citizens, names)
...     # Dwarves have a weird tradition: the top 10 popular names chosen
...     # by dwarves will get all their votes.
...     # (I guess this is what makes our objective non-convex.)
...     dwarf_votes = votes[:dwarves]
...     dwarf_votes_per_name_ = dwarf_votes.sum(1)
...     top_10_idx = np.argsort(-dwarf_votes_per_name_)[:10]
...     dwarf_votes_per_name = np.zeros(names)
...     dwarf_votes_per_name[top_10_idx] = dwarf_votes_per_name_[top_10_idx]
...     dwarf_votes_per_name = \
...         dwarf_votes.sum() * dwarf_votes_per_name / dwarf_votes_per_name.sum()
...     #NOTE: assert np.allclose(dwarf_votes_per_name.sum(), dwarf_votes_per_name_.sum())
...     # Humans, on the other hand, just add everyone's votes together.
...     human_votes = votes[dwarves:]
...     human_votes_per_name = human_votes.sum(0)
...     # The final result is the sum of the scores by dwarves and humans.
...     return dwarf_votes_per_name + human_votes_per_name #NOTE: shape = (names,)
... 
>>> # So this is the legitimate result.
... votes_per_name = count(votes)
>>> 
>>> # Now, you want to cheat a bit and make your proposal
... # (assuming it's the first one) the most popular one.
... votes_per_name_ = votes_per_name.copy()
>>> argmax = votes_per_name.argmax()
>>> votes_per_name_[argmax] = votes_per_name[0]
>>> votes_per_name_[0] = votes_per_name[argmax]
>>> #NOTE: assert np.allclose(votes_per_name.sum(), votes_per_name_.sum())
... 
>>> # However, you cannot just manipulate the result, otherwise the auditors
... # will find out! Instead, the raw scores must be manipulated such that your
... # submission ranks top among others.  You decide to solve for a multiplier
... # to the original scores.
... coef = np.ones_like(votes).reshape(-1)
>>> 
>>> # In other words, your goal is to find the minimum of the following objective.
... def loss(coef): #NOTE: coef.shape is (citizens*votes,)
...     votes_ = votes * coef.reshape(votes.shape)
...     votes_per_name_ = count(votes_)
...     return ((votes_per_name_ - votes_per_name)**2).sum()
... 
>>> 
>>> # This is a helper constant matrix. Ignore it for now.
... A = np.concatenate([np.tile(np.concatenate([np.repeat(1, names), np.repeat(0, citizens * names)]), citizens - 1), np.repeat(1, names)]).reshape((citizens, citizens * names))
>>> A_ = A.dot(np.diag(votes.reshape(-1)))
>>>

The code is identical with your question up to this point. Now, let's use some of mystic's tools.

First, let's put it in the context of the question you referenced.

>>> # Build constraints
... cons = my.symbolic.linear_symbolic(A_, votes_per_citizen)
>>> cons = my.symbolic.solve(cons) #NOTE: this may take a while...
>>> cons = my.symbolic.generate_constraint(my.symbolic.generate_solvers(cons))
>>> 
>>> def cons_(x): #NOTE: preserve x
...     return cons(x.copy())
... 
>>>

Now, as in the referenced question, we can use a simple solver like fmin_powell with constraints=cons_ to solve it. However, I've found that mystic can struggle a bit here as I'll demonstrate.

>>> bounds = [(0, 1000)] * coef.size
>>> x0 = np.random.randint(0, 1000, coef.shape).astype(float).tolist()
>>>
>>> stepmon = my.monitors.VerboseMonitor(1)
>>> rv = my.solvers.fmin_powell(loss, x0=x0, bounds=bounds, itermon=stepmon,
...                             constraints=cons_,
...                             disp=1, maxfun=20000, gtol=None, ftol=500)
Generation 0 has ChiSquare: inf
Generation 1 has ChiSquare: inf
Generation 2 has ChiSquare: inf
Generation 3 has ChiSquare: inf
Generation 4 has ChiSquare: inf
Generation 5 has ChiSquare: inf
Generation 6 has ChiSquare: inf
Generation 7 has ChiSquare: inf
...

I've killed the optimization here... when you see inf it means that mystic is struggling to solve the constraints.

The problem is that there should be additional constraints to help it out, namely that A_.dot(cons_(x0)) should be integers. Additionally, it'd be nice to better control the presence of negatives.

>>> np.round(A_.dot(cons_(x0)), 10) - votes_per_citizen
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0.])
>>> cons_(x0)[:5]
[-574537.1464429945, 98.0, 326.0, 114.0, 694.0]

Matter of fact, every citizens-th element is negative.

So, I tried using penalties instead.

>>> def penalty1(x):
...     return np.abs((np.array(x).reshape(citizens, names) * votes).sum(1) - votes_per_citizen).sum()
... 
>>> def penalty2(x):
...     return np.abs(A_.dot(x) - votes_per_citizen).sum()
... 
>>> def penalty3(x):
...     return np.abs((np.array(x).reshape(citizens, names) * votes).min(1).sum())
... 
>>> #FIXME: utilize that A_.dot(x) should be integer-valued
... 
>>> @my.penalty.quadratic_equality(penalty2)
... @my.penalty.quadratic_equality(penalty3)
... def penalty(x):
...     return 0.0
... 
>>> 
>>> bounds = [(0, 1000)] * coef.size
>>> x0 = np.random.randint(0, 1000, coef.shape).astype(float).tolist()
>>> 
>>> stepmon = my.monitors.VerboseMonitor(1)
>>> 
>>> rv = my.solvers.fmin_powell(loss, x0=x0, bounds=bounds, itermon=stepmon,
...                             penalty=penalty, # constraints=cons_,
...                             disp=1, maxfun=20000, gtol=None, ftol=500)
Generation 0 has ChiSquare: 4316466725165.325684
Generation 1 has ChiSquare: 97299808.307906
Generation 2 has ChiSquare: 1125.438322
Generation 3 has ChiSquare: 1116.735393
Warning: Maximum number of function evaluations has been exceeded.
STOP("EvaluationLimits with {'evaluations': 20000, 'generations': 500000}")
>>>

and that seems to work just fine (note I used 20000 evals instead of 200, and ftol to stop when loss is 500 instead of 1). rv is pretty close, and the optimization was fairly fast.

>>> penalty(rv)
4.979032041874226
>>> np.round(A_.dot(rv), 2) - votes_per_citizen
array([0.  , 0.22, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  ])
>>> 
>>> penalty(x0)
4313023244843.466
>>> np.round(A_.dot(x0), 2) - votes_per_citizen
array([ 6996.61,  5872.2 ,  6398.82,  7593.65,  9137.81, 10347.84,
        9204.44,  6160.77,  9626.64,  7572.4 , 10771.24,  8673.7 ,
       10212.18,  7581.45,  5097.13,  7631.2 ,  8274.92,  9684.09,
        9504.27,  9067.73,  7332.77, 10214.02,  8255.38,  9853.74,
        6613.19])

Here, A_.dot(rv) isn't as accurate (round is to 2 places instead of 10)... and would again benefit from a constraint making A_.dot(rv) integers.

I'll leave that to a future example.