[SOLVED] lightfm error: Not all estimated parameters are finite, your model may have diverged

I'm running this very simple code:

def csr_values_analysis(values):
   num_zeros = 0
   num_ones = 0
   num_other = 0

   for v in values:
       if v == 0:
           num_zeros += 1
       elif v == 1:
           num_ones += 1
       else:
           num_other += 1

   return num_zeros, num_ones, num_other


print("Reading user_features.npz")
with open("/path/to/user_features.npz", "rb") as in_file:
    user_features_csr = sp.load_npz(in_file)
    print("User features read, shape: {}".format(user_features_csr.shape))
    print("Data values analysis: zeros: %i, ones: %i, other: %i" % csr_values_analysis(user_features_csr.data))

print("Reading item_features.npz")
with open("/path/to/item_features.npz", "rb") as in_file:
    item_features_csr = sp.load_npz(in_file)
    print("Item features read, shape: {}".format(item_features_csr.shape))
    print("Data values analysis: zeros: %i, ones: %i, other: %i" % csr_values_analysis(item_features_csr.data))

print("Reading interactions.npz")
with open("/path/to/interactions.npz", "rb") as in_file:
    interactions_csr = sp.load_npz(in_file)
    print("Interactions read, shape: {}".format(interactions_csr.shape))
    print("Data values analysis: zeros: %i, ones: %i, other: %i" % csr_values_analysis(interactions_csr.data))
    interactions_coo = interactions_csr.tocoo()

# Run lightfm

print("Running lightfm...")
model = LightFM(loss='warp')
model.fit(interactions_coo, user_features=user_features_csr, item_features=item_features_csr, epochs=20, num_threads=2, verbose=True)

With the following output:

Reading user_features.npz
User features read, shape: (827568, 105)
Data values analysis: zeros: 0, ones: 3153032, other: 0
Reading item_features.npz
Item features read, shape: (67339359, 36)
Data values analysis: zeros: 0, ones: 25259081, other: 0
Reading interactions.npz
Interactions read, shape: (827568, 67339359)
Data values analysis: zeros: 0, ones: 172388, other: 0
Running lightfm...
Epoch 0
Traceback (most recent call last):
  File "training.py", line 92, in <module>
    model.fit(interactions_coo, user_features=user_features_csr, item_features=item_features_csr, epochs=20, num_threads=2, verbose=True)
  File "/usr/lib64/python3.6/site-packages/lightfm/lightfm.py", line 479, in fit
    verbose=verbose)
  File "/usr/lib64/python3.6/site-packages/lightfm/lightfm.py", line 578, in fit_partial
    self._check_finite()
  File "/usr/lib64/python3.6/site-packages/lightfm/lightfm.py", line 413, in _check_finite
    raise ValueError("Not all estimated parameters are finite,"
ValueError: Not all estimated parameters are finite, your model may have diverged. Try decreasing the learning rate or normalising feature values and sample weights

All my Scipy sparse matrices are normalized (i.e. the values are 0 or 1).

I've tried to change the learning schedule and the learning rate with no results.

I've checked this only occurs when I add the item features to the equation. There is no error when running lightfm only with interactions, or intereactions + user features.

AFAIK, I've installed the latest version:

$ pip freeze | grep lightfm
lightfm==1.15

Any idea? Thanks!

UPDATE 1

I was wondering if my sparse matrices were too much sparse... Nevertheless, I've tried with extremely very little shapes, and the same error arises:

>>> import scipy.sparse as sp
>>> import numpy as np
>>> import lightfm
>>> uf_row = np.array([2,4,9])
>>> uf_col = np.array([4,9,3])
>>> uf_data = np.array([1,1,1])
>>> if_row = np.array([0,3])
>>> if_col = np.array([9,7])
>>> if_data = np.array([1,1])
>>> i_row = np.array([1])
>>> i_col = np.array([8])
>>> i_data = np.array([1])
>>> uf_csr = sp.csr_matrix((uf_data, (uf_row, uf_col)), shape=(10, 10))
>>> if_csr = sp.csr_matrix((if_data, (if_row, if_col)), shape=(10, 10))
>>> i_csr = sp.csr_matrix((i_data, (i_row, i_col)), shape=(10, 10))
>>> model = lightfm.LightFM(loss='warp')
>>> model.fit(i_csr.tocoo(), user_features=uf_csr, item_features=if_csr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/site-packages/lightfm/lightfm.py", line 479, in fit
    verbose=verbose)
  File "/usr/lib64/python3.6/site-packages/lightfm/lightfm.py", line 578, in fit_partial
    self._check_finite()
  File "/usr/lib64/python3.6/site-packages/lightfm/lightfm.py", line 413, in _check_finite
    raise ValueError("Not all estimated parameters are finite,"
ValueError: Not all estimated parameters are finite, your model may have diverged. Try decreasing the learning rate or normalising feature values and sample weights

Definitely, I'm doing something wrong...

UPDATE 2

I think I found the problem... I did the following experiment:

>>> uf_csr = sp.csr_matrix((np.array([1]),(np.array([0]), np.array([0]))),shape=(20,20))
>>> if_csr = sp.csr_matrix((np.array([1]),(np.array([0]), np.array([0]))),shape=(20,20))
>>> i_csr = sp.csr_matrix((np.array([1]),(np.array([1]), np.array([1]))),shape=(20,20))
>>> model = lightfm.LightFM(loss='warp')
>>> model.fit(i_csr.tocoo(), user_features=uf_csr, item_features=if_csr, epochs=20)
Traceback (most recent call last):
  ...
ValueError: Not all estimated parameters are finite, your model may have diverged. Try decreasing the learning rate or normalising feature values and sample weights

I.e. I had the Exception as usual. Now, if you observe the interaction matrix, it has an interaction regarding a user and an item having all their features set to 0 in the user and item feature matrices, respectively. So, let's change this, in the user features matrix, for instance:

>>> uf_csr = sp.csr_matrix((np.array([1,1]),(np.array([0,1]), np.array([0,0]))),shape=(20,20))
>>> model.fit(i_csr.tocoo(), user_features=uf_csr, item_features=if_csr, epochs=20)
<lightfm.lightfm.LightFM object at 0x7f2d39ea3490>

Et voilà!

We can do the same with the item features matrix:

>>> uf_csr = sp.csr_matrix((np.array([1]),(np.array([0]), np.array([0]))),shape=(20,20))
>>> if_csr = sp.csr_matrix((np.array([1,1]),(np.array([0,1]), np.array([0,0]))),shape=(20,20))
>>> model.fit(i_csr.tocoo(), user_features=uf_csr, item_features=if_csr, epochs=20)
<lightfm.lightfm.LightFM object at 0x7f2d39ea3490>

So, I'll try to find a way of filtering interactions related to all-zero user and item features and I'll post it ;)