I'm fitting a LDA model with lots of data using scikit-learn. Relevant code piece looks like this:
lda = LatentDirichletAllocation(n_topics = n_topics,
max_iter = iters,
learning_method = 'online',
learning_offset = offset,
random_state = 0,
evaluate_every = 5,
n_jobs = 3,
verbose = 0)
lda.fit(X)
(I guess the only possibly relevant detail here is that I'm using multiple jobs.)
After some time I'm getting "No space left on device" error, even though there is plenty of space on the disk and plenty of free memory. I tried the same code several times, on two different computers (on my local machine and on a remote server), first using python3, then using python2, and each time I ended up with the same error.
If I run the same code on a smaller sample of data everything works fine.
The entire stack trace:
Failed to save <type 'numpy.ndarray'> to .npy file:
Traceback (most recent call last):
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 271, in save
obj, filename = self._write_array(obj, filename)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 231, in _write_array
self.np.save(filename, array)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 491, in save
pickle_kwargs=pickle_kwargs)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 584, in write_array
array.tofile(fp)
IOError: 275500 requested and 210934 written
IOErrorTraceback (most recent call last)
<ipython-input-7-6af7e7c9845f> in <module>()
7 n_jobs = 3,
8 verbose = 0)
----> 9 lda.fit(X)
/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/decomposition/online_lda.pyc in fit(self, X, y)
509 for idx_slice in gen_batches(n_samples, batch_size):
510 self._em_step(X[idx_slice, :], total_samples=n_samples,
--> 511 batch_update=False, parallel=parallel)
512 else:
513 # batch update
/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/decomposition/online_lda.pyc in _em_step(self, X, total_samples, batch_update, parallel)
403 # E-step
404 _, suff_stats = self._e_step(X, cal_sstats=True, random_init=True,
--> 405 parallel=parallel)
406
407 # M-step
/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/decomposition/online_lda.pyc in _e_step(self, X, cal_sstats, random_init, parallel)
356 self.mean_change_tol, cal_sstats,
357 random_state)
--> 358 for idx_slice in gen_even_slices(X.shape[0], n_jobs))
359
360 # merge result
/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
808 # consumption.
809 self._iterating = False
--> 810 self.retrieve()
811 # Make sure that we get a last message telling us we are done
812 elapsed_time = time.time() - self._start_time
/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in retrieve(self)
725 job = self._jobs.pop(0)
726 try:
--> 727 self._output.extend(job.get())
728 except tuple(self.exceptions) as exception:
729 # Stop dispatching any new job in the async callback thread
/home/ubuntu/anaconda2/lib/python2.7/multiprocessing/pool.pyc in get(self, timeout)
565 return self._value
566 else:
--> 567 raise self._value
568
569 def _set(self, i, obj):
IOError: [Errno 28] No space left on device
Had the same problem with LatentDirichletAllocation
. It seems, that your are running out of shared memory (/dev/shm
when you run df -h
). Try setting JOBLIB_TEMP_FOLDER
environment variable to something different: e.g., to /tmp
. In my case it has solved the problem.
Or just increase the size of the shared memory, if you have the appropriate rights for the machine you are training the LDA on.