I am trying to fit a GMM in sklearn and I can see that the model converges at around iteration 3, but I cannot seem to access the log-likelihood score computed at each iteration.
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=4, tol=1e-8).fit(data)
Is there a way to access the log-likelihood score for each iteration?
If you just want to look at the log-likelihood scores, you can set verbose=2 to print the change in log-likelihood at each EM step, together with verbose_interval=1 so that every iteration is reported:
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3, tol=1e-8, verbose=2, verbose_interval=1)
gmm.fit(data)
Initialization 0
Iteration 1 time lapse 0.00560s ll change inf
Iteration 2 time lapse 0.00134s ll change 0.03655
Iteration 3 time lapse 0.00119s ll change 0.00867
Iteration 4 time lapse 0.00118s ll change 0.00619
Iteration 5 time lapse 0.00116s ll change 0.00612
Iteration 6 time lapse 0.00125s ll change 0.00647
Iteration 7 time lapse 0.00128s ll change 0.00700
Iteration 8 time lapse 0.00127s ll change 0.00727
Iteration 9 time lapse 0.00126s ll change 0.00673
Iteration 10 time lapse 0.00117s ll change 0.00604
Iteration 11 time lapse 0.00109s ll change 0.00530
Iteration 12 time lapse 0.00125s ll change 0.00431
Iteration 13 time lapse 0.00121s ll change 0.00366
Iteration 14 time lapse 0.00123s ll change 0.00404
Iteration 15 time lapse 0.00130s ll change 0.00361
Iteration 16 time lapse 0.00118s ll change 0.00157
Iteration 17 time lapse 0.00124s ll change 0.00048
Iteration 18 time lapse 0.00126s ll change 0.00015
Iteration 19 time lapse 0.00115s ll change 0.00005
Iteration 20 time lapse 0.00116s ll change 0.00001
Iteration 21 time lapse 0.00124s ll change 0.00000
Iteration 22 time lapse 0.00122s ll change 0.00000
Iteration 23 time lapse 0.00142s ll change 0.00000
Iteration 24 time lapse 0.00126s ll change 0.00000
Iteration 25 time lapse 0.00124s ll change 0.00000
Iteration 26 time lapse 0.00122s ll change 0.00000
Iteration 27 time lapse 0.00120s ll change 0.00000
Initialization converged: True time lapse 0.03765s ll -1.20124
To actually capture these values, it depends on where you are running: you can redirect the printed output to a file, or, in a Jupyter notebook, capture the cell output with the %%capture magic:
%%capture cap --no-stderr
gmm.fit(data)
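Outside a notebook, the same capture can be done in plain Python with contextlib.redirect_stdout, since the verbose messages are written to stdout. This is a sketch (not part of the original answer), with synthetic two-cluster data standing in for `data`:

```python
import io
from contextlib import redirect_stdout

import numpy as np
from sklearn.mixture import GaussianMixture

# toy data standing in for `data`: two well-separated Gaussian blobs
rng = np.random.RandomState(0)
data = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])

gmm = GaussianMixture(n_components=2, tol=1e-8, verbose=2, verbose_interval=1)

# redirect the verbose prints into a string buffer instead of the console
buf = io.StringIO()
with redirect_stdout(buf):
    gmm.fit(data)

log_text = buf.getvalue()  # same text that %%capture collects as cap.stdout
```

`log_text` can then be parsed exactly like `cap.stdout` below.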
Then we load the captured text into a DataFrame and back-calculate the log-likelihood from the reported changes (the last parsed row is the "converged" line, whose seventh column is the final log-likelihood):
import numpy as np
import pandas as pd

# split each printed line into tokens; keep the iteration number (column 1)
# and the ll change (column 7)
res = pd.DataFrame([i.split() for i in cap.stdout.split("\n")]).iloc[:, [1, 7]]
res.columns = ['iteration', 'change']
res.change = res.change.astype('float64')
res = res[np.isfinite(res.change)]  # drops header rows and the inf change of iteration 1
# the last parsed row holds the final log-likelihood; start by filling it everywhere
res['logLik'] = res['change'].values[-1]
# reconstruct each iteration's ll: final ll minus the changes still to come
res.loc[:len(res), ['logLik']] = -res.change[:-1][::-1].cumsum()[::-1] + res.change.values[-1]
res
iteration change logLik
2 2 0.03655 -1.31546
3 3 0.00867 -1.27891
4 4 0.00619 -1.27024
5 5 0.00612 -1.26405
6 6 0.00647 -1.25793
7 7 0.00700 -1.25146
8 8 0.00727 -1.24446
9 9 0.00673 -1.23719
10 10 0.00604 -1.23046
11 11 0.00530 -1.22442
12 12 0.00431 -1.21912
13 13 0.00366 -1.21481
14 14 0.00404 -1.21115
15 15 0.00361 -1.20711
16 16 0.00157 -1.20350
17 17 0.00048 -1.20193
18 18 0.00015 -1.20145
19 19 0.00005 -1.20130
20 20 0.00001 -1.20125
21 21 0.00000 -1.20124
22 22 0.00000 -1.20124
23 23 0.00000 -1.20124
24 24 0.00000 -1.20124
25 25 0.00000 -1.20124
26 26 0.00000 -1.20124
27 27 0.00000 -1.20124
28 converged: -1.20124 -1.20124
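An alternative that avoids parsing printed text: with warm_start=True and max_iter=1, every call to .fit() runs a single EM iteration starting from the previous parameters, so you can record the exact score after each step yourself. A sketch, again with toy data standing in for `data`:

```python
import warnings

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.exceptions import ConvergenceWarning

# toy data standing in for `data`: two well-separated Gaussian blobs
rng = np.random.RandomState(0)
data = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])

# warm_start=True + max_iter=1: each .fit() performs one EM step, continuing
# from the previous parameters; tol=0 prevents an early convergence stop
gmm = GaussianMixture(n_components=2, max_iter=1, warm_start=True, tol=0)

log_liks = []
with warnings.catch_warnings():
    # each single-iteration fit raises a ConvergenceWarning, which is expected here
    warnings.simplefilter("ignore", ConvergenceWarning)
    for _ in range(30):
        gmm.fit(data)                     # one more EM iteration
        log_liks.append(gmm.score(data))  # mean per-sample log-likelihood
```

Note that score() returns the average log-likelihood per sample, and EM guarantees it is non-decreasing across iterations, so log_liks traces the same curve that the verbose output only reports as differences.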