scikit-learn histogram svm scikit-image lbph-algorithm

Lenght of histogram is differ in case

I'm running LBP algorithm to classify images by their texture features. Classifying method is LinearSVC in sklearn.svm package.

Getting histogram and fitting by SVM is done, but sometimes length of histogram varies depending on image.

Example is below:

from skimage import feature
from scipy.stats import itemfreq
from sklearn.svm import LinearSVC
import numpy as np
import cv2
import cvutils
import csv
import os

def __get_hist(image, radius):
    NumPoint = radius*8
    lbp = feature.local_binary_pattern(image, NumPoint, radius, method="uniform")
    x = itemfreq(lbp.ravel())
    hist = x[:,1]/sum(x[:,1])
    return hist
def get_trainHist_list(train_txt):
    train_dic = {}
    with open(train_txt, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter = ' ')
        for row in reader:
            train_dic[row[0]] = int(row[1])

    hist_list=[]
    key_list=[]
    label_list=[]
    for key, label in train_dic.items():
        img = cv2.imread("D:/Python36/images/texture/%s" %key, cv2.IMREAD_GRAYSCALE)
        key_list.append(key)
        label_list.append(label)
        hist_list.append(__get_hist(img,3))
    bundle = [np.array(key_list), np.array(label_list), np.array(hist_list)]
    return bundle

train_txt = 'D:/Python36/images/class_train.txt'
train_hist = get_trainHist_list(train_txt)
model = LinearSVC(C=100.0, random_state=42)
model.fit(train_hist[2], train_hist[1])
for i in train_hist[2]:
    print(len(i))

test_img = cv2.imread("D:/Python36/images/texture_test/flat-3.png", cv2.IMREAD_GRAYSCALE)
hist= np.array(__get_hist(test_img, 3))
print(len(hist))
prediction = model.predict([hist])
print(prediction)

result

26
26
26
26
26
26
25
Traceback (most recent call last):
  File "D:\Python36\texture.py", line 44, in <module>
    prediction = model.predict([hist])
  File "D:\Python36\lib\site-packages\sklearn\linear_model\base.py", line 324, in predict
    scores = self.decision_function(X)
  File "D:\Python36\lib\site-packages\sklearn\linear_model\base.py", line 305, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 25 features per sample; expecting 26

As you can see, length of histogram for training images is all 26, but test_img's is 25. For this reason, predict in SVM doesn't work.

I guess test_img has empty parts in the histogram, and that empty parts could have skipped. (I'm not sure)

Someone have idea to fix it?

Solution

There are 59 different uniform LBPs for a neighbourhood of 8 points. This should be the dimension of your feature vectors, but it is not because you used itemfreq to compute the histograms (as a side note, itemfreq is deprecated). The length of the histograms obtained throug itemfreq is the number of different uniform LBPs in the image. If some uniform LBPs are not present in the image the number of bins of the resulting histogram will be lower than 59. This issue can be easily fixed by utilizing bincount as demonstrated in the toy example below:

import numpy as np
from skimage import feature
from scipy.stats import itemfreq

lbp = np.array([[0, 0, 0, 0],
                [1, 1, 1, 1],
                [8, 8, 9, 9]])

hi = itemfreq(lbp.ravel())[:, 1]  # wrong approach
hb = np.bincount(lbp.ravel(), minlength=59)  # proposed method

The output looks like this:

In [815]: hi
Out[815]: array([4, 4, 2, 2], dtype=int64)

In [816]: hb
Out[816]: 
array([4, 4, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0], dtype=int64)