Assume we have the following picture, which contains a PCB defect (called a missing hole defect):
The defects I need to identify in my project are:
For this purpose, I need to extract the colors associated with each defect category.
I know that in R we can do:
library(colorfindr)
img_path="C:/Users/Rayane_2/Desktop/Data/PCB1/PCB/images/Mouse_bite/01_mouse_bite_04.jpg"
colorfindr::get_colors(img_path,top_n=20)
# A tibble: 20 × 3
col_hex col_freq col_share
<chr> <int> <dbl>
1 #005B0C 31106 0.00646
2 #005B0E 29117 0.00605
3 #01590B 27768 0.00577
4 #005A0B 24135 0.00502
5 #015C0D 23771 0.00494
6 #00580A 22397 0.00465
7 #005B0B 21529 0.00447
8 #00560F 21476 0.00446
9 #005A0D 21324 0.00443
10 #025A0C 21191 0.00440
11 #01590D 21026 0.00437
12 #005709 20009 0.00416
13 #00580E 19063 0.00396
14 #005909 18666 0.00388
15 #015C0F 18450 0.00383
16 #025A0E 17979 0.00374
17 #015710 16621 0.00345
18 #01590F 16619 0.00345
19 #00580C 16546 0.00344
20 #005A0A 15614 0.00324
From the defect types picture, I see that three colors make it possible to distinguish those defects.
I need to identify those three colors and extract them from the tibble.
This is an interesting problem, and there are several kinds of features to try: manual, handcrafted features (from simple to complex, combined with different machine learning models) or automatically extracted features (e.g., learned by deep neural networks).
Let's try a very simple feature based on colors only: the feature we shall use is the color cluster proportion.
We shall first cluster the image RGB color values into k groups (e.g., k=3) using the kmeans clustering algorithm and obtain k color cluster centers using the function get.color.clusters(), as shown below (we need to extract the red, green and blue values from the hex color values).
Then we shall use the kmeans model to predict the color cluster each pixel of an image belongs to, and compute the proportion of pixels in the image belonging to each color cluster as features (hence we shall have k features). Our data frame will then look like the following for k=3 clusters:
        cluster1  cluster2  cluster3  class (label)
image1  0.6       0.3       0.1       missing holes
which means that 60%, 30% and 10% of the pixels belong to clusters 1, 2 and 3, respectively, for the missing hole image1.
This dataset will then be used to train a (binary) classifier, and the classifier will do a decent job if our assumption holds that the color cluster proportions of images from the same defect class follow a similar pattern.
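The feature construction described above can be sketched on toy data. This is only an illustration under stated assumptions: the hex colors and pixel counts below are made up (two greens resemble the board colors in the question, the others are hypothetical), and only base R functions (col2rgb, kmeans, tapply) are used.

```r
# Toy sketch: hex colors -> RGB -> k-means -> share of pixels per cluster.
# The colors and counts are invented for illustration, not taken from a real image.
toy <- data.frame(
  col_hex  = c("#005B0C", "#01590B", "#C87533", "#CA7735", "#F0F0F0", "#EEEEEE"),
  col_freq = c(600, 300, 40, 30, 20, 10),
  stringsAsFactors = FALSE
)

# col2rgb() returns a 3 x n matrix (rows red, green, blue); transpose to n x 3.
rgb_mat <- t(grDevices::col2rgb(toy$col_hex))

set.seed(1)
k <- 3
cl <- kmeans(rgb_mat, centers = k, nstart = 10)

# Sum the pixel counts per cluster label, then normalize to proportions.
freq_by_cluster <- tapply(toy$col_freq, factor(cl$cluster, levels = 1:k), sum)
freq_by_cluster[is.na(freq_by_cluster)] <- 0  # a cluster with no colors gets share 0
prop <- as.numeric(freq_by_cluster) / sum(toy$col_freq)
names(prop) <- paste0("cluster", 1:k)
prop  # one feature per color cluster; the values sum to 1
```

Note that the cluster numbering returned by kmeans is arbitrary, which is why the same fitted model has to be reused for every image so that "cluster1" always means the same color group.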
Here are the two sets of images we shall use for only 2 classes:
Now, let's extract the color cluster proportion features and try an SVM classifier with an RBF kernel to classify and predict the defect classes.
# Return the index of the k-means center closest (squared Euclidean distance) to color x.
find_cluster_kmeans <- function(cl, x) {
  which.min(apply(cl$centers, 1, function(y) sum((y - x)^2)))
}

# Compute, for one image, the share of its dominant colors that falls into each color cluster.
extract.color.features <- function(img_path, cl) {
  col_df <- colorfindr::get_colors(img_path, top_n = 20)
  cols <- as.data.frame(t(col2rgb(col_df$col_hex)))  # hex -> red, green, blue columns
  col_cluster <- apply(cols, 1, function(x) find_cluster_kmeans(cl, x))
  col_df <- data.frame(col_cluster = col_cluster, col_share = col_df$col_share)
  # group by color cluster and sum the proportions
  df_feat <- aggregate(col_df$col_share, list(col_df$col_cluster), FUN = sum)
  names(df_feat) <- c('col_clust', 'prop')
  for (i in 1:nrow(cl$centers)) {  # ensure that all color clusters are present
    if (!any(df_feat$col_clust == i)) {
      df_feat <- rbind(df_feat, data.frame(col_clust = i, prop = 0))
    }
  }
  # sort by cluster index so that the feature order is the same for every image
  df_feat <- df_feat[order(df_feat$col_clust), ]
  df_feat$prop <- df_feat$prop / sum(df_feat$prop)  # normalize
  df_feat
}

# Fit k-means on the dominant colors of all training images and return the model.
get.color.clusters <- function(k = 3, top_n = 50) {
  col_df <- NULL
  for (folder in c('missing_hole', 'Mouse_bite')) {
    img_path <- list.files(folder, "\\.png$", full.names = TRUE)
    cdf <- do.call(rbind, lapply(img_path, function(p) colorfindr::get_colors(p, top_n = top_n)))
    col_df <- rbind(col_df, cdf)
  }
  cols <- as.data.frame(t(col2rgb(col_df$col_hex)))
  kmeans(cols, k)
}
library(colorfindr)
set.seed(12)
k <- 3  # 3 color clusters
cl <- get.color.clusters(k)
df <- NULL
for (cls in c('missing_hole', 'Mouse_bite')) {
  img_path <- list.files(cls, "\\.png$", full.names = TRUE)
  df_feat <- NULL
  for (img in img_path) {
    df_feat <- rbind(df_feat, extract.color.features(img, cl)$prop)
  }
  df_feat <- as.data.frame(df_feat)
  df_feat$class <- cls
  df <- rbind(df, df_feat)
}
names(df)[1:k] <- paste0('cluster', 1:k)
df$class <- as.factor(df$class)
df  # each row corresponds to an image and each column to a color cluster
# cluster1 cluster2 cluster3 class
#1 0.318473896 0.68152610 0.00000000 missing_hole
#2 0.984514797 0.01548520 0.00000000 missing_hole
#3 0.967479675 0.03252033 0.00000000 missing_hole
#4 0.010911326 0.80282772 0.18626095 Mouse_bite
#5 0.008364049 0.96257443 0.02906153 Mouse_bite
#6 0.446066380 0.55393362 0.00000000 Mouse_bite
library(e1071)
svmfit <- svm(class ~ ., data = df, kernel = "radial", cost = 1, scale = FALSE, type = 'C-classification')
plot(svmfit, df, cluster1 ~ cluster2, fill = TRUE, alpha = 0.2)
df$predicted <- predict(svmfit, df)  # predictions on the training images themselves
df
# cluster1 cluster2 cluster3 class predicted
#1 0.318473896 0.68152610 0.00000000 missing_hole Mouse_bite
#2 0.984514797 0.01548520 0.00000000 missing_hole missing_hole
#3 0.967479675 0.03252033 0.00000000 missing_hole missing_hole
#4 0.010911326 0.80282772 0.18626095 Mouse_bite Mouse_bite
#5 0.008364049 0.96257443 0.02906153 Mouse_bite Mouse_bite
#6 0.446066380 0.55393362 0.00000000 Mouse_bite Mouse_bite
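To summarize the prediction table above, a confusion matrix and the training accuracy can be computed with base R. The sketch below copies the actual and predicted labels from that table; the variable names are chosen here for illustration.

```r
# Actual and predicted labels, copied from the prediction table above.
actual <- factor(c("missing_hole", "missing_hole", "missing_hole",
                   "Mouse_bite", "Mouse_bite", "Mouse_bite"))
predicted <- factor(c("Mouse_bite", "missing_hole", "missing_hole",
                      "Mouse_bite", "Mouse_bite", "Mouse_bite"),
                    levels = levels(actual))

# Rows are the true classes, columns the predicted classes.
conf_mat <- table(actual = actual, predicted = predicted)
conf_mat

# Fraction of images on the diagonal (correctly classified): 5 of 6 here.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

The single error is the first missing_hole image, whose cluster proportions look more like those of the Mouse_bite images.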
Ideally, we should train on one portion of the dataset and evaluate the classifier on a held-out portion to estimate how well it generalizes; the predictions above are made on the training images themselves.
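With only six images, a fixed train/test split leaves very little data on either side, so one option is leave-one-out cross-validation, a variant of held-out evaluation in which each image is predicted by a model trained on all the others. The sketch below is one possible way to do this, assuming the e1071 package and reusing the feature values printed earlier; it is not part of the original workflow.

```r
library(e1071)

# Feature table copied from the output above (six images, three cluster proportions).
df <- data.frame(
  cluster1 = c(0.318473896, 0.984514797, 0.967479675,
               0.010911326, 0.008364049, 0.446066380),
  cluster2 = c(0.68152610, 0.01548520, 0.03252033,
               0.80282772, 0.96257443, 0.55393362),
  cluster3 = c(0, 0, 0, 0.18626095, 0.02906153, 0),
  class    = factor(rep(c("missing_hole", "Mouse_bite"), each = 3))
)

# Leave-one-out: fit on all rows except row i, then predict row i.
loo_pred <- vapply(seq_len(nrow(df)), function(i) {
  fit <- svm(class ~ ., data = df[-i, ], kernel = "radial", cost = 1,
             scale = FALSE, type = "C-classification")
  as.character(predict(fit, df[i, , drop = FALSE]))
}, character(1))

# Share of images whose held-out prediction matches the true class.
loo_accuracy <- mean(loo_pred == as.character(df$class))
loo_accuracy
```

With a larger image set, a stratified split into training and test folders, or k-fold cross-validation, would give a more stable estimate than leave-one-out on six images.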
The color cluster proportion feature is quite naive and is unlikely to perform very well. If it does not, you can try extracting shape features and features such as HOG, SIFT, SURF, BRISK or BRIEF, and use the corresponding descriptors as feature vectors for the ML classifiers.
Finally, to get the best performance we can use deep neural networks, which generate features automatically at their different layers. In that case we need a reasonably large number of training images (the training set can be enlarged with data augmentation), or we can use transfer learning on top of a standard pretrained network (e.g., VGG-16 or ResNet-50).