I'm trying to create a model in CPLEX OPL Studio for clustering with an additive criterion. I get a number of errors that I don't know how to fix, because I'm not very experienced with OPL Studio. I started from a loss function that measures the deviation from the cluster center. Next, I substituted the values into the general loss function, which gave me the following formula. There is also a formula for calculating the cluster centers.
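The formula images are not reproduced here. Read back from the model below, and assuming the code transcribes them faithfully, the formulas are as follows. Here $a_{ij}$ stands for `data[i][j]`, $x_{ic} \in \{0,1\}$ for `x[i][c]`, and $\mu_{cj}$ for `mu[c][j]`. The three parts are the objective, the "exactly one cluster" constraint, and the center definition:

$$\min \sum_{c=1}^{k}\sum_{i=1}^{n}\sum_{j=1}^{m} x_{ic}\,\bigl(a_{ij}-\mu_{cj}\bigr)^2$$

$$\sum_{c=1}^{k} x_{ic} = 1 \qquad (i = 1,\dots,n)$$

$$\mu_{cj} = \frac{\sum_{i=1}^{n} x_{ic}\,a_{ij}}{\sum_{i=1}^{n} x_{ic}} \qquad (c = 1,\dots,k,\; j = 1,\dots,m)$$

The center is the per-feature mean of the clients assigned to cluster $c$, so it is only defined when that cluster is non-empty.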
```
// Number of clients, number of features, and number of clusters
int n = ...; // Number of clients
int m = ...; // Number of features
int k = ...; // Number of clusters
// Client data: feature values for each client
float data[i in 1..n][j in 1..m] = ...;
// Binary variables: x[i][c] = 1 if client i is assigned to cluster c
dvar boolean x[1..n][1..k];
// Variables for the center of each cluster for each feature
dvar float mu[1..k][1..m];
// Model
minimize
sum(c in 1..k, i in 1..n, j in 1..m) x[i][c] * (data[i][j] - mu[c][j])^2;
// Constraints
subject to {
// Each client belongs to exactly one cluster
forall(i in 1..n)
sum(c in 1..k) x[i][c] == 1;
// Definition of cluster centers
forall(c in 1..k, j in 1..m)
mu[c][j] == sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];
}
```
I tried to write code for these formulas but ran into syntax problems. For example: `CPLEX (default) failed to parse expression: forall(c in 1..3, j in 1..4) mu[c][j] == sum(i in 1..5) (x[ i][c]*data[i][j]) / (sum(i in 1..5) x[i][c])`. It might be worth adding more constraints, but I'm a little confused.
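Within CPLEX, I would rather use the Constraint Programming engine (CP Optimizer), which `using CP;` selects, and turn `mu` into a decision expression (`dexpr`) computed from `x` instead of a decision variable.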
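A plausible reason the question's model is rejected by the default CPLEX (math programming) engine can be shown with some algebra. Write $a_{ij}$ for `data[i][j]`, $x_{ic}$ for the binary `x[i][c]`, and $\mu_{cj}$ for `mu[c][j]` declared as a `dvar`. Expanding one objective term, and clearing the denominator in the center definition (valid only for a non-empty cluster), gives:

$$x_{ic}\,\bigl(a_{ij}-\mu_{cj}\bigr)^2 = a_{ij}^2\,x_{ic} - 2\,a_{ij}\,x_{ic}\,\mu_{cj} + x_{ic}\,\mu_{cj}^2$$

$$\mu_{cj}\sum_{i=1}^{n} x_{ic} = \sum_{i=1}^{n} x_{ic}\,a_{ij}$$

The term $x_{ic}\,\mu_{cj}^2$ has degree 3. The math programming engine handles only linear and quadratic expressions and does not divide by a decision expression. CP Optimizer accepts such nonlinear expressions, which is why switching to `using CP;` and defining `mu` as a `dexpr` sidesteps the problem.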
```
using CP;
// Number of clients, number of features, and number of clusters
int n = 3; // Number of clients
int m = 4; // Number of features
int k = 2; // Number of clusters
// Client data: feature values for each client
float data[i in 1..n][j in 1..m] = i*j;
// Binary variables: x[i][c] = 1 if client i is assigned to cluster c
dvar boolean x[1..n][1..k];
// Center of each cluster for each feature, as a decision expression computed from x
dexpr float mu[c in 1..k][j in 1..m]=
sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];
// Model
minimize
sum(c in 1..k, i in 1..n, j in 1..m) x[i][c] * (data[i][j] - mu[c][j])^2;
// Constraints
subject to {
// Each client belongs to exactly one cluster
forall(i in 1..n)
sum(c in 1..k) x[i][c] == 1;
// Definition of cluster centers (restates the dexpr above)
forall(c in 1..k, j in 1..m)
mu[c][j] == sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];
}
```
This works fine.
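As a hand check of what this toy instance should give (worked by hand, not a reported solver run): `data[i][j] = i*j` makes client $i$'s row equal to $i \cdot (1,2,3,4)$. The squared distance from client $p$ to a center $\bar{p} \cdot (1,2,3,4)$ is therefore $30\,(p-\bar{p})^2$, since $1+4+9+16 = 30$. Enumerating the partitions of the three clients into two non-empty clusters gives:

$$\{1\},\{2,3\}:\quad 30\bigl[(2-2.5)^2 + (3-2.5)^2\bigr] = 15$$

$$\{1,2\},\{3\}:\quad 30\bigl[(1-1.5)^2 + (2-1.5)^2\bigr] = 15$$

$$\{1,3\},\{2\}:\quad 30\bigl[(1-2)^2 + (3-2)^2\bigr] = 60$$

So the minimum objective with both clusters non-empty is 15, reached by two partitions. Each partition appears twice if the two cluster labels are swapped.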
Or, with a better formulation that uses one integer variable per client holding its cluster index:
```
using CP;
// Number of clients, number of features, and number of clusters
int n = 3; // Number of clients
int m = 4; // Number of features
int k = 2; // Number of clusters
// Client data: feature values for each client
float data[i in 1..n][j in 1..m] = i*j;
// x[i] = index of the cluster that client i is assigned to
dvar int x[1..n] in 1..k;
// Center of each cluster for each feature, as a decision expression computed from x
dexpr float mu[c in 1..k][j in 1..m]=
sum(i in 1..n) (x[i]==c) * data[i][j] / sum(i in 1..n) (x[i]==c);
// Model
minimize
sum(c in 1..k, i in 1..n, j in 1..m) (x[i]==c) * (data[i][j] - mu[c][j])^2;
// Constraints
subject to {
// Definition of cluster centers (restates the dexpr above)
forall(c in 1..k, j in 1..m)
mu[c][j] == sum(i in 1..n) (x[i]==c) * data[i][j] / sum(i in 1..n) (x[i]==c);
}
```
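The second model can be written in symbols with $a_{ij}$ for `data[i][j]`, $\mu_{cj}$ for `mu[c][j]`, and $x_i \in \{1,\dots,k\}$ for the integer `x[i]`. Each binary $x_{ic}$ is replaced by the indicator $[x_i = c]$, which is 1 when client $i$ is in cluster $c$ and 0 otherwise. This matches how `(x[i]==c)` is used in the code:

$$\min \sum_{c=1}^{k}\sum_{i=1}^{n}\sum_{j=1}^{m} [x_i = c]\,\bigl(a_{ij}-\mu_{cj}\bigr)^2, \qquad \mu_{cj} = \frac{\sum_{i=1}^{n} [x_i = c]\,a_{ij}}{\sum_{i=1}^{n} [x_i = c]}$$

Because each $x_i$ takes exactly one value, $\sum_{c=1}^{k} [x_i = c] = 1$ holds automatically. This is why this version drops the "exactly one cluster" constraint and needs only $n$ integer variables instead of $n \times k$ binaries.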
See https://github.com/AlexFleischerParis/opltipsandtricks/blob/master/kmeans.mod