I abstracted a real-world problem to a graph-theory-problem. Currently, I try to solve this problem by making use of a clique-algorithm from networkx library. I need ideas how my algorithm can be adapted to improve run-time performance.
Problem:
Let G be a undirected, colored graph and let R be a mapping, where each color is mapped to a given quantity, e.g.
R = {"red": 3, "blue": 1, "green": 1, "magenta": 1, "orange": 1}
I need to find a set of nodes V, with following properties (or determine that no such set exists):
For each (color,amount) in R, we need at least the amount of colored nodes in V.
(with above R for example, we need at least three red nodes, one blue node, etc. in V)
The nodes in V are not allowed to be neighbors in G.
Simple Example:
With this colored graph and the above given requirement R,
one possible solution is: V = {0, 1, 2, 8, 13, 15, 16}
, because all the color-amounts required by R are contained in V and no node is adjacent to the other.
Current Approach:
import networkx as nx
def find_result(G, R):
CG = nx.complement(G)
for V in nx.find_cliques(CG):
color_count = get_color_count(G, V)
if all([(color_count.get(color, 0) >= R[color]) for color in R.keys()]):
return V
# Helper method to count colors
def get_color_count(G, V):
colors = [G.nodes[v]["color"] for v in V]
count = dict()
for color in colors:
count[color] = count.get(color, 0) + 1
return count
Performance Issues:
With this trivial approach, I run into performance problems (i.e. the algorithm takes way too long for particular configurations of R). My graphs can get as worse as seen below, with 282 nodes, 4640 edges and 7 different colors.
So here is my question: how can the networkx-clique-algorithm be customized to make use of color information, clique-size, etc.? I'm glad for any ideas/hints regarding improvements or new approaches.
EDIT 1
Here is the graph-content (sparse6-format) from the big graph above. The content can be pasted into a file and then read with networkx.
>>sparse6<<:~?CY_A??@_??OA_??OA?M??@?G?oC_??OA?K@?D_??OA?K@?D?Y??@?G?oC?S@_F_??OA?K@?D?W@oG_??OA?K@?D?W@oG?e??@?G?oC?S@_F?_AOI`CCGO@ED?R`WBOT`_B_V`gEW[@mEo[`qF?]`}DWV`eGwYAKHG[@wFw\AYI_??C?_B?O@OE?[A?H?gAwk???OA?K@?D?W@oG?cA_J_??OA?K@?D?W@oG?cA_Ja{FOf_??OA?K@?D?W@oG?cA_J_??OA?K@?D?W@oG?cA_JbO??@?G?oC?S@_F?_AOI?mM?__??OA?K@?D?W@oG?cA_JbkBw|AQB_Wc?Nw~CANp?CE??@?G?oC?UPOK?sLwK?sLpD_oBOvB{OPDCYB?LB[O@ACSP`F_oBPDCWPpG_oBPDCWPpGCeB?LB{OPDCWPpGCcQgK?sO@ACSP`FC_QPICmB?LB{O@DCWPpGCcQ`JCqB?LB{O@DCWPpGCcQ`JCoRWK?sNp?CCPPEC[Q@HCgQpKCsRgK?sNp?CGPPEC[Q@HCgQpKCsR`N_oBP@CGPPEC[Q@HCgQpKCsR`NDAB?LCCO`DCWPpGCcQ`JCoRPMC{S@P_oBO~CCO`DCWPpGCcQ`JCoRPMC{S@PDIB?LC?OPACSP`FC_QPICkR@LCwRpODCS`RdWTW~C?RPMC{S@TDYNp@C[QpNDKTPUD]TPUD[UHTDWTpWDeNp?CsR`ND?TPUD[U@XDiNp@C[QpNDKTPUD[U@XDgUwq_??OA?K@?D?W@oG?cA_JBCK`\bGVP]dsV`^_??OA?K@?D?W@oG?cA_JBCVP]D{WH\DwVp_EE??@?G?oC?SOw??C?_B?O@PBEM??@?G?oC?SOpbEQOpbEOXXbEOXPeeKX@dEWXxBEKX@dEWXpgeKX@dEWXpgEeWpcESX`fE_YPi_??OA?K@?D?W@oG?cA_JBKLG??C?_B?O@OE?[A?H?gAosEq??@?G?oC?S@_F?_AOI?kL@dE_YpkEu??@?G?oC?S@_F?_AOI?kL@kEsZg??C?_B?O@OE?[A?H?gAosC?O`GCoS@SEoZPmE}KpkEsZ`nFAZ@lEwZpoFEXPgEkZ@lEwZpoFC[hkEsZ`nF?[PqFMO@AC_R@ODOZ@lEwZpoFC[`rFQKphEgYpkEsZ`nF?[PqFK\@tecY`jEoZPmE{[@pFG[psFS\hdE_YPiEkZ@lEwZpoFC[`rFO\PuF]YPiEkZ@lEwZpoFC[`rFO\PuF[]H?CGQ@KD?T@hEgYpkEsZ`nF?[PqFK\@tFW\pwFeKpkEsZ`nF?[PqFK\@tFW\pwFc]hkEsZ`nF?[PqFK\@tFW\pwFc]`zeSY@jEoZPmE{[@pFG[psFS\`vF_]PyFk^HkEsZ`nF?[PqFK\@tFW\pwFc]`zFo^X?CGQ@KD?T@kEsZ`nF?[PqFK\@tFW\pwFc]`zFo^P}bKOPADCS`RDOZ@lEwZpoFC[`rFO\PuF[]@xFg]p{Fs^`~cCO`PDGSpSEoZPmE{[@pFG[psFS\`vF_]PyFk^@|Fw^q?cCO`PDGSpSESY@jEoZPmE{[@pFG[psFS\`vF_]PyFk^@|Fw^q?GEOPADCS`RDOZ@lEwZpoFC[`rFO\PuF[]@xFg]p{Fs^`~G?_QAc?OPAC_R@ODCS`RDOZ@lEwZpoFC[`rFO\PuF[]@xFg]p{Fs^`~G?_QAGM??@?G?oE?[A?H_??OA?W@oGGU??@?G@_F?_Np?CsR`ND?TpZGS`g??K@_HGS`aFgS`aFGaNp?CsR`ND?TpZGS`aFG_aW??K@_HGS`aFG_aQIgS`aFG_aQIGmNp?CsR`ND?TpZGS`aFG_aQIGkbIN?G@OG?kbiMG}?_D?_AqMG{cGM?{E?zByB_WBwcgM@_NaQHMB_WBwcaRHQBozHGcqSHUcaRHOdQUhGcqSHSdaVhGcqSHSdaVHaBozCOcaRHOdQUH[eAXcOcaRHOdQUH[eAXHiPAQHKdATHWdqWHceaZcOcaRHOdQUH[eAXHgeq[_{MqQHKdATHWdqWHceaZHofYQHKdATHWdqWHceaZHofQ]hGcqSHSdaVH_eQYHkfA\HwfyQHKdATHWdqWHceaZHofQ]H{gIbEs[PqFK\@tF[^A@e{[PqFK\@tFc^aBIMO@AC_R@ODO[@pFG[psFS]`~GOgqces[`vFk^@|Fw^q@IKhAde{\@xFk^@|Fw^qBIKhAdIYO@AC_R@ODO[@tFg]p{Fs^`~GOgqcIShafcCO`PDGSpSEs[`vFo_A@GG_qCIKhAdIWhqgcCO`PDGSpSE{\@xFw_A@GG_qCIKhAdIWhqgIeO@@CGQ@KD?SPQDKT@oFS]`~G?_QAGK`AbIOhQeI[iAhIi??@?G?oE?[A?HGS`aFG_aw??C?_E?[A?~C?RPMC{S@VDk`QEG[aaLIq??B?WAQDG_aqKGsjAlb{O@LCwRpOD[UqFGgaqKGsjAlIy??@?G?oC?S@_F?_AOI?kIOkBY??@?G?oC?S@_F?_AOI?kIOkJA??@?G?oC?S@_F?_AOI?kIOkHwfq_ICkApacLaoJCkgE?[A?H?gAohJ?kQqJMIQ]H{gA`J?kQqJKlGhBWaqKGsjanJ?kQqJKlAt_W@oG?cA_JAcaqKGsjanJ?kQqJKlAtJYIQJGobQ]H{gA`IwjqoJCkarJOlQuJ]??@?G?oC?S@_F?_AOI?kJ?uH?cQoJCkarJOlQuJ[mG??C?_B?O@OE?[A?H?gAokH?cQoJCkarJOlQuJ[mAx_??OA?K@?D?W@oG?cA_JAocAPHwfq_ICkApJGkqsJSlavJ_mQybWcAPJ?kQqJKlAtJWlqwJcmaz_W@oG?cA_JH?cQoJCkarJOlQuJ[mAxJgmq{h?cQ]H{gA`J?kQqJKlAtJWlqwJcmazJonWuGkbALH?cQmI{kApJGkqsJSlavJ_mQyJknA|Jy@_F?_AOI?kaqKGscAPIwjqoJCkarJOlQuJ[mAxJgmq{Jsna~gkbALH?cQ]H{gA`IwjqoJCkarJOlQuJ[mAxJgmq{Jsna~KA??@?G?oC?S@_F?_AOI?kJ?uHSeQ\ICkApJGkqsJSlavJ_mQyJknA|Jwnr?KE??@?G?oC?S@_F?_AOI?kJATHcfQ`J?kQqJKlAtJWlqwJcmazJonQ}J{oB@KI??@?G?oC?S@_F?_AOI?kJATHcfQ]H{gA`J?kQqJKlAtJWlqwJcmazJonQ}J{oB@KGowuHSeQ\ICkApJGkqsJSlavJ_mQyJknA|Jwnr?KCobBKQ@_F?_AOI?kdQXHsgQoJCkarJOlQuJ[mAxJgmq{Jsna~K?oRAKKpBDhSeQ\Hwfq_ICkApJGkqsJSlavJ_mQyJknA|Jwnr?KCobBKOpREbWaqKGsdQXHsgQmI{kApJGkqsJSlavJ_mQyJknA|Jwnr?KCobBKOpREK]@_F?_AOI?kaqKGsdQXHsgQmI{kApJGkqsJSlavJ_mQyJknA|Jwnr?KCobBKOpREK[qIJGobQTHcfQ]H{gA`IwjqoJCkarJOlQuJ[mAxJgmq{Jsna~K?oRAKKpBDKWprGKe??@?G?oC?S@_F?_AOI?kG?oB_OpbEWYW??C?_B?O@OE?[A?H?gAo_B?MADG_aqkIwqw??C?_B?O@OE?[A?H?gAo_B?M@UDgqrK_??OA?K@?D?W@oG?cA_JA?K?wEs[`vFo_QbIWiRJKorW??C?_B?O@OE?[A?H?gAo_B?MBJKorRM_??OA?K@?D?W@oG?cA_JA?M?xCKWpeEcqrKKsrbN_??OA?K@?D?W@oG?cA_JA?M?xGSaAJIojbJKorRMK{sG??C?_B?O@OE?[A?H?gAo_B_MPUDgqrKKsrbNL?sW??C?_B?O@OE?[A?H?gAo_B_MPlFG\p{GCgqeIcqrKKsrbNL?sRQ_??OA?K@?D?W@oG?cA_JA?M?xKkrBLKwrrOLCsbRa?M@BDST`VD_WpeEcqrKKsrbNL?sRQLKtG??K@_HA?M@TDWTpWGSaAJIojbJKorRMK{sBPLGsrSLUG?wDST`VD_UbJKorRMK{sBPLGsrSLStg_B_TPUD[U@lFG\p{GCgqeIcqrKKsrbNL?sRQLKtBTLWtw_B_TPUD[UBJKorRMK{sBPLGsrSLStbVLa??@?G?oC?S@_F?_AOI?kK@BDSUPbEWYRJKorRMK{sBPLGsrSLStbVL_uW??C?_B?O@OE?[A?H?gAooDSUQDG_aqkIwqrKKsrbNL?sRQLKtBTLWtrWLcug??C?_B?O@OE?[A?H?gAooDST`XDgqrKKsrbNL?sRQLKtBTLWtrWLcubZ_??OA?K@?D?W@oG?cA_JB?TPXEs[`vFo_QbIWiRJKorRMK{sBPLGsrSLStbVL_uRYLkvG??C?_B?O@OE?[A?H?gAooDSURJKorRMK{sBPLGsrSLStbVL_uRYLkvB\_??OA?K@?D?W@oG?cA_JBcOpTDcWpeEcqrKKsrbNL?sRQLKtBTLWtrWLcubZLovR]_??OA?K@?D?W@oG?cA_JBcTPXGSaAJIojbJKorRMK{sBPLGsrSLStbVL_uRYLkvB\Lwvw??C?_B?O@OE?[A?H?gAoxDST`XDgqrKKsrbNL?sRQLKtBTLWtrWLcubZLovR]L{wG??C?_B?O@OE?[A?H?gAoxDSUPlFG\p{GCgqeIcqrKKsrbNL?sRQLKtBTLWtrWLcubZLovR]L{wB`_??OA?K@?D?W@oG?cA_JBcTPXKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRacKTPUD[U@XEKX`hKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRaMM??B?WAPTDWTpWDc`QGGkjAmKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRaMKxHTDWTpWDcUbJKorRMK{sBPLGsrSLStbVL_uRYLkvB\Lwvr_MCwbbMOxXTDWTpWDcZPqF[^A@IKhahKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRaMKxBdMYTPUD[U@XKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRaMKxBdMWxxdE_YPiEkZ`rFW\pwFc]`|GIYPiEk\`vF_]PyHwfq_ICkatJ_mq}KCpBFKgyXdE_YPiEkZ`rFW\pwFc]`|GG`aHGoyRiecY`jFW\pwFc]aEGcbA]H{gA`JGlQwJknb@KOprIMcybjeSY@hEgYpmFK\`vF_]PyFs_aJGobQmI{lavJ_nr?KCqBHKgyRiMkzHhEgYpuF[]@xFgaqKGsfa^I?gQmI{katJWlqwJkna~K?oRCK[qBHKgyRiMkzBleSY@jEw[pwFs_aTHcfQ`KGorCKSpbFK_qRIMcybjMozRmhSeQ\Hwfq_ICkatJ_mq}KCobBKOpREK[qBHKgyRiMkzBlMwzxdE_YpmFK]@|GG`aHGodQXHsgRAKKpBDKWprGKcqbhMgyrkMszbnNA`aHGodQXHsfa^I?gQqJSmAzJwoRAKKpBDKWprGKcqbhMgyrkMszbnN?{XdE_YpmFK]@|GGaqKGsdQXHsgQmI{lavJ_nr?KCobBKOpREK[qBHKgyRiMkzBlMwzroNC{iJGobQTHcfQ]H{gA`IwjqqJSlavJ_mq}J{oB@KGorCKSpbFK_qRIMcybjMozRmM{{BpNG{x\EA|X\E?|RunS|bvdsWBtNW|rwnS|bvN_}X\E?|RuN[}BxNi|RuN[}BxNg}wNAOM_zBscaUHgfgNAOI?zBscaUHgfb|_??OA?K@?D?W@oG?cA_JAcJ?uJ?kQqJKlAtJWlqwJcmazJonrAKKpBDKa??@?G?oC?S@_F?_AOI?kIOkJ?kQqJKlAtJWlqwJcmazKGorCN}??@?G?oC?S@_F?_AOI?kIOkHwfq_ICkApJGkqsJSlavJ_mQyJknb@KGorCK[qbiMozboNG|B~OA?_D?_AohBWbqPJ?kQqJKlAtJWlqwJcnA~KGpRGN|?C@_G?oC?SA?H?gAohG{cQoJCkarJOlQuJ[mB~O@?SA_G@OG?kIQNHCfa^I?gQoJCkarJOlQuJ[mAzJwoRCK[qbiMozboNG|B~O@?SAOMIOuGkbALIwjqoJCkarJOlQuJ[mAxJonr?KCobDK_qRIMszbrNO~s?OD?cBOQ?oC?SAOI?kIQJGobQmI{kApJGkqsJSlavJ_nr?KCqBHKgzRmNK|B~O@?SAOL@CDacaqKGsfa^I?gQmI{kApJGkqsJSlavJ_mq}J{oB@KOprGKcqbiMozRmN?{brNO~s?OD?cBOP@SE_??OA?K@?D?W@oG?cA_JAoLaoJCkarJWmQyJknA~KGorCKSqB~O@?SAOL@CDOX@w??C?_B?O@OE?[A?H?gAokJ?kQqJcmazKGorCN|?C@OH?sCOT@cFOa??@?G?oC?S@_F?_AOI?kJA]H{gA`J?kQqJSmAxJgmq}KCobBKOprIMgzBmN?{bsN|?C@OH?sCOT@cFO`AW@?G@?D?[A?I?kLaNHCkArJWmQ{J{obDK_~s?OD?cBOP@SEO\ACHOi?OA?K@?D?[A?H?gAqNHC~s?OD?cBOP@SEO\ACHOhAw@?G@?D?[A?I?kbqPHwfq_ICkatJ_mq}KCpBFKgybkMw{BqNO~s?OD?cBOP@SEO\ACHOhAsK_C@?F?gLaJGobQmI{kArJWlqwJcnA~K?oRAKSqBHKgzRmNK|B~O@?SAOL@CDOX@sGOdAcJOpBW@?K@?D?[AOI?kaqKGsjanJWlqwJ{oB@K_qRIMszbrNO~s?OD?cBOP@SEO\ACHOhAsKOtBg@?O@oIGkbALHwfq_ICjanJGlQuJ[mAzJwnr?KCpBFK_qRIMgzBlMw{BqNK|B~O@?SAOL@CDOX@sGOdAcJOpBSMO}??@?G?oC?S@_F?_AOI?kJ?uHSeQ\ICkApJGkquJcmazJonrAKKpBDKWprGKcqbnN?{RqNK|B~O@?SAOL@CDOX@sGOdAcJOpBSMO|CG??C?_B?O@OE?[A?H?gAokHSeQ\ICkApJGmQyJkobBKOpREK[qBHKgzroNC{brNO~s?OD?cBOP@SEO\ACHOhAsKOtBcNP@CW??C?_B?O@OE?[A?H?gAokHSeQ\Hwfq_ICkApJGlQwJcmazJwoRAKKpBDKWprGKcqbiMozbnN?{RqNK|B~O@?SAOL@CDOX@sGOdAcJOpBSMO|CCPPI?_D?_AouG{cQTHcfQ`J?kquJcnA~KGorCKSpbFK_qRIM{{BpNG{rsN|?C@OH?sCOT@cFO`ASIOlBCLOxBsOPDCcR_G?oC?SA?H?gAqNHCdQXHsgRAKKpBDKWprGKcqbnN?{RqNK|B~O@?SAOL@CDOX@sGOdAcJOpBSMO|CCPPHCsS_G@OG?kbqPHSeQ\Hwfq_ICkatJ_mq}KCobBKOpREK[qBHKgybkMwzroNC{brNO~s?OD?cBOP@SEO\ACHOhAsKOtBcNP@CSQPLDCTbWaqKGsdQXHsgQmI{kArJWlqwJcnA~K?oRAKKpBDKWprGKcqblMwzroNC{brNO~s?OD?cBOP@SEO\ACHOhAsKOtBcNP@CSQPLDCTPY?oC?SAOI?kaqKGsdQXHsgQmI{lavJ_nr?KCobBKOpREK[qBHKgzRmM{{BpNG{rsN|?C@OH?sCOT@cFO`ASIOlBCLOxBsOPDCcRPPDSUP]aqKGsdQXHsfa^I?gQmI{katJWlqwJkna~K?oRAKKpBDKWprGKcqbiMozRmM{{BpNG{rsN|?C@OH?sCOT@cFO`ASIOlBCLOxBsOPDCcRPPDSUP\EN
To colorize the graph, use following implementation:
G = nx.read_sparse6("graph_file")
for i in range(0,12):
G.nodes[i]["color"] = "yellow"
for i in range(12,40):
G.nodes[i]["color"] = "red"
for i in range(40,63):
G.nodes[i]["color"] = "turquoise"
for i in range(63,85):
G.nodes[i]["color"] = "green"
for i in range(85,163):
G.nodes[i]["color"] = "blue"
for i in range(163, 176):
G.nodes[i]["color"] = "magenta"
for i in range(176, 282):
G.nodes[i]["color"] = "orange"
Formally, we have an independent
set
problem with some covering constraints on the side. Integer programming
seems well suited, though I don’t know which R
specifically were
giving you trouble.
The straightforward integer programming formulation of independent set
has a 0–1 variable for each node and a constraint for each edge. The
linear relaxation is poor, however, since (for example) a triangle graph
has a fractional solution where each node is 0.5, for a “1.5-node”
fractional independent set. Since G
has only a few hundred maximal
cliques, however, we can use a better formulation with a constraint for
each maximal clique (for each clique, the independent set contains at
most one clique node).
Implementation using NetworkX and OR-Tools below:
import networkx as nx
G = nx.read_sparse6("graph_file")
for i in range(0, 12):
G.nodes[i]["color"] = "yellow"
for i in range(12, 40):
G.nodes[i]["color"] = "red"
for i in range(40, 63):
G.nodes[i]["color"] = "turquoise"
for i in range(63, 85):
G.nodes[i]["color"] = "green"
for i in range(85, 163):
G.nodes[i]["color"] = "blue"
for i in range(163, 176):
G.nodes[i]["color"] = "magenta"
for i in range(176, 282):
G.nodes[i]["color"] = "orange"
R = {
"red": 11,
"turquoise": 11,
"blue": 8,
"orange": 6,
"green": 4,
"magenta": 2,
"yellow": 1,
}
import collections
from ortools.linear_solver import pywraplp
solver = pywraplp.Solver.CreateSolver("SCIP")
x = {v: solver.BoolVar(str(v)) for v in G.nodes}
for K in nx.find_cliques(G):
solver.Add(sum(x[v] for v in K) <= 1)
nodes_with_color = collections.defaultdict(list)
for v in G.nodes:
nodes_with_color[G.nodes[v]["color"]].append(v)
for color, bound in R.items():
solver.Add(sum(x[v] for v in nodes_with_color[color]) >= bound)
solver.Maximize(sum(x_v for x_v in x.values()))
status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL:
solution = [v for (v, x_v) in x.items() if x_v.SolutionValue()]
print(*solution)
print(collections.Counter(G.nodes[v]["color"] for v in solution))
elif status == pywraplp.Solver.INFEASIBLE:
print("no solution")
else:
print("unknown status", status)