Does anyone know of a good program to convert CNF files with any number of variables per clause to CNF files with exactly 3 variables per clause (3-CNF)? I've seen this algorithm in computer science books but can't find an implementation anywhere and would hate to waste time implementing it myself if others have already done it. Thanks!
I didn't know any program to do that either, but the algorithm is really simple so just I wrote the following python script (download) that reads a general CNF in DIMACS format and writes the CNF of an equivalent 3-SAT problem in DIMACS format:
from __future__ import print_function
import fileinput
cnf = list()
cnf.append(list())
maxvar = 0
for line in fileinput.input():
tokens = line.split()
if len(tokens) == 0 or tokens[0] == "p" or tokens[0] == "c":
continue
for tok in tokens:
lit = int(tok)
maxvar = max(maxvar, abs(lit))
if lit == 0:
cnf.append(list())
else:
cnf[-1].append(lit)
assert len(cnf[-1]) == 0
cnf.pop()
new_cnf = list()
for clause in cnf:
while len(clause) > 3:
new_clause = list()
for i in range(0, len(clause), 2):
if i+1 < len(clause):
new_cnf.append(list())
new_cnf[-1].append(clause[i])
new_cnf[-1].append(clause[i+1])
maxvar += 1
new_cnf[-1].append(-maxvar)
new_clause.append(maxvar)
else:
new_clause.append(clause[i])
clause = new_clause
new_cnf.append(clause)
print("p cnf %d %d" % (maxvar, len(new_cnf)))
for clause in new_cnf:
print(" ".join([ "%d" % lit for lit in clause ]) + " 0")
The interesting bit is of course the for clause in cnf:
loop that takes the general sat problem stored in cnf
and transforms it into a 3-sat instance stored in new_cnf
. It does this by translating a clause such as
(A[1] or A[2] or A[3] or A[4] or A[5] or A[6] or A[7])
into the following set of clauses.
(A[1] or A[2] or ~X[1])
(A[3] or A[4] or ~X[2])
(A[5] or A[6] or ~X[3])
(X[1] or X[2] or X[3] or A[7])
The first three clauses are added to new_cnf
. The last clause is not 3-sat so the algorithm is re-run on this last clause, yielding the following new clauses:
(X[1] or X[2] or ~Y[1])
(X[3] or A[7] or ~Y[2])
(Y[1] or Y[2])
This are all 3-sat clauses, so they are added to new_cnf
and the algorithm continues with the next clause from cnf
. (If the last clause were not 3-sat, the algorithm would keep working on it until only 3-sat clauses are left. The length of the last clause approximately halves with each iteration.)