djangodjango-querysetdjango-q

How to add parentheses to build a complicated dynamic Django filter with Q()?


I want to build a complicated filter:

queryset.filter(
  (Q(k__contains=“1”) & Q(k__contains=“2”) & (~Q(k__contains=“3”))) | 
  (Q(k1__contains=“2”) & (~Q(k4__contains=“3”)))
)

The structure is fixed, but the query is dynamic and depends on a case specified by given input.

Tthe input could be for example :

(k=1&k=2&~k=3) | (k1=1&~k4=3)

or

(k=1&~k=3) | (k1=1&~k4=3) | (k=4&~k=3)

How to add parentheses to build this query to make it run as expected?


Solution

  • Two answers are presented: one simple and one perfect

    1) A simple answer for similarly simple expressions:

    It your expressions are simple and it seems better to read it by a short Python code. The most useful simplification is that parentheses are not nested. Less important simplification is that the operator OR ("|") is never in parentheses. The operator NOT is used only inside parentheses. The operator NOT is never double repeated ("~~").

    Syntax: I express these simplifications in a form of EBNF syntax rules that could be useful later in a discussion about Python code.

    expression = term, [ "|", term ];
    term       = "(", factor, { "&", factor }, ")";
    factor     = [ "~" ], variable, "=", constant;
    
    variable   = "a..z_0..9";           # anything except "(", ")", "|", "&", "~", "="
    constant   = "0-9_a-z... ,'\"";     # anything except "(", ")", "|", "&", "~", "="
    

    Optional white spaces are handled easily by .strip() method to can freely accept expressions like in mathematics. White spaces inside constants are supported.

    Solution:

    def compile_q(input_expression):
        q_expression = ~Q()  # selected empty
        for term in input_expression.split('|'):
            q_term = Q()     # selected all
            for factor in term.strip().lstrip('(').rstrip(')').split('&'):
                left, right = factor.strip().split('=', 1)
                negated, left = left.startswith('~'), left.lstrip('~')
                q_factor = Q(**{left.strip() + '__contains': right.strip()})
                if negated:
                    q_factor = ~q_factor
                q_term &= q_factor
            q_expression |= q_term
        return q_expression
    

    The superfluous empty and full Q expressions ~Q() and Q() are finally optimized and eliminated by Django.

    Example:

    >>> expression = "(k=1&k=2&~k=3) | ( k1 = 1 & ~ k4 = 3 )"
    >>> qs = queryset.filter(compile_q(expression))
    

    Check:

    >>> print(str(qs.values('pk').query))        # a simplified more readable sql print
    SELECT id FROM ... WHERE
    ((k LIKE %1% AND k LIKE %2% AND NOT (k LIKE %3%)) OR (k1 LIKE %1% AND NOT (k4 LIKE %3%)))
    >>> sql, params = qs.values('pk').query.get_compiler('default').as_sql()
    >>> print(sql); print(params)                # a complete parametrized escaped print
    SELECT... k LIKE %s ...
    [... '%2%', ...]
    

    The first "print" is a Django command for simplified more readable SQL without apostrophes and escaping, because it is actually delegated to the driver. The second print is a more complicated parametrized SQL command with all possible safe escaping.


    2) Perfect, but longer solution

    This answer can compile any combination of boolean operators "|", "&", "~", any level of nested parentheses and a comparision operator "=" to a Q() expression:

    Solution: (not much more complicated)

    import ast    # builtin Python parser
    from django.contrib.auth.models import User
    from django.db.models import Q
    
    
    def q_combine(node: ast.AST) -> Q:
        if isinstance(node, ast.Module):
            assert len(node.body) == 1 and isinstance(node.body[0], ast.Expr)
            return q_combine(node.body[0].value)
        if isinstance(node, ast.BoolOp):
            if isinstance(node.op, ast.And):
                q = Q()  # select all rows initially
                for val in node.values:
                    q &= q_combine(val)
                return q
            if isinstance(node.op, ast.Or):
                q = ~Q()  # select none row initially
                for val in node.values:
                    q |= q_combine(val)
                return q
        if isinstance(node, ast.UnaryOp):
            assert isinstance(node.op, ast.Not)
            return ~q_combine(node.operand)
        if isinstance(node, ast.Compare):
            assert isinstance(node.left, ast.Name)
            assert len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq)
            assert len(node.comparators) == 1 and isinstance(node.comparators[0], ast.Constant)
            return Q(**{node.left.id + '__contains': str(node.comparators[0].value)})
        raise ValueError('unexpected node {}'.format(type(node).__name__))
    
    def compile_q(expression: str) -> Q:
        std_expr = (expression.replace('=', '==').replace('~', ' not ')
                    .replace('&', ' and ').replace('|', ' or ').lstrip())
        return q_combine(ast.parse(std_expr))
    

    Example: the same as in my previous answer a more complex following:

    >>> expression = "~(~k=1&(k1=2|k1=3|(k=5 & k4=3)))"
    >>> qs = queryset.filter(compile_q(expression))
    

    The same example gives the same result, a more nested example gives a correct more nested result.

    EBNF syntax rules are not important in this case, because no parser is implemented in this solution and the standard Python parser AST is used. It is little different with recursion.

    expression = term, [ "|", term ];
    term       = factor, { "&", factor };
    factor     = [ "~" ], variable, "=", constant | [ "~" ],  "(", expression, ")";
    
    variable   = "a..z_0..9";  # any identifier acceptable by Python, e.g. not a Python keyword
    constant   = "0-9_a-z... ,'\"";    # any numeric or string literal acceptable by Python