sqlitesqlalchemymany-to-manyself-referencing-table

sqlalchemy self-referencing many-to-many with "select" as association table


Problem description

I'm using sqlalchemy (v1.2) declarative, and I have a simple class Node with an id and a label. I would like to build a self-referencing many-to-many relationship where the association table is not a database table, but a dynamic select statement. This statement selects from two joined aliases of Node and returns rows of the form (left_id, right_id), defining the relationship. The code I have so far works if I access the relationship through an instance object, but when I try to filter by the relationship the joins are messed up.

The "classical" self-referential many-to-many relation

For reference, let's start with the example from the documentation on Self-Referential Many-to-Many Relationship, which uses an association table:

node_to_node = Table(
    "node_to_node", Base.metadata,
    Column("left_node_id", Integer, ForeignKey("node.id"), primary_key=True),
    Column("right_node_id", Integer, ForeignKey("node.id"), primary_key=True)
)

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    label = Column(String, unique=True)
    right_nodes = relationship(
        "Node",
        secondary=node_to_node,
        primaryjoin=id == node_to_node.c.left_node_id,
        secondaryjoin=id == node_to_node.c.right_node_id,
        backref="left_nodes"
    )

    def __repr__(self):
        return "Node(id={}, Label={})".format(self.id, self.label)

Joining Node to itself through this relationship:

>>> NodeAlias = aliased(Node)
>>> print(session.query(Node).join(NodeAlias, Node.right_nodes))
SELECT node.id AS node_id, node.label AS node_label 
FROM node JOIN node_to_node AS node_to_node_1 
    ON node.id = node_to_node_1.left_node_id
JOIN node AS node_1
    ON node_1.id = node_to_node_1.right_node_id

Everything looks well.

The many-to-many relation through an association select statement

As an example we implement a relationship next_two_nodes which connects a node to the two nodes with id+1 and id+2 (if existent). The complete code for testing.

Here is a function which generates the select statement for the "dynamic" association table:

_next_two_nodes = None
def next_two_nodes_select():
    global _next_two_nodes
    if _next_two_nodes is None:
        _leftside = aliased(Node, name="leftside")
        _rightside = aliased(Node, name="rightside")
        _next_two_nodes = select(
            [_leftside.id.label("left_node_id"),
             _rightside.id.label("right_node_id")]
        ).select_from(
            join(
                _leftside, _rightside,
                or_(
                    _leftside.id + 1 == _rightside.id,
                    _leftside.id + 2 == _rightside.id
                )
            )
        ).alias()
    return _next_two_nodes

Note that the function caches the result in a global variable, so that successive calls always return the same object instead of using new aliases. Here is my attempt to use this select in a relationship:

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    label = Column(String, unique=True)

    next_two_nodes = relationship(
        "Node", secondary=next_two_nodes_select,
        primaryjoin=(lambda: foreign(Node.id) 
                     == remote(next_two_nodes_select().c.left_node_id)),
        secondaryjoin=(lambda: foreign(next_two_nodes_select().c.right_node_id)
                       == remote(Node.id)),
        backref="previous_two_nodes",
        viewonly=True
    )

    def __repr__(self):
        return "Node(id={}, Label={})".format(self.id, self.label)

Some test data:

nodes = [
    Node(id=1, label="Node1"),
    Node(id=2, label="Node2"),
    Node(id=3, label="Node3"),
    Node(id=4, label="Node4")
]
session.add_all(nodes)
session.commit()

Accessing the relationship through an instance works as expected:

>>> node = session.query(Node).filter_by(id=2).one()
>>> node.next_two_nodes
[Node(id=3, Label=Node3), Node(id=4, Label=Node4)]
>>> node.previous_two_nodes
[Node(id=1, Label=Node1)]

However, filtering on the relationship does not give the expected result:

>>> session.query(Node).join(NodeAlias, Node.next_two_nodes).filter(NodeAlias.id == 3).all()
[Node(id=1, Label=Node1),
 Node(id=2, Label=Node2),
 Node(id=3, Label=Node3),
 Node(id=4, Label=Node4)]

I would expect only Node1 and Node2 to be returned. And indeed, the SQL statement of the join is wrong:

>>> print(session.query(Node).join(NodeAlias, Node.next_two_nodes))
SELECT node.id AS node_id, node.label AS node_label 
FROM node JOIN (SELECT leftside.id AS left_node_id, rightside.id AS right_node_id 
    FROM node AS leftside JOIN node AS rightside
    ON leftside.id + 1 = rightside.id OR leftside.id + 2 = rightside.id) AS anon_1
ON anon_1.left_node_id = anon_1.left_node_id
JOIN node AS node_1 ON anon_1.right_node_id = node_1.id

Comparing with the working example above, instead of ON anon_1.left_node_id = anon_1.left_node_id it should clearly read ON node.id = anon_1.left_node_id. My primaryjoin seems to be wrong, but I cannot figure out how to connect the last dots.


Solution

  • After more debugging I found that "Clause Adaption" is replacing my ON clause. I'm not sure about the details, but for some reasen sqlalchemy thinks that I am referring to the node.id from the select rather than from the original Node table. The only way I found to suppress clause adaption was to select in text form:

    select(
        [literal_column("leftside.id").label("left_node_id"),
         literal_column("rightside.id").label("right_node_id")]
    )...
    

    This way the relationship to Node is broken and filtering works as expected. It feels like a hack with unforeseeable side effects, maybe someone knows a cleaner way...