pythonmysqlsqlalchemycross-database

How to join tables from two different databases using sqlalchemy expression language / sqlalchemy core?


I am using MySql. I was however able to find ways to do this using sqlalchemy orm but not using expression language.So I am specifically looking for core / expression language based solutions. The databases lie on the same server

This is how my connection looks like:

connection = engine.connect().execution_options(
schema_translate_map = {current_database_schema: new_database_schema})
engine_1=create_engine("mysql+mysqldb://root:user@*******/DB_1")
engine_2=create_engine("mysql+mysqldb://root:user@*******/DB_2",pool_size=5)

metadata_1=MetaData(engine_1)
metadata_2=MetaData(engine_2)

metadata.reflect(bind=engine_1)
metadata.reflect(bind=engine_2)

table_1=metadata_1.tables['table_1']
table_2=metadata_2.tables['table_2']
query=select([table_1.c.name,table_2.c.name]).select_from(join(table_2,table_1.c.id==table_2.c.id,'left')

result=connection.execute(query).fetchall()

However, when I try to join tables from different databases it throws an error obviously because the connection belongs to one of the databases. And I haven't tried anything else because I could not find a way to solve this. Another way to put the question (maybe) is 'how to connect to multiple databases using a single connection in sqlalchemy core'.


Solution

  • Applying the solution from here to Core only you could create a single Engine object that connects to your server, but without defaulting to one database or the other:

    engine = create_engine("mysql+mysqldb://root:user@*******/")
    

    and then using a single MetaData instance reflect the contents of each schema:

    metadata = MetaData(engine)
    metadata.reflect(schema='DB_1')
    metadata.reflect(schema='DB_2')
    
    # Note: use the fully qualified names as keys
    table_1 = metadata.tables['DB_1.table_1']
    table_2 = metadata.tables['DB_2.table_2']
    

    You could also use one of the databases as the "default" and pass it in the URL. In that case you would reflect tables from that database as usual and pass the schema= keyword argument only when reflecting the other database.

    Use the created engine to perform the query:

    query = select([table_1.c.name, table_2.c.name]).\
        select_from(outerjoin(table1, table_2, table_1.c.id == table_2.c.id))
    
    with engine.begin() as connection:
        result = connection.execute(query).fetchall()