pythonmongodboracle-database

MongoDB to Oracle insert runs in reverse date order


I have Python code in which I am reading data from MongoDB and insert it into a table in Oracle database. I have removed some parts of the code (importing libraries, connecting to MongoDB and Oracle, etc.) for the sake of simplicity:

from_date_sms_journals = " select nvl(max(s.creation_date_ms),1710793800939)\
                           from sms_journals s "
oracle_cursor.execute(from_date_sms_journals)
for row in oracle_cursor.fetchall():
    from_date = int(row[1])


# today
to_date_sms_journals = 1764102600000
oracle_cursor.execute(to_date_sms_journals)
for row in oracle_cursor.fetchall():
    to_date = int(row[0])


fields = {"_id": 1,
          "body": 1,
          "recipient": 1,
          }


sms_insert_query = """ insert /*+ APPEND */ into sms_journals \
                      (S_ID,S_BODY,RECIPIENT,gd_creation_date)\
              values (:1, :2, :3)"""


batch_size = 10000
counter = 0


print(f'-----------------------------New 2 - Insert Data to Oracle : START at {update}')
sms_journals_query = mycol_act.find({"date": {"$gt": from_date, "$lte": to_date}}, projection=fields,batch_size=batch_size)


rows_to_insert = []
for doc in sms_journals_query:
    s_id = str(doc["_id"])
    s_body = str(doc.get("body", ''))
    recipient = str(doc.get("recipient"))
    gd_creation_date = str(datetime.datetime.fromtimestamp(float(doc["date"]) / 1000).astimezone(pytz.timezone('Asia/Tehran')).strftime('%d-%b-%y'))
    rows_to_insert.append((s_id, s_body, recipient,gd_creation_date ))
 
    counter += 1
    if counter % batch_size == 0:
        oracle_cursor.executemany(sms_insert_query, rows_to_insert)
        oracleDB.commit()
        rows_to_insert = []
if rows_to_insert:
    oracle_cursor.executemany(sms_insert_query, rows_to_insert)
    oracleDB.commit()
print("Insert data finished!")

The code works correctly and insert data from date > from_date to date <= to_date and there are no duplicates and the count of records match the source table. The weird thing is, it seems like this code inserts data from to_date to from_date, by which I mean it starts from last date (to_date). When I check the destination table in Oracle (during the execution of the code with select max(gd_creation_date) from table), the first date that is inserted is not the 'from_date', it is 'to_date'. Imagine the start is 2025/01/01 and the end is 2025/01/15. Then the code works in reverse order. The first date that is being inserted is 2025/01/15.

What is happening?


Solution

  • As mentioned, like you need an ORDER BY to guarantee sorted results, you need a .sort() in MongoDB if you want guaranteed results.

    Now, if you don't need specific ordering and you just wonder why, I guess you have an index on {from_date: -1}. In the execution plan you probably see something like:

    direction: 'forward', indexBounds: { from_date: [ '[inf, 1710793800939)' ] }
    

    If you add a sort ({from_date: 1}), you will see:

    direction: 'backward', indexBounds: { from_date: [ '(1710793800939, inf]' ] }
    

    The performance of forward scan is slightly better, so that's who the query planner chooses that.

    Additional note: you are using the /*+ APPEND */ hint to insert into Oracle. I think this hint applies only to the INSERT ... SELECT statements. If you want to do that here, you should use /*+ APPEND_VALUES */ but be careful that it locks the table - may be ok if there's no concurrent inserts - and may waste some space