I have Python code in which I am reading data from MongoDB and insert it into a table in Oracle database. I have removed some parts of the code (importing libraries, connecting to MongoDB and Oracle, etc.) for the sake of simplicity:
from_date_sms_journals = " select nvl(max(s.creation_date_ms),1710793800939)\
from sms_journals s "
oracle_cursor.execute(from_date_sms_journals)
for row in oracle_cursor.fetchall():
from_date = int(row[1])
# today
to_date_sms_journals = 1764102600000
oracle_cursor.execute(to_date_sms_journals)
for row in oracle_cursor.fetchall():
to_date = int(row[0])
fields = {"_id": 1,
"body": 1,
"recipient": 1,
}
sms_insert_query = """ insert /*+ APPEND */ into sms_journals \
(S_ID,S_BODY,RECIPIENT,gd_creation_date)\
values (:1, :2, :3)"""
batch_size = 10000
counter = 0
print(f'-----------------------------New 2 - Insert Data to Oracle : START at {update}')
sms_journals_query = mycol_act.find({"date": {"$gt": from_date, "$lte": to_date}}, projection=fields,batch_size=batch_size)
rows_to_insert = []
for doc in sms_journals_query:
s_id = str(doc["_id"])
s_body = str(doc.get("body", ''))
recipient = str(doc.get("recipient"))
gd_creation_date = str(datetime.datetime.fromtimestamp(float(doc["date"]) / 1000).astimezone(pytz.timezone('Asia/Tehran')).strftime('%d-%b-%y'))
rows_to_insert.append((s_id, s_body, recipient,gd_creation_date ))
counter += 1
if counter % batch_size == 0:
oracle_cursor.executemany(sms_insert_query, rows_to_insert)
oracleDB.commit()
rows_to_insert = []
if rows_to_insert:
oracle_cursor.executemany(sms_insert_query, rows_to_insert)
oracleDB.commit()
print("Insert data finished!")
The code works correctly and insert data from date > from_date to date <= to_date and there are no duplicates and the count of records match the source table. The weird thing is, it seems like this code inserts data from to_date to from_date, by which I mean it starts from last date (to_date). When I check the destination table in Oracle (during the execution of the code with select max(gd_creation_date) from table), the first date that is inserted is not the 'from_date', it is 'to_date'. Imagine the start is 2025/01/01 and the end is 2025/01/15. Then the code works in reverse order. The first date that is being inserted is 2025/01/15.
What is happening?
As mentioned, like you need an ORDER BY to guarantee sorted results, you need a .sort() in MongoDB if you want guaranteed results.
Now, if you don't need specific ordering and you just wonder why, I guess you have an index on {from_date: -1}. In the execution plan you probably see something like:
direction: 'forward', indexBounds: { from_date: [ '[inf, 1710793800939)' ] }
If you add a sort ({from_date: 1}), you will see:
direction: 'backward', indexBounds: { from_date: [ '(1710793800939, inf]' ] }
The performance of forward scan is slightly better, so that's who the query planner chooses that.
Additional note: you are using the /*+ APPEND */ hint to insert into Oracle. I think this hint applies only to the INSERT ... SELECT statements. If you want to do that here, you should use /*+ APPEND_VALUES */ but be careful that it locks the table - may be ok if there's no concurrent inserts - and may waste some space