python multithreading pandas denodo

Multiprocessing/multithreading for database query in Python


I have millions of records in a database and I want to read them with Python and store them in a pandas DataFrame. The problem is that the SELECT query takes a very long time. To reduce the query time I tried multithreading: I created 3 threads and gave each thread its own query, like this:

select * from (select *, row_number() over (order by col1) rn from table) where rn % 3 = 0


select * from (select *, row_number() over (order by col1) rn from table) where rn % 3 = 1


select * from (select *, row_number() over (order by col1) rn from table) where rn % 3 = 2

Then I run each query in its own thread using Python's threading package.

But this does not reduce the time by much.

Is there any other approach I can take to reduce the query read time? Note: I have tried both JDBC and ODBC connections.


Solution

  • The link below helped me: Multiprocessing with JDBC connection and pooling. With it I get around a 25% gain on my local machine.
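
The linked answer is not reproduced here, so the following is only a minimal sketch of the general idea under stated assumptions: each worker opens its own connection once and reuses it, and each worker fetches one slice of the table. The standard-library `sqlite3` module stands in for the JDBC/ODBC connection, and `demo_table`, `id`, `payload`, `make_demo_db` and `fetch_all` are hypothetical names. The slices are chosen with a modulo on an integer key column instead of the row-number expression from the question, purely to keep the stand-in query simple.

```python
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ProcessPoolExecutor

# One connection per worker (process or thread). sqlite3 is only a stand-in
# for the real JDBC/ODBC driver, which is an assumption of this sketch.
_local = threading.local()


def make_demo_db(db_path, n_rows):
    """Create a small hypothetical table so the sketch can run end to end."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE demo_table (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany(
            "INSERT INTO demo_table VALUES (?, ?)",
            ((i, f"row-{i}") for i in range(1, n_rows + 1)),
        )
    conn.close()


def _open_connection(db_path):
    """Pool initializer: runs once per worker, so the connection is reused."""
    _local.conn = sqlite3.connect(db_path)


def _fetch_partition(task):
    """Fetch the rows whose key falls in one modulo class."""
    n_parts, part = task
    cur = _local.conn.execute(
        "SELECT id, payload FROM demo_table WHERE id % ? = ?", (n_parts, part)
    )
    return cur.fetchall()


def fetch_all(db_path, n_parts=3, executor_cls=ProcessPoolExecutor):
    """Run one query per partition in parallel and concatenate the rows."""
    tasks = [(n_parts, part) for part in range(n_parts)]
    with executor_cls(
        max_workers=n_parts, initializer=_open_connection, initargs=(db_path,)
    ) as pool:
        chunks = pool.map(_fetch_partition, tasks)
        return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    make_demo_db(demo_path, 1000)
    rows = fetch_all(demo_path, n_parts=3)
    print(len(rows))
```

The combined rows could then be handed to `pandas.DataFrame.from_records(rows, columns=["id", "payload"])`. Passing `executor_cls=ThreadPoolExecutor` runs the same code with threads, which makes it easy to compare both approaches against a real database; how much either one helps will depend on the driver and on how the server executes the partitioned queries.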