symfony, doctrine-orm, symfony-3.4, symfony3

How can I improve the performance of a large data import in Symfony?


I have to import a large CSV file, about 100,000 records, into the database.

To do this I have tried DQL and the ORM, but with both options it takes about 9 hours to complete the process.

I did the same load with Node.js and it was much faster, about 5 minutes.

So I do not know if there is a better option.

I have tried clearing and flushing after every 20 lines of the file, but it is still slow.
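
The batching I tried looks roughly like this (simplified; Product and its setters are placeholders for my real entity and column mapping):

    $batchSize = 20;
    foreach ($csvRows as $i => $row) {              // $csvRows: parsed lines of the CSV
        $product = new Product();                   // placeholder for my real entity
        $product->setName($row[0]);
        // ... map the remaining CSV columns to entity fields
        $this->entityManager->persist($product);

        if ((($i + 1) % $batchSize) === 0) {
            $this->entityManager->flush();
            $this->entityManager->clear();          // detach entities to free memory
        }
    }
    $this->entityManager->flush();                  // persist the last partial batch
    $this->entityManager->clear();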

Any idea how to improve this performance?

Thank you.


Solution

  • Depending on how your import looks, you might want to bypass the Doctrine ORM entirely and get the Connection from the Entity Manager to work with the DBAL directly. This way you can just read each row from the CSV as an array and insert it.

    $dbal = $this->entityManager->getConnection();
    $handle = fopen('exported_data.csv', 'r');
    while (($row = fgetcsv($handle)) !== false) {
        if (null === $row) {
            // Deal with invalid csv data, e.g. skip the line
            continue;
        }
        // maybe map row names to column names
        $dbal->insert('table_name', $row);
    }
    fclose($handle);
    

    This should already improve things a bit. Next you have to see if transactions and inserting in small batches make sense; a minimal sketch of that follows below. Fortunately, since you don't have to deal with object hydration and the unit of work from the ORM, you don't have to clear every time.
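
    A rough sketch of batched inserts inside explicit transactions on the DBAL level (the batch size and table name are just examples, and the CSV rows are assumed to already be keyed by column name):

    $dbal = $this->entityManager->getConnection();
    $handle = fopen('exported_data.csv', 'r');

    $batchSize = 500;
    $i = 0;

    $dbal->beginTransaction();
    while (($row = fgetcsv($handle)) !== false) {
        $dbal->insert('table_name', $row);

        // Commit in batches so a single transaction does not grow without bound
        if ((++$i % $batchSize) === 0) {
            $dbal->commit();
            $dbal->beginTransaction();
        }
    }
    $dbal->commit();    // commit the last, possibly smaller batch
    fclose($handle);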

    You might also want to look at the Doctrine Batch Utils from ocramius, one of the Doctrine contributors, for your task.
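
    As a rough illustration of how that library is typically used (class and method names taken from its documentation, so check them against the version you install), it wraps a result set and calls flush()/clear() for you every N records:

    use DoctrineBatchUtils\BatchProcessing\SimpleBatchIteratorAggregate;

    // Flush and clear the entity manager automatically every 100 iterations;
    // the entity class in the DQL is a placeholder for your own.
    $iterable = SimpleBatchIteratorAggregate::fromQuery(
        $this->entityManager->createQuery('SELECT p FROM AppBundle\Entity\Product p'),
        100
    );

    foreach ($iterable as $record) {
        // modify or import data for each entity here; batching is handled for you
    }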

    If afterwards you still have performance problems, you should start profiling your application with tools like Blackfire or XHProf. This will tell you which part of your code is actually causing the performance issue. It could be disk I/O, and even fgetcsv() can sometimes be slow because of the way it reads data in chunks, but that is a performance impact on such a low level that I wouldn't bother with it unless a profiler tells me to.